Correlation Analysis between SNP and Expression Arrays in Gliomas Identify Potentially Relevant Targets Genes

Yuri Kotliarov; Svetlana Kotliarova; Nurdina Charong; Aiguo Li; Jennifer Walling; Elisa Aquilanti; Susie Ahn; Mary Ellen Steed; Qin Su; Angela Center; Jean C Zenklusen; Howard A Fine

doi:10.1158/0008-5472.CAN-08-2496

Cancer Res. Author manuscript; available in PMC: 2010 Feb 15.

Published in final edited form as: Cancer Res. 2009 Feb 3;69(4):1596–1603. doi: 10.1158/0008-5472.CAN-08-2496


PMCID: PMC2644341 NIHMSID: NIHMS83738 PMID: 19190341

Abstract

Primary brain tumors are a major cause of cancer mortality in the United States. Therapy for gliomas, the most common type of primary brain tumor, remains suboptimal. The development of improved therapeutics will require greater knowledge of the biology of gliomas at both the genomic and transcriptional levels. We have previously reported whole-genome profiling of chromosome copy number alterations (CNA) in gliomas and now present our findings on how those changes may affect the transcription of genes that may be involved in tumor induction and progression. By calculating correlation values between mRNA expression and the average DNA copy number in a moving window around a given RNA probeset, biologically relevant information can be gained that is obscured by the analysis of a single data type. Correlation coefficients ranged from −0.6 to 0.7, generally higher than those reported in previous studies. Most correlated genes are located on chromosomes 1, 7, 9, 10, 13, 14, 19, 20 and 22, chromosomes known to have genomic alterations in gliomas. Additionally, we were able to identify CNAs whose gene expression correlation suggests possible epigenetic regulation. This analysis revealed a number of interesting candidates, such as CXCL12, PTER and LRRN6C. The results have been verified using real-time PCR and methylation sequencing assays. These data will further help differentiate genes involved in the induction and/or maintenance of the tumorigenic process from mere passengers, thereby enriching for a population of potentially new therapeutic molecular targets.

INTRODUCTION

Brain tumors are a major cause of cancer mortality in children and adults in the United States (1). Current treatment for gliomas, the most common type of primary brain tumor, remains suboptimal, and the promise of improved therapies rests largely on a better understanding of the underlying biology and genetics of these tumors. The majority of genomic alterations described in gliomas (2, 3) have also been found in the more common epithelial tumors (EGFR, CDKN2A, TP53, RB1, PTEN). Targeted therapies (e.g., erlotinib) directed at these ubiquitous cancer-associated targets, such as the epidermal growth factor receptor (EGFR), have unfortunately met with limited success (4, 5), underlining the need to identify glioma-specific genomic alterations. To address this problem, we created a National Cancer Institute-funded, multi-institutional project called the Glioma Molecular Diagnostic Initiative (GMDI), whose goal is to collect and molecularly characterize (at the RNA, DNA and protein levels) over 1000 gliomas together with the associated clinical data. As part of the first phase of GMDI, we analyzed 178 tumors (6) for genomic alterations using Affymetrix 100K SNP arrays (7). This assay detects the genotype at each SNP by hybridizing enzyme-modified, PCR-amplified genomic DNA to forty probes specific to that SNP. By using both allelic calls and signal intensity, such arrays can be successfully employed for genomic survey studies in non-familial diseases such as cancer. Although our SNP-based survey resolved genomic alterations in gliomas at substantially higher resolution than previous traditional CGH-based studies, SNP arrays still only allow the identification of relatively large areas of alteration (>100 Kbp).
The list of genes found in these regions can be long, even when the genes of interest are restricted to those closest to the peak of the alteration histogram. These large altered regions contain not only genes important for the tumorigenic process (“driver” genes) but also neighboring genes that have no role in tumorigenesis (“passenger” genes). Thus, determining which of the many genes identified through such analysis are the relevant drivers is a significant bioinformatic challenge.

Lists of genes potentially important for tumorigenesis have also been generated through mRNA expression profiling. Although potentially useful as a biomarker of the current transcriptional state of a given tumor, the mRNA expression profile of most genes merely reflects normal cellular processes rather than those intrinsic to the initiation and/or propagation of the tumorigenic state, so the probability that any one differentially expressed gene is important in tumorigenesis is relatively low. We have therefore attempted to address the limitations of single-platform analyses by combining the SNP and gene expression data in order to find genes that behave concordantly on the two platforms (e.g., the mRNA of an amplified gene is also overexpressed). This approach, for instance, allows one to quickly disregard genes that have a copy number alteration but whose mRNA expression is not significantly different from that of normal tissue. Thus, to enrich for genetic alterations that may have true biological relevance, we correlated the mRNA expression level of each gene with the calculated copy number of the genomic area associated with that gene. We believe that this approach will increase the likelihood of identifying genes that may be the primary target of a specific tumor-associated genomic event and are thus good candidates for further study.

Additionally, we identified some genes whose correlation with genomic copy number may be partially governed by epigenetic modulation. To this end, we specifically looked for probesets that had unusually low expression compared with non-tumor samples and were located in areas of LOH secondary to either a deletion (1n) or a recombination event (2n), where a further event could silence the remaining active allele. Several of these candidate genes were validated with real-time PCR and methylation sequencing assays. We believe that this refined list of structurally and functionally altered genomic elements will provide a strong basis for the discovery of genes involved in the induction, propagation and/or maintenance of gliomas and may expand the list of potential glioma-specific therapeutic targets.

MATERIALS AND METHODS

Samples

One hundred and forty gliomas from the Hermelin Brain Tumor Center, Departments of Neurology and Neurosurgery, Henry Ford Hospital, were analyzed in this study. The samples were provided as snap-frozen sections of areas immediately adjacent to the region used for histopathological diagnosis. This set included 67 glioblastomas, 27 astrocytomas, 38 oligodendrogliomas and 8 mixed oligoastrocytomas. Fifteen non-tumor samples (temporal lobe resections from patients with epilepsy) were analyzed concurrently to provide a baseline for the mRNA expression values.

DNA Extraction and Hybridization

DNA from the assayed samples was extracted and hybridized to Affymetrix 100K SNP chips (8) using the previously reported methodology (6).

RNA Extraction and Hybridization

Total RNA was extracted from approximately 50–100 mg of tissue from each tumor using TRIzol reagent (Invitrogen, Carlsbad, CA), following the manufacturer’s instructions. RNA quality was verified on the Bioanalyzer System (9) (Agilent Technologies, Palo Alto, CA) using RNA Pico Chips. Six micrograms of RNA were processed for hybridization on GeneChip® Human Genome U133 Plus 2.0 Expression arrays (10) (Affymetrix, Inc., Santa Clara, CA), which contain over 54,000 probesets associated with over 47,000 transcripts and variants, including 38,500 well-characterized human genes. Processing followed the manufacturer’s recommendations. After hybridization, the chips were processed using the Fluidics Station 450, the High-Resolution Microarray Scanner 3000 and the GCOS workstation version 1.3. Expression levels were determined by GCOS version 1.3, which employs the MAS5 algorithm. Expression values were scaled to a mean of 500, and only samples with a scaling factor lower than 5, a present call rate > 35% and a GAPDH 3'/5' ratio less than 3 were accepted for the analysis (11). The gene expression CEL files were normalized using the dChip (12) invariant-set method, and the PM-MM difference model was used to obtain the expression values (log₂) for further analysis.
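The array acceptance rule above amounts to three thresholds applied to each chip. The following minimal Python sketch encodes them; the `ArrayQC` record, its field names and the example values are hypothetical, and the metrics are assumed to have been exported from GCOS beforehand.

```python
from dataclasses import dataclass


@dataclass
class ArrayQC:
    """QC metrics for one expression array (field names are illustrative)."""
    sample_id: str
    scaling_factor: float    # MAS5 scaling factor at target intensity 500
    present_call_pct: float  # percentage of probesets called "present"
    gapdh_3_5_ratio: float   # GAPDH 3'/5' signal ratio (RNA integrity proxy)


def passes_qc(qc: ArrayQC) -> bool:
    """Apply the three acceptance thresholds stated in the Methods."""
    return (qc.scaling_factor < 5.0
            and qc.present_call_pct > 35.0
            and qc.gapdh_3_5_ratio < 3.0)


arrays = [
    ArrayQC("tumor_A", 2.1, 44.0, 1.3),
    ArrayQC("tumor_B", 6.3, 41.0, 1.1),  # scaling factor too high
    ArrayQC("tumor_C", 1.8, 30.5, 1.2),  # too few present calls
    ArrayQC("tumor_D", 3.0, 38.0, 3.4),  # 3'/5' ratio suggests degraded RNA
]
accepted = [a.sample_id for a in arrays if passes_qc(a)]
print(accepted)  # ['tumor_A']
```

Keeping the thresholds in one predicate makes it easy to audit rejections; a fuller version might return the failing criterion instead of a boolean.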

Copy Number Determination

Chromosome copy numbers were estimated as previously described (6).

Expression/Copy Number Correlation Analysis

To correlate RNA expression with DNA copy number, each HG-U133_Plus_2 probeset was mapped to all SNPs located within a 1 Mbp window around the center of that probeset. To improve reliability, we filtered the genomic alignments of mRNA probesets using the following criteria: A) only grade A and B probesets (according to Affymetrix annotation) that fully or partially overlap with mRNA transcripts were used; B) only probesets associated with a RefSeq gene and with genomic alignments mapped to the same single gene were selected; C) if a probeset had multiple alignments, they were checked for at least partial overlap; and D) the sequence identity of genomic alignments had to exceed 80%. Using these parameters, 21,611 probesets (representing 17,087 genes) were selected for further analysis. Probesets whose expression variance (after removal of outliers) was zero were also excluded, since a correlation coefficient cannot be estimated for them. The average copy number of the SNPs in each window was calculated from the original CNAT copy numbers, and correlation coefficients and their significance (p-values) between that average and the mRNA expression value of the associated probeset were calculated using locally developed Matlab scripts. Positions of both expression and SNP probesets were annotated using the Affymetrix NetAffx annotation service³ (May 2004 data, NCBI build 35).
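The window averaging step can be sketched as follows for a single sample. This is a minimal Python illustration, not the authors' Matlab code; it assumes that the SNP positions and the probeset center are on the same chromosome and genome build, treats the window as inclusive at both edges, and returns `None` when no SNP falls inside the window. The function name and toy positions are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def window_mean_copy_number(center_bp: int,
                            snp_positions: Sequence[int],
                            snp_copy_numbers: Sequence[float],
                            window_bp: int = 1_000_000) -> Optional[float]:
    """Mean copy number of the SNPs within +/- window_bp/2 of a probeset center.

    Returns None when the window contains no SNP, i.e. the probeset is
    non-informative for the expression/copy-number correlation.
    """
    half = window_bp / 2
    in_window = [cn for pos, cn in zip(snp_positions, snp_copy_numbers)
                 if abs(pos - center_bp) <= half]
    return mean(in_window) if in_window else None


# Toy data for one sample on one chromosome (positions in bp).
snp_pos = [100_000, 400_000, 900_000, 2_500_000]
snp_cn = [2.0, 3.0, 3.0, 1.0]

print(window_mean_copy_number(500_000, snp_pos, snp_cn))    # mean of 2, 3, 3
print(window_mean_copy_number(3_900_000, snp_pos, snp_cn))  # None: no SNP in window
```

Repeating the call for every sample yields the copy number vector that is then correlated with the probeset's expression values across samples.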

Statistical analysis

Correlation coefficients and significance p-values between the averaged copy number and the mRNA expression value of the associated probeset were calculated in Matlab after removal of outlying samples. A sample was considered an outlier for a particular probeset if its expression or copy number value differed from the mean by more than three standard deviations. Outliers were removed to reduce false positives: a single extreme sample can produce a high correlation coefficient simply through its distance from the bulk of the samples, creating an artificial correlation line that disappears when that sample is removed.
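The outlier rule and the correlation test can be sketched in plain Python (the authors used Matlab). This is an illustrative sketch under stated assumptions: outliers are flagged in a single pass using the sample mean and sample standard deviation of all samples, a sample is dropped if either its expression or its copy number is flagged, and the p-value is the usual two-sided Student's t test of Pearson's r with n − 2 degrees of freedom, evaluated through the regularized incomplete beta function. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def _beta_cf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))          # even step
        d = 1.0 / ((1.0 + aa * d) if abs(1.0 + aa * d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 / ((1.0 + aa * d) if abs(1.0 + aa * d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def drop_outliers(expr, cn, k=3.0):
    """Keep samples whose expression AND copy number lie within k SD of the mean."""
    me, se = mean(expr), stdev(expr)
    mc, sc = mean(cn), stdev(cn)
    keep = [i for i in range(len(expr))
            if abs(expr[i] - me) <= k * se and abs(cn[i] - mc) <= k * sc]
    return [expr[i] for i in keep], [cn[i] for i in keep]


def pearson_with_p(x, y):
    """Pearson r and two-sided p-value (t test, n - 2 df); None if a variance is 0."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation undefined; such probesets were excluded
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return r, 0.0
    df = n - 2
    t_sq = r * r * df / (1.0 - r * r)
    return r, reg_inc_beta(df / 2.0, 0.5, df / (df + t_sq))


# Toy probeset: 19 well-behaved samples plus one extreme expression value.
expr = [5.0 + 0.1 * i for i in range(19)] + [50.0]
cn = [2.0 + 0.05 * i for i in range(20)]
expr_kept, cn_kept = drop_outliers(expr, cn)
print(len(expr_kept))  # 19: the sample with expression 50.0 is removed

r, p = pearson_with_p([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4), round(p, 4))
```

Note that with the sample standard deviation a lone outlier can only exceed 3 SD when there are at least about a dozen samples, which is not a limitation at the sample sizes used here.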

To select probesets whose expression in tumors with LOH in the region was lower than in non-tumor samples, standard two-sided t-tests were performed. Probesets with p-values below 0.05 and more than a twofold decrease relative to the median expression of the non-tumor samples were selected.
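The LOH expression filter combines a two-sided Student's t test with a fold-change cut-off. A minimal sketch follows, assuming log₂-scale expression values (as produced by the dChip step), a pooled-variance t test, and a fold change that compares the mean of the LOH tumor samples with the median of the non-tumor samples; the Methods do not specify which tumor summary was used, so that choice is ours. Function names and toy values are hypothetical.

```python
import math
from statistics import mean, median


def _beta_cf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def student_t_two_sided(a, b):
    """Pooled-variance two-sample t statistic and two-sided p-value."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    if ss == 0:
        raise ValueError("zero variance in both groups")
    df = na + nb - 2
    t = (ma - mb) / math.sqrt(ss / df * (1.0 / na + 1.0 / nb))
    return t, reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def lower_in_loh(loh_log2, normal_log2, alpha=0.05, min_fold=2.0):
    """True if LOH tumors are significantly AND > min_fold lower than normals."""
    _, p = student_t_two_sided(loh_log2, normal_log2)
    log2_change = mean(loh_log2) - median(normal_log2)
    return p < alpha and log2_change < -math.log2(min_fold)


normal = [8.0, 8.2, 7.9, 8.1, 8.3, 7.8, 8.0, 8.1]  # non-tumor, log2
loh_low = [5.9, 6.2, 6.0, 6.4, 5.8, 6.1]           # about 4-fold lower
loh_mild = [7.4, 7.6, 7.5, 7.3, 7.7, 7.5]          # significant but < 2-fold

print(lower_in_loh(loh_low, normal))   # True
print(lower_in_loh(loh_mild, normal))  # False
```

The second example shows why both criteria are needed: with tight replicates even a small shift is statistically significant.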

Loss of heterozygosity analysis

Loss of heterozygosity (LOH) scores for each SNP were calculated as previously described (6). To detect LOH for each expression probeset, the LOH scores of all SNPs in a 1 Mbp window around the center of the probeset were averaged, as described above for copy number. A sample was considered to have LOH in a particular area if the corresponding mean LOH score was greater than 6. To concentrate on highly relevant genes, probesets with LOH in more than 10% of samples were selected for the analysis.
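The LOH selection can be sketched in the same way as the copy number window: average the per-SNP LOH scores of each sample inside the 1 Mbp window, call LOH when that mean exceeds 6, and keep the probeset when more than 10% of samples are called. The sketch below assumes that per-SNP scores have already been computed as in (6), that all SNPs lie on the probeset's chromosome, and that the percentage is taken over all samples; names and toy values are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def sample_has_loh(center_bp: int, snp_positions: Sequence[int],
                   snp_loh_scores: Sequence[float], window_bp: int = 1_000_000,
                   threshold: float = 6.0) -> Optional[bool]:
    """LOH call for one sample: mean LOH score in the window > threshold."""
    half = window_bp / 2
    scores = [s for pos, s in zip(snp_positions, snp_loh_scores)
              if abs(pos - center_bp) <= half]
    if not scores:
        return None  # no SNP in the window: no call possible
    return mean(scores) > threshold


def select_probeset(center_bp: int, snp_positions: Sequence[int],
                    loh_matrix: Sequence[Sequence[float]],
                    min_fraction: float = 0.10) -> bool:
    """Keep a probeset if more than min_fraction of all samples show LOH."""
    calls = [sample_has_loh(center_bp, snp_positions, row) for row in loh_matrix]
    n_loh = sum(1 for c in calls if c)
    return n_loh / len(loh_matrix) > min_fraction


snps = [200_000, 450_000, 700_000]
high, low = [8.0, 9.0, 7.0], [1.0, 2.0, 0.0]
two_of_ten = [high, high] + [low] * 8  # 20% of samples with LOH
one_of_ten = [high] + [low] * 9        # exactly 10%: not "more than" 10%

print(select_probeset(450_000, snps, two_of_ten))  # True
print(select_probeset(450_000, snps, one_of_ten))  # False
```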

CpG island Estimation

Genomic coordinates of all RefSeq genes and all CpG islands (longer than 200 bp) were extracted from the UCSC Genome Browser (2004). The percentage of genes with a 5’ CpG island was calculated for a random sample of genes of the same size as the tested gene set (271 unique genes). This process was repeated 10,000 times, and a Z-test p-value was calculated.
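The resampling procedure can be sketched as below. It is an illustration under assumptions: the background is a list of booleans marking whether each gene has a 5’ CpG island, random samples are drawn without replacement, and the Z-test p-value is taken as two-sided from the standard normal distribution. The background in the usage lines is a toy list, not the study's gene set; only the sample size (271) and the observed percentage (80.7%, from the Results) come from the text.

```python
import math
import random
from statistics import mean, stdev


def cpg_enrichment(background_has_cpg, n_tested, observed_pct,
                   n_iter=10_000, seed=0):
    """Null distribution of % genes with a 5' CpG island in random gene sets.

    Returns the null mean, null SD, z-score and two-sided normal p-value.
    """
    rng = random.Random(seed)
    null_pcts = []
    for _ in range(n_iter):
        draw = rng.sample(background_has_cpg, n_tested)  # without replacement
        null_pcts.append(100.0 * sum(draw) / n_tested)
    mu, sd = mean(null_pcts), stdev(null_pcts)
    z = (observed_pct - mu) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return mu, sd, z, p_two_sided


# Toy background: 10,000 genes, 70% with a 5' CpG island.
background = [True] * 7_000 + [False] * 3_000
mu, sd, z, p = cpg_enrichment(background, n_tested=271, observed_pct=80.7)
print(f"null mean {mu:.1f}%, SD {sd:.2f}, z = {z:.2f}, p = {p:.2g}")
```

Because the null percentages are close to normal for samples of a few hundred genes, the z-score is a reasonable summary; the empirical fraction of null draws at or above the observed value is an alternative that avoids the normal assumption.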

Real time RT-PCR

To confirm the very low expression levels observed in some samples/probesets, mRNA expression was quantified by real-time RT-PCR on an ABI Prism 7900 sequence detection system (Applied Biosystems, Foster City, CA) as previously described (13).

Methylation Specific Sequencing

Promoter methylation in the 5’ CpG islands of two candidate genes (CXCL12 and PTER) was assessed in six tumor samples (HF0184, HF0627, HF0992, HF1057, HF1090 and HF01458) that showed extremely low mRNA expression despite their gene dosage. One sample with expression consistent with its copy number (HF0329) was used as a control. Bisulfite conversion of DNA samples was performed using the Active Motif MethylDetector kit (Active Motif, Carlsbad, CA), following the manufacturer’s instructions. The following primers were used to amplify the CpG islands. PTER: PTERmeth300F1 (GAATTGTGGGTTTAGTAGGAAGAGT) and PTERmeth300B5 (CAAAACCAAACCCTAAATCTAAAAA) for the first PCR, and PTERmeth300F1 and PTERmeth300B1 (TAAAAAAAACTTAACACAACTACCC) for the second PCR; CXCL12: CXCL12methF1 (TTGTTTTTGTGATAGGGTTTTATTG) and CXCL12methB1 (ACTTTTCATTAATTCTCATTCAATTCC). PCR products were then subcloned using the TOPO TA Cloning Kit (Invitrogen, Carlsbad, CA), following the manufacturer’s instructions, and 10–16 PCR-verified clones per sample were sequenced from the vector T3 and T7 priming sites.
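Bisulfite treatment converts unmethylated cytosines to uracil (read as thymine after PCR), whereas methylated cytosines, which in mammalian DNA occur mainly at CpG dinucleotides, are protected. The forward primers listed above contain no C and the reverse primers no G, consistent with primers designed against the converted sequence. A minimal in silico sketch of sense-strand conversion follows; it ignores incomplete conversion and non-CpG methylation, and the function name and toy sequence are hypothetical.

```python
from typing import AbstractSet


def bisulfite_convert(seq: str, methylated_cpgs: AbstractSet[int] = frozenset()) -> str:
    """Simulate sense-strand bisulfite conversion as read out after PCR.

    Every C becomes T unless it is the C of a CpG whose 0-based index is in
    methylated_cpgs, in which case it is protected and stays C.
    """
    s = seq.upper()
    out = []
    for i, base in enumerate(s):
        if base != "C":
            out.append(base)
            continue
        is_cpg = i + 1 < len(s) and s[i + 1] == "G"
        out.append("C" if is_cpg and i in methylated_cpgs else "T")
    return "".join(out)


toy = "ACGTTCGACCA"                 # CpG cytosines at indices 1 and 5
print(bisulfite_convert(toy))       # ATGTTTGATTA (fully unmethylated)
print(bisulfite_convert(toy, {1}))  # ACGTTTGATTA (CpG at index 1 methylated)
```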

RESULTS

mRNA probeset mapping and Copy Number Calculation

To correlate mRNA expression levels with genomic copy number, we first mapped every probeset on the Affymetrix Human U133 Plus 2 chip to the SNP probes. The mapping strategy is shown in Figure 1A.

Figure 1. A. Schema for mapping expression and SNP probesets. Red rectangles represent the windows around each expression probeset within which the mean copy number of the included SNPs is calculated; some probesets may have no SNPs in their window. B. Percentage of probesets with no SNPs in the window vs. window size for both Xba and Hind arrays. C. Distribution of expression probesets by number of SNPs in a 1 Mbp window.

Once the positions of the expression probesets were curated (as described in Materials and Methods), we calculated the associated genomic copy number by averaging the copy numbers of individual SNPs in a window of a given size. Depending on window size, some probesets may have no SNP in their window (e.g., the third probeset in Fig. 1A) and are then non-informative for the correlation analysis; we therefore examined which window length would give optimal coverage while removing most of the noise. Figure 1B illustrates this process for both SNP chip types. A window of 1 Mbp was selected as an acceptable compromise between resolution and the number of removed probesets (about 900 probesets, ~3%), allowing us to investigate most of the selected probesets without averaging over a genomic span so large that it would render the platform resolution meaningless. To confirm the validity of this choice, we analyzed the effect on probeset selection of two other window sizes (500 Kbp and 1.5 Mbp) (Supplemental Figure 1). As seen in Figure S1A, the distributions of probesets obtained with the three window sizes are highly correlated, with coefficients larger than 0.89. Moreover, the overlap in probeset identity among the three conditions is high, as shown by the Venn diagrams (Figures S1B and S1C). The distribution of probesets by number of SNPs in a 1 Mbp window for both array types is shown in Figure 1C; most probesets have copy number information from 10–15 SNPs, which effectively suppresses the effect of unreliable (outlier) data points.
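The trade-off behind Figure 1B can be reproduced by counting, for each candidate window size, the fraction of probesets whose window contains no SNP. A minimal sketch, assuming positions on a single chromosome and inclusive window edges; the function name and toy coordinates are hypothetical.

```python
import bisect
from typing import Dict, Sequence


def fraction_without_snps(probe_centers: Sequence[int],
                          snp_positions: Sequence[int],
                          window_bp: int) -> float:
    """Fraction of probesets with no SNP within +/- window_bp/2 of their center."""
    snps = sorted(snp_positions)
    half = window_bp / 2
    empty = 0
    for center in probe_centers:
        lo = bisect.bisect_left(snps, center - half)   # first SNP >= left edge
        hi = bisect.bisect_right(snps, center + half)  # first SNP > right edge
        if hi == lo:
            empty += 1
    return empty / len(probe_centers)


snps = [100_000, 1_200_000, 5_000_000]
probes = [150_000, 700_000, 3_000_000, 4_800_000]
coverage: Dict[int, float] = {
    w: fraction_without_snps(probes, snps, w)
    for w in (500_000, 1_000_000, 1_500_000)
}
print(coverage)  # {500000: 0.5, 1000000: 0.25, 1500000: 0.25}
```

Larger windows lower the fraction of non-informative probesets but average copy number over more sequence, which is the resolution cost weighed in the text.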

Correlation of mRNA Expression vs. Copy Number

To determine the relationship between gene expression level and copy number on a genomic scale, we calculated correlation coefficients between these two values along the genome. To test the null hypothesis of no correlation, we calculated p-values as the probability of obtaining a correlation at least as large as the one observed by random chance when the true correlation is zero. Both correlation coefficients and p-values (for Xba chips) are shown in Supplemental Figure 2A (center and bottom plots). The data clearly show that the majority of highly correlated probesets are located in areas of common deletion or amplification (top plot, Fig. S2A), suggesting a non-random discovery. Plots for Hind chips are almost identical (data not shown). The use of both Hind and Xba chips serves as an internal control for the accuracy of the analysis from each individual chip. Figure S2B shows high concordance between the Xba and Hind chip-derived correlation coefficients, supporting the robustness of the data.

To select highly correlated probesets, we applied a threshold of ±0.3 to the correlation coefficients for both chips, corresponding to a p-value < 1×10⁻⁴. This cutoff is substantially more stringent than the traditional 0.001 value; we adopted such a conservative criterion to minimize the number of false positives. With these parameters, 1739 positively correlated probesets (1297 genes) and 59 negatively correlated probesets (42 genes) were identified. A subset of these genes is listed in Table 1, and the complete lists are given in Supplemental Tables S1 and S2.
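The selection rule requires a probeset to pass the ±0.3 threshold on both the Xba and the Hind arrays, after which probesets are collapsed to genes for counting. A minimal sketch, assuming a strict inequality at the threshold and one gene symbol per probeset (as guaranteed by the mapping filters in the Methods); the probeset IDs, gene labels and coefficients are placeholders, not values from the study.

```python
from typing import List, Sequence, Tuple

Record = Tuple[str, str, float, float]  # (probeset, gene, r_xba, r_hind)


def select_correlated(records: Sequence[Record], threshold: float = 0.3
                      ) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split probesets into positively / negatively correlated on BOTH chips."""
    positive, negative = [], []
    for probeset, gene, r_xba, r_hind in records:
        if r_xba > threshold and r_hind > threshold:
            positive.append((probeset, gene))
        elif r_xba < -threshold and r_hind < -threshold:
            negative.append((probeset, gene))
    return positive, negative


records = [
    ("ps1", "GENE_A", 0.62, 0.58),
    ("ps2", "GENE_A", 0.41, 0.35),
    ("ps3", "GENE_B", 0.45, 0.22),   # passes on Xba only: not selected
    ("ps4", "GENE_C", -0.43, -0.39),
    ("ps5", "GENE_D", 0.05, -0.02),
]
pos, neg = select_correlated(records)
n_pos_genes = len({gene for _, gene in pos})
print(len(pos), n_pos_genes, len(neg))  # 2 1 1
```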

Table 1.

Sampling of genes found to have high positive correlation values between DNA copy number and mRNA expression analysis in tumors showing altered genomic segments.

Gene	Location	Total Probesets	Correlating Probesets	Correlation Coefficient	Genomic Alteration	Biological Annotation
XIST	Xq13.2	7	7	0.736	Deletion
ZRANB1	10q26.13	2	2	0.698	Deletion
PAPD1	10p11.13	3	3	0.685	Deletion
PTEN	10q23.31	5	3	0.620	Deletion	TSG
MAPK8	10q11.22	3	3	0.659	Deletion
KDELR1	19q13.32	2	2	0.627	Deletion
CDKN2B	9p21.3	2	1	0.471	Deletion	TSG
SFRS11	1q31.1	3	2	0.577	Deletion
ZNF12	7p22.1	3	3	0.665	Amplification
EGFR	7p11.2	9	5	0.646	Amplification	Oncogenic
MDM2	12q15	4	3	0.632	Amplification	Oncogenic
TH1L	20q13.32	4	4	0.619	Amplification
OS9	12q14.1	2	2	0.591	Amplification
CBX3	7p15.2	3	3	0.586	Amplification
IMP3	7p15.3	4	3	0.528	Amplification
FASTK	7q36.1	3	3	0.522	Amplification


Our correlation coefficients ranged between −0.6 and 0.7, values generally higher than those reported in previous studies (14, 15). Not surprisingly, and as a good control for the accuracy of our methodology, the gene with the highest correlation coefficient (0.74) in our analysis was XIST (X inactive-specific transcript), a gene known to participate in silencing of the second X chromosome in females and one that separates males and females into two clusters, as we have previously demonstrated (6).

Other highly correlated genes were found on a variety of chromosomes, including chromosomes 1, 7, 9, 10, 13, 14, 19, 20 and 22. Expression-copy number scatter plots of eight of these genes, both positively and negatively correlated, are shown in Figure 2.

Figure 2. Scatterplots of several genes with positive (top row) and negative (bottom row) correlation coefficients between expression and copy number. Genes with many amplified samples are grouped in the two left columns; genes with many samples having deletions, in the two right columns.

Identification of epigenetically modulated genes

According to the Knudson hypothesis, tumor suppressor genes (TSG) are inactivated in a two-step process, often involving deletion of the first allele followed by mutational or epigenetic silencing of the remaining allele. To identify epigenetically modulated genes that could be potential TSG, we looked for mRNA probesets located within areas of LOH whose expression, relative to non-tumor samples, was substantially lower than expected from their genomic copy number. We identified 3353 probesets (2001 unique genes) within areas of LOH in at least 10% of the samples. Of these, 403 probesets (305 unique genes) showed significantly lower expression than non-tumor samples (p-value < 0.05 and a greater than 2-fold change from the non-tumor median expression) (Supplemental Table S3). For some of these probesets we observed a binary separation of LOH samples into two groups, one with expression similar to non-tumor samples and another with very low or no expression, as in the case of PTER and CXCL12 (Figure 3). These genes differ from the negatively correlated genes shown in the lower panels of Figure 2, since their correlation coefficient was positive, although lower than would be expected if the single remaining allele were expressed normally.

Figure 3. Scatterplots of several genes selected as candidates for epigenetic regulation by analysis of areas of LOH where mRNA expression levels are substantially lower than predicted from gene dosage, leading to a lower, but positive, correlation coefficient. Red outlines mark samples with loss of heterozygosity. The box-and-whisker diagram to the right of each plot shows the expression distribution in non-tumor samples. Expression values are shown as log2 of the ratio to the median of non-tumor samples.
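Because the dChip expression values are already on a log₂ scale, the log₂ ratio to the non-tumor median used in these plots is simply a subtraction. The sketch below also summarizes the rescaled non-tumor values with quartiles for a box-and-whisker display; the function name and values are illustrative, and quartile conventions differ between plotting tools (Python's default "exclusive" method is used here).

```python
from statistics import median, quantiles
from typing import List, Sequence, Tuple


def rescale_to_normal(tumor_log2: Sequence[float], normal_log2: Sequence[float]
                      ) -> Tuple[List[float], List[float]]:
    """Express log2 values relative to the non-tumor median; return tumor values
    and the quartiles (Q1, median, Q3) of the rescaled non-tumor values."""
    ref = median(normal_log2)
    tumor_rel = [v - ref for v in tumor_log2]     # log2 ratio to non-tumor median
    normal_rel = [v - ref for v in normal_log2]
    return tumor_rel, quantiles(normal_rel, n=4)  # default method="exclusive"


normal = [7.0, 7.5, 8.0, 8.5, 9.0]
tumor = [8.0, 5.0, 9.0]
tumor_rel, normal_quartiles = rescale_to_normal(tumor, normal)
print(tumor_rel)         # [0.0, -3.0, 1.0]
print(normal_quartiles)  # [-0.75, 0.0, 0.75]
```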

Validation of microarray determined mRNA levels

Because expression levels of any given gene measured with microarray probesets alone can be unreliable, we confirmed the expression levels of some of the identified genes in order to rule out false positives (especially genes with low expression, such as the 305 putative TSGs). To this end, we performed quantitative real-time RT-PCR on a variety of both low- and high-expressing genes (using ~10 samples in each case) to allow a meaningful correlation analysis. Six of these assays are plotted in Figure 4A. The correlation between the microarray and RT-PCR measurements is striking (correlation coefficients from 0.69 to 0.92), given the often-cited deficiencies of microarray systems. These assays validate the expression values generated with the Affymetrix chips, thus reducing the likelihood that our analysis identified a significant number of false-positive candidates.

Figure 4. Validation of mRNA expression levels and promoter methylation for selected genes. A. Correlation between expression values from the microarray experiment and real-time RT-PCR for 6 genes. Pearson correlation coefficients are shown in parentheses at the top of each plot. B. Bisulfite sequencing of the CXCL12 gene in 5 selected glioma samples (see text). The percentage of methylated colonies (colonies with more than 25% of CpG sites methylated) was calculated from 10–16 individual colonies. Black dots indicate CpG sites methylated in more than 50% of colonies, white dots in less than 50% and grey dots in exactly 50%. Control samples (unmethylated allele) are shown separately. Numbers below the dots indicate genomic positions within the amplified fragment relative to the gene transcriptional start site. The far right panel shows the location of the assayed samples in the context of the CNA/expression correlation: black dots represent samples with LOH and gray dots all other samples; black outlines and labels indicate samples used for the methylation study. C. Bisulfite sequencing of the PTER gene in 5 selected glioma samples. The control sample (unmethylated allele) is shown separately.
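The summary statistics in panels B and C can be reproduced from aligned clone sequences. In this minimal sketch, each clone is assumed to be aligned to the genomic reference without gaps; at every reference CpG a C in the clone is scored as methylated and a T as unmethylated; a clone counts as methylated when more than 25% of its scored CpG sites are methylated; and each site is colored black, white or grey when more than, less than or exactly 50% of clones are methylated there. Function names and sequences are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def cpg_sites(reference: str) -> List[int]:
    """0-based indices of the C in each CpG of the genomic reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i] == "C" and ref[i + 1] == "G"]


def call_clone(reference: str, clone: str) -> List[Optional[bool]]:
    """Per-CpG calls for one clone: True = C (methylated), False = T, None = other."""
    calls: List[Optional[bool]] = []
    for i in cpg_sites(reference):
        base = clone[i].upper()
        calls.append(True if base == "C" else False if base == "T" else None)
    return calls


def summarize(reference: str, clones: Sequence[str],
              clone_cutoff: float = 0.25) -> Tuple[float, List[str]]:
    """Percent methylated clones and a per-site color code."""
    per_clone = [call_clone(reference, c) for c in clones]

    def methylated_fraction(calls: Sequence[Optional[bool]]) -> float:
        scored = [c for c in calls if c is not None]
        return sum(scored) / len(scored) if scored else 0.0

    n_meth = sum(methylated_fraction(calls) > clone_cutoff for calls in per_clone)
    pct_meth_clones = 100.0 * n_meth / len(clones)

    colors = []
    for j in range(len(cpg_sites(reference))):
        site = [calls[j] for calls in per_clone if calls[j] is not None]
        f = sum(site) / len(site) if site else 0.0
        colors.append("black" if f > 0.5 else "white" if f < 0.5 else "grey")
    return pct_meth_clones, colors


reference = "ACGTTCGACGA"            # CpGs at indices 1, 5 and 8
clones = ["ACGTTCGACGA",             # all three CpGs methylated
          "ATGTTTGATGA",             # none methylated
          "ACGTTTGATGA",             # 1 of 3 (33%) methylated
          "ATGTTTGATGA"]             # none methylated
print(summarize(reference, clones))  # (50.0, ['grey', 'white', 'white'])
```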

Validation of epigenetic modulation targets

The discrepancy between the allelic copy number and the qPCR-confirmed expression values supports the argument that a number of the 305 identified target genes may be at least partially under epigenetic regulation. Since one of the most common epigenetic regulatory mechanisms involves methylation of promoter-region CpG islands, we investigated the methylation status of the upstream regions of a number of these genes. Using the UCSC Genome Browser, we determined that 80.7% of the target genes had a CpG island (CpG >55%, at least 300 bp long), compared with 69.4% of genes in random samples of 305 genes from our dataset (Figure S3) and 65.4% of all genes in the human genome (16) (Z-test p-value ≈ 0). To directly demonstrate promoter-region methylation, we selected two genes (CXCL12 and PTER) that were clearly underexpressed in some glioma samples and normally expressed (compared with normal tissue) in others. As shown in Figures 4B and 4C, tumors with very low expression of CXCL12 or PTER had very high rates of CpG methylation in the respective promoters, whereas tumors with normal expression levels had relatively low rates of CpG methylation, consistent with methylation-mediated downregulation of the non-deleted allele in low-expressing tumors.
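The CpG island criterion quoted above can be checked for a candidate sequence as sketched below. We read “CpG >55%” as a G+C content above 55%, which is our assumption; the sketch also reports the observed/expected CpG ratio (CpG count × length / (C count × G count)), a quantity used in common CpG island definitions but not stated in this section. Function names and sequences are hypothetical.

```python
from typing import Tuple


def cpg_island_stats(seq: str) -> Tuple[float, float]:
    """Return (G+C fraction, observed/expected CpG ratio) for a sequence."""
    s = seq.upper()
    n = len(s)
    n_c, n_g = s.count("C"), s.count("G")
    n_cpg = s.count("CG")  # CpG occurrences cannot overlap, so count() is exact
    gc_fraction = (n_c + n_g) / n if n else 0.0
    obs_exp = n_cpg * n / (n_c * n_g) if n_c and n_g else 0.0
    return gc_fraction, obs_exp


def meets_stated_criteria(seq: str, min_gc: float = 0.55, min_len: int = 300) -> bool:
    """G+C content above min_gc and length of at least min_len bp."""
    gc_fraction, _ = cpg_island_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc


print(cpg_island_stats("CG" * 200))       # (1.0, 2.0)
print(meets_stated_criteria("CG" * 200))  # True
print(meets_stated_criteria("CG" * 100))  # False: only 200 bp
print(meets_stated_criteria("AT" * 200))  # False: no G+C
```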

DISCUSSION

We recently reported our analysis of genomic alterations in 178 gliomas using Affymetrix 100K SNP arrays (6). Although this genomic survey probed the glioma genome at a higher resolution than had previously been achieved, allowing us to identify new areas of CNA, the areas identified were still substantially larger than 100 Kbp. As such, the number of genes in any given region of CNA is large, making the distinction between driver and passenger genes difficult. Likewise, genes identified solely by their expression profiles may be useful biomarkers of the disease state but have a low statistical probability of being at the root of the tumorigenic process. Therefore, to increase the chances of identifying genes of biological relevance, we correlated mRNA expression levels with the calculated copy numbers of the corresponding genomic regions. This approach enriches for candidate genes with a higher probability of being the primary targets of an oncogenic genomic alteration, rather than bystanders or passengers to the event.

Two prior studies attempted to correlate chromosomal changes with gene expression in gliomas in a similar way (14, 15). Although these were important advances at the time, we believe that the present study adds significantly to the current state of knowledge: the prior studies used lower-resolution comparative genomic hybridization (CGH), employed a much smaller number of tumors and restricted the gene expression analysis to a limited number of genes and/or regions of interest. Although neither the SNP nor the gene expression microarray platform we used can be considered a quantitative assay, the combination of both methodologies with a large sample set greatly increases the robustness of the findings.

One of the primary general findings of our study was that the mRNA expression level of a substantial number of genes in areas of genomic change is highly dependent on gene dosage. Although not unexpected, the present work is the first to show such a strong correlation at a global level in gliomas. In this context, it is important to note that with our methodology any single expression value lying far enough from the bulk of the other samples can produce a correlation line to those samples, yielding an artificial correlation. We therefore removed expression outliers before calculating correlation coefficients, reducing the number of false positives at the expense of possibly missing a few true positive correlations.

In our study, we chose expression probesets, rather than genes themselves, as the driving element of the analysis. The vast majority of genes on the Affymetrix microarrays are represented by several target sequences (probesets), whose expression can vary greatly. By mapping every probeset to a group of SNPs, we found the resulting correlations to be more accurate than if we had combined the expression levels of all probesets for a gene into a single value. The accuracy of our methodology is supported by the observation that most genes selected as highly correlated were represented by a large percentage of their associated probesets (Table 1 & Table 2).

Table 2.

Sampling of genes found to have high negative (inverse) correlation values between DNA copy number and mRNA expression analysis in tumors showing altered genomic segments.

Gene	Location	Total Probesets	Correlating Probesets	Correlation Coefficient	Genomic Alteration
CDC2	10q21.2	3	3	−0.457	Deletion
FER1L3	10q23.33	2	2	−0.425	Deletion
ADAM12	10q26.2	5	3	−0.430	Deletion
VIM	10p12.33	2	2	−0.417	Deletion
ITGB1	10p11.22	4	4	−0.416	Deletion
BCAS1	20q13.2	1	1	−0.401	Amplification
RPIB9	7q21.12	2	2	−0.396	Amplification
EPHB6	7q34	1	1	−0.356	Amplification


Applying the described methodology, we identified 1297 genes with positive correlation and 42 genes with negative correlation. Owing to the large number of samples, the statistical significance obtained is very convincing (p-value < 10⁻⁴). While a positive correlation between gene expression and copy number is easy to understand and points to genes that may be at the root of a genetic alteration, an explanation for genes with negative correlation (i.e., genes with upregulated mRNA in areas of deletion or downregulated mRNA in areas of amplification) is less evident (Table 2, Figure 2 bottom panels). A clue to their relevance may lie in the observation that such genes are found in areas of large CNA (most of them on the frequently gained chromosome 7 or the frequently lost chromosome 10), suggesting that negatively correlated genes may reflect a response to a passenger genomic alteration. For genes showing negative correlation in areas of amplification, one possible explanation is that the gene in question has a toxic or detrimental effect on tumor development and is therefore downregulated. This seems a plausible explanation for genes like EPHB6 (located in the amplified 7q34 region, near MET), which is amplified in a number of gliomas but whose expression is downregulated. This is consistent with the experimental observation that EPHB6 downregulation is important for an invasive tumor phenotype, whereas its overexpression has been correlated with a good prognosis in breast cancer (17), melanoma (18) and neuroblastoma (19). Such genes, downregulated in amplified chromosomal areas, are very interesting candidates to evaluate as putative tumor suppressors.
Conversely, in the case of negative correlation between copy number and expression in genes situated in deleted regions, one could speculate that these genes are vital for cell function and thus a compensatory feedback-loop may be responsible for expression levels above those expected from their gene dosage. The presence of CDC2 and vimentin in this gene list is consistent with such a model.

The Knudson hypothesis predicts that two successive mutations in an appropriate target gene (i.e., a TSG), one in each allele, may be required to induce the full transforming phenotype. Thus, areas of LOH represent fertile regions for discovering potential new TSG, since one allele of the gene of interest has, by definition, already been inactivated. Subsequent inactivation of the remaining allele can occur through transcriptional repression (via a number of mechanisms) or allelic mutation. Although gene expression platforms do not generally detect genes functionally inactivated through DNA mutations (unless the mutations affect mRNA stability and/or structure), one could potentially identify genes whose non-deleted alleles have been transcriptionally silenced using a gene expression microarray platform. We therefore tested whether it was possible to detect genes in areas of LOH whose mRNA expression was substantially lower than predicted from gene dosage (Figure 3). We identified 305 genes that met these criteria and were located in areas of LOH in at least 10% of the tumor samples evaluated. Several known TSGs were identified in this analysis (CDKN2B, PTEN), supporting the methodological approach. However, we additionally identified a number of very interesting new putative TSGs, such as CXCL12 (ligand of CXCR4, which is involved in chemoattraction and invasion) and HK1 (a member of the hexokinase family, known to regulate apoptotic pathways). Visual inspection of the expression-copy number scatter plots for these genes shows that some of them segregate by tumor sample into either very low or normal expressers (Figure 3). Validation of the expression levels by quantitative RT-PCR demonstrated that these populations are real rather than artifacts, suggesting a potential epigenetic regulatory mechanism for the low expression.
The presence of CpG islands at the 5’ end of more than 80% of the genes identified with this method, compared with 65% of a randomly selected set of genes, suggests that our methodology enriched the gene set for epigenetically regulated genes. In support of this assertion, we demonstrate that two of the genes with the clearest bimodal expression pattern (CXCL12 and PTER) had substantially higher methylation of their promoter-associated CpG islands in the low-expressing glioma samples than in the normally expressing gliomas.
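One way to ask whether a CpG-island fraction like the one above exceeds a background rate is an exact one-sided binomial test, treating the background fraction (65% in the text) as a fixed probability. The sketch below is an illustrative choice on our part, not the test used by the authors, and it does not reproduce their counts; the function name is hypothetical. Computing the tail directly with `math.comb` is adequate for gene lists of a few hundred; for much larger lists the float conversion can overflow, and a library routine such as SciPy's `binomtest` with `alternative="greater"` is preferable.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p).

    Interpretation here: the chance of observing at least k genes with a
    5' CpG island among n selected genes if each carried one with the
    background probability p (hypothetical helper).
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

For example, `binomial_upper_tail(3, 3, 0.5)` is the probability that three fair coin flips all land heads, 0.125. A small tail probability for the observed count against p = 0.65 would support the enrichment claim, with the caveat that genes sharing a chromosomal region are not strictly independent draws.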

In summary, although mRNA expression and DNA genotyping high-resolution arrays each produce a wealth of molecular information, delineating which events are primary to the biology of the tumor and which are secondary or passenger events can be a daunting task. We have shown that integrating both array technologies potentially allows one to enrich for specific genes that might be involved in the initiation, propagation and/or maintenance of the tumorigenic process. This and other similar analyses should help investigators choose more accurately which genes to validate experimentally as potential new oncogenes or TSGs and, in so doing, increase the efficiency of identifying potential new targets for the treatment of gliomas and other tumors.

ACKNOWLEDGEMENTS

This research was supported by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research.

Footnotes

Files for the genotyping and expression microarrays can be found at GEO (accession numbers GSE6109 and GSE4290, respectively) and caArray [http://caarraydb.nci.nih.gov] (experiment ID 1015897589852334). Additional analyses of the data can be performed in REMBRANDT (Repository of Molecular Brain Neoplasia Data) [http://Rembrandt.nci.nih.gov].

http://www.affymetrix.com/analysis/index.affx

REFERENCES

1. Cancer Statistics Branch N, NIH. Harras A, editor. Cancer Survival rates. Washington, DC: US Dept of Health & Human Services, National Institutes of Health; Cancer: Rates & Risks. 1996:28–34.
2. Hartmann C, Mueller W, von Deimling A. Pathology and molecular genetics of oligodendroglial tumors. J Mol Med. 2004;82:638–655. doi: 10.1007/s00109-004-0565-9.
3. Reifenberger G, Collins VP. Pathology and molecular genetics of astrocytic gliomas. J Mol Med. 2004;82:656–670. doi: 10.1007/s00109-004-0564-x.
4. Raizer JJ. HER1/EGFR tyrosine kinase inhibitors for the treatment of glioblastoma multiforme. J Neurooncol. 2005;74:77–86. doi: 10.1007/s11060-005-0603-7.
5. Haas-Kogan DA, Prados MD, Tihan T, et al. Epidermal growth factor receptor, protein kinase B/Akt, and glioma response to erlotinib. J Natl Cancer Inst. 2005;97:880–887. doi: 10.1093/jnci/dji161.
6. Kotliarov Y, Steed ME, Christopher N, et al. High-resolution Global Genomic Survey of 178 Gliomas Reveals Novel Regions of Copy Number Alteration and Allelic Imbalances. Cancer Res. 2006;66:9428–9436. doi: 10.1158/0008-5472.CAN-06-1691.
7. Matsuzaki H, Dong S, Loi H, et al. Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays. Nat Methods. 2004;1:109–111. doi: 10.1038/nmeth718.
8. Huang J, Wei W, Zhang J, et al. Whole genome DNA copy number changes identified by high density oligonucleotide arrays. Hum Genomics. 2004;1:287–299. doi: 10.1186/1479-7364-1-4-287.
9. Miller CL, Diglisic S, Leister F, Webster M, Yolken RH. Evaluating RNA status for RT-PCR in extracts of postmortem human brain tissue. Biotechniques. 2004;36:628–633. doi: 10.2144/04364ST03.
10. Lipshutz RJ, Fodor SP, Gingeras TR, Lockhart DJ. High density synthetic oligonucleotide arrays. Nat Genet. 1999;21:20–24. doi: 10.1038/4447.
11. Seo J, Bakay M, Chen YW, Hilmer S, Shneiderman B, Hoffman EP. Interactively optimizing signal-to-noise ratios in expression profiling: project-specific algorithm selection and detection p-value weighting in Affymetrix microarrays. Bioinformatics. 2004;20:2534–2544. doi: 10.1093/bioinformatics/bth280.
12. Seo J, Gordish-Dressman H, Hoffman EP. An interactive power analysis tool for microarray hypothesis testing and generation. Bioinformatics. 2006;22:808–814. doi: 10.1093/bioinformatics/btk052.
13. Kotliarova S, Pastorino S, Kovell LC, et al. Glycogen synthase kinase-3 inhibition induces glioma cell death through c-MYC, nuclear factor-kappaB, and glucose regulation. Cancer Res. 2008;68:6643–6651. doi: 10.1158/0008-5472.CAN-08-0850.
14. Nigro JM, Misra A, Zhang L, et al. Integrated array-comparative genomic hybridization and expression array profiles identify clinically relevant molecular subtypes of glioblastoma. Cancer Res. 2005;65:1678–1686. doi: 10.1158/0008-5472.CAN-04-2921.
15. Dehan E, Ben-Dor A, Liao W, et al. Chromosomal aberrations and gene expression profiles in non-small cell lung cancer. Lung Cancer. 2007;56:175–184. doi: 10.1016/j.lungcan.2006.12.010.
16. Robinson PN, Bohme U, Lopez R, Mundlos S, Nurnberg P. Gene-Ontology analysis reveals association of tissue-specific 5' CpG-island genes with development and embryogenesis. Hum Mol Genet. 2004;13:1969–1978. doi: 10.1093/hmg/ddh207.
17. Fox BP, Kandpal RP. Transcriptional silencing of EphB6 receptor tyrosine kinase in invasive breast carcinoma cells and detection of methylated promoter by methylation specific PCR. Biochem Biophys Res Commun. 2006;340:268–276. doi: 10.1016/j.bbrc.2005.11.174.
18. Hafner C, Bataille F, Meyer S, et al. Loss of EphB6 expression in metastatic melanoma. Int J Oncol. 2003;23:1553–1559.
19. Tang XX, Robinson ME, Riceberg JS, et al. Favorable neuroblastoma genes and molecular therapeutics of neuroblastoma. Clin Cancer Res. 2004;10:5837–5844. doi: 10.1158/1078-0432.CCR-04-0395.

