Author manuscript; available in PMC: 2014 Aug 18.
Published in final edited form as: Prostate Cancer Prostatic Dis. 2010 Apr 13;13(3):252–259. doi: 10.1038/pcan.2010.9

Usefulness of the top-scoring pairs (TSP) of genes for prediction of prostate cancer progression

Hongya Zhao 1, Christopher J Logothetis 1, Ivan P Gorlov 1
PMCID: PMC4136464  NIHMSID: NIHMS610680  PMID: 20386565

Abstract

Prediction of cancer progression after radical prostatectomy (RP) is one of the most challenging problems in the management of prostate cancer. Gene-expression profiling is widely used to identify genes associated with such progression. Usually candidate genes are identified according to a gene-by-gene comparison of expression. Recent reports suggested that relative expression of a gene pair more efficiently predicts cancer progression than single-gene analysis does. The top-scoring pair (TSP) algorithm classifies phenotypes according to the relative expression of a pair of genes. We applied the TSP approach to predict which patients would experience systemic tumor progression after RP. Relative expression of TPD52L2/SQLE and CEACAM1/BRCA1 gene pairs identified those patients, with more than 99% specificity but relatively low sensitivity (~10%). These two gene pairs were validated in three independent datasets. Additionally, combining two pairs of genes improved sensitivity without compromising specificity. Functional annotation of the TSP genes demonstrated that they cluster by a limited number of biologic functions and pathways, suggesting that relatively lower expression of genes from specific pathways can predict cancer progression. In conclusion, comparative analysis of the expression of two genes may be a simple and effective classifier for prediction of prostate cancer progression. The TSP approach can be used to identify patients whose prostate cancer will progress after they undergo radical prostatectomy. Two gene pairs can predict which men would experience progression to the metastatic form of the disease. However, because our analysis was based on a relatively small number of genes, a larger study will be needed to identify the best predictors of disease outcome overall.

Keywords: prostate cancer, gene co-expression, top-scoring pairs of genes, metastasis, cancer progression

Introduction

Screening for prostate cancer according to serum prostate-specific antigen (PSA) level has improved early detection of the disease, resulting in increased identification of patients with localized disease that is still curable with surgery and radiotherapy. However, 20–30% of treated patients experience a relapse.1,2 Thus, from the clinical perspective, it is important to be able to predict which patients will experience a relapse.3–6

Understanding the biology of prostate cancer progression is essential for enabling the development of prognostic markers and effective therapeutic targets. A number of studies have been conducted to characterize the dynamics of gene expression in prostate cancer progression by using DNA microarrays.5,7–12 In some studies, tumor-expression signatures associated with clinical parameters and outcome have been identified.13–16

Normally, researchers use one gene at a time to find any association between gene expression and phenotype. The results of recent studies, however, suggest that assessing the expression of more than one gene (i.e., co-expression analysis) yields a better prediction of tumor progression than the analysis of individual genes does.5,6 Motivated by these findings, we analyzed whether co-expression patterns can be useful in predicting prostate cancer progression by using the paired-gene approach of the top-scoring pairs (TSP) algorithm as described by Geman et al. in 2004.17 However, we modified the algorithm by controlling the specificity and sensitivity of the TSPs. This may be particularly important when the results of classification influence the selection of treatment modality or follow-up procedures that can be associated with serious side effects. For our purpose, we took this approach to analyze gene-expression data from a recently reported large case-control microarray study.5

Materials and methods

Datasets

To mitigate the usual weakness of microarray data (a small sample size combined with a large number of variables), we searched the Gene Expression Omnibus (GEO) database and selected four studies of prostate cancer progression that each had at least 10 samples per phenotype. We used one as the discovery dataset and the other three as validation datasets (Table 1). Three pairs of phenotypes were compared in the four datasets: (1) systemic progression (SYS; development of distant metastasis after radical prostatectomy [RP]) vs no evidence of disease (NED) (the GSE10645 dataset),5 (2) primary prostate tumors vs distant metastases (the GSE6752 and GSE6919 datasets),7,11 and (3) hormone-sensitive vs hormone-refractory prostate cancer (the GSE6811 dataset).10

Table 1.

Gene-expression datasets used in our analysis

| GEO ID | PMID | Phenotypes | Number of samples | Platform | Number of probes |
| --- | --- | --- | --- | --- | --- |
| GSE10645 | 18846227 | NED vs SYS | 213 vs 213 | GPL5858 and GPL5873 | 1028 |
| GSE6752 | 17430594 | Primary prostate tumor vs prostate tumor metastases | 10 vs 21 | CodeLink UniSet Human 20K I Bioarray (GPL2891) | 23 572 |
| GSE6811 | 17545589 | HSPCs vs HRPCs | 10 vs 25 | YN Human 36K (sets 1–8) (GPL4747) | 36 864 |
| GSE6919 | 15254046 | Primary prostate tumor vs prostate tumor metastases | 196 vs 75 | Affymetrix Human Genome U95 Version 2, U95C, and U95B Array (GPL8300, GPL93, and GPL92) | 12 625, 12 646, and 12 620 |

Abbreviations: NED, no evidence of disease; SYS, systemic prostate cancer progression; HSPC, hormone-sensitive prostate cancers; HRPC, hormone-refractory prostate cancers.

For our discovery set, we used gene-expression data from the study by Nakagawa et al.5 (i.e., dataset 1 as described above). Those authors analyzed the association between gene expression and outcome after initial therapy, comparing data from a group of 213 patients who had NED during the 7 years after undergoing retropubic RP (RRP) with those from a group of 213 patients who experienced SYS during the 5 years after the initial rise in PSA level. The authors analyzed the expression of 1028 genes for which previous evidence had implicated their involvement in prostate cancer progression.

For our validation studies, we did not analyze exactly the same phenotypes that we analyzed in the discovery dataset. As counterparts for the NED data, we used primary tumor data from the GSE6752 and GSE6919 datasets and hormone-sensitive prostate cancer data from the GSE6811 dataset. Our counterparts for the SYS phenotype were data on distant metastases from GSE6752 and GSE6919 and hormone-refractory prostate cancer from GSE6811.

Top-scoring pairs algorithm

The original TSP classifier algorithm was described by Geman et al.17 Here we give a simplified description of the method. Given one expression matrix X = [xgn] of G genes and N samples, xgn represents the gth gene-expression value from the nth sample. Assume that every sample can be labeled as either 1 or 2. For example, the N1 samples from 1 to N1 (N1 < N) are labeled class 1 (e.g., NED), and N2 samples from (N1 + 1) to N are labeled class 2 (e.g., SYS), in which N1 + N2 = N. The focus is on detecting “marker gene pairs” (i, j) for which there is a significant difference in the probability of observing Xi < Xj from class 1 to class 2. The conditional probabilities of observing Xi < Xj in each class are defined as follows:

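$$p_{ij}(1) = \Pr(X_i < X_j \mid c = 1) = \frac{1}{N_1} \sum_{n=1}^{N_1} I(x_{in} < x_{jn})$$

$$p_{ij}(2) = \Pr(X_i < X_j \mid c = 2) = \frac{1}{N_2} \sum_{n=N_1+1}^{N} I(x_{in} < x_{jn})$$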

in which I(xin < xjn) is the indicator function, defined as

$$I(x_{in} < x_{jn}) = \begin{cases} 1, & x_{in} < x_{jn} \\ 0, & x_{in} \ge x_{jn} \end{cases} \qquad n = 1, 2, \ldots, N$$

The standard TSP method selects the pair (i, j) that maximizes the score Δij = |pij(1) − pij(2)|. This approach has been reported to outperform support vector machines and other sophisticated methods for classifying cancer samples.18,19 Although maximizing the delta value identifies the best classifier, that classifier may be associated with relatively low sensitivity and specificity, which is a concern when the classifier is used in selecting a treatment. Because of this, we included an assessment of specificity and sensitivity in our analysis, requiring at least 99% specificity and then maximizing the sensitivity.
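The score and the two operating characteristics can be illustrated with a minimal Python sketch. The function names and the toy expression values are hypothetical and are not taken from the study. The sketch assumes the rule "x_i < x_j predicts class 2", so that sensitivity equals pij(2) and specificity equals 1 − pij(1), which agrees with the SYS rows of Table 2.

```python
from typing import Dict, Sequence, Tuple


def pair_probabilities(
    x_i: Sequence[float], x_j: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float]:
    """Return (p_ij(1), p_ij(2)): fraction of samples with x_i < x_j per class."""
    hits = {1: 0, 2: 0}
    counts = {1: 0, 2: 0}
    for a, b, c in zip(x_i, x_j, labels):
        counts[c] += 1
        hits[c] += int(a < b)  # indicator I(x_in < x_jn)
    return hits[1] / counts[1], hits[2] / counts[2]


def tsp_summary(
    x_i: Sequence[float], x_j: Sequence[float], labels: Sequence[int]
) -> Dict[str, float]:
    """Score and operating point of the rule 'x_i < x_j predicts class 2'."""
    p1, p2 = pair_probabilities(x_i, x_j, labels)
    return {
        "delta": abs(p1 - p2),
        "sensitivity": p2,        # class-2 samples that satisfy the rule
        "specificity": 1.0 - p1,  # class-1 samples that do not satisfy it
    }


# Made-up values: three class-1 (e.g., NED) and three class-2 (e.g., SYS) samples.
gene_i = [5.0, 6.0, 7.0, 2.0, 1.0, 6.5]
gene_j = [4.0, 5.0, 6.0, 3.0, 2.0, 6.0]
labels = [1, 1, 1, 2, 2, 2]
summary = tsp_summary(gene_i, gene_j, labels)
print(summary)
```

In this toy case no class-1 sample satisfies the rule, so the specificity is 1, while two of the three class-2 samples do, which gives a delta and a sensitivity of 2/3.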

The statistical significance of classifiers can be assessed by randomly permuting the class labels while maintaining the sample sizes N1 and N2. From this permutation analysis, we estimated P values associated with a given conditional probability. This P value can be interpreted as the probability of observing a result at least as extreme when the gene pair is not informative for the classification.
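A label-permutation test of this kind might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the test statistic is taken to be the delta score, the add-one correction in the returned estimate is a common convention that the text does not mention, and all function names and data are hypothetical.

```python
import random
from typing import Sequence


def delta_score(
    x_i: Sequence[float], x_j: Sequence[float], labels: Sequence[int]
) -> float:
    """|p_ij(1) - p_ij(2)| for labels coded as 1 and 2."""
    n1 = sum(1 for c in labels if c == 1)
    n2 = len(labels) - n1
    h1 = sum(1 for a, b, c in zip(x_i, x_j, labels) if c == 1 and a < b)
    h2 = sum(1 for a, b, c in zip(x_i, x_j, labels) if c == 2 and a < b)
    return abs(h1 / n1 - h2 / n2)


def permutation_p_value(
    x_i: Sequence[float],
    x_j: Sequence[float],
    labels: Sequence[int],
    n_perm: int = 2000,
    seed: int = 0,
) -> float:
    """Estimate how often shuffled labels give a score at least as large."""
    rng = random.Random(seed)
    observed = delta_score(x_i, x_j, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # shuffling keeps N1 and N2 unchanged
        if delta_score(x_i, x_j, shuffled) >= observed:
            exceed += 1
    # Add-one correction (an assumption here) avoids reporting exactly zero.
    return (exceed + 1) / (n_perm + 1)


# Toy data: a perfectly separating pair and a completely uninformative pair.
labels = [1] * 10 + [2] * 10
p_separating = permutation_p_value([1.0] * 10 + [0.0] * 10,
                                   [0.0] * 10 + [1.0] * 10, labels)
p_flat = permutation_p_value([0.0] * 20, [1.0] * 20, labels)
print(p_separating, p_flat)
```

With this estimator the smallest reportable value is 1/(n_perm + 1), so the number of permutations bounds the resolution of the P value.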

Functional annotation

For functional annotation, we applied Ingenuity Pathways Analysis (Ingenuity Systems, Inc, Redwood City, CA, USA; http://www.ingenuity.com/), which evaluates the distribution of the top-ranked genes by both pathway and gene ontology categories, testing the null hypothesis that the genes are uniformly distributed. P values characterize the statistical evidence for the clustering of the genes by pathway or functional categories: the lower the P value, the stronger the statistical evidence that the top-ranking genes belong to a specific pathway or functional category.

Results

Top-scoring pairs

Table 2 lists the TSPs we identified as useful for predicting which patients will develop SYS and which will have NED after they undergo RRP. These TSPs were selected on the basis of a delta value of at least 0.1 and on the condition that the specificity was ≥ 99%. We identified 22 TSPs for predicting SYS but only two for predicting NED.
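The selection criteria just described (delta of at least 0.1 and specificity of at least 99%) can be sketched as an exhaustive scan over ordered gene pairs. This is a hypothetical illustration: the function name, the data layout (a dictionary of gene name to expression values), and the toy numbers are assumptions, and labels 1 and 2 stand for NED and SYS.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple


def select_pairs(
    expr: Dict[str, Sequence[float]],
    labels: Sequence[int],
    min_delta: float = 0.1,
    min_specificity: float = 0.99,
) -> List[Tuple[str, str, str, float, float]]:
    """Return (predicted class, gene_i, gene_j, p1, p2) for pairs passing both cutoffs."""
    n1 = sum(1 for c in labels if c == 1)
    n2 = len(labels) - n1
    selected = []
    for g_i, g_j in permutations(expr, 2):  # ordered pairs: the rule is x_i < x_j
        rows = list(zip(expr[g_i], expr[g_j], labels))
        p1 = sum(1 for a, b, c in rows if c == 1 and a < b) / n1
        p2 = sum(1 for a, b, c in rows if c == 2 and a < b) / n2
        # Rule predicts SYS: sensitivity = p2, specificity = 1 - p1.
        if p2 - p1 >= min_delta and 1.0 - p1 >= min_specificity:
            selected.append(("SYS", g_i, g_j, p1, p2))
        # Rule predicts NED: sensitivity = p1, specificity = 1 - p2.
        if p1 - p2 >= min_delta and 1.0 - p2 >= min_specificity:
            selected.append(("NED", g_i, g_j, p1, p2))
    return selected


# Made-up data: three NED samples followed by three SYS samples.
toy_expr = {
    "A": [5.0, 5.0, 5.0, 1.0, 5.0, 5.0],
    "B": [3.0, 3.0, 3.0, 2.0, 3.0, 3.0],
    "C": [9.0, 9.0, 9.0, 9.0, 9.0, 9.0],
}
toy_labels = [1, 1, 1, 2, 2, 2]
selected = select_pairs(toy_expr, toy_labels)
print(selected)
```

Only the ordered pair (A, B) passes both cutoffs in the toy data: A < B never occurs in the first class and occurs in one of the three samples of the second class.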

Table 2.

Top-scoring pairs of genes for prediction of systemic progression (SYS) and no evidence of disease (NED). The last column is the P value for each gene pair, estimated from 10^6 permutations

Predict Gene1 Gene2 P(X1<X2|NED) P(X1<X2|SYS) Delta=|P1−P2| Sensitivity Specificity RR P value
NED F5 ESR2 0.108 0.000 0.108 0.108 1.000 0.000 <10E-6
NED COL1A1 CDKN1B 0.149 0.010 0.139 0.149 0.990 0.119 <10E-6
SYS PAGE4 ANXA2 0.000 0.130 0.130 0.130 1.000 2.115 <10E-6
SYS HLF BLM 0.000 0.105 0.105 0.105 1.000 2.081 <10E-6
SYS PASK SQLE 0.000 0.105 0.105 0.105 1.000 2.055 10E-6
SYS MYBPC1 TMEM65 0.005 0.175 0.170 0.175 0.995 2.121 <10E-6
SYS SNTB2 SQLE 0.005 0.165 0.160 0.165 0.995 2.046 <10E-6
SYS JUN MKI67 0.005 0.155 0.150 0.155 0.995 2.037 10E-6
SYS PAGE4 TAF2 0.005 0.140 0.135 0.140 0.995 2.037 5 × 10E-6
SYS PTN IMMT 0.005 0.135 0.130 0.135 0.995 2.027 <10E-6
SYS EFS WDR67 0.005 0.130 0.125 0.130 0.995 2.018 3 × 10E-6
SYS EDG7 LGR4 0.005 0.125 0.120 0.125 0.995 2.018 <10E-6
SYS TCF7L2 BIRC5 0.005 0.120 0.115 0.120 0.995 2.096 <10E-6
SYS PTEN DVL3 0.005 0.120 0.115 0.120 0.995 2.009 <10E-6
SYS GDF15 MKI67 0.005 0.120 0.115 0.120 0.995 2.009 <10E-6
SYS SLC45A3 ENY2 0.005 0.115 0.110 0.115 0.995 2.009 2 × 10E-6
SYS MADH2 NOTCH2 0.005 0.115 0.110 0.115 0.995 2.009 <10E-6
SYS PXN TK1 0.005 0.115 0.110 0.115 0.995 2.089 <10E-6
SYS CEACAM1 BRCA1 0.005 0.110 0.105 0.110 0.995 1.999 <10E-6
SYS PAGE4 LGR4 0.005 0.110 0.105 0.110 0.995 1.999 10E-6
SYS TPD52L2 SQLE 0.005 0.110 0.105 0.110 0.995 1.999 <10E-6
SYS PAGE4 ST3GAL1 0.005 0.110 0.105 0.110 0.995 1.999 <10E-6
SYS TMEM71 TPX2 0.005 0.110 0.105 0.110 0.995 1.999 10E-6

Abbreviation: RR, relative risk.

We validated the top 22 TSPs by using the three independent validation datasets (Table 3). To estimate the overall consistency between the discovery and validation sets, we looked at the correlation between the proportions of positive decisions in the two sets. The discovery and validation sets tended to have a similar proportion of positive classifying decisions (i.e., Xi < Xj): the correlation coefficient was 0.44 (N = 104, P = 0.000003). We identified two pairs of genes, TPD52L2/SQLE and CEACAM1/BRCA1, that were significant in all three validation datasets. Analysis of the interactions between genes by using Pathway Studio (Ariadne, Inc, Rockville, MD, USA; http://www.ariadnegenomics.com/products/pathway-studio/) identified no interactions between the genes, which suggests that TPD52L2, SQLE, CEACAM1, and BRCA1 act independently. Figure 1 shows the classification plots comparing the expression values of the gene pair TPD52L2 and SQLE in the discovery and validation sets. Similar patterns can also be plotted for the gene pair CEACAM1 and BRCA1.

Table 3.

In silico validation of the 24 top-scoring pairs listed in Table 2; the two top validated pairs of genes are shown in bold

Predicted phenotype Dataset Gene1 Gene2 P(X1<X2|NED) P(X1<X2|SYS)
SYS GSE10645 TPD52L2 SQLE 0.005128 0.11
Meta GSE6752 TPD52L2 SQLE 0 0.095238
HR GSE6811 TPD52L2 SQLE 0 0.52
Metastasis GSE6919 TPD52L2 SQLE 0.015385 0.24

SYS GSE10645 CEACAM1 BRCA1 0.005128 0.11
Meta GSE6752 CEACAM1 BRCA1 0 0.238095
HR GSE6811 CEACAM1 BRCA1 0.2 0.72
Metastasis GSE6919 CEACAM1 BRCA1 0.061538 0.64

NED GSE10645 COL1A1 CDKN1B 0.148718 0.01
HS GSE6811 COL1A1 CDKN1B 1 0.64
Primary GSE6919 COL1A1 CDKN1B 0.230769 0.08

SYS GSE10645 EDG7 LGR4 0.005128 0.125
Meta GSE6752 EDG7 LGR4 0 0.238095
HR GSE6811 EDG7 LGR4 0.6 0.68

SYS GSE10645 EFS WDR67 0.005128 0.13
Meta GSE6752 EFS WDR67 0 0.047619
HR GSE6811 EFS WDR67 0.5 0.84
Metastasis GSE6919 EFS WDR67 0.030769 0.12

NED GSE10645 F5 ESR2 0.107692 0
HR GSE6811 F5 ESR2 0.1 0.36
Primary GSE6919 F5 ESR2 0.753846 0.68

SYS GSE10645 GDF15 MKI67 0.005128 0.12
Meta GSE6752 GDF15 MKI67 0 0.142857
HR GSE6811 GDF15 MKI67 0.4 0.68

SYS GSE10645 HLF BLM 0 0.105
Meta GSE6752 HLF BLM 0.8 1
HS GSE6811 HLF BLM 0.9 0.84
Metastasis GSE6919 HLF BLM 0.507692 0.72

SYS GSE10645 JUN MKI67 0.005128 0.155
HR GSE6811 JUN MKI67 0.8 0.96
Metastasis GSE6919 JUN MKI67 0.015385 0.04

SYS GSE10645 PAGE4 ANXA2 0 0.13
HS GSE6811 PAGE4 ANXA2 0.1 0.04
Metastasis GSE6919 PAGE4 ANXA2 0.169231 0.92

SYS GSE10645 PAGE4 LGR4 0.005128 0.11
Meta GSE6752 PAGE4 LGR4 0.2 0.857143
HR GSE6811 PAGE4 LGR4 0.6 0.68

SYS GSE10645 PAGE4 ST3GAL1 0.005128 0.11
Meta GSE6752 PAGE4 ST3GAL1 0.2 1
HR GSE6811 PAGE4 ST3GAL1 0.1 0.32
Metastasis GSE6919 PAGE4 ST3GAL1 0.153846 0.96

SYS GSE10645 PAGE4 TAF2 0.005128 0.14
Meta GSE6752 PAGE4 TAF2 0.1 1
HR GSE6811 PAGE4 TAF2 0.4 0.68
Metastasis GSE6919 PAGE4 TAF2 0.353846 1

SYS GSE10645 PASK SQLE 0 0.105
HR GSE6811 PASK SQLE 0 0.84
Metastasis GSE6919 PASK SQLE 0.723077 0.92

SYS GSE10645 PTEN DVL3 0.005128 0.12
HR GSE6811 PTEN DVL3 0.4 0.48
Metastasis GSE6919 PTEN DVL3 0.984615 1

SYS GSE10645 PTN IMMT 0.005128 0.135
HR GSE6811 PTN IMMT 0.7 0.88
Metastasis GSE6919 PTN IMMT 0.076923 0.64

SYS GSE10645 PXN TK1 0.005128 0.115
Meta GSE6752 PXN TK1 0.9 0.952381
HR GSE6811 PXN TK1 0.6 0.8
Metastasis GSE6919 PXN TK1 0.261538 0.68

SYS GSE10645 SLC45A3 ENY2 0.005128 0.115
HR GSE6811 SLC45A3 ENY2 0.3 0.52

SYS GSE10645 SNTB2 SQLE 0.005128 0.165
Meta GSE6752 SNTB2 SQLE 0 0.952381
HR GSE6811 SNTB2 SQLE 0.2 0.72
Metastasis GSE6919 SNTB2 SQLE 0.266317 0.42

SYS GSE10645 TCF7L2 BIRC5 0.005128 0.12
Meta GSE6752 TCF7L2 BIRC5 0.8 0.952381
HR GSE6811 TCF7L2 BIRC5 0.4 0.76
Metastasis GSE6919 TCF7L2 BIRC5 0.230769 0.44

SYS GSE10645 TMEM71 TPX2 0.005128 0.11
HR GSE6811 TMEM71 TPX2 0.4 0.72

Figure 1.

Scatterplots of the expression of TPD52L2 and SQLE genes in the discovery set (GSE10645) and the three validation sets (GSE6752, GSE6811, and GSE6919). Each data point represents a patient: red dots are patients with systemic cancer progression (SYS) for the discovery dataset and corresponding counterparts for the validation datasets. The slanted lines are decision rules: above the decision line, the expression of TPD52L2 is lower than that of SQLE. NED, no evidence of disease; HRPC, hormone-refractory prostate cancer; HSPC, hormone-sensitive prostate cancer.

Double top-scoring pairs

Although the specificity for predicting SYS was > 99% for the two TSPs we identified, the sensitivity was relatively low: we could make a confident prediction for only ~10% of patients. Therefore, to increase the sensitivity, we combined two TSPs, i.e., we used double TSPs (DTSPs). The DTSP decision rule predicts SYS when the condition Xi < Xj, in which Xi is the expression level of gene i and Xj is the expression level of gene j, is met in at least one of the two gene pairs. Figure 2 shows the scatterplots for the combination of the two pairs, TPD52L2/SQLE and CEACAM1/BRCA1. The shaded quadrants in the figure represent the decision region. Combining these two pairs of genes increased the sensitivity to 21.5% without compromising the specificity, which was 99.5%.
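The double-pair decision rule amounts to a logical OR over the two pairwise comparisons. A minimal Python sketch follows; the gene pairs are those named in the text, whereas the function name and the expression values are made up purely for illustration.

```python
from typing import Dict, Sequence, Tuple

# Gene pairs from the text; the rule fires when the first gene is lower than the second.
DTSP_PAIRS: Sequence[Tuple[str, str]] = (("TPD52L2", "SQLE"), ("CEACAM1", "BRCA1"))


def dtsp_predicts_sys(
    sample: Dict[str, float], pairs: Sequence[Tuple[str, str]] = DTSP_PAIRS
) -> bool:
    """True if x_i < x_j holds for at least one (gene_i, gene_j) pair."""
    return any(sample[g_i] < sample[g_j] for g_i, g_j in pairs)


# Hypothetical expression values for three patients.
neither = {"TPD52L2": 8.0, "SQLE": 6.0, "CEACAM1": 7.0, "BRCA1": 5.0}
one_pair = {"TPD52L2": 5.0, "SQLE": 6.0, "CEACAM1": 7.0, "BRCA1": 5.0}
both_pairs = {"TPD52L2": 5.0, "SQLE": 6.0, "CEACAM1": 4.0, "BRCA1": 5.0}

for name, patient in [("neither", neither), ("one_pair", one_pair), ("both_pairs", both_pairs)]:
    print(name, dtsp_predicts_sys(patient))
```

Because either comparison is sufficient, the combined rule can only flag more patients than each single pair does, which is why sensitivity rises while specificity depends on how rarely either condition occurs in the NED group.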

Figure 2.

Combination of TPD52L2/SQLE and CEACAM1/BRCA1 pairs for predicting prostate tumor progression. The x axis is the log-transformed expression ratio log(CEACAM1/BRCA1), and the y axis is log(TPD52L2/SQLE). The top right quadrant is the area where the conditions CEACAM1 > BRCA1 and TPD52L2 > SQLE were met; the top left quadrant is where the conditions CEACAM1 < BRCA1 and TPD52L2 > SQLE were met; the bottom right is where the conditions CEACAM1 > BRCA1 and TPD52L2 < SQLE were met; and the bottom left is where the conditions CEACAM1 < BRCA1 and TPD52L2 < SQLE were met. According to the decision criteria of the double top-scoring pairs analysis, the three shaded quadrants comprise the decision region for predicting systemic prostate cancer progression (SYS). NED, no evidence of disease.

The statistical significance of the DTSPs was estimated by using permutation testing. Figure 3 illustrates the approach we took and shows the distribution of the conditional SYS probability for at least one (Xi < Xj) of the two gene pairs. The estimated probability value of 0.215 was not found among 10 000 permutations, suggesting that the P value for the DTSPs TPD52L2/SQLE and CEACAM1/BRCA1 for predicting SYS was less than 0.0001.

Figure 3.

The graph shows the permutation distribution of P(X1 < X2|SYS) for the gene pairs TPD52L2/SQLE and CEACAM1/BRCA1. The arrow marks the estimated probability (x axis), 0.215, which was not found among the 10 000 permutations (y axis).

Primary tumor vs distant metastasis

An obvious advantage of the study by Nakagawa et al.5 is its large sample size, > 200 individuals in each group. A disadvantage, though, is the limited number of genes analyzed: a total of 1028. Although those investigators selected genes that have been implicated in prostate cancer progression, it is probable that some good candidate genes were missed. The study by Chandran et al.7 was done with a smaller sample: they analyzed the gene-expression profiles of only 24 androgen-refractory metastatic samples and 64 primary tumors, but they assessed genome-wide gene expression by using Affymetrix HGU95av2, HGU95b, and HGU95c arrays (Affymetrix, Inc, Santa Clara, CA, USA). We reanalyzed the data from Chandran et al. by using all 23 572 probes from the GSE6919 dataset. The resulting TSPs for distant metastasis and primary tumors are shown in Supplementary Table S1. Three gene pairs (GRB2/ADD1, IDH3G/LARP1, and HNRNPUL1/LARP1) were perfect classifiers, with 100% sensitivity and 100% specificity. The classification plot for IDH3G/LARP1, which is representative of the three plots, is shown in Figure 4.

Figure 4.

Classification plot for discrimination between primary tumors and distant metastasis, using data from Chandran et al.7 This representative pair of genes, IDH3G/LARP1, has 100% sensitivity and 100% specificity. The x axis represents the log-expression value of IDH3G, and the y axis is that of LARP1.

Functional annotation

Functional annotation of the genes listed in Supplementary Table S1 was conducted using Ingenuity Pathways Analysis. We subdivided those genes into four groups by their relative expression: (1) high expression in the primary tumor, (2) low expression in the primary tumor, (3) high expression in the distant metastasis, and (4) low expression in the distant metastasis. Figure 5 shows, for each group of genes, the five significant canonical pathways with the lowest P values. Among them, the most significant pathways include molecular mechanisms of cancer (group 2), insulin receptor signaling (group 2), integrin signaling (group 2), and regulation of actin-based motility by Rho (group 4).

Figure 5.

Ingenuity-defined canonical pathways for four groups of genes: (1) relatively low expression in the primary tumor (N = 463), (2) relatively high expression in the primary tumor (N = 341), (3) relatively low expression in the distant metastasis (N = 162), and (4) relatively high expression in the distant metastasis (N = 598). The numbers in parentheses in each quadrant are −log(P).

To find out whether the classifying pairs of genes form specific classifying pairs of pathways, we took the genes from each of the 17 different pathways listed in Figure 5 and identified the genes with which they made pairs. Then we looked for functional enrichment of those pairs of genes. We found that the relatively higher expression of genes associated with arginine and proline metabolism than that of genes associated with insulin receptor signaling predicts for primary tumor (P = 1.6E–7). We also found that the relatively lower expression of ILK-signaling genes than that of xenobiotic metabolism–signaling genes predicts for distant metastases. The genes in the arginine and proline metabolism, insulin receptor signaling, ILK signaling, and xenobiotic-metabolism pathways are listed in Supplementary Table S2.

Discussion

By applying the modified TSP approach to analyze the reported data from the largest retrospective study to date that has evaluated for an association between gene expression in primary tumor and disease progression after RRP, we identified two pairs of genes whose relative expression predicts such tumor progression. We validated their usefulness by using three independent datasets. The overall results of our analyses suggest that the TSP-based approach can be used to identify predictors of prostate cancer progression.

One major limitation associated with the TSP approach, however, is that a large number of tests can lead to a large number of false-positive results. That problem is expected to be most severe when the sample size is small and the number of probes is large, as they are in a typical microarray design. For example, if the number of samples in each of two groups is 10 and the number of probes is ~10 000, the expected number of perfect classifiers that can appear by chance is $\binom{10\,000}{2} / 2^{20} \approx 47.6$. To overcome this problem, we used the reported data from the largest published study conducted to date on prostate cancer progression.5 In that study, the SYS and NED groups both contained 213 individuals. This, in combination with a relatively small number of probes, contributed to the low P values estimated by our permutation testing (Table 2). For validation, we used data from studies that had much smaller sample sizes, but because we validated only the top 22 gene pairs, false-positive results were not a serious concern.
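The false-positive estimate in this paragraph can be checked with a few lines of Python. The sketch assumes, as one reading of the calculation in the text, that each within-sample comparison Xi < Xj is an independent fair coin flip, so that a given pair separates the groups perfectly with probability 1/2 raised to the total number of samples. The function name is hypothetical.

```python
import math


def expected_chance_perfect_pairs(n_probes: int, n1: int, n2: int) -> float:
    """Number of probe pairs times the chance that one pair separates the groups perfectly."""
    n_pairs = math.comb(n_probes, 2)  # unordered probe pairs
    return n_pairs / 2 ** (n1 + n2)   # each of the n1 + n2 comparisons must fall the right way


small_study = expected_chance_perfect_pairs(10_000, 10, 10)
discovery_set = expected_chance_perfect_pairs(1028, 213, 213)
print(f"10 vs 10 samples, 10 000 probes: {small_study:.3f}")
print(f"213 vs 213 samples, 1028 probes: {discovery_set:.3e}")
```

For 10 versus 10 samples and 10 000 probes the result is about 47.7, in line with the value quoted in the text; for a design the size of the discovery set the same calculation gives a vanishingly small number, which illustrates why a large sample size with few probes limits chance findings.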

The two gene pairs we identified for use in predicting which patients will develop distant metastases of prostate cancer after undergoing RRP were TPD52L2/SQLE and CEACAM1/BRCA1. All four of those genes have some connection to prostate cancer. TPD52 reportedly regulates the migration of prostate tumor cells.20 SQLE is a gene from the 8q24 region that has been strongly associated with prostate cancer risk in several genome-wide association studies.21–23 Further, CEACAM1 plays an important role in the change from endothelial cells to an angiogenic phenotype during prostate cancer development,24 and BRCA1 is regulated by the androgen receptor and is often mutated in prostate tumors.25

In the three studies from which we took our data for validation, the phenotypes were not exactly the same as those in the discovery set: Chandran et al.7 and Yu et al.11 both compared gene expression in distant metastases with that in primary tumors, and Tamura et al.10 used hormone-refractory vs hormone-sensitive prostate cancer. Despite the obvious differences in those analyzed phenotypes, we believe that it was reasonable to use them for validation because hormone-refractory tumors and distant metastases are likely to have a common gene-expression signature.

Whether the genes from the TSPs belong to specific pathways or have specific biologic functions is an interesting question. It was problematic to use genes from the study of Nakagawa et al.5 for functional annotation because those genes were selected on the basis of published evidence of their involvement in prostate cancer progression. Thus, we used the GSE6919 dataset and separately annotated four groups, as described in Results. We found that lower expression of genes from arginine and proline metabolism relative to the expression of genes from the insulin receptor signaling pathway may be used to classify primary tumors vs distant metastases. Arginine is one of the most versatile amino acids in animal cells, serving as a precursor for the synthesis of proteins and nitric oxide (NO). NO has an important function in cancer, including prostate cancer, in that increased NO generation contributes to angiogenesis by up-regulating vascular endothelial growth factor.26 In addition, a high concentration of NO is associated with chronic inflammation and tumorigenesis. In prostate cancer, NO is involved in inhibiting androgen receptor activity and is associated with disease progression.27,28 Our analyses also showed that the arginine and proline metabolism pathway may be coupled with the insulin receptor signaling pathway to classify primary prostate tumors. Insulin receptor signaling also plays an important role in prostate tumorigenesis. These two signaling pathways seem to be independent: analysis of the relationship between the genes from those pathways using all possible types of interactions identified no cross talk between them (data not shown). The pathways paired in predicting for distant metastasis were the xenobiotic metabolism and integrin-linked kinase (ILK) signaling pathways. Some experimental evidence exists that both of those pathways may also be associated with prostate cancer.29,30

In summary, the TSP approach can be used to identify patients whose prostate cancer will progress after they undergo radical prostatectomy. Two gene pairs can predict which men would experience progression to the metastatic form of the disease. However, because our analysis was based on a relatively small number of genes, a larger study will be needed to identify the best predictors of disease outcome overall.

Supplementary Material

List of the genes from the pathways discriminating primary and metastatic tumors
Table S1. Top scoring pairs for classification of primary prostate tumors and distant metastases (DM).

Acknowledgements

This study was supported by the David Koch Center for Applied Research in Genitourinary Cancer and by National Cancer Institute grant CA16672.

Footnotes

Conflict of interest: The authors declare no conflicts of interest.

References

  • 1. Burdick MJ, Reddy CA, Ulchaker J, Angermeier K, Altman A, Chehade N, et al. Comparison of biochemical relapse-free survival between primary Gleason score 3 and primary Gleason score 4 for biopsy Gleason score 7 prostate cancer. Int J Radiat Oncol Biol Phys. 2009;73:1439–1445. doi: 10.1016/j.ijrobp.2008.07.033.
  • 2. Freedland SJ, Eastham J, Shore N. Androgen deprivation therapy and estrogen deficiency induced adverse effects in the treatment of prostate cancer. Prostate Cancer Prostatic Dis. 2009;12:333–338. doi: 10.1038/pcan.2009.35.
  • 3. Dhanasekaran SM, Barrette TR, Ghosh D, Shah R, Varambally S, Kurachi K, et al. Delineation of prognostic biomarkers in prostate cancer. Nature. 2001;412:822–826. doi: 10.1038/35090585.
  • 4. Dhanasekaran SM, Dash A, Yu J, Maine IP, Laxman B, Tomlins SA, et al. Molecular profiling of human prostate tissues: insights into gene expression patterns of prostate development during puberty. FASEB J. 2005;19:243–245. doi: 10.1096/fj.04-2415fje.
  • 5. Nakagawa T, Kollmeyer TM, Morlan BW, Anderson SK, Bergstralh EJ, Davis BJ, et al. A tissue biomarker panel predicting systemic progression after PSA recurrence post-definitive prostate cancer therapy. PLoS One. 2008;3:e2318. doi: 10.1371/journal.pone.0002318.
  • 6. Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell. 2002;1:203–209. doi: 10.1016/s1535-6108(02)00030-2.
  • 7. Chandran UR, Ma C, Dhir R, Bisceglia M, Lyons-Weiler M, Liang W, et al. Gene expression profiles of prostate cancer reveal involvement of multiple molecular pathways in the metastatic process. BMC Cancer. 2007;7:64. doi: 10.1186/1471-2407-7-64.
  • 8. LaTulippe E, Satagopan J, Smith A, Scher H, Scardino P, Reuter V, et al. Comprehensive gene expression analysis of prostate cancer reveals distinct transcriptional programs associated with metastatic disease. Cancer Res. 2002;62:4499–4506.
  • 9. Luo JH, Yu YP, Cieply K, Lin F, Deflavia P, Dhir R, et al. Gene expression analysis of prostate cancers. Mol Carcinog. 2002;33:25–35. doi: 10.1002/mc.10018.
  • 10. Tamura K, Furihata M, Tsunoda T, Ashida S, Takata R, Obara W, et al. Molecular features of hormone-refractory prostate cancer cells by genome-wide gene expression profiles. Cancer Res. 2007;67:5117–5125. doi: 10.1158/0008-5472.CAN-06-4040.
  • 11. Yu YP, Landsittel D, Jing L, Nelson J, Ren B, Liu L, et al. Gene expression alterations in prostate cancer predicting tumor aggression and preceding development of malignancy. J Clin Oncol. 2004;22:2790–2799. doi: 10.1200/JCO.2004.05.158.
  • 12. Ummanni R, Teller S, Junker H, Zimmermann U, Venz S, Scharf C, et al. Altered expression of tumor protein D52 regulates apoptosis and migration of prostate cancer cells. FEBS J. 2008;275:5703–5713. doi: 10.1111/j.1742-4658.2008.06697.x.
  • 13. Ding GF, Xu YF, Yang ZS, Ding YL, Fang HF, Zhao HP. Coexpression of the mutated BRCA1 mRNA and p53 mRNA and its association in Chinese prostate cancer. Urol Oncol. 2009. doi: 10.1016/j.urolonc.2009.01.002.
  • 14. Schayek H, Haugk K, Sun S, True LD, Plymate SR, Werner H. Tumor suppressor BRCA1 is expressed in prostate cancer and controls insulin-like growth factor I receptor (IGF-IR) gene transcription in an androgen receptor-dependent manner. Clin Cancer Res. 2009;15:1558–1565. doi: 10.1158/1078-0432.CCR-08-1440.
  • 15. Tilki D, Irmak S, Oliveira-Ferrer L, Hauschild J, Miethe K, Atakaya H, et al. CEA-related cell adhesion molecule-1 is involved in angiogenic switch in prostate cancer. Oncogene. 2006;25:4965–4974. doi: 10.1038/sj.onc.1209514.
  • 16. Tomlins SA, Mehra R, Rhodes DR, Cao X, Wang L, Dhanasekaran SM, et al. Integrative molecular concept modeling of prostate cancer progression. Nat Genet. 2007;39:41–51. doi: 10.1038/ng1935.
  • 17. Geman D, d'Avignon C, Naiman DQ, Winslow RL. Classifying gene expression profiles from pairwise mRNA comparisons. Stat Appl Genet Mol Biol. 2004;3:Article 19. doi: 10.2202/1544-6115.1071.
  • 18. Tan AC, Naiman DQ, Xu L, Winslow RL, Geman D. Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics. 2005;21:3896–3904. doi: 10.1093/bioinformatics/bti631.
  • 19. Xu L, Tan AC, Naiman DQ, Geman D, Winslow RL. Robust prostate cancer marker genes emerge from direct integration of inter-study microarray data. Bioinformatics. 2005;21:3905–3911. doi: 10.1093/bioinformatics/bti647.
  • 20. Ummanni R, Teller S, Junker H, Zimmermann U, Venz S, Scharf C, et al. Altered expression of tumor protein D52 regulates apoptosis and migration of prostate cancer cells. FEBS J. 2008;275:5703–5713. doi: 10.1111/j.1742-4658.2008.06697.x.
  • 21. Cheng I, Plummer SJ, Jorgenson E, Liu X, Rybicki BA, Casey G, et al. 8q24 and prostate cancer: association with advanced disease and meta-analysis. Eur J Hum Genet. 2008;16:496–505. doi: 10.1038/sj.ejhg.5201959.
  • 22. Hooker S, Hernandez W, Chen H, Robbins C, Torres JB, Ahaghotu C, et al. Replication of prostate cancer risk loci on 8q24, 11q13, 17q12, 19q33, and Xp11 in African Americans. Prostate. 2009. doi: 10.1002/pros.21061.
  • 23. Yeager M, Orr N, Hayes RB, Jacobs KB, Kraft P, Wacholder S, et al. Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nat Genet. 2007;39:645–649. doi: 10.1038/ng2022.
  • 24. Antognelli C, Mearini L, Talesa VN, Giannantoni A, Mearini E. Association of CYP17, GSTP1, and PON1 polymorphisms with the risk of prostate cancer. Prostate. 2005;63:240–251. doi: 10.1002/pros.20184.
  • 25. Schayek H, Haugk K, Sun S, True LD, Plymate SR, Werner H. Tumor suppressor BRCA1 is expressed in prostate cancer and controls insulin-like growth factor I receptor (IGF-IR) gene transcription in an androgen receptor-dependent manner. Clin Cancer Res. 2009;15:1558–1565. doi: 10.1158/1078-0432.CCR-08-1440.
  • 26. Xu W, Liu LZ, Loizidou M, Ahmed M, Charles IG. The role of nitric oxide in cancer. Cell Res. 2002;12:311–320. doi: 10.1038/sj.cr.7290133.
  • 27. Cronauer MV, Ince Y, Engers R, Rinnab L, Weidemann W, Suschek CV, et al. Nitric oxide-mediated inhibition of androgen receptor activity: possible implications for prostate cancer progression. Oncogene. 2007;26:1875–1884. doi: 10.1038/sj.onc.1209984.
  • 28. Kroncke KD, Suschek CV, Kolb-Bachofen V. Implications of inducible nitric oxide synthase expression and enzyme activity. Antioxid Redox Signal. 2000;2:585–605. doi: 10.1089/15230860050192341.
  • 29. Moon YJ, Wang X, Morris ME. Dietary flavonoids: effects on xenobiotic and carcinogen metabolism. Toxicol In Vitro. 2006;20:187–210. doi: 10.1016/j.tiv.2005.06.048.
  • 30. Okamura M, Yamaji S, Nagashima Y, Nishikawa M, Yoshimoto N, Kido Y, et al. Prognostic value of integrin β1-ILK-pAkt signaling pathway in non–small cell lung cancer. Hum Pathol. 2007;38:1081–1091. doi: 10.1016/j.humpath.2007.01.003.
