Abstract
Introduction
Most analyses of high throughput cancer data represent tumors by “atomistic” single‐gene properties. Pathifier, a recently introduced method, characterizes a tumor in terms of “coarse grained” pathway‐based variables.
Methods
We applied Pathifier to study a very large dataset of 2000 breast cancer samples and 144 normal tissues. Pathifier uses known gene assignments to pathways and biological processes to calculate for each pathway and tumor a Pathway Deregulation Score (PDS). Individual samples are represented in terms of their PDSs calculated for several hundred pathways, and the samples of the data set are analyzed and stratified on the basis of their profiles over these “coarse grained”, biologically meaningful variables.
Results
We identified nine tumor subtypes; a new subclass (comprising about 7% of the samples) exhibits high deregulation in 38 PKA pathways, induced by overexpression of the gene PRKACB. Another interesting finding is that basal tumors break into two subclasses, with low and high deregulation of a cluster of immune system pathways. High deregulation corresponds to higher concentrations of Tumor Infiltrating Lymphocytes, and the patients of this basal subtype have better prognosis. The analysis used 1000 “discovery set” tumors; our results were highly reproducible on 1000 independent “validation” samples.
Conclusions
The coarse‐grained variables that represent pathway deregulation provide a basis for relevant, novel and robust findings for breast cancer. Our analysis indicates that in breast cancer reliable prognostic signatures are most likely to be obtained by treating separately different subgroups of the patients.
Keywords: Breast cancer, Pathway-based analysis, PKA pathways
Highlights
We characterized breast tumors by their pathway deregulation scores (PDS), derived from expression data.
We discovered a new subtype of luminal tumors, comprising 7% of breast cancers.
These tumors exhibit high deregulation of pathways associated with Protein Kinase A.
Basal tumors of better outcome exhibit high deregulation of immune pathways.
Using a pathway‐based representation yields very robust results.
Abbreviations
- PKA: Protein Kinase A
- PDS: Pathway Deregulation Score
- ER: Estrogen Receptor
- HER2: Epidermal growth factor receptor 2
- METABRIC: Molecular Taxonomy of Breast Cancer International Consortium
- TIL: Tumor Infiltrating Lymphocyte
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- PAM: Prediction Analysis of Microarrays
1. Introduction
Breast cancer is one of the most common malignancies, with about one in nine women contracting it during their lives (DeSantis et al., 2011). It is a highly heterogeneous disease (Polyak, 2011), in terms of pathological and clinical parameters of the patients (age, tumor size, node status, and histological grade); heterogeneity is reflected also in immunohistochemically measured biomarkers (Estrogen Receptor (ER), Progesterone Receptor (PR), Epidermal growth factor receptor 2 (HER2)), in the tumors' molecular characteristics, patient response to therapy and clinical outcome. Considerable effort has been invested over several years to stratify breast cancer into clinically distinct groups on the basis of the tumors' molecular signatures (TCGA, 2012). For example, understanding the underlying molecular aberrations is essential for the design of personalized drugs, and clinically meaningful stratification is expected to serve as the basis for prediction of outcome and the choice of therapeutic strategy.
Some of the earliest seminal studies (Perou et al., 2009, 2000, 2001) identified several subtypes of breast tumors, based on gene expression patterns derived from microarray analysis of several hundred “intrinsic genes” (Sorlie et al., 2001). While there is consensus regarding robustness of the Luminal A and Basal intrinsic subtypes, reproducibility of the expression signatures of the others, e.g. Luminal B, HER2‐enriched and Normal‐like (Perou et al., 2000; Sorlie et al., 2001) has been questioned (Alexe et al., 2006).
An outcome predictor method based on these intrinsic subtypes – PAM50 (Parker et al., 2009) has recently gained FDA approval; nevertheless, it is clear that this stratification does not capture the full heterogeneity and complexity of the disease. In fact, a lot of effort has been invested in designing various molecular outcome predictors and in producing biologically and clinically relevant sub‐classifications or new groupings of breast cancer (Bilal et al., 2013; Geiger et al., 2012; Sotiriou and Pusztai, 2009; van der Vegt et al., 2009). A recent study by Curtis et al. (Curtis et al., 2012) generated a very large dataset (METABRIC, Molecular Taxonomy of Breast Cancer International Consortium), of nearly 2000 tumors and 144 normal breast samples, for all of which expression and copy number were measured. Curtis et al. integrated these two types of data and using unsupervised clustering divided breast cancer into 10 new subtypes, to which they refer as iClusters. These subtypes are the representation of recurrent selection of specific somatic genomic aberrations, which in turn cause the over‐ or under‐expression of driver oncogenes and tumor suppressors, respectively, to which different tumors are addicted.
The studies mentioned above used a priori existing biological information only in a very limited manner, if at all. The prognostic gene lists of most predictors were assembled using machine learning, either utilizing no biological knowledge at all (van 't Veer et al., 2002) or minimal information, such as treating ER‐/+ tumors separately (Wang et al., 2005) or focusing on “intrinsic genes” for classification (Parker et al., 2009; Sorlie et al., 2001). A recently introduced method, Pathifier (Drier et al., 2013), advocates taking a different direction, which does make extensive use of available biological knowledge (Ideker et al., 2011). Pathifier uses the known association of each relevant biological pathway or process with a corresponding list of genes that were shown to play a role in it. First, for each individual tumor k and pathway P a Pathway Deregulation Score (PDS), denoted by D(P,k), is derived (see Supplementary File 1, Supplementary Figure 1 for a schematic presentation of the method). Next, any kind of preferred method of analysis (e.g. clustering) is performed on these variables, rather than on the “raw” expression or copy number data. The approach is phenomenological and, unlike the method of Vaske et al., 2010, requires neither knowledge of the inter‐relations between thousands of “biomolecular entities” nor measurement of their status. It was demonstrated (Drier et al., 2013) (for Glioblastoma and colon cancer) that by simple unsupervised analysis of these “coarse grained” biologically meaningful scores, Pathifier finds clinically relevant patient groups and relationships that were not captured by standard methods. Several recent studies adopted pathway‐based approaches for the analysis of cancer expression data, going beyond the simplest enrichment analysis (Huang et al., 2014; Verhaegh and Van de Stolpe, 2014; Verhaegh et al., 2014).
In the present study we apply Pathifier to the METABRIC dataset to stratify breast cancer. Since copy number variations affect the biological state of a cell primarily via the corresponding transcriptome, and transcript levels directly reflect genes' association with the activity of a pathway or biological process, we used only the expression data of Curtis et al. (Curtis et al., 2012). Hence it is not surprising that the classes that we found do not overlap with the iClusters.
We find nine tumor types with distinct PDS profiles over 7 clusters of pathways. Our stratification reproduces known partitions (clearly – into ER+/‐, and to some limited extent – into the intrinsic subtypes), but reveals also previously unreported subclasses. One of these is a group of Luminal tumors (mainly of type A) with high deregulation scores of a cluster of pathways associated with Protein Kinase A (PKA) activity (Francis et al., 2011; Johnson et al., 2001). Deregulation of the PKA pathways possibly plays a role in the malignant process (Beuschlein et al., 2014; Forlino et al., 2014); in that case this finding may have therapeutic implications.
Another interesting finding is a clear separation of basal (ER‐/HER2‐) tumors into two groups, which exhibit either high or low levels of deregulation of immune system related pathways. We first identify the biological basis of this partition in terms of the presence (absence) of Tumor Infiltrating Lymphocytes (TILs), and show that the two basal subgroups have different outcome. We have reasons to attribute this to different responses to therapy exhibited by the two groups. Association of outcome and response to therapy with the level of TILs has been noted previously for several other breast cancer subtypes: for HER2+ (Alexe et al., 2007), HER2+/ER‐ (Rody et al., 2009), and node‐positive subjects (Loi et al., 2013). Additional papers that address association of TILs with response to therapy were discussed by S. Ganesan in the 2014 ASCO50 meeting. Our finding is also in agreement with (Calabrò et al., 2009; Teschendorff et al., 2007), and with a most extensive study which established association of T‐cell infiltration with survival in ER‐negative breast cancer patients (Ali et al., 2014).
Our results were found on the basis of analyzing expression data from the METABRIC “discovery set” and reproduced by analyzing the “validation set” of 144 normal and 995 tumor samples. Since the high values of the PDS of the PKA pathways depended on high expression values of a single gene, PRKACB (Furuta et al., 2012), we validated the microarray results for this gene by direct measurements (qRT‐PCR), and also investigated the conditions under which a single gene governs a pathway's deregulation. As a final and most stringent test of robustness we also checked the extent to which our main findings were recapitulated in a similar analysis of the TCGA breast cancer data (TCGA, 2012), obtained by a different method (RNAseq), by different labs on different patient cohorts.
The analyses described above used prior knowledge only in the assignment of genes to pathways – the PDS were derived for sample sets without using any clinical information about them. The ensuing PDS‐based analyses were completely unsupervised, with the aim of demonstrating the method's ability to capture previously known aspects as well as novel ones. Since standard clinical care and therapy relies on stratification of tumors into three molecular classes (HER2+, HER2‐/ER+ and HER2‐/ER‐), in a second phase of the analysis we took a further step of using prior knowledge, and analyzed these three main clinical subtypes completely separately.
2. Materials and methods
2.1. Data
We used expression data from the METABRIC dataset (Curtis et al., 2012), for about 2000 primary fresh frozen breast cancer samples, divided into a discovery set of 997 tumors, and a validation set (995 samples). Data for 144 normal tissue samples were added to both sets.
Most ER+ and/or node‐negative patients did not receive chemotherapy, whereas ER‐negative and node‐positive patients did (adjuvant, mostly anthracyclines). HER2+ patients did not receive trastuzumab.
All the samples underwent transcriptional profiling on the Illumina HT‐12 v3 platform and data were normalized as described (Curtis et al., 2012).
2.2. Pathway gene sets
Gene sets were downloaded from three pathways databases: KEGG (Kanehisa et al., 2010), BioCarta (Nishimura, 2001) and the NCI‐Nature curated Pathway Interaction Database (Schaefer et al., 2009).
2.3. Pathway deregulation scores
Pathifier (Drier et al., 2013) calculates for any given pathway a deregulation score (PDS) for each cancer sample, based on gene expression data. The score represents the extent to which the activity of the pathway differs in a particular tumor from the activity in normal cells of the same tissue.
A pathway P is represented by a list of d_P genes that “belong to it” (Kanehisa and Goto, 2000; Nishimura, 2001; Schaefer et al., 2009), and for which expression data is available in all the samples. The method (summarized in Supplementary File 1, Supplementary Figure 1) first calculates a score, D(P,i), that measures the deviation of the behavior of pathway P in sample i from some reference expression profile. To calculate that score, the expression levels of the d_P genes that belong to pathway P are used. Each sample is represented as a point in this d_P‐dimensional space; the samples of the dataset form a “cloud” of points. The “principal curve” (Hastie and Stuetzle, 1989) that passes through that cloud is calculated and all samples are projected onto the curve. One of the extremal projections onto the curve is used as a reference point O (usually the normal samples' projections are at one end and the extremal normal projection is used – see (Drier et al., 2013) for full details). The PDS of tumor i is defined as the distance, measured along the curve, from O to the projection of sample i.
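The last step of this definition, measuring distance along the curve from the reference point O, can be illustrated with a minimal sketch. It is not the Pathifier implementation: it assumes the principal curve has already been fitted and is supplied as an ordered list of vertices (a polyline), and it assumes O is the extreme projection at whichever end of the curve the normal samples lie closer to. The function names are hypothetical.

```python
import math

def project_onto_polyline(point, curve):
    """Return (arc_length, squared_distance) of the closest point on a polyline.

    `curve` is an ordered list of vertices approximating an already fitted
    principal curve; fitting the curve itself is not shown here.
    """
    best_sq, best_arc = math.inf, 0.0
    travelled = 0.0  # arc length from curve[0] to the start of the current segment
    for a, b in zip(curve, curve[1:]):
        seg = [bj - aj for aj, bj in zip(a, b)]
        seg_len_sq = sum(s * s for s in seg)
        # Position of the orthogonal projection along the segment, clamped to [0, 1].
        t = 0.0
        if seg_len_sq > 0:
            t = sum((pj - aj) * sj for pj, aj, sj in zip(point, a, seg)) / seg_len_sq
            t = max(0.0, min(1.0, t))
        closest = [aj + t * sj for aj, sj in zip(a, seg)]
        sq = sum((pj - cj) ** 2 for pj, cj in zip(point, closest))
        seg_len = math.sqrt(seg_len_sq)
        if sq < best_sq:
            best_sq, best_arc = sq, travelled + t * seg_len
        travelled += seg_len
    return best_arc, best_sq

def deregulation_scores(samples, is_normal, curve):
    """Distance along the curve from the reference point O to each projection.

    Assumption: O is the extreme projection at the end of the curve that the
    normal samples are closer to on average.
    """
    arcs = [project_onto_polyline(s, curve)[0] for s in samples]
    normal_arcs = [a for a, n in zip(arcs, is_normal) if n]
    lo, hi = min(arcs), max(arcs)
    mean_normal = sum(normal_arcs) / len(normal_arcs)
    origin = lo if (mean_normal - lo) <= (hi - mean_normal) else hi
    return [abs(a - origin) for a in arcs]
```

In this toy setting a sample that projects at the normal end of the curve receives a score of zero, and scores grow with the arc length separating a projection from that end.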
2.4. Clustering and reordering pathways and samples
For each pathway, the PDS were normalized and transformed to z‐scores (setting the mean to 0 and the variance to 1). The pathways were clustered using average linkage, on the basis of their deregulation profiles over all the samples, with the “distance” between pathways P and P′ being 1 ‐ C(P,P′), where C is the Pearson correlation between the two PDS profiles. Samples were represented as points in the space of their PDSs and were reordered in several iterations, using SPIN (Tsafrir et al., 2005) (neighborhood sorting), with the Euclidean distance measure between samples. First, all samples were sorted together, and general patterns emerged; next, groups of samples with similar PDS profiles were sorted within themselves for fine tuning. The resulting ordering was manually divided into 10 clusters (one of normal samples, and 9 tumor clusters), using the “rule” that every pair of sample clusters differs significantly in the PDSs of at least one pathway cluster. Finally, a centroid was calculated for each sample cluster, and samples were iteratively reassigned to the cluster which had the closest centroid.
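Three ingredients of this procedure are simple enough to sketch directly: the z‐score transform, the 1 ‐ C(P,P′) distance between pathway profiles, and the final centroid reassignment. The code below is an illustration under stated assumptions, not the authors' code: it uses the population variance for the z‐score (the text does not say which estimator was used), it does not implement SPIN or average linkage, and the function names are hypothetical.

```python
import math

def zscore(values):
    """Shift to mean 0 and scale to variance 1 (population variance assumed).

    Fails with a division by zero for a constant profile.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def correlation_distance(profile_p, profile_q):
    """1 - Pearson correlation between two PDS profiles over the same samples."""
    zp, zq = zscore(profile_p), zscore(profile_q)
    r = sum(a * b for a, b in zip(zp, zq)) / len(zp)
    return 1.0 - r

def reassign_to_nearest_centroid(samples, labels, max_iter=100):
    """Iteratively move each sample to the cluster with the closest centroid."""
    labels = list(labels)
    for _ in range(max_iter):
        centroids = {}
        for lab in set(labels):
            members = [s for s, l in zip(samples, labels) if l == lab]
            centroids[lab] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = [
            min(centroids, key=lambda lab: math.dist(s, centroids[lab]))
            for s in samples
        ]
        if new_labels == labels:  # converged: no sample changed cluster
            break
        labels = new_labels
    return labels
```

The distance ranges from 0 for perfectly correlated profiles to 2 for perfectly anti‐correlated ones, so pathways that are deregulated in the same samples end up close together regardless of the scale of their scores.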
2.5. Analysis of the validation set
Two different methods were used to compare the results of the validation and discovery sets. The projection method simulates using the discovery set in order to classify new samples that arrive (at a clinic, say) one by one. PDSs are assigned to each new sample (from the Validation set) on the basis of its projection onto the principal curve obtained from the discovery set. The second method, recalculation, performs the analysis on the validation set samples de novo, recalculating for each pathway the principal curves and obtaining each sample's PDS. For more details see Supplementary File 2.
2.6. Real‐time qPCR
Expression data for PRKACB measured by microarray were validated using rt‐qPCR. Total RNA from 22 breast tumors and 9 unmatched adjacent normal breast tissues was obtained and quality‐controlled as part of the METABRIC study (Curtis et al., 2012). In brief, 225 ng samples of all RNAs were reverse transcribed using Superscript III Reverse Transcriptase (Life Sciences) primed with 255 ng clamped oligo dT (T15VN) and 270 ng random hexamers (N6) in the presence of RNase Inhibitor (ABI) at 55 °C for 1 h. Following RNase H digest, samples were diluted 1:20, and triplicate 5 μl aliquots were subjected to real‐time PCR analysis using 500 nM of the indicated intron‐spanning exonic primer pairs (listed in Supplementary File 2) and FAST SYBR Green Mastermix (Applied Biosystems) following manufacturer's recommendations and cycling program on a 7900HT instrument (Applied Biosystems). Due to shortage of clinical material, the following cell line representative controls (ZR‐75‐30, OSE‐385 and HCC‐1954) were included alongside to comply with MIQE guidelines (Bustin et al., 2009): 450, 225 and 112 ng of cell line RNA to demonstrate a dynamic response of the reverse transcription; a serial 5‐fold dilution curve of a single reverse transcription reaction for accurate determination of relative quantities where 1u Ct did not correspond to a 2‐fold increase in input; a reverse‐transcriptase‐free reaction to account for RNA‐independent non‐specific amplification; a template‐free reverse‐transcription and template‐free PCR to eliminate reagent contamination and dominant primer dimers. End products were analyzed by an automated thermal dissociation curve to assure a single amplified product. Calculated relative expression of genes of interest was normalised to eEF1A1, the most stably expressed microarray transcript across the entire METABRIC cohort.
Correct assignment of clinical data was ascertained by cross‐checking ESR1, PGR and ERBB2 real‐time PCR data against patients' ER, PR and Her2 status, respectively.
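The role of the serial dilution curve can be illustrated with the generic standard‐curve calculation: Ct is regressed on log10 of the relative input, and the fitted line converts any measured Ct into a relative quantity without assuming perfect doubling per cycle. This is a textbook sketch, not the authors' analysis pipeline; the function names are hypothetical and the numbers in the usage note are invented for illustration.

```python
def fit_standard_curve(log10_inputs, cts):
    """Least-squares line Ct = slope * log10(input) + intercept."""
    n = len(cts)
    mean_x = sum(log10_inputs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_inputs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_inputs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def amplification_factor(slope):
    """Fold increase in product per cycle; equals 2 only for ideal doubling."""
    return 10 ** (-1.0 / slope)

def relative_quantity(ct, slope, intercept):
    """Input quantity, in units of the undiluted standard, for a measured Ct."""
    return 10 ** ((ct - intercept) / slope)

def normalised_expression(ct_target, curve_target, ct_reference, curve_reference):
    """Target quantity divided by reference-gene quantity for the same sample."""
    return (relative_quantity(ct_target, *curve_target)
            / relative_quantity(ct_reference, *curve_reference))
```

For an ideal reaction the slope is about -3.32 cycles per tenfold change in input, so a sample whose Ct is one cycle later than the undiluted standard corresponds to half the input. A shallower or steeper slope changes that conversion, which is why the curve is fitted rather than assumed.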
2.7. Separate analyses for the three subtypes ER+/HER2‐; HER2+ and ER‐/HER2‐
ER and HER2 status is usually determined on the basis of Immuno‐Histo‐Chemistry (IHC). However in the METABRIC dataset IHC status for HER2 is given for less than 50% of the tumors. To assign patients to the three clinical classes, we used the “three gene‐based method” of Haibe‐Kains et al., 2012. The principal curves and all PDS values were re‐calculated for each pathway and sample, using only the data of normal tissues and the samples of the same subtype, from both discovery and validation sets (to have a reasonable number of points, for all three subtypes, to determine the principal curves). Comparisons were made between the PDSs of the discovery set samples, as derived in the original and new ways; to this end the new PDS values of each pathway were z‐score normalized for every clinical subclass.
3. Results
Subsections 3.1–3.7 present results obtained by analysis of all samples together. Subsection 3.8 uses prior knowledge to split the tumors into the three clinically relevant subtypes (see Methods 2.7) and performs the entire analysis separately for each subtype.
3.1. Calculating the pathway deregulation scores (PDS) – discovery set
Breast cancer mRNA expression data were downloaded from The European Genome‐phenome Archive as three datasets: discovery, validation, normals. 48,803 Illumina probes were available for each sample in every dataset. 36,157 of the probes were mapped to known gene symbols and used further in the analysis.
The 997 tumor samples of the discovery set and 144 normal samples were merged and analyzed together, using Pathifier, to calculate the PDS for each sample‐pathway combination. Only the 5000 probes with the largest variation over these 1141 samples were used as input to the algorithm; these probes represent 4275 different genes. Gene sets (pathways) from KEGG (Kanehisa et al., 2010), BioCarta (Nishimura, 2001) and NCI (Schaefer et al., 2009) that contained at least 3 of these 5000 probes were included in the analysis, to make a total of 552 pathways (188 from BioCarta, 176 from KEGG and 188 from NCI). We then calculated the PDSs, D(P,k), for each of the P = 1,2,… 552 pathways, and the k=1,2,… 1141 samples. The resulting 552x1141 matrix of PDSs served as the basis for the analysis of the discovery set.
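The two filtering steps described above (keep the most variable probes, then keep gene sets that retain enough of them) can be sketched as follows. This is an illustration only: it assumes “variation” means variance, and the data structures and function names are hypothetical rather than taken from Pathifier.

```python
from statistics import pvariance

def top_variance_probes(expression, n_top):
    """Return the ids of the `n_top` probes with the largest variance.

    `expression` maps probe id -> list of expression values across samples.
    """
    ranked = sorted(expression, key=lambda p: pvariance(expression[p]), reverse=True)
    return set(ranked[:n_top])

def usable_gene_sets(gene_sets, probe_to_gene, kept_probes, min_probes=3):
    """Keep gene sets that contain at least `min_probes` of the retained probes.

    `gene_sets` maps pathway name -> set of gene symbols;
    `probe_to_gene` maps probe id -> gene symbol.
    """
    usable = {}
    for name, genes in gene_sets.items():
        probes = sorted(p for p in kept_probes if probe_to_gene.get(p) in genes)
        if len(probes) >= min_probes:
            usable[name] = probes
    return usable
```

With the values quoted in the text, `n_top` would be 5000 and `min_probes` would be 3. Because several probes can map to the same gene, the number of distinct genes retained is smaller than the number of probes.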
As the first step (Drier et al., 2013) we derived for each pathway a principal curve (Hastie and Stuetzle, 1989) that passes through the cloud formed by the 1141 data points in the corresponding space of expression. For each pathway the points that represent the samples were projected onto the curve. We present these projections for three pathways in Figure 1A–D. Note that a linear approximation (e.g. the first principal component) to the curve shown in Figure 1A, obtained for the P53 pathway in the Biocarta database, would have given a very different set of projections and PDSs for many of the samples, demonstrating the importance of using (non‐linear) principal curves. Second, note that the projected points were colored according to the PAM50 classification (Parker et al., 2009) of the samples (remember that the classes were not used to place the points on the line); evidently their relative positions on the curve reflect increased deregulation of the P53 pathway along the different subtypes. Normal samples are at one end of the curve, followed by the normal‐like tumors, LumA, LumB, HER2+ and finally the Basals. This ordering is consistent with our knowledge (Sorlie et al., 2001) that the HER2+ and Basal subtypes have a larger percentage of TP53‐mutated tumors than the luminal subtypes; these percentages are seen also in the METABRIC discovery set (Supplementary File 1, Supplementary Table 1). Thus the principal curve and the PDSs derived from it indeed capture meaningful biological features. Interestingly, the expression values of TP53 itself were not included in the analysis of this pathway. Similar ordering is seen for the KEGG gene set “Pathways in Cancer” (Figure 1B).
Figure 1.

Principal curves of selected pathways in the METABRIC dataset. The principal curve going through the cloud of points (in black) which are representing samples in the METABRIC discovery + normals set (A–C) or the validation + normals set (D). The data points and the principal curve are projected onto the three leading principal components. The samples (colored according to their PAM50 classification) are projected onto the curve. A. Principal curve of the BioCarta P53 pathway in the discovery dataset. B. Principal curve of the KEGG “Pathways in Cancer” gene set in the discovery dataset. C. Principal curve of the Biocarta CBL Pathway in the discovery dataset. D. Principal curve of the BioCarta P53 pathway in the validation dataset. The curve and points are plotted based on the scores derived using the recalculation method.
In about 75% of the analyzed pathways the normal samples cluster together on one side of the curve; the distance (measured along the curve), from the extremal projection on this side to the projection of a particular sample defines its PDS. In some pathways, however, the normal samples project onto the middle of the curve, with deregulated samples “moving away” in two directions from the normals – corresponding to two different types of deviation from normal behavior. One such example is the Biocarta CBL pathway, shown in Figure 1C. The CBL protein induces down‐regulation of the EGF Receptor gene (Soubeyran et al., 2002). Evidently, deregulation of the pathway is drastically different in the Basal (and some of the HER2+) tumors versus the Luminal samples. The same pattern is seen in other EGFR‐related pathways. Interestingly, the CBL gene itself wasn't included in the Pathifier calculation (due to low variability); possibly, post‐translational modifications have a significant effect on expression values of other genes.
These observations are consistent with known facts about deregulation of the EGF pathway in breast cancer; in particular, it is known that EGFR ligands are differently expressed in ER‐positive and ER‐negative tumors (Foley et al., 2010), and that the expression levels of EGFR itself do not differ significantly between normal and cancerous breast tissue. Hence the PDS captures complex effects, beyond simple over‐expression of a gene, as well as the different deregulation of the ER‐positive (LumA/LumB) and ER‐negative (Basal, HER2+) samples.
3.2. Clusters of pathways and of samples
The PDSs, calculated for 552 Pathways and for 1141 samples of the discovery set, served as the basis for cluster analysis. The heatmap, representing the two‐way ordered PDS matrix, is shown in Figure 2. Figure 3A summarizes the clustering results, with each entry representing the average of the PDSs in the “block” that belong to the corresponding pathway and sample clusters. For assignment of pathways and samples to clusters see Supplementary File 3.
Figure 2.

Clustering of Pathway Deregulation Scores matrix for the discovery dataset. Every row in the matrix corresponds to one of 552 pathways; every column to one of 1141 samples (either from the discovery set or a normal tissue sample). Each row is z‐score normalized. The color‐bars in the bottom indicate the PAM50 subtype of the sample (top), the ER marker status as measured by immunohistochemistry (middle), and the cluster it was assigned to (bottom).
Figure 3.

Summary of the clustered PDS for the METABRIC datasets: A. Discovery dataset B. Validation dataset. Each row corresponds to a pathway cluster and each column – to a sample cluster, displaying the mean deregulation value for the samples and pathways that belong to each pair of clusters.
3.3. Pathway clusters
Seven pathway clusters A – G, see below, are shown in Figure 2 and summarized in Table 1. Six were selected by identifying clusters of at least 20 pathways with average linkage distance smaller than 0.4, and one smaller group (Cluster B) was also included, as it clearly exhibited different deregulation behavior in ER‐negative samples. Some of the clusters are characterized by certain genes appearing in all the pathways of the cluster, others – by a common biological function, as presented in Table 1.
Table 1.
The pathway clusters in the discovery dataset.
| Cluster | Size | Characterization |
|---|---|---|
| A | 38 | Pathways contain PRKACB (subunit of Protein Kinase A) |
| B | 9 | EGFR‐related pathways |
| C | 23 | Pathways contain PAK1 |
| D | 35 | Pathways related to signaling and immune system |
| E | 35 | Signaling pathways |
| F | 83 | Cancer related, cell cycle, DNA repair/replication |
| G | 76 | Immune system pathways |
Each of the pathway clusters A – G was analyzed in terms of its composition:
PKA pathway cluster: These pathways are highly deregulated in the samples of sample cluster 3 (C3). A single gene, PRKACB, appears in all the pathways of this cluster; it encodes one of the catalytic subunits (Furuta et al., 2012) of the cAMP‐dependent Protein Kinase (PKA) (Francis et al., 2011; Johnson et al., 2001). PRKACB is highly expressed in the C3 samples, leading to the deregulation pattern seen here. The special role of PRKACB in generating the high PDS values that characterize this cluster is discussed below.
EGFR pathway cluster: The pathways in this cluster show very low deregulation scores over the ER‐negative sample clusters C6, C7, C8. They all contain the EGFR gene, and the majority of the pathways are directly related to EGFR activity (e.g. BioCarta's CBL Pathway; NCI's 'EGFR‐dependent Endothelin signaling events' pathway). As demonstrated above (see, e.g., Figure 1C), these pathways have two distinct directions of deregulation with respect to the normal samples, separating ER‐positive from ER‐negative tumors. Note that only one of the pathways in this cluster (BioCarta's HER2 Pathway) actually contains the gene for ER (ESR1) – the other pathways capture the difference between the ER+ and ER‐tumors on the basis of other genes' expression.
PAK1 pathway cluster: These pathways differentiate between the LumA and LumB tumors that belong to sample cluster C2 (denoted as “2” in the bottom colorbar in Figure 2). This cluster comprises various signaling pathways, all of which contain the PAK1 gene – a Serine/threonine‐protein kinase that regulates cell motility and morphology. Most of the pathways also contain the RAC1 gene (which encodes a G‐protein whose major effector is PAK1).
Immune/Signaling cluster: The deregulation of this cluster of pathways separates the LumA tumors into sample clusters C1 and C2. About two thirds of this cluster are signaling pathways, half of which are involved in the immune system, and many of the others are active in growth factor signaling. 28 out of the 35 pathways include the gene for c‐Jun, a transcription factor involved in cAMP signaling and a known oncogene. 23 pathways include the FOS gene, whose product binds to c‐Jun to create a transcription complex involved in processes such as cell proliferation, differentiation, and transformation. Several of the pathways are EGF/ERBB‐related.
Signaling cluster: These pathways separate sample clusters C1 to C5 (LumA/LumB) from clusters C6 to C9 (ER‐negative + mixed C9), and also distinguish between the LumB samples from clusters C4 and C5. They don't have a single unifying characteristic; about half of the pathways in the cluster are involved in signaling, with other common functions being cell metabolism, cardiac function pathways and cell adhesion. The pathways don't have common genes, either – a third of them share the MAPK1 and MAPK3 genes, and several subunits of the PIK3 enzymes.
Cancer related cluster: The average deregulation scores for these 83 pathways (Figure 3A) reflect the severity and gradation of cancer, with the better outcome subtype (LumA) showing little deregulation, and with the PDS getting higher towards the worst outcome subtypes (HER2+, Basal), setting the latter apart. Many of the pathways in this cluster are either directly related to cancer (KEGG's “Pathways in Cancer”; BioCarta's P53 Pathway), or involve processes which tend to become deregulated with cancer progression–cell cycle, DNA replication, DNA repair. CDK2 ‐ a gene encoding a subunit of the cyclin‐dependent kinase complex, which is essential for the G1/S transition in the cell cycle ‐ is common to 23 of the pathways (along with other CDK genes); TP53 appears in 21 of them.
Immune system cluster: These pathways are highly deregulated in sample clusters C8 and C9; their PDS distinguishes Basals of C8 from those in C7. They also differentiate between the LumB in C4 (higher deregulation) and in C5. About 75% of these pathways are related to the production and activation of the immune system cells and pathways. This difference in immune response is clinically relevant, particularly for patients with basal tumors, as discussed below.
3.4. Sample clusters
The samples were ordered iteratively, as described in Methods. Figure 2 presents the final clustering, with the different sample clusters identified in the third colorbar below the heatmap.
Relating PDS sample clusters to PAM50 subtypes. The PAM50 intrinsic subtypes are represented in the top colorbar below the heatmap, Figure 2; the composition of our PDS based clusters in terms of PAM50 subtypes is shown in Table 2 and in Supplementary File 1, Supplementary Figure 2A,B.
Table 2.
Overrepresentation of PAM50 subtypes, METABRIC iClusters and clinical subtypes in PDS clusters of the discovery dataset.
| Cluster | Size | Enriched PAM50 subtypes | Enriched iClusters | Dominant clinical subtypes |
|---|---|---|---|---|
| Normal samples | 144 | Normal samples | | |
| 1 | 185 | LumA, Normal‐like tumors | 3, 4 | ER+/HER2‐ |
| 2 | 301 | LumA, LumB | 2, 7, 8 | ER+/HER2‐ |
| 3 | 70 | LumA | 3, 8 | ER+/HER2‐ |
| 4 | 99 | LumB | 1, 9 | ER+/HER2‐ |
| 5 | 97 | LumB | 1, 5, 6, 9 | ER+/HER2‐ |
| 6 | 59 | HER2+ | 5 | ER‐/HER2+ |
| 7 | 62 | Basal | 10 | ER‐/HER2‐ |
| 8 | 65 | Basal, HER2+ | 10 | ER‐/HER2‐ |
| 9 | 59 | Normal‐like tumors | 4 | ER+/HER2‐ |
Using the hyper‐geometric test, each PDS cluster was checked to see which subtypes are over‐represented in it. Using the Benjamini and Hochberg False Discovery Rate (FDR) method (Benjamini and Hochberg, 1995) to correct for multiple hypotheses testing, a corrected q‐value was calculated for each combination of sample cluster and subtype. The subtypes that were over‐represented (FDR q‐value < 0.05) are summarized in Table 2.
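The enrichment test and the multiple‐testing correction described above can be sketched in a few lines. This is a generic illustration of the hypergeometric upper tail and the Benjamini and Hochberg step‐up adjustment, not the authors' code; the function names are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when `draws` items are taken without replacement from a
    population containing `successes` items of the subtype of interest."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total

def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

For one cluster and subtype combination, `population` would be the number of samples analyzed, `successes` the number of samples of that subtype, `draws` the cluster size, and `k` the number of cluster members with that subtype. The q‐values are then computed over all cluster and subtype combinations together and compared with the 0.05 threshold.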
These findings allow us to draw some conclusions about the PDS representation of breast cancer tumors. First, there is a clear separation between the normal samples, the ER + luminal tumor subtypes (clusters C1–C5, C9) and the ER‐subtypes (HER2+/Basal, C6–C8). Next, we see a sub‐stratification of the intrinsic molecular subtypes. The Luminal‐A group is mostly divided between three of the PDS clusters (C1–C3); Luminal‐B samples are split between C4 and C5, with some of LumB tumors exhibiting LumA‐like pathway deregulation characteristics, thus belonging to C2; the majority of HER2+ tumors belong to a cluster of their own, C6; the Basal tumors are separated into C7 and C8. C9 is a mixture of different intrinsic subtypes.
Relating PDS sample clusters to iClusters. A similar analysis was performed on the composition of each PDS cluster of samples in terms of the integrated clusters defined by Curtis et al., 2012. Supplementary File 1, Supplementary Figure 3A,B present these results, and the overrepresented iClusters for each PDS cluster are listed in Table 2.
There is no clear concordance between the PDS clusters and the iClusters. This result is expected, since the two methods use different types of data (gene expression only versus a combination of gene expression and copy number). Both methods identify different subgroups of the accepted PAM50 subtypes, which generates certain similarities in the compositions of some PDS clusters and iClusters. For instance, iCluster 10 is overrepresented in PDS clusters C7 and C8, since those are all subgroups of the Basal subtype.
PDS clusters and outcome. The curves for overall survival, shown in Figure 4A, mostly recapitulate the known differences between the intrinsic types, with patient clusters containing LumA tumors having the most favorable prognosis, and HER2+ and Basal being the subtypes with the worst outcome. A striking feature, however, is the difference in survival between the two subgroups of Basal samples, C7 and C8 (log rank p‐value = 0.013). C8, which shows high deregulation over a large number of pathways of the immune system (Figures 2, 3A), has much better survival than C7, which does not show the same deregulation.
Figure 4.

Kaplan–Meier plots of overall survival across the PDS clusters, A. of the discovery dataset. Each PDS cluster is plotted separately. The two clusters of Basal tumors (7 and 8, bold) have significantly different survival (log rank p‐value = 0.013). B. for two extreme groups of validation patients with Basal tumors. Each group contains 70 patients. The groups are significantly different in terms of survival (log rank p‐value = 0.0565).
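The log rank comparison of two survival curves, such as C7 versus C8, can be illustrated with the following minimal sketch. It assumes right‐censored follow‐up data given as (time, event) pairs per group; the function name and the toy data are hypothetical and this is not the software used in the study.

```python
from math import erfc, sqrt

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log rank test. events are 1 (death observed) or 0 (censored).
    Returns (chi-square statistic, p-value with 1 degree of freedom)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk in B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = erfc(sqrt(chi2 / 2.0))  # chi-square survival function, 1 d.o.f.
    return chi2, p_value

# Toy data: group A fails early, group B fails late, no censoring.
chi2_toy, p_toy = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
```

The statistic compares, at every observed event time, the number of deaths in one group with the number expected if both groups shared the same hazard.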
3.5. Analysis of the validation set
Expression data from 995 different tumor samples (Curtis et al., 2012) were combined with the same 144 normal breast samples that were used before, to form the validation set of 1139 samples. This dataset was analyzed in order to check the reproducibility and robustness of the results described above. Two different methods of analysis were used, as described in Methods and in Supplementary File 2.
The projection method calculates the PDSs of the tumor samples of the validation set, one by one, using for each pathway the principal curve that was derived for the discovery set. To make comparison more direct, the sample clusters, pathways, pathway clusters and their ordering were preserved from the discovery set figures. The similarity of the ordered PDS matrices of the discovery and validation sets is striking (compare Figure 2 with Supplementary File 1, Supplementary Figure 4, and Figure 3A with 3B). The resulting cluster sizes (Supplementary File 1, Supplementary Table 2) are similar between the two datasets – a χ² test comparing the cluster sizes found no statistically significant difference. Thus, the distributions of the sample sets (discovery and validation) in the space of all pathway deregulation scores are similar. The compositions of the validation set sample clusters (in terms of PAM50 subtypes) are presented in Supplementary File 1, Supplementary Figure 5 and Supplementary Table 2. The average PDSs of the different sample‐pathway “blocks” of the validation set are shown in Figure 3B, which is nearly identical to Figure 3A.
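The idea of scoring a new sample against a previously derived principal curve can be sketched as follows. The sketch makes several assumptions that are not taken from this section: the curve is represented as an ordered polyline whose first vertex lies at the normal‐sample end, and the score is the arc length from that end to the closest point on the curve. The function name and the toy coordinates are hypothetical.

```python
from math import sqrt

def project_onto_polyline(point, curve):
    """Return the arc length along `curve` (a list of vertices) of the
    point on the curve that is closest to `point`."""
    best_dist2 = float("inf")
    best_arc = 0.0
    arc_start = 0.0  # arc length at the start of the current segment
    for a, b in zip(curve, curve[1:]):
        seg = [bj - aj for aj, bj in zip(a, b)]
        seg_len2 = sum(s * s for s in seg)
        if seg_len2 == 0.0:
            continue  # skip degenerate segments
        # Fractional position of the orthogonal projection, clamped to the segment.
        t = sum((pj - aj) * sj for pj, aj, sj in zip(point, a, seg)) / seg_len2
        t = max(0.0, min(1.0, t))
        closest = [aj + t * sj for aj, sj in zip(a, seg)]
        dist2 = sum((pj - cj) ** 2 for pj, cj in zip(point, closest))
        seg_len = sqrt(seg_len2)
        if dist2 < best_dist2:
            best_dist2 = dist2
            best_arc = arc_start + t * seg_len
        arc_start += seg_len
    return best_arc

# Toy two-gene pathway: the curve starts at the "normal" end (0, 0).
toy_curve = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
toy_score = project_onto_polyline((1.2, 0.5), toy_curve)
```

Because the curve is held fixed, each validation sample is scored independently of all other validation samples, which is what allows the samples to be processed one by one.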
The recalculation method used the validation set to filter the genes and pathways; 5000 probes with the largest variation over the validation/normal samples – representing 4269 different genes – were used as an input for the algorithm. 95% of the analyzed genes were included in both discovery and validation analyses. 538 pathways (out of 548 that passed our filters) were common to the discovery and the validation analyses. A new principal curve was obtained for each pathway and the PDSs of all validation samples were derived independently of what was done for the discovery set. The resulting 548 × 1139 matrix of PDSs was used to re‐derive pathway as well as sample clusters. The ordered PDS matrix is presented in Supplementary File 1, Supplementary Figure 6, and the pathway clusters are compared in Supplementary Table 3. There is excellent agreement between the results obtained from the discovery and validation sets. In particular, note that even the principal curves re‐derived for the validation set are in excellent agreement (of shape, curvature, etc.) with those of the discovery set (e.g. compare Figure 1A and D). Finally, the two different methods of analysis of the validation set are highly consistent, with 97% of the 538 shared pathways exhibiting a statistically significant positive correlation of their PDSs (as derived by the two methods), and 84% having a correlation above 0.5 (see Supplementary File 1, Supplementary Figure 7).
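The probe filtering step can be illustrated with a short sketch. It assumes that “variation” is measured by the variance of each probe over the samples, which is an assumption made here for illustration; the function name and the toy expression values are hypothetical.

```python
from statistics import pvariance

def top_variance_probes(expression, k):
    """expression maps probe id -> list of expression values across samples.
    Returns the k probe ids with the largest variance, most variable first."""
    ranked = sorted(expression, key=lambda probe: pvariance(expression[probe]),
                    reverse=True)
    return ranked[:k]

# Toy data with three probes measured on three samples.
toy_expression = {
    "probe_1": [1.0, 1.0, 1.0],   # constant, uninformative
    "probe_2": [0.0, 5.0, 10.0],  # most variable
    "probe_3": [1.0, 2.0, 3.0],
}
kept = top_variance_probes(toy_expression, 2)
```

In the analysis described above the same selection would be applied with k = 5000 to the validation and normal samples, before mapping probes to genes and genes to pathways.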
3.6. New sample groups
We highlight here two findings that emerged from our pathway‐based analysis, that involve potentially important sub‐stratification of breast cancer patients.
3.6.1. Sample cluster 3: high deregulation of the PKA pathway cluster A
The 38 pathways of pathway cluster A showed high deregulation over all 70 samples of C3 of the discovery dataset. All but one of these samples were luminal (46 of type A and 23 of type B, the exception being HER2+). Similar results are seen in the validation PDS data, where out of 68 samples showing this deregulation, 32 are LumA, 29 are LumB and 7 are normal‐like tumors.
All the pathways in cluster A contain the gene PRKACB, which encodes a catalytic subunit of Protein Kinase A (PKA). This gene is highly over‐expressed in the samples of C3 (see Supplementary File 1, Supplementary Figure 8A,B) for both discovery and validation sets. C3 emerged because of the high PDS of the PKA pathways. In order to investigate the effects of PRKACB overexpression on the relevant PDSs, Pathifier was run on the pathways of cluster A twice – including and excluding PRKACB as belonging to each of the PKA pathways. Full details and results are presented in Supplementary Files 1 and 2. For the samples of C3, excluding PRKACB from the analysis had a dramatic effect on the PDSs of the PKA pathways; in fact, the samples of C3 no longer formed a clear cluster, whereas the PDSs of samples of other clusters were less affected and their relative ordering was preserved (see Supplementary File 1, Supplementary Figures 9–12). The conditions under which a single gene's expression dominates the deregulation scores of a pathway are the subject of current investigation; preliminary conclusions are presented in the Supplementary File 2.
Since identification of C3 was completely governed by the high expression of a single gene, we tested experimentally and validated the high expression levels of PRKACB in these samples.
RNA from 22 tumors and 9 unmatched adjacent normal breast tissues was obtained as part of the METABRIC study, including 5 tumor samples which belonged to sample cluster 3 (four from the discovery dataset, and one from the validation dataset). Using real‐time qPCR analysis (see Methods and Supplementary File 2), the relative expression levels of the relevant genes were measured and calculated. The genes included were: PRKACB (two different probes), ESR1, PGR and ERBB2 (these genes code for ER, PR and HER2, respectively) and eEF1A1 (used as baseline for normalization). The PCR results confirmed that PRKACB is indeed expressed at higher levels in the samples of C3, with good agreement between the METABRIC microarray results and those of rt‐qPCR, as presented in Figure 5.
Figure 5.

Gene expression of PRKACB as measured using microarrays and PCR. Twenty‐two tumor samples had gene expression measurements of both Illumina microarrays and RT‐qPCR (log base 2 of intensity of both measurements). Two different probes (F1R1, F1R2) were used in the PCR analysis; both are plotted against the microarray‐based data. Two points that represent the same sample have the same color. The top/rightmost 5 pairs of points of the plot (marked in red) are the 5 samples belonging to PDS Cluster 3.
The biology of PKA is described briefly in the Discussion section below.
3.6.2. Immune response induced stratification of tumors
The 76 pathways of cluster G showed high deregulation in the samples of C8 (Basals) and C9 (Mixed subtypes), as well as higher deregulation in C4 versus C5 (both predominantly Luminal B).
Most pathways of cluster G are directly related to the activity of the immune system. We hypothesized that the significantly better survival of patients of C8 than of C7 (both Basal ‐ see Figure 4A) was caused by increased activity of the immune system in the samples of C8. This is consistent with previous results showing that among ER‐negative tumors, those that over‐express immune response genes have better prognosis (Teschendorff et al., 2007). This over‐expression is most probably due to high levels of tumor‐infiltrating lymphocytes (TILs), which are associated with longer survival in ER‐negative patients (Calabrò et al., 2009); a recent, very extensive study demonstrated an association between T‐cell infiltration and survival in ER‐negative breast cancer patients (Ali et al., 2014). Indeed, Curtis et al. (2012) measured the level of lymphocytic infiltration for the samples of iCluster 4, which was identified as a group with a strong immune and inflammation signature. They assigned a level of infiltration (N/A, absent, mild or severe) to each sample from iCluster 4 (see Supplementary File 1, Supplementary Figure 13A), which we used to explore the connection between the infiltration level and the PDS. Evidently the majority of samples with severe infiltration belong to PDS sample clusters C8 or C9 – those with high PDS of the immune cluster pathways. Furthermore, the Spearman correlations between the infiltration levels (represented as categories, where available) and each pathway's deregulation scores were calculated for all the pathways in the discovery analysis. Out of 552 pathways, for 293 a significantly non‐zero correlation with the TIL levels (FDR corrected q‐value < 0.05) was found. As seen in Supplementary File 1, Supplementary Figure 13B, all the pathways of the immune cluster have significantly high correlations; note that the pathways with the highest TIL‐PDS correlations (>0.65) are all T‐cell related.
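The rank correlation between ordinal infiltration levels and the deregulation scores of one pathway can be sketched as follows. The numeric coding of the categories, the function names and the toy scores are hypothetical; samples whose infiltration level is N/A are assumed to have been dropped beforehand.

```python
from math import sqrt

# Hypothetical ordinal coding of the infiltration categories.
LEVELS = {"absent": 0, "mild": 1, "severe": 2}

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks.
    Undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Toy example: four samples, one pathway.
infiltration = [LEVELS[c] for c in ["absent", "mild", "mild", "severe"]]
pds = [0.1, 0.3, 0.4, 0.9]
rho = spearman(infiltration, pds)
```

Because the infiltration levels take only a few distinct values, ties are common, which is why the ranks are averaged within each category. One correlation is obtained per pathway, and the resulting p‐values are then corrected for multiple testing.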
These results (obtained for the discovery set) point to a strong association between lymphocytic infiltration and the high deregulation of immune pathways (seen in the samples of C8 and C9), which may explain the difference in survival between patients of C7 and C8. This difference in TIL levels could potentially be used as a biomarker to identify, among patients with basal tumors, those with better prognosis. To demonstrate this point we examined the samples of the validation set and sorted, using SPIN (Tsafrir et al., 2005), the 213 basal samples according to the PDSs of the 76 immune cluster pathways. Survival of the top (most deregulated) and bottom (least deregulated) thirds of the samples – 70 samples in each group – was compared. The result, shown in Figure 4B, demonstrates a significant difference between the two groups. Repeating the same analysis on the tumors of the ER‐/HER2‐ clinical subtype yielded similar differences (see Supplementary File 4). Taken together with previous results (Alexe et al., 2007) on immune signatures in the ER‐/HER2‐ subtype, our findings indicate that higher TIL levels may be predictive of better response to chemotherapy (see Discussion).
3.7. Confirmation of the main findings in the TCGA dataset
As a first global comparison we used those 3850 unique genes that passed our filters for the METABRIC analysis and also appeared in the TCGA RNA‐seq data of 988 tumors and 106 normal breast samples (TCGA, 2012). The principal curves and PDSs were calculated for 445 pathways (all were present among the pathways that passed our filters for METABRIC). The resulting PDS matrix, obtained for all 445 pathways, is shown in Supplementary File 1, Supplementary Figure 14A, and in Supplementary Figure 14B we present the PDS matrix for a subset of 226 pathways with high coherence (see Supplementary File 2) between the two datasets. Both are in good qualitative agreement with the METABRIC data. Next, we turned to quantitative confirmation of our main findings.
3.7.1. Overexpression of PRKACB in a subgroup of the tumors
We focused on our finding of pronounced overexpression of the PRKACB gene in about 7% of the tumors. Supplementary File 1, Supplementary Figure 15 presents histograms of expression of this gene in both datasets; a remarkable agreement is found, with both histograms exhibiting a bi‐modal shape, with weight of about 0.07 for the high expression peak.
3.7.2. Deregulation of immune cluster pathways reflects TIL levels of basal tumors
The 150 tumors labeled “basal” by TCGA were ordered, using SPIN (Tsafrir et al., 2005), on the basis of their PDS profiles on those immune cluster pathways that passed our coherence filter (see Supplementary File 2). Since the clinical follow‐up times of the TCGA patients are too short for survival analysis to yield significant results, we tested the extent to which the PDS of these pathways reflects TIL levels. Comparing the reported lymphocyte infiltration percentages of the most deregulated third with those of the least deregulated third of the basal samples, we find a significant difference: the one‐sided Wilcoxon rank sum test has a p‐value of 0.015.
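A one‐sided rank sum comparison of two groups of infiltration percentages can be sketched as follows. The sketch uses the normal approximation without tie or continuity correction, which is an assumption made here for brevity and may differ from the exact procedure used in the study; the function name and the toy values are hypothetical.

```python
from math import erfc, sqrt

def rank_sum_one_sided(high, low):
    """One-sided Wilcoxon rank sum (Mann-Whitney) test of whether values in
    `high` tend to exceed values in `low`. Returns (U statistic, p-value)."""
    n1, n2 = len(high), len(low)
    # U counts the pairs in which the `high` value wins; ties count one half.
    u = sum(1.0 if h > l else 0.5 if h == l else 0.0 for h in high for l in low)
    mean_u = n1 * n2 / 2.0
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_value = 0.5 * erfc(z / sqrt(2.0))  # upper-tail normal probability
    return u, p_value

# Toy infiltration percentages for the most and least deregulated groups.
u_toy, p_rank_toy = rank_sum_one_sided([5, 6, 7, 8], [1, 2, 3, 4])
```

The test uses only the ordering of the values, so it does not require the infiltration percentages to be normally distributed.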
3.8. Separate analyses for the three subtypes ER+/HER2‐; HER2+ and ER‐/HER2‐
The entire analysis was repeated separately for the three clinical subtypes. This analysis is described in detail in Supplementary File 4, where all figures and tables mentioned here can be found. The numbers of samples of each subtype and of pathways that passed the filters described above are presented in Supplementary Table 4. For comparison, the corresponding sets of PDSs obtained from the analysis of all samples together were also separated into the same three datasets and normalized; to our surprise, the two types of results were very similar (see Supplementary Figures 16, 17 in Supplementary File 4). We also searched for pathways whose (subtype‐dependent) PDSs differentiate good and bad survival when the analysis is done for each subtype separately. The number of differentiating pathways, together with results derived when all samples are used together, is presented in Supplementary Table 5 and in Supplementary File 5.
4. Discussion
We applied Pathifier, a recently introduced method for analysis of expression data, to study a very large dataset of 2000 breast cancer samples and 144 normal tissues. As opposed to unsupervised and machine‐learning based approaches, Pathifier does use prior knowledge in the form of gene assignment to pathways and biological processes. For each pathway P and tumor k a Pathway Deregulation Score is calculated, in the d_P‐dimensional space of the expression levels of the (expressed and varying) genes that belong to P. Since typically d_P ≈ 10–20, while the number of samples is about 1000, this calculation does not suffer from the “curse of dimensionality”. Individual samples are then represented in terms of their PDSs calculated for several hundred pathways, and their analysis is carried out in this space, whose dimension is, again, less than the number of data points. The method used to determine the PDS is phenomenological – no underlying “atomistic” information (beyond assigning genes to pathways) must be known or measured. The approach achieves dimensional reduction without a feature selection process that discards most genes and keeps a few; rather, it generates a set of “coarse grained” variables (the PDSs). Each of these involves a few tens of genes, and each has a well‐defined biological meaning.
The pathways were clustered on the basis of their deregulation profiles over the samples of the discovery set, comprising 1000 tumors and the normal samples. Then the samples were stratified into groups. These divisions are to a large extent arbitrary – the final decision on where to draw the line and not divide objects into further, smaller subgroups is taken by eye, and one should not adopt a religious attitude to the precise number of identified groups. We required that any two distinct sample clusters be distinguishable by the deregulation levels of at least one pathway cluster. Our results recover established prior knowledge: ER+ and ER‐ tumors have very different deregulation profiles; Luminal tumors cluster together, with a fairly smooth transition from Luminal A‐rich to Luminal B‐rich subtypes. Most Basals cluster together, and most of the HER2+ tumors also have similar deregulation profiles. Altogether, on the basis of the PDSs we ended up with nine tumor subtypes, two of which are new and interesting. The first (C3, 7% of the tumors) is characterized by high deregulation of a cluster of 38 pathways that involve PKA, all of which contain the gene PRKACB, which codes for one of the catalytic subunits of PKA. Deregulation of these pathways is due to significant overexpression of PRKACB in the samples of C3. To substantiate this single‐gene dependent finding, the expression levels of PRKACB were measured directly, using rt‐qPCR, in a group of tumors; the microarray‐based difference in expression levels between members of C3 and other samples was validated. A very recent publication showed that overexpression of PRKACB was associated with Carney complex with acromegaly and myxomas (Forlino et al., 2014), and was induced by amplification of the genomic region that contained the gene.
The second finding involves basal tumors; they break into two subclasses, C7 with low and C8 with high deregulation of a cluster of immune system pathways. High deregulation corresponds to higher expression levels of genes that are expressed by immune cells, particularly T‐cells, and a significant correlation with the concentrations of Tumor Infiltrating Lymphocytes was observed. Basal tumors of C8, with higher involvement of the immune system, have better prognosis. Hence the TIL level of a basal tumor sample is a reliable biomarker for better outcome. A previous study (Alexe et al., 2007) also noticed two subgroups of the ER‐/HER2‐ tumors, with different immune signatures, but no significant difference in outcome. Notably, those patients did not receive chemotherapy, whereas the majority of the patients in the corresponding METABRIC cohort did (e.g. 82 of the 127 tumors of PDS clusters 7 and 8 of the discovery set). Therefore the immune or TIL signature is most likely to indicate response to chemotherapy of the ER‐/HER2‐ subtype.
We tested the robustness of the results first by applying the analysis to the validation set, comprising an independent set of 1000 tumors and the same normal tissue samples that were used above. The results were highly reproducible and robust. Second, we put our findings to a more stringent test by validating them on the TCGA data. These expression data were derived using RNA‐seq, a completely different method; nevertheless large‐scale qualitative as well as detailed quantitative agreements were observed.
Finally, the entire analysis was repeated separately on tumors of the three clinical subtypes, and the results were very similar to those obtained when the PDS were calculated using all three together, again indicating a high level of robustness of the deregulation scores.
5. Conclusions
Our work indicates that the coarse‐grained variables that represent pathway deregulation provide the basis for deriving relevant, novel and robust results for breast cancer. We identified a new subgroup of Luminal tumors, characterized by overexpression of the PRKACB gene, which induces deregulation of the PKA‐related pathways. We found that prognostication is difficult and subgroup‐specific; basal tumors break into two subgroups of different outcome, on the basis of deregulation levels of a large set of immune system related pathways, indicative of the level of infiltration by lymphocytes.
Supporting information
The following are the supplementary data related to this article:
The Supplementary Material for this paper, to be found at http://… contains the following files: Supplementary File 1: all Supplementary Figures and their captions. Supplementary File 2: all Supplementary Text mentioned in the manuscript. Supplementary File 3: Excel file with a list of all samples, their assignment to discovery or validation sets and the assigned iClusters and PDS clusters, as well as a list of all pathways used in the analysis and their various characteristics. Supplementary File 4: Repeating the entire analysis separately for each of the three clinical subtypes. Supplementary File 5: The list of pathways that differentiate good from bad outcome, using both original and new derivations of the PDSs.
Supplementary data
Acknowledgments
The work at the Weizmann Institute was supported by grants from the Leir Charitable Foundation and by the Nofar program of the Israel Ministry of Industry and Commerce. The work at Cambridge was supported by Cancer Research UK. We thank Dr. Yotam Drier and Professors Shridar Ganesan and Gyan Bhanot for most helpful suggestions.
Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.molonc.2015.04.006.
Livshits Anna, Git Anna, Fuks Garold, Caldas Carlos, Domany Eytan, (2015), Pathway-based personalized analysis of breast cancer expression data, Molecular Oncology, 9, doi: 10.1016/j.molonc.2015.04.006.
Contributor Information
Anna Livshits, Email: annaliv@gmail.com.
Anna Git, Email: Anna.Git@cruk.cam.ac.uk.
Garold Fuks, Email: Garold.fuks@weizmann.ac.il.
Carlos Caldas, Email: Carlos.Caldas@cruk.cam.ac.uk.
Eytan Domany, Email: eytan.domany@weizmann.ac.il.
References
- Alexe, G., Dalgin, G.S., Ramaswamy, R., DeLisi, C., Bhanot, G., 2006. Data perturbation independent diagnosis and validation of breast cancer subtypes using clustering and patterns. Cancer Inform. 2, 243–274.
- Alexe, G., Dalgin, G.S., Scanfeld, D., Tamayo, P., Mesirov, J.P., DeLisi, C., Harris, L., Barnard, N., Martel, M., Levine, A.J., Ganesan, S., Bhanot, G., 2007. High expression of lymphocyte-associated genes in node-negative HER2+ breast cancers correlates with lower recurrence rates. Cancer Res. 67, 10669–10676.
- Ali, H.R., Provenzano, E., Dawson, S.J., Blows, F.M., Liu, B., Shah, M., Earl, H.M., Poole, C.J., Hiller, L., Dunn, J.A., Bowden, S.J., Twelves, C., Bartlett, J.M.S., Mahmoud, S.M.A., Rakha, E., Ellis, I.O., Liu, S., Gao, D., Nielsen, T.O., Pharoah, P.D.P., Caldas, C., 2014. Association between CD8+ T-cell infiltration and breast cancer survival in 12,439 patients. Ann. Oncol. 25, 1536–1543.
- Benjamini, Y., Hochberg, Y., 1995. Controlling the false discovery rate – a practical and powerful approach to multiple testing. J. Roy Stat. Soc. B 57, 289–300.
- Beuschlein, F., Fassnacht, M., Assie, G., Calebiro, D., Stratakis, C.A., Osswald, A., Ronchi, C.L., Wieland, T., Sbiera, S., Faucz, F.R., Schaak, K., Schmittfull, A., Schwarzmayr, T., Barreau, O., Vezzosi, D., Rizk-Rabin, M., Zabel, U., Szarek, E., Salpea, P., Forlino, A., Vetro, A., Zuffardi, O., Kisker, C., Diener, S., Meitinger, T., Lohse, M.J., Reincke, M., Bertherat, J., Strom, T.M., Allolio, B., 2014. Constitutive activation of PKA catalytic subunit in adrenal Cushing's syndrome. New Engl. J. Med. 370, 1019–1028.
- Bilal, E., Dutkowski, J., Guinney, J., Jang, I.S., Logsdon, B.A., Pandey, G., Sauerwine, B.A., Shimoni, Y., Moen Vollan, H.K., Mecham, B.H., Rueda, O.M., Tost, J., Curtis, C., Alvarez, M.J., Kristensen, V.N., Aparicio, S., Borresen-Dale, A.L., Caldas, C., Califano, A., Friend, S.H., Ideker, T., Schadt, E.E., Stolovitzky, G.A., Margolin, A.A., 2013. Improving breast cancer survival analysis through competition-based multidimensional modeling. PLoS Comput. Biol. 9, e1003047.
- Bustin, S.A., Benes, V., Garson, J.A., Hellemans, J., Huggett, J., Kubista, M., Mueller, R., Nolan, T., Pfaffl, M.W., Shipley, G.L., Vandesompele, J., Wittwer, C.T., 2009. The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin. Chem. 55, 611–622.
- Calabrò, A., Beissbarth, T., Kuner, R., Stojanov, M., Benner, A., Asslaber, M., Ploner, F., Zatloukal, K., Samonigg, H., Poustka, A., Sültmann, H., 2009. Effects of infiltrating lymphocytes and estrogen receptor on gene expression and prognosis in breast cancer. Breast Cancer Res. Treat. 116, 69–77.
- Curtis, C., Shah, S.P., Chin, S.F., Turashvili, G., Rueda, O.M., Dunning, M.J., Speed, D., Lynch, A.G., Samarajiwa, S., Yuan, Y., Graf, S., Ha, G., Haffari, G., Bashashati, A., Russell, R., McKinney, S., Langerod, A., Green, A., Provenzano, E., Wishart, G., Pinder, S., Watson, P., Markowetz, F., Murphy, L., Ellis, I., Purushotham, A., Borresen-Dale, A.L., Brenton, J.D., Tavare, S., Caldas, C., Aparicio, S., 2012. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352.
- DeSantis, C., Siegel, R., Bandi, P., Jemal, A., 2011. Breast cancer statistics, 2011. CA Cancer J. Clin. 61, 408–418.
- Drier, Y., Sheffer, M., Domany, E., 2013. Pathway-based personalized analysis of cancer. Proc. Natl. Acad. Sci. USA 110, 6388–6393.
- Foley, J., Nickerson, N.K., Nam, S., Allen, K.T., Gilmore, J.L., Nephew, K.P., Riese II, D.J., 2010. EGFR signaling in breast cancer: bad to the bone. Semin. Cell Dev. Biol. 21, 951–960.
- Forlino, A., Vetro, A., Garavelli, L., Ciccone, R., London, E., Stratakis, C.A., Zuffardi, O., 2014. PRKACB and Carney complex. New Engl. J. Med. 370, 1065–1067.
- Francis, S.H., Blount, M.A., Corbin, J.D., 2011. Mammalian cyclic nucleotide phosphodiesterases: molecular mechanisms and physiological functions. Physiol. Rev. 91, 651–690.
- Furuta, K., Arao, T., Sakai, K., Kimura, H., Nagai, T., Tamura, D., Aomatsu, K., Kudo, K., Kaneda, H., Fujita, Y., Matsumoto, K., Yamada, Y., Yanagihara, K., Sekijima, M., Nishio, K., 2012. Integrated analysis of whole genome exon array and array-comparative genomic hybridization in gastric and colorectal cancer cells. Cancer Sci. 103, 221–227.
- Geiger, T., Madden, S.F., Gallagher, W.M., Cox, J., Mann, M., 2012. Proteomic portrait of human breast cancer progression identifies novel prognostic markers. Cancer Res. 72, 2428–2439.
- Haibe-Kains, B., Desmedt, C., Loi, S., Culhane, A.C., Bontempi, G., Quackenbush, J., Sotiriou, C., 2012. A three-gene model to robustly identify breast cancer molecular subtypes. J. Natl. Cancer Inst. 104, 311–325.
- Hastie, T., Stuetzle, W., 1989. Principal curves. J. Am. Stat. Assoc. 84, 502–516.
- Huang, S., Yee, C., Ching, T., Yu, H., Garmire, L.X., 2014. A novel model to combine clinical and pathway-based transcriptomic information for the prognosis prediction of breast cancer. PLoS Comput. Biol. 10, e1003851.
- Ideker, T., Dutkowski, J., Hood, L., 2011. Boosting signal-to-noise in complex biology: prior knowledge is power. Cell 144, 860–863.
- Johnson, D.A., Akamine, P., Radzio-Andzelm, E., Madhusudan, M., Taylor, S.S., 2001. Dynamics of cAMP-dependent protein kinase. Chem. Rev. 101, 2243–2270.
- Kanehisa, M., Goto, S., 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30.
- Kanehisa, M., Goto, S., Furumichi, M., Tanabe, M., Hirakawa, M., 2010. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 38, D355–D360.
- Loi, S., Sirtaine, N., Piette, F., Salgado, R., Viale, G., Van Eenoo, F., Rouas, G., Francis, P., Crown, J.P., Hitre, E., de Azambuja, E., Quinaux, E., Di Leo, A., Michiels, S., Piccart, M.J., Sotiriou, C., 2013. Prognostic and predictive value of tumor-infiltrating lymphocytes in a phase III randomized adjuvant breast cancer trial in node-positive breast cancer comparing the addition of docetaxel to doxorubicin with doxorubicin-based chemotherapy: BIG 02-98. J. Clin. Oncol. 31, 860–867.
- Nishimura, D., 2001. BioCarta. Biotech. Softw. Internet Rep. 2, 117–120.
- Parker, J.S., Mullins, M., Cheang, M.C., Leung, S., Voduc, D., Vickery, T., Davies, S., Fauron, C., He, X., Hu, Z., Quackenbush, J.F., Stijleman, I.J., Palazzo, J., Marron, J.S., Nobel, A.B., Mardis, E., Nielsen, T.O., Ellis, M.J., Perou, C.M., Bernard, P.S., 2009. Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 27, 1160–1167.
- Perou, C.M., Sorlie, T., Eisen, M.B., van de Rijn, M., Jeffrey, S.S., Rees, C.A., Pollack, J.R., Ross, D.T., Johnsen, H., Akslen, L.A., Fluge, O., Pergamenschikov, A., Williams, C., Zhu, S.X., Lonning, P.E., Borresen-Dale, A.L., Brown, P.O., Botstein, D., 2000. Molecular portraits of human breast tumours. Nature 406, 747–752.
- Polyak, K., 2011. Heterogeneity in breast cancer. J. Clin. Invest. 121, 3786–3788.
- Rody, A., Holtrich, U., Pusztai, L., Liedtke, C., Gaetje, R., Ruckhaeberle, E., Solbach, C., Hanker, L., Ahr, A., Metzler, D., Engels, K., Karn, T., Kaufmann, M., 2009. T-cell metagene predicts a favorable prognosis in estrogen receptor-negative and HER2-positive breast cancers. Breast Cancer Res. 11, R15.
- Schaefer, C.F., Anthony, K., Krupa, S., Buchoff, J., Day, M., Hannay, T., Buetow, K.H., 2009. PID: the pathway interaction database. Nucleic Acids Res. 37, D674–D679.
- Sorlie, T., Perou, C.M., Tibshirani, R., Aas, T., Geisler, S., Johnsen, H., Hastie, T., Eisen, M.B., van de Rijn, M., Jeffrey, S.S., Thorsen, T., Quist, H., Matese, J.C., Brown, P.O., Botstein, D., Lonning, P.E., Borresen-Dale, A.L., 2001. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl. Acad. Sci. USA 98, 10869–10874.
- Sotiriou, C., Pusztai, L., 2009. Gene-expression signatures in breast cancer. New Engl. J. Med. 360, 790–800.
- Soubeyran, P., Kowanetz, K., Szymkiewicz, I., Langdon, W.Y., Dikic, I., 2002. Cbl-CIN85-endophilin complex mediates ligand-induced downregulation of EGF receptors. Nature 416, 183–187.
- TCGA, 2012. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70.
- Teschendorff, A., Miremadi, A., Pinder, S., Ellis, I., Caldas, C., 2007. An immune response gene expression module identifies a good prognosis subtype in estrogen receptor negative breast cancer. Genome Biol. 8, R157.
- Tsafrir, D., Tsafrir, I., Ein-Dor, L., Zuk, O., Notterman, D.A., Domany, E., 2005. Sorting points into neighborhoods (SPIN): data analysis and visualization by ordering distance matrices. Bioinformatics 21, 2301–2308.
- van 't Veer, L.J., Dai, H., van de Vijver, M.J., He, Y.D., Hart, A.A., Mao, M., Peterse, H.L., van der Kooy, K., Marton, M.J., Witteveen, A.T., Schreiber, G.J., Kerkhoven, R.M., Roberts, C., Linsley, P.S., Bernards, R., Friend, S.H., 2002. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536.
- van der Vegt, B., de Bock, G.H., Hollema, H., Wesseling, J., 2009. Microarray methods to identify factors determining breast cancer progression: potentials, limitations, and challenges. Crit. Rev. Oncol. Hematol. 70, 1–11.
- Vaske, C.J., Benz, S.C., Sanborn, J.Z., Earl, D., Szeto, C., Zhu, J., Haussler, D., Stuart, J.M., 2010. Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics 26, i237–245.
- Verhaegh, W., Van de Stolpe, A., 2014. Knowledge-based computational models. Oncotarget 5, 5196–5197.
- Verhaegh, W., van Ooijen, H., Inda, M.A., Hatzis, P., Versteeg, R., Smid, M., Martens, J., Foekens, J., van de Wiel, P., Clevers, H., van de Stolpe, A., 2014. Selection of personalized patient therapy through the use of knowledge-based computational models that identify tumor-driving signal transduction pathways. Cancer Res. 74, 2936–2945.
- Wang, Y. , Klijn, J.G. , Zhang, Y. , Sieuwerts, A.M. , Look, M.P. , Yang, F. , Talantov, D. , Timmermans, M. , Meijer-van Gelder, M.E. , Yu, J. , Jatkoe, T. , Berns, E.M. , Atkins, D. , Foekens, J.A. , 2005. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365, 671–679. [DOI] [PubMed] [Google Scholar]
Supplementary Materials
The following are the supplementary data related to this article:
The Supplementary Material for this paper, to be found at http://…, contains the following files:
- Supplementary File 1: all Supplementary Figures and their captions.
- Supplementary File 2: all Supplementary Text mentioned in the manuscript.
- Supplementary File 3: Excel file listing all samples, their assignment to the discovery or validation set, and their assigned iClusters and PDS clusters, together with a list of all pathways used in the analysis and their various characteristics.
- Supplementary File 4: the entire analysis repeated separately for each of the three clinical subtypes.
- Supplementary File 5: the list of pathways that differentiate good from bad outcome, using both the original and the new derivations of the PDSs.
