Author manuscript; available in PMC: 2014 Oct 3.
Published in final edited form as: Phys Biol. 2013 Oct 3;10(5):10.1088/1478-3975/10/5/056006. doi: 10.1088/1478-3975/10/5/056006

Hierarchy of Gene Expression Data is Predictive of Future Breast Cancer Outcome

Man Chen 1, Michael W Deem 1
PMCID: PMC3863767  NIHMSID: NIHMS529995  PMID: 24091897

Abstract

We calculate measures of hierarchy in gene and tissue networks of breast cancer patients. We find that the likelihood of metastasis in the future is correlated with increased values of network hierarchy for expression networks of cancer-associated genes, due to correlated expression of cancer-specific pathways. Conversely, future metastasis and quick relapse times are negatively correlated with values of network hierarchy in the expression network of all genes, due to dedifferentiation of gene pathways and circuits. These results suggest that hierarchy of gene expression may be useful as an additional biomarker for breast cancer prognosis.

1 Introduction

As cancer develops, there are changes in patterns of gene expression. There are several examples where a defect in a single gene causes a genetic predisposition to developing cancer, for example the BRCA1 and BRCA2 genes in breast cancer [1-3]. In general, however, the development of cancer is the result of correlated gene expression networks gone awry. That is, cancer is a systemic disease, and changes in fidelity of gene expression are signatures of cancer. In some cases, changes in gene expression networks can determine disease outcome [4-12]. Thus, structural features of gene expression networks may be ‘biomarkers’ that can predict the probability of a patient developing or surviving cancer.

We here focus on the relation between metastasis and the structure of networks relevant to cancer. Metastasis is the leading cause of cancer mortality [13]. Once metastasis has occurred, the chance of patient survival drops dramatically [14]. Clinicians use prognostic factors such as age or size of tumor at the time of tumor removal to predict the risk of recurrence [14]. Here, we present an analysis of the relation between breast cancer prognosis and hierarchical structure in correlations of cancer gene expression networks. We will show that these measures of hierarchy in correlations of gene expression distinguish between non-metastatic and metastatic patient populations. We will also show that these measures of hierarchy in gene expression are predictive of average time of relapse in breast cancer patients.

We are motivated to study hierarchy of gene expression by theory that relates hierarchy to environmental stress and variability [15-17]. This theory shows that when a system is placed in a more variable environment, it will tend to become more hierarchical, if it has the ability to do so. This occurs because hierarchy will tend to increase the adaptability of the system. This theory predicts that expression networks of cancer-associated genes may be more hierarchical in more aggressive tumors or during metastasis due to increased correlations in cancer-associated gene pathways. Conversely, measures of hierarchy in the network of all genes will likely decrease for more aggressive tumors or during metastasis, since cancer progression is a dedifferentiation of the entire gene network.

Measures of modularity have been defined for cancer gene and protein interaction networks. Carro et al. identified transcriptional modules in a context-specific regulatory network that controls expression of the mesenchymal signature associated with metastatic outcome [5, 18]. This result identified a small regulatory module that was part of the mechanism that controlled an important phenotypic state in cancer cells. Chuang et al. extracted subnetworks from protein interaction databases and found subnetworks that were significantly enriched with cancer susceptibility genes [5]. Comparison of normal and colon cancer gene networks identified changes in network structure. Ostlund et al. have ranked candidate cancer genes by local network structures, such as neighbor annotation [19]. Yu et al. have used signature analysis to identify multiple breast cancer modules [20]. Taylor et al. used co-expression of hub proteins and their partners to identify whether interactions are context specific, i.e. interacting proteins are not always co-expressed, or constitutive, i.e. interacting proteins are always co-expressed [4, 5]. They found that during tumor progression, hub proteins are disorganized by loss of coordinated co-expression of components. Thus, changes in the correlation of tumor interactomes were shown to be a prognostic signature in cancer. Other studies have also demonstrated that modularity in the protein-protein interaction network or cell-cell interaction network is an important indicator for cancer prognosis [4] or tumor metastasis [21].

We here quantify the hierarchical structure in cancer networks, generalizing the concept of modularity. Modularity is one measure of the structure of cancer networks. Hierarchy is a measure of the modularity that exists in cancer networks at different scales. The rest of the paper is organized as follows. In section 2 we describe how we created gene and tissue networks from gene expression data previously collected from a population of metastatic and non-metastatic patients. In section 3, we show that hierarchy in networks of cancer-associated genes is positively correlated with metastasis due to activation of cancer specific pathways. Conversely, for networks of all genes, we show that a measure of hierarchy is negatively correlated with metastasis and early recurrence times due to dedifferentiation. We discuss these results in section 4. We conclude in section 5.

2 Methods

We used gene expression profiles of breast cancer patients to construct the networks. The expression profiles were previously obtained from 286 women with lymph node negative disease who had not received adjuvant systemic treatment [22, 23]. In that experiment, total RNA of frozen tumor samples was hybridized to Affymetrix Human U133a GeneChips. Expression values were calculated by the Affymetrix GeneChip analysis software MAS 5.0. Of the many genes analyzed, 76 cancer associated genes were identified as predictive of metastasis. Relapse and metastasis of the patients were examined during follow-up visits within 5 years [22].

Of these patients, 179 did not relapse and metastasize, and 107 were identified to have developed a distant metastasis during a follow-up visit within 5 years [22]. We seek to distinguish, using the data at year 0, the 179 patients who were disease free after extended treatment from the 107 patients who developed distant metastases within 5 years. We constructed cancer networks with two types of nodes: cancer-associated genes or tissue types.

2.1 Gene Networks

To construct networks of the first type, we defined a network with nodes of cancer-associated genes. A total of 76 genes were previously identified as markers that discriminated patients who developed distant metastases from those remaining metastasis-free for 5 years [22]. We use these 76 cancer-associated genes as the nodes of our network. The links between pairs of nodes were defined by the Pearson correlation coefficient of the two gene expression values:

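$$l_{\alpha,\beta} = \frac{\sum_{i=1}^{n} (P_{i,\alpha} - \mu_\alpha)(P_{i,\beta} - \mu_\beta)}{\sigma_\alpha \sigma_\beta} \tag{1}$$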

From this definition $l_{\alpha,\beta}$ is symmetric in $\alpha$ and $\beta$, and so the graph is undirected. Here $P_{i,\alpha}$ is the expression data of gene $\alpha$ for patient $i$ from [22], $\mu_\alpha$ is the average expression value for gene $\alpha$ for the $n$ patients, and $\sigma_\alpha$ is the standard deviation of the expression value of gene $\alpha$ for the $n$ patients. The expression value, $P_{i,\alpha}$, is a measure of the abundance of the transcript reported by Affymetrix GeneChip analysis software MAS 5.0, scaled to a standard target intensity [22]. To make comparisons between the non-metastatic and the metastatic groups, which contain different numbers of patients, we randomly chose 40 patients each from the non-metastatic group and the metastatic group. This random selection of patients mitigates bias due to differing patient group sizes. We repeated the procedure 100 times, which gives us 100 networks for the metastatic group and 100 networks for the non-metastatic group. Error bars are calculated from this bootstrapping procedure.
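The subsampling procedure above can be sketched in Python as follows. This is only an illustrative sketch, not the code used in this work: the function name `subsample_correlation_networks`, the toy data, and the choice to draw the 40 patients without replacement are assumptions. NumPy's `corrcoef` applies the usual Pearson normalization internally.

```python
import numpy as np

def subsample_correlation_networks(expr, group_size=40, n_repeats=100, seed=0):
    """Build one Pearson correlation network per random patient subset.

    expr: array of shape (n_patients, n_genes) for a single outcome group.
    Returns an array of shape (n_repeats, n_genes, n_genes).
    """
    rng = np.random.default_rng(seed)
    n_patients = expr.shape[0]
    networks = []
    for _ in range(n_repeats):
        # Assumption: patients are drawn without replacement within one subset.
        idx = rng.choice(n_patients, size=group_size, replace=False)
        # rowvar=False: columns (genes) are the variables being correlated.
        networks.append(np.corrcoef(expr[idx], rowvar=False))
    return np.stack(networks)

# Toy stand-in for the expression data of one group (107 patients, 76 genes).
toy_rng = np.random.default_rng(1)
toy_expr = toy_rng.lognormal(size=(107, 76))
nets = subsample_correlation_networks(toy_expr)
print(nets.shape)
```

Each of the 100 matrices returned is symmetric with unit diagonal, so each defines an undirected weighted graph on the 76 genes.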

2.2 Tissue Networks

Networks of the second type are based upon tissues. We used tissue expression data previously collected for 79 human tissues [24, 25]. We are motivated to study the tissue network because during metastasis cancer spreads between and through different tissue types. The systemic nature of metastasis suggests gene expression in different tissue types may be informative to cancer prognosis. The tissue network is built with tissues as nodes and correlation of gene expression between different tissues as the link values. Specifically, we treat each tissue as a node and build a 79 × 79 tissue network, where the link value between tissue $i$ and tissue $j$ is weighted by the expression data of patient $k$, $P_{k,\alpha}$ from [22], to calculate a Pearson correlation coefficient:

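$$l^{k}_{i,j} = \left\langle \frac{(T_{i,\alpha} P_{k,\alpha} - \mu_{i,k})(T_{j,\alpha} P_{k,\alpha} - \mu_{j,k})}{\sigma_{i,k}\,\sigma_{j,k}} \right\rangle_{\alpha} \tag{2}$$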

This definition is symmetric in $i$ and $j$, and so the network is undirected. Here $\alpha$ is the gene, $T_{i,\alpha}$ is the expression level of gene $\alpha$ in tissue $i$ from [24], $\mu_{i,k}$ is the average value of $T_{i,\alpha} P_{k,\alpha}$ over all the available genes, and $\sigma_{i,k}$ is the standard deviation of $T_{i,\alpha} P_{k,\alpha}$. The expression value, $T_{i,\alpha}$, is a measure of the abundance of the transcript reported by Affymetrix GeneChip analysis software MAS 5.0, normalized by a global median setting [24]. We set a cutoff for using tissue expression data, from 10% to 90%. Values of $T_{i,\alpha}$ falling below the cutoff are set to zero. BioGPS was used to map the gene names from the tissue data set [24] and the breast cancer data set [22] into NCBI IDs. Eq. (2) is an approximation to an ideal of a dataset with expression data for each tissue type from each breast cancer patient. We will show that structure in the network defined by Eq. (2) has predictive power for probability of metastasis within 5 years.
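A minimal Python sketch of the patient-weighted tissue network of Eq. (2) is given below. The function name `tissue_network`, the toy arrays, and the reading of the cutoff as "keep the highest fraction of all tissue expression values" are assumptions made for illustration; the original analysis may have applied the cutoff differently.

```python
import numpy as np

def tissue_network(T, p, tc=0.1):
    """Tissue-tissue link matrix for one patient.

    T: array (n_tissues, n_genes) of tissue expression values.
    p: array (n_genes,) of one patient's expression values.
    tc: fraction of the highest tissue expression values that is kept.
    """
    # Assumed reading of the cutoff: values below the (1 - tc) quantile
    # of all tissue expression values are set to zero.
    threshold = np.quantile(T, 1.0 - tc)
    T_cut = np.where(T >= threshold, T, 0.0)
    # Weight each gene's tissue expression by the patient's expression.
    weighted = T_cut * p[np.newaxis, :]
    # Pearson correlation across genes for every pair of tissues (rows).
    # A tissue whose values are all zero after the cutoff would give NaN.
    return np.corrcoef(weighted)

# Toy stand-ins: 79 tissues, 500 genes, one patient.
toy_rng = np.random.default_rng(2)
toy_T = toy_rng.lognormal(size=(79, 500))
toy_p = toy_rng.lognormal(size=500)
links = tissue_network(toy_T, toy_p, tc=0.1)
print(links.shape)
```

Setting `p` to a vector of ones recovers the unweighted tissue-tissue correlations, which corresponds to the case of no patient weighting.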

2.3 The CCC

To quantify structure in these networks, we define a measure of the hierarchy in the networks. Since a tree topology is the archetypal hierarchical structure, we use a measure of hierarchy that quantifies how tree-like the network is. To calculate this measure of hierarchy, we first computed the distance matrix defined by the network. We defined the distance between nodes $i$ and $j$, $d_{ij}$, by the square root of the commute time. The commute time is the expected time it takes a random walker to travel from one of the nodes to another and back [26]. The commute time between nodes $i$ and $j$ depends not only on the link value but also on all the other possible paths between nodes $i$ and $j$. Note that the commute time between two nodes of a weighted graph decreases when the number of paths connecting the two nodes increases. The commute time between two nodes also decreases when the length of any path connecting the nodes decreases. These properties make the commute time well suited for clustering tasks. To define the commute time, we let $L$ denote the graph Laplacian, defined as $L = D - A$, where $A$ is the matrix of links, $A = l$ in Eq. (1) or (2), and the diagonal matrix $D = \mathrm{diag}(A_i)$, with $A_i = \sum_j A_{ij}$. The commute time is obtained from $L^+$, the Moore-Penrose pseudoinverse of the graph Laplacian $L$, by [27]

$$n(i,j) = V_G\,(e_i - e_j)^{T} L^{+} (e_i - e_j). \tag{3}$$

Here $(e_i)_j = \delta_{ij}$ and $V_G = \sum_{ij} A_{ij}$. Since $L^+$ is symmetric and positive semidefinite, $d_{ij} = \sqrt{n(i,j)}$ is a Euclidean distance metric, called the Euclidean commute time (ECT) distance.
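The ECT distance can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name `ect_distance` is hypothetical, the link weights are assumed non-negative and the graph connected (the text does not say how negative correlations are handled, so a caller might pass absolute values), and self-links are dropped before computing the volume.

```python
import numpy as np

def ect_distance(A):
    """Euclidean commute time distances for a symmetric weight matrix A."""
    A = np.array(A, dtype=float)      # copy, so the input is not modified
    np.fill_diagonal(A, 0.0)          # assumption: ignore self-links
    degrees = A.sum(axis=1)
    L = np.diag(degrees) - A          # graph Laplacian, L = D - A
    L_plus = np.linalg.pinv(L)        # Moore-Penrose pseudoinverse
    volume = A.sum()                  # V_G = sum over all A_ij
    diag = np.diag(L_plus)
    # (e_i - e_j)^T L+ (e_i - e_j) = L+_ii + L+_jj - 2 L+_ij
    commute = volume * (diag[:, None] + diag[None, :] - 2.0 * L_plus)
    # Clip tiny negative round-off before taking the square root.
    return np.sqrt(np.clip(commute, 0.0, None))

# Path graph 0 - 1 - 2 with unit weights.
path = np.array([[0.0, 1.0, 0.0],
                 [1.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0]])
D_path = ect_distance(path)
print(D_path ** 2)  # squared distances are the commute times
```

For this three-node path the commute time is 4 between neighbors and 8 between the two end nodes, which matches the expected round-trip time of a random walker.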

We next applied the average linkage hierarchical clustering algorithm to construct the tree topology that best fit the cancer network [28]. This method takes the matrix of distances between all nodes of the network, $d_{ij}$, and produces a tree topology that best reproduces those distances. The construction of the tree from the network by this algorithm is unique, and approximately optimal in reproducing the distances. The distances of the tree topology are denoted by $c_{ij}$. The tree topology has the same nodes as the original network, but different links. We calculate the correlation between the original data and the best fitting tree, which gives the cophenetic correlation coefficient (CCC). The greater the correlation, the more hierarchical are the data. The nodes and links of the cancer network define the original data. The tree that best fits the data defines an approximation to the original network, termed the cophenetic matrix. The elements of the cophenetic matrix are the heights where two network nodes become members of the same cluster in the tree, see Figure 1. Distances between nodes in the best fitting tree are obtained from the height of the common bifurcation point between those nodes. This height is the cophenetic element of these two nodes, $c_{ij}$. The correlation between this cophenetic matrix constructed from the best fitting tree and the original data distances is the CCC. The CCC is a measure of similarity between the original data and the cophenetic matrix. The CCC is defined as

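$$\mathrm{CCC} = \frac{\sum_{i<j} (d_{ij} - \bar{d})(c_{ij} - \bar{c})}{\sqrt{\sum_{i<j} (d_{ij} - \bar{d})^{2}\,\sum_{i<j} (c_{ij} - \bar{c})^{2}}} \tag{4}$$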

Figure 1.

An illustration of how the CCC is calculated, on two networks. Distances between each pair of nodes in the network are calculated by the Euclidean commute time distance, i.e. the square root of the average round trip time from one node to the other node and back. For each network, a tree best representing the network is constructed by the average linkage hierarchical clustering algorithm. The distance between two nodes in the tree (y-axis above) is the height above the baseline at which two nodes are joined in the tree topology. To quantify the match between the tree and the original network, we calculated the correlation between the distances in the network and the distances in the tree. This correlation is termed the CCC. The more hierarchical the network, the greater the value of the CCC.

Here $\bar{d}$ is the average of the distances in the original network, $d_{ij}$, and $\bar{c}$ is the average of the tree distances, $c_{ij}$.
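As an illustration, the CCC of Eq. (4) can be computed from a distance matrix with SciPy's hierarchical clustering routines. This is a sketch rather than the code used in this work; the function name `cophenetic_correlation` and the two toy distance matrices are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

def cophenetic_correlation(D):
    """CCC for a symmetric distance matrix D with zero diagonal."""
    d = squareform(D, checks=False)        # condensed vector of d_ij, i < j
    tree = linkage(d, method="average")    # average linkage tree
    ccc, c = cophenet(tree, d)             # c holds the tree distances c_ij
    return ccc

# Toy case 1: two tight pairs far from each other (exactly tree-like).
D_tree = np.array([[0.0, 1.0, 4.0, 4.0],
                   [1.0, 0.0, 4.0, 4.0],
                   [4.0, 4.0, 0.0, 1.0],
                   [4.0, 4.0, 1.0, 0.0]])
# Toy case 2: four equally spaced points on a line (not tree-like).
pos = np.arange(4.0)
D_line = np.abs(pos[:, None] - pos[None, :])

ccc_tree = cophenetic_correlation(D_tree)
ccc_line = cophenetic_correlation(D_line)
print(ccc_tree, ccc_line)
```

The first matrix is reproduced exactly by its tree, so its CCC equals 1, while the chain of equally spaced points cannot be represented exactly by a tree and gives a smaller value.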

3 Results

3.1 Cancer-Associated Gene Network

For the network of cancer-associated genes, we take the 76 cancer-associated genes as the nodes, with link values from Eq. 1. We built a network by constructing a bootstrap sample of patients from either the non-metastatic outcome or the metastatic outcome groups. We calculated the average CCC value for many 40-person networks extracted from the bootstrap sample. The bootstrap method was then used to calculate the overall average CCC and standard error of this average. Figure 2a shows the result: hierarchy of the cancer-associated gene network is greater in the metastatic group than in the non-metastatic group.
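The two-level averaging described above can be sketched as follows. This is one possible reading of the procedure, not the authors' code: the function name `bootstrap_mean_se`, the numbers of replicates, and the placeholder statistic in the usage example are assumptions. In practice `statistic` would map a patients-by-genes array to the CCC of its network.

```python
import numpy as np

def bootstrap_mean_se(data, statistic, n_boot=200, n_networks=10,
                      group_size=40, seed=0):
    """Bootstrap mean and standard error of a per-network statistic.

    data: array (n_patients, n_genes) for one outcome group.
    statistic: callable mapping a (group_size, n_genes) array to a float.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    replicate_means = np.empty(n_boot)
    for b in range(n_boot):
        # Outer step: resample the patients with replacement.
        boot = data[rng.integers(0, n, size=n)]
        # Inner step: average the statistic over several 40-person subsets.
        values = [
            statistic(boot[rng.choice(n, size=group_size, replace=False)])
            for _ in range(n_networks)
        ]
        replicate_means[b] = np.mean(values)
    # The spread of the replicate means estimates the standard error.
    return replicate_means.mean(), replicate_means.std(ddof=1)

# Toy usage with a placeholder statistic (the overall mean expression).
toy_rng = np.random.default_rng(3)
toy_data = toy_rng.normal(size=(107, 5))
boot_mean, boot_se = bootstrap_mean_se(toy_data, lambda x: float(x.mean()))
print(boot_mean, boot_se)
```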

Figure 2.

a) The CCC measure of hierarchy in the network of cancer-associated genes for the metastatic and non-metastatic patient groups. We randomly choose 40 patients from each group to construct the two networks. The bootstrap method was used to calculate the averages and standard errors, shown by error bars. Data are from [22]. This result shows that the network of cancer-associated genes is more hierarchical in the metastatic group. b) The normalized CCC in the network of cancer-associated genes for the metastatic and non-metastatic patient groups.

We compared these results to those from random networks. We built 100 random networks of the same size and total number of edges as the cancer-associated network, then randomly reassigned the link values in the network. We define a normalized CCC as $\mathrm{CCC}_{\mathrm{norm}} = (\mathrm{CCC} - \mathrm{CCC}_{\mathrm{rand}})/(1 - \mathrm{CCC}_{\mathrm{rand}})$, where CCC is the value of the real cancer-associated network, and $\mathrm{CCC}_{\mathrm{rand}}$ is the average CCC value of the randomized network. We computed the z-score of the cancer-associated network CCC relative to the distribution of CCC values of the random networks of the same size and sparsity, $Z_{\mathrm{CCC}} = (\mathrm{CCC} - \mathrm{CCC}_{\mathrm{rand}})/\sigma_{\mathrm{rand}}$, where $\sigma_{\mathrm{rand}}$ is the standard deviation of the CCC values of the random networks. We found $Z_{\mathrm{CCC}} = 1.64$ and $Z_{\mathrm{CCC}} = 2.14$ for the cancer-associated gene networks in the non-metastatic and metastatic patient groups, respectively, compared to randomly rewired networks.
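The randomization and the two derived quantities can be sketched in Python as follows. The function names `shuffle_links`, `normalized_ccc`, and `z_score` are hypothetical, the numbers in the usage example are made up for illustration, and the use of the sample standard deviation (`ddof=1`) is an assumption.

```python
import numpy as np

def shuffle_links(A, rng):
    """Randomly reassign the off-diagonal link values of a symmetric matrix."""
    A = np.asarray(A, dtype=float)
    iu = np.triu_indices(A.shape[0], k=1)
    shuffled = np.zeros_like(A)
    shuffled[iu] = rng.permutation(A[iu])   # permute the upper triangle
    shuffled = shuffled + shuffled.T        # restore symmetry
    np.fill_diagonal(shuffled, np.diag(A))  # keep the original diagonal
    return shuffled

def normalized_ccc(ccc, ccc_rand):
    """(CCC - CCC_rand) / (1 - CCC_rand), with CCC_rand the random average."""
    rand_mean = np.mean(ccc_rand)
    return (ccc - rand_mean) / (1.0 - rand_mean)

def z_score(ccc, ccc_rand):
    """(CCC - CCC_rand) / sigma_rand over the random-network CCC values."""
    return (ccc - np.mean(ccc_rand)) / np.std(ccc_rand, ddof=1)

# Toy usage: a small symmetric matrix and made-up random-network CCC values.
toy_rng = np.random.default_rng(4)
M = toy_rng.random((5, 5))
M = (M + M.T) / 2.0
S = shuffle_links(M, toy_rng)
toy_rand = np.array([0.80, 0.90])
print(normalized_ccc(0.95, toy_rand), z_score(0.95, toy_rand))
```

In a full calculation, the CCC would be evaluated on each of the 100 shuffled matrices to obtain the array passed as `ccc_rand`.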

To compare the network structure of the 76 cancer-associated genes with the network structure of the other genes, we randomly chose 76 genes from a total of 12926 genes and calculated the CCC for the two groups, using Eq. 1. In particular, we constructed a bootstrap sample of all 12926 genes, and then calculated the average CCC for networks of 76 randomly chosen genes from this bootstrap sample. The link value is the Pearson correlation coefficient for each pair of the 76 genes for the two groups. We repeated the procedure 1000 times. The bootstrap method was then used to calculate an average CCC for all genes and the standard error of this average. The CCC for the non-metastatic patient group is 0.925 with standard deviation 0.0125, compared to the CCC for the metastatic patient group of 0.918 with standard deviation 0.0148. The difference is significant by a Student's t-test, p-value = $1.05 \times 10^{-3}$.

3.2 Tissue-Tissue Network

We built a tissue-tissue network for each patient, as a function of the expression level cutoff. Nodes are tissues, and link values are given by Eq. 2. Figure 3 shows an example of this network for a patient from the non-metastatic outcome group and a patient from the metastatic outcome group. Figure 3a shows the values of the links in the tissue-tissue network, before scaling by the expression data from the breast cancer patients, i.e. $l^{k}_{i,j}$ from Eq. (2) with $P_{k,\alpha} \equiv 1$. Figure 3b shows the values of the links in the tissue-tissue network for a patient in the non-metastatic group, i.e. $l^{k}_{i,j}$ from Eq. (2) where $k$ is in the non-metastatic group. Figure 3c shows the values of the links in the tissue-tissue network for a patient in the metastatic group, i.e. $l^{k}_{i,j}$ from Eq. (2) where $k$ is in the metastatic group. Figure 4a shows the amount of hierarchy in the tissue-tissue network with the expression level cutoff ranging from 0.1 to 0.9 for the metastatic and non-metastatic patient groups. For each patient, we determined the time of cancer recurrence. In typical cancer analysis, recurrence within 5 years of surgery indicates ‘non-cure.’ More rapid relapse times are interpreted as more aggressive cancer recurrence. Figure 4b shows the relationship between CCC and relapse time.

Figure 3.

The tissue networks with nodes as tissues and links calculated from gene expression values. a) Link value from tissue-tissue database. b) Link value weighted by the gene expression data from a patient from the non-metastatic outcome group. c) Link value weighted by the gene expression data from a patient from the metastatic outcome group. Here only the genes with expression values in the highest $t_c = 0.1$ fraction are used.

Figure 4.

a) The average normalized CCC calculated from the tissue-tissue network for patients in metastatic (dashed) and non-metastatic groups (solid). The CCC for metastatic patients is below that for non-metastatic patients. The error bars are one standard error. The p-value for the Student's t-test is 0.0295 for the $t_c = 0.1$ data point and less than 0.05 for all three $t_c$ = 0.1–0.3 data points. The inset shows the distribution of expression levels within the tissue-tissue network. We use the highest $t_c$ fraction of these data in our analysis. b) The correlation between CCC and relapse time in the metastatic patients. Cancer appears as a dedifferentiation on the set of all gene values, and we here observe a correlation between shorter relapse times and lower CCC values.

The structure of the tissue-tissue network is different from that of a random network. As with the gene network, we compared the real tissue network to a randomized network, in which link values are randomly reassigned. The average CCC of the randomized network in the range $t_c$ = 0.1 to 0.9 is 0.864 with standard deviation 0.017. The corresponding average value of the real network from which Fig 4a is derived is 0.955 with standard deviation 0.011. The z-score is, therefore, $Z_{\mathrm{CCC}} = 5.35$, which indicates that there is statistically significantly more hierarchy in the real tissue network than in the randomized network.
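As a check of the arithmetic, substituting the averages reported above into the z-score definition used for the gene network, with the standard deviation of the randomized networks in the denominator, gives

$$Z_{\mathrm{CCC}} = \frac{0.955 - 0.864}{0.017} = \frac{0.091}{0.017} \approx 5.35$$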

4 Discussion

The classical view of cancer is that it is a dedifferentiation of the host. A disruption of the structure of the tissue-tissue network is, indeed, observed in Figure 4. The structure of the network in patients with more aggressive, metastatic cancers is more disrupted than in patients with no metastasis. Furthermore, among the metastatic patients, the structure of the network was more disrupted in the patients with the more aggressive cancers that recurred earlier, as seen in Fig. 4b. Both of these results are consistent with the picture of cancer as a general dedifferentiation of the host tissue network. From the point of view of the host, cancer is a disruption. Structure in the host network, which endows the host with robust functioning, is destroyed by the cancer.

Thus, we expect the values of the CCC for the tissue-tissue network to be lower for patients in the metastatic group. Figure 5a shows the distribution of the normalized CCC for each patient in the non-metastatic or metastatic outcome groups. We use $t_c = 0.1$ because for this value, there is the greatest discrimination between the metastatic and non-metastatic populations in Figure 4a. These distributions, when averaged, give the $t_c = 0.1$ values in Fig 4a. The distribution of the metastatic outcome group is shifted to lower average normalized CCC values. In addition, the width of the metastatic population distribution is slightly larger. Figure 5b shows the probability of metastasis for a patient with a given normalized CCC value, according to the equation

$$p(\text{metastatic}) = \frac{N_{\text{metastatic}}\, f_{\text{metastatic}}}{N_{\text{metastatic}}\, f_{\text{metastatic}} + N_{\text{non-metastatic}}\, f_{\text{non-metastatic}}} \tag{5}$$

Figure 5.

a) The distribution of CCC values for the non-metastatic (solid) and metastatic (dashed) outcome patient populations. We calculated the average CCC and the standard deviation of the CCC in the non-metastatic and metastatic groups. We used the tissue-tissue networks with $t_c = 0.1$. The resulting Gaussian distribution fits to the non-metastatic and metastatic groups are shown. b) The probability of metastasis for a given patient depends on the CCC value for that patient, from Eq. (5). Lower values of CCC are more likely to lead to metastasis. The thin horizontal line illustrates the 37.4% probability of metastasis in the population, i.e. 107 metastatic patients divided by 286 total patients. c) The ROC curve for the prediction that CCC < $x_{\text{cutoff}}$ leads to metastasis, shown only for the 5% of the population with the smallest CCC.

Here $N_{\text{metastatic}} = 107$ and $N_{\text{non-metastatic}} = 179$. The values of $f_{\text{metastatic}}$ and $f_{\text{non-metastatic}}$ are equal to the dashed and solid curves in Fig. 5a, respectively. The quantity $p(\text{metastatic})$ is a biomarker. The biomarker is highly discriminating for low values of the CCC, although only a small fraction of the patients have such low values of the CCC. For example, 5% of the patients have values below $\mathrm{CCC}_{\mathrm{norm}} = 0.634$, the value at which $p(\text{metastatic}) = 0.5$.
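Eq. (5) can be evaluated with Gaussian fits to the two distributions, as sketched below. The group sizes are those given in the text; the function name `p_metastatic` and the means and standard deviations in the usage example are placeholder values chosen for illustration only, not the fitted parameters of this study.

```python
import numpy as np
from scipy.stats import norm

N_MET, N_NON = 107, 179  # group sizes from the text

def p_metastatic(x, mu_met, sd_met, mu_non, sd_non):
    """Probability of metastasis at normalized CCC value x, Eq. (5)."""
    f_met = norm.pdf(x, loc=mu_met, scale=sd_met)  # metastatic density
    f_non = norm.pdf(x, loc=mu_non, scale=sd_non)  # non-metastatic density
    return N_MET * f_met / (N_MET * f_met + N_NON * f_non)

# Placeholder fit parameters (hypothetical): the metastatic distribution
# is shifted to lower values and is slightly wider.
p_low = p_metastatic(0.60, mu_met=0.70, sd_met=0.05, mu_non=0.72, sd_non=0.04)
# With identical distributions the result reduces to the population rate.
p_base = p_metastatic(0.70, mu_met=0.70, sd_met=0.05, mu_non=0.70, sd_non=0.05)
print(p_low, p_base)
```

When the two densities coincide, the expression reduces to 107/286, the population probability of metastasis; a patient in the low tail of a shifted and wider metastatic distribution has a higher probability.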

We note that the standard BRCA-1 and BRCA-2 biomarkers for cancer apply to roughly 5–10% of women [29, 30]. Thus, the biomarker in Eq. 5, which applies to only 5% of the patients, is perhaps of more significance than it may initially appear. Female subjects with BRCA-1 biomarkers have a cumulative lifetime risk of breast cancer in the range of 50–80%, versus a background risk of 12.5% [29, 30]. The predictive power in Fig. 5b, also > 50% for the 5% of the population with CCC < CCC*, is, therefore, perhaps also of greater significance than one might initially think. The CCC may be combined with other genetic biomarkers to achieve increased predictive power [4-12].

In Fig. 4 we find that highly expressed genes contribute more to the distinction between structures of tissue-tissue networks in patients with metastatic and non-metastatic outcomes. That is, highly expressed genes may have more impact on cancer outcome than lowly expressed genes. We do find, for example, that the average expression level of cancer-associated genes is 15.25% higher than the average expression level of all genes.

One view of cancer is that it is an activation of cancer-specific pathways, perhaps hijacked atavistic host pathways [31]. Therefore, by examining cancer-specific pathways, we should see the development of structure in cancer patients. Figure 2 shows that the structure of networks of cancer-associated genes is greater in patients with aggressive, metastatic cancers than in patients with non-metastatic cancers. Cancer is an activation on the network of cancer associated genes. Conversely, because cancer is a dedifferentiation on the network of all genes, the CCC is lower for the entire gene network in the more aggressive tumors of metastatic patients in Figure 4. As discussed in section 3.1, the CCC of the gene network for randomly chosen genes is lower in the metastatic patients than in non-metastatic patients. The results show that the network structure of cancer-associated genes is correlated with the clinical outcome. That is, metastatic tumors dedifferentiate the structure of most genes, but build up the structure of cancer associated gene networks. Recalculating the tissue-tissue network, Eq. (2), using only the cancer associated genes confirms this result. Eq. (2) is used, but only for $\alpha$ within the 6 cancer associated genes from [22] that are also present in the dataset of [24]. As expected, this calculation shows that the trend in Figure 4 is reversed and matches that of Figure 2: the CCC of the tissue-tissue network constructed from only cancer associated genes is higher for the metastatic group than for the non-metastatic group, with p-value 0.076.

The CCC provides a new perspective in studying the structure of cancer networks. A higher CCC indicates a more hierarchical network, indicative of increased structure. This increased structure often allows for greater evolvability and is often induced by environmental stress [15]. Networks of cancer associated genes in metastatic patients are more hierarchical than in non-metastatic patients.

5 Conclusion

We have defined a measure of hierarchy in cancer networks. We found a correlation between the CCC and the clinical outcome. In our study, the CCC of the cancer-associated gene network was higher for the metastatic outcome group than for the non-metastatic outcome group. We anticipated this result, partly because physics of evolution in changing environments [15, 32] suggests that increased hierarchical structure helps cancer to better adapt to the changing environments encountered in metastasis and to overcome the natural barriers to metastasis in the body.

We find highly expressed genes play a particularly important role in predicting the metastasis of breast cancer. We found that disruption of the tissue-tissue network is correlated with both metastatic potential and shorter time of recurrence. Cancer is a complex disease involving genetic, epigenetic, and environmental perturbations. Furthermore, cancer operates within and between tissues. Our study of the tissue-tissue network provides additional insights and a possible additional biomarker for breast cancer metastasis and recurrence.

Acknowledgments

This work was supported in part by the US National Institutes of Health under grant 1 R01 GM 100468–01.

References

  • [1]. Balkwill PA, Liu O, Shattuck-Eidens D, Cochran C, Harshman K, Tavtigian S, Bennett LM, Haugen-Strano A, Swensen J, Miki Y. BRCA1 mutations in primary breast and ovarian carcinomas. Science. 1994;266:120–122. doi: 10.1126/science.7939630.
  • [2]. Miki Y, Swensen J, Shattuck-Eidens D, Futreal PA, Harshman K, Tavtigian S, Liu Q, Cochran C, Bennett LM, Ding W, Bell R, Rosenthal J, Hussey C, Tran T, McClure M, Frye C, Hattier T, Phelps R, Haugen-Strano A, Katcher H, Yakumo K, Gholami Z, Shaffer D, Stone S, Bayer S, Wray C, Bogden R. A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science. 1994;266:66–71. doi: 10.1126/science.7545954.
  • [3]. Wooster R, Bignell G, Lancaster J, Swift S, Seal S, Mangion J, Collins N, Gregory S, Gumbs C, Micklem G. Identification of the breast cancer susceptibility gene BRCA2. Nature. 1995;378:21–28. doi: 10.1038/378789a0.
  • [4]. Taylor IW, Linding R, Warde-Farley D, Liu Y, Pesquita C, Faria D, Bull S, Pawson T, Morris Q, Wrana JL. Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nature Biotechnology. 2009 Feb;27:199–204. doi: 10.1038/nbt.1522.
  • [5]. Chuang HY, Lee E, Liu Y-T, Lee D, Ideker T. Network-based classification of breast cancer metastasis. Molecular Systems Biology. 2007 Jan;3:140. doi: 10.1038/msb4100180.
  • [6]. Pavlidis P, Lewis DP, Noble WS. Exploring gene expression data with class scores. Pac. Symp. Biocomput. 2002:474–485.
  • [7]. Doniger SW, Salomonis N, Dahlquist KD, Vranizan K, Lawlor SC, Conklin BR. MAPPFinder: using gene ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biol. 2003;4:R7. doi: 10.1186/gb-2003-4-1-r7.
  • [8]. Draghici S, Khatri P, Martins RP, Ostermeier GC, Krawetz SA. Global functional profiling of gene expression. Genomics. 2003;81:98–104. doi: 10.1016/s0888-7543(02)00021-6.
  • [9]. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102.
  • [10]. Tian L, Greenberg SA, Kong SW, Altschuler J, Kohane IS, Park PJ. Discovering statistically significant pathways in expression profiling studies. Proc. Natl. Acad. Sci. USA. 2005;102:13544–13549. doi: 10.1073/pnas.0506577102.
  • [11]. Wei Z, Li H. A Markov random field model for network-based analysis of genomic data. Bioinformatics. 2007;23:1537–1544. doi: 10.1093/bioinformatics/btm129.
  • [12]. Rapaport F, Zinovyev A, Dutreix M, Barillot E, Vert JP. Classification of microarray data using gene networks. BMC Bioinformatics. 2007;8:35. doi: 10.1186/1471-2105-8-35.
  • [13]. Mehlen P, Puisieux A. Metastasis: a question of life or death. Nature Reviews Cancer. 2006;6:449–458. doi: 10.1038/nrc1886.
  • [14]. DeMatteo RP, Lewis JJ, Leung D, Mudan SS, Woodruff JM, Brennan MF. Two hundred gastrointestinal stromal tumors: recurrence patterns and prognostic factors for survival. Annals of Surgery. 2000;231:51. doi: 10.1097/00000658-200001000-00008.
  • [15]. Sun J, Deem MW. Spontaneous emergence of modularity in a model of evolving individuals. Phys. Rev. Lett. 2007;99:228107. doi: 10.1103/PhysRevLett.99.228107.
  • [16]. Lorenz D, Jeng A, Deem MW. Modularity in biological system. Physics of Life Reviews. 2011;8:129–160. doi: 10.1016/j.plrev.2011.02.003.
  • [17]. Deem MW. Statistical mechanics of modularity and horizontal gene transfer. Annu. Rev. Conden. Matter Phys. 2013;4:4.1–4.25.
  • [18]. Carro MS, Lim WK, Alvarez MJ, Bollo RJ, Zhao X, Evan YS, Erik PS, Sandrine LA, Fiona D, Howard C, Anna L, Ken A, Andrea C. The transcriptional network for mesenchymal transformation of brain tumours. Nature. 2010 Jan;463:318–25. doi: 10.1038/nature08712.
  • [19]. Ostlund G, Lindskog M, Sonnhammer EL. Network-based Identification of novel cancer genes. Molecular & Cellular Proteomics. 2010;9:648–55. doi: 10.1074/mcp.M900227-MCP200.
  • [20]. Yu K, Ganesan Miller K, Lance DM, Tan P. A modular analysis of breast cancer reveals a novel low-grade molecular signature in estrogen receptor-positive tumors. Clinical Cancer Research. 2006;12:3288–96. doi: 10.1158/1078-0432.CCR-05-1530.
  • [21]. Balkwill F. The significance of cancer cell expression of the chemokine receptor CXCR4. Semin. Cancer. Biol. 2004;14:171–179. doi: 10.1016/j.semcancer.2003.10.003.
  • [22]. Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, Talantov D, Timmermans M, Meijer-van GME, Yu J, Jatkoe T, Berns EM, Atkins D, Foekens JA. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005;365:671–9. doi: 10.1016/S0140-6736(05)17947-1.
  • [23]. [accessed 15 July 2013]; http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2034.
  • [24]. Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. USA. 2004;101:6062–6067. doi: 10.1073/pnas.0400782101.
  • [25]. [accessed 15 July 2013]; http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1133.
  • [26]. Saerens M, Fouss F, Yen L, Dupont P. The principal components analysis of a graph, and its relationships to spectral clustering. Proceedings of the 15th European conference on machine learning (ECML) 2004.
  • [27]. Barnett S. Matrices: Methods and Applications. Oxford University Press; 1990.
  • [28]. Sokal R, Michener C. A statistical method for evaluating systematic relationships. University of Kansas Science Bulletin. 1958;38:1409–1438.
  • [29]. National Cancer Institute SEER Cancer Statistics Review. 1975-2005.
  • [30]. Jardines L, Goyal S, Fisher P, Weitzel J, Royce M, Goldfarb SB. Breast Cancer Overview: Risk Factors, Screening, Genetic Testing, and Prevention. Cancer management: a multidisciplinary approach. 2013.
  • [31]. Davies PCW, Lineweaver CH. Cancer tumors as Metazoa 1.0: tapping genes of ancient ancestors. Physical Biology. 2011;8:015001. doi: 10.1088/1478-3975/8/1/015001.
  • [32]. Kashtan N, Noor E, Alon U. Varying environments can speed up evolution. Proc. Natl. Acad. Sci. USA. 2007;104:13711–13716. doi: 10.1073/pnas.0611630104.
