Abstract
Gene expression profiles have drawn broad attention in deciphering the pathogenesis of human cancers. Cancer-related gene modules could be identified in co-expression networks and be applied to facilitate cancer research and clinical diagnosis. In this paper, a new method was proposed to identify lung cancer-risk modules and evaluate the module-based disease risks of samples. The results showed that thirty one cancer-risk modules were closely related to the lung cancer genes at the functional level and interactional level, indicating that these modules and genes might synergistically lead to the occurrence of lung cancer. Our method was proved to have good robustness by evaluating the disease risk of samples in eight cancer expression profiles (four for lung cancer and four for other cancers), and had better performance than the WGCNA method. This method could provide assistance to the diagnosis and treatment of cancers and a new clue for explaining cancer mechanisms.
Introduction
Cancer is caused by aberration of multiple genes, and thus its pathogenesis is very complex and inconclusive [1], [2], [3]. Cancer-related genes possess diverse functions [4], [5], while genes with similar functions are likely to be co-expressed [6], [7] and located in neighboring areas (known as network modules) [8], [9] in biological networks. The modules reveal the mechanism of multiple genes underlying the disease and evaluate the risk of the disease. Effective identification of cancer risk modules can assist cancer researches [10], [11], [12], [13].
Disease risk of cancer-related modules calculated from a specific biological background can be a significant measure for clinical prediction of cancer diagnosis [14], [15], [16], [17], [18]. Several computational approaches have been developed for the disease risk module analysis, including detection of differentially correlated gene clusters and gene-specific analysis based the co-expression network [19], [20], [21], [22]. For example, weighted gene co-expression network analysis (WGCNA) is a mature technique and identifies gene modules as candidate biomarkers or therapeutic targets based on the co-expression network [23], [24]. WGCNA has been used to study complex diseases, such as metabolic syndrome [25], schizophrenia [26], and heart failure [27]. The expression activities of disease risk modules were (induced or repressed) different among clinical conditions (in tumor progress)[14].
Furthermore, it is feasible to identify cancer risk modules from co-expression networks using network-based methods. The analysis of gene co-expression networks shows that genes within the same modules appear to have similar expression patterns, share common regulatory mechanisms [28], [29], [30], and thus have strong associations with specific biological functions that determine the behaviors or phenotypes of cells [31], [32]. Modules derived from co-expression network were organized into a higher-order structure correlated with clinical characteristics, which provided insights into the underlying biology of glioma [33]. Four modules of ovarian cancer from a co-expression network were distinguished to be significantly associated with biological processes such as cell cycle and DNA replication in Gene Ontology (GO) categories[34]. The co-expression modules associated with T-helper differentiation and TGF-beta pathways improved clinical outcome of hormone-insensitive breast cancers after treatment [35]. Moreover, sample signatures/labels considered in evaluation of cancer-related risk modules would offer a new clue for revealing the mechanisms of diseases [36]. Researches have revealed that it is necessary to explore the relationships between gene functions and disease risks [37], [38]. The co-expression networks taking into account of biological functions would be more robust and authentic [39], [40], and the modules obtained from these networks could better reflect the function information of the diseases.
In this paper, a new method was proposed to identify cancer-risk modules and evaluate the module-based disease risks of samples. A highly-confident co-expression network with functional similarity information was first constructed by using expression profiles in lung cancer, and then candidate modules were identified. The cancer risks of the modules were scored by introducing sample labels, then the significant cancer-risk modules were screened out by randomized trials. Finally, the disease risks of samples were evaluated based on the cancer-risk modules. These modules were expected to provide evidence for disease diagnosis, treatment and clinical analysis in the future. Identification of cancer-risk modules and evaluation of module-based disease risks were performed in the following steps (Figure 1).
Materials and Methods
Materials
Cancer gene expression data were obtained from the Gene Expression Omnibus(GEO, http://www.ncbi.nlm.nih.gov/geo/)[37]. Here, our research was based on the profile GSE7670 [41]in GPL96 including 20,995 genes of 56 samples (28 lung cancer patients and 28 normal controls), for which patients underwent surgery for lung cancer at the Taipei Veterans General Hospital. These expression profiles (GSE10072, GSE21933, GSE27262, GSE40791, GSE14520, GSE15781, GSE20437, GSE26126) (Table 1) with disease and normal samples were used to analyze the robustness of our method and compare with the WGCNA method. Gene function information was obtained from Gene Ontology (GO, http://www.geneontology.org/) [42], updated to May 2011. Protein interaction information (95537 high-confidence interactions between 12359 genes) was downloaded from iRefWeb (http://www.wodaklab.org/iRefWeb/) [43], updated to April 13, 2012 of the 9th version. The information of 1824 protein complexes was obtained from Munich Information Center for Protein Sequences (MIPS, http://mips.helmholtz-muenchen.de/genre/proj/corum, Corum Release February 2012 available).
Table 1. The number of the tumor samples and the normal samples in the expression profiles.
a. The construction of a highly confident co-expression network
A method was introduced to create a highly confident co-expression network by taking both co-expression correlation and functional similarity. This method was performed as follows:
First, the Pearson correlation coefficient [44] r was used to represent the co-expression relationship between every pair of genes and calculated as follows:
where N is the number of samples in an expression profile, xi and yi are the expression levels of genes x and y in the i-th sample.
Second, GO semantic similarity was used to represent the functional similarity between every pair of genes [45].
(1) The similarity score of GO term A was defined as:
where includes term A and all its parent terms; is the weight of edge; and it is 0.8 for ‘is-a’ relationship and 0.6 for ‘part-of’ relationship.
(2) The semantic similarity between term A and term B, , was calculated as follows:
A gene's functions were considered as a set of GO terms in Gene Ontology. Thus, functions of genes G1 and G2 corresponded to GO sets and, m and n are the number of terms in GO1 and GO2 respectively.
(3) The semantic similarity between G1 and G2 was defined as:
The robust gene pairs were retained by the function similarity. Therefore, a highly-confident co-expression network was constructed by analyzing the Pearson correlation coefficient and GO semantic similarity.
b. Differential gene selection based on Bayesian model
A Bayesian model [46], [47] was used to screen the differential genes. Bayesian approaches compare the probability of an association between a gene expression and a disease to the probability given no such association. The formula was as follows:
where n1T, n2T, n2N and n2N are the number of samples (tumor/normal and high/low expression) for one gene (Table 2). B denotes the Beta function, defined by
Table 2. The number of samples (tumor/normal and high/low expression) for one gene.
+ | − | Total | |
T | n1T | n2T | n1T + n2T |
N | n1N | n2N | n1N + n2N |
Total | n1T +n1N | n2T + n2N | n1T + n2T + n2N + n2N |
T represents tumor samples and N for normal ones, and “+” stands for high expression (above-average) and “-”for low expression (below-average). n1T and n2T refers to the number of tumor samples with high expression and low expression, and n1N and n2N for the number of normal ones with high expression and low expression.
BLn is the log value of B.
When BFLn>0, there was relationship between a disease and gene expression; when BFLn<0, no relationship.
A randomized test was designed to calculate the significance of BFLn by stochastically disturbing n1T, n2T, n2N and n2N and retaining stable sum; after 10,000 times, the p-value was the proportion when the random BFLn was larger than the real value. Genes with p<0.05 were selected as differentially expressed genes (DE-genes).
c. Identification of cancer-risk modules
The online module mining tool GraphWeb (http://biit.cs.ut.ee/graphweb/) [48] was chosen to find co-expression modules. GraphWeb is designed to analyze individual or multiple merged networks, search for conserved features across multiple species, mine large biological networks for smaller modules, and compare results of high-throughput datasets. Markov Cluster (MCL) [49] algorithm via the GraphWeb tool was applied to prune the network and to find gene modules. The MCL algorithm simulates a stochastic flow in the expression graph and removes edges that are visited infrequently, resulting in a collection of densely connected groups of genes. The parameter of Markov clustering parameter was set to a default value 1.8.
The candidate modules containing the DE-genes were selected to evaluate the disease risks. Next, Z-test[50] was applied to assess the relationship between individual tumor samples and modules (Figure 2).
Finally, the significant samples with Z-test higher than the significance threshold (α = 0.05) were picked out. To measure the risk of each module, we defined:
Mrisk could be used to assess the disease risk of a candidate module. For each candidate module, 10,000 random modules were constructed by randomly selecting genes from the background gene set with equal numbers of module genes. Then, Mrisk was calculated for each random module, and the proportion of modules with Mrisk larger than the real value (the significance p-value) was computed. Modules with p<0.05 were considered as cancer-risk modules.
d. Evaluation of the sample's disease risk
To evaluate the module-based disease risk of each sample, we defined:
where M includes all cancer-risk modules, N is the number of cancer-risk modules, means the cancer-risk of the sample i about the module j, and p is the significance of Z-test.
Cancer-risk modules were applied to evaluate samples by calculating the module-based disease risk of each sample. Then evaluation performance was estimated by a receiver operating characteristic (ROC) curve.
Results
The highly-confident co-expression network
The Pearson correlation coefficient and the GO semantic similarity of every pair of genes in the expression profile GSE7670 were calculated. After that, curve fitting was applied to analyze the variation trend of average distribution of co-expression value with GO semantic similarity at a 0.05 interval (Figure 3). Functional similarity increased when co-expression level was over the tangency point. Therefore, the pairs of genes with functional similarity over 0.582 and Pearson correlation coefficient over 0.82 (the tangent point) were selected to create the highly-confident co-expression network, which consisted of 9841 nodes and 112,605 edges.
Cancer-risk Modules
A total of 472 DE-genes were screened out by applying BFLn to the expression profile GSE7670. Then 75 candidate disease modules containing DE-genes were obtained through GraphWeb. After the randomized test, 31 lung cancer-risk modules were obtained (Table 3).
Table 3. Lung cancer-risk modules.
Risk | ID | Size | Genes | Mrisk | p-value |
high | M2 | 171 | ZEB1, CAV1, HYAL2, MMP12, CLU, TIMP3, DKK3, LPL, TCF21, FOXF1… | 1 | 0.0043 |
M72 | 9 | ASPM*,BUB1B,CCNB2,CEP55,KPNA2*,MAD2L1,PBK,TPX2,TRIP13 | 1 | 0.0036 | |
M46 | 13 | BARD1,CDT1,DLGAP5*,DONSON*,GINS1,KIF4A*, | 1 | 0.0062 | |
MCM7*,MCM3,MLF1IP*,NDC80,PAQR4,TMEM48,TTK | |||||
M39 | 14 | ADRM1,BYSL,CKS1B,CRABP2,DNAJA3,HAX1,LSM12, | 1 | 0.0067 | |
MPZL1*,MRPL17,MRPS7,NME4,RPN2,SLC2A4RG,STRA13 | |||||
M281 | 3 | CRYAB*,HSPB2*,VGLL3* | 1 | 0.0018 | |
M82 | 9 | ALG3*,EIF2S1,HSPB11,LRRC42,MCTS1,P4HA2,PSMA5,SEC61G,VARS | 1 | 0.004 | |
M61 | 11 | ADAMTS8*,CSRP1*,KCNK3*,LINC00312*,MYH11*, | 1 | 0.0058 | |
MYLK,PDE2A,PKNOX2*,RASL12*,SETBP1,TACC1* | |||||
M266 | 3 | CDCA3,GALNT6,IDH2* | 1 | 0.0017 | |
M340 | 3 | MRPS34,NUBP2,SNRNP25* | 1 | 0.0015 | |
M363 | 3 | DDR1*,FLAD1*,SPINT1 | 1 | 0.002 | |
middle | M62 | 11 | CCNB1,CKAP2,KIF11*,KIF20A*,MCM4,MELK*, | 0.9642 | 0.0187 |
NCAPG,NETO2*,PRC1*,SHCBP1,TOP2A* | |||||
M27 | 17 | CCT6A*,EIF2AK1*,EIF3B,FKBP14*,GART,GINS4,GNL3, | 0.9642 | 0.0304 | |
HEATR2*,KLHL7*,LSM5*,MRPS17*,MRPS33*,PHLDA2, | |||||
POLD2,PPP1R14B*,PSMD2,TMEM106B* | |||||
M268 | 3 | HPRT1*,SCRN1*,TPBG* | 0.9642 | 0.0065 | |
M102 | 8 | AVL9,CDK5,CORO1B,CHPF2*,ITPKA,NDUFS8,PPP1CA,SSH3 | 0.9642 | 0.0172 | |
M63 | 10 | A2M*,CASP1*,CD97*,FABP4*,GAS6*,GMFG*, | 0.9642 | 0.0171 | |
PDLIM2*,PLEKHO2*,RARRES2,TRPV2* | |||||
M54 | 12 | CLDN5*,CRIM1*,DOCK6,FGR*,ICAM2*,INPP1*, | 0.9642 | 0.0223 | |
KANK3*,LIMS2*,LRRC32,PCDH12*,PTGIR*,RASIP1* | |||||
M258 | 4 | FZR1*,CLDN4,LY6E,PRSS8 | 0.9642 | 0.0076 | |
M188 | 5 | BLVRA,KIAA0391,PSMA6,SRP54*,TFPI2 | 0.9642 | 0.0091 | |
M297 | 3 | AHCY*,PKP3,SLC38A1 | 0.9642 | 0.0088 | |
M321 | 3 | GLO1,EGFL7,PDXDC1* | 0.9642 | 0.0056 | |
M180 | 5 | DHTKD1*,MEA1,SLC35A2,TMED3*,TPMT | 0.9642 | 0.0096 | |
M86 | 9 | CDKL2,ENY2,HAND1,LY6D,ORM1,ORM2,RAB25*,S100G,TSTA3* | 0.9642 | 0.0159 | |
M387 | 3 | GALNTL2*,SAR1B*,TSPAN6* | 0.9642 | 0.0062 | |
low | M157 | 5 | DHFR,DTL,GMPS,MYBL2,RFC4* | 0.9285 | 0.0234 |
M241 | 4 | COG8,FAM158A,PDF,PSMB5* | 0.9285 | 0.0207 | |
M249 | 4 | KRT10,NIPSNAP1*,POLDIP2*,SEPHS2 | 0.9285 | 0.0173 | |
M314 | 3 | FAM65A*,GIMAP5*,SEPP1* | 0.9285 | 0.0159 | |
M280 | 3 | GYPC,PTGDS*,RPL15 | 0.9285 | 0.016 | |
M144 | 6 | BCKDK,DECR2,GALE,NDUFB11,PYCR1*,RRNAD1 | 0.9285 | 0.028 | |
M316 | 3 | CTSA,ERGIC3,PAFAH1B3* | 0.8928 | 0.0313 |
Risk is modules category, ID indicate the identifier of cancer-risk modules, size is the module scale, namely the number of genes in the module, genes is the genes in the modules and the genes which were marked * were DE-genes, Mrisk is the cancer risk of modules, p-value is significance p value of random randomized test.
Evaluation of cancer-risk modules
The cancer-risk modules were evaluated at the functional level and interactional level. On one hand, functional enrichment was performed for each lung cancer-risk module using an online tool DAVID (http://david.abcc.ncifcrf.gov/home.jsp) [51], and then significantly enriched GO terms of each module were obtained (More modules are in the Table S1). On the other hand, the interactional relationships of modules were assessed using protein interaction data from iRefWeb. The relationship network of cancer-risk modules and known lung cancer genes was constructed on the basis of functional and interactional relationships (Figure 4). The results showed that lung cancer-risk modules were closely related with the lung cancer genes, which indicated that these modules and genes might synergistically cause lung cancer. For instance, m46 was associated with cell cycle regulation and phosphorylation [52], cell proliferation and cell cycle checkpoint [53], and ATP binding [54] by interacting with known lung cancer genes KRAS, KDR and TP53, respectively. These functions were confirmed to be related to the occurrence of lung cancer. Another module m63 was significantly enriched in functions associated with the cancer, e.g. the response to corticosteroid stimulus, the response to organic substance, and glucocorticoid stimulus and steroid hormone stimulus together by interacting with known lung cancer genes KRAS, NFE2L2 and NKX2, respectively [55], [56], [57].
To further analyze the relationship network, the cancer-risk modules were classified into three types according to the risks: the high, middle and low risk modules (Table 3), and the corresponding degree distributions were calculated (Table 4). The results showed that the high risk modules tend to have high degrees. Namely, they had more connections with other modules and known disease genes at the functional and interactional levels. They played pivotal roles in the network.
Table 4. Average Degree for three types of cancer-risk modules.
Risk | D_W | D_M | D_D | D_P | D_F | D_B |
High | 13.00 | 5.222 | 7.78 | 6.33 | 9.22 | 2.50 |
Middle | 10.67 | 3.75 | 6.92 | 3.50 | 8.58 | 1.42 |
Low | 4.71 | 2.00 | 2.70 | 2.14 | 2.80 | 0.28 |
D_W stands for degree of whole net, D_M for degree only between modules, D_D for degree only considered of modules with disease-causing genes, D_P for degree of the protein interaction edges(purple edges), D_F for degree of function edges(green edges), D_B for degree of both protein interaction and function(red edges).
Evaluation of the module-based disease risk
The lung cancer risk of each sample was evaluated by considering the cancer-risk modules. By measuring the lung cancer risk (Srisk), every sample in GSE7670 was evaluated. It turned out that every sample could be successfully identified as disease (Srisk>0.8) or normal (Srisk<0.8) based on its disease risk (Figure 5).
The robustness of our method
In order to verify the robustness of this method, first, other four expression profiles (GSE10072 from GPL96, the same as GSE7670; GSE27262 and GSE40791 from GPL570; and GSE21933 from GPL6254) about lung cancer and normal were evaluated, respectively (Table 1). The results showed that the module-based disease risks of cancer samples were higher than those of normal ones (Figure 6a). ROC curves were then plotted and the AUC values (>0.97) were used to measure the evaluation performances of the cancer-risk modules which were obtained by our method (Figure 6c). The method had good performance in the expression profiles not only from the same platform, but also from different platforms.
Next, we identified risk modules of liver cancer (GSE14520), colon cancer (GSE15781), breast cancer (GSE20437), and prostate cancer (GSE26126) in the same way, respectively (More cancer-risk modules information in the four cancers are in the Table S3). The cancer-risk modules were used to evaluate the disease risks of the samples, and the corresponding ROC curves were drawn (Figure 7).
Method comparisons
The WGCNA method [24] is a widely used technique to construct gene modules within a network based on gene co-expression relationships. In this paper, the accuracy and robustness of WGCNA and our method were compared. Fifty seven lung cancer risk modules were obtained from GSE7670 using the WGCNA method. The lung cancer risk of every sample in GSE7670 itself was evaluated with the modules. Cancer risks of some cancer samples were smaller than those of normal ones (Figure 8), which indicated the WGCNA method could not completely identified samples as disease or normal as accurately, while our method could (Figure 5).
Then the evaluation of the samples' lung cancer risks was extended to other four expression profiles about lung cancer and normal (Figure 6b). It was found that the cancer risks of cancer samples were not significantly different from those of normal ones. The ROC curves were then used to evaluate the performance of the WGCNA method (Figure 6d). We found that our method had better accuracy and robustness than the WGCNA method (Figure 6).
Discussion
Studying the mechanisms of diseases by analyzing gene expression profiles appears to be a convenient and effective way. Considering the functional similarity could better reflect the function information of the disease. In this paper, a new method was proposed to identify thirty one cancer-risk modules and evaluate the module-based disease risks of samples by using a co-expression network with functional similarity information. Finally, the relationship network of cancer-risk modules and cancer genes was constructed on the functional level and interactional level.
These modules were found to be closely related to cancers in the aspects of functions, interactions, and literature. Our method was proved to be fairly robust by evaluating the disease risks of samples in four lung cancer expression profiles and in four other cancers, and had better performance than the WGCNA method.
Cancer-risk modules and the evaluation of the module-based disease risk from this study were confirmed to be credible with the following considerations. (i) Differentially expressed genes were selected by using the BFLn method, which considered both gene expression and sample label distribution so as to eliminate outliers caused by bias expression of individual gene or experiment errors. (ii) Our gene network was of high confidence, because the method was used to calculate not only the co-expression correlation, but also functional similarities between genes. The gene pairs with both high expression consistency and functional similarity were retained for building the high confident network, which was capable of avoiding biased results merely depending on expression. (iii) The cancer risks of modules were evaluated by using the proportion of significant tumor samples, which could be a new method to evaluate disease modules. The genes in cancer-risk modules could be potential disease genes, and might act as drug targets for the treatment of aggressive cancers. All genes of m46 were related with lung cancer. For instance, MCM7 is a significant subunit of MCM complex, which could be a novel therapeutic target in lung cancer [58]. Another gene BARD1, whose isoforms may be related to tumor initiation and invasive progression, was a more suitable neoteric prognostic marker for non-small-cell lung cancer [59]. KIF4A might hold a promise for the development of anticancer drugs and cancer vaccines as well as a prognostic biomarker in clinic [60]. For the genes in module m63, A2M was in limited and extended lung cancer patients compared to a nonsmoker and smoker control population [61], FABP4 was down-regulated in lung adenocarcinoma [62], and CASP1 affected the single-nucleotide polymorphisms, increasing the cancers risk [63]. (iv) The evaluation of samples' module-based disease risks is accuracy and robustness. Because our method integrated the differentially expressed genes, a co-expression network and functional similarities, the cancer-risk modules were closely related to the pathogenesis of cancer in the aspects of functions and interactions. On the functional level, the cancer-risk modules could reflect the functional classes related to diseases; on the interactional level, the cancer-risk modules could be very high correlated with the disease genes.
Additionally, we investigated the overlap between the cancer-risk modules and the protein compounds (Figure 9). The results of hypergeometric distribution analysis showed that 17 modules had significant overlap with 150 complexes (p<0.05). For example, module m46 shared genes with 24 complexes, among which 19 complexes had an overlap rate higher than 20%. The complex BRCA1_A recruited BRCA1 to DNA damage sites [64]. Partial depletion of Mcm proteins which were typically loaded in excessive number of locations led to cancers and stem cell deficiencies [65]. The expression of ubiquitin E3 ligase was associated with estrogen receptor (ER)-positive status in human breast tumors [66] (More modules and complex information are in the table S2). Our method will be more comprehensive considering protein-protein information to construct an integrated network and developing a module mining algorithm in the future.
In conclusion, this study presented a novel method to evaluate disease risks of samples based on cancer-risk modules and to analyze the relationships between the disease and modules. This method could provide assistance to the diagnosis and treatment of cancers and a new clue for revealing the cancer mechanisms.
Supporting Information
Funding Statement
Funding provided by the National Natural Science Foundation of China (No. 61272388 and No. 31301040); Oversea Scholars Project funded by Education Department of Heilongjiang Province (NO. 1155H012); and the Master Innovation Funds of Heilongjiang Province (No. YJSCX2012-209HLJ and YJSCX2012-224HLJ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Zochbauer-Muller S, Fong KM, Virmani AK, Geradts J, Gazdar AF, et al. (2001) Aberrant promoter methylation of multiple genes in non-small cell lung cancers. Cancer Res 61: 249–255. [PubMed] [Google Scholar]
- 2. Compagno M, Lim WK, Grunn A, Nandula SV, Brahmachary M, et al. (2009) Mutations of multiple genes cause deregulation of NF-kappaB in diffuse large B-cell lymphoma. Nature 459: 717–721. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Santarius T, Shipley J, Brewer D, Stratton MR, Cooper CS (2010) A census of amplified and overexpressed human cancer genes. Nat Rev Cancer 10: 59–64. [DOI] [PubMed] [Google Scholar]
- 4. Okabe H, Satoh S, Kato T, Kitahara O, Yanagawa R, et al. (2001) Genome-wide analysis of gene expression in human hepatocellular carcinomas using cDNA microarray: identification of genes involved in viral carcinogenesis and tumor progression. Cancer Res 61: 2129–2137. [PubMed] [Google Scholar]
- 5. Kettunen E, Anttila S, Seppanen JK, Karjalainen A, Edgren H, et al. (2004) Differentially expressed genes in nonsmall cell lung cancer: expression profiling of cancer-related genes in squamous cell lung cancer. Cancer Genet Cytogenet 149: 98–106. [DOI] [PubMed] [Google Scholar]
- 6. Kreiman G (2004) Identification of sparsely distributed clusters of cis-regulatory elements in sets of co-expressed genes. Nucleic Acids Res 32: 2889–2900. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Li Y, St John MA, Zhou X, Kim Y, Sinha U, et al. (2004) Salivary transcriptome diagnostics for oral cancer detection. Clin Cancer Res 10: 8442–8450. [DOI] [PubMed] [Google Scholar]
- 8. Ogata Y, Suzuki H, Sakurai N, Shibata D (2010) CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics 26: 1267–1268. [DOI] [PubMed] [Google Scholar]
- 9. Shakhnovich BE, Reddy TE, Galinsky K, Mellor J, Delisi C (2004) Comparisons of predicted genetic modules: identification of co-expressed genes through module gene flow. Genome Inform 15: 221–228. [PubMed] [Google Scholar]
- 10. Dinu I, Potter JD, Mueller T, Liu Q, Adewale AJ, et al. (2009) Gene-set analysis and reduction. Brief Bioinform 10: 24–34. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Ein-Dor L, Zuk O, Domany E (2006) Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc Natl Acad Sci U S A 103: 5923–5928. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. West M, Ginsburg GS, Huang AT, Nevins JR (2006) Embracing the complexity of genomic data for personalized medicine. Genome Res 16: 559–566. [DOI] [PubMed] [Google Scholar]
- 13. Pittman J, Huang E, Dressman H, Horng CF, Cheng SH, et al. (2004) Integrated modeling of clinical and gene expression information for personalized prediction of disease outcomes. Proc Natl Acad Sci U S A 101: 8431–8436. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. Segal E, Friedman N, Koller D, Regev A (2004) A module map showing conditional activity of expression modules in cancer. Nat Genet 36: 1090–1098. [DOI] [PubMed] [Google Scholar]
- 15. Hollen PJ, Gralla RJ (1996) Comparison of instruments for measuring quality of life in patients with lung cancer. Semin Oncol 23: 31–40. [PubMed] [Google Scholar]
- 16. Segal E, Friedman N, Kaminski N, Regev A, Koller D (2005) From signatures to models: understanding cancer using microarrays. Nat Genet 37 Suppl: S38–45. [DOI] [PubMed] [Google Scholar]
- 17. Aaronson NK, Ahmedzai S, Bergman B, Bullinger M, Cull A, et al. (1993) The European Organization for Research and Treatment of Cancer QLQ-C30: a quality-of-life instrument for use in international clinical trials in oncology. J Natl Cancer Inst 85: 365–376. [DOI] [PubMed] [Google Scholar]
- 18. Desmedt C, Haibe-Kains B, Wirapati P, Buyse M, Larsimont D, et al. (2008) Biological processes associated with breast cancer clinical outcome depend on the molecular subtypes. Clin Cancer Res 14: 5158–5165. [DOI] [PubMed] [Google Scholar]
- 19. Cho SB, Kim J, Kim JH (2009) Identifying set-wise differential co-expression in gene expression microarray data. BMC Bioinformatics 10: 109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Watson M (2006) CoXpress: differential co-expression in gene expression data. BMC Bioinformatics 7: 509. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Fang G, Kuang R, Pandey G, Steinbach M, Myers CL, et al.. (2010) Subspace differential coexpression analysis: problem definition and a general approach. Pac Symp Biocomput: 145–156. [PubMed]
- 22. Tesson BM, Breitling R, Jansen RC (2010) DiffCoEx: a simple and sensitive method to find differentially coexpressed gene modules. BMC Bioinformatics 11: 497. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23. Zhang B, Horvath S (2005) A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol 4: Article17. [DOI] [PubMed] [Google Scholar]
- 24. Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25. Min JL, Nicholson G, Halgrimsdottir I, Almstrup K, Petri A, et al. (2012) Coexpression network analysis in abdominal and gluteal adipose tissue reveals regulatory genetic loci for metabolic syndrome and related phenotypes. PLoS Genet 8: e1002505. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26. de Jong S, Boks MP, Fuller TF, Strengman E, Janson E, et al. (2012) A gene co-expression network in whole blood of schizophrenia patients is independent of antipsychotic-use and enriched for brain-expressed genes. PLoS One 7: e39498. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27. Dewey FE, Perez MV, Wheeler MT, Watt C, Spin J, et al. (2011) Gene coexpression network topology of cardiac development, hypertrophy, and failure. Circ Cardiovasc Genet 4: 26–35. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28. Aoki K, Ogata Y, Shibata D (2007) Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol 48: 381–390. [DOI] [PubMed] [Google Scholar]
- 29. Carter SL, Brechbuhler CM, Griffin M, Bond AT (2004) Gene co-expression network topology provides a framework for molecular characterization of cellular state. Bioinformatics 20: 2242–2250. [DOI] [PubMed] [Google Scholar]
- 30. Carlson MR, Zhang B, Fang Z, Mischel PS, Horvath S, et al. (2006) Gene connectivity, function, and sequence conservation: predictions from modular yeast co-expression networks. BMC Genomics 7: 40. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31. Purmann A, Toedling J, Schueler M, Carninci P, Lehrach H, et al. (2007) Genomic organization of transcriptomes in mammals: Coregulation and cofunctionality. Genomics 89: 580–587. [DOI] [PubMed] [Google Scholar]
- 32. Michalak P (2008) Coexpression, coregulation, and cofunctionality of neighboring genes in eukaryotic genomes. Genomics 91: 243–248. [DOI] [PubMed] [Google Scholar]
- 33. Ivliev AE, t Hoen PA, Sergeeva MG (2010) Coexpression network analysis identifies transcriptional modules related to proastrocytic differentiation and sprouty signaling in glioma. Cancer Res 70: 10060–10070. [DOI] [PubMed] [Google Scholar]
- 34. Hong S, Dong H, Jin L, Xiong M (2011) Gene co-expression network and functional module analysis of ovarian cancer. Int J Comput Biol Drug Des 4: 147–164. [DOI] [PubMed] [Google Scholar]
- 35. Teschendorff AE, Gomez S, Arenas A, El-Ashry D, Schmidt M, et al. (2010) Improved prognostic classification of breast cancer defined by antagonistic activation patterns of immune response pathway modules. BMC Cancer 10: 604. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36. Singh D, Febbo PG, Ross K, Jackson DG, Manola J, et al. (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1: 203–209. [DOI] [PubMed] [Google Scholar]
- 37. Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, et al. (2009) NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res 37: D885–890. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102: 15545–15550. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39. Reverter A, Chan EK (2008) Combining partial correlation and an information theory approach to the reversed engineering of gene co-expression networks. Bioinformatics 24: 2491–2497. [DOI] [PubMed] [Google Scholar]
- 40. Shi Z, Derow CK, Zhang B (2010) Co-expression module analysis reveals biological processes, genomic gain, and regulatory mechanisms associated with breast cancer progression. BMC Syst Biol 4: 74. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41. Su LJ, Chang CW, Wu YC, Chen KC, Lin CJ, et al. (2007) Selection of DDX5 as a novel internal control for Q-RT-PCR from microarray data using a block bootstrap re-sampling scheme. BMC Genomics 8: 140. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25: 25–29. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43. Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, et al. (2009) Human Protein Reference Database—2009 update. Nucleic Acids Res 37: D767–772. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44. Mao L, Van Hemert JL, Dash S, Dickerson JA (2009) Arabidopsis gene co-expression network and its functional modules. BMC Bioinformatics 10: 346. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45. Wang JZ, Du Z, Payattakool R, Yu PS, Chen CF (2007) A new method to measure the semantic similarity of GO terms. Bioinformatics 23: 1274–1281. [DOI] [PubMed] [Google Scholar]
- 46. Balding DJ (2006) A tutorial on statistical methods for population association studies. Nat Rev Genet 7: 781–791. [DOI] [PubMed] [Google Scholar]
- 47. Zhang L, Li W, Song L, Chen L (2010) A towards-multidimensional screening approach to predict candidate genes of rheumatoid arthritis based on SNP, structural and functional annotations. BMC Med Genomics 3: 38. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48. Reimand J, Tooming L, Peterson H, Adler P, Vilo J (2008) GraphWeb: mining heterogeneous biological networks for gene modules with functional significance. Nucleic Acids Res 36: W452–459. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49. Samuel Lattimore B, van Dongen S, Crabbe MJ (2005) GeneMCL in microarray analysis. Comput Biol Chem 29: 354–359. [DOI] [PubMed] [Google Scholar]
- 50. Zaykin DV (2011) Optimally weighted Z-test is a powerful method for combining probabilities in meta-analysis. J Evol Biol 24: 1836–1841. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51. Huang da W, Sherman BT, Lempicki RA (2009) Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 37: 1–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52. Sunaga N, Shames DS, Girard L, Peyton M, Larsen JE, et al. (2011) Knockdown of oncogenic KRAS in non-small cell lung cancers suppresses tumor growth and sensitizes tumor cells to targeted therapy. Mol Cancer Ther 10: 336–346. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53. Roth JA, Nguyen D, Lawrence DD, Kemp BL, Carrasco CH, et al. (1996) Retrovirus-mediated wild-type p53 gene transfer to tumors of patients with lung cancer. Nat Med 2: 985–991. [DOI] [PubMed] [Google Scholar]
- 54. Kendall RL, Rutledge RZ, Mao X, Tebben AJ, Hungate RW, et al. (1999) Vascular endothelial growth factor receptor KDR tyrosine kinase activity is increased by autophosphorylation of two activation loop tyrosine residues. J Biol Chem 274: 6453–6460. [DOI] [PubMed] [Google Scholar]
- 55. Massarelli E, Varella-Garcia M, Tang X, Xavier AC, Ozburn NC, et al. (2007) KRAS mutation is an important predictor of resistance to therapy with epidermal growth factor receptor tyrosine kinase inhibitors in non-small-cell lung cancer. Clin Cancer Res 13: 2890–2896. [DOI] [PubMed] [Google Scholar]
- 56. Solis LM, Behrens C, Dong W, Suraokar M, Ozburn NC, et al. (2010) Nrf2 and Keap1 abnormalities in non-small cell lung carcinoma and association with clinicopathologic features. Clin Cancer Res 16: 3743–3753. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57. Perner S, Wagner PL, Demichelis F, Mehra R, Lafargue CJ, et al. (2008) EML4-ALK fusion lung cancer: a rare acquired event. Neoplasia 10: 298–302. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58. Croce CM (2009) Causes and consequences of microRNA dysregulation in cancer. Nat Rev Genet 10: 704–714. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59. Zhang YQ, Bianco A, Malkinson AM, Leoni VP, Frau G, et al. (2012) BARD1: an independent predictor of survival in non-small cell lung cancer. Int J Cancer 131: 83–94. [DOI] [PubMed] [Google Scholar]
- 60. Taniwaki M, Takano A, Ishikawa N, Yasui W, Inai K, et al. (2007) Activation of KIF4A as a prognostic biomarker and therapeutic target for lung cancer. Clin Cancer Res 13: 6624–6631. [DOI] [PubMed] [Google Scholar]
- 61. Marchandise FX, Mathieu B, Francis C, Sibille Y (1989) Local increase of antiprotease and neutrophil elastase-alpha 1-proteinase inhibitor complexes in lung cancer. Eur Respir J 2: 623–629. [PubMed] [Google Scholar]
- 62. Wang G, Ye Y, Zheng W, Ma W (2010) [Identification of candidate genes for lung adenocarcinoma using Toppgene]. Zhongguo Fei Ai Za Zhi 13: 282–286. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63. Dong LM, Brennan P, Karami S, Hung RJ, Menashe I, et al. (2009) An analysis of growth, differentiation and apoptosis genes with risk of renal cancer. PLoS One 4: e4895. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64. Wang B, Hurov K, Hofmann K, Elledge SJ (2009) NBA1, a new player in the Brca1 A complex, is required for DNA damage resistance and checkpoint control. Genes Dev 23: 729–739. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65. Rusiniak ME, Kunnev D, Freeland A, Cady GK, Pruitt SC (2012) Mcm2 deficiency results in short deletions allowing high resolution identification of genes contributing to lymphoblastic lymphoma. Oncogene 31: 4034–4044. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66. Kona FR, Stark K, Bisoski L, Buac D, Cui Q, et al. (2012) Transcriptional activation of breast cancer-associated gene 2 by estrogen receptor. Breast Cancer Res Treat 135: 495–503. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.