Version Changes
Revised. Amendments from Version 1
The major changes made in this version, following the reviewers' suggestions, are: (1) added a new paragraph to the “Results” section; (2) added a new figure (Figure 3) to accompany the new paragraph; (3) added a new reference (Reference 8); and (4) changed the title to avoid confusion.
Abstract
High-throughput experiments are routinely performed in modern biological studies. However, extracting meaningful results from massive experimental data sets is a challenging task for biologists. Projecting data onto pathway and network contexts is a powerful way to unravel patterns embedded in seemingly scattered large data sets and assist knowledge discovery related to cancer and other complex diseases. We have developed a Cytoscape app called “ReactomeFIViz”, which utilizes a highly reliable gene functional interaction network combined with human curated pathways derived from Reactome and other pathway databases. This app provides a suite of features to assist biologists in performing pathway- and network-based data analysis in a biologically intuitive and user-friendly way. Biologists can use this app to uncover network and pathway patterns related to their studies, search for gene signatures from gene expression data sets, reveal pathways significantly enriched by genes in a list, and integrate multiple genomic data types into a pathway context using probabilistic graphical models. We believe our app will give researchers substantial power to analyze intrinsically noisy high-throughput experimental data to find biologically relevant information.
Introduction
High-throughput experiments, which generate large and complex data sets, are routinely performed in modern biological and clinical studies to unravel mechanisms underlying complex diseases, such as cancer. However, extracting reliable and meaningful results from these experiments is usually difficult and requires sophisticated computational tools and algorithms, which are challenging for experimental biologists to comprehend. A user-friendly software tool is extremely important for both bench and computational biologists to perform high-throughput data analysis related to cancer and other complex diseases.
Many studies have shown that alterations in pathways or networks are better correlated with complex disease phenotypes than any particular gene or gene product 1, 2. Pathway- and network-based data analysis approaches project information about seemingly unrelated genes and proteins onto pathway and network contexts, and create an integrated view for researchers to understand mechanisms related to phenotypes of interest.
In this paper, we describe a software tool called ReactomeFIViz (also called the Reactome FI Cytoscape app or ReactomeFIPlugIn), which can be used to perform pathway- and network-based analysis of data generated from high-throughput experiments. This tool uses the highly reliable Reactome functional interaction (FI) network 3 for network-based data analysis. The FI network was constructed by merging interactions extracted from human-curated pathways with interactions predicted using a machine learning approach. This tool can also be used to perform pathway-based data analysis using high-quality human-curated pathways in the Reactome database 4, arguably the most comprehensive open source pathway database.
Implementation
Software architecture
We used a conventional three-tier software architecture to implement ReactomeFIViz (Figure 1). The back-end contains several databases hosted in the open-source MySQL database engine (http://www.mysql.com). The middle-tier server-side application uses Hibernate (http://hibernate.org) to access the databases storing FIs and cancer gene index data (see below). The server-side application also uses the in-house developed Reactome API for object/relational mapping to access pathway-related content stored in a database using the Reactome database schema. On the server side, a lightweight application container, Spring Framework (http://projects.spring.io/spring-framework/), and a Java RESTful framework, Jersey (https://jersey.java.net), are used to power a RESTful API for the Cytoscape front-end. The front-end Cytoscape app uses this RESTful API to communicate with the server-side application. Almost all analysis features in the app are provided through this RESTful API, which should also facilitate their use by other front-end applications, such as a web browser or tablet app.
For cancer data analysis, we imported the cancer gene index (CGI, https://wiki.nci.nih.gov/display/cageneindex) data into a MySQL database and then developed a Hibernate-based API for the server-side application. The CGI data contain annotations for cancer-related genes. These annotations were extracted using text-mining technologies and then validated by human curators (https://wiki.nci.nih.gov/display/cageneindex/Creation+of+the+Cancer+Gene+Index).
The Reactome FI network is updated annually. We recommend using the latest version of the FI network. Different versions of the FI network may yield different results due to updates to gene interactions, so we have also deployed two older versions of the FI network to use for comparison of legacy data sets and to reproduce published results.
R (http://www.r-project.org) is used on the server side to execute network module-based survival analysis and other statistical computations. ReactomeFIViz uses Java-based methods on the server side to call functions in R. Users of our app do not need to install R on their machines in order to perform the statistical analyses implemented in the app.
ReactomeFIViz is designed and implemented for Cytoscape 3, and includes all features of the Reactome FI Cytoscape plug-in for Cytoscape 2. We recommend that users install the latest version of our app for Cytoscape 3.
Network analysis features
ReactomeFIViz implements multiple features for users to perform network-based data analysis, including FI sub-network construction 3, network module discovery 3, functional annotation 3, HotNet mutation analysis 5, 6, and network module-based gene signature discovery from microarray data sets 7. The HotNet algorithm 5, 6 was implemented by porting the Python and MATLAB code of HotNet_v1.0.0 (downloaded from http://compbio.cs.brown.edu/projects/hotnet/) to Java and R. For details about the other algorithms and their implementations, please refer to our previous work 3, 7.
The majority of interactions in the Reactome FI network are extracted from reactions and complexes. In order to display the semantic meanings (e.g. catalysis, activation and inhibition) of these interactions, we created a visual style specific to the Reactome FI network. This visual style is registered as a service using the OSGi API supported by Cytoscape 3, and is applied automatically to each newly constructed FI sub-network for network analysis.
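As an illustration of how a reaction might be flattened into pairwise interactions that carry semantic annotations, consider the following minimal Python sketch. The `Reaction` class, the annotation labels, the pairing rules and the placeholder gene names are all hypothetical simplifications chosen for illustration; they are not the extraction procedure used to build the FI network, which is described in reference 3.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Reaction:
    """A simplified reaction; every field name here is hypothetical."""
    inputs: list
    catalysts: list = field(default_factory=list)
    activators: list = field(default_factory=list)
    inhibitors: list = field(default_factory=list)


def reaction_to_fis(reaction):
    """Flatten one reaction into annotated pairwise interactions.

    Returns a dict mapping a sorted gene pair to a set of annotation labels.
    """
    fis = {}

    def add(a, b, label):
        # Skip self-pairs and store each pair in a canonical (sorted) order.
        if a != b:
            fis.setdefault(tuple(sorted((a, b))), set()).add(label)

    # Assumed rule: each regulator is paired with each input, labelled by role.
    for role, regulators in (("catalyze", reaction.catalysts),
                             ("activate", reaction.activators),
                             ("inhibit", reaction.inhibitors)):
        for regulator in regulators:
            for target in reaction.inputs:
                add(regulator, target, role)

    # Assumed rule: inputs of the same reaction are paired with a generic label.
    for a, b in combinations(reaction.inputs, 2):
        add(a, b, "input")
    return fis


# Placeholder gene names, not real annotations.
rxn = Reaction(inputs=["A", "B"], catalysts=["K"], inhibitors=["I"])
fis = reaction_to_fis(rxn)
for pair, labels in sorted(fis.items()):
    print(pair, sorted(labels))
```

A visual style could then map each label to an edge decoration, for example a different arrow head for catalysis, activation and inhibition.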
Pathway analysis features
Since version 4.0.0.beta, released in January 2014, ReactomeFIViz allows users to explore a list of high quality, human curated Reactome pathways, visualize Reactome pathways directly in Cytoscape, and perform pathway enrichment analysis on a list of genes based on a binomial test 8. In April 2014, we added a new experimental feature for performing integrated pathway analysis for multiple genomic data types by adapting a factor graph based approach called “PARADIGM” 9 into ReactomeFIViz.
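To make the enrichment calculation concrete, the following Python sketch computes a one-sided binomial p-value for the number of genes from a list that fall in a pathway. The function name, the way the success probability is estimated (pathway size divided by background size) and the toy numbers are assumptions for illustration only; the exact test configuration used by the app is not specified here.

```python
from math import comb


def binomial_enrichment_pvalue(hits, list_size, pathway_size, background_size):
    """One-sided binomial p-value for pathway enrichment.

    Returns P(X >= hits) for X ~ Binomial(list_size, p), where p is the
    assumed chance that a random gene belongs to the pathway.
    """
    p = pathway_size / background_size
    return sum(
        comb(list_size, k) * p**k * (1 - p) ** (list_size - k)
        for k in range(hits, list_size + 1)
    )


# Toy numbers: 5 of 50 submitted genes hit a 100-gene pathway,
# with an assumed background of 10,000 genes.
pvalue = binomial_enrichment_pvalue(hits=5, list_size=50,
                                    pathway_size=100, background_size=10000)
print(f"enrichment p-value: {pvalue:.3g}")
```

In practice the p-values for all tested pathways would also need a multiple-testing correction, which is omitted from this sketch.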
The Reactome database contains several hundred manually laid-out pathway diagrams 4. Pathway diagrams in Reactome are drawn based on biochemical reactions. A reaction usually contains multiple inputs and outputs, in addition to catalysts, inhibitors and activators. The network model in Cytoscape, however, is designed to support simple graphs containing edges between two nodes only, and therefore cannot represent such multi-participant reactions directly. In order to display Reactome pathway diagrams, we adapted the pathway diagram view in the Reactome curator tool 4 to the Cytoscape environment, and wrapped it in a JInternalFrame so that a pathway view can be displayed alongside a network view in the Cytoscape desktop (Figure 2).
Results
ReactomeFIViz provides a suite of features to help users perform pathway- and network-based data analyses (Figure 3). Based on a list of genes loaded from a file, the user can construct a sub-network, perform network clustering to search for network modules related to patient clinical or other phenotypic information, annotate network modules, perform pathway enrichment analysis, and even model pathway activities based on probabilistic graphical models 9. By performing pathway- and network-based analyses using ReactomeFIViz, researchers will be able to uncover pathway and network patterns related to their studies and then link these patterns to clinical phenotypes 3, 7.
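The first step of this workflow, constructing a sub-network from a gene list, can be sketched in a few lines of Python. This is only a minimal sketch that keeps the FI edges whose two endpoints are both in the list; the function names, the one-gene-per-line input format and the toy network are assumptions for illustration, and any additional options offered by the app are not modeled.

```python
def parse_gene_list(text):
    """Parse one gene symbol per line, skipping blank lines."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def build_subnetwork(fi_edges, genes):
    """Keep only the FI edges whose two endpoints are both in the gene set."""
    return [(a, b) for a, b in fi_edges if a in genes and b in genes]


# Toy FI network and gene list; the gene names are placeholders.
fi_network = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G1", "G4")]
genes = parse_gene_list("G1\nG2\n\nG4\n")
subnetwork = build_subnetwork(fi_network, genes)
print(subnetwork)
```

The resulting edge list is the input on which network clustering and module annotation would then operate.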
As an example, we present results generated from network module-based analysis of the TCGA ovarian cancer mutation data 10 using ReactomeFIViz. (The TCGA mutation data file and clinical information file were downloaded from the Broad Institute Firehose web site, https://confluence.broadinstitute.org/display/GDAC, released in July 2012. The clinical information has been pre-processed.) For this data set, we chose the 2009 version of the FI network, and picked genes mutated in three or more samples to construct an FI sub-network. We performed network clustering, followed by survival analysis for each network module by splitting samples into two groups: samples having genes mutated in the module (Group 1) and samples not having genes mutated in the module (Group 0). For module 3, our results indicate that Group 1 samples (Figure 4, green line in the Kaplan-Meier plot 11) have significantly longer overall survival times than Group 0 samples (Figure 4, red line in the Kaplan-Meier plot) (p-value = 3.4 × 10⁻⁵ based on the CoxPH analysis 12). Pathway enrichment analysis results indicate that module 3 is enriched in genes from the calcium signaling pathway (http://www.genome.jp/kegg/pathway/hsa/hsa04020.html) and mitotic G2/M transition (http://www.reactome.org/cgi-bin/control_panel_st_id?ST_ID=REACT_2203.2). These results suggest that mutations impacting calcium signaling and the cell cycle may be associated with longer survival of ovarian cancer patients. However, more samples and independent data sets may be needed to validate this conclusion.
Using the same version of ReactomeFIViz but different versions of the FI network may yield different results because of updates to protein interactions in the FI network. We performed the same analysis with the latest version of the FI network (the 2013 version), and found that genes in module 3 from the 2009 version of the FI network have been split among several modules discovered using the newer version of the FI network. The module having the most significant overlap with module 3 from the 2009 version of the FI network also has the most significant p-value from the survival analysis (p-value = 1.1 × 10⁻³ from CoxPH), which implies that our method is fairly robust against updates of the FI network. For details, see the supplementary results.
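The gene filtering and sample grouping steps of this example can be sketched as follows in Python, together with a plain Kaplan-Meier estimator of the kind used to draw the survival curves. This is a minimal sketch under stated assumptions: the function names, the data layout (a mapping from sample to mutated genes) and the toy values are hypothetical, and the CoxPH test itself, which the app runs in R on the server side, is not reproduced.

```python
from collections import defaultdict


def frequently_mutated_genes(sample_to_genes, min_samples=3):
    """Return genes mutated in at least `min_samples` samples."""
    counts = defaultdict(int)
    for genes in sample_to_genes.values():
        for gene in set(genes):
            counts[gene] += 1
    return {gene for gene, n in counts.items() if n >= min_samples}


def split_by_module(sample_to_genes, module_genes):
    """Group 1: samples with a mutated module gene; Group 0: all others."""
    module = set(module_genes)
    groups = {0: [], 1: []}
    for sample, genes in sample_to_genes.items():
        groups[1 if module & set(genes) else 0].append(sample)
    return groups


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed event time.

    events[i] is True for an observed death and False for a censored sample.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored samples both leave the risk set
    return curve


# Toy data with placeholder sample and gene names.
mutations = {"S1": ["G1", "G2"], "S2": ["G1"], "S3": ["G1", "G3"], "S4": ["G3"]}
frequent = frequently_mutated_genes(mutations, min_samples=3)
groups = split_by_module(mutations, frequent)
curve = kaplan_meier(times=[5, 8, 8, 12], events=[True, True, False, True])
print(frequent, groups, curve)
```

One curve would be computed per group, and the difference between the two groups would then be assessed with a survival model such as CoxPH.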
Discussion
Our Cytoscape app provides a suite of features for users to perform network- and pathway-based analysis of data generated from multiple experiments related to cancer and other complex diseases. Users can use our tool to search for disease-related network and pathway patterns. Our tool is built upon the Reactome database, arguably the most comprehensive human-curated open source pathway database, and leverages the highly reliable functional interaction network extracted from human-curated pathways. Studies based on the FI network and this app have demonstrated its applicability to cancer and other disease studies 13–16.
For future development, we will focus on using probabilistic graphical models, such as factor graphs, for performing pathway modeling and linking results to patient clinical information in order to uncover cellular mechanisms related to cancer drug sensitivity, search for cancer biomarkers, and assist new drug development.
Data availability
Data files used in the example: http://reactomews.oicr.on.ca:8080/caBigR3WebApp/TCGA_OV_Firehose_MAF_CLIN_2012.zip.
Use the detailed procedures described in our user guide to reproduce the results described in the example: http://wiki.reactome.org/index.php/Reactome_FI_Cytoscape_Plugin.
Software availability
Homepage: http://wiki.reactome.org/index.php/Reactome_FI_Cytoscape_Plugin
Cytoscape app: http://apps.cytoscape.org/apps/reactomefiplugin
Latest source code: https://github.com/reactome-fi/CytoscapePlugIn
Source code as at the time of publication: https://github.com/F1000Research/CytoscapePlugIn
Archived source code as at the time of publication: http://www.dx.doi.org/10.5281/zenodo.10385 17
License: the Creative Commons Attribution 3.0 Unported License (http://www.reactome.org/?page_id=362).
Acknowledgements
We would like to thank Irina Kalatskaya, Christina Yung and other members of Dr. Lincoln Stein’s group for software testing and extensive feedback. We also thank Peter D’Eustachio and Irina Kalatskaya for reading and editing the manuscript. We are indebted to Alexander Pico and William Longabaugh from the Cytoscape core development team for reviewing the manuscript and providing many suggestions.
Funding Statement
This project is supported by an NIH grant (2U41HG003751-05) to LS and a Genome Canada grant (OGI 5458) to LS.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Supplementary results
In this supplementary document, we describe analysis results for the example data set using the 2013 version of the FI network. Figure S1 shows the two modules, module 4 and module 11, having the smallest p-values from survival analyses based on the CoxPH model. Figure S2 shows the Kaplan-Meier plots for modules 4 and 11. Overlap analysis (Table S1) indicated that 52 genes in module 3 from the 2009 version of the FI network have been spread into module 4 (19 genes, p-value = 5.5 × 10⁻¹³ based on the hypergeometric test), module 3 (21 genes, p-value = 5.8 × 10⁻⁸), module 26 (2 genes, p-value = 0.01), module 11 (5 genes, p-value = 0.02), module 1 (4 genes, p-value = 1.0), and module 0 (1 gene, p-value = 1.0). It is interesting to see that module 4 has the most significant overlap with module 3 from the 2009 version of the FI network, and also has the most significant p-value from the survival analyses (p-value = 1.1 × 10⁻³ from CoxPH), which implies that our method is fairly robust against updates of the FI network.
Table S1. Distribution of module 3 genes from the 2009 version of the FI network into modules from the 2013 version of the FI network.
Module (2013 FI network) | Module size (genes) | Genes shared with module 3 (2009) | P-value (hypergeometric test) |
---|---|---|---|
4 | 30 | 19 | 5.5E-13 |
3 | 60 | 21 | 5.8E-08 |
26 | 2 | 2 | 1.2E-02 |
11 | 16 | 5 | 2.2E-02 |
1 | 97 | 4 | 1.0E+00 |
0 | 110 | 1 | 1.0E+00 |
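For reference, an overlap p-value of the kind reported in Table S1 can be computed with a one-sided hypergeometric test, as in the Python sketch below. The function name and argument names are assumptions for illustration, and because the size of the gene universe used for Table S1 is not stated here, the example uses toy numbers rather than attempting to reproduce the table.

```python
from math import comb


def hypergeom_overlap_pvalue(shared, module_size, reference_size, universe_size):
    """One-sided hypergeometric p-value for the overlap of two gene sets.

    Returns P(X >= shared), where X is the number of reference genes found
    when `module_size` genes are drawn without replacement from a universe of
    `universe_size` genes that contains `reference_size` reference genes.
    """
    upper = min(module_size, reference_size)
    favourable = sum(
        comb(reference_size, k)
        * comb(universe_size - reference_size, module_size - k)
        for k in range(shared, upper + 1)
    )
    return favourable / comb(universe_size, module_size)


# Toy numbers: a 3-gene module shares 2 genes with a 4-gene reference set
# in an assumed universe of 10 genes.
pvalue = hypergeom_overlap_pvalue(shared=2, module_size=3,
                                  reference_size=4, universe_size=10)
print(f"overlap p-value: {pvalue:.4f}")
```

Exact integer arithmetic is used for the binomial coefficients, so the sketch remains accurate for very small p-values.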
References
- 1. Barabási AL, Gulbahce N, Loscalzo J: Network medicine: a network-based approach to human disease. Nat Rev Genet. 2011;12(1):56–68. doi:10.1038/nrg2918
- 2. Vidal M, Cusick ME, Barabási AL: Interactome networks and human disease. Cell. 2011;144(6):986–998. doi:10.1016/j.cell.2011.02.016
- 3. Wu G, Feng X, Stein L: A human functional protein interaction network and its application to cancer data analysis. Genome Biol. 2010;11(5):R53. doi:10.1186/gb-2010-11-5-r53
- 4. Croft D, Mundo AF, Haw R, et al.: The Reactome pathway knowledgebase. Nucleic Acids Res. 2014;42(Database issue):D472–7. doi:10.1093/nar/gkt1102
- 5. Vandin F, Upfal E, Raphael BJ: Algorithms for detecting significantly mutated pathways in cancer. J Comput Biol. 2011;18(3):507–22. doi:10.1089/cmb.2010.0265
- 6. Vandin F, Clay P, Upfal E, et al.: Discovery of mutated subnetworks associated with clinical data in cancer. Pac Symp Biocomput. 2012;55–66. doi:10.1142/9789814366496_0006
- 7. Wu G, Stein L: A network module-based method for identifying cancer prognostic signatures. Genome Biol. 2012;13(12):R112. doi:10.1186/gb-2012-13-12-r112
- 8. Huang da W, Sherman BT, Lempicki RA: Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37(1):1–13. doi:10.1093/nar/gkn923
- 9. Vaske CJ, Benz SC, Sanborn JZ, et al.: Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics. 2010;26(12):i237–45. doi:10.1093/bioinformatics/btq182
- 10. Cancer Genome Atlas Research Network: Integrated genomic analyses of ovarian carcinoma. Nature. 2011;474(7363):609–15. doi:10.1038/nature10166
- 11. Kleinbaum DG, Klein M: Survival Analysis: A Self Learning Guide. New York: Springer. 2005.
- 12. Cox D: Regression Models and Life Tables (with Discussion). J R Stat Soc B. 1972;34(2):187–220.
- 13. Shah SP, Roth A, Goya R, et al.: The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature. 2012;486(7403):395–9. doi:10.1038/nature10933
- 14. Sawey ET, Chanrion M, Cai C, et al.: Identification of a therapeutic strategy targeting amplified FGF19 in liver cancer by Oncogenomic screening. Cancer Cell. 2011;19(3):347–58. doi:10.1016/j.ccr.2011.01.040
- 15. Tamborero D, Gonzalez-Perez A, Perez-Llamas C, et al.: Comprehensive identification of mutational cancer driver genes across 12 tumor types. Sci Rep. 2013;3:2650. doi:10.1038/srep02650
- 16. Kalatskaya I, Wu G: Pathway and network based analysis using Functional Interaction network. Genomics II: Bacteria, Viruses and Metabolic Pathways. iConcept. 2013.
- 17. Wu G, Dawson E, Duong A, et al.: Reactome FI Cytoscape plugin. Zenodo. 2014. doi:10.5281/zenodo.10385