Version Changes
Revised. Amendments from Version 1
Incorporated Sandra Orchard's and Pablo Porras's suggestions to add a reference, to clarify that interactions between input list genes or proteins are considered in the network creation and calculations and to elucidate that CHAT can not only be applied to gene lists but also works for protein lists.
Abstract
Highly connected nodes (hubs) in biological networks are topologically important to the structure of the network and have also been shown to be preferentially associated with a range of phenotypes of interest. The relative importance of a hub node, however, can change depending on the biological context. Here, we report a Cytoscape app, the Contextual Hub Analysis Tool (CHAT), which enables users to easily construct and visualize a network of interactions from a gene or protein list of interest, integrate contextual information, such as gene expression or mass spectrometry data, and identify hub nodes that are more highly connected to contextual nodes (e.g. genes or proteins that are differentially expressed) than expected by chance. In a case study, we use CHAT to construct a network of genes that are differentially expressed in Dengue fever, a viral infection. CHAT was used to identify and compare contextual and degree-based hubs in this network. The top 20 degree-based hubs were enriched in pathways related to the cell cycle and cancer, which is likely due to the fact that proteins involved in these processes tend to be highly connected in general. In comparison, the top 20 contextual hubs were enriched in pathways commonly observed in a viral infection including pathways related to the immune response to viral infection. This analysis shows that such contextual hubs are considerably more biologically relevant than degree-based hubs and that analyses which rely on the identification of hubs solely based on their connectivity may be biased towards nodes that are highly connected in general rather than in the specific context of interest.
Availability: CHAT is available for Cytoscape 3.0+ and can be installed via the Cytoscape App Store (http://apps.cytoscape.org/apps/chat).
Keywords: Network analysis, hypergeometric test, hubs, gene expression data, contextual hub analysis, CHAT
Introduction
Network analysis has emerged as a powerful approach to elucidate biological and disease processes 1. Biological networks (and many other types of networks) have been shown to have a power law distribution of node connectivity, with most nodes having few connections and a few nodes being highly connected 2. The identification of such highly connected nodes, termed hubs, is often of interest as hubs have been shown to be topologically and functionally important. The deletion of genes encoding hub proteins, for example, has been shown to correlate with lethality in yeast (the centrality-lethality rule) 3. Hubs have also been found to be preferentially targeted by both bacterial and viral pathogens 4 and may be master regulators of biological processes 5. Biological networks, such as the human interactome, however, are not static entities 6, and the extent to which a node acts as a hub can change depending on the biological context e.g. the network present in a specific cell type at a particular point in time 7, 8. Integrating contextual information, such as gene or protein expression data, with standard network analysis can provide insight into what are the most relevant network features in a particular study or context 9– 11.
Cytoscape has a number of applications to identify hubs in networks, including cytoHubba 12, APID2Net 13, PinnacleZ 14, NetworkAnalyzer 15, 16 and CentiScaPe 17; however, only the latter two are compatible with Cytoscape 3+. All of the applications available to date identify hubs based on node connectivity (degree) in a network of interest. To construct a network, users frequently query interaction databases to identify the interactors of a list of genes of interest, e.g. differentially expressed genes, and then identify the high-degree nodes in this network. This approach to constructing a network is useful because it identifies a more fully connected network for analysis than would be the case if one restricted interactions to only those that occur between nodes in the gene list. Analysis of these networks can, for example, identify subnetworks that are enriched in (but do not exclusively consist of) differentially expressed genes, or identify non-differentially expressed nodes that are topologically important in the network, both of which would otherwise not be identified. Identifying hubs in these networks, however, is biased towards identifying nodes that are highly connected in general, such as promiscuous, ubiquitous or well-studied nodes, because nodes with many interactions in the query database have a higher probability of being included in the network by chance alone. Analysis of these degree-based hubs, for example identifying which biological processes or pathways these nodes are enriched in, tells us little about the experimental context of interest and more about the properties of highly connected nodes in general. A more appropriate analysis is to determine which nodes interact with relevant nodes in the network (which we term contextual nodes) more than is statistically expected.
Here, we introduce the Contextual Hub Analysis Tool (CHAT), a Cytoscape App that identifies hub nodes that interact with more "contextual" nodes (e.g. differentially expressed genes or proteins) than statistically expected in networks integrated with user-supplied contextual data (e.g. gene expression data). We term these nodes contextual hubs. We show that such contextual hubs are considerably more relevant than degree-based hubs to the specific experimental context under investigation. As such, these nodes are promising candidates for further functional validation studies and potentially represent important points in the network for drug targeting.
Methods
Implementation
CHAT was written in Java 8 as an Open Services Gateway Initiative (OSGi) bundle for Cytoscape 3.0+ 18. It adds a “CHAT” option in the “Apps” menu that launches a popup window, which allows users to adjust different network initialization parameters. CHAT prompts users to input a list of gene identifiers (the supported ID types are dependent on the database selected by the user) and any associated contextual data, e.g. gene expression data associated with the genes. While the focus of this paper is on genes, CHAT can equally be applied to proteins. The OK button triggers Cytoscape’s TaskManager to run a task that initiates the network construction and adds a tab to the results panel that provides functionality to further modify and analyze the network. To create the network, CHAT finds all the first neighbor interactors of the user-provided genes (or their encoded products). Interaction data is retrieved from one of the databases included in the PSICQUIC registry 19, which the user can select. Note that interactions between the first neighbors are considered by CHAT but these are not included in the network visualization for clarity reasons. Once the network has been constructed, CHAT performs a hypergeometric test on each node in the network to identify nodes that interact with contextual nodes more than expected by chance. The probability that a given hub has k or more contextual interactors among its n interactors is given by the hypergeometric distribution:
$$P(X \geq k) = \sum_{i=k}^{\min(n,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}$$

where N is the number of genes with at least one interaction in the database queried and K is the number of contextually relevant nodes provided by the user (with at least one interaction in the database queried). Overrepresentation analysis heavily depends on the choice of background dataset for the determination of N. To estimate the background frequency K/N, CHAT provides access to interaction data from databases available in the PSICQUIC registry. Databases with fewer than 10,000 interactions are excluded. The number of genes in the user-selected database that have at least one interaction (of the specified type) in which both interactors match the user-selected criteria for constructing the network (species, interaction type and ID type) determines the node population size N. Self-interactions are disregarded. Interactions between input genes and between their first neighbors are considered in the CHAT analysis. P-values calculated by CHAT are automatically corrected for multiple testing using the Benjamini-Hochberg procedure 20, a method widely used in bioinformatics to control the false discovery rate. The Bonferroni approach is widely considered to be too strict 21.
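To make the statistical test concrete, the following is a minimal Python sketch of an upper-tail hypergeometric p-value followed by a Benjamini-Hochberg adjustment. It is not taken from the CHAT source code (CHAT itself is written in Java); the function names `hypergeom_upper_tail` and `benjamini_hochberg` and the toy numbers are illustrative assumptions, and the symbols k, n, K and N follow the definitions given in the text.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): probability that a node with n interactors has k or more
    contextual interactors, given K contextual nodes among N nodes in total."""
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example (hypothetical numbers): 10 nodes in the database, 4 of them
# contextual; a node with 3 interactors, 2 of which are contextual.
p = hypergeom_upper_tail(k=2, n=3, K=4, N=10)  # (6*6 + 4*1) / 120 = 1/3
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p, adjusted)
```

Exact integer binomial coefficients are used here for clarity; for very large databases, a log-space implementation may be preferable.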
A right click on a node brings up an option to activate the “Node Analyzer” mode, which allows the user to analyze the connectivity pattern of individual hubs of interest. Using this function will display the node analyzer table on the results panel and all nodes except the selected node and its interactors will be hidden in the network visualization. The execution time of CHAT varies between a few seconds and a few minutes based on the number of user-supplied (contextual) genes, the size of the chosen database and its connection speed as well as the user-selected network layout. These factors also influence memory consumption.
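The behaviour of the "Node Analyzer" mode can be illustrated with a short Python sketch. This is only an approximation of the idea under stated assumptions: the function name `node_analyzer_view`, the adjacency-dictionary representation and the table columns are hypothetical and are not part of CHAT's actual (Java) implementation.

```python
def node_analyzer_view(adjacency, selected, contextual):
    """Return the nodes that stay visible (the selected node and its
    interactors) and a small table of (interactor, is_contextual) rows."""
    interactors = sorted(adjacency.get(selected, set()))
    visible = {selected, *interactors}
    table = [(name, name in contextual) for name in interactors]
    return visible, table


# Toy network (hypothetical identifiers).
adjacency = {
    "H1": {"A", "B", "C"},
    "A": {"H1"},
    "B": {"H1"},
    "C": {"H1", "D"},
    "D": {"C"},
}
visible, table = node_analyzer_view(adjacency, "H1", contextual={"A", "B"})
print(sorted(visible))
print(table)
```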
Operation
The identification of the top contextual hubs consists of three primary steps: 1) input of a user-supplied gene list and contextual data, 2) network construction and statistical analysis to identify nodes that preferentially interact with contextual nodes and 3) visualization of the top contextual hubs and their interactions and comparison to the top degree-based hubs. To construct a network using CHAT, the user must provide a list of gene identifiers and associated numerical or categorical attributes in the text box in tab-delimited format, or upload the data as a comma-separated or tab-delimited file (.csv or .txt) via the upload button (Figure 1). The user can then specify which genes in the uploaded list are contextually important based on the user-provided contextual data (e.g. genes with > 2 fold-change in expression). The user then selects one of the databases in the PSICQUIC registry to query, and specifies the relevant species, ID type and interaction type for the query. The user can then choose to visualize the network using any of the layout algorithms available in Cytoscape. Clicking the OK button creates the network and a new tab in the results panel, which allows the user to visualize the network and to analyze the results further (Figure 2). The results panel is split into several parts. In the first part, the parameters used to generate the network (database, species, ID type and interaction type(s)) are displayed. The second part allows the user to compare the top contextual hubs and the top degree-based hubs at the click of a button. By default, node size and node color are scaled according to the node’s corrected p-value calculated by CHAT, such that the smaller the p-value (i.e. the more statistically significant), the larger the node and the darker its red coloring. The user can, however, customize the color scheme.
In contrast, if the user selects “Show degree hubs”, the visualization changes and the node size and coloring will now be proportional to each node’s degree in the selected database. By default, CHAT displays the top 20 contextual hubs but the user can adjust this by using the slider provided. To investigate a single node in detail, the user can employ CHAT’s “Node Analyzer” by right-clicking on a node. This will limit the network view to show only the selected node and its interactors and will display a table at the bottom of the results panel tab with information on the node’s name, p-value and its interactors.
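The three steps described in this section can be sketched end to end in Python on a toy dataset. This is an illustrative sketch only, not CHAT's implementation: the helper names, the in-memory interaction list standing in for a PSICQUIC database query, the gene identifiers and the fold-change threshold are all assumptions. Node degree is taken from the whole toy database, and multiple testing correction is omitted for brevity.

```python
import csv
import io
from math import comb


def parse_gene_table(text):
    """Step 1: parse tab-delimited 'gene<TAB>value' lines into a dict."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return {row[0]: float(row[1]) for row in reader if len(row) >= 2}


def build_adjacency(interactions):
    """Index the interaction list by node; self-interactions are skipped."""
    adjacency = {}
    for a, b in interactions:
        if a == b:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def first_neighbor_network(adjacency, input_genes):
    """Step 2a: input genes found in the database plus their first neighbors."""
    nodes = set()
    for gene in input_genes:
        if gene in adjacency:
            nodes.add(gene)
            nodes |= adjacency[gene]
    return nodes


def upper_tail(k, n, K, N):
    """Hypergeometric P(X >= k)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / comb(N, n)


def rank_hubs(adjacency, nodes, contextual):
    """Steps 2b and 3: rank nodes by contextual p-value and by degree."""
    N = len(adjacency)
    K = sum(1 for gene in contextual if gene in adjacency)
    rows = []
    for node in nodes:
        neighbors = adjacency[node]
        n, k = len(neighbors), len(neighbors & contextual)
        rows.append((node, n, k, upper_tail(k, n, K, N)))
    by_context = sorted(rows, key=lambda r: (r[3], r[0]))
    by_degree = sorted(rows, key=lambda r: (-r[1], r[0]))
    return by_context, by_degree


# Toy input: gene identifier and fold-change (hypothetical values).
genes = parse_gene_table("A\t3.1\nB\t2.5\nC\t4.0\nD\t1.2\n")
contextual = {g for g, fold_change in genes.items() if fold_change > 2.0}

# Toy interaction "database" (hypothetical); ("X4", "X4") is a self-interaction.
interactions = [("H1", "A"), ("H1", "B"), ("H1", "C"),
                ("H2", "A"), ("H2", "X1"), ("H2", "X2"), ("H2", "X3"),
                ("H2", "X4"), ("H2", "X5"), ("D", "X1"), ("X2", "X3"),
                ("X4", "X4")]
adjacency = build_adjacency(interactions)
network = first_neighbor_network(adjacency, genes)
by_context, by_degree = rank_hubs(adjacency, network, contextual)
print("top contextual hub:", by_context[0])
print("top degree hub:", by_degree[0])
```

In this toy example the node with the most interactions overall (H2) is not the node whose interactors are most enriched in contextual genes (H1), which mirrors the distinction between degree-based and contextual hubs drawn in the text.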
Use case
As a demonstration of its potential utility and as validation, CHAT was used to construct a network using a dataset of 462 genes that have been reported to be up-regulated during Dengue fever, a mosquito-borne viral infection 22 (Ensembl gene IDs for these 462 genes are provided in Dataset 1). These 462 genes represent the contextual data for this case study. CHAT was used to construct a network of these genes and their first neighbor interactors using interaction data that was sourced from InnateDB 23, 24 via the PSICQUIC web service (InnateDB-All). A network of 4,910 nodes was generated. CHAT was then used to identify the top 20 conventional hub nodes (based solely on degree) and the top 20 contextual hub nodes in the network (Figure 3). The two top 20 lists had no nodes in common. InnateDB pathway analysis 23, 24 revealed that the top 20 degree-based hubs were enriched in pathways related to the cell cycle and cancer (Supplementary Table 1), which is likely due to the fact that proteins involved in these processes tend to be highly connected in general. In contrast, the top 20 contextual hubs were statistically enriched in pathways related to the immune response to viral infection, such as the interferon signaling pathway; the Retinoic acid inducible gene-I (RIG-I) pathway; the Toll-like receptor (TLR) pathway; and the Janus kinase (JAK) - Signal Transducer and Activator of Transcription (STAT) pathway (Supplementary Table 2). All of these pathways have been shown to play key roles in the host response to Dengue infection 25, 26. Indeed, many of the top 20 contextual hubs (but not degree-based hubs) were well-known transcription factors involved in the host interferon response, a key cellular response to viral infection including Dengue 27, 28; these include STAT1, STAT2 and the interferon regulatory factors (IRFs) IRF1, 3, 8 and 9.
Another gene identified in the contextual hub analysis but not the degree-based analysis was interferon-stimulated gene 15 (ISG15). Cells in which ISG15 has been silenced have been shown to have significantly higher Dengue viral loads 29. The results of the pathway analysis were reinforced by a Gene Ontology analysis using innatedb.com 23, 24, which identified terms including cytokine-mediated signaling pathway, type I interferon signaling pathway, and innate immune response among the top 10 enriched terms (FDR < 0.05) for the contextual hubs but not the degree-based hubs (Supplementary Table 3 and Supplementary Table 4).
Conclusion
Through the integration of contextual information, such as gene or protein expression, contextual hub analysis as implemented in CHAT can identify context-specific hubs more relevant to the biological context under study, such as disease, treatment or cellular state. As shown in the above case study, these hubs are of more functional relevance than genes found through analysis based on degree only. Given the current emphasis on the importance of considering the network model of biological pathways and the ever-increasing abundance of high-throughput data, CHAT provides a valuable addition to the biologists’ computational toolkit in using a network-based approach to help prioritize genes of interest for further investigation or drug discovery. In the future, CHAT can be extended to include the contextual analysis of other network features such as network bottlenecks.
Data availability
The data referenced by this article are under copyright with the following copyright statement: Copyright: © 2016 Muetze T et al.
Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication). http://creativecommons.org/publicdomain/zero/1.0/
F1000Research: Dataset 1. Use case data: 462 genes that have been reported to be up-regulated during Dengue fever infection, 10.5256/f1000research.9118.d128126 30
Software availability
Software available from: http://apps.cytoscape.org/apps/chat
Latest source code: https://bitbucket.org/dynetteam/chat
Archived source code at time of publication: http://www.dx.doi.org/10.5281/zenodo.56496 31
Manual/Tutorial: https://bitbucket.org/dynetteam/chat/downloads
License: Lesser GNU Public License 3.0
Funding Statement
The research leading to these results received funding from the European Union Seventh Framework Programme (FP7/2007-2013) PRIMES project under grant agreement number FP7-HEALTH-2011-278568. The Lynn Group is also supported by EMBL Australia.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
[version 2; referees: 2 approved]
Supplementary material
References
- 1. Barabasi AL, Gulbahce N, Loscalzo J: Network medicine: a network-based approach to human disease. Nat Rev Genet. 2011;12(1):56–68. 10.1038/nrg2918
- 2. Barabasi AL, Oltvai ZN: Network biology: understanding the cell’s functional organization. Nat Rev Genet. 2004;5(2):101–113. 10.1038/nrg1272
- 3. Jeong H, Mason SP, Barabási AL, et al.: Lethality and centrality in protein networks. Nature. 2001;411(6833):41–42. 10.1038/35075138
- 4. Dyer MD, Murali TM, Sobral BW: The landscape of human proteins interacting with viruses and other pathogens. PLoS Pathog. 2008;4(2):e32. 10.1371/journal.ppat.0040032
- 5. Borneman AR, Leigh-Bell JA, Yu H, et al.: Target hub proteins serve as master regulators of development in yeast. Genes Dev. 2006;20(4):435–448. 10.1101/gad.1389306
- 6. Przytycka TM, Singh M, Slonim DK: Toward the dynamic interactome: It’s about time. Brief Bioinform. 2010;11(1):15–29. 10.1093/bib/bbp057
- 7. Rachlin J, Cohen DD, Cantor C, et al.: Biological context networks: a mosaic view of the interactome. Mol Syst Biol. 2006;2:66. 10.1038/msb4100103
- 8. Agarwal S, Deane CM, Porter MA, et al.: Revisiting date and party hubs: Novel approaches to role assignment in protein interaction networks. PLoS Comput Biol. 2010;6(6):e1000817. 10.1371/journal.pcbi.1000817
- 9. Gao S, Wang X: Identification of highly synchronized subnetworks from gene expression data. BMC Bioinformatics. 2013;14(Suppl 9):S5. 10.1186/1471-2105-14-S9-S5
- 10. Zinman GE, Naiman S, O'Dee DM, et al.: ModuleBlast: identifying activated sub-networks within and across species. Nucleic Acids Res. 2015;43(3):e20. 10.1093/nar/gku1224
- 11. Soul J, Hardingham TE, Boot-Handford RP, et al.: PhenomeExpress: a refined network analysis of expression datasets by inclusion of known disease phenotypes. Sci Rep. 2015;5:8117. 10.1038/srep08117
- 12. Chin CH, Chen SH, Wu HH, et al.: cytoHubba: identifying hub objects and sub-networks from complex interactome. BMC Syst Biol. 2014;8(Suppl 4):S11. 10.1186/1752-0509-8-S4-S11
- 13. Hernandez-Toro J, Prieto C, De Las Rivas J: APID2NET: Unified interactome graphic analyzer. Bioinformatics. 2007;23(18):2495–2497. 10.1093/bioinformatics/btm373
- 14. Chuang HY, Lee E, Liu YT, et al.: Network-based classification of breast cancer metastasis. Mol Syst Biol. 2007;3:140. 10.1038/msb4100180
- 15. Assenov Y, Ramírez F, Schelhorn SE, et al.: Computing topological parameters of biological networks. Bioinformatics. 2008;24(2):282–284. 10.1093/bioinformatics/btm554
- 16. Doncheva NT, Assenov Y, Domingues FS, et al.: Topological analysis and interactive visualization of biological networks and protein structures. Nat Protoc. 2012;7(4):670–85. 10.1038/nprot.2012.004
- 17. Scardoni G, Tosadori G, Faizan M, et al.: Biological network analysis with CentiScaPe: centralities and experimental dataset integration [version 2; referees: 2 approved]. F1000Research. 2014;3:139. 10.12688/f1000research.4477.2
- 18. Shannon P, Markiel A, Ozier O, et al.: Cytoscape: A software Environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13(11):2498–2504. 10.1101/gr.1239303
- 19. Aranda B, Blankenburg H, Kerrien S, et al.: PSICQUIC and PSISCORE: accessing and scoring molecular interactions. Nat Methods. 2011;8(7):528–529. 10.1038/nmeth.1637
- 20. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B. 1995;57(1):289–300.
- 21. Noble WS: How does multiple testing correction work? Nat Biotechnol. 2009;27(12):1135–1137. 10.1038/nbt1209-1135
- 22. Hoang LT, Lynn DJ, Henn M, et al.: The early whole-blood transcriptional signature of dengue virus and features associated with progression to dengue shock syndrome in Vietnamese children and young adults. J Virol. 2010;84(24):12982–94. 10.1128/JVI.01224-10
- 23. Breuer K, Foroushani AK, Laird MR, et al.: InnateDB: systems biology of innate immunity and beyond--recent updates and continuing curation. Nucleic Acids Res. 2013;41(Database issue):D1228–1233. 10.1093/nar/gks1147
- 24. del-Toro N, Dumousseau M, Orchard S, et al.: A new reference implementation of the PSICQUIC web service. Nucleic Acids Res. 2013;41(Web Server issue):W601–6. 10.1093/nar/gkt392
- 25. Nasirudeen AM, Wong HH, Thien P, et al.: RIG-I, MDA5 and TLR3 synergistically play an important role in restriction of dengue virus infection. PLoS Negl Trop Dis. 2011;5(1):e926. 10.1371/journal.pntd.0000926
- 26. Souza-Neto JA, Sim S, Dimopoulos G: An evolutionary conserved function of the JAK-STAT pathway in anti-dengue defense. Proc Natl Acad Sci U S A. 2009;106(42):17841–6. 10.1073/pnas.0905006106
- 27. De La Cruz Hernández SI, Puerta-Guardo H, Flores-Aguilar H, et al.: A strong interferon response correlates with a milder dengue clinical condition. J Clin Virol. 2014;60(3):196–199. 10.1016/j.jcv.2014.04.002
- 28. Morrison J, García-Sastre A: STAT2 signaling and dengue virus infection. JAKSTAT. 2014;3(1):e27715. 10.4161/jkst.27715
- 29. Dai J, Pan W, Wang P: ISG15 facilitates cellular antiviral response to dengue and west nile virus infection in vitro. Virol J. 2011;8:468. 10.1186/1743-422X-8-468
- 30. Muetze T, Goenawan IH, Wiencko HL, et al.: Dataset 1 in: Contextual Hub Analysis Tool (CHAT): A Cytoscape app for identifying contextually relevant hubs in biological networks. F1000Research. 2016.
- 31. Muetze T, Goenawan IH, Wiencko HL, et al.: Contextual Hub Analysis Tool (CHAT): A Cytoscape app for identifying contextually relevant hubs in biological networks. Zenodo. 2016.