Abstract
Here we present the Cytoscape app version of our association network inference tool CoNet. Though CoNet was developed with microbial community data from sequencing experiments in mind, it is designed to be generic and can detect associations in any data set where biological entities (such as genes, metabolites or species) have been observed repeatedly. The CoNet app supports Cytoscape 2.x and 3.x and offers a variety of network inference approaches, which can also be combined. Here we briefly describe its main features and illustrate its use on microbial count data obtained by 16S rDNA sequencing of arctic soil samples. The CoNet app is available at: http://apps.cytoscape.org/apps/conet.
Keywords: network generation, network construction, network inference, association networks, microbial networks, CoNet, Cytoscape
Introduction
Modern sequencing technology in combination with dedicated analysis pipelines allows determining the relative abundances of microbial community members, thereby obtaining microbial count data. Such community profiling experiments have been carried out for thousands of samples from a variety of ecosystems, ranging from the world’s oceans (Bork et al., 2015) to the human gut (Falony et al., 2016; The Human Microbiome Project Consortium, 2012).
The analysis of species abundance patterns has a long tradition in ecology (Connor & Simberloff, 1979; Diamond, 1975; Gotelli & McCabe, 2002). More specifically, co-occurrence analysis detects significant co-occurrences or mutual exclusions across samples, which are interpreted as representing ecological relationships such as mutualism or competition or being due to similar responses to environmental factors. Co-occurrence analysis is an instance of network inference, an exploratory data analysis technique that attempts to unravel relationships between objects from repeated observations. The large number of microbial count tables resulting from the multitude of recent sequencing projects (e.g. Bork et al., 2015; Falony et al., 2016; Gilbert et al., 2014; The Human Microbiome Project Consortium, 2012) opens the way to unraveling the complex relationships between microorganisms from their abundances across samples. CoNet was developed to carry out microbial network inference, but its generic design makes it applicable to any data set where objects have been observed repeatedly.
Methods/Implementation
The CoNet app wraps the CoNet command line tool. The command line tool and the Cytoscape 2.x app version are implemented in Java 1.6, whereas the Cytoscape 3.x app version requires Java 1.7.
Implementation challenges and decisions
In general, the CoNet app is designed to have minimal contact with Cytoscape, to ensure consistent behavior across different Cytoscape versions and to ease porting to future Cytoscape versions. The CoNet app is linked to Cytoscape only via its main menu and graph visualization classes. The Cytoscape-version-specific implementation of the graph visualization class is loaded via reflection at run time and is entirely separated from graph generation.
A major challenge for the implementation of the CoNet app is the inclusion of the large number of options available in CoNet, which allow users to customize each network inference step, from data preprocessing through threshold setting and network construction to the assessment of significance. This problem was solved by implementing a single user input handling class, which collects and checks user input from the various menus and submits it to CoNet once the GO button is pushed. This design also makes it possible to export and read in user settings files, which makes experiments carried out with the CoNet app more reproducible.
Another challenge is the command line support. Network inference from large data sets is not feasible within Cytoscape, and CoNet is best run on the command line in these cases. To facilitate this step for inexperienced users, the current settings of the CoNet app can be exported as a command line call by clicking the "Generate command line call" button. This call can then be executed on the command line by including the CoNet jar file in the class path. Networks generated on the command line can be loaded either via Cytoscape's network import functions (if saved in gml format (Himsolt)) or, more conveniently, via the CoNet app (if saved in the custom gdl format). The CoNet app's manual includes a step-by-step tutorial for command line usage.
The CoNet app also integrates the popular network inference R Bioconductor package minet (Meyer et al., 2008). We decided to integrate it loosely via Rserve, a Java-R bridge capable of transferring R objects to Java and vice versa (http://rforge.net/Rserve/). Thus, advanced users can install and launch the Rserve server in R and configure the Rserve client settings (i.e. host and port) in the CoNet app's configuration menu. The CoNet app's manual explains Rserve installation and usage.
Finally, we also implemented solutions for error and help display. The CoNet app displays help pages in html format, which allows the user to follow links within these pages. The CoNet app's pdf manual is compiled from the help pages using prince (http://www.princexml.com/). Each menu is linked to its specific help page, easing navigation.
When an error has been captured, an error report is generated that includes the error message as well as the CoNet app's current settings.
Network inference workflow
CoNet takes a presence/absence, count or abundance matrix as input, where rows represent the objects of interest and columns their observations across locations or time points. Optionally, a second input matrix can be provided. This is of interest when two different measurements have been made for the same samples, for instance counts of microorganisms and concentrations of metabolites. CoNet's output consists of a network where significantly associated objects are connected by edges.
Depending on the data type, a number of filters needs to be applied. For instance, for 16S rDNA count data, taxa with too few non-zero observations need to be removed and the data needs to be normalized or rarefied to account for sequencing depth differences. In the next step, the user can select from a number of different correlations (Pearson, Spearman, Kendall), similarities (mutual information, Steinhaus, distance correlation etc.) or dissimilarities (Kullback Leibler, Euclidean, Bray Curtis, Jensen-Shannon etc.) to score the association strength between the objects. For presence/absence (also termed incidence) data, the hypergeometric distribution or Jaccard distance can be chosen for the same purpose. CoNet's special strength is its capability to combine multiple such measures and/or to combine these measures with other network inference algorithms, e.g. those implemented in minet. The idea behind such an ensemble approach to network inference is to exploit the fact that different methods make different mistakes. If erroneous edges predicted by one method are not supported by the others, they can be filtered out, thereby reducing the number of false positives. The thresholds for the measures can be either set manually (using sliding windows for bounded measures) or automatically, by specifying the desired number of edges in the output network. The network can then be displayed either as a multigraph (with as many edges between two objects as selected measures) or as a graph (where scores of individual measures are combined). Optionally, the significance of the associations can be computed, e.g. with a permutation test. Multiple testing correction can be performed with either Bonferroni or Benjamini-Hochberg procedures. Figure 1 summarizes this workflow.
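The ensemble idea described above can be illustrated with a minimal Python sketch. It is not CoNet's implementation: the function names, the example abundances, the thresholds and the simple voting rule (an edge is kept if at least two of three measures support it) are assumptions chosen for illustration, and only positive associations are considered. Constant rows, for which correlations are undefined, are not handled.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def bray_curtis(x, y):
    """Bray Curtis dissimilarity (0 = identical profiles)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


def ensemble_edges(rows, min_support=2, corr_cut=0.8, bc_cut=0.2):
    """Keep a pair if at least min_support measures pass their threshold."""
    names = sorted(rows)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            x, y = rows[a], rows[b]
            votes = [
                pearson(x, y) >= corr_cut,
                spearman(x, y) >= corr_cut,
                bray_curtis(x, y) <= bc_cut,
            ]
            if sum(votes) >= min_support:
                edges[(a, b)] = sum(votes)
    return edges


# Made-up relative abundances of three objects across four samples.
rows = {
    "taxonA": [0.10, 0.20, 0.30, 0.40],
    "taxonB": [0.12, 0.22, 0.28, 0.38],
    "taxonC": [0.40, 0.30, 0.20, 0.10],
}
print(ensemble_edges(rows))
```

In this toy example only the pair taxonA/taxonB is supported by all three measures; the other two pairs are anti-correlated and therefore receive no votes under the positive-association thresholds used here.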
Special features
CoNet offers a series of features that distinguish it from other network inference tools, such as its support for object groups. This feature allows a user to assign objects to different groups (e.g. metabolites and enzymes). Relationships can then be inferred only between different object types (resulting in a bipartite network) or only within the same object type. CoNet's treatment of two input matrices is built upon this feature.
Furthermore, CoNet can handle row metadata, which makes it possible, for instance, to infer links between objects at different hierarchical levels (e.g. between order Lactobacillales and genus Ureaplasma) while preventing links between different levels of the same hierarchy (e.g. Lactobacillales and Lactobacillaceae). CoNet can also parse sample metadata such as temperature or oxygen concentration, which are then correlated with the objects in the input matrix while being excluded from normalization. In addition, CoNet recognizes abundance tables generated from biom files (McDonald et al., 2012) and, in its Cytoscape 3.x version, reads biom files in HDF5 format directly, using the BiomIO Java library (Ladau). Phylogenetic lineages in these tables are automatically parsed and displayed as node attributes of the resulting network. CoNet also computes a few node properties, such as a node's total edge number as well as the number of positive and negative edges, the total row sum and the number of samples in which the object was observed (i.e. was different from zero or a missing value).
To ease the selection of suitable preprocessing steps, CoNet can display input matrix properties and recommendations based on them. Importantly, CoNet can also handle missing values, by omitting sample pairs with missing values from the association strength calculation. Finally, CoNet supports a few input and output network formats absent in Cytoscape, including adjacency matrices (import), dot (the format of GraphViz (http://www.graphviz.org/)) and VisML (VisANT's format (Hu et al., 2013)) (both for export).
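The effect of object groups on the set of candidate edges can be sketched in a few lines of Python. The function name, the mode labels and the example objects are hypothetical and serve only to illustrate the difference between bipartite ("between") and within-group inference; they do not reflect CoNet's internal code.

```python
from itertools import combinations


def candidate_pairs(groups, mode="between"):
    """Return object pairs eligible for edge inference.

    groups maps each object name to its group label.
    mode "between" keeps pairs from different groups (bipartite network),
    mode "within" keeps pairs from the same group.
    """
    pairs = []
    for a, b in combinations(sorted(groups), 2):
        same_group = groups[a] == groups[b]
        if (mode == "between" and not same_group) or (mode == "within" and same_group):
            pairs.append((a, b))
    return pairs


# Hypothetical objects assigned to two groups.
groups = {"enzyme1": "enzyme", "enzyme2": "enzyme", "metaboliteX": "metabolite"}
print(candidate_pairs(groups, "between"))
print(candidate_pairs(groups, "within"))
```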
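The exclusion of links within the same lineage can be pictured as a prefix test on lineages. The following Python sketch assumes that each taxon is described by its full lineage as a tuple ordered from the highest to the lowest level; the function names and the example lineages are illustrative assumptions and not CoNet's data structures.

```python
def same_lineage(lineage_a, lineage_b):
    """True if one lineage is a prefix of the other (parent-child relation)."""
    shorter, longer = sorted((lineage_a, lineage_b), key=len)
    return longer[:len(shorter)] == shorter


def allowed_pairs(lineages):
    """All taxon pairs except those lying within the same lineage."""
    names = sorted(lineages)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if not same_lineage(lineages[a], lineages[b])
    ]


# Illustrative lineages (highest level first).
lineages = {
    "Lactobacillales": ("Bacteria", "Firmicutes", "Bacilli", "Lactobacillales"),
    "Lactobacillaceae": ("Bacteria", "Firmicutes", "Bacilli", "Lactobacillales",
                         "Lactobacillaceae"),
    "Ureaplasma": ("Bacteria", "Tenericutes", "Mollicutes", "Mycoplasmatales",
                   "Mycoplasmataceae", "Ureaplasma"),
}
print(allowed_pairs(lineages))
```

With these lineages, the pair Lactobacillales/Lactobacillaceae is excluded, whereas both taxa may still be linked to Ureaplasma.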
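The treatment of missing values can be illustrated with pairwise deletion: for a given pair of objects, only samples in which both values are present enter the calculation. The Python sketch below applies this to the Pearson correlation. The encoding of missing values as None or NaN and the minimum of three complete observations are assumptions made for this example.

```python
import math


def is_missing(value):
    """Missing values are encoded as None or NaN in this sketch."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def pearson_pairwise(x, y):
    """Pearson correlation over samples where both values are present.

    Returns the correlation and the number of samples actually used.
    """
    pairs = [(a, b) for a, b in zip(x, y) if not is_missing(a) and not is_missing(b)]
    n = len(pairs)
    if n < 3:
        return float("nan"), n
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    sx = math.sqrt(sum((a - mx) ** 2 for a, _ in pairs))
    sy = math.sqrt(sum((b - my) ** 2 for _, b in pairs))
    if sx == 0 or sy == 0:
        return float("nan"), n
    return cov / (sx * sy), n


# Made-up observations; the third and fourth samples are incomplete.
x = [1.0, 2.0, None, 4.0, 5.0]
y = [2.0, 4.0, 6.0, float("nan"), 10.0]
r, used = pearson_pairwise(x, y)
print(r, used)
```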
Results
Use case: microbial relationships in the arctic soil
We demonstrate the abilities of the CoNet app on a real-world example taken from the Qiita database (The Qiita Development Team, 2015). The Qiita database, which merges the previously separated QIIME and EMP databases, is a rich resource for processed 16S rDNA sequence data: each study is accompanied by a microbial count file in biom format computed from the raw sequence data with the QIIME pipeline (Caporaso et al., 2010).
In our example, we will demonstrate how to build an association network from microbial count data obtained from arctic soil samples (Chu et al., 2010). This data set was chosen for its sample number (large enough to compute associations, yet small enough for short run times) as well as for the biological insights that are gained from the network analysis. The example showcases the CoNet app's ability to compute associations between higher taxonomic levels and to take environmental metadata into account, which is important for the interpretation of predicted microbial relationships.
In the Qiita database, the arctic soil study can be found under the title "Soil bacterial diversity in the Arctic is not fundamentally different from that found in other biomes" (study identifier: 104, see Supplementary material). This data set consists of 52 soil samples from the arctic tundra, which were sequenced with Roche FLX using primers targeting the V1V2 region of the 16S rDNA. The processed data can be downloaded from the Qiita study page (in Data Types, click on 16S, then click on the URL appearing below, expand the Files network, click on the file object containing BIOM in its name and then download the file with suffix .biom). The study also provides a mapping file with sample metadata (on the Qiita study page, click Sample Information and then the Sample Info button). We extract the pH of each sample by loading the sample information file into Excel, selecting the sample_name and ph columns and saving them to a separate, tab-delimited file.
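As an alternative to a spreadsheet program, the two columns can also be extracted with a short script. The following Python sketch assumes that the sample information file is tab-delimited with a header line containing the columns sample_name and ph; the function name and the demonstration values are made up for illustration.

```python
import csv
import io


def extract_columns(src, dst, columns=("sample_name", "ph")):
    """Copy the selected columns from a tab-delimited source to a destination.

    src and dst are open text file objects (or io.StringIO instances).
    """
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in reader:
        writer.writerow([row[column] for column in columns])


# Demonstration on an in-memory table with made-up sample identifiers and values.
demo_in = io.StringIO("sample_name\tph\tlatitude\nS1\t4.6\t68.6\nS2\t6.1\t78.9\n")
demo_out = io.StringIO()
extract_columns(demo_in, demo_out)
result = demo_out.getvalue()
print(result)
```

For real files, both arguments would be opened with open(path, newline="") in read and write mode, respectively.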
Combining multiple measures
The CoNet app is composed of the main window and several menus, including a "Data" menu with input and output options, a "Preprocessing and filter" menu, a "Methods" menu to select network construction methods, a "Merge" menu where the user can specify how results from different network construction methods should be merged, a "Randomization" menu for the assessment of edge significance and finally a "Config" menu for configuration.
In the following, we will build a network from the arctic tundra biom file. First, in the "Data" menu, the arctic tundra biom file is selected and the option "Biom file in HDF5" is enabled (direct biom file parsing is only supported in the Cytoscape 3.x version of the CoNet app). In the sub-menu "Metadata and Features", the option "explore links between higher-level taxa" is enabled together with the option "Parent-child exclusion" to compute correlations between higher-level taxa while preventing edges between taxa within the same lineage (e.g. Lactobacillales and Lactobacillaceae). Sample metadata (pH in this case) are passed to the CoNet app via the "Select file" button in the "Features" corner of the "Metadata and Features" sub-menu. Both "Transpose" and "Match samples" need to be enabled to convert sample metadata into rows and to match sample metadata identifiers to biom file identifiers.
In the "Preprocessing and filtering" menu, the parameter "row_minocc" is set to 20 to discard taxa with fewer than 20 non-zero values across samples. The sum of the discarded rows can be kept by enabling "Keep sum of filtered rows". In addition, "col_norm" is activated to divide each matrix entry by the sum of its corresponding column, thus avoiding the inference of spurious links due to sequencing depth differences.
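The two preprocessing steps can be sketched in Python as follows. This is an illustration of the operations as described in the text, not CoNet's code: the function names, the name of the summed row, the order of the two steps and the toy counts are assumptions.

```python
def filter_rows(matrix, min_occ, keep_sum=True):
    """Discard rows with fewer than min_occ non-zero values.

    If keep_sum is True, the discarded rows are summed into one extra row,
    so that column totals are preserved.
    """
    kept = {}
    summed = None
    for name, row in matrix.items():
        if sum(1 for value in row if value != 0) >= min_occ:
            kept[name] = list(row)
        elif summed is None:
            summed = list(row)
        else:
            summed = [s + value for s, value in zip(summed, row)]
    if keep_sum and summed is not None:
        kept["summed_filtered_rows"] = summed  # hypothetical row name
    return kept


def normalize_columns(matrix):
    """Divide each entry by the sum of its column (columns are samples)."""
    n_columns = len(next(iter(matrix.values())))
    totals = [sum(row[j] for row in matrix.values()) for j in range(n_columns)]
    return {
        name: [value / total if total else 0.0 for value, total in zip(row, totals)]
        for name, row in matrix.items()
    }


# Toy count matrix: rows are taxa, columns are samples.
counts = {
    "taxonA": [10, 0, 5, 5],
    "taxonB": [0, 0, 0, 2],
    "taxonC": [0, 1, 0, 0],
}
filtered = filter_rows(counts, min_occ=2)
normalized = normalize_columns(filtered)
print(normalized)
```

Keeping the sum of the filtered rows matters for the subsequent normalization: without it, the column sums, and hence the relative abundances of the remaining taxa, would change.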
In the "Methods" menu, Pearson, Spearman, Bray Curtis, Kullback Leibler and mutual information are selected. Their thresholds can be automatically set such that 1,000 top-scoring and 1,000 bottom-scoring edges (for anti-correlations) are included for each measure in the initial network, by typing "1000" as the value of the edge selection parameter and enabling "Top and bottom" in the "Threshold setting" sub-menu. At this stage, pushing "GO" will result in a multigraph, where microbial taxa are connected by up to five different measure-specific edges.
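The automatic threshold setting via an edge number can be pictured as ranking all pairwise scores of one measure and keeping the k highest and the k lowest. The Python sketch below illustrates this idea under that assumption; the function name and the scores are made up, and the sketch does not handle the case where 2k exceeds the number of pairs (top and bottom selections would then overlap).

```python
def top_and_bottom(scores, k):
    """Select the k highest-scoring and the k lowest-scoring pairs.

    scores maps an object pair to its score for a single measure.
    Returns two lists of (pair, score) tuples, both sorted by decreasing score.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:k]
    bottom = ranked[-k:] if k > 0 else []
    return top, bottom


# Made-up correlation scores for six pairs.
scores = {
    ("A", "B"): 0.9,
    ("A", "C"): -0.8,
    ("B", "C"): 0.1,
    ("A", "D"): 0.5,
    ("B", "D"): -0.3,
    ("C", "D"): 0.0,
}
top, bottom = top_and_bottom(scores, k=2)
upper_threshold = top[-1][1]     # lowest score among the top edges
lower_threshold = bottom[0][1]   # highest score among the bottom edges
print(top, bottom, upper_threshold, lower_threshold)
```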
Assessment of edge significance
The significance of edges, that is their p-values, is computed in two CoNet launches, the first of which generates the permutation distributions and an intermediate network and the second the bootstrap distributions and the final network.
For the first launch, the user selects the "edgeScores" routine in the "Randomization" menu, with "shuffle_rows" as resampling parameter, and enables "Renormalize" (for details on renormalization, see Faust et al., 2012). The user then specifies a folder and a file name to export permutation scores and enables "Save randomizations" in the "Save" corner of the "Randomization" menu. Pushing "GO" will then launch the computation of edge- and measure-specific permutation distributions. Permutation alone is sufficient to set p-values on the edges, but we found that a combination of permutation and bootstrap is more stringent (Faust et al., 2012). Thus, the network generated in this first step should be considered as an intermediate result.
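The basic principle of a permutation test for a single edge can be sketched in Python as follows. This is a deliberately simplified illustration: it shuffles one row, uses the Pearson correlation as score and computes a two-sided p-value, whereas CoNet's procedure additionally involves renormalization and the combination with bootstrap distributions (Faust et al., 2012). Function names, the random seed and the example data are assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def permutation_pvalue(x, y, score=pearson, n_iter=100, seed=0):
    """Two-sided permutation p-value for the association between x and y.

    The values of y are shuffled n_iter times; the p-value is the fraction
    of shuffled scores at least as extreme as the observed one, with a
    pseudo-count so that the p-value is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(score(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        if abs(score(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)


# Made-up, strongly correlated observations across ten samples.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.3, 13.9, 16.2, 18.0, 19.7]
p = permutation_pvalue(x, y)
print(p)
```

With 100 iterations, the smallest attainable p-value is 1/101, which is why larger iteration numbers are needed when many edges are tested and corrected for multiple testing.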
In order to compute bootstrap distributions and the final network, the user prepares a second CoNet launch, by selecting the "bootstrap" resampling method and a p-value merging method, for instance "brown" (Brown 1975), in the "Randomization" menu. P-value merging will unite measure-specific p-values for the same edge into a single edge-specific p-value. "Renormalize" is disabled and "benjaminihochberg" is selected as the multiple testing correction method. In the "Save" corner of the "Randomization" menu, another file name should be specified to store bootstrap distributions in a separate file. P-values of the final network are computed from both permutation and bootstrap distributions, thus previously generated permutation distributions have to be loaded into the CoNet app. This is done by selecting the permutation file generated in the previous step with the "Load null distributions" button. Pushing "GO" will then result in the final network, shown in Figure 2A.
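The Benjamini-Hochberg procedure selected in this step adjusts each p-value according to its rank among all tested edges. A minimal Python sketch of the standard step-up adjustment is given below; the function name and the example p-values are illustrative, and the code is not taken from CoNet.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    The i-th smallest of m p-values is multiplied by m / i; a running
    minimum from the largest to the smallest p-value keeps the adjusted
    values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up edge p-values.
pvalues = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(pvalues)
print(adjusted)
```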
The CoNet app does not lay out resulting networks, to leave the choice of the (potentially time-consuming) layout algorithm to the user. Here, the "Organic" layout from yFiles was applied and nodes were colored according to their class using Cytoscape's node coloring functionality.
Once permutation and bootstrap distributions have been computed, network generation can be quickly repeated by loading both distributions via the "Load null distributions" and "Load randomization file" buttons, respectively. Figure 2B shows the same network re-generated from pre-computed distributions, but with "positive edges only" enabled in the "Preprocessing and filter" menu. Figure 2C displays the neighbors of the pH node, which were selected and instantiated as a separate network using Cytoscape's node selection function "First neighbors of selected nodes" for undirected networks.
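The selection of a node's first neighbors in an undirected network amounts to a simple scan of the edge list. The following Python sketch illustrates the idea outside of Cytoscape; the function names and the toy edges are hypothetical, and whether edges among the neighbors are retained in the new network is an assumption of this sketch.

```python
def first_neighbors(edges, node):
    """Nodes directly connected to the given node in an undirected edge list."""
    neighbors = set()
    for a, b in edges:
        if a == node:
            neighbors.add(b)
        elif b == node:
            neighbors.add(a)
    return neighbors


def induced_subnetwork(edges, nodes):
    """Edges whose two end points both belong to the given node set."""
    return [(a, b) for a, b in edges if a in nodes and b in nodes]


# Toy edge list with a metadata node named pH.
edges = [
    ("pH", "taxon1"),
    ("taxon2", "pH"),
    ("taxon1", "taxon2"),
    ("taxon2", "taxon3"),
]
selected = first_neighbors(edges, "pH") | {"pH"}
print(induced_subnetwork(edges, selected))
```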
The computation of permutation and bootstrap distributions took ~5 minutes each for 100 iterations on a standard laptop.
Input and settings files for the use case can be found in the Supplementary material.
Discussion
Insights into arctic soil microbiota
After removal of negative edges, the arctic soil network forms two prominent clusters (Figure 2B), which are enriched with representatives of different classes, such that one cluster features mostly members of the Solibacteres and Acidobacteria, whereas the other consists mostly of Alphaproteobacteria and Chloracidobacteria. When examining the neighbors of the pH node (Figure 2C), members of the former cluster are found to be anti-correlated to pH, whereas members of the latter are correlated to it. Thus, network analysis helps to identify pH as a major driving factor for microbial soil communities, as has been found previously (Fierer & Jackson, 2006). The correlations with pH have also been described by the authors of the soil study (Chu et al., 2010). However, network analysis adds more details (correlations are computed on lower taxonomic levels) and discovers additional taxonomic groups impacted by pH, e.g. Chloracidobacteria. Furthermore, network inference suggests candidates for cross-feeding. For instance, the neighboring nodes of Bradyrhizobium, a nitrogen fixer that produces ammonium, may represent taxa that depend on ammonium as main nitrogen source.
Related apps
The CoNet app offers mostly similarity-based network inference. Complementary apps that implement various Bayesian network inference algorithms are Cyni Toolbox (http://www.proteomics.fr/Sysbio/CyniProject), bayelviraApp (http://apps.cytoscape.org/apps/bayelviraapp) and MONET (Lee & Lee, 2005). ARACNE (http://apps.cytoscape.org/apps/aracne) exploits mutual information to build networks (Margolin et al., 2006). ExpressionCorrelation (http://www.baderlab.org/Software/ExpressionCorrelation) and MetaNetter (http://apps.cytoscape.org/apps/metanetter) also offer similarity-based network inference techniques, the former specialized for gene expression data and the latter for metabolomics data. Results from these different network inference approaches could be combined with Cytoscape tools such as Merge Networks.
Conclusion
In this article, we have demonstrated the CoNet app on a typical 16S data set. Alternative use cases include, for instance, the inference of function networks (i.e. co-occurrence of orthologous gene groups) from metagenomics or metatranscriptomics data or of taxon-metabolite networks from 16S and metabolomics data.
We hope that CoNet's integration into Cytoscape will lower the barrier for its employment by users less familiar with the command line version. Due to its flexibility and comprehensiveness, CoNet can be useful in a variety of applications and we thus hope it will find a broad user base.
Software availability
CoNet app page: http://apps.cytoscape.org/apps/conet
CoNet tool web page: http://systemsbiology.vub.ac.be/conet
Latest source code: http://sourceforge.net/projects/conet/
Archived source code as at the time of publication: Zenodo, Biological network inference in Cytoscape, doi: 10.5281/zenodo.55715 (Faust & Raes, 2016)
License: GNU General Public License version 2.0
Acknowledgements
We would like to thank Gipsi Lima-Mendez and other members of the Raes lab, as well as all users of the CoNet app who have sent us constructive feedback or error reports that helped to improve this app. We are further indebted to Fah Sathirapongsasuti, Curtis Huttenhower and Jean-Sébastien Lerat, who significantly contributed to the command line version of CoNet.
Funding Statement
K.F. and J.R. are supported by the Research Foundation Flanders (FWO) and the Flemish agency for Innovation by Science and Technology (IWT).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
[version 1; referees: 2 approved with reservations]
Supplementary material
Use case data in CoNet app: inference of biological association networks using Cytoscape.
This file contains microbial count data, sample metadata, permutation settings and bootstrap settings associated with this submission. Description of each dataset is provided in the text file.
References
- Bork P, Bowler C, de Vargas C, et al.: Tara Oceans. Tara Oceans studies plankton at planetary scale. Introduction. Science. 2015;348(6237):873. doi: 10.1126/science.aac5605
- Caporaso JG, Kuczynski J, Stombaugh J, et al.: QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010;7(5):335–336. doi: 10.1038/nmeth.f.303
- Chu H, Fierer N, Lauber CL, et al.: Soil bacterial diversity in the Arctic is not fundamentally different from that found in other biomes. Environ Microbiol. 2010;12(11):2998–3006. doi: 10.1111/j.1462-2920.2010.02277.x
- Connor EF, Simberloff D: The Assembly of Species Communities: Chance or Competition? Ecology. 1979;60(6):1132–1140. doi: 10.2307/1936961
- Diamond JM: Assembly of species communities. In: Cody M, Diamond JM, eds. Ecology and evolution of communities. Harvard University Press, 1975;342–444.
- Falony G, Joossens M, Vieira-Silva S, et al.: Population-level analysis of gut microbiome variation. Science. 2016;352(6285):560–564. doi: 10.1126/science.aad3503
- Faust K, Sathirapongsasuti JF, Izard J, et al.: Microbial co-occurrence relationships in the human microbiome. PLoS Comput Biol. 2012;8(7):e1002606. doi: 10.1371/journal.pcbi.1002606
- Faust K, Raes J: Biological network inference in Cytoscape. Zenodo. 2016.
- Fierer N, Jackson RB: The diversity and biogeography of soil bacterial communities. Proc Natl Acad Sci U S A. 2006;103(3):626–631. doi: 10.1073/pnas.0507535103
- Gilbert JA, Jansson JK, Knight R: The Earth Microbiome project: successes and aspirations. BMC Biol. 2014;12:69. doi: 10.1186/s12915-014-0069-1
- Gotelli NJ, McCabe DJ: Species Co-Occurrence: A Meta-Analysis of J. M. Diamond's Assembly Rules Model. Ecology. 2002;83(8):2091–2096. doi: 10.2307/3072040
- Himsolt M: GML: A portable Graph File Format [Online].
- Hu Z, Chang YC, Wang Y, et al.: VisANT 4.0: Integrative network platform to connect genes, drugs, diseases and therapies. Nucleic Acids Res. 2013;41(Web Server issue):W225–W231. doi: 10.1093/nar/gkt401
- Ladau J: Lightweight, portable library for working with HDF5 BIOM files using Java [Online].
- Lee PH, Lee D: Modularized learning of genetic interaction networks from biological annotations and mRNA expression data. Bioinformatics. 2005;21(11):2739–2747. doi: 10.1093/bioinformatics/bti406
- Margolin AA, Nemenman I, Basso K, et al.: ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics. 2006;7(Suppl 1):S7. doi: 10.1186/1471-2105-7-S1-S7
- McDonald D, Clemente JC, Kuczynski J, et al.: The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome. GigaScience. 2012;1(1):7. doi: 10.1186/2047-217X-1-7
- Meyer PE, Lafitte F, Bontempi G: minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics. 2008;9:461. doi: 10.1186/1471-2105-9-461
- The Human Microbiome Project Consortium: A framework for human microbiome research. Nature. 2012;486(7402):215–221. doi: 10.1038/nature11209
- The Qiita Development Team: Qiita: report of progress towards an open access microbiome data analysis and visualization platform. In: 14th Python in Science Conference (SCIPY 2015), 2015.