Skip to main content
Advances in Bioinformatics logoLink to Advances in Bioinformatics
. 2017 Jul 18;2017:1278932. doi: 10.1155/2017/1278932

Empirical Comparison of Visualization Tools for Larger-Scale Network Analysis

Georgios A Pavlopoulos 1,*, David Paez-Espino 1, Nikos C Kyrpides 1, Ioannis Iliopoulos 2,*
PMCID: PMC5540468  PMID: 28804499

Abstract

Gene expression, signal transduction, protein/chemical interactions, biomedical literature cooccurrences, and other concepts are often captured in biological network representations where nodes represent a certain bioentity and edges the connections between them. While many tools to manipulate, visualize, and interactively explore such networks already exist, only few of them can scale up and follow today's indisputable information growth. In this review, we shortly list a catalog of available network visualization tools and, from a user-experience point of view, we identify four candidate tools suitable for larger-scale network analysis, visualization, and exploration. We comment on their strengths and their weaknesses and empirically discuss their scalability, user friendliness, and postvisualization capabilities.

1. Background

Health and natural sciences have become protagonists in the big-data world as high-throughput advances continuously contribute to the exponential growth of data volumes. Nowadays, biological repositories expand every day by hosting various entities such as proteins, genes, drugs, chemicals, ontologies, functions, articles, and the interactions between them, often leading to large-scale networks of thousands or even millions of nodes and connections. As such networks are characterized by different properties and topologies, graph theory comes to play a very important role by providing ways to efficiently store, analyze, and subsequently visualize them [15].

Visualization and exploration of biological networks at such scale are a computationally challenging task and many efforts in this direction have failed over the years. Recent review articles [3, 4, 6] discuss the challenges in the biological data visualization field and list a catalog of standalone and web-based visualization tools as well as the visual concepts they are implemented to serve. While these resources are valuable to capture the big picture in the field, get a sense of the available tools, and spot the strengths and the weaknesses of a tool of interest at a glance, no empirical feedback on the tools' scalability was obvious.

To shortly mention representative tools in the field, 2D standalone applications like graphVizdb [7], Ondex [8], Proviz [9], VizANT [10], GUESS [11], UCINET [12], MAPMAN [13], PATIKA [14], Medusa [15], or Osprey [16] as well as 3D visualization tools such as Arena3D [17, 18] and BioLayout Express [19] already exist. Each of them is designed to serve a different purpose. For example, Ondex is implemented to gather and manage data from diverse and heterogeneous datasets, Proviz is dedicated to handle protein-protein interaction datasets, VizANT focuses on metabolic networks and ecosystems, Medusa is able to show semantic networks and multiedged connections, GUESS supports dynamic and time sensitive data, Osprey is implemented to annotate biological networks, Arena3D is targeting multilayered graphs, and BioLayout Express is designed for generic advanced 3D network visualizations.

Despite the fact that such tools are widely used and have great potential for further development, to our experience, they are not recommended for large-scale network analysis in their current versions. UCINET windows application could potentially be used for just visualization purposes. Its absolute maximum network size is about 2 million nodes but, in practice, most of its procedures are too slow to run networks larger than about 5,000 nodes.

Among several existing tools that we tested, we find Cytoscape (v3.5.1) [20], Tulip (v4.10.0) [21], Gephi (v0.9.1) [22], and Pajek (v5.01) [23, 24] standalone applications to be the top four candidates for visualization, manipulation, exploration, and analysis of very big networks. For these four tools, we empirically evaluate their pros and cons, we comment on their scalability, user friendliness, layout speed, offered analyses, profiling, memory efficiency, and visual styles, and we provide tips and advice on which of their features can scale and which of them is better to avoid.

In order to show a representative visualization generated by these four tools, we constructed a graph consisting of 202,424 nodes and 354,468 edges showing the habitat distribution of 202,417 protein families across 7 habitats. Data was collected from the IMG integrated genome and metagenome comparative data analysis system [25] whereas protein families originate from public metagenomes only.

A step-by-step protocol describing how these images were generated is presented as Supplementary Material, available online at https://doi.org/10.1155/2017/1278932. Comments on problems which occurred during our analysis as well as drawbacks and strengths of the visualization tools used for the purposes of this review are extensively discussed.

2. Top Four Candidates for Large-Scale Network Visualization

2.1. Gephi (Version 0.9.1)

Gephi is free open-source, leading visualization and exploration software for all kinds of networks and runs on Windows, Mac OS X, and Linux. It is our top preference as it is highly interactive and users can easily edit the node/edge shapes and colors to reveal hidden patterns. The aim of the tools is to assist users in pattern discovery and hypothesis making through efficient dynamic filtering and iterative visualization routines. As a generic tool, it is applicable to exploratory data analysis, link analysis, social network analysis, biological network analysis, and poster creation.

2.1.1. Scalability

Gephi comes with a very fast rendering engine and sophisticated data structures for object handling, thus making it one of the most suitable tools for large-scale network visualization. It offers very highly appealing visualizations and, in a typical computer, it can easily render networks up to 300,000 nodes and 1,000,000 edges. Compared to other tools, it comes with a very efficient multithreading scheme, and thus users can perform multiple analyses simultaneously without suffering from panel “freezing” issues.

2.1.2. Layouts

In large-scale network analysis, fast layout is a bottleneck as most sophisticated layout algorithms become CPU and memory greedy by requiring long running time to be completed. While Gephi comes with a great variety of layout algorithms, OpenOrd [26] and Yifan-Hu [27] force-directed algorithms are mostly recommended for large-scale network visualization. OpenOrd, for example, can scale up to over a million nodes in less than half an hour while Yifan-Hu is an ideal option to apply after the OpenOrd layout. Notably, Yifan-Hu layout can give aesthetically comparable views to the ones produced by the widely used but conservative and time-consuming Fruchterman and Reingold [28]. Other algorithms offered by Gephi are the circular, contraction, dual circle, random, MDS, Geo, Isometric, GraphViz, and Force atlas layouts. While most of them can run in an affordable running time, the combination of OpenOrd and Yifan-Hu seems to give the most appealing visualizations. Descent visualization is also offered by OpenOrd layout algorithm if a user stops the process when ~50–60% of the progress has been completed. Of course, efficient parametrization of any chosen layout algorithm will affect both the running time and the visual result.

2.1.3. Postvisualization Analysis

Edge-bundling and famous clustering algorithms such as the MCL [29] do not come by default with Gephi but can be downloaded from Gephi's plugin library (~100 plugins). In addition, GeoLayout Gephi's plugin is very suitable to plot a network with geographical information. Coming to dynamic network visualization, Gephi is the forefront of innovation with dynamic graph analysis. Users can visualize how a network evolves over time by manipulating its embedded timeline. While visualization of a network over time is something very useful, its current algorithms are not suitable for large-scale networks. Similarly, for large-scale networks it is highly recommended for users to apply clustering algorithms using external command line applications and then import the clustering results to a visualization tool.

To study a network's topology, Gephi comes with a very basic but high quality network profiler showing basic statistics about the network such as the number of nodes, the number of edges, its density, its clustering coefficient, and other metrics. Automatically calculated node attributes such as node connectivity, clustering coefficient, betweenness centrality, or edge weight and similarly are trivial tasks and do not require long time to be calculated.

2.1.4. Editing

Gephi is highly interactive and provides clever shortcuts to highlight communities, and shortest paths or relative distances of any node to a node of interest are offered. Moreover, users can easily adjust or interactively filter the shapes and colors of the network's edges and nodes according to their attributes in order to reveal hidden patterns. It is not the purpose of this review to tutor how to use such applications as this can be found in the tool's relevant help pages. While Gephi is a great option for large-scale network visualization, manual network importing, multiple network handling, and manual node/edge/label editing can be tricky as many options are well hidden in Gephi's user interface or supported by specific plugins.

2.1.5. File Formats

Gephi can load networks in GEXF, GDF, GML, GraphML, Pajek (NET), GraphViz (DOT), CSV, UCINET (DL), Tulip (TPL), Netdraw (VNA), and Excel spreadsheets. Similarly, Gephi can export networks in JSON, CSV, Pajek (NET), GUESS (GDF), Gephi (GEFX), GML, and GraphML [30] files. The easiest way to talk with Cytoscape is through GraphML formats, with Tulip through GEFX files and with Pajek through NET files. Unfortunately, in its current version, communication with other tools through other common file formats such as JSON fails.

2.1.6. Availability

Regardless of its very limited documentation, Gephi is a great, generic, nondedicated to biology, 2D network visualization tool. It mainly emphasizes fast and smooth rendering, fast layouting, efficient filtering, and interactive data exploration and we believe that it remains one of the best options for generic large-scale network visualization. A network example visualized by Gephi is shown in Figure 1. Gephi is available at: https://gephi.org/.

Figure 1.

Figure 1

Gephi visualization of a network consisting of 202,424 nodes and 354,468 edges showing the distribution of 202,417 protein families across 7 habitats. A combination of OpenOrd and Yifan-Hu force-directed layout algorithm was used to calculate the node coordinates. Each habitat and its adjacent edges are colored uniquely. A step-by-step guide regarding the methods and the parametrization that were used is extensively described in the supplementary file.

2.2. Tulip (Version 4.10.0)

Tulip is one of the easiest-to-use network visualization tools and a decent option for visualization of larger-scale networks. Due to its simplicity, it is highly recommended for nonexperts as it comes with an easy-to-use interface. It is written in C++ and enables the development of algorithms, visual encodings, interaction techniques, data models, and domain-specific visualizations. Compared to other tools, it offers very appealing visualizations especially after enabling its great edge-bundling algorithm.

2.2.1. Scalability

In its current version, it is able to visualize thousands of nodes with hundreds of thousands of edges in an average computer and aims at becoming a great mediator between graph analysis and visualization. While Tulip is a top preference for medium-scale networks, to our experience it is not as scalable as Gephi.

2.2.2. Layouts

Its great plethora of layout algorithms makes it one of the best options for graph layout. At the moment, it supports simple (circular, random), force-directed (i.e., Fruchterman and Reingold [28], Kamada and Kawai [31]), hierarchical, multilevel, planar, and tree layout algorithms, most of them optimized and implemented within the Open Graph Drawing Framework (OGDF) [32]. As opposed to the more conservative force-directed layout algorithms, Fast Multipole Multilevel Layout is highly recommended for large-scale networks. While its layouts are of a great quality, in order to save time, the strategy to first calculate the nodes layout with Gephi or Pajek and then import to Tulip is highly recommended.

2.2.3. Postvisualization Analysis

By trying to bridge the gap between the analysis and visualization, Tulip comes with a rich pool of clustering and network topology analysis algorithms. Among others, Tulip currently implements the memory greedy but widely accepted greedy Markov Clustering (MCL) [29] as well as the fast and memory efficient Louvain Clustering [33] for unweighted graphs. In addition, Tulip incorporates various traditional algorithms for network exploration like algorithms to find the biconnected or the strongly connected components or algorithms dedicated to finding spanning trees or loops. Like before, for large-scale network analysis, running clustering algorithms externally is recommended.

In addition, Tulip comes with a very simple interface to ask topological questions. K-core decomposition of a graph, eccentricity centrality, degree, page rank, and betweenness centrality are few of the offered options and nodes' size or color can be adjusted according to a selected topological feature.

2.2.4. Editing

While Tulip does not come with a great variety of predefined color schemes, users can manually change the color, the size, and the shape of any node, label, or edge and save and reload the status of a network. Unfortunately, it can process one network per session and users must be careful as sometimes the visualization and the editing panels do not coordinate. Unfortunately, simple tasks such as interactively selecting the in/out edges of a node directly from the visualization can take significant amount of time.

2.2.5. Edge Bundling

While Tulip's renderer does not reach Gephi's or Cytoscape's resolution, it comes with one of the most appealing edge-bundling algorithms. Unfortunately, for large-scale network analysis, its edge-bundling algorithm can often become memory and CPU greedy so users must be patient. Finally saving the status of a bundled view compared to an unbundled view can lead to significantly higher storage requirements (see supplementary file for examples).

2.2.6. File Formats

It accepts as input simple tab delimited, Pajek, GEFX, GML, GraphViz, JSON, TLPB, and UCINET files and exports to TLP, SVG, JSON, and GML formats. The easiest way to talk with Pajek is through NET files, with Cytoscape through GML or GraphML files, and with Gephi through GEFX files. Finally, Tulip comes with a very powerful generator of graphs of a user-defined size and topology.

2.2.7. Availability

Overall, Tulip is a generic 2D network visualization tool with a self-explanatory user interface and is suitable for large-scale node and edge layouting and analysis. A network example visualized by Tulip is shown in Figure 2. Tulip is available at: http://tulip.labri.fr/TulipDrupal/.

Figure 2.

Figure 2

Tulip visualization of the same network like in Figure 1. The 7 habitats are highlighted and resized accordingly. An example of the same network after applying edge bundling is presented in the supplementary file. Nodes' coordinates were calculated using the Yifan-Hu layout algorithm from Gephi application.

2.3. Cytoscape (Version 3.5.1)

Cytoscape open-source Java application is the most widely used 2D network visualization tool in biology and health sciences. It supports all kinds of networks (e.g., weighted unweighted, bipartite, directed, undirected, and multiedged) and comes with an enormous library of additional plugins (>250). It was initially implemented to analyze molecular interaction networks and biological pathways and was aiming at integrating these networks with annotations, gene expression profiles, and other state data. Although Cytoscape was originally designed for biorelated research, now it serves as a generic platform for complex network analysis and visualization by providing a basic set of features for data integration, analysis, and visualization.

2.3.1. Scalability

Cytoscape implementations after version 3.0.0 come with tremendous rendering improvements, thus allowing Cytoscape to visualize large networks of hundred thousand nodes and edges. Despite these improvements, Cytoscape does not rank first for large-scale network analysis as it cannot scale significantly when it comes to analysis. Often Cytoscape's clustering and layout routines need great amount of memory and time. Therefore, for large-scale network analysis, it is suggested to run such processes in command line outside Cytoscape platform and load the results as node/edge attributes (groups in the case of clustering or coordinates in the case of a layout). In addition, Cytoscape is subject to Java's memory and running time limitations as most of its routines are implemented in Java.

2.3.2. Layouts

Like other tools, it comes with a very rich variety of simple (grid, random, and circular) or more sophisticated (force-directed, hierarchical) layout algorithms. Notably, for large-scale network analysis, users must be careful and change the default layout algorithm before creating a view. A simple grid or a simple circular layout is recommended as Cytoscape's force-directed layouts are memory and CPU greedy and the application might “hang.” Another alternative could be OpenCL, one of the fastest layouts algorithms in Cytoscape. After version 3.2.0 OpenCL-based version is incorporated as a basic application. This layout is up to 100 times faster than the standard Prefuse layout and depends on the CyCL core app for OpenCL support. Nevertheless, calculating a first layout with Gephi or Pajek and then importing its results in Cytoscape can save time.

2.3.3. Postvisualization Analysis

Cytoscape is the most successful tool for bridging the gap between analysis and visualization and it comes with a great plethora of layout, clustering, and topological network analysis algorithms. ClusterMaker plugin [34], for example, includes attribute cluster algorithms such as AutoSOME Clustering [35] and Eisen's hierarchical and k-Means clustering [36] as well as topology-based clustering algorithms such as affinity propagation [37], community clustering (GLay) [38], MCODE [39], MCL, SCPS (Spectral Clustering of Protein Sequences) [40], and transitivity clustering [41]. Most clustering results can be visualized as a newly constructed network preserving the original edges, or as a heatmap. Like before, for large-scale network analysis, users are encouraged to run such algorithms externally.

In addition, Cytoscape incorporates one of the most advanced network profilers to explore network topological features. Users are able to view simple statistics like the average connectivity, betweenness centrality, clustering coefficient, and others. While such calculations are trivial for large-scale networks, plotting a topological feature against any other could be slow.

Finally, Cytoscape's latest versions incorporate a rather useful but slow and memory inefficient edge-bundling algorithm, not recommended for large-scale analysis.

2.3.4. Editing

Cytoscape is a protagonist in offering predefined visual styles and color schemes to create high quality and aesthetically beautiful visualizations. Its zooming and panning capabilities are very advanced and Cytoscape's satellite viewer makes it very easy for users to navigate and orient when the network is drawn outside the main canvas, something that is not trivial with Gephi. Finally, choosing adjacent nodes and edges from the UI is very responsive.

2.3.5. File Formats

Cytoscape accepts many different input file formats such as its own CYS format, tab delimited, simple interaction file format (SIF), nested network format (NNF), graph markup language (GML), extensible graph markup and modelling language (XGMML), SBML [42], BioPAX [43], PSI-MI [44], GraphML, excel workbooks (.xls, .xlsx), and JSON. The easiest way to talk with Tulip and Gephi is through a GML format.

2.3.6. Availability

Overall, Cytoscape is the best visualization tool today for biological network analyses. Despite its user friendliness, its rich documentation, and the tremendous improvement of its user interface after version 3.0, familiarity with the tool and its available plugins still requires a steep learning curve for more advanced tasks. Cytoscape store currently hosts more than 250 plugins, specifically designed to address and automate complicated biological analyses. Plugins for functional enrichment, Gene Ontology annotations [45], gene name mapping, integration with biological public repositories, efficient online data retrieval, pathway analysis, direct network comparisons, differential expression, and statistical analysis make Cytoscape unique of its kind and therefore today it currently is and expected to remain the number-one player for biological network analysis. A network visualized by Cytoscape is shown in Figure 3. Cytoscape is available at http://www.cytoscape.org/.

Figure 3.

Figure 3

Cytoscape visualization of the same network like in Figure 1. The network consists of 202,424 nodes and 354,468 edges. The 7 habitats are colored accordingly. Like in Figure 2, coordinates were calculated using the Yifan-Hu layout algorithm from Gephi application.

Finally, CytoscapeWeb [46] and Cytoscape.js are separate projects. They are two very strong efforts aiming to incorporate Cytoscape's main visual functionalities in browser-based applications, something that of course is not suitable for large-scale network analysis. Users can use Cytoscape and export the networks in JSON format for Cytoscape.js.

2.4. Pajek (Version 5.01)

Pajek is a generic, more than 20 years old, Microsoft Windows based network visualization tool, initially implemented for social network analysis, yet a very powerful application for analysis and visualization of massive networks.

2.4.1. Scalability

Pajek can easily visualize million nodes with billion connections in an average computer by outperforming any other available tool in the field. Pajek-XXL is a special implementation of Pajek with emphasis on huge scale network analysis. It needs at least 2-3 times less physical memory than Pajek and most of Pajek's memory intensive operations are optimized to be much faster. The main philosophy of Pajek-XXL is to extract smaller but most interesting and informative parts of a larger network which can be further analyzed and visualized with more advanced tools. The highest possible number of vertices that Pajek64-XXL can handle has been increased to 2 billion as for ordinary Pajek the limit is 100 million. Pajek-XXL uses 32-bit (4 bytes) integers for vertices numbers. Thus, the highest number of vertices that Pajek-XXL can handle is set to two billion. If network contains more vertices Pajek-3XL must be used. Pajek-3XL uses 64-bit (8 bytes) integers for vertices numbers. The highest number of vertices that Pajek-3XL can handle is currently set to 10 billion but can easily be further increased. Notably, the space needed to store a network in Pajek-3XL and Pajek-XXL is exactly the same.

2.4.2. Layouts

Graph layout, node merging, neighborhood detection, identification of strongly connected components, clique finding, manipulation of bipartite graphs, searching for shortest paths or maximum flows, clustering (i.e., Louvain), and computing centralities of vertices and centralizations of networks such as degree, closeness, betweenness, hubs and authorities, clustering coefficients, and Laplacian centrality are few of Pajek's capabilities. Notably, Pajek is memory efficient and very suitable for fast sparse network multiplication.

2.4.3. File Format

Pajek accepts very strict file input formats. The easiest way to talk with Tulip and Gephi is through a .net file

Pajek's user interface is simple, easy to get familiar with, and very responsive when it comes to analysis of massive networks. It was never intended to be the most advanced visualizer but it offers tremendous graph analyses methodologies thus making it a great candidate for analysis of massive networks and a great complement to the existing tools. A network example visualized by Pajek is shown in Figure 4. Pajek can be found at http://mrvar.fdv.uni-lj.si/pajek/.

Figure 4.

Figure 4

Pajek basic visualization of the same network like in Figure 1. The network consists of 202,424 nodes and 354,468 edges. Like in Figures 2 and 3, coordinates were calculated using the Yifan-Hu layout algorithm from Gephi application. Notably for a massive network it is highly recommended to first use Pajek layouting.

3. Discussion

Despite the great plethora of available network visualization tools, due to the continuous increase of the data volume in health sciences, visualization and manipulation of large-scale networks with million nodes and edges still remain a bottleneck. While noninteractive libraries such as the Stanford Network Analysis Project (SNAP) [47], the outdated Large Graph Layout (LGL) [48], NetworkX [49], or the GraphViz [50] are preferred for backend calculations and large-scale static visualizations and while alternative network visualizations such as the ones offered by the Circos [51], HivePlots [52], or BioFabric [53] can partially solve the hairball effect, the implementation of user friendly interactive tools to handle and visualize such large graphs still remains a very complicated task. Therefore, for the purposes of this review article, we tested several available standalone applications and concluded that Pajek, Tulip Gephi, and Cytoscape are top candidates for large-scale network visualization and analysis.

In conclusion, while Cytoscape is the best and the most preferred tool for biological analyses, it has scalability and memory issues and therefore it is not our top pick for large-scale network visualization. On the contrary, we rank it first for biological analyses as it is accompanied by a great plethora of more than 200 plugins. Compared to Tulip, Gephi, and Pajek, it has the richest palette of predefined color styles, the most efficient collection of clustering algorithms, and the best network profiler for intranetwork comparison of topological features.

Gephi clearly outperforms Cytoscape in terms of scalability and memory efficiency and, in our opinion, it is the best generic visualization tool for layouting large-scale networks. While it is fairly straightforward to use, sometimes node/edge editing options are well hidden in its user interface thus making it a bit confusing for the user. On the other hand, Gephi offers very advanced visualizations by allowing users to perform multiple tasks simultaneously, something that is not always easy with Cytoscape or Tulip. Overall, we rank Gephi as first when it comes to balance between large-scale network visualization and basic analysis.

Tulip is our third best option for large-scale network visualization. Its best characteristics are (i) the edge-bundling layout and (ii) its simplicity in editing the node's/edge's colors, labels, and attributes. Tulip is highly recommended for beginners due to its self-explanatory user interface.

Finally, Pajek and Pajek-XXL are the most scalable tools and highly recommended for basic visualizations of massive networks with >10 billion nodes, network sizes that Cytoscape, Tulip, and Gephi cannot handle in their current versions. Unfortunately, the lack of operating system interoperability as well as the lack of input file format flexibility and the lack of appealing visualizations prevent Pajek from being the top tool for advanced visualizations.

All the aforementioned observations are summarized in Table 1. Even though they may vary from user to user depending on the expertise and the case study, in our opinion, Cytoscape, Tulip, Pajek, and Gephi still remain the best large-scale network visualization and analysis tools in systems and network biology.

Table 1.

Empirical evaluation of our top four interactive network visualization tools (Cytoscape, Gephi, Tulip, and Pajek) for large-scale biological network analysis.

Cytoscape Tulip Gephi Pajek
Scalability ∗∗ ∗∗∗ ∗∗∗∗
User friendliness ∗∗ ∗∗∗∗ ∗∗∗
Visual styles ∗∗∗∗ ∗∗ ∗∗∗
Edge bundling ∗∗∗ ∗∗∗∗ ∗∗
Relevance to biology ∗∗∗∗ ∗∗ ∗∗∗
Memory efficiency ∗∗ ∗∗∗ ∗∗∗∗
Clustering ∗∗∗∗ ∗∗∗ ∗∗
Manual node/edge editing ∗∗∗ ∗∗∗∗ ∗∗∗
Layouts ∗∗∗ ∗∗ ∗∗∗∗
Network profiling ∗∗∗∗ ∗∗ ∗∗∗
File formats ∗∗ ∗∗∗ ∗∗∗∗
Plugins ∗∗∗∗ ∗∗ ∗∗∗
Stability ∗∗∗ ∗∗∗∗ ∗∗∗
Speed ∗∗ ∗∗∗ ∗∗∗∗
Documentation ∗∗∗∗ ∗∗ ∗∗∗

= weaker; ∗∗ = medium; ∗∗∗ = good; ∗∗∗∗ = strongest.

4. Conclusion

It is unfair and not straightforward to directly compare visualization tools with each other as they are implemented to serve different purposes. Nevertheless, as biological network sizes increase over time, combining the complementary advantages from different tools is a good strategy. While several file formats to describe the structure of network have been standardized, our experience showed that many of them cannot be properly exported or imported across several tools. In addition, even in the best cases where such an import/export problem is absent, often node and edge attributes cannot be transferred. Therefore, we believe that a catholic network converted to accurately convert a file format into any other by simultaneously keeping the maximum information about the network's components is mandatory. This way, switching between tools and various visualizations will become easier and more straightforward.

Supplementary Material

A step-by-step guide demonstrating how the networks in the figures can be generated.

1278932.f1.docx (2MB, docx)

Acknowledgments

This work was supported by the US Department of Energy, Joint Genome Institute, a DOE Office of Science User Facility, under Contract no. DE-AC02-05CH11231, and used resources of the National Energy Research Scientific Computing Center, supported by the Office of Science of the US Department of Energy.

Additional Points

Availability of Data and Materials. The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors' Contributions

Georgios A. Pavlopoulos wrote the article, David Paez-Espino and Nikos C. Kyrpides provided the data for the figures, and Ioannis Iliopoulos supervised the study. All authors read and approved the final manuscript.

References

  • 1.Pavlopoulos G. A., Secrier M., Moschopoulos C. N., et al. Using graph theory to analyze biological networks. BioData Mining. 2011;4(1, article 10) doi: 10.1186/1756-0381-4-10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Pavlopoulos G. A., Malliarakis D., Papanikolaou N., Theodosiou T., Enright A. J., Iliopoulos I. Visualizing genome and systems biology: Technologies, tools, implementation techniques and trends, past, present and future. GigaScience. 2015;4(1, article no. 38) doi: 10.1186/s13742-015-0077-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Gehlenborg N., O’Donoghue S. I., Baliga N. S., et al. Visualization of omics data for systems biology. Nature Methods. 2010;7(3):S56–S68. doi: 10.1038/nmeth.1436. [DOI] [PubMed] [Google Scholar]
  • 4.O’Donoghue S. I., Gavin A.-C., Gehlenborg N., et al. Visualizing biological data—now and in the future. Nature Methods. 2010;7(3):S2–S4. doi: 10.1038/nmeth.f.301. [DOI] [PubMed] [Google Scholar]
  • 5.Pavlopoulos G. A., Iacucci E., Iliopoulos I., Bagos P. Interpreting the Omics 'era' Data. Smart Innovation, Systems and Technologies. 2013;25:79–100. doi: 10.1007/978-3-319-00375-7_6. [DOI] [Google Scholar]
  • 6.Pavlopoulos G. A., Wegener A. L., Schneider R. A survey of visualization tools for biological network analysis. BioData Mining. 2008;1:12. doi: 10.1186/1756-0381-1-12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Bikakis N., Liagouris J., Krommyda M., Papastefanatos G., Sellis T. GraphVizdb: A scalable platform for interactive large graph visualization. Proceedings of the 32nd IEEE International Conference on Data Engineering, ICDE 2016; May 2016; Helsinki, Finland. pp. 1342–1345. [DOI] [Google Scholar]
  • 8.Köhler J., Baumbach J., Taubert J., et al. Graph-based analysis and visualization of experimental results with ONDEX. Bioinformatics. 2006;22(11):1383–1390. doi: 10.1093/bioinformatics/btl081. [DOI] [PubMed] [Google Scholar]
  • 9.Iragne F., Nikolski M., Mathieu B., Auber D., Sherman D. ProViz: Protein interaction visualization and exploration. Bioinformatics. 2005;21(2):272–274. doi: 10.1093/bioinformatics/bth494. [DOI] [PubMed] [Google Scholar]
  • 10.Hu Z., Hung J.-H., Wang Y., et al. VisANT 3.5: Multi-scale network visualization, analysis and inference based on the gene ontology. Nucleic Acids Research. 2009;37(2):W115–W121. doi: 10.1093/nar/gkp406. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Adar E. GUESS: a language and interface for graph exploration. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems; 2006; Montreal, CA, USA. pp. 791–800.1124889 [Google Scholar]
  • 12.Borgatti S. P., Everett M. G., Freeman L. C. Ucinet for Windows: Software for Social Network Analysis. Harvard, Mass, USA: Analytic Technologies; 2002. [Google Scholar]
  • 13.Thimm O., Bläsing O., Gibon Y., et al. MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. Plant Journal. 2004;37(6):914–939. doi: 10.1111/j.1365-313X.2004.02016.x. [DOI] [PubMed] [Google Scholar]
  • 14.Demir E., Babur O., Dogrusoz U., et al. PATIKA: An integrated visual environment for collaborative construction and analysis of cellular pathways. Bioinformatics. 2002;18(7):996–1003. doi: 10.1093/bioinformatics/18.7.996. [DOI] [PubMed] [Google Scholar]
  • 15.Pavlopoulos G. A., Hooper S. D., Sifrim A., Schneider R., Aerts J. Medusa: A tool for exploring and clustering biological networks. BMC Research Notes. 2011;4, article no. 384 doi: 10.1186/1756-0500-4-384. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Breitkreutz B. J., Stark C., Tyers M. Osprey: a network visualization system. Genome Biology. 2003;4, article R22(3) doi: 10.1186/gb-2003-4-3-r22. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Secrier M., Pavlopoulos G. A., Aerts J., Schneider R. Arena3D: visualizing time-driven phenotypic differences in biological systems. BMC Bioinformatics. 2012;13(1, article 45) doi: 10.1186/1471-2105-13-45. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Pavlopoulos G. A., O'Donoghue S. I., Satagopam V. P., Soldatos T. G., Pafilis E., Schneider R. Arena3D: visualization of biological networks in 3D. BMC Systems Biology. 2008;2, article 104 doi: 10.1186/1752-0509-2-104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Theocharidis A., van Dongen S., Enright A. J., Freeman T. C. Network visualization and analysis of gene expression data using BioLayout Express (3D) Nature Protocols. 2009;4(10):1535–1550. doi: 10.1038/nprot.2009.177. [DOI] [PubMed] [Google Scholar]
  • 20.Shannon P., Markiel A., Ozier O., et al. Cytoscape: a software Environment for integrated models of biomolecular interaction networks. Genome Research. 2003;13(11):2498–2504. doi: 10.1101/gr.1239303. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Auber D. Tulip —a huge graph visualization framework. In: Jünger M., Mutzel P., editors. Graph Drawing Software. Berlin, Germany: Springer; 2004. pp. 105–126. (Mathematics and Visualization). [DOI] [Google Scholar]
  • 22.Jacomy M., Venturini T., Heymann S., Bastian M. ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software. PLoS ONE. 2014;9(6) doi: 10.1371/journal.pone.0098679.e98679 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Mrvar A., Batagelj V. Analysis and visualization of large networks with program package Pajek. Complex Adaptive Systems Modeling. 2016;4(6) [Google Scholar]
  • 24.Batagelj V., Mrvar A. Pajeka— program for large network analysis. Connections. 1998;21(2):47–57. [Google Scholar]
  • 25.Chen I. A., Markowitz V. M., Chu K., et al. IMG/M: integrated genome and metagenome comparative data analysis system. Nucleic Acids Research. 2016 doi: 10.1093/nar/gkw929. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Martin S., Brown W. M., Klavans R., Boyack K. W. OpenOrd: An open-source toolbox for large graph layout. Proceedings of the Visualization and Data Analysis 2011; January 2011; San Francisco Airport, Calif, USA. [DOI] [Google Scholar]
  • 27.Yifan H. Efficient, high-quality force-directed graph drawing. The Mathematica Journal. 2006;10(1) [Google Scholar]
  • 28.Fruchterman T. M. J., Reingold E. M. Graph drawing by force-directed placement. Software—Practice and Experience. 1991;21(11):1129–1164. doi: 10.1002/spe.4380211102. [DOI] [Google Scholar]
  • 29.Enright A. J., Van Dongen S., Ouzounis C. A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Research. 2002;30(7):1575–1584. doi: 10.1093/nar/30.7.1575. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Brandes U., Eiglsperger M., Lerner J., Pich C. Handbook of Graph Drawing and Visualization. 1999. Graph markup language (GraphML) pp. 517–541. [Google Scholar]
  • 31.Kamada T., Kawai S. An algorithm for drawing general undirected graphs. Information Processing Letters. 1989;31(1):7–15. doi: 10.1016/0020-0190(89)90102-6. [DOI] [Google Scholar]
  • 32.Chimani M., Gutwenger C., Jünger M., Klau G. W., Klein K. In: The Open Graph Drawing Framework (OGDF) Tamassia R., editor. London, UK: Chapman & Hall; 2014. [Google Scholar]
  • 33.Blondel V. D., Guillaume J., Lambiotte R., Lefebvre E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment. 2008;2008(10) doi: 10.1088/1742-5468/2008/10/P10008.P10008 [DOI] [Google Scholar]
  • 34.Morris J. H., Apeltsin L., Newman A. M., et al. ClusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics. 2011;12, article 436 doi: 10.1186/1471-2105-12-436. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Newman A. M., Cooper J. B. AutoSOME: A clustering method for identifying gene expression modules without prior knowledge of cluster number. BMC Bioinformatics. 2010;11, article no. 117 doi: 10.1186/1471-2105-11-117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Eisen M. B., Spellman P. T., Brown P. O., Botstein D. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America. 1998;95(25):14863–14868. doi: 10.1073/pnas.95.25.14863. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Frey B. J., Dueck D. Clustering by passing messages between data points. American Association for the Advancement of Science. Science. 2007;315(5814):972–976. doi: 10.1126/science.1136800. [DOI] [PubMed] [Google Scholar]
  • 38.Newman M. E. J., Girvan M. Finding and evaluating community structure in networks. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics. 2004;69(2):1–26113. doi: 10.1103/PhysRevE.69.026113.026113 [DOI] [PubMed] [Google Scholar]
  • 39.Bader G. D., Hogue C. W. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics. 2003;4(2) doi: 10.1186/1471-2105-4-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Nepusz T., Sasidharan R., Paccanaro A. SCPS: A fast implementation of a spectral method for detecting protein families on a genome-wide scale. BMC Bioinformatics. 2010;11, article no. 120 doi: 10.1186/1471-2105-11-120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Wittkop T., Emig D., Lange S., et al. Partitioning biological data with transitivity clustering. Nature Methods. 2010;7(6):419–420. doi: 10.1038/nmeth0610-419. [DOI] [PubMed] [Google Scholar]
  • 42.Hucka M., Finney A., Sauro H. M., et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics. 2003;19(4):524–531. doi: 10.1093/bioinformatics/btg015. [DOI] [PubMed] [Google Scholar]
  • 43.Luciano J. S., Stevens R. D. E-Science and biological pathway semantics. BMC Bioinformatics. 2007;8(3, article no. S3) doi: 10.1186/1471-2105-8-S3-S3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Hermjakob H., Montecchi-Palazzi L., Bader G., et al. The HUPO PSI's Molecular Interaction format—a community standard for the representation of protein interaction data. Nature Biotechnology. 2004;22(2):177–183. doi: 10.1038/nbt926. [DOI] [PubMed] [Google Scholar]
  • 45.Ashburner M., Ball C. A., Blake J. A., et al. Gene ontology: tool for the unification of biology. Nature Genetics. 2000;25(1):25–29. doi: 10.1038/75556. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Lopes C. T., Franz M., Kazi F., Donaldson S. L., Morris Q., Bader G. D. Cytoscape web: An interactive web-based network browser. Bioinformatics. 2010;26(18):2347–2348. doi: 10.1093/bioinformatics/btq430.btq430 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Leskovec J., Sosi R. SNAP: a general-purpose network analysis and graph-mining library. ACM Transactions on Intelligent Systems and Technology. 2016;8(1):1–20. doi: 10.1145/2898361. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Adai A. T., Date S. V., Wieland S., Marcotte E. M. LGL: Creating a map of protein function with an algorithm for visualizing very large biological networks. Journal of Molecular Biology. 2004;340(1):179–190. doi: 10.1016/j.jmb.2004.04.047. [DOI] [PubMed] [Google Scholar]
  • 49.Hagberg A., Schult D., Swart P. Exploring Network Structure, Dynamics, and Function using Network. Proceedings of the 7th Python in Science Conference (SciPy 2008); 2008; pp. 11–15. [Google Scholar]
  • 50.Gansner E. R., North S. C. An open graph visualization system and its applications to software engineering. Software—Practice & Experience. 2000;30(11):1203–1233. [Google Scholar]
  • 51.Krzywinski M., Schein J., Birol I., et al. Circos: An information aesthetic for comparative genomics. Genome Research. 2009;19(9):1639–1645. doi: 10.1101/gr.092759.109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Krzywinski M., Birol I., Jones S. J., Marra M. A. Hive plots-rational approach to visualizing networks. Briefings in Bioinformatics. 2012;13(5):627–644. doi: 10.1093/bib/bbr069. [DOI] [PubMed] [Google Scholar]
  • 53.Longabaugh W. J. R. Combing the hairball with BioFabric: A new approach for visualization of large networks. BMC Bioinformatics. 2012;13(1, article no. 275) doi: 10.1186/1471-2105-13-275. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

A step-by-step guide demonstrating how the networks in the figures can be generated.

1278932.f1.docx (2MB, docx)

Articles from Advances in Bioinformatics are provided here courtesy of Wiley

RESOURCES