Skip to main content
F1000Research logoLink to F1000Research
. 2018 May 14;6:2120. Originally published 2017 Dec 11. [Version 2] doi: 10.12688/f1000research.13254.2

Simulation and visualization of multiple KEGG pathways using BioNSi

Adva Yeheskel 1, Adam Reiter 2, Metsada Pasmanik-Chor 1, Amir Rubinstein 2,a
PMCID: PMC6008849  PMID: 29946422

Version Changes

Revised. Amendments from Version 1

Some details were clarified and emphasized in response to the referees' reports. The following changes were made in the 2nd version:

  • Additional references added to the Introduction. 

  • The following was emphasized: BioNSi preserves all the alternative names for a KEGG entity. When merging pathways, even if two nodes have different alternative names, they will be fused into a single node. Furthermore, names of entities in the expression level file are matched against any of the alternative names of a node in the network.

  • An explanation about the scaling of nodes' states to the range 0-9 was added.

  • An explanation about the meaning of steady states in the 2 test cases was added to the Methods section ( Simulation)

  • Typo fixed: BooleSim instead of Boolsim. 

  • We changed "asynchronous" to "non-simultaneous", and added this column to Table 1.

  • We added a brief description to box 1 about additional regulation types in BioNSi.

  • We deleted the redundant paragraph from the Results.

Abstract

Motivation: Many biologists are discouraged from using network simulation tools because these require manual, often tedious network construction. This situation calls for building new tools or extending existing ones with the ability to import biological pathways previously deposited in databases and analyze them, in order to produce novel biological insights at the pathway level.

Results: We have extended a network simulation tool (BioNSi), which now allows merging of multiple pathways from the KEGG pathway database into a single, coherent network, and visualizing its properties. Furthermore, the enhanced tool enables loading experimental expression data into the network and simulating its dynamics under various biological conditions or perturbations. As a proof of concept, we tested two sets of published experimental data, one related to inflammatory bowel disease condition and the other to breast cancer treatment. We predict some of the major observations obtained following these laboratory experiments, and provide new insights that may shed additional light on these results.

Tool requirements: Cytoscape 3.x, JAVA 8

Availability: The tool is freely available at http://bionsi.wix.com/bionsi, where a complete user guide and a step-by-step manual can also be found.

Keywords: biological network, simulation, gene expression, KEGG, Cytoscape, BioNSi

Introduction

Modeling and simulation of biological regulatory networks is becoming an integral part of biological research nowadays 1. Such networks consist of nodes, representing, e.g., proteins, RNA, genes, nutrients, signals, etc. Edges represent the interactions between nodes, such as activation, inhibition, phosphorylation, etc. Simulation tools allow the analysis and visualization of the dynamics of these networks. Biological data, such as protein interactions and gene or protein expression are gathered and provide the required input for such tools. Indeed, in recent years many bioinformatics tools aiming at the simulation of biological networks have been published. The purpose of such tools is to elucidate the relationships between genes, proteins, pathways, or other biological entities involved, to shed new light on the experimental results, and to suggest possible directions for future research.

Simulation tools come in various flavors. Some focus on transcriptional regulatory networks, others on signal propagation or metabolic pathways. There are tools that require programing skills, while others provide a complete graphical user interface (GUI). The underlying mathematical models differ a lot too. Perhaps the most fundamental distinction is between continuous, Boolean and discrete models. Continuous network models typically apply differential equations, using real (as opposed to discrete) numbers to represent the system’s variables ( e.g., 2), while in Boolean network models variables may assume only one of two values, namely 0 (“OFF”, or inactive) and 1 (“ON”, or active) ( e.g., 3). Between the two "extremes" are discrete models, in which values are taken from a range of integers ( e.g., 0, 1, 2, …9). This is often a good compromise between the expressiveness of the model and its simplicity, and it is especially useful when only partial or imprecise biological knowledge of these systems is available 4.

Table 1 provides a summary of several network modeling and simulation tools properties. These tools are relatively new ones (most of them published within the last four years). In addition, we focused on biologist-friendly tools that require very little computational background and in particular no script writing skills. The last tool in Table 1 is BioNSi - Biological Network Simulator 5, whose extensions are the focus of this paper, as will be described later.

Table 1. A representative list of free, network simulation, tools for non-programming biologists.

A representative list of seven network simulation tools was obtained from OmicsTools. A summary of some important features is shown. V: denotes that a specific feature exists in the tool. Web/App: a web tool or downloadable software. Nodes levels: Boolean (B)/discrete (D)/continuous (C). Non-simultaneous update: defining delays on edges or priorities on nodes, which inflict order on the update of nodes. Asynchronous update: non-deterministic behavior which enables several successors to a single network state. Batch simulation mode: running a collection of simulation with various parameters to reveal emergent network properties. Edges weights: weights represent strength of interaction. Network generation: textual (T)/graphical (G). A star (*) represents features implemented in the extended BioNSi tool (but not in the original version).

Tool (ref) Web/App Nodes
levels
Non
simultaneous
update
Asynchronous
update
Batch
simulation
mode
Edges
weights
Network
generation
Merge
pathways
Upload
expression
data
SimBoolNet 6 app C V V T
GINsim 7 app D V V V V G
SQUAD 8 app B+C V T+G
ViSiBooL 9 app B V V T+G
ANIMO 10 app+web D V T+G
BooleSim 11 web B T+G
BioNSi 5 app D V V V T+G * *

In Table 1 we compare the following features:

  •     Web/App: whether the tool is web-based (web) or a stand-alone software or application (app)

  •     Nodes' levels: the granularity of the model variables, that is, whether nodes levels are Boolean (B), discrete (D) or continuous (C) variables

  •     Asynchronous update: whether the node level transition rules can be applied asynchronously, rather than simultaneously (for example, by specifying delays on edges, or defining priorities on edges)

  •     Batch simulation mode: whether the tool supports a batch mode of simulation, in which multiple initial conditions are explored simultaneously and global (emergent) properties such as the system steady states (attractors) are identified

  •     Edge weights: whether the edges in the network have weights to represent strength (amplitude) of interaction, in contrast to merely +1 (activation) and -1 (inhibition)

  •     Network generation: which format is supported for network generation - graphical (G), in which the network is drawn on the canvas, or textual (T), in which the user generates a table of nodes and edges

  •     Merge pathways: whether the tool supports merging of multiple pathways of interest into a single network.

  •     Upload expression data: whether expression data from e.g., laboratory experiments can be loaded into the network automatically.

Network simulation tools typically require manual network construction based on biological data. For networks beyond a very small scale of a few nodes, this process becomes tedious, time consuming, and error prone. Furthermore, conducting numerous simulations of a network under different biological conditions may require a tiresome and monotonous process of modifying network parameters for each set of conditions. This contributes to the discouragement of many biologists from integrating network simulation tools in their daily laboratory routine. This situation is extremely unfortunate, since a lot of network data based on lab experiments already exists in various common databases. Therefore, the integration of network simulation tools with biological data previously deposited into those databased provides novel opportunities for more comprehensive studies of these networks. Initiatives aiming at such integration may support biological understanding of previous laboratory results, and direct researchers to design new experiments.

There are several existing tools for the generation of mathematical models from KEGG pathway database information 12 (e.g., Büchel et al. 13, Wrzodek et al. 14). We aim at integrating a recently published Cytoscape application, BioNSi (Biological Network Simulator 5) with KEGG, exploiting this convenient simulation environment. For the sake of self-containment of this paper, Box 1 describes BioNSi at a high level. BioNSi was originally used for visualizing biological networks and simulating their time course behavior. A full user guide and a step by step manual can be found on the tool's website ( http://bionsi.wix.com/bionsi).

Box 1. BioNSi - A high level description.

BioNSi is a biological network simulation tool 5 . It allows constructing a regulatory network and simulating its behavior. The network consists of nodes (interactors) and edges (interactions). For example, nodes may represent genes, RNA or transcription factors, while edges represent interactions such as transcription, translation, phosphorylation, etc.

For each node, the user manually defines an initial expression level, termed initial state, an integer typically taken from the range between 0 and 9 (0 – no activity, 9 – high activity). Time in the model is also discrete, represented as clock "tics" or time steps ( t = 0, 1, 2,…). Nodes' states are updated in each time step by a transition rule (see below).

Each edge is assigned a weight which reflects the strength of the interaction between 2 nodes, and is either a positive integer (activation, translation, etc.) or a negative one (repression, inhibition, degradation, etc.). Degradation usually appears as a self-negative edge. The effect of node i on node j is the product of node i's state and the weight of the edge ij. Thus, the strength of this effect is proportional to both the weight of the edge and the relative level of activity of the source node. The "net" effect on a node is the sum of effects of all nodes on it. Figure A shows a "toy" network with 4 nodes (A, B, C, signal; colored rectangles), whose initial states are shown below their names. Edges are labeled with their weights and colored accordingly (green = activation; red = inhibition).

graphic file with name f1000research-6-16110-g0000.jpg

Figure A: a "toy" network in BioNSi

graphic file with name f1000research-6-16110-g0001.jpg

Figure B: graphical presentation of simulation results

A simulation starts with a given configuration of initial states, and consists of the repeated application of a transition rule, simultaneously to all nodes: the state of a node will increase, decrease, or remain unchanged when the "net" effect on it is, respectively, positive, negative or 0. The simulation ends when either a steady state is reached (two consecutive identical configurations of nodes' states), or a loop of configurations is detected.

The tool presents the course of simulation graphically (see Figure B). The network exhibits an infinite loop in which the states of nodes A, B and C repeatedly go up and down, while the signal node is constant at state 1. BioNSi also enables a batch mode, which is used to gather statistics on a set of simulations, to study global properties of the network ( e.g., the distribution of steady states, average time to get to each steady state, etc.).

BioNSi supports various additional features, which are described in the manual on the website. These include (i) delay on edges for non-simultaneous update of nodes, (ii) blocking edges, where nodes X1, X2, … can block the effect of some node Y on node Z at a specific time step. The blocking nodes can act together or each of them separately. And (iii) nodes that represent external signals (hormone injection, sunlight, etc.) with a pre-defined expression pattern whose behavior is pre-set by the user, etc.

Our new developments enable merging of multiple KEGG pathways into a single coherent network under the BioNSi framework. Although the files downloaded from KEGG may omit some protein-protein interactions details, they were found to be useful in presenting networks based on the researcher’s interest. Furthermore, uploading experimental expression data enables conducting a simulation of the network. The initial expression levels of network nodes (genes, proteins, etc.) may be obtained either from in-house experiments or from various expression databases ( e.g.,, GEO 15 or ArrayExpress 16. This allows easy simulation of the resulting network under different experimental conditions (expression values). For example, one can simulate the effect of inhibitors, activators, mutations, abnormal cell conditions or the effect of a new drug on the network's behavior, suggesting further experimental exploration. BioNSi is a Cytoscape application 17. Cytoscape is an open source bioinformatics software platform, which hosts various applications for visualization and analysis of networks. The benefit of this is that the BioNSi simulation results can be further extended with additional analyses based on other applications. The typical workflow with the new version of BioNSi is presented in Figure 1.

Figure 1. The extended BioNSi simulation tool.

Figure 1.

Gene expression data and selected KEGG pathway data-files are uploaded to BioNSi’s Cytoscape Application and analyzed. The resulting network can be modified and tuned. Simulation is performed for gene expression data resulting from various conditions, possibly with addition of artificial perturbations. Differences in the results of simulation under various conditions provide novel biological insights.

We first describe the details of our new extensions to BioNSi. Then, as a proof of concept, we demonstrate the advantages and usability of our tool by presenting the simulation and analysis of previously published experimental data from two sets of experiments: (1) the characterization of inflammatory bowel disease (IBD) conditions, and (2) treatment of MCF7 breast cancer cells with heregulin (HRG). These two test cases may provide insights into how our tool can be used to elucidate existing biological data, raise new hypotheses and suggest new lab experiments to test them.

Methods

Implementation

Cytoscape is an open source software platform for complex network analyses and visualization 17. It enables "plugging in" in the form of dedicated software called applications (Apps), which are developed by users. BioNSi 5 is one such application that provides a user friendly tool for simulating the dynamics of biological regulatory networks. It is implemented under Cytoscape version 3. Installation is easy and explained in the tool's website at http://bionsi.wix.com/bionsi. We have extended BioNSi’s capabilities with two key features presented in the next sections.

Operation

Importing and merging multiple KEGG pathways into BioNSi. The original version of BioNSi allowed creating networks “ de novo”, by adding nodes and edges manually, according to the biological knowledge of the researcher. This may be a tedious task, when the networks are beyond a very small size. The new feature allows easily importing multiple pathway files that were exported from the KEGG database. The pathways are simply downloaded from KEGG as xml files, and merged into a single, integrated network under BioNSi. The user guide on BioNSi's website explains the details for this operation. Briefly, nodes contained in several entities often have alternative pathways are fused into a single node, with all their incoming and outgoing edges. KEGG names are all preserved for each node to allow matching of the same entity with different KEGG name.

The user can specify the desired weights for each KEGG type of edge, such as activation, inhibition, phosphorylation, etc., or rely on the default values provided. As biological materials are routinely being degraded, "self-inhibition" loops are automatically added to each node. Subsequently, the resulting network can be "tuned" by adding or removing specific nodes and edges, or by changing specific parameters ( e.g., the weight of a specific edge). Thus, constructing large-scale integrative networks from known pathways is done with little effort, and enables inspecting biological pathways at a broader biological context.

Importing nodes’ initial states. The second new feature allows the initial nodes' states obtained either from in-house experiments or from various expression databases to be loaded from a simple text file (a two-column comma separated values (csv) text file, in which the left column contains nodes' names and the right one contains the desired initial levels). Such a file can be easily generated from most existing gene expression databases, using e.g., Microsoft Excel. Since existing expression data sources may have different scales ( e.g., logarithmic vs. linear), nodes' initial states are read from the input file and normalized to the range 0–9 for consistency and convenience. Note that the initial states are scaled to 0–9 for the whole network based on the maximal expression level in the network, rather than for the range of each gene separately. Since the scaling depends of the maximal expression level of a node in the network, an extreme outlier may restrict the levels of other nodes to a very limited range. Therefore, the use of logarithmic expression data is recommended.

In addition, we extended BioNSi with a graphical representation of the nodes' levels: nodes' background colors indicate their initial states (higher levels are darker gray). Once a simulation ends, nodes colors are automatically updated to reflect their final state, providing a visual image of the incline or decline in nodes' expression. This new feature allows easy comparison between the network's behavior under different experimental conditions, by loading alternative sets of initial states and conducting a new simulation each time.

Simulation. We explored two use-cases as a proof of concept, as described in the next section. In both use cases, KEGG pathways were integrated to form a BioNSi network, and gene expression analysis was used as initial nodes' states. In both use cases, the states of nodes changed throughout the simulation process, terminating in a steady state. This change in states may reflect one of two situations: (i) the expression data itself does not represent a biological steady state, and therefore the changes in-silico reflect changes in-vivo, and (ii) data is missing from the network, either structural (missing regulators or connections between existing nodes) or quantitative (weights on edges, initial states, etc.). In the latter case the conclusion is that the model needs further improvement.

In the two test cases described in the Use Cases section, we conducted simulations for 100 time-steps. All simulations reached a steady state within this number of steps. In order to compare between simulation results, we had to define when a node is considered to exhibit different behavior in the two simulations. We used a simple Euclidian distance measure between the two simulation trajectories of each node. Furthermore, we filtered out nodes whose difference in the two simulations is merely a result of a different initial state. We did that by considering only nodes for which the difference between the simulations, at some time step, was larger than the difference in the initial state. In other words, the gap between the node's states in the two compared simulations should increase at some point beyond the initial gap, in order for it to be considered differential between the simulations. An implementation of this distance function can be found as a Python script (diff.py) that can be downloaded from the tool's website.

Use cases

Case study 1: Characterization of inflammatory bowel disease (IBD) conditions

IBD are a series of chronic inflammations of the intestine. Pouch surgery is a useful treatment of IBD patients, but pouch inflammation is a very common outcome 18. Crohn’s-like disease of the pouch (CLDP) is the most severe inflammation condition. Activation of NFKB was identified as one of the key regulators inducing IBD inflammation by promoting pro-inflammatory cytokines 19, 20. Parthenolide, a strong NFKB inhibitor, has recently been demonstrated to be a promising therapeutic agent in IBD inflammation, promoting apoptosis of cancer cells 21. We aim at extending the original study analyzing differentially expressed genes and their functional enrichment, to suggest new targets for treatment.

Expression data source. Affymetrix GeneChip (Human Gene 1.0 ST arrays) was used for gene expression analysis of normal and CLDP donors, as previously described 18.

Pathway data. As reported in 18, 74 genes were found to be up-regulated in CLDP vs normal expression (pFDR<0.05 and fold-change difference=5). WebGestalt KEGG pathway enrichment 22 revealed enriched 16 pathways (pBonf<0.05), of which the top 5 were selected for analysis ( Table 2). In addition, IBD pathway was added, although not specifically enriched in the analysis (total of 6 pathways).

Table 2. Five top pathways enriched (adjP) in differentially expressed gene analysis 16, in addition to inflammatory bowel diseases (IBD) pathway.

Top human KEGG pathways were obtained and enrichment p-values presented. Links to KEGG xml file download are provided.

KEGG enrichment KEGG
(hsa)
adjP Link to xml files
Cytokine-cytokine receptor
interaction
04060 1.50e-11 http://www.genome.jp/kegg-bin/show_pathway?hsa04060
Leishmaniasis 05140 6.52e-10 http://www.genome.jp/kegg-bin/show_pathway?hsa05140
Rheumatoid arthritis 05323 3.49e-09 http://www.genome.jp/kegg-bin/show_pathway?hsa05323
Chemokine signaling pathway 04062 1.98e-08 http://www.genome.jp/kegg-bin/show_pathway?hsa04062
Hematopoietic cell lineage 04640 7.13e-06 http://www.genome.jp/kegg-bin/show_pathway?hsa04640
IBD 05321 Not enriched http://www.genome.jp/kegg-bin/show_pathway?hsa05321

NetworkAnalyzer out-degree analysis. Network analysis was performed using NetworkAnalyzer tool, which is another built-in Cytoscape application 23. We used it in order to visualize the most out-connected (notably high out-degree) genes in the network. The most outstanding such node is NFKB1 (19 outgoing edges, see Table 3). Therefore, an additional NFKB signaling pathway ( hsa04064) was added from KEGG. Altogether, 7 KEGG pathways were analyzed, composed of 379 nodes (mostly genes).

Table 3. Node distribution by out degree score.

Cytoscape’s NetworkAnalyzer tool was used to present out-degree score. NFKB1 has an outstanding out-degree score.

Out-degree 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Number of nodes 0 138 155 51 18 11 2 3 0 0 0 0 0 0 0 0 0 0 0 1 (NFKB1)

In the resulting merged network ( Figure 2), in addition to genes, compounds and complexes were also detected (presented as triangles and hexagons, respectively). These were automatically assigned expression value 0 as no information could be found for them in the expression data.

Figure 2. The main component of the CLDP test case network.

Figure 2.

We show only 241 out of 379 genes, which belong to the main connected subnetwork resulting from 7 KEGG pathways (excluding subnetworks of 8 nodes or less). We used Cytoscape’s hierarchical layout, in which nodes with no outgoing edges appear at the top. Gene’s fill color is according to control expression values (grayscale), except nodes that are different between the simulation of CLDP and control expression, which are marked in red, and NFKB1 which is colored turquoise. Edges are colored according to type of interaction (activation–green; inhibition-red). For convenience, genes or proteins are represented as rectangles, compounds as triangles and complexes (dimers) are hexagons.

Edge weights. We used edge weights as following: inhibition -1, activation +2. As biological materials are being degraded over time, self-inhibition edges (weight -1) were automatically added for each node in the network. Accordingly, dephosphorylation (-p) edges are set to -1. Activation edges, including phosphorylation (+p) were set to +2, to compensate for the degradation edges automatically added to all nodes.

Results. We aim at comparing between two simulations: 1) Control gene expression; 2) CLDP gene expression. The network consists of 379 nodes. The comparison revealed 49 nodes that exhibited different simulation trajectories (genes, compounds and complexes; 12.9% of all nodes in network). 87.8% of these (43 nodes) were downstream in the network, namely, had no outgoing edges (except for self-loops that all nodes have). To visualize this we used Cytoscape’s hierarchical layout, as shown in Figure 2. The nodes that differed between the two simulations are colored red. The hierarchical layout positions these nodes at the top of the network. NFKB1 gene (highlighted in turquoise) is highly connected within the network, but not to any of the selected (red) genes.

A large fraction of the network’s genes is not directly connected to the main network, but forms small independent sub-networks (138 nodes with 8 nodes or less). This is a result of selecting only 7 KEGG pathways for the analysis. Naturally, the analysis is highly dependent on the KEGG pathways selected. Disconnected sub-networks may be connected to the main network following the inclusion of additional, less enriched KEGG pathways. However, these subnetworks follow the same pattern seen in the main network, and all of the nodes differing in the 2 simulations are at the top of the hierarchy ( Supplement 1; http://www.cs.tau.ac.il/~amirr/files/supp/testcase1_files.zip). For clarity, Figure 2 shows only the main connected subnetwork (241 nodes; 138 nodes were removed from Figure 2, all harboring connected subnetworks with 8 or less nodes, presented in Supplement 1).

In terms of biological function, most of the 49 nodes that exhibit differences between the simulations are of genes that are either immune system receptors (receptors for chemokines, cytokines, interferons, interleukins, PDGF, growth factors (MET) etc.), or various kinases (BTK, MAPK14, MAP3K7, IKBKB, FLT3 and FLT4). Many therapeutic agents are offered based on such molecules 24. Several cytokines (that may be targeted by JAK inhibition), were found within the BioNSi resulting nodes that exhibited different simulation trajectories: IL6ST, IL6R and OSMR (gp130 family member), IL10RB (IL10 members), IL2RB, IL7R (γc family) and CSF1R (βc family), and IFNAR1, IFNAR2 (interferon related genes) 25. This analysis may suggest interesting new targets for drug response.

Case study 2: Treatment of MCF7 breast cancer cells with heregulin (HRG)

The purpose of the following example is to demonstrate the effect of HRG treatment on MCF7 breast cancer cells, based on published GEO experimental results ( GSE6462; 24, 25. The simulation results suggest that BioNSi is capable of recapitulating the major changes in the network after drug treatment, as observed by in vitro experiment.

Expression data source. MCF7 breast cancer cells were treated with a growth hormone, HRG, at four different doses and different time course (GSE6462). We used expression data after maximal treatment of 10 nM HRG for 90 minutes. In the experiment, the mRNA expression levels were measured for control (GSM148517) and HRG treated (GSM148572) cells (using Affymetrix Human Genome U133A 2.0 Arrays).

Pathway data. KEGG-xml files for three selected human pathways were exported from the KEGG database 12: ERBB signaling pathway ( hsa04012); MAPK signaling pathway ( hsa04010); EGFR cytosine kinase inhibitor resistance pathway ( hsa01521). These three representative pathways were reported to be crucial for the HRG treatment response 26. The merged network based on these three KEGG pathways consisted of 190 nodes.

Setting network parameters and post-import modification. We used edge weights as follows: inhibition -1, activation +2. As biological materials are being degraded over time, self-inhibition edges (weight -1) were automatically added for each node in the network. Accordingly, dephosphorylation (-p) edges are set to -1. Activation edges (including phosphorylation; +p) were set to +2, to compensate for the degradation edges automatically added to all nodes. A node representing HRG treatment was added manually (diamond shape), either with initial state 0 (without drug), or with initial state 9 (active drug). The HRG node was directly connected to its known target nodes, ErbB3 and ErbB4, in addition to their homodimers nodes, with strong inhibition edges (weight -10 each). Protein homodimers (resulting from KEGG database) were given expression values similar to their monomer molecules (from the microarray mRNA expression results). Heterodimers were given initial state 0, as we have no data concerning their actual expression.

Simulations. A useful application of the BioNSi tool is to compare between the simulation results under different nodes' initial states (reflecting different biological conditions). As illustrated in Figure 3, we compared simulations of control (A) vs HRG treatment (B) gene expression, in addition to simulation of artificial HRG effect on control gene expression (C).

Figure 3. Schematic illustration of BioNSi simulations performed.

Figure 3.

Expression data from GSE6462 experiment was obtained for Control MCF7 breast cancer cells, without ( A) or with ( B) HRG treatment. ( C) BioNSi simulation of artificial HRG effect performed on control expression.

In order to simulate artificial HRG treatment effects on the network, we used control expression values as initial states, and the HRG node at initial state 9. Network’s simulation under these conditions resulted in a complete and rapid decline of ErbB3, ErbB4 and their homodimer expression values.

Results. We aimed at comparing between two simulation sets, based on the same network created: (1) Simulation that compares between HRG gene expression vs. control gene expression. The comparison between these simulations revealed 132 different genes (69.5% of all genes in the network (190)) whose expression pattern differs between the simulations of HRG vs Control. (2) Simulation of HRG artificial treatment effect on the same control gene expression, vs. control expression. This comparison revealed 33 genes (17% of the total 190 in network) that were found to be different. Figure 4 shows a Venn diagram that demonstrates these findings. 28 genes are common to both analyses. Artificial HRG treatment results (28 genes) represent ~21% of the experimental results (132 genes). 5 nodes were found to be uniquely changed by the prediction analysis, including the obviously changed (ERBB3, ERBB4 and ERBB4-ERBB4, which are directly inhibited by HRG simulation prediction, in addition to STAT5A).

Figure 4. Venn diagram comparing simulation of HRG expression microarray results with simulation of artificial HRG treatment prediction.

Figure 4.

28 genes were common to both analyses (pink). 5 genes were unique to simulation (orange), and 104 genes were unique to experimental analysis (blue).

Interesting observations concerning the resulting genes can be observed in the BioNSi screenshot of the network in Figure 5. All the 33 genes (framed pink and orange), resulting from the artificial HRG treatment simulation analysis compared to control, are highly connected, especially downstream of the network. This suggests an interesting signal path of the HRG drug response, which may offer new targets for treatment. We note that the PI3K/AKT/mTOR and STAT pathway genes are predominantly changed following HRG artificial treatment prediction, pathways that may affect cell proliferation and apoptosis, as recently suggested 5, 26.

Figure 5. HRG network view.

Figure 5.

A network of 190 genes consisting of 3 KEGG pathways (described in the text), using control expression values is shown. Gene frames color code: common genes to the two analyses (pink), unique to prediction of artificial HRG treatment vs control (orange), unique to simulation resulting from HRG treatment vs control expression analysis (blue) or unchanged (black). The network shows genes with nodes fill color grayscale according to expression values and edge colors according to type of interaction (activation–green; inhibition-red). For convenience, genes or proteins are represented as rectangles, compounds as triangles and complexes (dimers) are hexagons. HRG treatment is presented as orange-framed diamond.

Conclusions

The bio-technological revolution we are witnessing in recent decades enables generating unprecedented amounts of biological data of various types. This includes expression levels of RNA and proteins, and the interactions between them and other substances, such as nutrients, hormones, external stimuli etc. While these data are accessible to researchers, their analyses at the system level are less so. Extracting specific and local information is easy, but performing large scale network analyses requires computational approaches, such as simulation tools. However, existing network simulation tools are often meant for manual construction by the researcher of the network under study, and therefore remain inefficient for studying networks beyond a small scale. This is extremely unfortunate in an era when computational approaches to study biological systems are proliferating. Network simulations are one such important computational approach, which has proven efficient in providing new insights, shedding light on existing results, and suggesting new directions for exploration, both in the lab and on the computer. In order for simulation tools to become an integral part of the researchers' lab routine, tools that enable simulation of large scale network data are called for. Such tools must avoid a tedious network construction process, and allow a high level representation of networks to be generated from existing data semi-automatically with little effort and no computational skills.

In this paper we suggest an extension to an existing network simulation tool, BioNSi, that allows researchers to import multiple pathway data from the KEGG database into a single network representation. The tool then allows for the study of the network via simulations, under different biological conditions. We believe such an approach will foster the use of computational tools in the lab by researchers with little or no computational background.

Software and data availability

Tool requirements: Cytoscape 3.x, JAVA 8

Tool download: http://bionsi.wix.com/bionsi (a full user guide and a step by step manual can be found at the tool's website)

Tool source code: http://doi.org/10.5281/zenodo.1065352 27

Archived source code as at time of publication: http://doi.org/10.5281/zenodo.1065352 27

License: CC BY 4.0

Data source for test case 1: https://www.ncbi.nlm.nih.gov/pubmed/24108111

Data source for test case 2: ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE6nnn/GSE6462/matrix/GSE6462_series_matrix.txt.gz

Acknowledgements

We thank Varda Wexler from the Multimedia Unit in the Faculty of Life Sciences, Tel Aviv University, for creating the BioNSi logo, and Benny Chor from the School of Computer Science at Tel Aviv University for useful comments during manuscript preparation.

Funding Statement

The author(s) declared that no grants were involved in supporting this work.

[version 2; referees: 2 approved

Supplementary material

Test case 1 files: http://www.cs.tau.ac.il/~amirr/files/supp/testcase1_files.zip

Test case 2 files: http://www.cs.tau.ac.il/~amirr/files/supp/testcase2_files.zip

References

  • 1. Barabási AL, Oltvai ZN: Network biology: understanding the cell’s functional organization. Nat Rev Genet. 2004;5(2):101–13. 10.1038/nrg1272 [DOI] [PubMed] [Google Scholar]
  • 2. Mendes P, Hoops S, Sahle S, et al. : Computational modeling of biochemical networks using COPASI.In Methods Mol Biol. 2009;500:17–59. 10.1007/978-1-59745-525-1_2 [DOI] [PubMed] [Google Scholar]
  • 3. Li F, Long T, Lu Y, et al. : The yeast cell-cycle network is robustly designed. Proc Natl Acad Sci U S A. 2004;101(14):4781–6. 10.1073/pnas.0305937101 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Rubinstein A, Gurevich V, Kasulin-Boneh Z, et al. : Faithful modeling of transient expression and its application to elucidating negative feedback regulation. Proc Natl Acad Sci U S A. 2007;104(15):6241–6. 10.1073/pnas.0611168104 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Rubinstein A, Bracha N, Rudner L, et al. : BioNSi: A Discrete Biological Network Simulator Tool. J Proteome Res. 2016;15(8):2871–80. 10.1021/acs.jproteome.6b00278 [DOI] [PubMed] [Google Scholar]
  • 6. Zheng J, Zhang D, Przytycki PF, et al. : SimBoolNet--a Cytoscape plugin for dynamic simulation of signaling networks. Bioinformatics. 2010;26(1):141–2. 10.1093/bioinformatics/btp617 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Naldi A, Berenguier D, Fauré A, et al. : Logical modelling of regulatory networks with GINsim 2.3. Biosystems. 2009;97(2):134–9. 10.1016/j.biosystems.2009.04.008 [DOI] [PubMed] [Google Scholar]
  • 8. Di Cara A, Garg A, De Micheli G, et al. : Dynamic simulation of regulatory networks using SQUAD. BMC Bioinformatics. 2007;8(1):462. 10.1186/1471-2105-8-462 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Schwab J, Burkovski A, Siegle L, et al. : ViSiBooL-visualization and simulation of Boolean networks with temporal constraints. Bioinformatics. 2017;33(4):601–604. 10.1093/bioinformatics/btw661 [DOI] [PubMed] [Google Scholar]
  • 10. Schivo S, Scholma J, van der Vet PE, et al. : Modelling with ANIMO: between fuzzy logic and differential equations. BMC Syst Biol. 2016;10(1):56. 10.1186/s12918-016-0286-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Bock M, Scharp T, Talnikar C, et al. : BooleSim: an interactive Boolean network simulator. Bioinformatics. 2014;30(1):131–2. 10.1093/bioinformatics/btt568 [DOI] [PubMed] [Google Scholar]
  • 12. Kanehisa M, Furumichi M, Tanabe M, et al. : KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017;45(D1):D353–D361. 10.1093/nar/gkw1092 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Büchel F, Rodriguez N, Swainston N, et al. : Path2Models: large-scale generation of computational models from biochemical pathway maps. BMC Syst Biol. 2013;7:116. 10.1186/1752-0509-7-116 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Wrzodek C, Büchel F, Ruff M, et al. : Precise generation of systems biology models from KEGG pathways. BMC Syst Biol. 2013;7:15. 10.1186/1752-0509-7-15 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Barrett T, Wilhite SE, Ledoux P, et al. : NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 2013;41(Database issue):D991–5. 10.1093/nar/gks1193 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Kolesnikov N, Hastings E, Keays M, et al. : ArrayExpress update--simplifying data submissions. Nucleic Acids Res. 2015;43(Database issue):D1113–6. 10.1093/nar/gku1057 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Su G, Morris JH, Demchak B, et al. : Biological network exploration with Cytoscape 3. Curr Protoc Bioinformatics. 2014;47:8.13.1–24. 10.1002/0471250953.bi0813s47 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Ben-Shachar S, Yanai H, Baram L, et al. : Gene expression profiles of ileal inflammatory bowel disease correlate with disease phenotype and advance understanding of its immunopathogenesis. Inflamm Bowel Dis. 2013;19(12):2509–21. 10.1097/01.MIB.0000437045.26036.00 [DOI] [PubMed] [Google Scholar]
  • 19. Atreya R, Neurath MF: New therapeutic strategies for treatment of inflammatory bowel disease. Mucosal Immunol. 2008;1(3):175–82. 10.1038/mi.2008.7 [DOI] [PubMed] [Google Scholar]
  • 20. Ahmed S, Dewan MZ, Xu R: Nuclear factor-kappaB in inflammatory bowel disease and colorectal cancer. Am J Dig Dis. 2014;1(2):84–96. Reference Source [Google Scholar]
  • 21. Kim Y, Kim TK, Kim Y, et al. : Principal network analysis: identification of subnetworks representing major dynamics using gene expression data. Bioinformatics. 2011;27(3):391–8. 10.1093/bioinformatics/btq670 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Wang J, Duncan D, Shi Z, et al. : WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013. Nucleic Acids Res. 2013;41(Web Server issue):W77–83. 10.1093/nar/gkt439 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Doncheva NT, Assenov Y, Domingues FS, et al. : Topological analysis and interactive visualization of biological networks and protein structures. Nat Protoc. 2012;7(4):670–85. 10.1038/nprot.2012.004 [DOI] [PubMed] [Google Scholar]
  • 24. Neurath M: Current and emerging therapeutic targets for IBD. Nat Rev Gastroenterol Hepatol. 2017;14(11):688. 10.1038/nrgastro.2017.138 [DOI] [PubMed] [Google Scholar]
  • 25. De Vries LCS, Wildenberg ME, De Jonge WJ, et al. : The Future of Janus Kinase Inhibitors in Inflammatory Bowel Disease. J Crohns Colitis. 2017;11(7):885–93. 10.1093/ecco-jcc/jjx003 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Nagashima T, Shimodaira H, Ide K, et al. : Quantitative transcriptional control of ErbB receptor signaling undergoes graded to biphasic response for cell differentiation. J Biol Chem. 2007;282(6):4045–56. 10.1074/jbc.M608653200 [DOI] [PubMed] [Google Scholar]
  • 27. Rubinstein A: BioNSi - Biological Network Simulation Tool (Version 1.2). Zenodo. 2017. Data Source [Google Scholar]
F1000Res. 2018 Jun 14. doi: 10.5256/f1000research.16110.r34005

Referee response for version 2

Ester Feldmesser 1, Bareket Dassa 2

We believe that the extension tools for BioNSi, which are presented in the paper, provide useful network tools for biologists. Still, we suggest several improvements that are listed below.

  1. The feature to compare between two networks is useful for the user, however more detailed explanations on how to work with it and how to visualize the differences between networks are missing. We would like to suggest that this feature will be integrated in the BioNSi application.

  2. We would like to suggest that manual changes in the node states in the visualized network will affect the resulting node color in the simulation.

  3. Uploading an expression data file with negative expression values (log2) is currently not applicable and should be corrected.

  4. The names of the edges in the “Edge Table” panel are not informative and it is difficult to relate between the graphics and the table. In the example “toy1” network, there are two edges with the name “Edge_0” and one with “Edge_1”.

  5. The option to “Destroy Network” exists in two places in the top menu: in the “BioNSi” tab and in the “Edit” tab. Please note that it worked properly only when activated from the “BioNSi” tab.

We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

F1000Res. 2018 Jun 11. doi: 10.5256/f1000research.16110.r33996

Referee response for version 2

Aurélien Naldi 1

The revised article answered most of my reservations with the first version, some minor clarifications are still needed.

The sentence explaining the merging process of multiple pathways is not clear and seems to merge two different sentences: "nodes contained in several entities often have alternative pathways are fused into  single node". What happens if gene A is associated to genes B and C in one pathway and to genes C and D in another? Will we obtain:

  • a single node merging A,B,C, and D?

  • a node merging A and C, and separate nodes for B and D?

  • 4 different nodes?

I assume the authors selected the later solution, which facilitates the mapping of external data (as each node corresponds to a single biological entity), but increases the complexity of the network. Is that correct?

Regarding the mapping of external data, I understand from the answer of the authors that the provided csv file will map official gene symbols, HGNC IDS, as well as KEGG-IDs. It is clearly stated in the documentation, but could deserve a comment in the paper, along with the fact that unmapped nodes will be assigned the value 0.

While the misleading use if asynchronous in table 1 has been fixed, the text referring to this table in the introduction has not been updated.

A closing parenthesis is missing on page 4: "(e.g. GEO or ArrayExpress".

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

F1000Res. 2018 Feb 21. doi: 10.5256/f1000research.14379.r29782

Referee response for version 1

Giovanni Scardoni 1

I tried to simulate the toy example from the BioNSi website, following the step by step guide.

It doesn't work. Tried some other examples (cellcyclesB, Testcase1_network) and no results.

Maybe there is a bug (I am using Cytoscape 3.6.0) or there is something I'm missing. In the second case the step by step guide should be improved.

This doesn't allow me to provide a complete evaluation of the tool and the paper that seems interesting and well written. I will be happy to reconsider my review once such problems are solved.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

F1000Res. 2018 Mar 6.
Amir Rubinstein 1

Dear Giovanni,

​Thank you for the prompt feedback.

After carefully reviewing the data again, we found a bug in opening the toy.cys network on some combinations of Cytoscape, Java and operating systems. We are working on fixing it. In the meantime, we double checked and the csv format should work properly on all versions. 

Therefore, we temporarily removed the cys files from the examples tab, leaving only the csv files. In addition, we slightly changed the step-by-step pdf manual to match these changes.

We hope this will allow a complete evaluation of the new version of BioNsi.

Thank you,

Amir

F1000Res. 2018 Feb 5. doi: 10.5256/f1000research.14379.r29573

Referee response for version 1

Aurélien Naldi 1

This paper presents an extension of the BioNSi modeling software with two aims: (i) kickstart model construction by integrating KEGG pathways and (ii) use experimental expression data as initial states for the simulation. These aims make sense: facilitating the initial construction and evaluation of dynamical models is an important challenge in systems biology. However, I feel that details on the inner working and discussion of the limitations of this tool are missing from the paper. Some related work should also be taken into account. In particular Wrzodek et al. 1 and Büchel et al. 2 which generate qualitative models in SBML qual format 3 from KEGG pathways. My main concerns are detailed in the following text, none of them would be blocking taken separately, but my overall interpretation is that this work requires deep changes to become convincing.

I installed the BioNSi tool in Cytoscape 3.4. The documentation states that building a network from KEGG pathways requires the selection of a folder containing one or several KEGG xml files and a CSV file with expression data. I tried to provide a folder containing two KEGG pathways and an empty CSV file, but it did not work (error message: "number of expression files is not 1"). I had the same problem with Cytoscape 3.6.

KEGG pathways often "merge" multiple biological entities in a single node, the paper does not say how this case is handled, the documentation only states that ellipsis dots are removed from the name. Does it mean that these groups are preserved? How does this affect merging multiple KEGG pathways? It can be a concern as the same entity could be involved in different groupings in different KEGG pathways.

More generally, it would make sense to separate the transformation of KEGG pathways into discrete models (taking into account previous work 1 , 2), and the merging of multiple discrete models, which is a more general problem, and could be used to integrate several existing models with new models generated from pathway maps.

The import of initial states from experimental data, and in particular the discretization step deserve further discussion. For a generic tool, I suspect that it uses a simple method where the full range of values for all genes is scaled to 0-9, defining 9 evenly-spaced thresholds. Is it the case or are the thresholds based on the range of each gene separately? As mapping experimental data on the model is one of the two core aims of this paper, it should at least review the alternatives, give proper details on how this is handled and state that alternative mapping can be integrated as a preprocessing step.

Mapping entities in the model to entities in the experimental data is not discussed either, does the tool convert KEGG identifiers to official gene names, does it require the user to do the mapping in advance, does it provide a GUI to fine-tune the mapping afterwards?

The paper does not state what is the objective of using an experimentally measured values as initial state. The experiments presented here seem to represent "steady states" of the biological system, the simulation leads to a steady state which can be compared to the experimentally measured one. Differences between the experimental and computed states can be of two types: (i) inconsistency showing that the model needs further improvement, and (ii) effect that is not measured in the experiment: for example change of phosphorylation which is not measured by expression data. As this is the very core of the paper, a discussion on the possible biological meaning of the results seems required.

The BioNSi tool, which this work extends, supports simulations of qualitative (discrete) models. Its capabilities are summarized in this paper, and it is briefly compared to other tools for qualitative modelling in Table 1. A minor typo in this table replaced BooleSim with BoolSim (there is also a tool called BoolSim in this field: it is the command line tool handling the Boolean part of SQUAD). The paper describes BioNSi (and some other tools) as doing synchronous and asynchronous simulations, but misuses the term "asynchronous". Here asynchronous refers to synchronous with delays or priorities (i.e. all transitions do not happen at the same speed, but are still using the same clock), however it usually means independent trajectories, leading to non-deterministic behaviour. Using the classical asynchronous updating of qualitative models, a state can have several separate successors, each corresponding to the update of a single component. This allows to study systems where the relative speeds of transitions is not known, but introduces numerous concurrent trajectories, making the analysis much harder. GINsim is the only tool in the provided list implementing asynchronous updating, it also supports priorities, but not delays. BoolNet 4, PyBoolNet 5, and Boolsim 6 support asynchronous updating. To handle this complexity, some tools such as BooleanNet 7 or MaBoSS 8 perform random walk in the asynchronous dynamics.

In BioNSi, inhibitions have negative weights, while activations have positive weights. These weights are multiplied by the level of their source nodes, the resulting weighted sum is then used to define the evolution of the target element. While this allows to quickly define a model, I am concerned by the lack of flexibility: some inhibitors only block the effect of one of the activators, some regulators have a cooperative effect, some act independently. I agree that the simple weight system used in BioNSi is convenient when these fine-grained mechanisms are unknown: it provides a good first model. However, the KEGG pathways mostly describe processes which have been deeply studied, and for which such an approximation could be avoided. There is some value in a simple system based on weights, especially as it fits better for integration into cytoscape, but building more precise functions through the use of intermediate nodes representing cooperative effects should at least be discussed as a common refinement.

Some minor points:

  • Missing closing of 2 parenthesis on page 2, another on page 3

  • Page 6: "138 nodes with 8 nodes or less" is confusing. I suspect it means 138 nodes within sub-networks of 8 nodes or less?

  • The first paragraph on page 6 ("simulation of different experimental conditions") repeats the last paragraph of the methods.

I have read this submission. I believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

References

  • 1. Wrzodek C, Büchel F, Ruff M, Dräger A, Zell A: Precise generation of systems biology models from KEGG pathways. BMC Syst Biol.2013;7: 10.1186/1752-0509-7-15 15 10.1186/1752-0509-7-15 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Büchel F, Rodriguez N, Swainston N, Wrzodek C, Czauderna T, Keller R, Mittag F, Schubert M, Glont M, Golebiewski M, van Iersel M, Keating S, Rall M, Wybrow M, Hermjakob H, Hucka M, Kell DB, Müller W, Mendes P, Zell A, Chaouiya C, Saez-Rodriguez J, Schreiber F, Laibe C, Dräger A, Le Novère N: Path2Models: large-scale generation of computational models from biochemical pathway maps. BMC Syst Biol.2013;7: 10.1186/1752-0509-7-116 116 10.1186/1752-0509-7-116 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Chaouiya C, Bérenguier D, Keating SM, Naldi A, van Iersel MP, Rodriguez N, Dräger A, Büchel F, Cokelaer T, Kowal B, Wicks B, Gonçalves E, Dorier J, Page M, Monteiro PT, von Kamp A, Xenarios I, de Jong H, Hucka M, Klamt S, Thieffry D, Le Novère N, Saez-Rodriguez J, Helikar T: SBML qualitative models: a model representation format and infrastructure to foster interactions between qualitative modelling formalisms and tools. BMC Syst Biol.2013;7: 10.1186/1752-0509-7-135 135 10.1186/1752-0509-7-135 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Müssel C, Hopfensitz M, Kestler HA: BoolNet--an R package for generation, reconstruction and analysis of Boolean networks. Bioinformatics.2010;26(10) : 10.1093/bioinformatics/btq124 1378-80 10.1093/bioinformatics/btq124 [DOI] [PubMed] [Google Scholar]
  • 5. Klarner H, Streck A, Siebert H: PyBoolNet: a python package for the generation, analysis and visualization of boolean networks. Bioinformatics.2017;33(5) : 10.1093/bioinformatics/btw682 770-772 10.1093/bioinformatics/btw682 [DOI] [PubMed] [Google Scholar]
  • 6. Garg A, Di Cara A, Xenarios I, Mendoza L, De Micheli G: Synchronous versus asynchronous modeling of gene regulatory networks. Bioinformatics.2008;24(17) : 10.1093/bioinformatics/btn336 1917-25 10.1093/bioinformatics/btn336 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Albert I, Thakar J, Li S, Zhang R, Albert R: Boolean network simulations for life scientists. Source Code Biol Med.2008;3: 10.1186/1751-0473-3-16 16 10.1186/1751-0473-3-16 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Stoll G, Caron B, Viara E, Dugourd A, Zinovyev A, Naldi A, Kroemer G, Barillot E, Calzone L: MaBoSS 2.0: an environment for stochastic Boolean modeling. Bioinformatics.2017;33(14) : 10.1093/bioinformatics/btx123 2226-2228 10.1093/bioinformatics/btx123 [DOI] [PubMed] [Google Scholar]
F1000Res. 2018 May 4.
Amir Rubinstein 1

Dear Aurélien,

We greatly appreciate the detailed report. Below please find our point-by-point reply. Changes were made to the manuscript in its 2nd version. To ease reading of the responses below, we copy-pasted the comments in the report as well. 

---------------------------------------

Some related work should also be taken into account. In particular Wrzodek et al. 1and Büchel et al. 2 which generate qualitative models in SBML qual format 3 from KEGG pathways. My main concerns are detailed in the following text, none of them would be blocking taken separately, but my overall interpretation is that this work requires deep changes to become convincing.

 

  • Regarding the additional resources, they were added to the Introduction. Our focus is at integrating a Cytoscape application (BioNSi) with KEGG, exploiting this convenient simulation environment.

I installed the BioNSi tool in Cytoscape 3.4. The documentation states that building a network from KEGG pathways requires the selection of a folder containing one or several KEGG xml files and a CSV file with expression data. I tried to provide a folder containing two KEGG pathways and an empty CSV file, but it did not work (error message: "number of expression files is not 1"). I had the same problem with Cytoscape 3.6.

 

  • In the current version of the tool the csv expression file should not be empty. This is obviously something to fix in the next version. In the meantime, we provide such a file with arbitrary initial levels of 9. The file can be downloaded from the website and from http://www.cs.tau.ac.il/~amirr/files/expression.csv.

  • Both Cytoscape 3.4 and 3.6 should work fine.

KEGG pathways often "merge" multiple biological entities in a single node, the paper does not say how this case is handled, the documentation only states that ellipsis dots are removed from the name. Does it mean that these groups are preserved? How does this affect merging multiple KEGG pathways? It can be a concern as the same entity could be involved in different groupings in different KEGG pathways.

 

  • BioNSi preserves all the alternative names for a KEGG entity. When merging pathways, even if two nodes have different alternative names, they will be fused into a single node. Furthermore, names of entities in the expression level file are matched against any of the alternative names of a node in the network. This information was emphasized in the new version of the user manual and paper.

More generally, it would make sense to separate the transformation of KEGG pathways into discrete models (taking into account previous work 1 , 2), and the merging of multiple discrete models, which is a more general problem, and could be used to integrate several existing models with new models generated from pathway maps.

 

  • Such separation of functionality indeed makes a lot of sense. It may be implemented as part of a later version of the tool.  

The import of initial states from experimental data, and in particular the discretization step deserve further discussion. For a generic tool, I suspect that it uses a simple method where the full range of values for all genes is scaled to 0-9, defining 9 evenly-spaced thresholds. Is it the case or are the thresholds based on the range of each gene separately? As mapping experimental data on the model is one of the two core aims of this paper, it should at least review the alternatives, give proper details on how this is handled and state that alternative mapping can be integrated as a preprocessing step.

 

  • The initial levels are scaled to 0-9 for all the nodes. The following was added to the paper: "Note that the initial states are scales to 0-9 for the whole network based on the maximal expression level in the network, rather than for the range of each gene separately. Since the scaling depends of the maximal expression level of a node in the network, an extreme outlier may restrict the levels of other nodes to a very limited range. Therefore, the use of logarithmic expression data is recommended, and when the raw data in not logarithmic a simple transformation is required as a preprocessing step."  

Mapping entities in the model to entities in the experimental data is not discussed either, does the tool convert KEGG identifiers to official gene names, does it require the user to do the mapping in advance, does it provide a GUI to fine-tune the mapping afterwards?

 

  • We use KEGG xml files which include the complete array of gene\protein names available, including official gene symbol, HGNC id, KEGG id and others. In BioNSi, we use all gene symbols for each entry and therefore, identifiers are not being lost. For the graphical presentation, the first identifier is being used.

The paper does not state what is the objective of using an experimentally measured values as initial state. The experiments presented here seem to represent "steady states" of the biological system, the simulation leads to a steady state which can be compared to the experimentally measured one. Differences between the experimental and computed states can be of two types: (i) inconsistency showing that the model needs further improvement, and (ii) effect that is not measured in the experiment: for example change of phosphorylation which is not measured by expression data. As this is the very core of the paper, a discussion on the possible biological meaning of the results seems required.

  • We added such a discussion to the Methods section ( Simulation): "We explored two use-cases as a proof of concept, as described in the next section. In both use cases, KEGG pathways were integrated to form a BioNSi network, and gene expression analysis was used as initial nodes' states. In both use cases, the states of nodes changed throughout the simulation process, terminating in a steady state. This change in states may reflect one of two situations: (i) the expression data itself does not represent a biological steady state, and therefore the changes in-silico reflect changes in-vivo, and (ii) data is missing from the network, either structural (missing regulators or connections between existing nodes) or quantitative (weights on edges, initial states, etc.). In the latter case the conclusion is that the model needs further improvement. "  

 

The BioNSi tool, which this work extends, supports simulations of qualitative (discrete) models. Its capabilities are summarized in this paper, and it is briefly compared to other tools for qualitative modelling in Table 1. A minor typo in this table replaced BooleSim with BoolSim (there is also a tool called BoolSim in this field: it is the command line tool handling the Boolean part of SQUAD).

  • Fixed.

 

The paper describes BioNSi (and some other tools) as doing synchronous and asynchronous simulations, but misuses the term "asynchronous". Here asynchronous refers to synchronous with delays or priorities (i.e. all transitions do not happen at the same speed, but are still using the same clock), however it usually means independent trajectories, leading to non-deterministic behaviour. Using the classical asynchronous updating of qualitative models, a state can have several separate successors, each corresponding to the update of a single component. This allows to study systems where the relative speeds of transitions is not known, but introduces numerous concurrent trajectories, making the analysis much harder. GINsim is the only tool in the provided list implementing asynchronous updating, it also supports priorities, but not delays. BoolNet 4, PyBoolNet 5, and Boolsim 6 support asynchronous updating. To handle this complexity, some tools such as BooleanNet 7 or MaBoSS 8 perform random walk in the asynchronous dynamics.

  • We changed "asynchronous" to "non-simultaneous", and added this column to Table 1. 

 

In BioNSi, inhibitions have negative weights, while activations have positive weights. These weights are multiplied by the level of their source nodes, the resulting weighted sum is then used to define the evolution of the target element. While this allows to quickly define a model, I am concerned by the lack of flexibility: some inhibitors only block the effect of one of the activators, some regulators have a cooperative effect, some act independently. I agree that the simple weight system used in BioNSi is convenient when these fine-grained mechanisms are unknown: it provides a good first model. However, the KEGG pathways mostly describe processes which have been deeply studied, and for which such an approximation could be avoided. There is some value in a simple system based on weights, especially as it fits better for integration into cytoscape, but building more precise functions through the use of intermediate nodes representing cooperative effects should at least be discussed as a common refinement.

 

  • BioNSi (already in the original version) does support additional regulation effects in the form of blocking and enabling edges and external signal nodes. These are only mentioned without details in the end of Box 1. We added a brief description of these in box 1.

Some minor points:

Missing closing of 2 parenthesis on page 2, another on page 3

  • We cannot find these missing parenthesis, we would appreciate a clarification.

Page 6: "138 nodes with 8 nodes or less" is confusing. I suspect it means 138 nodes within sub-networks of 8 nodes or less?

  • Fixed.

The first paragraph on page 6 ("simulation of different experimental conditions") repeats the last paragraph of the methods.

  • We deleted the redundant paragraph from the results.


Articles from F1000Research are provided here courtesy of F1000 Research Ltd

RESOURCES