PLOS ONE. 2022 Jun 3;17(6):e0269497. doi: 10.1371/journal.pone.0269497

A semantics, energy-based approach to automate biomodel composition

Niloofar Shahidi 1,*, Michael Pan 2,3,4, Kenneth Tran 1, Edmund J Crampin 2,3,4,5, David P Nickerson 1
Editor: Lutz Brusch 6
PMCID: PMC9165793  PMID: 35657966

Abstract

Hierarchical modelling is essential to achieving complex, large-scale models. However, not all modelling schemes support hierarchical composition, and correctly mapping points of connection between models requires comprehensive knowledge of each model’s components and assumptions. To address these challenges in integrating biosimulation models, we propose an approach to automatically and confidently compose biosimulation models. The approach uses bond graphs to combine aspects of physical and thermodynamics-based modelling with biological semantics. We improved on existing approaches by using semantic annotations to automate the recognition of common components. The approach is illustrated by coupling a model of the Ras-MAPK cascade to a model of the upstream activation of EGFR. Through this methodology, we aim to give researchers and modellers ready access to more comprehensive models of biological systems.

1 Introduction

Modelling complex biological systems such as cells, organs, and organisms allows researchers to integrate and study different aspects of a biological entity, reveal the limits and shortcomings of our knowledge, and obtain new insights into disease treatment [1]. Motivated by such aims, researchers use hierarchical modelling to construct system-level models, which are continuously expanding in detail, scope, and size [2]. However, an approach that automatically and hierarchically constructs sophisticated models of biology, and that ultimately generates biologically and physically correct models, is currently missing.

Hierarchical models are composed of pre-existing smaller models, referred to as modules. Each module can operate and be examined independently, which reduces model composition errors, and reusing pre-existing modules facilitates large-scale model generation. To accelerate hierarchical model composition, one can capitalise on the many existing modules created by others. This requires that the modules be both accessible and reusable [2]. Over the past decade, biosimulation models have become increasingly accessible in public repositories such as the Physiome Model Repository (PMR) [3] and BioModels [4], which store models in XML-based formats such as CellML [5] and the Systems Biology Markup Language (SBML) [6, 7].

The main challenges in hierarchical model composition include: (a) incompatible code languages (dramatically different modelling languages, such as object-oriented, graphical, and continuous/discrete-time languages), (b) different modelling formalisms (such as rule-based modelling, differential equations, neural networks, and Boolean networks), (c) post-composition adjustments (manual edits to mathematical equations, rules, or parameters to make the composed model biologically sensible [8]), and (d) physically implausible resultant models. Each integrating system or platform addresses some of these challenges. Existing model integration platforms, such as the SBML Hierarchical package [9] and PySB [10], can resolve issues with code and modelling formalism compatibility, but they are limited to the biochemical domain. The resultant model often needs further adjustments to be executable, and even then it might not obey the laws of thermodynamics and physics (such as energy, mass, and charge conservation) [11, 12]; adherence to these laws is referred to as physical feasibility. Several formulations and frameworks have been developed to ensure that biochemical models obey the laws of thermodynamics [13–15], but most of them are purely mathematical and are often difficult to implement for model composition because of non-standardised rate laws and the lack of a graphical structure that allows components to be easily appended or deleted. Furthermore, most model composition tools are not applicable to multi-physics systems and cannot be generalised to more complex biological systems. One solution to these issues is to combine a hierarchical modelling approach (to help with the post-composition adjustments) with an energetic, multi-physics framework that explicitly models energy (expressing the kinetic rate laws of biochemical reactions in terms of chemical energy differences [15]), ensures adherence to the laws of physics, and is executable in multi-physics modelling (such as cardiomyocyte electromechanical coupling).
The bond graph approach addresses these issues.

Bond graphs provide a domain-independent hierarchical framework that generates models based on the laws of physics and thermodynamics. Initially introduced by Paynter [16], bond graphs were primarily intended for engineering applications. Their application was extended to the chemical domain by Oster et al. [17, 18] and subsequently by Cellier [19]. Gawthrop and Crampin have recently developed the bond graph framework to model and analyse biochemical and electrochemical systems [12, 20]. Two physical co-variables form the energy-based foundation of bond graphs: effort (e) and flow (f). Power is the product of effort and flow (p = ef), and energy is the time integral of power: E = ∫p dt. Effort and flow are general terms that represent, respectively, voltage and current in the electrical domain, force and velocity in the mechanical domain, and chemical potential and molar flux in the chemical domain. Bond graphs represent systems graphically as a set of elements, i.e., components and junctions. Components represent physical entities (such as ions, complexes, genes, and atoms at the microscopic level, and resistors, capacitors, dampers, and masses at the macroscopic level) and are defined as general configurations of electrical, mechanical, or chemical elements. For instance, C components in bond graphs are capacitive (energy-storage) components, i.e., capacitors in electrical circuits, springs in mechanical systems, or chemical species in chemical reactions. Energy is conserved and travels bidirectionally between components through bonds (shown by harpoons). A common effort between components is represented by a ‘0’ junction, at which the flows balance (the sum of inward flows equals the sum of outward flows), and a common flow is represented by a ‘1’ junction, at which the efforts balance (the sum of inward efforts equals the sum of outward efforts). Junctions are analogous to Kirchhoff’s laws in electrical circuits, the equilibrium of forces in mechanical systems, and stoichiometric balance in chemical reactions [21].
These conservation laws follow the same mathematical principles in every domain; hence they can be represented by generalised equations in bond graphs. Readers can find a more detailed description of bond graph theory in [22–24]. The extension of bond graphs to the biochemical domain and the constitutive equation for each component are explained in more detail in Section 2.1.
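The junction rules can be checked numerically. The minimal sketch below is plain Python, not part of any bond graph library; the function names and the sign convention (a bond's flow or effort counted positive when it points into the junction) are assumptions made for illustration. Because every bond at a ‘0’ junction shares one effort and the flows sum to zero, the net power p = ef entering the junction is zero; the same holds at a ‘1’ junction with the roles of effort and flow swapped.

```python
# Minimal sketch of power bookkeeping at bond graph junctions.
# Assumed sign convention: a bond's flow (at a 0 junction) or effort
# (at a 1 junction) is positive when it points INTO the junction.

TOL = 1e-12


def zero_junction_power(effort: float, flows: list[float]) -> float:
    """'0' junction: every bond shares one effort; the flows must balance."""
    if abs(sum(flows)) > TOL:
        raise ValueError("flows at a 0 junction must sum to zero")
    return sum(effort * f for f in flows)  # net power p = e*f into the junction


def one_junction_power(flow: float, efforts: list[float]) -> float:
    """'1' junction: every bond shares one flow; the efforts must balance."""
    if abs(sum(efforts)) > TOL:
        raise ValueError("efforts at a 1 junction must sum to zero")
    return sum(e * flow for e in efforts)


# Chemical example: a species at chemical potential u = 2000 J/mol is produced
# at 3 mol/s by one reaction and consumed at 1 and 2 mol/s by two others.
print(zero_junction_power(2000.0, [3.0, -1.0, -2.0]))  # 0.0
print(one_junction_power(0.5, [100.0, -40.0, -60.0]))  # 0.0
```

A junction therefore neither stores nor dissipates energy; in a bond graph, storage happens only in C (or Ce) components and dissipation only in R (or Re) components.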

In the context of biochemistry, modellers widely use traditional kinetic models. However, in general, kinetic models are not thermodynamically consistent (i.e. energy conserving) unless the parameters satisfy certain detailed balance constraints. Specifically, detailed balance constraints are required to ensure that biochemical loops have zero flux (i.e. dissipate no energy) at equilibrium. These detailed balance constraints become increasingly difficult to derive as biochemical networks become larger. Because bond graph models assign a chemical potential to each species, they automatically adhere to detailed balance constraints. Hence, parameters can be modified without violating thermodynamic consistency [25]. This ensures that model composition respects the constraints on thermodynamics for biochemical systems.
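To see why detailed balance comes for free, consider a loop of unimolecular reactions A ⇌ B ⇌ C ⇌ A. For a loop, detailed balance requires the product of k+/k− around the cycle to equal 1 (Wegscheider's condition). In the bond graph mass action form described in Section 2.1, a unimolecular reaction has k+ = κK_r and k− = κK_p, so each ratio is K_r/K_p and the species constants cancel around the loop whatever values κ and K take. The sketch below checks this with arbitrary, randomly drawn hypothetical parameters; the function names are illustrative only.

```python
import math
import random


def bond_graph_rate_constants(kappa: float, K_r: float, K_p: float) -> tuple[float, float]:
    """Mass action constants implied by a unimolecular Re component:
    k+ = kappa * K_r and k- = kappa * K_p."""
    return kappa * K_r, kappa * K_p


def loop_ratio_product(kappas, K, loop) -> float:
    """Product of k+/k- around a reaction loop (1.0 means detailed balance holds)."""
    product = 1.0
    for kappa, (reactant, prod_species) in zip(kappas, loop):
        k_plus, k_minus = bond_graph_rate_constants(kappa, K[reactant], K[prod_species])
        product *= k_plus / k_minus
    return product


random.seed(1)
loop = [("A", "B"), ("B", "C"), ("C", "A")]         # A <-> B <-> C <-> A
K = {s: random.uniform(0.1, 10.0) for s in "ABC"}   # arbitrary species constants
kappas = [random.uniform(0.1, 10.0) for _ in loop]  # arbitrary reaction constants
print(loop_ratio_product(kappas, K, loop))  # 1.0 up to floating-point rounding
```

By contrast, choosing the kinetic constants independently (for example k+ = 2 and k− = 1 for all three reactions) gives a loop product of 8, so the loop would carry a nonzero flux at equilibrium.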

Semantic annotation is the labelling of the mathematical content of models or data with standard machine-readable descriptions [26]. Such annotations are crucial for the reusability and interoperability of models. However, biological and biochemical complexity can give rise to inconsistencies in semantic annotations: many species and chemical compounds are not defined by a single semantic term, and subtle variations in species names have long been an obstacle for semantics-based merging tools that integrate models by identifying similarly annotated species. In this paper, we use identical semantics for the same species and leave it to the scientific community to develop a harmonising system for annotating biomodels.

An automated model composition approach significantly assists researchers in creating large-scale models from existing modules [27]. Shahidi et al. [28] introduced a general hierarchical model composition method by encoding bond graph modules in CellML and constructing a complex model using the SemGen merger tool [29]. The SemGen merger tool uses the biological semantics of the components in models to identify and interpret them unambiguously. Although this method facilitated the integration of annotated bond graph models, bottlenecks may arise when the CellML bond graph modules need to be modified, because modellers must know the bond graph conservation laws. Moreover, the method required adding auxiliary variables as ports to each module and connecting them manually using the semi-automated SemGen merger tool. While annotations are readily incorporated into bond graphs, they have not previously been used for model composition in this context.

Here, as an extension to our previous work, we have incorporated annotations into bond graphs in a new platform. This platform automatically constructs a composed model from annotated CellML files treated as modules. Because the CellML files do not contain the bond graph structure, a separate bond graph library in Python (BondGraphTools [25]) automatically handles any required changes in the conservation laws. The annotated parameters are extracted from the CellML models and their values are assigned to the equivalent bond graph parameters. Since the equations of a bond graph can be generated automatically from its network structure, we only need to parameterise them using the parameter values from the CellML files. Thereafter, any biochemical, biological, or physical entities common to the modules are identified and merged to render a composed model. We demonstrate this with an example in which a bond graph model is constructed from its constituent modules: the Epidermal Growth Factor Receptor (EGFR) signalling pathway, Ras activation, and the Mitogen-Activated Protein Kinase (MAPK) cascade are merged to construct a model of the entire EGFR-Ras-MAPK signalling pathway. This type of model integration, first, provides a reliable framework that is consistent with energy conservation; second, ensures that reactions can only proceed in the direction of decreasing chemical potential; and third, automates model composition and merging using the modules’ rich semantics.
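The step of identifying common entities can be pictured as grouping the species of all modules by their semantic annotation. The sketch below is a hypothetical illustration, not the platform's code: the module dictionaries, local names, and annotation strings are placeholders (not real ontology identifiers), and CellML parsing is omitted. Local names deliberately differ between modules to show that matching relies on annotations rather than names.

```python
from collections import defaultdict

# Hypothetical modules: local species name -> semantic annotation.
# The annotation strings are placeholders, not real ontology identifiers,
# and the local names deliberately differ between modules.
modules = {
    "EGFR": {"RShGS": "ann:RShGS", "EGF": "ann:EGF"},
    "Ras": {"R_ShGS": "ann:RShGS", "Ras_active": "ann:Ras"},
    "MAPK": {"Ras": "ann:Ras", "MKKK": "ann:MKKK"},
}


def find_merge_points(modules: dict[str, dict[str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Group (module, local name) pairs by annotation; groups with more than one
    member are the entities to merge."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for module_name, species in modules.items():
        for local_name, annotation in species.items():
            groups[annotation].append((module_name, local_name))
    return {ann: members for ann, members in groups.items() if len(members) > 1}


for annotation, members in find_merge_points(modules).items():
    print(annotation, members)
# ann:RShGS [('EGFR', 'RShGS'), ('Ras', 'R_ShGS')]
# ann:Ras [('Ras', 'Ras_active'), ('MAPK', 'Ras')]
```

Each returned group would then be collapsed onto a single species component, which is where the conservation equations at its ‘0: u’ junction change.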

In this paper, we review the use of bond graphs in modelling biochemical reactions (Section 2.1) and describe how the bond graph modules of the EGFR-Ras-MAPK signalling pathway are constructed from existing work by Kholodenko et al. [30] on the EGFR model, Brightman & Fell [31] on the Ras model, and Pan et al. [32] on the MAPK model (Section 2.2). We introduce our automated model composition approach in Section 2.3; its prerequisites and a generic description of the method are given in Sections 2.3.1 & 2.3.2. As an example application, we use the method to create the EGFR-Ras-MAPK signalling pathway in Section 2.4. Next, we describe how we verified the simulation results of our composed model in Section 2.5. In Section 3 we present the simulation results for both the constituent modules and our composed bond graph model, and in Section 4 we discuss and analyse the behaviour of the bond graph modules and then verify the behaviour of our composed model. Possible improvements and shortcomings are also discussed in that section. The main features of our method and future developments are summarised in Section 5.

2 Materials and methods

In this section, we give a brief introduction to bond graph modelling of biochemical networks consisting of multiple reactions. We then demonstrate how a mathematical model of a biochemical network can be converted into a bond graph, an approach we used to create bond graph models of the EGFR pathway and the Ras activation pathway. Next, we discuss the prerequisites and the procedure of our energy-conserving, semantics-based model composition approach. The approach is generic: the idea is domain-independent and could in principle be applied to models in other physical domains (e.g. electrical or mechanical). We will show that, given the bond graph model of any physical or chemical system, our method can merge identically annotated entities within the models by automatically rewiring the connections between components and modules. The composed bond graph model is then ready for simulation or for connection to other modules. Bond graphs allow more than two levels of hierarchy, which supports model integration. We demonstrate this by applying our method to automatically compose and generate an EGFR-Ras-MAPK signalling model.

2.1 Bond graph modelling of biochemical reactions

This section delineates how biochemical reactions are represented and composed in bond graphs.

To facilitate reusability, biochemical models must obey the laws of physics and thermodynamics [33]. In the context of biochemistry, this means that the storage of chemical energy within chemical species must be characterised by physical laws (here, a nonlinear capacitive constitutive relation), and reactions are bound to advance in the direction of decreasing chemical potential. In conventional modelling approaches, modellers often ignore energy transfer; thus, reactions may proceed against chemical potential gradients, leading to physically implausible models [12]. Since bond graphs are based on energy conservation and thermodynamic laws, fluxes always run in the direction of decreasing potential. Biochemical bond graph models contain components for the species (Ce), stoichiometry (TF: N), and reactions (Re). To highlight the role of the bond graph junctions in sharing a common molar flux or a common chemical potential, we denote them by ‘1: v’ and ‘0: u’, respectively. Here,

  • The chemical potential is u (J mol−1), stored within the biochemical species, and the molar flux is v (mol s−1), driven by the reactions;

  • The biochemical species are defined using the component Ce, given by the constitutive relation u = RT ln(Kq q) (Boltzmann’s formula), where R (J mol−1 K−1) is the ideal gas constant, T (K) is the absolute temperature, q (mol m−3) is the molar concentration of the species, and Kq (mol−1) is the species thermodynamic constant [34]. Kq is related to the kinetic free energy of the species participating in reactions and is defined as $K_q = \frac{1}{V_c\, q_{ref}}\, e^{u_{q_{ref}}/RT}$, where $V_c$ is the volume of the compartment, $q_{ref}$ is the reference concentration (normally 1 mol), and $u_{q_{ref}}$ is the standard free energy of formation of the species [25];

  • Following the definition in [35], species with fixed concentrations are called chemostats (CS) in bond graph terminology. Such species have a constant chemical potential [20];

  • In bond graphs, a reaction represents a dissipative process in which chemical energy is lost as heat [36]. For reversible mass action kinetics, a reaction is defined in bond graphs by an Re component with the constitutive relation $v = \kappa\left(e^{u_r/RT} - e^{u_p/RT}\right)$ (Marcelin–de Donder equation), where κ is the reaction rate constant and $u_r$ and $u_p$ are the total chemical potentials of the reactants and products, respectively;

  • Stoichiometries are represented by transformer components TF: N, in which the transformer ratio N corresponds to the stoichiometric coefficient.
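The two constitutive relations above can be evaluated directly. The sketch below is a minimal illustration in plain Python, not BondGraphTools code; the temperature of 310 K and all parameter values are assumptions chosen for readability. It computes chemical potentials with the Ce relation and a flux with the Re relation, and shows that the flux runs from the higher to the lower potential.

```python
import math

R = 8.314   # J mol^-1 K^-1, ideal gas constant
T = 310.0   # K; an assumed physiological temperature
RT = R * T


def ce_potential(K_q: float, q: float) -> float:
    """Ce component: chemical potential u = RT ln(K_q q)."""
    return RT * math.log(K_q * q)


def re_flux(kappa: float, u_r: float, u_p: float) -> float:
    """Re component (Marcelin-de Donder): v = kappa (e^{u_r/RT} - e^{u_p/RT})."""
    return kappa * (math.exp(u_r / RT) - math.exp(u_p / RT))


# Hypothetical single reaction A -> B with illustrative parameter values.
u_A = ce_potential(K_q=2.0, q=1.5)   # K_q q = 3
u_B = ce_potential(K_q=1.0, q=1.0)   # K_q q = 1, so u_B = 0
v = re_flux(kappa=0.5, u_r=u_A, u_p=u_B)
print(round(v, 9))  # 1.0: positive, i.e. from higher to lower potential
```

Swapping the two potentials reverses the sign of v, and equal potentials give zero flux, which is why a bond graph reaction cannot run against its chemical potential gradient. A chemostat would simply hold q, and hence u, fixed.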

For further discussion of bond graph modelling of biomolecular and chemical systems, the reader is referred to the works by Gawthrop & Crampin [32, 37].

As an example, a reaction with two reactants and two products is demonstrated in Fig 1 along with its equivalent bond graph representation.

Fig 1. A chemical reaction and its bond graph equivalent.


A chemical reaction with two reactants and two products. (A) Schematic of a chemical reaction where κ is the reaction rate constant, A & B are the reactants, C & D are the products, and α & β are stoichiometries; (B) Bond graph equivalent of the reaction where Ce components correspond to the species, Re corresponds to the reaction, and TF components represent the stoichiometries. Since the consumption/production rate of all the contributing species in a reaction is equal to the reaction flow rate, they share a common flow with the Re component through a ‘1: v’ junction.

As shown in Fig 1B, the species complexes (A & B as reactants and C & D as products) at either side of the reaction are connected to the Re component through ‘1: v’ junctions because the pairs share common flows. The corresponding ‘0: u’ junction for a species can be directly connected to an Re component if it is the only reactant or product of that reaction. Here, the reaction flow rate for the Re component in Fig 1B is given by:

$$v = \kappa\left(e^{(\alpha u_A + u_B)/RT} - e^{(\beta u_C + u_D)/RT}\right) \qquad (1)$$

or, substituting Boltzmann’s formula for the chemical potentials:

$$v = \kappa\left(K_A^{\alpha} q_A^{\alpha}\, K_B q_B - K_C^{\beta} q_C^{\beta}\, K_D q_D\right) \qquad (2)$$

which can be generally described by mass action kinetics:

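$$v = \kappa\left[\prod_i \left(K_{r_i} q_{r_i}\right)^{\alpha_i} - \prod_j \left(K_{p_j} q_{p_j}\right)^{\beta_j}\right] \qquad (3)$$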

where $K_{r_i}$ and $K_{p_j}$ are the thermodynamic constants, $q_{r_i}$ and $q_{p_j}$ are the concentrations, and $\alpha_i$ and $\beta_j$ are the stoichiometries of the reactants and products, respectively. Reversible Michaelis–Menten kinetics can also be represented using bond graphs [20]. However, because the default Re components in BondGraphTools follow mass action kinetics, we have chosen to approximate Michaelis–Menten kinetics using elementary mass action reactions (see [12]).
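As a numerical check that Eq 1 and Eq 3 describe the same flux, the sketch below evaluates both for the Fig 1 reaction αA + B ⇌ βC + D. All values (α = 2, β = 1, κ, the K and q values, and T = 310 K) are hypothetical and chosen only for illustration.

```python
import math

RT = 8.314 * 310.0  # J/mol; assumed temperature of 310 K

# Hypothetical values for the Fig 1 reaction  alpha A + B <-> beta C + D.
K = {"A": 1.2, "B": 0.5, "C": 3.0, "D": 1.0}   # thermodynamic constants K_q
q = {"A": 0.8, "B": 2.0, "C": 0.1, "D": 0.4}   # species quantities
alpha, beta, kappa = 2, 1, 2.0
reactants = [("A", alpha), ("B", 1)]
products = [("C", beta), ("D", 1)]


def u(s: str) -> float:
    """Chemical potential from the Ce constitutive relation u = RT ln(K q)."""
    return RT * math.log(K[s] * q[s])


# Eq 1: flux from stoichiometry-weighted chemical potentials.
v_eq1 = kappa * (math.exp(sum(n * u(s) for s, n in reactants) / RT)
                 - math.exp(sum(n * u(s) for s, n in products) / RT))

# Eq 3: the same flux written as mass action kinetics.
v_eq3 = kappa * (math.prod((K[s] * q[s]) ** n for s, n in reactants)
                 - math.prod((K[s] * q[s]) ** n for s, n in products))

print(v_eq1, v_eq3)  # equal up to floating-point rounding
```

The stoichiometry enters Eq 1 as a weight on the chemical potential (the TF ratio) and Eq 3 as an exponent, which is the same thing after exponentiation.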

Fig 2 illustrates an example of composing two reactions in bond graphs. Our framework recognises that the Ce: C component is the same in both reactions and merges the two instances. When two components from two modules are merged, the conservation equation at their corresponding ‘0: u’ junction changes. Section C of S1 Text details the conservation laws and constitutive equations of each reaction separately, as well as for the case where both reactions are combined to create the composition.

Fig 2. An example of composing two reactions in bond graphs.


Reactions 1 and 2 represent two separate reactions in which species C is common. To compose the reactions, the common species (C) is merged, and the conservation equation at its corresponding ‘0: u’ junction changes to account for the new structure. The conservation equation at the ‘0: u’ junction connected to species C is $v_C = v_1$ in Reaction 1 and $v_C = -v_2$ in Reaction 2; in the composed reaction it becomes $v_C = v_1 - v_2$.
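The change in the conservation equation can be reproduced with a stoichiometric bookkeeping sketch. The reactions below are hypothetical stand-ins for Fig 2 (Reaction 1 taken as A → C and Reaction 2 as C → D, since the figure only fixes C as the shared species), and the dictionary representation is an illustrative assumption, not how BondGraphTools stores models. Merging the two modules on the shared key C makes the flow balance at C's ‘0: u’ junction read v_C = v_1 − v_2.

```python
# Hypothetical stand-ins for Fig 2: Reaction 1 is A -> C and Reaction 2 is C -> D.
# Each module maps species -> {reaction id: stoichiometric coefficient}.
reaction1 = {"A": {"v1": -1}, "C": {"v1": +1}}
reaction2 = {"C": {"v2": -1}, "D": {"v2": +1}}


def merge(*modules):
    """Merge modules; identical species keys collapse onto one '0: u' junction.
    Assumes reaction ids are unique across modules."""
    merged = {}
    for module in modules:
        for species, coeffs in module.items():
            merged.setdefault(species, {}).update(coeffs)
    return merged


def species_rates(network, fluxes):
    """Flow balance at each species' '0: u' junction: dq/dt = sum_r n_r * v_r."""
    return {s: sum(n * fluxes[r] for r, n in coeffs.items()) for s, coeffs in network.items()}


composed = merge(reaction1, reaction2)
print(composed["C"])                                      # {'v1': 1, 'v2': -1}
print(species_rates(composed, {"v1": 0.75, "v2": 0.25}))  # {'A': -0.75, 'C': 0.5, 'D': 0.25}
```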

In the next section we illustrate how the bond graph approach toward biochemical reactions is utilised to create models of three exemplar biochemical pathways.

2.2 Modules for EGFR-Ras-MAPK signalling: Bond graph models of the pathways

In this section, we describe the bond graph modules required to compose the EGFR-Ras-MAPK model and summarise the methods used to convert the existing mathematical models of the modules into bond graphs.

The EGFR-Ras-MAPK is a signalling pathway that transduces signals from the extracellular environment to the cell nucleus [38]. It participates in multiple biological functions in mammalian cells, including growth and differentiation, cell migration, and wound healing [30, 31, 38] and consists of three major parts: the EGFR pathway, the Ras activation pathway, and MAPK cascade. The EGFR pathway forms a complex (RShGS) which mediates the Ras activation through an intermediate pathway. Ras protein activation signals stimulate the downstream Ras-MAPK cascade [30]. As such, we consider the RShGS complex as a mutual species in the EGFR pathway and the Ras activation pathway. Also, we take Ras protein to be the mutual species in the Ras activation pathway and the MAPK cascade. The bond graph models of the EGFR pathway and the Ras activation pathway are created based on the models by Kholodenko et al. [30] (CellML model available from: Kholodenko 1999) and Brightman & Fell [31] (CellML model available from: Brightman 2000). The bond graph model of the MAPK cascade is adopted from the work by Pan et al. [32]. Here, we detail how the bond graph modules of these systems were constructed.

2.2.1 The EGFR pathway module

This section describes the creation of the EGFR bond graph module in terms of equations and structure. We inferred the bond graph-specific parameters from the kinetic parameters of each model and a set of simulation data. We derived the bond graph structure of the models from their original reaction networks and added some missing biological entities where applicable.

Fig 3 shows the schematic of the EGFR pathway model developed by Kholodenko et al. [30]. Signal transmission starts with Epidermal Growth Factor (EGF) binding to the Epidermal Growth Factor Receptor (EGFR). It continues via several subpathways that target the SOS (Son of Sevenless) protein. The formation of the SOS complex (RShGS) activates the Ras protein through the Ras activation pathway, which initiates phosphorylation in the MAPK cascade.

Fig 3. Kinetic structure of the EGFR pathway.


The ATP hydrolysis species are shown in green (involved in phosphorylation and dephosphorylation reactions). The RShGS complex in yellow is mutual between the EGFR pathway and the Ras activation models. The reactions are numbered as the equations in the CellML source code. Steps 4, 8, 16 in orange represent the irreversible reactions. The network was adapted from [30].

Fig 4 illustrates the bond graph equivalent network of the EGFR pathway.

Fig 4. Bond graph representation of the EGFR pathway model.


Re components are numbered according to the steps in [30]. Each Ce or CS component is connected to a ‘0: u’ junction. Where a species participates in more than one reaction, new bonds are applied to its corresponding ‘0: u’ junction to share a common chemical potential (See R-PL where it is produced in reaction 5 and consumed in reaction 6). The chemostats in orange boxes are added to the reconstructed bond graph version.

The reactions in the EGFR model by Kholodenko et al. are either reversible or irreversible. The reversible reactions are described using the kinetic scheme as:

$$v = k^{+}\prod_i q_{r_i} - k^{-}\prod_j q_{p_j} \qquad (4)$$

where $k^+$ and $k^-$ are the forward and reverse kinetic rate constants, and $\prod_i q_{r_i}$ and $\prod_j q_{p_j}$ are the products of the reactant and product concentrations, respectively. The kinetic model and parameters are given in [30] (Table I & Table II). The irreversible reactions (steps 4, 8, 16) are described by irreversible Michaelis–Menten kinetics:

$$v = \frac{V_{max}\, q_r}{K_m + q_r} \qquad (5)$$

where $V_{max}$ (mol/s) is the maximum reaction rate achieved by the system, $K_m$ (mol) is the Michaelis constant, i.e. the reactant concentration at which the rate is half of $V_{max}$, and $q_r$ (mol) is the reactant concentration. Irreversible reactions are thermodynamically impossible in a bond graph model; we address this issue below.
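Eq 5 can be evaluated directly to see why it has no bond graph counterpart: the rate depends only on the reactant, so no accumulation of product can ever slow or reverse it, whereas an Re component's flux always depends on both sides of the reaction. The short sketch below uses hypothetical values of V_max and K_m and confirms the half-maximal property of K_m.

```python
def michaelis_menten(q_r: float, V_max: float, K_m: float) -> float:
    """Irreversible Michaelis-Menten rate (Eq 5); note there is no product term."""
    return V_max * q_r / (K_m + q_r)


V_max, K_m = 2.0, 0.5   # hypothetical values
print(michaelis_menten(K_m, V_max, K_m))     # 1.0 = V_max / 2 at q_r = K_m
print(michaelis_menten(1000.0, V_max, K_m))  # approaches V_max at high q_r
```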

The energy required for the reactions is supplied by adenosine triphosphate (ATP) hydrolysis, producing adenosine diphosphate (ADP) during phosphorylation and phosphate (Pi) during dephosphorylation. The reversible phosphorylation reactions (steps 3, 6, 14) follow the kinetic formulation (Eq 4), and the irreversible dephosphorylation reactions (steps 4, 8, 16) follow Michaelis–Menten kinetics (Eq 5). Kholodenko et al. did not explicitly include ATP, ADP, and Pi in their model, which contravenes mass and energy conservation. We therefore included these species in our bond graph model, assuming them to be chemostats.

To convert the kinetic parameters ($k^+$ and $k^-$ in Eq 4) into those required by bond graphs (κ, $K_r$, and $K_p$ in Eq 3), we first removed the thermodynamically infeasible irreversible reactions from the network, since their parameters are defined differently. Then, we applied the optimisation method described in [33] to the remaining reversible reactions. In brief, taking logarithms of the constraints for each reaction ($k^+ = \kappa\prod_i K_{r_i}$ and $k^- = \kappa\prod_j K_{p_j}$) expresses the relationship between the kinetic and bond graph parameters as a linear matrix equation. The reader is referred to Appendix B of [33] for an example of generating the linear matrix of thermodynamic constants. Because our bond graph model accounts for ATP, ADP, and Pi, we included them in the constraints of their corresponding reactions.
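The log-linear idea can be sketched with NumPy. This is a minimal illustration of the principle, not the implementation from [33]: the three-reaction loop and its rate constants are hypothetical, all stoichiometries are taken as 1, and ATP, ADP, and Pi are omitted. Each reaction contributes two rows (ln k+ = ln κ + ln K_r and ln k− = ln κ + ln K_p), and a least-squares solve gives bond graph parameters. Because the chosen k values violate detailed balance, no exact solution exists; the fitted constants are the closest ones in log space, and they satisfy detailed balance by construction.

```python
import numpy as np

# Hypothetical unimolecular reactions forming a loop A <-> B <-> C <-> A:
# (reactant, product, k_plus, k_minus). These k values violate detailed
# balance, since (2/1) * (3/0.5) * (1/4) = 3, not 1.
species = ["A", "B", "C"]
reactions = [("A", "B", 2.0, 1.0), ("B", "C", 3.0, 0.5), ("C", "A", 1.0, 4.0)]

n_r, n_s = len(reactions), len(species)
M = np.zeros((2 * n_r, n_r + n_s))  # unknowns: ln(kappa_i), then ln(K_s)
b = np.zeros(2 * n_r)
for i, (r, p, k_plus, k_minus) in enumerate(reactions):
    # ln k+ = ln kappa_i + ln K_r
    M[2 * i, i] = 1.0
    M[2 * i, n_r + species.index(r)] = 1.0
    b[2 * i] = np.log(k_plus)
    # ln k- = ln kappa_i + ln K_p
    M[2 * i + 1, i] = 1.0
    M[2 * i + 1, n_r + species.index(p)] = 1.0
    b[2 * i + 1] = np.log(k_minus)

# Least-squares solution in log space (minimum-norm, since M is rank-deficient).
x, *_ = np.linalg.lstsq(M, b, rcond=None)
kappa = np.exp(x[:n_r])
K = dict(zip(species, np.exp(x[n_r:])))

fitted = [(kappa[i] * K[r], kappa[i] * K[p]) for i, (r, p, _, _) in enumerate(reactions)]
loop_product = float(np.prod([kp / km for kp, km in fitted]))
print(loop_product)  # 1.0 up to rounding: detailed balance now holds
```

Working in log space keeps all fitted parameters positive and turns a nonlinear product constraint into an ordinary linear least-squares problem.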

As our selected pathways are in the cytosolic compartment, they use the same sources of potential (ATP, ADP, and Pi); thus, we used the same chemical potentials for these chemostats as in the MAPK cascade module. The thermodynamic constants (except those of the irreversible reactions) were obtained in the previous step, in which we converted the kinetic parameters into bond graph parameters. We approximated the irreversible reactions with the kinetic formulation of Eq 4, which led to a negligible reverse molar flux. We obtained the time-dependent behaviour of the species participating in steps 4, 8, and 16 from the reference CellML model of the EGFR pathway and applied curve fitting to estimate the reaction rate constants of the irreversible steps (κ4, κ8, & κ16). As an example, this procedure is shown for step 4 in Section B of S1 Text. As we discuss further in Section 3, an exact fit was not possible for two reasons: first, the original model contains irreversible reactions, and second, some of the reversible reactions in the original model do not satisfy detailed balance. Since the bond graph parameters are inferred from the original model, a least-squares approximation is made to generate the closest fit to the data while adhering to detailed balance constraints. The reaction equations and their participating species, together with a comparison of the parameter values of the Kholodenko et al. model and our reconstructed bond graph model, are given in S1–S3 Tables. The code to convert the kinetic parameters into bond graph equivalents for the EGFR pathway is accessible from: https://github.com/Niloofar-Sh/EGFR_MAPK/tree/main/EGF.
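The exact fitting procedure for κ4, κ8, and κ16 is given in S1 Text; the sketch below only illustrates one plausible variant and is an assumption, not the authors' procedure. It fits κ of a mass action Re component, v = κ(K_r q_r − K_p q_p), to synthetic "reference" fluxes generated from Eq 5 with hypothetical parameters, using a closed-form least-squares fit through the origin. A tiny K_p keeps the reverse term negligible. The nonzero residual shows why an exact fit is impossible: Michaelis–Menten kinetics saturate, whereas mass action is linear in q_r.

```python
import numpy as np


def fit_kappa(v_ref, q_r, q_p, K_r, K_p):
    """Least-squares kappa for v = kappa * (K_r q_r - K_p q_p), a line through the origin."""
    x = K_r * q_r - K_p * q_p
    return float(np.dot(x, v_ref) / np.dot(x, x))


# Synthetic 'reference' fluxes from irreversible Michaelis-Menten kinetics (Eq 5),
# with hypothetical parameters (not those of steps 4, 8 or 16).
V_max, K_m = 1.0, 0.5
q_r = np.linspace(0.01, 2.0, 50)
q_p = np.full_like(q_r, 0.1)
v_ref = V_max * q_r / (K_m + q_r)

# A tiny K_p keeps the reverse term, and hence the reverse flux, negligible.
K_r, K_p = 1.0, 1e-6
kappa = fit_kappa(v_ref, q_r, q_p, K_r, K_p)
rms = float(np.sqrt(np.mean((v_ref - kappa * (K_r * q_r - K_p * q_p)) ** 2)))
print(f"kappa = {kappa:.3f}, RMS error = {rms:.3f}")  # the error is not zero
```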

2.2.2 The Ras activation module

This section describes the reactions, species, and structure of the Ras activation module. We modified an existing CellML model to achieve our desired Ras activation model.

In the EGFR pathway model by Kholodenko et al., RShGS and RGS are considered to trigger the activation of Ras protein through intermediate molecules that transmit the signal to the downstream MAPK cascade and yield Ras. Since neither of our initial reference models (the EGFR pathway and the MAPK cascade) included these molecules, we incorporated an intermediate module to account for the missing steps. We created this module by modifying an EGFR-Ras-MAPK pathway model developed by Brightman & Fell [31]. We kept the steps from RShGS (to be merged with RShGS in the EGFR pathway model) to Ras (to be merged with Ras in the MAPK cascade model) and removed all other steps. Since RGS was not involved in the activation of Ras in the Brightman & Fell model, we did not include it in the Ras activation module. Notably, RShGS plays a more prominent role than RGS in localising Ras [39, 40]. Fig 5 shows the kinetic and bond graph structure of the Ras activation module, which links the EGFR and MAPK modules.

Fig 5. The structure of the Ras activation module.


(A) Kinetic representation. The RShGS complex in yellow is mutual between the EGFR pathway and the Ras activation modules and Ras protein in blue is mutual between the Ras activation module and the MAPK cascade module. Steps 2, 4 in orange represent the irreversible reactions. The network was adapted from [31]; (B) The bond graph representation of the Ras activation module.

We converted the kinetic parameters of the reactions into bond graph parameters using the same techniques as in Section 2.2.1. For the irreversible reactions (steps 2 and 4), we assumed a very small value for the reverse kinetic constants ($k^-$) to limit the reverse flow to a negligible amount. The reaction equations and their participating species, together with a comparison of the parameter values of the Brightman & Fell model and our reconstructed bond graph model, are given in S4–S6 Tables. The code to convert the kinetic parameters into bond graph equivalents for the Ras activation module is accessible from: https://github.com/Niloofar-Sh/EGFR_MAPK/tree/main/Ras.

2.2.3 The MAPK cascade module

Here, we discuss the sub-modules of the MAPK cascade model and how a single symbolic bond graph module can characterise the whole cascade.

Fig 6 shows the schematic of the MAPK cascade. Each oval trajectory in Fig 6A represents a cycle. The stimulus signal is amplified sequentially through the cycles of the cascade. MKKK is activated through a single phosphorylation step by a kinase (Ras) and turns into MKKKP [41]. MKK and MK are each phosphorylated in two steps, ultimately producing MKKPP and MKPP. The phosphorylated product of each layer acts as the kinase for the phosphorylation step in the next downstream layer. Simultaneously, an opposing phosphatase dephosphorylates the product of each cycle (shown by backward arrows) [42]. Each layer in Fig 6A is dephosphorylated by a specific phosphatase: MKKK-Pase in the first layer, MKK-Pase in the second layer, and MK-Pase in the third layer. The dual phosphorylation-dephosphorylation mechanisms in the second and third layers act as amplifiers, generating ultrasensitive responses.

Fig 6. The structure of the MAPK cascade.


(A) Kinetic representation. The stimulus from the extracellular environment is received (Ras) and transmitted through the MAPK cascade to the cell nucleus. The layers demonstrate the cycles with the same kinase and phosphatase enzymes; (B) MAPK cascade with five modules. Linking species are shown in colours where green corresponds to the linking enzymes and pink corresponds to unphosphorylated/phosphorylated mitogen proteins. Arrows show the links between the modules; (C) The symbolic bond graph model of each cycle. Sources of potential with fixed concentrations (Cs:ATP, Cs:ADP, and Cs:Pi) are shown in orange. (Same sources of potential within the modules are omitted in (A) and (B) for clarity).

Although the species in each cycle are different, the structures of the cycles are the same. The similarity in the structures enables us to break the cascade down into five modules of cycles. Hence, we created a symbolic bond graph module for a single cycle and reused this template for the other four cycles. Fig 6B shows the modular representation of the cascade by reusing the bond graph module in Fig 6C. We used a semantics-based ‘white box’ approach rather than the ‘black box’ approach in Pan et al.’s work. We also annotated each cycle separately to automate the model composition. We modelled the MAPK cascade in the absence of feedback. The code for the modular bond graph model of the MAPK cascade in BondGraphTools is accessible from: https://github.com/Niloofar-Sh/EGFR_MAPK/tree/main/MAPK%20cascade.
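The reuse of one symbolic template for all five cycles can be illustrated with simple string templates. The sketch below is a hypothetical, simplified stand-in for the template in Fig 6C, not BondGraphTools code: it writes out only the net phosphorylation and dephosphorylation reactions (water omitted) and records the kinase and phosphatase of each cycle, following the roles described in the text.

```python
from string import Template

# Hypothetical, simplified stand-in for the symbolic cycle template in Fig 6C:
# only the net phosphorylation and dephosphorylation reactions are written out
# (water omitted), with placeholders for the species and enzymes.
CYCLE_TEMPLATE = [
    Template("$S + ATP -> $P + ADP  [kinase: $kinase]"),
    Template("$P -> $S + Pi  [phosphatase: $phosphatase]"),
]


def instantiate_cycle(**names: str) -> list[str]:
    """Parameterise the symbolic template for one cycle of the cascade."""
    return [t.substitute(**names) for t in CYCLE_TEMPLATE]


# The five cycles of Fig 6B, following the kinase/phosphatase roles in the text.
cycles = {
    "MKKK": instantiate_cycle(S="MKKK", P="MKKKP", kinase="Ras", phosphatase="MKKK-Pase"),
    "MKK": instantiate_cycle(S="MKK", P="MKKP", kinase="MKKKP", phosphatase="MKK-Pase"),
    "MKKP": instantiate_cycle(S="MKKP", P="MKKPP", kinase="MKKKP", phosphatase="MKK-Pase"),
    "MK": instantiate_cycle(S="MK", P="MKP", kinase="MKKPP", phosphatase="MK-Pase"),
    "MKP": instantiate_cycle(S="MKP", P="MKPP", kinase="MKKPP", phosphatase="MK-Pase"),
}
for reactions in cycles.values():
    print(reactions[0])
```

Species that appear in more than one instantiated cycle, such as MKKP or the kinase MKKKP, are exactly the linking species shown in colour in Fig 6B, and they are what the composition step later merges.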

In the literature, several models of the EGFR-Ras-MAPK signalling pathway have been developed that consider the involvement of MKKP as an enzyme in the phosphorylation of MK and MKP [31], as well as negative feedback from MKPP to the upstream EGFR pathway [41–45]. This paper aims to demonstrate the reusability and composition of bond graph modules; thus, these additional interactions and feedback loops are not considered in our composition procedure. However, to verify the behaviour of the final composed model, we studied the response of our bond graph EGFR-Ras-MAPK model with an added negative feedback in which MKPP acts as an inactivating enzyme in the first cycle, as studied by Kholodenko et al. [43].

The following section describes the pipeline of our automated model composition framework, which will later be applied to integrate the EGFR pathway, the Ras activation, and the MAPK cascade modules.

2.3 Automated model composition pipeline

In this section, we explain our automated model composition pipeline, which is based mainly on semantics and bond graphs. We describe the prerequisites for applying the framework and the framework's structure.

We aim to minimise manual input by automating model composition while using energy-based modules in an open-source environment. To this end, we have provided some exemplar predefined bond graph models whose parameters have no values. We call these predefined bond graph models symbolic modules. Symbolic modules allow parameter values to be assigned later, once the annotated parameters from the CellML models are linked to them. The appropriate bond graph template is then selected automatically from a list of symbolic bond graph templates by identifying specific annotated parameters (for example, species-specific constants) in the CellML models. Here, we have stored symbolic bond graph templates for the EGFR pathway, Ras activation, and the MAPK cascade cycles (Sections 2.2.1, 2.2.2 & 2.2.3). In the current work, we call a parameterised symbolic bond graph template a bond graph module. Merging points are automatically recognised and merged, resulting in a physically consistent model. Because bond graphs are hierarchical, the adjustments needed during composition (adding or deleting bonds between components) are systematic and can therefore be automated.
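As an illustration only, the distinction between a symbolic template and a parameterised bond graph module can be sketched in plain Python. This is not the framework's or BondGraphTools' API: the `SymbolicTemplate` class, its symbol names, and the annotation keys below are hypothetical, and annotations are simplified to (property, entity) pairs rather than full RDF terms.

```python
from dataclasses import dataclass


@dataclass
class SymbolicTemplate:
    """A bond graph template whose parameters are symbols without values (hypothetical sketch)."""
    name: str
    # Maps each template symbol to the annotation that identifies it in a CellML model.
    symbols: dict

    def parameterise(self, annotated_values):
        """Bind every symbol to the value found under its annotation, giving a 'bond graph module'."""
        missing = [s for s, ann in self.symbols.items() if ann not in annotated_values]
        if missing:
            raise KeyError(f"no annotated value for symbols: {missing}")
        return {s: annotated_values[ann] for s, ann in self.symbols.items()}


# Hypothetical template for one phosphorylation cycle; symbol names are illustrative.
cycle = SymbolicTemplate(
    name="phosphorylation cycle",
    symbols={
        "K_S": ("thermodynamic constant", "substrate"),
        "K_P": ("thermodynamic constant", "phosphorylated substrate"),
        "kappa_1": ("reaction rate constant", "phosphorylation"),
    },
)

# Hypothetical values, as if extracted from an annotated CellML model.
extracted = {
    ("thermodynamic constant", "substrate"): 1.0,
    ("thermodynamic constant", "phosphorylated substrate"): 2.5,
    ("reaction rate constant", "phosphorylation"): 0.1,
}
module = cycle.parameterise(extracted)
print(module)
```

Keeping the template free of values means the same template can, in principle, be reused for every structurally identical cycle, with only the extracted values changing.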

To apply our method to CellML models, some preparations are required in advance, i.e., installing the bond graph Python library as well as downloading the required ontologies (required in the current framework but optional in the general approach). Once prepared, the user can commence the model composition in a Python environment like Jupyter Notebook.

2.3.1 The prerequisites

Below, we introduce the computational prerequisites in our framework.

  • BondGraphTools

    The task of automated model composition requires bond graph software that readily supports automation. For this purpose, we have selected BondGraphTools—an open-source Python library for bond graph modelling—created and developed by Cudmore et al. [25], accessible from https://github.com/BondGraphTools/BondGraphTools. BondGraphTools supports modularisation and automation in model building.

  • Ontologies (optional)

    An ontology is a semantic resource of standard notions and vocabularies of species, structures, and observations in terms of Resource Description Framework (RDF) triples (https://www.w3.org/RDF/). RDF is a standard mechanism to describe and interchange data on the Web and an RDF triple is a subject–predicate–object statement that describes the properties of an entity, often using ontologies [46, 47]. For example, the RDF [OPB00340—CHEBI29103—FMA70022] reads [concentration-potassium-extracellular space] which specifically describes the physical property and location of an entity. Ontologies are useful tools to add meaning to different parts of models to avoid any ambiguous interpretations [48]. Depending on the area of biomedical science in which the researchers annotate their models, one or various reference ontologies might be used. For the scope of this publication, we used the csv files for the Ontology of Physics for Biology (OPB) [49] and the Gene Ontology (GO) [50], downloaded from the following links:

The OPB is a reference ontology for physical principles such as chemical concentration, electrical capacitance, temperature, and fluid volume. The GO provides descriptions for molecular biology such as gene products, biological sequences, and molecular activities. Because of GitHub's file-size limits, the reference ontologies required for the current model composition (OPB and GO) are not provided in our GitHub repository. We stored the ontologies locally to interpret the RDF annotations and to present these interpretations whenever the user must make a decision based on the annotations. However, the approach can be reduced to a framework in which the annotations are only read and compared in RDF format, without presenting interpretations to the user.

2.3.2 The generic approach

In this section, we describe the steps in our model composition framework and the workflow towards it.

Automatic composition of bond graph modules can be performed in any domain, provided that bond graph templates and annotated reference models are available. To reuse and compose models deposited in online repositories such as PMR, we need a tool that can: first, convert a non-bond-graph model into an equivalent bond graph model; second, automatically assign the parameters in the models to their equivalent bond graph components; and third, identify the same entities across models as merging points and make the changes needed to join the models without any loss of information.

To move our model composition method toward automation and to reuse models in various formats, we built symbolic bond graph templates and connectivity matrices for some exemplar systems (the EGFR pathway, Ras activation, and the MAPK cascade). A connectivity matrix is a binary square matrix that defines the connections between the elements of a system; here, the numbers of rows and columns each equal the number of bond graph elements in the system [51]. Connectivity matrices are symmetric for undirected (bidirectional) networks and asymmetric for directed networks. Instead of using the built-in BondGraphTools syntax to append or delete bond graph elements, we used connectivity matrices. While not essential to our methodology, we chose this approach because a binary representation clearly shows the connections and gives the minimal details required to define a network, which can be exported to other tools and software for further analysis [51]. To modify a network, one can insert a 0 or 1 into the matrix, or delete a component's row and column. Section A in S1 Text shows how a connectivity matrix is defined for a simple network.
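A minimal sketch of how such a matrix can be built and edited, in plain Python with no dependencies. The component names follow bond graph conventions (Ce for species, Re for reactions, 0 for junctions), but the network, a single hypothetical reaction A ⇌ B, and the helper names are illustrative assumptions rather than the authors' code or the csv files in their repository.

```python
def build_connectivity(names, bonds):
    """Return a symmetric 0/1 matrix with a 1 wherever two components share a bond."""
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for a, b in bonds:
        i, j = index[a], index[b]
        # Bonds are treated as undirected here, so the matrix stays symmetric.
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix


def delete_component(names, matrix, name):
    """Drop one component by deleting its row and column."""
    k = names.index(name)
    new_names = names[:k] + names[k + 1:]
    new_matrix = [row[:k] + row[k + 1:] for i, row in enumerate(matrix) if i != k]
    return new_names, new_matrix


# Hypothetical network for a single reaction A <=> B.
names = ["Ce:A", "0:A", "Re:r", "0:B", "Ce:B"]
bonds = [("Ce:A", "0:A"), ("0:A", "Re:r"), ("Re:r", "0:B"), ("0:B", "Ce:B")]
C = build_connectivity(names, bonds)
for name, row in zip(names, C):
    print(f"{name:5s}", row)
```

Deleting a component, for example `delete_component(names, C, "Ce:B")`, removes one row and one column and leaves the remaining connections untouched, which is the operation used when duplicate components are merged.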

To identify the merging points between the modules, we used a 'white box' approach, in which all or a group of the bond graph components in the modules are mergeable. In contrast, in a 'black box' composition approach only the components predefined as inputs or outputs are accessible [28, 52]. Because any entity in a biological model can potentially be a point of connection, we found the 'white box' configuration more compatible with our model composition method. This requires the model parameters to be annotated. In summary, our automated model composition method starts from the symbolic bond graph templates of the models, the connectivity matrix of each template (optional), and the ontologies required to interpret the annotations (optional).

Within a given biological, physiological, or physical context, our framework can detect which bond graph template matches an annotated model. It does so by searching for specific groups of biological entities or processes within the annotated CellML files: if a certain group of entities is found in a file, the framework links the file to the corresponding symbolic bond graph template. A function in the framework then finds identical annotations across the models and selects them as merging points. The required changes to the bond graph components (deleting duplicates) and to the connectivity matrices (deleting or inserting rows and columns) are then made automatically. Finally, the framework produces the composed model from the connection and non-connection relationships between all the components (Fig 7).
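The detection and matching logic can be sketched with set operations. In this hypothetical example, each template is identified by a small set of entity names standing in for annotations (real annotations would be ontology terms), and the signatures, model contents, and function names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical signatures: the group of annotated entities that identifies each template.
TEMPLATE_SIGNATURES = {
    "EGFR pathway": frozenset({"EGF", "EGFR"}),
    "Ras activation": frozenset({"RasGDP", "RasGTP", "GAP"}),
    "MAPK cycle 1": frozenset({"MKKK", "MKKKP"}),
}


def select_template(annotations):
    """Return the single template whose signature is fully present in a model's annotations."""
    matches = [name for name, sig in TEMPLATE_SIGNATURES.items() if sig <= annotations]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one matching template, found {matches}")
    return matches[0]


def merging_points(models):
    """Annotations that occur in more than one model are candidate merging points."""
    counts = Counter(ann for anns in models.values() for ann in anns)
    return {ann for ann, count in counts.items() if count > 1}


# Hypothetical annotation sets for two input models.
models = {
    "model_1": {"EGF", "EGFR", "RShGS", "ATP", "ADP", "Pi"},
    "model_2": {"RShGS", "RasGDP", "RasGTP", "GAP", "Ras"},
}
print({name: select_template(anns) for name, anns in models.items()})
print(merging_points(models))
```

Requiring exactly one matching template is a design choice of this sketch: an ambiguous or missing match is reported rather than guessed.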

Fig 7. The generic flowchart of our automated model composition approach.


The Stored files section shows the saved files for the current model composition framework. Ontologies and connectivity matrices in the blue dashed box are optional in the generic approach but were used in the current framework. The Input section shows two arbitrary CellML models to be merged using our framework but can be extended to any number of models. The main steps of the framework are denoted by numbers (1–8).

We have deposited the required ontologies, the symbolic bond graph templates, and their connectivity matrices in our repository. Ontologies and connectivity matrices are required by the current implementation, but depending on the application they could be removed with minor changes to the implementation. Any number of CellML models containing the annotated parameters of a system can be used as inputs to the framework (here, we illustrate two models). Fig 7 depicts the eight main steps of our semantics-based model composition framework:

  1. A function in our framework extracts the annotations and values from the CellML models. If no exact matches of annotations are detected between the models, a warning is given and the user should check whether the models are appropriate for composition. If there are matching annotations, two pathways become available: the composition process and value allocation.

  2. In this step, a function checks whether the identically annotated entities are mergeable. If they are not, the function ignores them [53]; otherwise, it passes them to the next step. For example, biochemical species are considered mergeable because they can participate in multiple reactions simultaneously, whereas a parameter such as temperature cannot be merged because it cannot become a port for external connections. Based on the deleted duplicate components, the connectivity matrices are combined, allowing the models to be merged (details in Section 2.4).

  3. In this step, only one entity is kept from each group of identically annotated mergeable entities, and the rest are deleted.

  4. In this step, our framework links the modules at each merging point to integrate them. A link is a bond in bond graph terminology and can be added to the system either by inserting a 1 into the connectivity matrix (as in the current approach) or by adding the new bond between the modules directly with BondGraphTools syntax.

  5. Step 5 identifies inconsistencies in the values of identically annotated entities. These values include the initial conditions and the entities’ thermodynamic constants (as described in Section 2.1).

  6. This step prompts the user to choose a value for the identically annotated entities found in step 5. For instance, if a chemical species is present in more than one model (identically annotated in all the models) and has different initial concentrations, the user is asked to select one of the values or insert a new one for that specific chemical species.

  7. This step parameterises the bond graph symbolic templates with the values for each annotated entity.

  8. Step 8 gathers the information coming from the composition process and value allocation to generate a bond graph composed model in the form of a system of Ordinary Differential Equations (ODEs).
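Steps 1–3, 5 and 6 can be sketched as a single function over annotation-to-value dictionaries. This is a hypothetical illustration, not the framework's code: `plan_composition`, the `NON_MERGEABLE` set, the rule that the first listed model's copy is kept, and the `choose` callback (a stand-in for the interactive prompt of step 6) are all assumptions, and steps 4, 7 and 8 are omitted.

```python
import warnings

# Hypothetical: annotations of quantities that cannot become ports and so are never merged.
NON_MERGEABLE = {"temperature"}


def plan_composition(models, choose):
    """Sketch of steps 1-3, 5 and 6.

    models: {model name: {annotation: value}}, e.g. initial concentrations.
    choose: callback(annotation, {model: value}) -> value; stands in for the user prompt of step 6.
    Returns ({annotation: model whose copy is kept}, {annotation: agreed value}).
    """
    # Step 1: collect every annotation together with the models (and values) it appears in.
    occurrences = {}
    for model, entities in models.items():
        for ann, value in entities.items():
            occurrences.setdefault(ann, {})[model] = value
    shared = {ann: by_model for ann, by_model in occurrences.items() if len(by_model) > 1}
    if not shared:
        warnings.warn("no identical annotations found; check that the models can be composed")

    # Step 2: keep only mergeable entities.
    mergeable = {ann: by_model for ann, by_model in shared.items() if ann not in NON_MERGEABLE}

    # Step 3: keep one copy per merged entity (here, arbitrarily, the first model listed).
    kept = {ann: next(iter(by_model)) for ann, by_model in mergeable.items()}

    # Steps 5-6: flag inconsistent values and ask for a single value.
    resolved = {}
    for ann, by_model in mergeable.items():
        distinct = set(by_model.values())
        resolved[ann] = distinct.pop() if len(distinct) == 1 else choose(ann, by_model)
    return kept, resolved


# Hypothetical inputs; the values are placeholders, not taken from the published models.
models = {
    "EGFR": {"EGF": 680.0, "RShGS": 0.0, "temperature": 310.0},
    "Ras activation": {"RShGS": 0.1, "RasGTP": 200.0, "temperature": 310.0},
}
kept, resolved = plan_composition(models, choose=lambda ann, by_model: min(by_model.values()))
print(kept, resolved)
```

Here temperature is shared but filtered out in step 2, while the two differing RShGS values are passed to `choose`. Comparing floating-point values exactly is a simplification; a real implementation might compare within a tolerance.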

In the next section, we illustrate the workflow by applying it to an example biochemical network: the EGFR-Ras-MAPK signalling pathway.

2.4 Applying the composition method to the EGFR-Ras-MAPK signalling pathway

Here, we apply our model composition method (described in Section 2.3.2) to generate a model of the EGFR-Ras-MAPK signalling pathway. The pathway comprises three modules: the EGFR pathway, the Ras activation pathway, and the MAPK cascade. Since the MAPK cascade includes five structurally repetitive cycles, we broke it down into five sub-modules. We therefore need three template bond graph modules: one for the EGFR pathway (Fig 4), one for the Ras activation pathway (Fig 5B), and one for the cycles in the MAPK cascade (Fig 6C). The connectivity matrix for each module (in csv format) and the annotated CellML files for the parameters of each module and sub-module are available on GitHub: https://github.com/Niloofar-Sh/EGFR_MAPK.

In the composition process, if identically annotated entities are mergeable, our framework keeps only one bond graph component and removes the rest from the modules (the list of components for each module is updated accordingly). This process works for any number of components to be merged among the models. The rows and columns of the connectivity matrices that correspond to the removed components are also deleted. Finally, a single connectivity matrix describing all the connections between the components of the composed network is needed. Our framework integrates the modified connectivity matrices of all the modules into one by placing them consecutively along the diagonal of a zero square matrix, so that the number of rows (and columns) equals the total number of components in the system. Wherever a bond is needed between two modules, the framework then inserts an additional 1 into the matrix (two 1s for a bidirectional bond, keeping the matrix symmetric). Fig 8 demonstrates this with an example.

Fig 8. Construction of the whole-system connectivity matrix for a composed model.


The procedure is illustrated by integrating two connectivity matrices (1st and 2nd cycles in the MAPK cascade). Initially, the two cycles had identical connectivity matrices. (A) MKKKP is a common component between the first and second cycle; (B) The connectivity matrix for the 1st cycle; (C) The modified connectivity matrix for the 2nd cycle, from which the row and column for the common component (MKKKP) are removed; (D) The placement of the connectivity matrices for each module on the diagonal of the whole-system connectivity matrix. The pink and green boxes indicate the connectivity matrices for the 1st and 2nd cycles, respectively. The corresponding '0:u' junctions for MKKKP in the two cycles are connected by inserting two 1s (in red) to represent a bond between them (bidirectional connections between components require the matrix to be symmetric).
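The matrix bookkeeping of Fig 8 can be reproduced with a few list operations. This is a sketch only: each "cycle" below is a hypothetical five-component chain (Ce, 0, Re, 0, Ce) standing in for a MAPK cycle, so it does not reproduce the enzymatic role of MKKKP or the ATP/ADP/Pi sources of the real cycles in Fig 6C; the helper names are illustrative, and component names are prefixed with the module name so that duplicates remain distinguishable.

```python
def chain_module(prefix, substrate, reaction, product):
    """Hypothetical five-component module: Ce - 0 - Re - 0 - Ce, bonded in a chain."""
    names = [f"{prefix}/Ce:{substrate}", f"{prefix}/0:{substrate}", f"{prefix}/Re:{reaction}",
             f"{prefix}/0:{product}", f"{prefix}/Ce:{product}"]
    matrix = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
    return names, matrix


def remove_component(names, matrix, name):
    """Delete the row and column of a duplicate component."""
    k = names.index(name)
    kept_names = names[:k] + names[k + 1:]
    kept_matrix = [row[:k] + row[k + 1:] for i, row in enumerate(matrix) if i != k]
    return kept_names, kept_matrix


def block_diagonal(*matrices):
    """Place square matrices consecutively along the diagonal of a zero matrix."""
    size = sum(len(m) for m in matrices)
    whole = [[0] * size for _ in range(size)]
    offset = 0
    for m in matrices:
        for i, row in enumerate(m):
            whole[offset + i][offset:offset + len(row)] = row
        offset += len(m)
    return whole


def add_bond(names, matrix, a, b):
    """Insert two 1s so the new bidirectional bond keeps the matrix symmetric."""
    i, j = names.index(a), names.index(b)
    matrix[i][j] = matrix[j][i] = 1


n1, m1 = chain_module("c1", "MKKK", "r1", "MKKKP")
n2, m2 = chain_module("c2", "MKKKP", "r2", "MKK")
n2, m2 = remove_component(n2, m2, "c2/Ce:MKKKP")   # MKKKP is kept only in cycle 1
names = n1 + n2
whole = block_diagonal(m1, m2)
add_bond(names, whole, "c1/0:MKKKP", "c2/0:MKKKP")  # join the two 0:MKKKP junctions
for name, row in zip(names, whole):
    print(f"{name:12s}", row)
```

The composed matrix has nine rows and columns (five from the first module, four from the second), and the only non-zero entries outside the two diagonal blocks are the pair added by `add_bond`.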

In the next section, we describe the methods used to verify the behaviour of the three bond graph modules and of our composed bond graph model of the EGFR-Ras-MAPK pathway.

2.5 Verification

To verify the behaviour of the composed EGFR-Ras-MAPK bond graph model, we first compared our bond graph approximations of the EGFR and Ras activation modules with the original models. Bond graph models of biochemical systems are deterministic and generate a set of ODEs that can be solved by any standard ODE solver package; in this paper, the models were simulated using the SUNDIALS package [54]. In future work, the energy-based approach could be extended to stochastic systems, using algorithms such as the Gillespie algorithm for simulation.
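To make the "set of ODEs" concrete, the sketch below writes the standard bond graph mass-action flux for a single hypothetical reaction A ⇌ B, v = κ(K_A x_A − K_B x_B), where κ is the reaction rate constant and K_A, K_B are species thermodynamic constants, and integrates it with a fixed-step classical Runge-Kutta (RK4) scheme. The fixed-step integrator is a swapped-in stand-in for SUNDIALS, chosen only to keep the example dependency-free, and the parameter values are arbitrary.

```python
def flux(a, b, kappa, K_a, K_b):
    """Bond graph mass-action flux of A <=> B: v = kappa * (K_a * a - K_b * b)."""
    return kappa * (K_a * a - K_b * b)


def simulate(a0, b0, kappa, K_a, K_b, t_end, dt=1e-3):
    """Integrate da/dt = -v, db/dt = +v with fixed-step classical RK4; returns (a, b) at t_end."""
    def rhs(state):
        v = flux(state[0], state[1], kappa, K_a, K_b)
        return (-v, v)

    def shifted(state, slope, h):
        return tuple(s + h * k for s, k in zip(state, slope))

    state = (a0, b0)
    for _ in range(round(t_end / dt)):
        k1 = rhs(state)
        k2 = rhs(shifted(state, k1, dt / 2))
        k3 = rhs(shifted(state, k2, dt / 2))
        k4 = rhs(shifted(state, k3, dt))
        state = tuple(s + dt / 6 * (p + 2 * q + 2 * r + w)
                      for s, p, q, r, w in zip(state, k1, k2, k3, k4))
    return state


a, b = simulate(a0=1.0, b0=0.0, kappa=1.0, K_a=2.0, K_b=1.0, t_end=10.0)
print(f"a = {a:.4f}, b = {b:.4f}")
```

With these values, conservation gives b = 1 − a, so da/dt = −(3a − 1) and the system relaxes to a = 1/3, b = 2/3, where K_A x_A = K_B x_B, i.e. the chemical potentials of A and B are equal.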

To compare the simulation results between the models, the normalised root mean square error (NRMSE) was computed as in Eq 6, where $\hat{x}_i$ denotes the simulation points of our bond graph approximation, $x_i$ the corresponding points of the reference model (for the EGFR module, the Kholodenko et al. model), and $n$ the number of points. The normalisation was performed relative to the difference between the maximum and minimum values of the reference model in each simulation.

$$\mathrm{NRMSE} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{x}_i - x_i\right)^2}}{x_{\max} - x_{\min}} \qquad (6)$$
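Eq 6 translates directly into code. The sketch below assumes both series are sampled at the same time points (for example, simulator output requested at common times); the sample values are hypothetical.

```python
import math


def nrmse(estimate, reference):
    """Eq 6: root mean square error normalised by the range of the reference series."""
    if len(estimate) != len(reference) or not reference:
        raise ValueError("series must be non-empty and of equal length")
    n = len(reference)
    rmse = math.sqrt(sum((e - r) ** 2 for e, r in zip(estimate, reference)) / n)
    spread = max(reference) - min(reference)
    if spread == 0:
        raise ValueError("reference series is constant; Eq 6 is undefined")
    return rmse / spread


# Hypothetical time courses sampled at the same five time points.
reference = [0.0, 1.0, 2.0, 3.0, 4.0]
estimate = [0.0, 1.0, 2.0, 3.0, 5.0]
print(f"NRMSE = {100 * nrmse(estimate, reference):.2f}%")
```

Normalising by the reference range rather than by its mean keeps the error comparable between species whose concentrations differ by orders of magnitude.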

Second, we studied the steady-state behaviour of the phosphorylated kinases at the terminal level of each layer of the MAPK cascade under varying stimulus strengths. This indicated how the kinases should be expected to respond to a stimulus arriving from the upstream levels (here, the Ras protein).

To further study our composed model, we observed its behaviour under three more conditions: a) we added negative feedback from the terminal phosphorylated kinase in the last layer of the cascade (MKPP) to the dephosphorylation reaction in the first layer [43] (Fig 9), and qualitatively verified the effect of this feedback; b) we simulated the model for different intracellular ATP concentrations and monitored how the concentrations of activated kinases changed; c) we investigated the behaviour of the composed bond graph model for varying initial concentrations of EGF (the initiating species in the EGFR pathway).

Fig 9. Bond graph schematic of adding negative feedback in the composed EGFR-Ras-MAPK bond graph model.


The negative feedback loop (red bonds) initiates from MKPP and has an enzymatic role in the first layer’s dephosphorylation reaction.

In the following section, we illustrate and explain the results of our verification measures on the composed EGFR-Ras-MAPK bond graph model.

3 Results

We used our method to merge the modules within the MAPK cascade and between the pairs {EGFR pathway, Ras activation} and {Ras activation, MAPK}, yielding the bond graph configuration of the EGFR-Ras-MAPK signalling pathway. Fig 10 shows how the EGFR, Ras activation, and MAPK modules are modified to handle components shared between modules. Here, RShGS and Ras are removed from the Ras activation module, while RShGS in the EGFR module and Ras in the MAPK module are kept. Likewise, all the ATP-ADP-Pi components in the MAPK module are removed, while those in the EGFR module are kept. Each retained shared component is bonded to the '0:u' junctions of the corresponding removed components.

Fig 10. The composed modular bond graph model of EGFR-Ras-MAPK signalling pathway.


The blue dashed boxes represent the bond graph modules, the yellow boxes show the common components merged between the modules (each sharing a common potential through a '0:u' junction), and the blue harpoons represent the bonds between the modules and the common components. The inter-module bonds, along with the internal bonds between the components of each module, are defined and applied to the model automatically using the whole-system connectivity matrix. The EGFR module and the MAPK cycles also share common potentials through Cs:ATP, Cs:ADP, and Cs:Pi.

To check the function of our composed model, we verified the simulations in two steps: 1. verification of each bond graph module separately (EGFR, Ras activation, and MAPK); and 2. verification of the bond graph composed model (the EGFR-Ras-MAPK signalling pathway).

3.1 Verification of bond graph modules

In this section, we verify the created bond graph modules (EGFR, Ras activation) against their original non-bond graph models. The functionality of the MAPK cascade bond graph module (consisting of 5 sub-modules) is also studied.

  • The EGFR signalling pathway: Our approach requires models to be expressed as bond graphs. A bond graph equivalent of the EGFR pathway was not available, so we converted an existing kinetic model of the EGFR pathway into an equivalent bond graph form. An exact conversion was not possible because the model contains irreversible reactions and does not explicitly account for mass conservation. Hence, we approximated the irreversible reactions with bond graph equivalents and included the missing metabolites ATP, ADP, and Pi to provide the energy required to drive the approximated irreversible reactions.

    The conversion of the kinetic EGFR model into a bond graph was performed by solving a linear system of equations for the parameter constraints. The species' responses in the EGFR bond graph module were compared with those of the Kholodenko et al. model. The responses of four exemplar species in the pathway are shown in Fig 11, with the NRMSE of each comparison given as a percentage. The bond graph equivalent of the EGFR module behaved similarly to the original kinetic model, although the equations could not be solved exactly. This indicates that the kinetic parameters of the Kholodenko et al. model are not thermodynamically consistent. The bond graph equivalent is therefore a close, thermodynamically consistent approximation of the original model.

  • The Ras activation pathway:

    The Ras activation pathway included both reversible and irreversible reactions, expressed with mass action kinetics. We estimated the bond graph parameters of the reactions by applying the parameter balancing technique, including an additional constraint (a relatively small k) for each irreversible reaction to limit the reverse flux (Section 2.2.1). Fig 12 shows the behaviour of four species in the reduced Brightman & Fell Ras activation model and in its bond graph approximation. The bond graph equivalent followed the same trend as the reduced CellML model with negligible error (0.07% < NRMSE < 0.2%). In the original CellML model, concentrations are dimensionless in order to balance the units [55].

  • The MAPK cascade: The bond graph model of the MAPK cascade was developed by Pan et al. [32]. We have reused the model here with a slightly different configuration of the modules.

    The bond graph version of the MAPK cascade in BondGraphTools was simulated with an initial amount of Ras = 3 × 10⁻⁵ μM. A small increase in the concentration of the input kinase results in amplified, sigmoidal responses of the downstream kinases, referred to as ultrasensitivity (S1 Fig) [43]. These ultrasensitive responses arise from amplification across the layers of the MAPK cascade, i.e., single phosphorylation-dephosphorylation in the first layer and dual phosphorylation-dephosphorylation in the second and third layers. We then plotted the steady-state responses of the activated kinases against a range of input concentrations (10⁻⁸ to 10⁰ μM) in Fig 13. Note that for inputs below 7 × 10⁻⁵ μM, MKPP reaches a higher concentration than MKKPP, whereas MKKPP overtakes MKPP at higher input concentrations. S2 Fig shows the relative activation of the kinases, indicating that the further down the MAPK cascade a kinase lies, the smaller the input concentration needed to activate it [32]. Note also that the MKPP (third layer) activation curve is steeper than those of MKKPP (second layer) and MKKKP (first layer), implying that MKKKP requires a larger increase in stimulus to reach its maximum response than MKKPP and MKPP do (Table 1). Analysing the behaviour of the MAPK module helped us predict how the kinases would respond to the input kinase (Ras) coming from the upstream module (the EGFR pathway), and thereby validate our composed bond graph model.

    Table 1 gives the increase in stimulus required for each activated kinase to go from 10% to 90% of its maximum concentration. This confirms that the responses to the input become more ultrasensitive further down the cascade. To quantify the ultrasensitivity of the sigmoidal input-output curves, the Hill coefficient (nH) is also calculated for each activated kinase using Eq 7, where EC90 and EC10 are the input values required to produce 90% and 10% of the maximal response, respectively [56]. The further the Hill coefficient exceeds 1, the smaller the fold-increase in input required for the transition from 10% to 90% of the maximum. The values are consistent with the Hill coefficients predicted for the MAPK cascade by Huang & Ferrell [57].
    $$n_H = \frac{\log 81}{\log\left(EC_{90}/EC_{10}\right)} \qquad (7)$$
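Eq 7 is straightforward to evaluate. In the sketch below, a Hill function y = xⁿ/(Kⁿ + xⁿ) serves as a check: its EC10 and EC90 are K·9^(−1/n) and K·9^(1/n), so EC90/EC10 = 81^(1/n) and Eq 7 returns n (a hyperbolic, Michaelis-Menten-like response with n = 1 needs an 81-fold input increase). Applied to the rounded fold-increases of Table 1, it reproduces the tabulated nH values to within rounding. The function names are illustrative.

```python
import math


def hill_coefficient(ec10, ec90):
    """Eq 7: effective Hill coefficient from the inputs giving 10% and 90% of maximal response."""
    return math.log(81) / math.log(ec90 / ec10)


def hill_ec(fraction, K, n):
    """Input at which the Hill function x**n / (K**n + x**n) reaches the given fraction of its maximum."""
    return K * (fraction / (1 - fraction)) ** (1 / n)


# Check: for an ideal Hill function, Eq 7 recovers the exponent n.
for n in (1, 2, 4):
    print(n, hill_coefficient(hill_ec(0.1, 1.0, n), hill_ec(0.9, 1.0, n)))

# Table 1 fold-increases (EC90/EC10); only the ratio matters, so EC10 is set to 1.
for kinase, fold in [("MKKKP", 80), ("MKKPP", 12), ("MKPP", 2.5)]:
    print(kinase, round(hill_coefficient(1.0, fold), 3))
```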

Fig 11. Comparison between the Kholodenko et al. EGFR model and its bond graph approximation.


The simulations are given for four exemplar species in the pathway. NRMSE is given as a percentage for each comparison. The initial concentration of EGF (the initiating molecule in the EGFR module) was 680 nM.

Fig 12. Comparison between the reduced Brightman & Fell Ras activation model and its bond graph approximation.


The simulations are given for four species. NRMSE is calculated for each comparison in percentage. The initial amounts in this simulation were 0 except for: RasGDP = 19800, RasGTP = 200, GAP = 15000.

Fig 13. The steady-state responses of the activated kinases for different input amounts in the MAPK cascade model.


The input Ras concentration is expressed on a logarithmic scale and each curve is normalised to the maximum reached concentration of that species.

Table 1. Input increase required for each activated kinase to go from 10% to 90% of its maximum concentration.

Kinase | Concentration at 10% of maximum response* | Concentration at 90% of maximum response* | Input increase (EC90/EC10) | nH
MKKKP | 0.00027 | 0.0024 | 80-fold | 1.002
MKKPP | 0.108 | 0.972 | 12-fold | 1.768
MKPP | 0.098 | 0.88 | 2.5-fold | 4.795

* Concentration amounts are in μM.

3.2 Verification of the bond graph composed EGFR-Ras-MAPK model

We investigated the behaviour of our bond graph composed model (EGFR-Ras-MAPK pathway) under four conditions: without negative feedback, with negative feedback, with different ATP concentrations, and with different EGF concentrations. Each of these conditions implies qualitatively predictable changes in the behaviour of the whole network, which we aimed to reproduce in our composed model.

  • Without negative feedback:

    The simulated time courses of the three activated kinases (MKKKP, MKKPP, and MKPP) in the composed bond graph model of the EGFR-Ras-MAPK pathway are shown in Fig 14A. Fig 14B shows the predicted steady-state concentrations of the activated kinases for various input concentrations. The concentration of the input kinase (Ras) at t = 100 s was 0.311 nM, indicated by the purple dashed line in Fig 14B. The intersections of this line with the MKKKP, MKKPP, and MKPP curves give the expected steady-state concentrations of these kinases.

  • With negative feedback:

    Negative feedback in the MAPK cascade may lead to inhibited responses or oscillations, depending on the stability of the system's steady states [45]. The activated kinases therefore respond differently when a negative feedback loop is added, and we explored this feature in our composed bond graph model of the EGFR-Ras-MAPK pathway.

    Fig 15 compares the activation of kinases in the MAPK cascade model without negative feedback (Fig 15A) and with negative feedback (Fig 15B). With a negative feedback loop in the MAPK cascade, the activation of the kinases decreases further downstream. Fig 15B shows the expected behaviour of the MAPK cascade module in the presence of negative feedback, with MKPP less activated than MKKPP and MKKKP. The peak in MKKKP activation in Fig 15B reflects an initial rise in MKKKP produced from the upstream MKKK, which is then rapidly consumed by the downstream species to activate MKKPP and MKPP.

    Fig 16 shows the inhibited responses of the terminal kinases and, consequently, the significant delay in reaching their steady states (compare with Fig 14A). The added negative feedback in the EGFR-Ras-MAPK model strengthens the dephosphorylation reaction in the first layer of the MAPK cascade module, which receives the Ras stimulus. This strengthened dephosphorylation counteracts the corresponding phosphorylation reaction and affects phosphorylation in all subsequent layers.

  • ATP concentration:

    ATP is one of the species involved in triggering wound responses that activate the MAPK pathways in cells [58]. As such, ATP shortage causes delays or failure in activating kinases and, consequently, dysfunctional wound-healing responses. ATP production in cells may be blocked or reduced for multiple reasons, such as mitochondrial disorders, ageing, or very intense exercise [59–61].

    The impact of ATP concentration on the behaviour of the bond graph EGFR-Ras-MAPK model was investigated by clamping the ATP concentration at 10%, 30%, 50%, and 100% of its baseline level (Fig 17). Fig 17A–17C illustrate how different levels of cellular ATP (energy) influence the behaviour of the activated kinases, and confirm that ATP shortage delays the responses. Fig 17D compares the steady-state concentrations of MKKKP, MKKPP, and MKPP across the relative ATP concentrations. The lower the ATP level, the lower the steady-state concentrations of MKKKP, MKKPP, and MKPP, highlighting the importance of energy for the function of the pathway. The initial concentrations of all other species were unchanged; here, the initial concentrations of RShGS and Ras (the species shared between the modules) were 0.

  • EGF concentration:

    We examined the composed bond graph model of the EGFR-Ras-MAPK pathway to compare its behaviour with other, similar mathematical models. To do this, we investigated the effect of EGF concentration on MKPP: EGF initiates the EGFR pathway model, and MKPP is the terminal kinase of the MAPK cascade model. Fig 18 shows the behaviour of MKPP for various initial concentrations of EGF. Lower concentrations of EGF delay MKPP in reaching its steady-state concentration, which emphasises the influence of EGF on species all the way down the MAPK cascade. Note that EGF = 0 nM does not abolish the response of the composed model, because ATP hydrolysis and other intermediate species (such as RasGTP and RasGDP) fuel the subsequent steps and stimulate Ras. Jurado et al. [62] also reported this time delay, with lower EGF concentrations delaying the MKPP response. Because of the different configuration of the constituent models and the absence of EGF regulation by MKPP, the MKPP concentration in our composed model plateaus instead of declining as shown in [62].
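Why a lower clamped ATP level lowers the steady-state activated fraction can be seen in a deliberately minimal cycle. The sketch assumes a hypothetical single phosphorylation cycle without enzymes, M + ATP ⇌ MP + ADP and MP ⇌ M + Pi, with bond graph mass-action fluxes and ATP, ADP and Pi held fixed (chemostats). Setting the two fluxes equal gives a closed-form steady state. All constants are arbitrary and the model is far simpler than the composed EGFR-Ras-MAPK model, so only the direction of the trend is meaningful.

```python
def steady_state_fraction(atp, adp=1.0, pi=1.0, kappa1=1.0, kappa2=1.0,
                          K_M=1.0, K_MP=1.0, K_ATP=1.0, K_ADP=1.0, K_Pi=1.0):
    """Steady-state fraction of phosphorylated MP in a hypothetical enzyme-free cycle.

    Reactions (bond graph mass action, ATP/ADP/Pi clamped):
      v1 = kappa1 * (K_M*m*K_ATP*atp - K_MP*p*K_ADP*adp)   M + ATP <=> MP + ADP
      v2 = kappa2 * (K_MP*p - K_M*m*K_Pi*pi)               MP <=> M + Pi
    Setting v1 = v2 gives p/m in closed form; the total m + p is conserved.
    """
    ratio = (K_M * (kappa1 * K_ATP * atp + kappa2 * K_Pi * pi)) / (
        K_MP * (kappa1 * K_ADP * adp + kappa2))
    return ratio / (1 + ratio)


for level in (0.1, 0.3, 0.5, 1.0):  # clamped ATP as a fraction of a baseline of 1.0
    print(f"ATP at {level:.0%} of baseline -> phosphorylated fraction {steady_state_fraction(level):.3f}")
```

Because the ATP term appears only in the numerator of p/m, the phosphorylated fraction rises monotonically with the clamped ATP level in this toy cycle, matching the direction of the trend in Fig 17D.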

Fig 14. Verification of the responses of activated kinases to Ras in the composed EGFR-Ras-MAPK bond graph model by comparing with the predicted steady-state responses in the MAPK cascade module.


(A) Ultrasensitivity in the composed EGFR-Ras-MAPK bond graph model. The steady-state concentrations of the kinases are: MKKKP = 1.37 nM, MKKPP = 1054.37 nM, MKPP = 987.96 nM; (B) Predicted steady-state concentrations of the kinases. The purple dashed line shows the concentration of Ras at t = 100 s in the composed EGFR-Ras-MAPK bond graph model. The predicted steady-state concentrations of MKKKP, MKKPP, and MKPP at Ras = 0.311 nM match those in the composed EGFR-Ras-MAPK bond graph model.

Fig 15. Activation of terminal kinases with and without negative feedback in the composed EGFR-Ras-MAPK bond graph model.


(A) Without negative feedback; (B) With negative feedback.

Fig 16. Time course behaviour of the terminal kinases in the composed EGFR-Ras-MAPK bond graph model with negative feedback.


Fig 17. Effect of different levels of ATP concentration on activated kinases in the composed EGFR-Ras-MAPK bond graph model.


(A) MKPP; (B) MKKPP; (C) MKKKP; (D) Steady-state concentration of MKKKP, MKKPP, and MKPP against relative ATP concentration. MKKKP concentration is also separately shown in a box due to its relatively small amounts compared to MKKPP and MKPP (initial concentration of common species: Ras = 0, RShGS = 0).

Fig 18. Effect of different levels of EGF concentration on MKPP in the composed EGFR-Ras-MAPK bond graph model.


The concentration of EGF was set to 0, 0.25%, 0.5%, 1%, 2%, 3%, 5%, 10%, and 100% of its initial concentration (680 nM). The behaviour of MKPP changes by altering the initial concentration of EGF.

4 Discussion

In this paper, we introduced a generic approach to assembling computational models in biology without starting from scratch. This was enabled by constructing symbolic bond graph modules of biophysical systems and obtaining the required parameters from existing models. To extract and allocate the parameters, the conventional (non-bond-graph) models must be fully and properly annotated. All the biochemical reactions (reversible or irreversible) in these reference models are then converted into bond graph compatible ones (Section 2.2.1), and the modules are combined automatically by merging the common components (species) among them. The resulting composed bond graph model complies with the laws of physics and can be coupled to other bond graph modules. As an example, we applied our method to the EGFR-Ras-MAPK pathway.

Our composed model of the EGFR-Ras-MAPK signalling pathway differs from those in the literature in three ways, which prevents a direct comparison:

  • Our reference models of the EGFR pathway, Ras activation, and the MAPK cascade are adopted from different sources, which include or exclude different reactions or feedback effects;

  • Some reactions in the original EGFR and Ras activation models were irreversible and therefore thermodynamically infeasible;

  • Kholodenko et al. considered both RShGS and RGS in their EGFR model to affect Ras activation. However, Brightman & Fell did not account for the contribution of RGS in their model. Hence, we merged only RShGS between the EGFR and Ras activation modules.

We validated the composed model by comparing the behaviour of the terminal kinases with the behaviour predicted by running the MAPK cascade model on its own (Fig 13). The purple dashed line in Fig 14B marks Ras = 0.311 nM, at which the steady-state concentration of each activated kinase is predicted. The results obtained from our composed model (Fig 14A) agree with these predictions (Fig 14B).

Merging components across models can expose mismatches in their parameters. Here, RShGS was merged between the EGFR and Ras activation models, and Ras between the Ras activation and MAPK models. These species have different initial values and/or thermodynamic constants in their respective models. In such cases, our framework flags the differing values for the same species and asks the user either to select one of the values or to enter a new value for the flagged parameter. Since the user may not have the relevant expertise, in the future we aim to provide users with a comparison of the ambiguous parameter's values across multiple models available on PMR. This will give the user a better sense of the range of values for uncertain parameters.
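A minimal Python sketch of the flagging step described above, assuming each module exposes its species as a mapping from annotation to parameter values. The data layout, the function name `merge_species_params`, and the tolerance rule are hypothetical, and the values in the example are placeholders, not the EGFR or Ras parameters.

```python
def merge_species_params(modules, rel_tol=1e-9):
    """Merge parameters of species that share an annotation across modules.

    modules: {module_name: {annotation: {parameter_name: value}}}.
    Returns (merged, conflicts). `merged` holds every value on which all
    modules defining that parameter agree; `conflicts` lists
    (annotation, parameter_name, {module_name: value}) for the user to resolve.
    """
    collected = {}  # annotation -> parameter_name -> {module_name: value}
    for module, species in modules.items():
        for annotation, params in species.items():
            for name, value in params.items():
                collected.setdefault(annotation, {}).setdefault(name, {})[module] = value

    merged, conflicts = {}, []
    for annotation, params in collected.items():
        for name, by_module in params.items():
            values = list(by_module.values())
            ref = values[0]
            # Relative tolerance, falling back to absolute for values near zero.
            if all(abs(v - ref) <= rel_tol * max(1.0, abs(ref)) for v in values):
                merged.setdefault(annotation, {})[name] = ref
            else:
                conflicts.append((annotation, name, dict(by_module)))
    return merged, conflicts


modules = {
    "module_A": {"ex:X": {"q0": 0.0, "K": 1.0}},
    "module_B": {"ex:X": {"q0": 0.0, "K": 2.0}, "ex:Y": {"q0": 5.0}},
}
merged, conflicts = merge_species_params(modules)
print(merged)     # {'ex:X': {'q0': 0.0}, 'ex:Y': {'q0': 5.0}}
print(conflicts)  # [('ex:X', 'K', {'module_A': 1.0, 'module_B': 2.0})]
```

Keeping the per-module values in each conflict record is what would let a user interface show where each candidate value came from before the user picks one or enters a new value.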

Our composed bond graph model allowed us to investigate the effects of EGF and ATP concentrations on the kinases, which was not possible with the individual models before composition. Although ATP hydrolysis is mentioned in the Kholodenko et al. EGFR model, it is not included in their computational model. Our composed bond graph model accounts for the missing energy sources, which, first, makes the model more biologically realistic and, second, enables us to examine hypotheses about ATP shortage in the EGFR-Ras-MAPK pathway.

In general, systems biology models frequently omit metabolites such as ATP or H⁺ from their reactions, which causes problems for mass and energy conservation. Where the selected reference models do not describe this part of the biology, users can apply their own knowledge or search the literature to add the missing steps or subsystems to the composed bond graph model manually. To improve this procedure in the future, we could use genome-scale metabolic models (GSMMs) as scaffolds to identify the missing entities or reactions [63, 64].
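To make the conservation issue concrete, the Python sketch below checks whether a single reaction conserves a tracked quantity (here, phosphate groups), given its stoichiometry. The generic kinase reaction K + ATP → KP + ADP and the bookkeeping weights are illustrative and not taken from the reference models: once ATP and ADP are dropped, the phosphorylation appears to create a phosphate group from nothing, and the reaction also loses its free-energy source.

```python
def conservation_residual(stoich, weights):
    """Net change of a conserved quantity caused by one reaction event.

    stoich: {species: coefficient}, negative for reactants, positive for products.
    weights: {species: units of the quantity per molecule}; missing species carry 0.
    A result of zero means the reaction conserves the quantity.
    """
    return sum(coef * weights.get(species, 0) for species, coef in stoich.items())


phosphate = {"K": 0, "KP": 1, "ADP": 2, "ATP": 3}  # phosphate groups per molecule

with_atp = {"K": -1, "ATP": -1, "KP": 1, "ADP": 1}  # K + ATP -> KP + ADP
without_atp = {"K": -1, "KP": 1}                    # K -> KP, ATP omitted

print(conservation_residual(with_atp, phosphate))     # 0
print(conservation_residual(without_atp, phosphate))  # 1
```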

There are situations where different representations of a given reaction or process are available in the literature. For example, a reaction might be described with or without an allosteric inhibitor. Such differences arise because different versions of a model serve different applications and study scopes. In such cases, modellers must decide which version of the model to use in model composition.

Our parameter optimisation method resembles the parameter balancing method used by Stanford et al. [14] in that both use thermodynamic constants. However, whereas parameter balancing relies on assumptions about typical parameter ranges and probability distributions [65], our parameter optimisation technique aims to replicate the behaviour of the original model with the least squared error. In the future, we could incorporate other techniques, such as parameter balancing, into our approach to make use of experimentally measured parameter values and create more realistic bond graph models.
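A minimal Python sketch of least-squares fitting for a single reaction, assuming the thermodynamic constants are already fixed so that only κ is fitted. Because the bond graph rate v = κ(K_S s − K_P p) is then linear in κ, the least-squares solution has a closed form. The example approximates an irreversible Michaelis-Menten rate (the situation described for step 4 in S1 Text, Appendix B), but `fit_kappa`, the parameter values, and the sample points are hypothetical, and the authors' optimisation may fit more parameters at once.

```python
def fit_kappa(samples, K_S, K_P):
    """Least-squares kappa for v = kappa * (K_S*s - K_P*p).

    samples: list of (s, p, v_target). With K_S and K_P fixed the model is
    linear in kappa, so minimising sum((v_target - kappa*x)**2) with
    x = K_S*s - K_P*p gives kappa = sum(x*v) / sum(x*x).
    """
    xs = [K_S * s - K_P * p for s, p, _ in samples]
    vs = [v for _, _, v in samples]
    denom = sum(x * x for x in xs)
    if denom == 0:
        raise ValueError("samples carry no information about kappa")
    return sum(x * v for x, v in zip(xs, vs)) / denom


def michaelis_menten(s, vmax, km):
    """Irreversible Michaelis-Menten rate."""
    return vmax * s / (km + s)


# Hypothetical parameters; product concentration p is held at 0.
vmax, km = 1.0, 10.0
samples = [(s, 0.0, michaelis_menten(s, vmax, km)) for s in (0.1, 0.5, 1.0, 2.0)]
kappa = fit_kappa(samples, K_S=1.0, K_P=1.0)
print(kappa)  # a little below vmax/km = 0.1
```

A single mass-action rate can only follow the Michaelis-Menten curve where s is small compared with Km, so the sampling range determines how good the approximation is.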

As an improvement on our previous approach [28], the present framework overcomes the limitations described earlier:

  1. No mathematical formulation of bond graphs is required in the CellML modules (formulating symbolic bond graph modules in BondGraphTools is more straightforward and less error-prone);

  2. Auxiliary variables are not needed in the CellML modules as linking ports (ports are detected automatically using a ‘white box’ approach that finds identical annotations);

  3. Instead of the semi-automated SemGen merger tool, our approach integrates the modules in a fully automated manner (our implementation merges the modules and performs the required structural changes automatically).
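The ‘white box’ port detection in item 2 amounts to matching annotations rather than variable names. A minimal Python sketch, assuming each module has been reduced to a mapping from local variable names to annotation strings (a simplification of real composite annotations); the function name and all names and annotation values in the example are placeholders.

```python
def find_common_species(module_a, module_b):
    """Pair up variables in two modules whose annotations are identical.

    Each module maps a local variable name to its annotation string (for
    example an ontology term). Returns sorted (name_in_a, name_in_b,
    annotation) triples: the candidate connection points ("ports").
    """
    names_by_annotation = {}
    for name, annotation in module_b.items():
        names_by_annotation.setdefault(annotation, []).append(name)
    pairs = [
        (name_a, name_b, annotation)
        for name_a, annotation in module_a.items()
        for name_b in names_by_annotation.get(annotation, [])
    ]
    return sorted(pairs)


# Placeholder annotations; the local names differ, so matching by name would fail.
egfr_like = {"RShGS": "ex:RShGS_complex", "EGF": "ex:EGF"}
ras_like = {"SOS_complex": "ex:RShGS_complex", "RasGTP": "ex:Ras_GTP"}
print(find_common_species(egfr_like, ras_like))
# [('RShGS', 'SOS_complex', 'ex:RShGS_complex')]
```

Indexing one module by annotation keeps the search linear in the number of variables, rather than comparing every pair.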

Here, we selected models encoded in CellML because CellML can represent models that are not purely biochemical, but the approach can be applied to models in other formats, such as SBML, as long as they include a semantic description of the system being modelled. While symbolic templates are required to apply our approach to CellML models, this step is not needed for SBML models, because the reactant(s)-reaction-product(s) relationships are explicitly defined in SBML models but are not explicitly encoded in CellML models. In this paper, we aimed to illustrate one way to convert CellML biomodels into bond graphs and compose them automatically. In the future, we intend to apply the same method to SBML models in a more automated way.

Currently, our model composition approach detects only exact matches. It could be improved by allowing the user to choose mergeable components from a shortlist of similarly annotated ones. If the scientific community were to define a globally accepted standard for annotating similar biological models, finding matching annotations among models would become much easier.
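One way the suggested shortlist could work, sketched in Python under stated assumptions: annotations are treated as composite (physical property, physical entity, location) triples, candidates must refer to the same physical entity, and they are ranked by how many of the other two terms also match. The triple layout, the function name, and the scoring rule are hypothetical; a production tool would more likely use ontology relationships (for example parent and child terms) than plain string equality.

```python
def shortlist(target, candidates):
    """Rank candidate components by similarity of composite annotations.

    target and each candidate annotation are (property, entity, location)
    triples. Only candidates with the same physical entity are kept; the
    score is 1 plus one point for each matching property and location, so 3
    means an exact match. Lower scores are for the user to confirm.
    """
    prop, entity, loc = target
    scored = []
    for name, (c_prop, c_entity, c_loc) in candidates.items():
        if c_entity != entity:
            continue  # a different biological entity is never proposed
        score = 1 + (c_prop == prop) + (c_loc == loc)
        scored.append((score, name))
    scored.sort(key=lambda item: (-item[0], item[1]))  # best first, then by name
    return scored


target = ("ex:concentration", "ex:Ras", "ex:cytosol")
candidates = {
    "Ras_c": ("ex:concentration", "ex:Ras", "ex:cytosol"),
    "Ras_m": ("ex:concentration", "ex:Ras", "ex:membrane"),
    "Ras_amt": ("ex:amount", "ex:Ras", "ex:membrane"),
    "ATP_c": ("ex:concentration", "ex:ATP", "ex:cytosol"),
}
print(shortlist(target, candidates))
# [(3, 'Ras_c'), (2, 'Ras_m'), (1, 'Ras_amt')]
```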

Our energy-based model composition approach is designed to link mathematical models encoded in CellML to their bond graph equivalents and compose them in a consistent, physics-based environment. Currently, there is no general method for automatically converting mathematical models into bond graphs, and each model requires domain-specific expertise to produce an equivalent bond graph form. To reuse and compose the large number of existing biological models, the community should either encourage researchers to build thermodynamically consistent and physically plausible models or encourage the development of computational tools that convert existing biological models into bond graphs.

If a model obeys the laws of physics and thermodynamics, it can be converted directly into a bond graph. Otherwise, one must make assumptions to produce a bond graph that approximates the original model. To support such decisions, we propose establishing an evaluation system that checks whether the original model is physically realistic. If the model cannot represent a physically plausible system and its bond graph approximation does not fit the data, this points to inconsistencies in the original model that should be noted and fixed.
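One concrete check such an evaluation system could include is the Wegscheider condition: for reversible mass-action kinetics to be thermodynamically consistent, Σ c_j ln(k_f,j / k_r,j) must be zero for every reaction cycle, that is, every weight vector c with N c = 0 for the stoichiometric matrix N. The Python sketch below assumes the cycle is supplied by the user; enumerating all cycles would need a null-space computation on N, which is omitted. Function names and rate constants are illustrative.

```python
from math import inf, log


def net_change(stoich, cycle):
    """Net stoichiometry of running each reaction `cycle[r]` times."""
    net = {}
    for rxn, weight in cycle.items():
        for species, coef in stoich[rxn].items():
            net[species] = net.get(species, 0) + weight * coef
    return net


def wegscheider_gap(kf, kr, cycle):
    """Sum of weight * ln(kf/kr) around a cycle; 0 when detailed balance holds."""
    gap = 0.0
    for rxn, weight in cycle.items():
        if weight == 0:
            continue
        if kf[rxn] == 0 or kr[rxn] == 0:
            return inf  # an irreversible step cannot close a balanced cycle
        gap += weight * log(kf[rxn] / kr[rxn])
    return gap


# Triangle A <=> B <=> C <=> A with placeholder rate constants.
stoich = {
    "r1": {"A": -1, "B": 1},
    "r2": {"B": -1, "C": 1},
    "r3": {"C": -1, "A": 1},
}
cycle = {"r1": 1, "r2": 1, "r3": 1}
assert all(v == 0 for v in net_change(stoich, cycle).values())  # a true cycle

kr = {"r1": 1.0, "r2": 1.0, "r3": 1.0}
print(wegscheider_gap({"r1": 2.0, "r2": 3.0, "r3": 1 / 6}, kr, cycle))  # about 0: consistent
print(wegscheider_gap({"r1": 2.0, "r2": 3.0, "r3": 1.0}, kr, cycle))    # ln 6, about 1.79: inconsistent
```

The infinite gap for an irreversible step mirrors the point made earlier in this Discussion that irreversible reactions are thermodynamically infeasible and must be approximated before conversion.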

The ultimate goal of our model composition method is to provide a foundation for future tools that convert any CellML/SBML model into a bond graph and then back into a CellML/SBML file. We require the bond graph conversion for appending, deleting, and editing modules. This allows us, first, to avoid errors or confusion during the process and, second, to ensure that the model conserves energy and mass and remains thermodynamically and physically consistent as we modify it. The mathematical equations generated in the bond graph environment can then be exported to CellML for simulation and reproducibility. A bond graph model re-exported to CellML loses its graphical structure and is expressed as a system of ODEs. Since the exported bond graph ODEs can be converted into MathML, the biochemical equations would also be expressible in SBML. The structure of such SBML models would be preserved, because the required parameters, rate laws, and reactant(s)-reaction-product(s) relationships can be extracted from the generated bond graph model.

Models use different units for their parameters and different scales for their amounts. Coupling arbitrary models alters their boundary conditions, which induces differences that propagate throughout the models. In the future, we plan to apply nondimensionalization to remove the dependence on units of measurement across models and to generate unified composed models regardless of their units [66]. Nondimensionalization is especially useful for models described by differential equations. In this systematic technique, all variables and parameters are made unitless by rescaling them relative to reference values.
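As a generic illustration of this rescaling (not the specific scheme the authors plan to use), consider mass-action kinetics for A ⇌ B and for A + B ⇌ C, with an assumed reference amount Q (for example, a total concentration) and the reverse rate constant k_r setting the time scale:

```latex
% A <=> B (first order): k_f and k_r in 1/time
\frac{dq_A}{dt} = -k_f\,q_A + k_r\,q_B,
\qquad u = \frac{q_A}{Q},\; w = \frac{q_B}{Q},\; \tau = k_r t
\;\Longrightarrow\;
\frac{du}{d\tau} = -\frac{k_f}{k_r}\,u + w

% A + B <=> C (second order): k_f in 1/(amount x time), k_r in 1/time
\frac{dq_C}{dt} = k_f\,q_A q_B - k_r\,q_C,
\qquad u_i = \frac{q_i}{Q},\; \tau = k_r t
\;\Longrightarrow\;
\frac{du_C}{d\tau} = \frac{k_f Q}{k_r}\,u_A u_B - u_C
```

In the first case the units of Q and of time drop out entirely and only the dimensionless ratio k_f/k_r remains. In the second, Q survives inside the dimensionless group k_f Q/k_r, which is why the choice of reference values matters when models expressed in different units (for example nM and μM) are coupled.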

Another widely used formalism in computational biology is rule-based modelling, in which a set of rules describes the mechanistic details of biochemical processes, for example the random binding of multiple ligands to a receptor [67]. Recently, rule-based approaches have incorporated energetic parameters to ensure thermodynamic consistency [68–70]. Danos et al. showed that by computing the free energy of species formation, and hence free energy inequalities for reactions in rule-based models, one can verify whether a model satisfies the free energy constraints and detailed balance [71]. Moreover, rule-based languages such as BioNetGen [72] and Kappa [73] support annotations. One advantage of bond graph modelling over rule-based modelling is that bond graphs can model multi-physical systems such as electrophysiology, whereas rule-based approaches are limited to the biochemical domain.

5 Conclusion

We have developed a method that automates the integration of biosimulation models. We used the SemGen annotator tool to add metadata to CellML models and the Python library BondGraphTools to generate bond graph templates of the models. Describing the bonds between bond graph components with connectivity matrices made it convenient to delete or add bonds and components in the modules, which minimises user error when a structural change is required in complex systems. Our method automates composition by taking advantage of the semantics in the modules and of systematic structural modification using connectivity matrices. We demonstrated its functionality by coupling two biosimulation models and their sub-models. Likewise, several annotated biosimulation models can be integrated automatically if they share common entities. This is particularly important for complex, large biological systems, where merging models mathematically requires time-consuming and error-prone post-composition adjustments. We believe that our method is one of the first steps toward multiscale cell-to-organ-level model integration.
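The conclusion notes that connectivity matrices make adding and deleting bonds systematic. The sketch below shows one such structural edit, merging two copies of a common species, on a plain 0/1 matrix in Python. The representation (rows as bond sources, columns as bond targets), the function `merge_nodes`, and the component names are assumptions for illustration, not the BondGraphTools API or the authors' code.

```python
def merge_nodes(names, adj, keep, drop):
    """Merge component `drop` into component `keep` in a connectivity matrix.

    names: component labels; adj[i][j] == 1 means a bond from names[i] to names[j].
    Every bond into or out of `drop` is redirected to `keep`, then `drop`'s
    row and column are deleted. Returns new (names, adj); inputs are unchanged.
    """
    i, j = names.index(keep), names.index(drop)
    n = len(names)
    m = [row[:] for row in adj]  # work on a copy
    for k in range(n):
        m[i][k] |= m[j][k]  # outgoing bonds of drop now leave keep
        m[k][i] |= m[k][j]  # incoming bonds of drop now reach keep
    m[i][i] = 0  # the two copies are one node now; no bond to itself
    rest = [k for k in range(n) if k != j]
    return [names[k] for k in rest], [[m[r][c] for c in rest] for r in rest]


# Module 1: reaction Re1 produces Ras_1; module 2: Ras_2 feeds reaction Re2.
names = ["Re1", "Ras_1", "Ras_2", "Re2"]
adj = [
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
new_names, new_adj = merge_nodes(names, adj, keep="Ras_1", drop="Ras_2")
print(new_names)  # ['Re1', 'Ras_1', 'Re2']
print(new_adj)    # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
```

After the merge, the single Ras node links the two modules (Re1 → Ras → Re2), which is the structural change that composition requires.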

Supporting information

S1 Fig. Ultrasensitivity in the MAPK cascade.

For an input of Ras = 3 × 10⁻⁵ μM, the changes in concentration of the activated kinases (MKKKP, MKKPP, and MKPP) show that the signal is amplified through each layer.

(TIF)

S2 Fig. The normalised activation of kinases in the MAPK cascade module for different input amounts (Ras).

(TIF)

S1 Table. Reactant(s) and product(s) of each step in EGFR pathway and the reaction rate equations.

Steps 4, 8, and 16 are irreversible reactions, which are approximated by mass action kinetics. In the reaction rate equations, κᵢ (i ∈ {Steps}) is the reaction rate constant, Kₓ (x ∈ {Reactants, Products}) is the thermodynamic constant of each species, and qₓ (x ∈ {Reactants, Products}) is the concentration of each species.

(PDF)

S2 Table. Original and modified parameters of the species in the EGFR pathway model.

(PDF)

S3 Table. Original and modified parameters of the reactions in the EGFR pathway model.

(PDF)

S4 Table. Reactant(s) and product(s) of each step in the Ras activation pathway and the reaction rate equations.

Steps 2 and 4 are irreversible reactions, which are approximated by mass action kinetics. In the reaction rate equations, κᵢ (i ∈ {Steps}) is the reaction rate constant, Kₓ (x ∈ {Reactants, Products}) is the thermodynamic constant of each species, and qₓ (x ∈ {Reactants, Products}) is the concentration of each species.

(PDF)

S5 Table. Original and modified parameters of the species in the Ras activation pathway model.

(PDF)

S6 Table. Original and modified parameters of the reactions in the Ras activation pathway model.

(PDF)

S1 Text. Supplementary material.

Appendix A: Connectivity matrix example. Appendix B: Parameter estimation for step 4 in the EGFR pathway model. Appendix C: An example of composing two reactions in bond graphs. Fig A: An example network with its connectivity matrix. Fig B: The irreversible Michaelis-Menten kinetics and its equivalent approximated reversible mass action kinetics for step 4 in the EGFR signalling pathway model.

(PDF)

Acknowledgments

NS would like to thank Yuda Munarko for his helpful comments and suggestions. EC passed away before the submission of the final version of this manuscript. NS accepts responsibility for the integrity and validity of the data collected and analysed.

Data Availability

The reference MAPK cascade model is available from https://github.com/mic-pan/Modularity-SysBio

The reference model of the EGFR pathway is available from https://models.physiomeproject.org/e/47f/kholodenko_demin_moehren_hoek_1999.cellml/docgen

All the model files for this manuscript are available on GitHub: https://github.com/Niloofar-Sh/EGFR_MAPK

Funding Statement

1. NS was supported by an Aotearoa Fellowship to DPN from the Aotearoa Foundation. 2. MP was supported by a Postdoctoral Research Fellowship from the School of Mathematics and Statistics, University of Melbourne. 3. KT was supported by a Marsden Fast-Start grant (UOA1703) from the Royal Society of New Zealand (https://www.royalsociety.org.nz) and a Sir Charles Hercus Health Research Fellowship (21/116) from the Health Research Council of New Zealand (https://gateway.hrc.govt.nz/funding/career-development-awards/2021-sir-charles-hercus-health-research-fellowship). 4. EJC was supported by the Australian Research Council Centre of Excellence in Convergent Bio-Nano Science and Technology (project number CE140100036) (http://purl.org/au-research/grants/arc/CE140100036). 5. DPN was supported by an Aotearoa Fellowship from the Aotearoa Foundation and the Center for Reproducible Biomedical Modeling P41 EB023912/EB/NIBIB NIH HHS/United States (https://projectreporter.nih.gov/project_description.cfm?projectnumber=5P41EB023912-03). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Carrera J, Covert MW. Why build whole-cell models? Trends in cell biology. 2015;25(12):719–722. doi: 10.1016/j.tcb.2015.09.004
  • 2. Cooling MT, Nickerson DP, Nielsen PM, Hunter PJ. Modular modelling with Physiome standards. The Journal of physiology. 2016;594(23):6817–6831. doi: 10.1113/JP272633
  • 3. Yu T, Lloyd CM, Nickerson DP, Cooling MT, Miller AK, Garny A, et al. The physiome model repository 2. Bioinformatics. 2011;27(5):743–744. doi: 10.1093/bioinformatics/btq723
  • 4. Le Novere N, Bornstein B, Broicher A, Courtot M, Donizelli M, Dharuri H, et al. BioModels Database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems. Nucleic acids research. 2006;34(suppl_1):D689–D691. doi: 10.1093/nar/gkj092
  • 5. Clerx M, Cooling MT, Cooper J, Garny A, Moyle K, Nickerson DP, et al. CellML 2.0. Journal of Integrative Bioinformatics. 2020;17(2-3). doi: 10.1515/jib-2020-0021
  • 6. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics. 2003;19(4):524–531. doi: 10.1093/bioinformatics/btg015
  • 7. Malik-Sheriff RS, Glont M, Nguyen TV, Tiwari K, Roberts MG, Xavier A, et al. BioModels—15 years of sharing computational models in life science. Nucleic acids research. 2020;48(D1):D407–D415. doi: 10.1093/nar/gkz1055
  • 8. Neal ML, Carlson BE, Thompson CT, James RC, Kim KG, Tran K, et al. Semantics-based composition of integrated cardiomyocyte models motivated by real-world use cases. PLoS One. 2015 Dec 30;10(12):e0145621. doi: 10.1371/journal.pone.0145621
  • 9. Smith LP, Hucka M, Hoops S, Finney A, Ginkel M, Myers CJ, et al. SBML Level 3 package: Hierarchical Model Composition, Version 1 Release 3. Journal of Integrative Bioinformatics. 2015;12:603–659. doi: 10.2390/biecoll-jib-2015-268
  • 10. Lopez CF, Muhlich JL, Bachman JA, Sorger PK. Programming biological models in Python using PySB. Molecular systems biology. 2013;9(1):646. doi: 10.1038/msb.2013.1
  • 11. de Bono B, Safaei S, Grenon P, Hunter PJ. Meeting the multiscale challenge: representing physiology processes over ApiNATOMY circuits using bond graphs. Interface Focus. 2017;8. doi: 10.1098/rsfs.2017.0026
  • 12. Gawthrop PJ, Crampin EJ. Energy-based analysis of biochemical cycles using bond graphs. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences. 2014;470(2171):20140459. doi: 10.1098/rspa.2014.0459
  • 13. Ederer M, Gilles ED. Thermodynamically feasible kinetic models of reaction networks. Biophysical journal. 2007 Mar 15;92(6):1846–57. doi: 10.1529/biophysj.106.094094
  • 14. Stanford NJ, Lubitz T, Smallbone K, Klipp E, Mendes P, Liebermeister W. Systematic construction of kinetic models from genome-scale metabolic networks. PloS one. 2013 Nov 14;8(11):e79195. doi: 10.1371/journal.pone.0079195
  • 15. Mason JC, Covert MW. An energetic reformulation of kinetic rate laws enables scalable parameter estimation for biochemical networks. Journal of Theoretical Biology. 2019 Jan 14;461:145–56. doi: 10.1016/j.jtbi.2018.10.041
  • 16. Paynter H. Analysis and Design of Engineering Systems. MIT press; 1961.
  • 17. Oster G, Perelson A, Katchalsky A. Network thermodynamics. Nature. 1971;234(5329):393–399. doi: 10.1038/234393a0
  • 18. Oster GF, Perelson AS, Katchalsky A. Network thermodynamics: dynamic modelling of biophysical systems. Quarterly reviews of Biophysics. 1973;6(1):1–134. doi: 10.1017/S0033583500000081
  • 19. Cellier F. Modeling Chemical Reaction Kinetics. In: Continuous System Modeling. Springer, New York, NY; 1991.
  • 20. Gawthrop PJ, Cursons J, Crampin EJ. Hierarchical bond graph modelling of biochemical networks. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences. 2015;471(2184):20150642. doi: 10.1098/rspa.2015.0642
  • 21. Safaei S, Blanco PJ, Müller LO, et al. Bond graph model of cerebral circulation: Toward clinically feasible systemic blood flow simulations. Front Physiol. 2018 Mar;9:148. doi: 10.3389/fphys.2018.00148
  • 22. Wellstead PE. Introduction to physical system modelling. vol. 4. Academic Press London; 1979.
  • 23. Gawthrop P, Smith L. Metamodelling: For bond graphs and dynamic systems. Prentice Hall International (UK) Ltd.; 1996.
  • 24. Borutzky W. Bond graph methodology: development and analysis of multidisciplinary dynamic system models. Springer Science & Business Media; 2009.
  • 25. Cudmore P, Pan M, Gawthrop PJ, Crampin EJ. Analysing and simulating energy-based models in biology using BondGraphTools. The European Physical Journal E. 2021 Dec;44(12):1–20. doi: 10.1140/epje/s10189-021-00152-4
  • 26. Neal ML, König M, Nickerson D, Mısırlı G, Kalbasi R, Dräger A, et al. Harmonizing semantic annotations for computational models in biology. Briefings in bioinformatics. 2019 Mar;20(2):540–50. doi: 10.1093/bib/bby087
  • 27. Schamai W, Buffoni L, Fritzson PA. An Approach to Automated Model Composition Illustrated in the Context of Design Verification. Modeling Identification and Control. 2014;35:79–91. doi: 10.4173/mic.2014.2.2
  • 28. Shahidi N, Pan M, Safaei S, Tran K, Crampin EJ, Nickerson DP. Hierarchical semantic composition of biosimulation models using bond graphs. PLoS computational biology. 2021 May 13;17(5):e1008859. doi: 10.1371/journal.pcbi.1008859
  • 29. Neal ML, Thompson CT, Kim KG, James RC, Cook DL, Carlson BE, et al. SemGen: a tool for semantics-based annotation and composition of biosimulation models. Bioinformatics. 2019;35(9):1600–1602. doi: 10.1093/bioinformatics/bty829
  • 30. Kholodenko BN, Demin OV, Moehren G, Hoek JB. Quantification of Short Term Signaling by the Epidermal Growth Factor Receptor. The Journal of Biological Chemistry. 1999;274:30169–30181. doi: 10.1074/jbc.274.42.30169
  • 31. Brightman FA, Fell DA. Differential feedback regulation of the MAPK cascade underlies the quantitative differences in EGF and NGF signalling in PC12 cells. FEBS Letters. 2000;482(3):169–174. doi: 10.1016/S0014-5793(00)02037-8
  • 32. Pan M, Gawthrop PJ, Cursons J, Crampin EJ. Modular assembly of dynamic models in systems biology. PLoS computational biology. 2021 Oct 13;17(10):e1009513. doi: 10.1371/journal.pcbi.1009513
  • 33. Pan M, Gawthrop PJ, Tran K, Cursons J, Crampin EJ. A thermodynamic framework for modelling membrane transporters. Journal of theoretical biology. 2018.
  • 34. Atkins PW, de Paula JC. Physical Chemistry for the Life Sciences; 2005.
  • 35. Polettini M, Esposito M. Irreversible thermodynamics of open chemical networks. I. Emergent cycles and broken conservation laws. The Journal of chemical physics. 2014 Jul 14;141(2):07B610_1. doi: 10.1063/1.4886396
  • 36. Aqeel Ashraf M, Faheem M. Energy balances in biological systems. Nanomaterials and Energy. 2021 Mar;10(1):1. doi: 10.1680/jnaen.2021.10.1.1
  • 37. Gawthrop PJ, Crampin EJ. Bond Graph Representation of Chemical Reaction Networks. IEEE Transactions on NanoBioscience. 2018;17:449–455. doi: 10.1109/TNB.2018.2876391
  • 38. Molina J, Adjei A. The Ras/Raf/MAPK pathway. Journal of Thoracic Oncology. 2006;1(1):7–9. doi: 10.1097/01243894-200601000-00004
  • 39. Sasaoka T, Langlois WJ, Leitner JW, Draznin B, Olefsky JM. The signaling pathway coupling epidermal growth factor receptors to activation of p21ras. The Journal of biological chemistry. 1994;269(51):32621–5. doi: 10.1016/S0021-9258(18)31679-X
  • 40. Resat H, Ewald JA, Dixon DA, Wiley HS. An integrated model of epidermal growth factor receptor trafficking and signal transduction. Biophysical journal. 2003;85(2):730–43. doi: 10.1016/s0006-3495(03)74516-0
  • 41. Sarma U, Ghosh I. Oscillations in MAPK cascade triggered by two distinct designs of coupled positive and negative feedback loops. BMC Research Notes. 2011;5:287. doi: 10.1186/1756-0500-5-287
  • 42. Orton RJ, Sturm OE, Vyshemirsky V, Calder M, Gilbert DR, Kolch W. Computational modelling of the receptor-tyrosine-kinase-activated MAPK pathway. The Biochemical journal. 2005;392(Pt 2):249–61. doi: 10.1042/BJ20050908
  • 43. Kholodenko BN. Negative feedback and ultrasensitivity can bring about oscillations in the mitogen-activated protein kinase cascades. European journal of biochemistry. 2000;267(6):1583–8. doi: 10.1046/j.1432-1327.2000.01197.x
  • 44. Lake D, Corrêa S, Muller J. Negative feedback regulation of the ERK1/2 MAPK pathway. Cellular and Molecular Life Sciences. 2016;73. doi: 10.1007/s00018-016-2297-8
  • 45. Arkun Y, Yasemi M. Dynamics and control of the ERK signaling pathway: Sensitivity, bistability, and oscillations. PLoS ONE. 2018;13. doi: 10.1371/journal.pone.0195513
  • 46. Alocci D, Mariethoz J, Horlacher O, Bolleman JT, Campbell MP, Lisacek F. Property graph vs RDF triple store: A comparison on glycan substructure search. PloS one. 2015 Dec 14;10(12):e0144578. doi: 10.1371/journal.pone.0144578
  • 47. Wild DJ, Ding Y, Sheth AP, Harland L, Gifford EM, Lajiness MS. Systems chemical biology and the Semantic Web: what they mean for the future of drug discovery research. Drug discovery today. 2012 May 1;17(9-10):469–74. doi: 10.1016/j.drudis.2011.12.019
  • 48. Courtot M, Juty N, Knüpfer C, Waltemath D, Zhukova A, Dräger A, Dumontier M, et al. Controlled vocabularies and semantics in systems biology. Molecular systems biology. 2011;7(1):543. doi: 10.1038/msb.2011.77
  • 49. Cook DL, Mejino Jr JL, Neal ML, Gennari JH. Bridging biological ontologies and biosimulation: the ontology of physics for biology. In: AMIA Annual Symposium Proceedings 2008 (Vol. 2008, p. 136). American Medical Informatics Association.
  • 50. Gene Ontology Consortium. The Gene Ontology (GO) database and informatics resource. Nucleic acids research. 2004 Jan 1;32(suppl_1):D258–61. doi: 10.1093/nar/gkh036
  • 51. Chapter 3—Connectivity Matrices and Brain Graphs. In: Fornito A, Zalesky A, Bullmore ET, editors. Fundamentals of Brain Network Analysis. San Diego: Academic Press; 2016. p. 89–113.
  • 52. Neal ML, Cooling MT, Smith LP, Thompson CT, Sauro HM, Carlson BE, et al. A Reappraisal of How to Build Modular, Reusable Models of Biological Systems. PLoS Computational Biology. 2014;10. doi: 10.1371/journal.pcbi.1003849
  • 53. Gawthrop PJ, Pan M, Crampin EJ. Modular dynamic biomolecular modelling with bond graphs: the unification of stoichiometry, thermodynamics, kinetics and data. Journal of the Royal Society Interface. 2021;18. doi: 10.1098/rsif.2021.0478
  • 54. Hindmarsh AC, Brown PN, Grant KE, Lee SL, Serban R, Shumaker DE, et al. SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers. ACM Transactions on Mathematical Software (TOMS). 2005 Sep 1;31(3):363–96. doi: 10.1145/1089014.1089020
  • 55. Brightman FA, Fell DA. Differential feedback regulation of the MAPK cascade underlies the quantitative differences in EGF and NGF signalling in PC12 cells. FEBS Letters. 2000 [cited 14 March 2022]. CellML author(s): Catherine Lloyd. Available from: https://models.physiomeproject.org/exposure/55e182564e746cc9bac6b03ad7778d4d/brightman_fell_2000.cellml/view
  • 56. Altszyler E, Ventura AC, Colman-Lerner A, Chernomoretz A. Ultrasensitivity in signaling cascades revisited: Linking local and global ultrasensitivity estimations. PLoS ONE. 2017;12. doi: 10.1371/journal.pone.0180083
  • 57. Huang CY, Ferrell JE. Ultrasensitivity in the mitogen-activated protein kinase cascade. Proceedings of the National Academy of Sciences of the United States of America. 1996;93(19):10078–83. doi: 10.1073/pnas.93.19.10078
  • 58. Medina-Castellanos E, Esquivel-Naranjo EU, Heil M, Herrera-Estrella A. Extracellular ATP activates MAPK and ROS signaling during injury response in the fungus Trichoderma atroviride. Frontiers in Plant Science. 2014;5. doi: 10.3389/fpls.2014.00659
  • 59. Johnson TA, Jinnah HA, Kamatani N. Shortage of Cellular ATP as a Cause of Diseases and Strategies to Enhance ATP. Frontiers in Pharmacology. 2019;10. doi: 10.3389/fphar.2019.00098
  • 60. Schütt F, Aretz S, Auffarth GU, Kopitz J. Moderately reduced ATP levels promote oxidative stress and debilitate autophagic and phagocytic capacities in human RPE cells. Investigative ophthalmology & visual science. 2012;53(9):5354–61. doi: 10.1167/iovs.12-9845
  • 61. Hargreaves M, Spriet LL. Skeletal muscle energy metabolism during exercise. Nature Metabolism. 2020; p. 1–12.
  • 62. Jurado M, Castaño Ó, Zorzano A. Stochastic modulation evidences a transitory EGF-Ras-ERK MAPK activity induced by PRMT5. Computers in Biology and Medicine. 2021 Jun 1;133:104339. doi: 10.1016/j.compbiomed.2021.104339
  • 63. Namrak T, Raethong N, Jatuponwiphat T, Nitisinprasert S, Vongsangnak W, Nakphaichit M. Probing Genome-Scale Model Reveals Metabolic Capability and Essential Nutrients for Growth of Probiotic Limosilactobacillus reuteri KUB-AC5. Biology. 2022 Feb 11;11(2):294. doi: 10.3390/biology11020294
  • 64. Thiele I, Palsson BØ. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nature protocols. 2010 Jan;5(1):93–121. doi: 10.1038/nprot.2009.203
  • 65. Lubitz T, Schulz M, Klipp E, Liebermeister W. Parameter balancing in kinetic models of cell metabolism. The Journal of Physical Chemistry B. 2010 Dec 16;114(49):16298–303. doi: 10.1021/jp108764b
  • 66. Ledder G. Scaling for Dynamical Systems in Biology. Bulletin of Mathematical Biology. 2017;79:2747–2772. doi: 10.1007/s11538-017-0339-5
  • 67. Chylek LA, Harris LA, Faeder JR, Hlavacek WS. Modeling for (physical) biologists: an introduction to the rule-based approach. Physical biology. 2015 Jul 15;12(4):045007. doi: 10.1088/1478-3975/12/4/045007
  • 68. Ollivier JF, Shahrezaei V, Swain PS. Scalable rule-based modelling of allosteric proteins and biochemical networks. PLoS computational biology. 2010 Nov 4;6(11):e1000975. doi: 10.1371/journal.pcbi.1000975
  • 69. Hogg JS. Advances in rule-based modeling: compartments, energy, and hybrid simulation, with application to sepsis and cell signaling (Doctoral dissertation, University of Pittsburgh).
  • 70. Sekar JA, Hogg JS, Faeder JR. Energy-based modeling in BioNetGen. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 2016 Dec 15 (pp. 1460–1467). IEEE.
  • 71. Danos V, Harmer R, Honorato-Zimmer R. Thermodynamic graph-rewriting. In: International Conference on Concurrency Theory; 2013 Aug 27 (pp. 380–394). Springer, Berlin, Heidelberg.
  • 72. Harris LA, Hogg JS, Tapia JJ, Sekar JA, Gupta S, Korsunsky I, et al. BioNetGen 2.2: advances in rule-based modeling. Bioinformatics. 2016 Nov 1;32(21):3366–8. doi: 10.1093/bioinformatics/btw469
  • 73. Boutillier P, Maasha M, Li X, Medina-Abarca HF, Krivine J, Feret J, et al. The Kappa platform for rule-based modeling. Bioinformatics. 2018 Jul 1;34(13):i583–92. doi: 10.1093/bioinformatics/bty272

Decision Letter 0

Lutz Brusch

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

28 Feb 2022

PONE-D-21-39103

A semantics, energy-based approach to automate biomodel composition

PLOS ONE

Dear Dr. Shahidi,

Thank you for submitting your manuscript to PLOS ONE, and please accept my sincere condolences on the loss of Professor Crampin. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please carefully address the comments by all three Reviewers, and note that Reviewer #1 has provided a four-page report as an attached PDF file; the text below shows only a summary of it. In particular, improvements to the structure and clarity of the manuscript text are required. Following comment 3a by Reviewer #1, a verbal description of the hierarchical merging of multiple models is required, but no larger model needs to be explicitly constructed here. Regarding comment 3f by Reviewer #1, a general verbal description of model behaviors in the merged model (e.g. behaviors arising from feedback loops that are closed in the merged model but broken in the un-merged models) is required, but no specific new predictions or comparison with experimental data or literature will be required for this EGFR-Ras-MAPK example. The separately available figure files (TIFF) are all of high resolution, so you may ignore the last comment on figure rasterization by Reviewer #2.

Please submit your revised manuscript by Apr 14 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Lutz Brusch, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements: 

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: N/A

Reviewer #3: N/A

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I am of the opinion that energy-based model composition will play a critical role in expanding the utility of reaction models in biochemistry. The authors have applied multiple innovative strategies such as bond graphs, templating, automated semantic inference, and automated merging to their model composition pipeline, so the premise of the paper is sound. However, it disappoints me to recommend rejecting this manuscript on several counts.

1. It fails to properly introduce the problems being solved, how they originate in biology, and how the proposed methods solve those problems.

2. It fails to introduce the technical background of the methods being deployed in a manner accessible to a general biological audience, or even an audience interested in biochemical models.

3. It has multiple issues with the demonstrated results: questionable choices in merging, insufficient demonstration of proof of concept, poor structuring and comparison of simulation results.

4. Severe readability and copy-editing issues with the text.

Issue 3a and 3f (please see attached PDF for full review) are critical reasons behind recommending rejection over revision.

Reviewer #2: # A semantics, energy-based approach to automate biomodel composition

## Major comments

This paper describes a methodology of using bond graphs to create complex composite models.

This is a topic many people (myself included) will want to learn more about, and the accompanying software could potentially be very useful.

However, there are several issues with the manuscript text that will need addressing before this can be published.

Most notably:

1. The introduction and methods sections are far more abstract than they need to be. For example, the text frequently refers to "entities", "elements", "components" of bond models without making clear what these words refer to (or if they are the same), it mentions "similar annotations" without saying what this means, etc. Such parts of the text should be rewritten to be both clear and precise.

2. The problem statement is unclear. In several places it seems to be about coupling two (known) models, but parts of the methodology imply it is about identifying models that could potentially be coupled?

3. As the problem statement is an abstract one, an example (perhaps a simple toy problem in addition to the real-life example shown) should be introduced early on (i.e. in the introduction) and used to explain the problem statement. In the current manuscript a general outline of a solution is being sketched long before the reader has been given the tools to understand the problem this aims to solve.

4. The level of detail varies considerably throughout the paper, lots of words are devoted to fairly simple processes such as creating a connectivity matrix and removing components that have been deemed identical, but very little is said about more complex steps, e.g. how annotations are compared, how "thermodynamically consistent parameters" are created from inconsistent ones etc.

## Detailed comments

### Line 17

> The main challenges in hierarchical model composition include: (a) incompatible code languages, (b) different modelling frameworks, (c) post-composition adjustments, and (d) physically implausible resultant models.

This needs a slower and more careful explanation. What is the difference between an "incompatible language" and a "different framework"? What are "post-composition adjustments". Can you give an example?

### Line 20

> A majority of model integration platforms ... require compatibility between the languages and modelling frameworks

> In contrast, the resultant model still needs further post-merging code-wise adjustments to be executable, yet it might not represent a physically feasible model.

How are these two statements "In contrast"?

Are "post-merging code-wise adjustments" the same thing as "post-composition adjustments"?

### Line 26

> One solution to these issues is using a hierarchical modelling approach (to help with the post-composition adjustments) and an energy-based modelling framework (to guarantee a physically plausible composed model).

I'm not sure if you are (A) stating something I should already be able to understand at this point, or if you are (B) saying this will be explained in the upcoming text. If A, then it needs a lot more explanation; if B, then please rewrite the text so that this is clear.

### Line 53

> The annotated data from the CellML modules are then extracted and assigned to their equivalent bond graph components.

By "data" do you mean data (i.e. parameter values?), or e.g. equations, units, etc. ?

And what is a "component"? Should data not be connected to data, or variables to variables? Please rewrite to make statements such as these less abstract and more clear.

### Line 55

> We demonstrate this by an example where a bond graph model is constructed from its constitutive modules, i.e., the Epidermal Growth Factor Receptor (EGFR) signalling pathway and the Mitogen-Activated Protein Kinase (MAPK)

Given this example, I'm a bit surprised you opted for CellML over SBML. This may warrant some discussion.

### Line 79 Materials and methods

Please make this whole section less abstract, e.g. by starting with a detailed example of two models that you wish to connect, so that the reader has some idea what you mean by the various "entities" and "elements" that appear in the text.

### Line 86

> automatically rewiring the connections between components and modules

What are "components" in this sentence? The introduction should probably give a very brief explanation of what a bond graph looks like (maybe a figure?) and explain what the difference between e.g. a bond graph "module" and a bond graph "component" is. (Is it just a variable? Then please say so)

### Line 92

> Our method to expand and integrate biosimulation models provides the foundation for further developments in an open-source environment based on energy-based modules and automation to minimise manual input.

I'm not sure what this sentence means? Without the "to minimise manual input" it reads like part of an abstract or conclusion section. But are you just trying to say you want to minimise manual input when composing models?

### Line 94

> In this endeavour, we have provided some exemplar symbolic bond graph models to which the annotated parameters from the CellML modules would accordingly link.

Does this mean that any time I want to link 2 models, I need to write 2 templates (that presumably match the 2 models I want to link), and then write a connection matrix describing how my two templates are linked? If it's so hard-coded, then why not just write down which variables I want to connect straight away?

### Line 99

> The suitable bond graph model is then automatically selected from the list by

Which list?

### Line 100

> identifying specific annotated components in the CellML modules

What exactly is a "component" here? Is it a CellML component (i.e. a container of variables)?

### Line 117

> Ontologies

I'm not sure who the intended audience for this paper is, but if it's "anyone who wants to compose a big model" you should probably add a line saying what an ontology is, or maybe add a reference.

### Line 121

> We suggest downloading CHEBI, FMA, OPB, and GO ontologies from the following links:

Please explain what domains these 5 are, and why they are appropriate (are they appropriate for everything, or just for your two pathway example?)

> We used the OPB and GO ontologies for the particular case study in this paper.

Why do I need the other two then?

Please explain what these ontologies will be used for. Do they provide labels identifying unique variables? Variable types e.g. "is a concentration"? Will we be inferring properties of our variables using these ontologies e.g. units?

### Line 128

> The generic approach

Do you mean generic, i.e. this section explains why the approach works on any model, or "general approach", so that this section presents an outline of the approach taken in the paper?

### Line 131

> the parameters in the models

Please explain in the intro what you expect to find inside each model (should this say "module"?), e.g. variables, parameters, etc. and use these terms throughout instead of "elements", "entities", "components" etc.

Is a distinction made between variables and constants/parameters?

### Line 138

> The number of rows and columns each equals the number of elements in the network in total

What are the "elements" and the "network"? Are elements modules? Variables in a module?

Figure 1 and the accompanying text are far too vague and focus on the wrong aspects: Readers who are interested in this topic will have come across the topic of a network and a connectivity matrix before, and will be wondering instead how it applies to your problem of tying two pathway models together. This second bit is not explained in any detail here.

If they are variables, then what are we expecting from the on

### Line 142

> "facilitates computational measurements"

What is a "computational measurement" and how is it facilitated by a binary representation of these connections?

### Line 144

> Modifying a network is easily performed by inserting 0 or 1 in the matrix

In two places, to preserve symmetry? Given the zero diagonal, the symmetry (which the user needs to manually maintain), and the presumed sparseness of this matrix in real applications (if elements are variables, as I suspect at this point in my reading), this is not a very compact or easy-to-use representation.

### Line 152

> In a `black box' composition approach, the elements of the modules are not accessible and only the input/output variables can be used for coupling

It's not clear to me why we'd expect the modules to have clearly defined "inputs" and "outputs" at this point, so I'm struggling a bit to see why you need to point out that any variables can be connected.

### Line 154

> almost all the entities can be regarded as merging ports

What is a "merging port"? And do we need this bit of jargon or can it be stated more simply?

### Figure 2

What are "similar annotations"? Do both models have the exact same set of RDF properties? The exact same unique labels? Is it OK if only a subset matches?

This figure seems to show 2 preselected models, but there was also mention of a "list", where does that fit in?

How is the "repository" section related to the two models?

### Sections 2.2 & 2.3

These are very helpful, but are required reading to understand e.g. 2.1, so some re-structuring is necessary!

### Line 216

1. Is the "." in the equation meant to be a \cdot? Also, if a multiplication symbol is used here it should be present in the other two multiplications (R*T*ln(K_q*q) or RT ln(K_q q))

2. What is u_q in this equation?

### Line 222

> A reaction represents a dissipative process, which in the case of mass-action kinetics...

This could do with some clarification. Are all reactions "dissipative"? Mass-action kinetics are usually phrased in terms of reversible processes.
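For context on this exchange, the standard bond graph treatment of a single reaction A + B ⇌ C can be written as below. This is a sketch, and its notation is assumed and may not match the manuscript's: μ_i is chemical potential, K_i a species thermodynamic constant, x_i an amount, κ the reaction rate constant, and A^f, A^r the forward and reverse affinities. The flux takes the familiar reversible mass-action form. The power v(A^f - A^r) is never negative, so a reversible reaction is still dissipative in the thermodynamic sense.

```latex
\begin{aligned}
\mu_i &= RT \ln(K_i x_i), \qquad i \in \{A, B, C\},\\
A^f &= \mu_A + \mu_B, \qquad A^r = \mu_C,\\
v &= \kappa\left(e^{A^f/RT} - e^{A^r/RT}\right) = \kappa\left(K_A K_B\, x_A x_B - K_C\, x_C\right),\\
k^+ &= \kappa K_A K_B, \qquad k^- = \kappa K_C,\\
P &= v\left(A^f - A^r\right) \ge 0 .
\end{aligned}
```

The last line holds because κ > 0 and the exponential is increasing, so v always has the same sign as A^f - A^r.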

### Line 250

> and the bond graph model of MAPK cascade is taken from the work by Pan et al. [21]. In this paper, the bond graph representation of the reference MAPK cascade was available. Here, we detail how bond graph models of these systems were constructed.

The second sentence repeats the first. But then the third contradicts it?

### Figure 4

> The network adapted from

Missing is/was

### Line 294

> We applied curve fitting to estimate the reaction rate constants for the irreversible steps (κ4 , κ8 , & κ16 ). We obtained the time-dependent behaviour of the contributing species in steps 4, 8, and 16 (required for curve fitting).

1. If the second sentence is required for the first, it should come first.

2. These two steps contain a lot of work. Much more explanation is needed to make this reproducible.

### Line 299

> some of the reversible reactions do not satisfy detailed balance

I had assumed this would be guaranteed by the bond graph methodology, some comment or a reference could be useful here.

### Line 300

S1 Table. S2 Table and S3 Table --> Table S1. Table S2 and Table S3

### Lines 339, 343, 344, 345

> model of EGFR-Ras-MAPK

> and MAKP cascase

> Since MAPK cascade includes

Many missing "the"s throughout the text

### Line 350

> Due to the limited size ...

This would make more sense in section 2.1.1

### Line 354

> for inconsistencies among the values of similarly annotated components and parameters

What is meant by "inconsistencies" here? Please be precise and give examples.

### Line 510

> For biochemical reactions, if the parameters are thermodynamically inconsistent, they are converted into bond graph compatible ones.

Where is this process explained?

### All figures

This could be just a proof issue, but the figures are all rasterised, and at a low resolution.

Reviewer #3: In their manuscript “A semantics, energy-based approach to automate biomodel composition”, Shahidi et al. describe a new framework for combining biochemical network models based on a mathematical representation in the form of bond graphs. They describe the method and how it is designed to guarantee thermodynamic correctness of the resulting models, and illustrate the procedure with an example case, combining two existing signaling pathway models into a larger, consistent model.

The manuscript is extremely well and clearly written and was a pleasure to read. I think that the method will be very useful. Since it has already been implemented for CellML models, and an implementation for SBML models is conceivable, it has the potential for broad applications in biochemical pathway and network modeling.

I did not check the code.

I have no substantial criticisms. Below I list a few minor points to improve the manuscript, mostly about clarification of words. I leave it to the authors to decide which of these points they would like to account for.

Finally, I would like to express my condolences to the authors for the passing of Professor Crampin. It must have been painful for you to complete the work without your colleague.

----------------------------------------------------------------------------------------------------------------------

Title: The term “bond graph” could be mentioned in the paper title.

2: “Physicians”: I think it’s a dream of modelers that their models will be used by physicians, but I think we’re usually still far from this.

27 (and elsewhere): “Energy-based modeling framework”: since “energy-based” can mean many things, it would be good to explain this term very clearly and explicitly early on.

60: “provides a reliable and consistent framework that first conserves energy” again, not very clear. The need to satisfy thermodynamic Wegscheider conditions does not exactly arise from energy conservation (first law of thermodynamics), but is also related to the second law, and basically the fact that Gibbs free energy is a thermodynamic potential. So “conserves energy” is a bit imprecise, and maybe not very well understandable.

101: “Due to the hierarchical feature of bond graphs,” also, “hierarchical” is not very clearly explained. I guess it refers to the usage of symbolic templates and (“inside” them) the actual network-like model structures. But these are just two layers, and “hierarchical” sounds like there were many hierarchy layers.

109: “A common effort between the components is shown by a ‘0’ junction, while a ‘1’ junction shows a common flow, and the energy is conserved and travels between components bidirectionally through bonds (shown by harpoons).” This explanation is not very easy to get; please explain in more detail (e.g. mentioning little examples?)
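For reference, the junction constraints quoted in this comment can be written compactly. In the sketch below, e is effort (chemical potential in the biochemical setting) and f is flow (molar flow). The bonds i = 1, ..., n attach to one junction, and s_i = +1 if bond i points into the junction and -1 otherwise. This sign convention is a common one and is assumed here.

```latex
\begin{aligned}
\text{0 junction:}\quad & e_1 = e_2 = \dots = e_n, && \textstyle\sum_{i} s_i f_i = 0,\\
\text{1 junction:}\quad & f_1 = f_2 = \dots = f_n, && \textstyle\sum_{i} s_i e_i = 0,\\
\text{both:}\quad & \textstyle\sum_{i} s_i\, e_i f_i = 0 . &&
\end{aligned}
```

As a small example, consider a species C that is produced by reaction 2 and consumed by reaction 1. It sits on a 0 junction, so both reactions see the same chemical potential μ_C, and the flow balance gives dx_C/dt = v_2 - v_1. The last line of the block states that a junction neither stores nor dissipates power.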

147: “Notice that the connectivity matrix is symmetric” at first, not clear if the connectivity matrix is always symmetric (by definition) or just happens to be symmetric in this example application.

Section 2.2 explains the usage of bond graph modeling of biochemical reactions, but it remains unclear how parameters, rate law formulae, and other data attached to the nodes will be treated during model composition. Is an enzymatic rate law a property attached to the reaction node? What if in model combination, the same reaction is described (in the two models) in different ways, e.g. with or without an allosteric inhibitor? Is this just a choice between data attached to the reaction node, or a choice between different structures of the bond graph?

247: “As such, we consider Ras protein to be the mutual species in both pathways.” Would there be additional complications if models are connected by several species (e.g. closing thermodynamic loops that were not present in the initial models, but are present in the combined model)?

In the bond graph model, saturable rate laws were described by irreversible Michaelis-Menten kinetics. Would it also be possible to use reversible Michaelis-Menten kinetics? Or do reversible reactions have to be modelled by mass-action kinetics, for mathematical reasons? (I guess the answer to the latter question is no; maybe it would be good to point this out?)

“The chemostats” - I guess the word refers to species with fixed and given concentrations (sometimes called “external metabolites” in kinetic modelling)? Since the same word is often used in biology in a different meaning (a device with fixed and given concentrations in the INFLOWING medium, not in the bioreactor itself), it would be good to add a short explanation (just say what “chemostat” means in this work).

220: “Hence, a symbolic bond graph module for a cycle could be created and reused.” is “could” past or conditional? Please rephrase.

545: “As an improvement to our previous approach [17], the present framework overcame the aforementioned limitations: ...” These are indeed great achievements. Congratulations!

583: “Eventually, the generated mathematical equations in the bond graph environment can be converted into CellML for simulation and reproducibility.” Would the results models again have the form of a “normal model”, or do they still look very “bond-graph like”, e.g. with non-biological components representing junctions? (And I have the same question for a (potential, future) conversion to SBML models).

References: “Paynter H. Analysis and Design of Engineering Systems/Paynter HM;.” reference incomplete

Fig 7: The fonts are a bit small

Fig 15: in the legend for subfigure C, maybe mention the little dip in the curve and how it is caused?

Finally, I would like to mention some “historical predecessors” of this work, which tried to establish ways to build biochemical network models from simple “standard elements” while taking thermodynamic feasibility into account.

Ederer M. and Gilles E.D. (2007), Thermodynamically Feasible Kinetic Models of Reaction Networks, Biophysical Journal, Volume 92, Issue 6, 1846-1857,

Stanford N.J., Lubitz T., Smallbone K., Klipp E., Mendes P., Liebermeister W. (2013), Systematic construction of kinetic models from genome-scale metabolic networks, PLoS ONE 8(11): E79195

While the present work is certainly more elegant, it may make sense to cite these earlier works.

Furthermore, the adjustment of parameters to become thermodynamically feasible seems to resemble parameter balancing (which is used in the Stanford et al paper and could also be cited).

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Michael Clerx

Reviewer #3: Yes: Wolfram Liebermeister

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: REVIEW.pdf

PLoS One. 2022 Jun 3;17(6):e0269497. doi: 10.1371/journal.pone.0269497.r002

Author response to Decision Letter 0


19 Apr 2022

Response to Reviewers

A semantics, energy-based approach to automate biomodel composition

We thank the reviewers for their detailed comments on our manuscript. We have addressed the issues as detailed below. The reviewer comments will be shown in black, our responses in green and quotations from the revised manuscript in blue.

Key changes to our manuscript are below:

(Key change 1) The addition of an extra module for Ras activation

Reviewer 1 raised concerns both about the need to merge more than two models and about directly linking Sos to Ras. While we believe that merging two models is sufficient to demonstrate the feasibility of our approach, we have now added a Ras activation module as an intermediate between the EGFR and MAPK models to make the model more biologically realistic. This new module is described in Section 2.2.2 and at the beginning of Section 3, along with Figs. 5, 10 and 12.

(121-124) “... describe how the bond graph modules of the EGFR-Ras-MAPK signalling pathway are constructed based on the existing work by Kholodenko et al. [39] on the EGFR model, Brightman & Fell [40] on the Ras model, and Pan et al. [11] on the MAPK model (Section 2.2).”

We outline the model and its parameterisation in the text below:

(308-318) “Fig 5 represents the kinetic and bond graph structure of the Ras activation module which links the EGFR and MAPK modules.”

We converted the kinetic parameters of the reactions into bond graph parameters using the same techniques applied in Section 2.2.1. For the irreversible reactions (steps 2 and 4), we assumed a very small value for the reverse kinetic constants (k^-) to limit the reverse flow to a negligible amount. The reaction equations, along with their participating species, are given in S4 Table. S5 Table and S6 Table compare the parameter values of the Brightman & Fell model with those of our reconstructed bond graph model. The code to convert the kinetic parameters into bond graph equivalents for the Ras activation intermediate pathway is accessible from: https://github.com/Niloofar-Sh/EGFR_MAPK/tree/main/Ras.
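The conversion described in this passage can be sketched as a log-linear least-squares problem. The sketch assumes the mass-action relations k^+_j = κ_j ∏_i K_i^{N^f_ij} and k^-_j = κ_j ∏_i K_i^{N^r_ij}, where N^f and N^r are the forward and reverse stoichiometric matrices. The function `kinetic_to_bond_graph` and the toy network are hypothetical illustrations, not the code in the linked repository. As in the passage, an irreversible step is represented by giving it a small reverse constant. This is a plain least-squares version of the idea, not the full parameter balancing method.

```python
import numpy as np


def kinetic_to_bond_graph(k_f, k_r, N_f, N_r):
    """Estimate bond graph parameters (kappa, K) from mass-action constants.

    k_f, k_r : forward/reverse rate constants, shape (n_reactions,)
    N_f, N_r : forward/reverse stoichiometric matrices, shape (n_species, n_reactions)
    Solves ln k = M @ [ln kappa; ln K] in the least-squares sense.
    """
    n_species, n_reactions = N_f.shape
    eye = np.eye(n_reactions)
    # Each row encodes ln k_j = ln kappa_j + sum_i N_ij ln K_i
    M = np.block([[eye, N_f.T],
                  [eye, N_r.T]])
    b = np.log(np.concatenate([k_f, k_r]))
    x, *_ = np.linalg.lstsq(M, b, rcond=None)
    kappa = np.exp(x[:n_reactions])
    K = np.exp(x[n_reactions:])
    # Rate constants implied by the fitted bond graph parameters
    k_f_fit = kappa * np.exp(N_f.T @ np.log(K))
    k_r_fit = kappa * np.exp(N_r.T @ np.log(K))
    return kappa, K, k_f_fit, k_r_fit


if __name__ == "__main__":
    # Hypothetical toy network: r1: A + B <=> C, r2: C -> D (irreversible)
    N_f = np.array([[1, 0], [1, 0], [0, 1], [0, 0]])  # rows: A, B, C, D
    N_r = np.array([[0, 0], [0, 0], [1, 0], [0, 1]])
    k_f = np.array([2.0, 5.0])
    k_r = np.array([0.5, 5.0e-6])  # small k^- stands in for irreversibility
    kappa, K, k_f_fit, k_r_fit = kinetic_to_bond_graph(k_f, k_r, N_f, N_r)
    print("kappa:", kappa)
    print("K:", K)
```

The species constants enter only through products, so K is generally not unique, and `lstsq` returns the minimum-norm solution in log space. Suppose the reversible reactions form a cycle whose kinetic constants violate detailed balance. Then the fitted constants are the closest thermodynamically consistent set in the log least-squares sense, and the product of k^+/k^- around the cycle becomes exactly 1.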

We have also updated Fig. 10 to show how the intermediate module is merged with the other models.

(558-560) “Here, RShGS and Ras in the Ras activation module are removed while RShGS in the EGFR module and Ras in the MAPK module are kept.”
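The merge step quoted here keeps one copy of each shared species and removes the duplicate. It can be sketched with binary connectivity matrices under simple assumptions. Each module is an undirected, symmetric 0/1 matrix over uniquely named nodes, and the duplicate pairs are already known. The removed node's connections are rewired to the kept node before its row and column are deleted. The helper `merge_modules` and the qualified node names such as `egfr.RShGS` are hypothetical and do not come from the authors' implementation.

```python
import numpy as np


def merge_modules(names_a, C_a, names_b, C_b, duplicates):
    """Merge two symmetric 0/1 connectivity matrices.

    duplicates maps the name of a node to remove -> name of the node to keep.
    Node names must be unique across both modules (e.g. module-qualified).
    """
    names = list(names_a) + list(names_b)
    n_a, n_b = len(names_a), len(names_b)
    # Block-diagonal combination: no connections between the modules yet
    C = np.zeros((n_a + n_b, n_a + n_b), dtype=int)
    C[:n_a, :n_a] = C_a
    C[n_a:, n_a:] = C_b
    to_remove = []
    for removed, kept in duplicates.items():
        i, j = names.index(removed), names.index(kept)
        # Rewire: the kept node inherits every connection of the removed one
        C[j, :] |= C[i, :]
        C[:, j] |= C[:, i]
        to_remove.append(i)
    np.fill_diagonal(C, 0)  # rewiring must not create self-connections
    keep = [k for k in range(len(names)) if k not in to_remove]
    return [names[k] for k in keep], C[np.ix_(keep, keep)]


if __name__ == "__main__":
    chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    names, C = merge_modules(
        ["egfr.EGF", "egfr.r1", "egfr.RShGS"], chain,
        ["ras.RShGS", "ras.r2", "ras.Ras"], chain,
        {"ras.RShGS": "egfr.RShGS"},
    )
    print(names)
    print(C)
```

Rewiring before deletion ensures that whatever the removed copy was linked to (here `ras.r2`) ends up linked to the kept copy. This is what joins the two modules into one network.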

A comparison of this module with the existing model has been added to the Results:

(591-601) “The Ras activation pathway:

The Ras activation intermediate pathway included both reversible and irreversible reactions which were expressed in mass action kinetics. We estimated the bond graph parameters of the reactions by applying the parameter balancing technique in which we included an additional constraint (relatively small k^-) for each irreversible reaction to limit the reverse flux (Section 2.2.1). Fig 12 demonstrates the behaviour of four species in the reduced Brightman & Fell Ras activation model and its equivalent bond graph approximation. The bond graph equivalent could follow the same trend as in the CellML reduced model with negligible error (0.07%<NRMSE<0.2%). Concentrations have no dimensions in the original CellML model to balance the units [63].”
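The NRMSE quoted in this passage compares the bond graph trajectories with the reference CellML trajectories. Normalisation conventions vary: by the range, mean or standard deviation of the reference. The passage does not state which was used, so the sketch below assumes normalisation by the range of the reference signal and reports a percentage. The function name `nrmse_percent` is hypothetical, and both trajectories are assumed to be sampled at the same time points.

```python
import numpy as np


def nrmse_percent(reference, approximation):
    """Root-mean-square error normalised by the range of the reference, in %."""
    reference = np.asarray(reference, dtype=float)
    approximation = np.asarray(approximation, dtype=float)
    rmse = np.sqrt(np.mean((approximation - reference) ** 2))
    span = reference.max() - reference.min()
    if span == 0:
        raise ValueError("reference is constant; range normalisation is undefined")
    return 100.0 * rmse / span
```

Under a different normalisation (for example, by the mean of the reference), the same pair of trajectories can give a noticeably different percentage. Reported NRMSE ranges are therefore only comparable when the convention is stated.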

(Key change 2) The connectivity matrix

Several reviewers raised issues in understanding the connectivity matrix. We acknowledge that the connectivity matrix is not essential to our approach. Nonetheless, we found it to be useful in allowing models to be merged in a general manner.

We have modified the manuscript to explain this issue:

(433-438) “While not essential to our methodology, we chose this approach because a binary representation of models clearly shows the connections and gives the minimal details required to define a network, which can be exported to other tools and software for further analysis [46]. To modify a network, one can insert 0 or 1 in the matrix or delete its corresponding row and column. An example in S1 Text.A shows how the connectivity matrix is defined for a simple network.”
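The matrix operations named in this passage can be sketched in a few lines. The sketch assumes an undirected network stored as a symmetric 0/1 NumPy array with a zero diagonal (no self-connections). Because the matrix is symmetric, inserting or deleting a connection writes the same value into two entries. The helper names are hypothetical and are not part of the authors' code.

```python
import numpy as np


def set_connection(C, i, j, value):
    """Insert (value=1) or delete (value=0) a connection, preserving symmetry."""
    if i == j:
        raise ValueError("self-connections are not allowed")
    C[i, j] = C[j, i] = value


def connectivity_matrix(nodes, edges):
    """Build a symmetric 0/1 matrix from an undirected edge list."""
    index = {name: k for k, name in enumerate(nodes)}
    C = np.zeros((len(nodes), len(nodes)), dtype=int)
    for a, b in edges:
        set_connection(C, index[a], index[b], 1)
    return C


def delete_node(nodes, C, name):
    """Remove a node by deleting its row and column."""
    k = nodes.index(name)
    C = np.delete(np.delete(C, k, axis=0), k, axis=1)
    return [n for n in nodes if n != name], C
```

For large, sparse networks, an edge list or a `scipy.sparse` matrix holds the same information more compactly. This bears on the storage concern Reviewer #2 raised about the dense representation.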

We agree that describing the details of the connectivity matrix disrupted the flow of the text, so we have moved the figure along with the extensive explanation on connectivity matrices to the supplementary material (S1 Text.A).

To help a reader grasp the merging process facilitated by the connectivity matrix, we have added a simple example in Fig. 2.

Reviewer #1

I am of the opinion that energy-based model composition will play a critical role in expanding the utility of reaction models in biochemistry. The authors have applied multiple innovative strategies such as bond graphs, templating, automated semantic inference, and automated merging to their model composition pipeline, so the premise of the paper is sound. However, it disappoints me to recommend rejecting this manuscript on several counts. Issues 3a and 3f below are the critical reasons for recommending rejection over revision.

1. It fails to properly introduce the problems being solved, how they originate in biology, and how the proposed methods solve those problems.

2. It fails to introduce the technical background of the methods being deployed in a manner accessible to a general biological audience, or even an audience interested in biochemical models.

3. It has multiple issues with the demonstrated results: questionable choices in merging, insufficient demonstration of proof of concept, poor structuring and comparison of simulation results.

4. Severe readability and copy-editing issues with the text.

We have made substantial changes to the manuscript which we hope address the above issues. To deal with the critical issues 3a and 3f, we have added a third module for the activation of Ras (Section 2.2.2) and included additional simulations showing the effects of EGF on the merged model (Fig. 18).

Issue 1: Introducing the problems being solved

The introduction takes for granted that the reader is aware of scale issues in biochemical modeling. First, they must motivate why system-level modeling of biochemical reactions is difficult. This includes providing information on

how biochemical complexity affects species semantics, and how that creates problems for model composition,

We have added the following text to describe biochemical complexity and semantics:

(82-88) “... biological and biochemical complexities can give rise to inconsistencies in semantic annotations. Many species and chemical compounds are not simply defined by a single semantic term. Subtle variations of names for species have long been an obstacle for semantics-based merging tools to integrate models based on identifying similarly annotated species. In this paper, we have used identical semantics for the same species and leave it to the scientific community to develop a harmonising system for annotating biomodels.”

detailed balance and thermodynamic consistency concepts, how they apply to biology, and how current approaches fail at this,

We have added the following text to describe the application of detailed balance and thermodynamic consistency to biology:

(70-79) “In the context of biochemistry, modellers widely use traditional kinetic models. However, in general, kinetic models are not thermodynamically consistent (i.e. energy conserving) unless the parameters satisfy certain detailed balance constraints. Specifically, detailed balance constraints are required to ensure that biochemical loops have zero flux (i.e. dissipate no energy) at equilibrium. These detailed balance constraints become increasingly difficult to derive as biochemical networks become larger. Because bond graph models assign a chemical potential to each species, they automatically adhere to detailed balance constraints. Hence, parameters can be modified without violating thermodynamic consistency [27]. This ensures that model composition respects the constraints on thermodynamics for biochemical systems.”
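To make the loop argument concrete, the Python sketch below checks the Wegscheider condition for a three-reaction loop A → B → C → A: at detailed balance, the product of forward-to-reverse rate-constant ratios around the loop must equal 1. It assumes the mass-action form commonly used for bond graph reactions, $v = \kappa (K_s x_s - K_p x_p)$, so that $k^+ = \kappa K_s$ and $k^- = \kappa K_p$. All numerical values are arbitrary illustrations.

```python
import math


def wegscheider_product(rate_pairs):
    """Product of k_forward / k_reverse around a closed loop (equals 1 at detailed balance)."""
    return math.prod(kf / kr for kf, kr in rate_pairs)


loop = [("A", "B"), ("B", "C"), ("C", "A")]

# Independently chosen mass-action constants (hypothetical): nothing ties them together.
kinetic = [(2.0, 1.0), (3.0, 1.0), (1.0, 1.0)]

# Bond graph parameters (hypothetical): one K per species, one kappa per reaction.
K = {"A": 2.0, "B": 0.5, "C": 4.0}
kappa = [1.5, 0.2, 3.0]
bond_graph = [(k * K[s], k * K[p]) for k, (s, p) in zip(kappa, loop)]

print(wegscheider_product(kinetic))     # 6.0, so this loop violates detailed balance
print(wegscheider_product(bond_graph))  # 1.0 for any choice of K and kappa (up to rounding)

# With x_i = 1 / K_i every flux kappa * (K_s x_s - K_p x_p) vanishes: no loop flux at equilibrium.
x_eq = {s: 1.0 / K[s] for s in K}
fluxes = [k * (K[s] * x_eq[s] - K[p] * x_eq[p]) for k, (s, p) in zip(kappa, loop)]
```

Because each ratio $k^+/k^-$ reduces to $K_s/K_p$, the species constants cancel around any closed loop, whatever values are chosen.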

how energy-based composition resolves the issue of thermodynamic consistency.

We trust that this has been addressed in the above comment.

The manuscript touches upon these points briefly in several places (e.g., white-box approach, etc.). But it never provides a cohesive structured argument accessible to the reader. It is not sufficient to simply state that other methods produce infeasible models, but to demonstrate what infeasibility means in this context and why it happens.

Issue 2: Introducing the technical background

Both in the introduction and in the methods section, the authors over-explain the mechanics of what they do, but under-explain the concepts relevant to understanding. Thus, it feels like reading a tutorial without grasping the scientific intuition behind it.

A more detailed representation of related work is necessary. E.g., what are the issues with SBML hierarchical composition and the state of the art in model composition? What are post-composition adjustments?

We have added the following text to describe limitations with other model composition methods:

(23-25) “… post-composition adjustments (manual edits in mathematical equations, rules, or parameters to make the composed models biologically sensible [8]).”

(26-31) “While existing model integration platforms, such as the SBML Hierarchical package [9] and PySB [10], can resolve issues with code and modelling formalism compatibility, they are limited to the biochemical domain. The resultant model often needs further adjustments to be executable, and even then it might not follow the laws of thermodynamics and physics (such as energy, mass, and charge conservation) [12,14].”

For bond graphs, the authors need to explain separately bond graph theory, bond graph terms and what they mean, and instructions on how to build/visualize/understand bond graphs. Combining all of these into a single section (Sec 2.2) makes it hard to understand. For example, how does one decide where to glue edges on the bond graph? In Fig 3A, 0:u species nodes are connected to a 1:v reaction node, whereas in Fig 5, 0:u nodes are often connected directly to Re:k nodes. My guess is that it does not matter because of some underlying bond graph theory, but I should not have to resort to guesswork in the face of insufficient explanation.

We have added more text to explain the bond graph concepts in Figure 1 (new version):

(190-194) “As shown in Fig 1.B, the species complexes (A & B as reactants and C & D as products) at either side of the reaction are connected to the Re component through ‘1 : v’ junctions because the pairs share common flows. The corresponding ‘0 : u’ junction for a species can be directly connected to an Re component if it is the only reactant or product of that reaction.”
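The junction rule above can be expressed as a short Python sketch that lists the bonds attaching species to a reaction. It assumes unit stoichiometry (non-unit coefficients would need additional elements such as transformers) and uses an illustrative naming scheme for junctions. `reaction_bonds` is a hypothetical helper, not part of the authors' framework.

```python
def reaction_bonds(name, reactants, products):
    """List (tail, head) bonds linking species '0' junctions to the reaction 'Re:<name>'.

    A side with several species is gathered by a '1' junction (shared flow);
    a side with a single species connects its '0' junction straight to Re.
    """
    reaction = f"Re:{name}"
    bonds = []
    if len(reactants) == 1:
        bonds.append((f"0:{reactants[0]}", reaction))
    else:
        hub = f"1:{name}_reactants"
        bonds += [(f"0:{s}", hub) for s in reactants]
        bonds.append((hub, reaction))
    if len(products) == 1:
        bonds.append((reaction, f"0:{products[0]}"))
    else:
        hub = f"1:{name}_products"
        bonds.append((reaction, hub))
        bonds += [(hub, f"0:{s}") for s in products]
    return bonds


# A + B <=> C + D (as in Fig 1.B) versus a single-reactant, single-product reaction.
complex_reaction = reaction_bonds("r1", ["A", "B"], ["C", "D"])
simple_reaction = reaction_bonds("r2", ["A"], ["C"])
```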

Explain the connection between thermodynamic terms and kinetic terms, e.g., is the thermodynamic constant of a species the same as the more familiar free energy of formation of species? What is a dissipative process?

We have added the following text to define these terms:

(171-174) “$K_q$ is the thermodynamic constant of a species, related to its free energy for participating in reactions, and is defined as $K_q = \frac{1}{V_c q_{\text{ref}}} e^{u_q^{\text{ref}}/RT}$, where $V_c$ is the volume of the compartment, $q_{\text{ref}}$ is the reference concentration (normally 1 mol), and $u_q^{\text{ref}}$ is the standard free energy of formation of the species [27].”
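The definition of $K_q$ can be checked numerically. The Python sketch below evaluates $K_q$ and then applies the usual bond graph constitutive relation for a Ce component, $\mu_q = RT \ln(K_q q)$, which is assumed here. It confirms that a species whose amount equals $V_c q_{\text{ref}}$ has a chemical potential equal to $u_q^{\text{ref}}$. The temperature (310 K), the units (J/mol for $u_q^{\text{ref}}$, with $V_c q_{\text{ref}}$ an amount) and the numerical values are assumptions for illustration.

```python
import math

R = 8.314  # gas constant, J/(mol K)


def species_constant(u_ref, volume, q_ref=1.0, temperature=310.0):
    """K_q = exp(u_ref / (R T)) / (V_c q_ref), following the definition above."""
    return math.exp(u_ref / (R * temperature)) / (volume * q_ref)


def chemical_potential(K_q, amount, temperature=310.0):
    """Assumed Ce constitutive relation: mu = R T ln(K_q q)."""
    return R * temperature * math.log(K_q * amount)


K_example = species_constant(u_ref=-5000.0, volume=2.0)      # hypothetical values
mu_at_reference = chemical_potential(K_example, amount=2.0)  # amount = V_c * q_ref
```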

(178-179) “In bond graphs, a reaction represents a dissipative process where chemical energy is lost in the form of heat [52].”

Explain the connection between detailed balance and energy conservation. What does it mean to obey physical laws in the context of reactions (it is not sufficient to just state that they should be obeyed)? The manuscript does not explain it to the general audience: specifically, that on a per-mol basis, energy should be conserved around a loop of reactions, which places constraints on the relationships between kinetic parameters.

We have added the following text to mention the physical laws in the context of reactions:

(30-31) “... follow the laws of thermodynamics and physics (such as energy, mass, and charge conservation) ...”

We have added the following text to explain the energy conservation in a loop of reactions (as discussed in issue 1b):

(70-79) “In the context of biochemistry, modellers widely use traditional kinetic models. However, in general, kinetic models are not thermodynamically consistent (i.e. energy conserving) unless the parameters satisfy certain detailed balance constraints. Specifically, detailed balance constraints are required to ensure that biochemical loops have zero flux (i.e. dissipate no energy) at equilibrium. These detailed balance constraints become increasingly difficult to derive as biochemical networks become larger. Because bond graph models assign a chemical potential to each species, they automatically adhere to detailed balance constraints. Hence, parameters can be modified without violating thermodynamic consistency [27]. This ensures that model composition respects the constraints on thermodynamics for biochemical systems.”

Explain to a general audience how energy-based composition automatically produces thermodynamically consistent models: specifically that reframing kinetic parameters using energies of formation of species leads to conservation laws preserved around loops.

We have addressed this comment in Issue 1b.

Bond graph model simulation is not mentioned at all. However, results from the simulation are shown. How is bond graph model simulation different from or related to known methods like ODE integration and Gillespie SSA? This is something a general modeling audience will be completely unaware of.

The bond graph formulation produces a set of ODEs which can be solved using any standard solver package. We have added the following text to make this clear:

(529-533) “Bond graph models of biochemical systems are deterministic and generate a set of ODEs which can be solved by any standard ODE solver package. In this paper, the models were simulated using the SUNDIALS package [62]. In future work, there is scope to extend the energy-based approach to models of stochastic systems, using algorithms such as the Gillespie algorithm for simulation.”
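The Python sketch below illustrates how a bond graph model reduces to ODEs. It integrates the single reaction A ⇌ B with the assumed bond graph flux $v = \kappa (K_A x_A - K_B x_B)$. To stay dependency-free, it uses a hand-written fixed-step fourth-order Runge-Kutta scheme instead of SUNDIALS; any standard solver (e.g. SciPy's `solve_ivp`) could be substituted. Parameter values are arbitrary.

```python
def rhs(x, K, kappa):
    """Bond graph ODEs for A <=> B: dx_A/dt = -v, dx_B/dt = v."""
    v = kappa * (K["A"] * x["A"] - K["B"] * x["B"])
    return {"A": -v, "B": v}


def rk4_step(x, dt, K, kappa):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(x, K, kappa)
    k2 = rhs({s: x[s] + 0.5 * dt * k1[s] for s in x}, K, kappa)
    k3 = rhs({s: x[s] + 0.5 * dt * k2[s] for s in x}, K, kappa)
    k4 = rhs({s: x[s] + dt * k3[s] for s in x}, K, kappa)
    return {s: x[s] + dt / 6.0 * (k1[s] + 2 * k2[s] + 2 * k3[s] + k4[s]) for s in x}


def simulate(x0, K, kappa, t_end, dt=0.01):
    """Integrate from t = 0 to t_end with a fixed step size."""
    x = dict(x0)
    for _ in range(round(t_end / dt)):
        x = rk4_step(x, dt, K, kappa)
    return x


x_final = simulate({"A": 1.0, "B": 0.0}, K={"A": 2.0, "B": 1.0}, kappa=0.5, t_end=20.0)
```

At equilibrium $K_A x_A = K_B x_B$. With $K_A = 2$, $K_B = 1$ and a conserved total of 1, the trajectory therefore approaches $x_A = 1/3$ and $x_B = 2/3$.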

For semantics-based composition, the methods section has too many special terms that are not fully defined. Sufficient background needs to be provided on annotations and ontologies and a simple example must be used to demonstrate how merging occurs.

While we believe the term ‘annotations’ to be readily understood by the reader, we have added text to describe what semantic annotation is.

(80-82) “Semantic annotation is the labelling of the mathematical content of models or data with standard machine-readable descriptions [28]. These annotations are crucial for the reusability and interoperability of models.”

We have defined the term “ontology” in the following text.

(391-408) An ontology is a semantic resource of standard notions and vocabularies of species, structures, and observations in terms of Resource Description Framework (RDF) triples (https://www.w3.org/RDF/). RDF is a standard mechanism to describe and interchange data on the Web, and an RDF triple is a subject–predicate–object statement that describes the properties of an entity, often using ontologies [41, 42]. For example, the RDF triple [OPB00340 - CHEBI29103 - FMA70022] reads [concentration - potassium - extracellular space], which specifically describes the physical property and location of an entity. Ontologies are useful tools to add meaning to different parts of models and avoid ambiguous interpretations [43]. Depending on the area of biomedical science in which researchers annotate their models, one or several reference ontologies might be used. For the scope of this publication, we used the csv files for the Ontology of Physics for Biology (OPB) [44] and the Gene Ontology (GO) [45], downloaded from the following links:

– OPB: https://bioportal.bioontology.org/ontologies/OPB;

– GO: https://bioportal.bioontology.org/ontologies/GO.

The OPB is a reference ontology for physical principles such as chemical concentration, electrical capacitance, temperature, and fluid volume. The GO provides descriptions for molecular biology such as gene products, biological sequences, and molecular activities.

We have added an illustrative example of merging two simple models to help a reader understand the concept.

(204-209) “Fig 2 illustrates an example of composing two reactions in bond graphs. Our framework recognises that the Ce : C component is the same in both reactions and merges them. When two components from two modules are merged, the conservation equations at their corresponding ‘0 : u’ junction change. S1 Text.C details the conservation laws and constitutive equations in each reaction separately, as well as in the case where both reactions are combined to create the composition.”
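The change in the conservation equation at the shared ‘0 : u’ junction can be reproduced with a small Python sketch. Given a list of directed bonds, it writes the rate of change of the species attached to each 0 junction as the sum of reaction flows entering the junction minus those leaving it. Element names and the string output format are illustrative assumptions.

```python
def conservation_equations(bonds):
    """Return {species: equation} for every 0 junction named '0:X' in the bond list.

    Bonds are (tail, head) pairs; flow is taken as positive from tail to head.
    Reaction components are named 'Re:<name>' and their flow is written v_<name>.
    """
    equations = {}
    junctions = sorted({node for bond in bonds for node in bond if node.startswith("0:")})
    for junction in junctions:
        species = junction[2:]
        inflows = [f"v_{tail[3:]}" for tail, head in bonds
                   if head == junction and tail.startswith("Re:")]
        outflows = [f"v_{head[3:]}" for tail, head in bonds
                    if tail == junction and head.startswith("Re:")]
        rhs = " + ".join(inflows)
        for flow in outflows:
            rhs = f"{rhs} - {flow}" if rhs else f"-{flow}"
        equations[species] = f"dx_{species}/dt = {rhs or '0'}"
    return equations


module_1 = [("Ce:A", "Re:r1"), ("Re:r1", "0:C"), ("0:C", "Ce:C")]  # A <=> C
module_2 = [("0:C", "Ce:C"), ("0:C", "Re:r2"), ("Re:r2", "Ce:D")]  # C <=> D
merged = sorted(set(module_1) | set(module_2))                     # shared Ce:C and 0:C

for bonds in (module_1, module_2, merged):
    print(conservation_equations(bonds)["C"])
# dx_C/dt = v_r1
# dx_C/dt = -v_r2
# dx_C/dt = v_r1 - v_r2
```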

For the composition section, use a small bulleted list to convey the elements of the pipeline. Then explain each element in detail.

We have added text to describe the key elements of our semantics-based model composition pipeline and modified Fig 7 accordingly.

(464-496)

“Fig 7 depicts the eight main steps in our semantics-based model composition framework as follows:

1. A function in our framework extracts the annotations and values of the CellML models. If exact matches of annotations are not detected between the models, a warning is given. The user should check the models to see if they are appropriate for composition. If there are matching annotations, two pathways are made available: composition process and value allocation.

2. In this step, a function checks the mergeability of the identically annotated entities. If they are not mergeable, the function ignores the entities [48]. Otherwise, it passes them to the next step. For example, biochemical species are considered mergeable since they can simultaneously participate in multiple reactions but a parameter like temperature cannot be merged as it cannot become a port for external connections. Based on the deleted duplicate components, the connectivity matrices are combined, allowing the models to be merged (details in Section 2.4).

3. In this step, only one entity is kept from each group of identically annotated mergeable entities and the rest are deleted.

4. In this step, our framework links the modules at each merging point to integrate them. A link is a bond in bond graph terminology and can be added to the system by inserting a 1 in the connectivity matrix (the approach used here) or by adding syntax that incorporates a new bond between the modules.

5. Step 5 identifies inconsistencies in the values of identically annotated entities. These values include the initial conditions and the entities’ thermodynamic constants (as described in Section 2.1).

6. This step prompts the user to choose a value for the identically annotated entities found in step 5. For instance, if a chemical species is present in more than one model (identically annotated in all the models) and has different initial concentrations, the user is asked to select one of the values or insert a new one for that specific chemical species.

7. This step parameterises the bond graph symbolic templates with the values for each annotated entity.

8. Step 8 gathers the information coming from the composition process and value allocation to generate a bond graph composed model in the form of a system of Ordinary Differential Equations (ODEs).”
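Steps 1 to 3, 5 and 6 can be sketched in Python as follows. The sketch assumes that each annotation reduces to a single comparable string and that mergeability depends only on whether an entity is a species. Step 6's user prompt is replaced by returning the conflicting values. The `Entity` class, the annotation strings and the values are hypothetical placeholders, not the framework's data model or real ontology terms.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    model: str
    name: str
    annotation: str  # assumed: a single comparable string per entity
    kind: str        # assumed classification: "species" or "parameter"
    value: float


MERGEABLE_KINDS = {"species"}  # species can join several reactions; temperature cannot become a port


def group_identical(entities):
    """Step 1: group entities from different models that share an identical annotation."""
    groups = {}
    for entity in entities:
        groups.setdefault(entity.annotation, []).append(entity)
    return {a: g for a, g in groups.items() if len({e.model for e in g}) > 1}


def plan_merge(entities):
    """Steps 2, 3 and 5: keep one mergeable entity per group and report value conflicts."""
    kept, deleted, conflicts = {}, [], {}
    for annotation, group in group_identical(entities).items():
        if group[0].kind not in MERGEABLE_KINDS:
            continue  # step 2: not mergeable, so the group is ignored
        kept[annotation] = group[0]  # step 3: keep one entity...
        deleted.extend(group[1:])    # ...and delete the rest
        values = sorted({e.value for e in group})
        if len(values) > 1:
            conflicts[annotation] = values  # step 6 would ask the user to choose
    return kept, deleted, conflicts


entities = [
    Entity("EGFR", "Ras", "concentration|Ras", "species", 0.0),
    Entity("Ras_activation", "Ras", "concentration|Ras", "species", 0.3),
    Entity("EGFR", "T", "temperature", "parameter", 310.0),
    Entity("Ras_activation", "T", "temperature", "parameter", 310.0),
    Entity("Ras_activation", "Sos", "concentration|Sos", "species", 1.0),
]
kept, deleted, conflicts = plan_merge(entities)
```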

Issue 3: Problems with Results

The EGFR-Ras-MAPK example shown is small enough to be in a tutorial, but it is not sufficient to be a full demonstration of the proof of concept. At the very least, attempt must be made to merge multiple (> 2) models hierarchically.

We have addressed this comment in Key change 1.

I’m not very happy with the decision to merge Sos species with Ras. It could’ve been easily avoided by using a third model with two simple Sos-Ras reactions.

We have added a third module, as discussed in Key change 1.

In fact, this highlights a potential problem with the hierarchical composition approach: what happens when you start to merge two models, but then you identify missing elements that require additional modeling? I’m assuming this comes under post-composition adjustments.

Yes, this type of issue is a post-composition adjustment. We argue that this arises from gaps in the modelling literature and our hierarchical composition approach provides a solution to this problem. We have added the following text to highlight this issue:

(751-757) “In general, systems biology models will frequently omit metabolites such as ATP or H+ from their reactions, causing issues for mass and energy conservation. In cases where the selected reference models do not describe this part of the biology, users can apply their knowledge or search the literature to add any missing steps or subsystems to the composed bond graph model manually. To enhance this procedure in future, we could employ genome-scale metabolic models (GSMMs) as scaffolds to identify the missing entities or reactions [71, 72].”

The interplay between manual selection (e.g., indicating Sos as Ras) vs automated semantic inference (e.g., inferring Sos to be a specific biochemical entity) should be clearly delineated and its effects discussed.

We have added an intermediate module, so this is no longer an issue (see Key change 1).

The figures for the results are poorly structured. Whenever figure panels are being compared in the text, they should be juxtaposed in the same figure. For example, it would be useful to place MAPK cascade simulation results and MAPK bond graph simulation results in the same panel for direct comparison and verification.

We do not compare a mathematical model of the MAPK cascade with its bond graph equivalent, since a bond graph model was previously developed. Fig 13 shows the steady-state responses of kinases within the MAPK module alone. The same plot is later reproduced in Fig 14.B as a means of verifying the curves in Fig 14.A.

In the section examining effect of ATP concentrations, the inputs provided are not mentioned (e.g., what Ras concentration is used for each curve in Fig 15). In fact, comparing the curves at a single parameter point is not sufficient to make a general statement.

We have added the following text to describe the concentrations used in the simulations.

(689-690) “The initial concentration of all other species was not changed. Here, the initial concentration of RShGS and Ras (common species between the modules) was 0.”

A critical shortcoming of the manuscript is that it does not even examine the composed model in detail. The goal of model composition is to enable the pieces of one model to influence the effects of another model. In this case, the goal of merging EGFR model with MAPK is to examine the effect of EGF concentrations on MAPK. What new types of analysis are now possible on your merged model that you couldn’t do with the unmerged models? What predictions does it do that confirm or contradict existing experiments or predictions from the many EGFR-MAPK models in the literature?

While the focus of this manuscript is not to make predictions to confirm or contradict existing experiments, we have demonstrated that our approach can merge together models in order to make predictions that were not otherwise possible.

We have added new results on the effects of EGF on downstream molecules in the MAPK pathway.

(691-706)

EGF concentration:

We examined the composed bond graph model of the EGFR-Ras-MAPK pathway to analyse and compare its functionality with other similar mathematical models. To do this, we investigated the effect of EGF concentration on MKPP. EGF initiates the EGFR pathway model and MKPP is the terminal kinase of the MAPK cascade model. Fig 18 illustrates the behaviour of MKPP for various initial concentrations of EGF. Lower concentrations of EGF delay MKPP in reaching its steady-state concentration, which emphasises the role of EGF on the species downstream to the end of the MAPK cascade. Note that EGF = 0 nM does not stop the composed model from functioning, because ATP hydrolysis and other intermediate species (such as RasGTP and RasGDP) fuel the subsequent steps and stimulate Ras. The time delay was also studied by Jurado et al. [70], where lowering the EGF concentration triggered a delay in the MKPP response. Due to the different configuration of the constitutive models and the absence of EGF regulation by MKPP, the MKPP concentration in our composed model plateaus instead of descending as illustrated in [70].

We have also emphasized that our bond graph model now allows one to examine the effects of energy availability on the integrated system. This was not possible with existing models because ATP and related species were not included.

(744-750) Our bond graph composed model allowed us to investigate the role of EGF concentration and ATP on kinases, which was not possible with the individual models before composition. While ATP hydrolysis is mentioned in the Kholodenko et al. EGFR model, it is not included in their computational model. Our composed bond graph model accounts for the missing energy sources, which, first, provides a more biologically realistic model and, second, enables us to examine hypotheses on ATP shortage in the EGFR-Ras-MAPK pathway.

Suggestions on related work

The innovation of this work is in the application of bond graphs to reaction composition in biochemical systems. However, it is being increasingly considered that the species complexity of biochemical cells will limit our ability to compose large models from small ones due to inconsistencies in species semantics across models (touched upon in this manuscript). The authors are encouraged to check out rule-based modeling, where species semantics are formally embedded in graph structures and the energy-based extension of rule-based modeling, which largely applies the same thermodynamic principles used in this manuscript and produces consistent models that obey detailed balance. References:

Rule-based modeling:

• Chylek et al. Physical Biology 2015

• Harris et al. Bioinformatics 2016

• Boutillier et al. Bioinformatics 2018

Energy-based rule-based modeling

• Ollivier et al PLoS Comp Bio 2010

• Sekar et al IEEE BIBM 2016

• Justin Hogg Ph.D. dissertation Chapter 2, University of Pittsburgh, 2013

• Thermodynamic Graph Rewriting, Danos et al. arxiv 2015

We have cited the suggested related works in the Discussion.

(834-844) Another widely used formalism in computational biology is rule-based modelling, in which a series of rules describes the mechanistic details of biochemical processes, for example the random binding of multiple ligands to a receptor [29]. Recently, rule-based approaches have incorporated energetic parameters to ensure thermodynamic consistency [30–32]. Danos et al. showed that by computing the free energy of species formation, and hence free energy inequalities for reactions in rule-based models, one can verify whether a model satisfies the free energy constraints and detailed balance [33]. Moreover, rule-based languages such as BioNetGen [34] and Kappa [35] allow annotations. One advantage of bond graph modelling over rule-based modelling is that bond graphs can model multi-physical systems such as electrophysiology, whereas rule-based approaches are limited to the biochemical domain.

Issue 4: Problems with Readability

Part of scientific communication is to emphasize clarity and directness. As it stands, the text is too verbose and unstructured and is not fully copy-edited. Some suggestions to make it readable:

Using passive voice unnecessarily makes sentences long and complicated. E.g., Instead of saying “modifying a network is easily performed by…”, you can say “to modify a network, one can…” It also makes things difficult to understand as to whether it was done automatically or manually, particularly in several places in the methods section.

Following the reviewer’s suggestions, we have made modifications to the text to improve the readability of the manuscript.

(436-437) To modify a network, one can insert 0 or 1 in the matrix or delete its corresponding ...

(360) We aim to minimise manual input through automation in model composition ...

(452-453) ...our framework will link it to its corresponding bond graph symbolic template.

(453-454) Thereafter, a function in our framework finds similar annotations in the models ....

(457) Ultimately, our framework produces the final model...

(335) Hence, we created a symbolic bond graph module ...

(517-518) Our framework integrates the modified connectivity matrices ...

(521-522) ... our framework inserts an additional 1 in the matrix.

Some words are overused and do not convey any meaning to the reader. For example, I fail to understand what is “generic” about the composition pipeline.

We have modified the text to make this clearer.

(140-142) “This is a generic model composition approach since the idea is domain-independent and could in principle be applied to models in different physical domains (e.g. electrical or mechanical).”

Some sentences are unnecessarily long without providing any additional meaning. E.g., instead of saying “we employed the idea of having symbolic bond graph templates”, you can simply say “we built symbolic bond graph templates”.

We have modified the text to address this issue.

(426) “... we built symbolic bond graph templates...”

(554-556) “We used our method to merge the modules within the MAPK cascade and between the pairs (EGFR pathway, Ras activation) and (Ras activation, MAPK). This yielded the bond graph configuration of the EGFR-Ras-MAPK signalling pathway.”

(433) “...we employed the concept of a connectivity matrix.” → “...we used connectivity matrices.”

Figure captions need to provide sufficient information so that they can be read in isolation. This means the caption should briefly summarize how the figure is referenced in the paper. E.g. Fig 15 caption does not even mention which model is used.

We have modified the caption text to provide more information.

“Fig 13. The steady-state responses of the activated kinases for different input amounts in the MAPK cascade model.”

“Fig 14. Verification of the responses of activated kinases to Ras in the composed EGFR-Ras-MAPK bond graph model by comparing with the predicted steady-state responses in the MAPK cascade module. (A) Ultrasensitivity in the composed EGFR-Ras-MAPK bond graph model. The steady-state concentrations of the kinases are: MKKKP = 1.37 nM, MKKPP = 1054.37 nM, MKPP = 987.96 nM; (B) Predicted steady-state concentration of the kinases. The purple dashed line shows the concentration of Ras at t = 100 (s) in the composed EGFR-Ras-MAPK bond graph model. The predicted steady-state concentrations of MKKKP, MKKPP, and MKPP at Ras = 0.311 nM match with the ones in the composed EGFR-Ras-MAPK bond graph model.”

“Fig 15. Activation of terminal kinases with and without negative feedback in the composed EGFR-Ras-MAPK bond graph model. (A) Without negative feedback; (B) With negative feedback.”

“Fig 17. Effect of different levels of ATP concentration on activated kinases in the composed EGFR-Ras-MAPK bond graph model. (A) MKPP; (B) MKKPP; (C) MKKKP; (D) Steady-state concentration of MKKKP, MKKPP, and MKPP against relative ATP concentration. MKKKP concentration is also separately shown in a box due to its relatively small amounts compared to MKKPP and MKPP (initial concentration of common species: Ras=0, RShGS=0).”

The text in figures is extremely tiny relative to the size of the figure and unreadable. Effort should be made so that the figure looks good printed on paper.

We have increased the text size for Fig 8, Fig 9, Fig 14, Fig 15, and Fig 17D.

Many paragraphs begin with extra-long sentences that run on. E.g. lines 48-50 packs too many different concepts into a single sentence. This is unnecessary and can be broken down.

We have modified the text to make this clearer.

(101-103) “Here, as an extension to our previous work, we have incorporated annotations to bond graphs in a new platform. This platform allows us to automatically construct a composed model from annotated CellML files treated as modules.”

(264-266) “... we first removed the thermodynamically infeasible irreversible reactions from the network (for their different parameter definitions). Then, we applied the parameter balancing method described in....”

(554-556) “We used our method to merge the modules within the MAPK cascade and between the pairs (EGFR pathway, Ras activation) and (Ras activation, MAPK). This yielded the bond graph configuration of the EGFR-Ras-MAPK signalling pathway.”

Sections should begin with a brief paragraph summarizing the section. Each paragraph should have a first sentence summarizing the paragraph.

We have added summary paragraphs to several sections throughout the paper. Sections:

Automated model composition pipeline,

The prerequisites,

The generic approach,

Modules for EGFR-Ras-MAPK signalling: Bond graph models of the pathways,

The EGFR pathway module,

The Ras activation intermediate module,

The MAPK cascade module,

Verification of bond graph modules.

In many places, special terms are used before being defined, which is poor form. For example, physical feasibility in line 20 is defined only in line 26. Similarly, symbolic models in line 96 is used first and then explained. Semantics-based in line 40 has no explanation. Using “… will be explained later” is also poor form and shows lack of narrative.

We have modified the text to make this clearer.

Physical feasibility is now defined:

(29-31) “The resultant model often needs further adjustments to be executable and yet it might not follow the laws of thermodynamics and physics (such as energy, mass, and charge conservation) [12, 14]. This is referred to as physical feasibility.”

We have added a definition for symbolic modules below:

(361-365) “In this endeavour, we have provided some exemplar predefined bond graph models in which the parameters do not have any values. We call these predefined bond graph models symbolic modules. Symbolic modules allow the parameter values to be determined later, by linking them to the corresponding annotated parameters from the CellML models.”
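A symbolic module can be sketched as a template whose parameters start empty and are filled later from identically annotated CellML values, as in step 7 of the pipeline. The `SymbolicModule` class, the annotation keys and the numbers below are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolicModule:
    """Hypothetical bond graph template: parameter annotations mapped to values (None = unset)."""
    name: str
    parameters: dict = field(default_factory=dict)

    def parameterise(self, annotated_values):
        """Return a copy whose parameters are filled from identically annotated values."""
        filled = {a: annotated_values.get(a, v) for a, v in self.parameters.items()}
        missing = [a for a, v in filled.items() if v is None]
        if missing:
            raise ValueError(f"no annotated value for {missing}")
        return SymbolicModule(self.name, filled)


# A template for a single reaction, created before any values are known.
template = SymbolicModule("Re:r1", {"kappa:r1": None, "K:A": None, "K:B": None})

# Values extracted from annotated CellML variables (placeholder annotations and numbers).
module = template.parameterise({"kappa:r1": 0.5, "K:A": 2.0, "K:B": 1.0, "K:C": 4.0})
```

Returning a new module rather than editing the template keeps the symbolic version reusable for other compositions.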

We have now also defined semantic annotation.

(80-82) “Semantic annotation is the labelling of the mathematical content of models or data with standard machine-readable descriptions [28]. These annotations are crucial for the reusability and interoperability of models.”

Reviewer #2: # A semantics, energy-based approach to automate biomodel composition

## Major comments

This paper describes a methodology for using bond graphs to create complex composite models.

This is a topic many people (myself included) will want to learn more about, and the accompanying software could potentially be very useful. However, there are several issues with the manuscript text that will need addressing before this can be published.

Most notably:

1. The introduction and methods sections are far more abstract than they need to be. For example, the text frequently refers to "entities", "elements", and "components" of bond graph models without making clear what these words refer to (or whether they are the same), and it mentions "similar annotations" without saying what this means. Such parts of the text should be rewritten to be both clear and precise.

We have defined the words “entities”, “elements”, and “components” of bond graph models in the following text.

(52-53) Bond graphs represent systems graphically as a set of elements, i.e. components and junctions.

(53-58) Components represent physical entities (such as ions, complexes, genes, and atoms at the microscopic level, and resistors, capacitors, dampers, and masses at the macroscopic level) and are defined as general configurations of electrical, mechanical, or chemical elements. For instance, C components in bond graphs are charge storage components, i.e. capacitors in electrical circuits, springs in mechanical systems, or chemical species in chemical reactions.

We have changed the term “similar annotations” to “identical annotations” to make it specific for the approach we use.

2. The problem statement is unclear. In several places it seems to be about coupling two (known) models, but parts of the methodology imply it is about identifying models that could potentially be coupled?

The focus of this paper is purely on automatically coupling together models in systems biology in an energy-based manner. Only part of our work involves searching for models to be merged; this was required because bond graph models are not available for many biological systems.

We have stated the challenges in hierarchical model composition and how bond graphs (as an energy-based modelling approach) can address these challenges. Also, to automate this process, we are using semantic annotations. We have added the following text to clearly state the problem.

(7-9) An approach to automatically and hierarchically construct sophisticated models of biology that ultimately leads to the generation of biologically and physically correct models is currently missing.

Moreover, we have stated the bottlenecks in our previous work and our current approach to address them in the following text.

(89-103) An automated model composition approach significantly assists researchers in creating large-scale models from existing modules [36]. Shahidi et al. [37] introduced a general hierarchical model composition method by encoding bond graph modules in CellML and constructing a complex model using the SemGen merger tool [38]. The SemGen merger tool uses the biological semantics of the components in models to identify and interpret them unambiguously. Although this method facilitated the integration of annotated bond graph models, bottlenecks might arise when the CellML bond graph modules need to be modified (modellers must know the bond graph conservation laws). Moreover, it required adding auxiliary variables as ports to each module and connecting them manually using the semi-automated SemGen merger tool. While annotations are readily incorporated into bond graphs, they have not previously been used for model composition in this context.

Here, as an extension of our previous work, we have incorporated annotations into bond graphs in a new platform. This platform automatically constructs a composed model from annotated CellML files treated as modules.

3. As the problem statement is abstract, an example (perhaps a simple toy problem in addition to the real-life example shown) should be introduced early on (i.e. in the introduction) and used to explain it. In the current manuscript, a general outline of a solution is sketched long before the reader has been given the tools to understand the problem it aims to solve.

We have addressed this issue by giving an example in Fig 2. We have also moved the introduction to bond graphs from Section 2.1 to the Introduction to give the reader the essential tools to understand the problem.

4. The level of detail varies considerably throughout the paper: many words are devoted to fairly simple processes, such as creating a connectivity matrix and removing components that have been deemed identical, but very little is said about more complex steps, e.g. how annotations are compared, or how "thermodynamically consistent parameters" are created from inconsistent ones.

We have dealt with the issue of connectivity matrices in Key change 2.

We have also condensed the extensive explanation of removing identical components, which is now discussed mainly in Steps 3 and 4 of the framework description, as follows.

(478-483)

3. In this step, only one entity is kept from each group of identically annotated mergeable entities and the rest is deleted.

4. In this step, our framework links the modules at each merging point to integrate them. A link is a bond in bond graph terminology and can be added to the system by inserting a 1 in the connectivity matrix (used in the current approach) or adding a syntax to incorporate a new bond between the modules.
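Steps 3 and 4 can be illustrated with a minimal Python sketch. It assumes a hypothetical data layout: each module lists its elements with a hashable annotation (or `None` if the element is not mergeable) together with its bonds. For brevity, the 0 junctions are folded into the species nodes. This is not the framework's actual code. `compose` keeps the first element of each identically annotated group and redirects the bonds of deleted duplicates to the kept element. It then writes each bond as a symmetric pair of 1s in the connectivity matrix.

```python
# Hypothetical module layout: element name -> annotation (hashable tuple, or
# None if not mergeable), plus bonds as pairs of element names. 0 junctions
# are folded into the species nodes to keep the example short.
modules = {
    "m1": {
        "elements": {"A": ("concentration", "A"), "r1": None, "C": ("concentration", "C")},
        "bonds": [("A", "r1"), ("r1", "C")],
    },
    "m2": {
        "elements": {"C": ("concentration", "C"), "r2": None, "B": ("concentration", "B")},
        "bonds": [("C", "r2"), ("r2", "B")],
    },
}


def compose(modules):
    """Merge identically annotated elements and return (names, connectivity matrix)."""
    kept = {}     # annotation -> qualified name of the element that is kept
    target = {}   # qualified name -> qualified name it points to after merging
    names = []
    for mod, spec in modules.items():
        for elem, annotation in spec["elements"].items():
            qualified = f"{mod}.{elem}"
            if annotation is not None and annotation in kept:
                # Step 3: keep only the first element of each identical group
                target[qualified] = kept[annotation]
                continue
            if annotation is not None:
                kept[annotation] = qualified
            target[qualified] = qualified
            names.append(qualified)

    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for mod, spec in modules.items():
        for a, b in spec["bonds"]:
            i = index[target[f"{mod}.{a}"]]
            j = index[target[f"{mod}.{b}"]]
            # Step 4: a bond is a 1 at both (i, j) and (j, i), keeping symmetry
            matrix[i][j] = matrix[j][i] = 1
    return names, matrix


names, matrix = compose(modules)
print(names)  # ['m1.A', 'm1.r1', 'm1.C', 'm2.r2', 'm2.B']
```

Redirecting bonds to the kept element has the same effect as deleting the duplicate and inserting 1s for the new links.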

We have stated that our framework merges the components that have identical annotations (lines 454, 471, 478, 484, 487, 489, 511, 777).

We have explained the equations in the following text and referred the reader to an example for further reading.

(266-271) ... we applied the optimisation method described in [49] to the remaining reversible reactions. In brief, we take logarithms of the constraints of each reaction, $k^+ = \kappa \prod_i K_{r_i}$ and $k^- = \kappa \prod_j K_{p_j}$. Here $\kappa$ is the reaction rate constant and $K_{r_i}$ and $K_{p_j}$ are the thermodynamic constants of the reactants and products. The relationship between the kinetic and bond graph parameters can then be expressed as a linear matrix equation. The reader is referred to Appendix B of [49] for an example of generating the linear matrix of thermodynamic constants.
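Taking logarithms turns the two constraints into rows of a linear system in $\ln\kappa$ and $\ln K$. The sketch below assumes a hypothetical three-reaction cycle and uses plain, unweighted least squares via `numpy.linalg.lstsq`. The optimisation method in [49] may weight or constrain the fit differently. Every fitted rate is rebuilt from one $\kappa$ per reaction and one $K$ per species. As a result, the fitted constants satisfy detailed balance even when the input rates do not.

```python
import numpy as np

# Hypothetical three-reaction cycle A <=> B <=> C <=> A as (reactants, products).
species = ["A", "B", "C"]
reactions = [(["A"], ["B"]), (["B"], ["C"]), (["C"], ["A"])]


def fit_bond_graph_parameters(kf, kr):
    """Least-squares fit of ln(kappa) and ln(K) to ln(k+) and ln(k-)."""
    n_r, n_s = len(reactions), len(species)
    rows, rhs = [], []
    for i, (reactants, products) in enumerate(reactions):
        for rate, side in ((kf[i], reactants), (kr[i], products)):
            row = np.zeros(n_r + n_s)
            row[i] = 1.0                              # coefficient of ln(kappa_i)
            for s in side:
                row[n_r + species.index(s)] += 1.0    # coefficient of ln(K_s)
            rows.append(row)
            rhs.append(np.log(rate))
    # The system is rank deficient (K and kappa can be rescaled against each
    # other), so lstsq returns the minimum-norm solution.
    x, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.exp(x[:n_r]), np.exp(x[n_r:])


def rates(kappa, K):
    """Rebuild k+ and k- from the bond graph parameters."""
    K_of = dict(zip(species, K))
    kf = [kappa[i] * np.prod([K_of[s] for s in r]) for i, (r, _) in enumerate(reactions)]
    kr = [kappa[i] * np.prod([K_of[s] for s in p]) for i, (_, p) in enumerate(reactions)]
    return np.array(kf), np.array(kr)


# Rate constants that violate detailed balance: prod(k+) = 12, prod(k-) = 6.
kappa, K = fit_bond_graph_parameters([2.0, 3.0, 2.0], [1.0, 2.0, 3.0])
kf_fit, kr_fit = rates(kappa, K)
print(np.prod(kf_fit) / np.prod(kr_fit))  # 1.0 up to rounding: detailed balance holds
```

If the input rates already satisfy detailed balance, the fit reproduces them exactly.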

## Detailed comments

### Line 17

The main challenges in hierarchical model composition include: (a) incompatible code languages, (b) different modelling frameworks, (c) post-composition adjustments, and (d) physically implausible resultant models.

This needs a slower and more careful explanation. What is the difference between an "incompatible language" and a "different framework"? What are "post-composition adjustments"? Can you give an example?

We have explained the terms "incompatible language", "different framework", and "post-composition adjustments" in the following text with examples.

(19-25) The main challenges in hierarchical model composition include: (a) incompatible code languages (using dramatically different modelling languages, such as object-oriented, graphical, and continuous/discrete-time languages), (b) different modelling formalisms (such as rule-based modelling, differential equations, neural networks, and Boolean networks), (c) post-composition adjustments (manual edits to mathematical equations, rules, or parameters to make the composed models biologically sensible [8])

### Line 20

A majority of model integration platforms ... require compatibility between the languages and modelling frameworks

In contrast, the resultant model still needs further post-merging code-wise adjustments to be executable, yet it might not represent a physically feasible model.

How are these two statements "In contrast"?

We have modified the text to make it clearer.

(26-31) “While existing model integration platforms, such as the SBML Hierarchical package [9] and PySB [10] can resolve issues with code and modelling formalism compatibility, they are limited to the biochemical domain. The resultant model often needs further adjustments to be executable and yet it might not follow the laws of thermodynamics and physics (such as energy, mass, and charge conservation) [12,14]”

Are "post-merging code-wise adjustments" the same thing as "post-composition adjustments"?

We have changed the term “post-merging code-wise adjustments” to “post-composition adjustments” throughout the manuscript.

### Line 26

One solution to these issues is using a hierarchical modelling approach (to help with the post-composition adjustments) and an energy-based modelling framework (to guarantee a physically plausible composed model).

I'm not sure if you are (A) stating something I should already be able to understand at this point, or if you are (B) saying this will be explained in the upcoming text. If A, then it needs a lot more explanation; if B, then please rewrite the text so that this is clear.

We have added text to the introduction to explain how energy-based models help to ensure that models are compliant with the laws of physics.

(31-40) Several formulations and frameworks have been developed to ensure that biochemical models in particular follow the laws of thermodynamics [15-17], but most of them are purely mathematical and difficult to implement for model composition. Furthermore, most model composition tools are not applicable to multi-physics systems and cannot be generalised to more complex biological systems. One solution to these issues is to combine a hierarchical modelling approach (to help with the post-composition adjustments) with an energy-based, multi-physics framework. Such a framework explicitly models energy, ensuring adherence to the laws of physics, and is executable in multi-physics modelling. The bond graph approach addresses these issues.

### Line 53

The annotated data from the CellML modules are then extracted and assigned to their equivalent bond graph components.

By "data" do you mean data (i.e. parameter values?), or e.g. equations, units, etc. ?

And what is a "component"? Should data not be connected to data, or variables to variables? Please rewrite to make statements such as these less abstract and more clear.

We have replaced the word “data” with “parameters” and modified the text to make it clearer.

(106-111) The annotated parameters from the CellML models are then extracted and their values assigned to their equivalent bond graph parameters. Since the equations of a bond graph can be automatically generated from their network structure, we only need to parameterise them using the parameter values from the CellML files. Thereafter, any common biochemical, biological, or physical entities among the modules are identified and merged to render a composed model.
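Parameterising a symbolic template by annotation can be illustrated with a minimal sketch. It copies each annotated CellML value onto the bond graph parameter that carries the same annotation. The template, the annotation tuples, and the function name `parameterise` are hypothetical. The point is only that matching is done by annotation rather than by variable name, and that unmatched parameters are reported rather than guessed.

```python
# Hypothetical template: symbolic bond graph parameter -> annotation it expects.
template = {
    "q0_X": ("initial amount", "X"),
    "q0_Y": ("initial amount", "Y"),
    "q0_Z": ("initial amount", "Z"),
}

# Hypothetical annotated values extracted from a CellML model.
cellml_values = {
    ("initial amount", "X"): 0.1,
    ("initial amount", "Y"): 0.25,
}


def parameterise(template, cellml_values):
    """Copy each annotated CellML value onto the template parameter with the same annotation."""
    assigned, missing = {}, []
    for parameter, annotation in template.items():
        if annotation in cellml_values:
            assigned[parameter] = cellml_values[annotation]
        else:
            missing.append(parameter)  # must be estimated or supplied by the user
    return assigned, missing


assigned, missing = parameterise(template, cellml_values)
print(assigned)  # {'q0_X': 0.1, 'q0_Y': 0.25}
print(missing)   # ['q0_Z']
```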

We have defined the word “component” with an example in the following text.

(53-58) Components represent physical entities (such as ions, complexes, genes, and atoms at the microscopic level, and resistors, capacitors, dampers, and masses at the macroscopic level) and are defined as general configurations of electrical, mechanical, or chemical elements. For instance, C components in bond graphs are capacitive storage components, i.e., capacitors in electrical circuits, springs in mechanical systems, or chemical species in chemical reactions.

### Line 55

We demonstrate this by an example where a bond graph model is constructed from its constitutive modules, i.e., the Epidermal Growth Factor Receptor (EGFR) signalling pathway and the Mitogen-Activated Protein Kinase (MAPK)

Given this example, I'm a bit surprised you opted for CellML over SBML. This may warrant some discussion.

The main goal of this manuscript is to find a way to compose CellML models using bond graphs, since CellML models of biochemical reactions lack some information that SBML models already contain. In broader applications, however, CellML can deal with models that are not purely biochemical. We illustrated a solution to overcome the shortcomings of CellML models; this could be done more automatically for SBML models, as discussed in the following text.

(782-791) Here, we have selected models encoded in CellML because CellML can deal with models that are not purely biochemical. However, the approach can be applied to models in other formats, such as SBML, as long as they can include a semantic description of the system being modelled. While symbolic templates are required to apply our approach to CellML models, this step is not required for SBML models. This is because the reactant(s)-reaction-product(s) relationships are explicitly defined within SBML models, whereas this information is not clearly provided in CellML models. In this paper, we aimed to illustrate a possible way to convert CellML biomodels into bond graphs and automatically compose them. In future, we intend to apply the same method to SBML models in a more automated way.

(814-816) The ultimate goal of applying our model composition method is to provide a foundation for future tool developments to convert any arbitrary CellML/SBML model into bond graphs and then convert it back to a CellML/SBML file.

### Line 79 Materials and methods

Please make this whole section less abstract, e.g. by starting with a detailed example of two models that you wish to connect, so that the reader has some idea what you mean by the various "entities" and "elements" that appear in the text.

We have added an example of connecting two reactions in bond graphs in Fig 2 and explained the equations in S1 Text.C.

(204-209) “Fig 2 illustrates an example of composing together two reactions in bond graphs. Our framework recognises that the Ce : C component is the same in both reactions and merges them. When two components from two modules are merged, the conservation equations at their corresponding `0 : u' junction change. S1 Text.C details the conservation laws and constitutive equations in each reaction separately, as well as in the case where both reactions are combined to create the composition.”
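The kind of composition shown in Fig 2 can be sketched numerically. Assume hypothetical reactions A ⇌ C and C ⇌ B with mass-action Re components, written using $e^{u/RT} = Kx$. Merging the two copies of C means the 0 junction at C now balances $v_1 - v_2$. A forward-Euler loop shows that the composed system conserves the total amount. It also settles where $K_A x_A = K_C x_C = K_B x_B$, i.e. where the chemical potentials are equal. The parameter values and the integration scheme are illustrative choices only.

```python
def simulate(x0, kappa1, kappa2, K, dt=1e-3, steps=20_000):
    """Forward-Euler integration of A <=> C (reaction 1) and C <=> B (reaction 2)
    after merging the shared species C."""
    xA, xC, xB = x0
    for _ in range(steps):
        # Marcelin-de Donder flux written with exp(u/RT) = K * x
        v1 = kappa1 * (K["A"] * xA - K["C"] * xC)
        v2 = kappa2 * (K["C"] * xC - K["B"] * xB)
        # Merged 0 junction at C: its net flow is now v1 - v2
        xA, xC, xB = xA - dt * v1, xC + dt * (v1 - v2), xB + dt * v2
    return xA, xC, xB


K = {"A": 1.0, "C": 2.0, "B": 0.5}   # hypothetical thermodynamic constants
xA, xC, xB = simulate((1.0, 0.0, 0.0), 1.0, 1.0, K)
print(round(xA + xC + xB, 9))  # 1.0
```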

### Line 86

automatically rewiring the connections between components and modules

What are "components" in this sentence? The introduction should probably give a very brief explanation of what a bond graph looks like (maybe a figure?) and explain what the difference between e.g. a bond graph "module" and a bond graph "component" is. (Is it just a variable? Then please say so)

We have defined the term “component” in the following text.

(53-58) Components represent physical entities (such as ions, complexes, genes, and atoms at the microscopic level, and resistors, capacitors, dampers, and masses at the macroscopic level) and are defined as general configurations of electrical, mechanical, or chemical elements. For instance, C components in bond graphs are capacitive storage components, i.e., capacitors in electrical circuits, springs in mechanical systems, or chemical species in chemical reactions.

We have given a brief description of bond graphs in the Introduction section (lines 47-69).

We have given examples of bond graph modelling in Fig 1 and Fig 2.

We have defined the term “bond graph module” in the following text.

(370-371) In the current work, when a symbolic bond graph template is parameterised we call it a bond graph module.

### Line 92

Our method to expand and integrate biosimulation models provides the foundation for further developments in an open-source environment based on energy-based modules and automation to minimise manual input.

I'm not sure what this sentence means? Without the "to minimise manual input" it reads like part of an abstract or conclusion section. But are you just trying to say you want to minimise manual input when composing models?

We have modified the text to make it clearer.

(360-361) We aim to minimise manual input through automation in model composition while using energy-based modules in an open-source environment.

### Line 94

In this endeavour, we have provided some exemplar symbolic bond graph models to which the annotated parameters from the CellML modules would accordingly link.

Does this mean that any time I want to link 2 models, I need to write 2 templates (that presumably match the 2 models I want to link), and then write a connection matrix describing how my two templates are linked? If it's so hard-coded, then why not just write down which variables I want to connect straight away?

We have addressed this in Key change 2. While it is possible to define a model composition approach by writing down the variables one wants to connect, we found the connectivity matrix to be helpful in implementing the approach using software.

### Line 99

The suitable bond graph model is then automatically selected from the list by

Which list?

We have modified the text to make it clearer.

(365-367) The suitable bond graph template is then automatically selected from a list of symbolic bond graph templates....

### Line 100

identifying specific annotated components in the CellML modules

What exactly is a "component" here? Is it a CellML component (i.e. a container of variables)?

We have modified the text to make it clearer.

(367-368) ...by identifying specific annotated parameters (for example, the species-specific constants) in the CellML models.

### Line 117

Ontologies

I'm not sure who the intended audience for this paper is, but if it's "anyone who wants to compose a big model" you should probably add a line saying what an ontology is, or maybe add a reference.

We have explained “ontologies” in the following text.

(391-408) An ontology is a semantic resource of standard notions and vocabularies of species, structures, and observations in terms of Resource Description Framework (RDF) triples (https://www.w3.org/RDF/). RDF is a standard mechanism to describe and interchange data on the Web and an RDF triple is a subject–predicate–object statement that describes the properties of an entity, often using ontologies [41, 42]. For example, the RDF [OPB00340 - CHEBI29103 - FMA70022] reads [concentration-potassium-extracellular space] which specifically describes the physical property and location of an entity. Ontologies are useful tools to add meaning to different parts of models to avoid any ambiguous interpretations [43]. Depending on the area of biomedical science in which the researchers annotate their models, one or various reference ontologies might be used. For the scope of this publication, we used the csv files for the Ontology of Physics for Biology (OPB) [44] and the Gene Ontology (GO) [45], downloaded from the following links:

– OPB: https://bioportal.bioontology.org/ontologies/OPB;

– GO: https://bioportal.bioontology.org/ontologies/GO.

The OPB is a reference ontology for physical principles such as chemical concentration, electrical capacitance, temperature, and fluid volume. The GO provides descriptions for molecular biology such as gene products, biological sequences, and molecular activities.
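The readable form of the example annotation can be produced with a simple lookup. The sketch below assumes a hypothetical label table that holds only the three identifiers quoted above. A real implementation would read labels from the locally stored ontology files, and the `interpret` helper is illustrative.

```python
# Hypothetical local label table; in practice the labels would be read from
# locally stored ontology files.
LABELS = {
    "OPB00340": "concentration",
    "CHEBI29103": "potassium",
    "FMA70022": "extracellular space",
}


def interpret(annotation):
    """Translate a ' - '-separated list of ontology identifiers into labels."""
    ids = [part.strip() for part in annotation.split(" - ")]
    return "-".join(LABELS.get(i, f"<unknown {i}>") for i in ids)


print(interpret("OPB00340 - CHEBI29103 - FMA70022"))
# concentration-potassium-extracellular space
```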

### Line 121

We suggest downloading CHEBI, FMA, OPB, and GO ontologies from the following links:

Please explain what domains these 5 are, and why they are appropriate (are they appropriate for everything, or just for your two pathway example?)

We used the OPB and GO ontologies for the particular case study in this paper. Why do I need the other two then?

We were suggesting downloading these 4 ontologies to aid model composition in other contexts. Since they were not required for this study, we have removed the requirement of downloading the other 2 ontologies in this manuscript.

Please explain what these ontologies will be used for. Do they provide labels identifying unique variables? Variable types e.g. "is a concentration"? Will we be inferring properties of our variables using these ontologies e.g. units?

We have explained this below:

(406-414) The OPB is a reference ontology for physical principles such as chemical concentration, electrical capacitance, temperature, and fluid volume. The GO provides descriptions for molecular biology such as gene products, biological sequences, and molecular activities. Due to GitHub's limits on uploaded file size, the reference ontologies required for the current model composition (OPB and GO) are not provided in our GitHub repository. We stored the ontologies locally to interpret the RDF annotations and present these interpretations where the user needs to make a decision based on the annotations. However, the approach can be reduced to a framework in which the annotations are only read and compared in RDF format and no interpretations are shown to the user.

### Line 128

The generic approach

Do you mean generic, i.e. this section explains why the approach works on any model, or "general approach", so that this section presents an outline of the approach taken in the paper?

We have added the following text to explain the term “generic approach” in our work.

(140-142) “This is a generic model composition approach since the idea is domain-independent and could in principle be applied to models in different physical domains (e.g. electrical or mechanical).”

### Line 131

the parameters in the models

Please explain in the intro what you expect to find inside each model (should this say "module"?), e.g. variables, parameters, etc. and use these terms throughout instead of "elements", "entities", "components" etc.

Is a distinction made between variables and constants/parameters?

We have defined the terms “components”, “elements”, and “entities” in Major Comment #1.

We have described what our framework extracts from the CellML models (parameters) and why we do not need to extract variables from the original models.

(106-110) The annotated parameters from the CellML models are then extracted and their values assigned to their equivalent bond graph parameters. Since the equations of a bond graph can be automatically generated from their network structure, we only need to parameterise them using the parameter values from the CellML files.

### Line 138

The number of rows and columns each equals the number of elements in the network in total

What are the "elements" and the "network"? Are elements modules? Variables in a module?

We have defined the “bond graph elements” in Major Comment #1. We have also rephrased the following sentence to make it clearer.

(429-430) Here, the number of rows and columns each equals the number of bond graph elements of a system [46].
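The following is a minimal sketch of the connectivity matrix for a single reaction A ⇌ B, assuming the hypothetical element names `Ce:A`, `0_A`, `Re:r`, `0_B` and `Ce:B`. The matrix is built from a sparse bond list, and each bond sets both $(i, j)$ and $(j, i)$. This keeps the matrix symmetric without manual bookkeeping, which bears on the reviewer's concern about symmetry and sparsity. Reading the upper triangle recovers the bond list.

```python
# Hypothetical elements of one reaction A <=> B, in matrix row/column order.
elements = ["Ce:A", "0_A", "Re:r", "0_B", "Ce:B"]
bonds = [("Ce:A", "0_A"), ("0_A", "Re:r"), ("Re:r", "0_B"), ("0_B", "Ce:B")]


def connectivity_matrix(elements, bonds):
    """n x n binary matrix, n = number of bond graph elements; 1 marks a bond."""
    index = {e: i for i, e in enumerate(elements)}
    matrix = [[0] * len(elements) for _ in elements]
    for a, b in bonds:
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = 1  # set both entries so symmetry is automatic
    return matrix


def bonds_from_matrix(elements, matrix):
    """Read the upper triangle back into a sparse list of bonds."""
    n = len(elements)
    return [(elements[i], elements[j])
            for i in range(n) for j in range(i + 1, n) if matrix[i][j]]


M = connectivity_matrix(elements, bonds)
print(len(M), bonds_from_matrix(elements, M) == bonds)  # 5 True
```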

Figure 1 and the accompanying text are far too vague and focus on the wrong aspects: Readers who are interested in this topic will have come across the topic of a network and a connectivity matrix before, and will be wondering instead how it applies to your problem of tying two pathway models together. This second bit is not explained in any detail here.

If they are variables, then what are we expecting from the on

We have addressed this comment in Key change 2.

As explained in the answer to the comment for ###Line 138, the rows and columns of a connectivity matrix do not represent the variables but the bond graph elements.

### Line 142

"facilitates computational measurements"

What is a "computational measurement" and how is it facilitated by a binary representation of these connections?

We have removed this term from the sentence to avoid confusion. Although reducing computational cost is one of the advantages of using connectivity matrices, it is not exploited in our bond graph model composition. We have modified the text as follows.

(433-436) While not essential to our methodology, we chose this approach because binary representation of models clearly shows the connections and gives the minimal required details to define a network which can be exported to other tools and software for further analysis [46].

### Line 144

Modifying a network is easily performed by inserting 0 or 1 in the matrix

In two places, to preserve symmetry? Given the zero diagonal, the symmetry (which the user needs to manually maintain), and the presumed sparseness of this matrix in real applications (if elements are variables, as I suspect at this point in my reading), this is not a very compact or easy-to-use representation.

We have addressed this comment in Key change 2.

### Line 152

In a `black box' composition approach, the elements of the modules are not accessible and only the input/output variables can be used for coupling

It's not clear to me why we'd expect the modules to have clearly defined "inputs" and "outputs" at this point, so I'm struggling a bit to see why you need to point out that any variables can be connected.

To make the sentence clearer, we have explained why we need a white box approach in the context of biophysiology instead of the conventional black box approach in engineering.

(439-442) To identify the merging points between the modules we used a ‘white box’ approach. In this approach all or a group of the bond graph components in the modules are mergeable. In a ‘black box’ composition approach in contrast, only the components predefined as inputs or outputs are accessible [37, 47].

### Line 154

almost all the entities can be regarded as merging ports

What is a "merging port"? And do we need this bit of jargon or can it be stated more simply?

We have modified the sentence to make it clearer.

(442-443) In coupling biological models, all entities are mergeable,...

### Figure 2

What are "similar annotations"? Do both models have the exact same set of RDF properties? The exact same unique labels? Is it OK if only a subset matches?

We have replaced the term “similar annotations” with the term “identical annotations” to emphasise that a subset matching in annotations is not enough.
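The difference between identical and partially matching annotations can be made concrete with a short sketch. It groups entities by the exact set of ontology terms attached to them. The term strings (`opb:concentration`, `chebi:X`, ...) and the function `mergeable_groups` are hypothetical placeholders. An entity whose annotation is only a subset of another's falls into a different group and is therefore not merged. If no group spans two models, a warning is raised, in the spirit of step 1 of the flowchart.

```python
import warnings
from collections import defaultdict


def mergeable_groups(models):
    """Group entities from different models whose annotations are exactly identical.

    models maps a model name to {entity name: iterable of ontology terms}.
    """
    groups = defaultdict(list)
    for model, entities in models.items():
        for entity, terms in entities.items():
            # frozenset: term order does not matter, but every term must match
            groups[frozenset(terms)].append((model, entity))
    merges = [g for g in groups.values() if len({m for m, _ in g}) > 1]
    if not merges:
        warnings.warn("No identical annotations found; check that the models "
                      "are appropriate for composition.")
    return merges


models = {
    "model_1": {"s1": ["opb:concentration", "chebi:X", "go:cytosol"],
                "s2": ["opb:concentration", "chebi:Y"]},
    "model_2": {"t1": ["opb:concentration", "chebi:X", "go:cytosol"],
                "t2": ["opb:concentration", "chebi:Y", "go:cytosol"]},
}
print(mergeable_groups(models))  # [[('model_1', 's1'), ('model_2', 't1')]]
```

Here `s2` and `t2` share only a subset of terms, so they are not merged.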

This figure seems to show 2 preselected models, but there was also mention of a "list", where does that fit in?

The list refers to the list of stored bond graph templates mentioned in the answer to the comment for ### Line 99.

(365-367) The suitable bond graph template is then automatically selected from a list of symbolic bond graph templates....

In this particular study, we populated the list by either using existing bond graph models or generating them from existing systems biology models.

How is the "repository" section related to the two models?

In this paper, the bond graph models were stored locally. Accordingly, we have replaced the term “repository” with the term “stored files” to be distinguishable from online repositories such as PMR.

As described in the following text, the stored files relate to the two models in two steps of the flowchart: the ontologies are used in step 1, and the bond graph modules and connectivity matrices are used in step 4.

(466-470) 1. A function in our framework extracts the annotations and values of the CellML models. If exact matches of annotations are not detected between the models, a warning is given. The user should check the models to see if they are appropriate for composition. If there are matching annotations, two pathways are made available: composition process and value allocation.

(480-483) 4. In this step, our framework links the modules at each merging point to integrate them. A link is a bond in bond graph terminology and can be added to the system by inserting a 1 in the connectivity matrix (used in the current approach) or adding a syntax to incorporate a new bond between the modules.

### Sections 2.2 & 2.3

These are very helpful, but are required reading to understand e.g. 2.1, so some re-structuring is necessary!

We have moved the introductory text to bond graphs and its principles to the Introduction section (lines 47-69). We have also re-structured the manuscript and moved Sections 2.2 & 2.3 before 2.1.

### Line 216

Is the "." in the equation meant to be a \cdot? Also, if a multiplication symbol is used here, it should be present in the other two multiplications (R*T*ln(K_q*q) or RT ln(K_q q)).

We have edited the equations so that dots are consistently used for multiplication.

What is u_q in this equation?

We have replaced “u_q” with “u” which is defined in the following text.

(165) The chemical potential is u (J mol−1), stored within the biochemical species, ...
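As a quick numerical check of the constitutive relation $u = RT\ln(Kq)$ from the reviewer's comment, the sketch below evaluates the chemical potential of a species. The temperature of 310 K and the parameter values are assumptions for illustration. $K$ must carry units inverse to those of $q$ so that the argument of the logarithm is dimensionless.

```python
import math

R = 8.314   # gas constant, J mol^-1 K^-1
T = 310.0   # assumed temperature, K


def chemical_potential(K, q):
    """u = R*T*ln(K*q) in J/mol for thermodynamic constant K and amount q."""
    return R * T * math.log(K * q)


# Doubling the amount of a species raises its potential by R*T*ln(2),
# about 1.79 kJ/mol at 310 K.
print(chemical_potential(2.0, 3.0) - chemical_potential(2.0, 1.5))
```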

### Line 222

A reaction represents a dissipative process, which in the case of mass-action kinetics...

This could do with some clarification. Are all reactions "dissipative"? Mass-action kinetics are usually phrased in terms of reversible processes.

We have explained a dissipative process in bond graphs and modified the text to be clearer on mass-action kinetics.

(178-181) In bond graphs, a reaction represents a dissipative process in which chemical energy is lost in the form of heat [52]. In the case of reversible mass-action kinetics, a reaction is defined in bond graphs by an Re component with the constitutive relation $v = \kappa\left(e^{u_r/RT} - e^{u_p/RT}\right)$ (Marcelin–de Donder equation),
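For each species $e^{u/RT} = Kx$, so the Marcelin–de Donder relation reduces to reversible mass action with $k^+ = \kappa\prod_i K_{r_i}$ and $k^- = \kappa\prod_j K_{p_j}$. The sketch below checks this numerically for a hypothetical reaction A + B ⇌ C, taking $u_r$ and $u_p$ as the sums of reactant and product potentials. All parameter values and the 310 K temperature are illustrative.

```python
import math

R, T = 8.314, 310.0  # J mol^-1 K^-1 and an assumed temperature in K


def potential(K, x):
    """Chemical potential u = R*T*ln(K*x)."""
    return R * T * math.log(K * x)


def md_flux(kappa, reactants, products):
    """Marcelin-de Donder flux; reactants/products are lists of (K, amount)."""
    u_r = sum(potential(K, x) for K, x in reactants)   # forward affinity
    u_p = sum(potential(K, x) for K, x in products)    # reverse affinity
    return kappa * (math.exp(u_r / (R * T)) - math.exp(u_p / (R * T)))


# A + B <=> C with hypothetical parameters
kappa, K_A, K_B, K_C = 0.5, 2.0, 3.0, 4.0
x_A, x_B, x_C = 1.5, 0.2, 0.7
v_bg = md_flux(kappa, [(K_A, x_A), (K_B, x_B)], [(K_C, x_C)])
# The same flux from mass action with k+ = kappa*K_A*K_B and k- = kappa*K_C
v_ma = (kappa * K_A * K_B) * x_A * x_B - (kappa * K_C) * x_C
print(round(v_bg, 12), round(v_ma, 12))  # -0.5 -0.5
```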

### Line 250

and the bond graph model of MAPK cascade is taken from the work by Pan et al. [21]. In this paper, the bond graph representation of the reference MAPK cascade was available. Here, we detail how bond graph models of these systems were constructed.

The second sentence repeats the first. But then the third contradicts it?

We have modified the sentence to make it clearer.

(229-231) The bond graph model of the MAPK cascade is adopted from the work by Pan et al. [11]. Here, we detail how the bond graph modules of these systems were constructed.

The bond graph model of the MAPK cascade was developed by Pan et al. using the black box approach, but we have used a white box approach and annotated the cycles to automate the model composition. We have explained this in the following text.

(337-339) We used the `white box' approach rather than the `black box' approach in Pan et al.'s work. We also annotated each cycle separately to automate the model composition.

### Figure 4

The network adapted from Missing is/was

Fixed.

### Line 294

We applied curve fitting to estimate the reaction rate constants for the irreversible steps (κ4, κ8, & κ16). We obtained the time-dependent behaviour of the contributing species in steps 4, 8, and 16 (required for curve fitting).

If the second sentence is required for the first, it should come first.

We have modified the text to make it clearer.

(279-282) We obtained the time-dependent behaviour of the contributing species in steps 4, 8, and 16 from the reference CellML model for the EGFR pathway and applied curve fitting to estimate the reaction rate constants for the irreversible steps (κ4, κ8, & κ16).

These two steps contain a lot of work. Much more explanation is needed to make this reproducible.

We have explained these two steps with an example in S1 Text.B and added the following text.

(282-283) As an example, this procedure is shown in S1 Text.B for step 4.
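The fitting step can be sketched for the simplest case: a first-order irreversible step A → products whose reference time course decays exponentially. The sketch fits $k$ by linear least squares on $\ln x_A(t)$. It then uses the irreversible Re relation $v = \kappa e^{u_r/RT} = \kappa K_A x_A$ to obtain $\kappa = k/K_A$. The synthetic data, the value of $K_A$, and the helper names are hypothetical. The procedure in S1 Text.B may use a different kinetic model and fitting routine.

```python
import math


def fit_first_order_rate(times, amounts):
    """Least-squares slope of ln(amount) against time for A -> products;
    returns the first-order rate constant k."""
    logs = [math.log(a) for a in amounts]
    n = len(times)
    t_mean, l_mean = sum(times) / n, sum(logs) / n
    num = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    den = sum((t - t_mean) ** 2 for t in times)
    return -num / den


def kappa_irreversible(k, K_reactant):
    """Irreversible Re component: v = kappa * exp(u_r/RT) = kappa * K_r * x_r,
    so kappa = k / K_r."""
    return k / K_reactant


# Synthetic, noise-free decay standing in for a reference model's output
times = [0.5 * i for i in range(21)]
amounts = [2.0 * math.exp(-0.3 * t) for t in times]
k = fit_first_order_rate(times, amounts)
print(round(k, 6), round(kappa_irreversible(k, 1.5), 6))  # 0.3 0.2
```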

### Line 299

some of the reversible reactions do not satisfy detailed balance

I had assumed this would be guaranteed by the bond graph methodology, some comment or a reference could be useful here.

We were referring to the reversible reactions in the original model, where thermodynamic consistency is not guaranteed. We have modified the sentence to make it clearer and explained it further in the following text.

(285-287) ..., some of the reversible reactions in the original model do not satisfy detailed balance. Since the bond graph parameters are inferred from the original model, an approximation with the least square error is made to generate the closest fit to the data while adhering to detailed balance constraints.
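Whether a set of reversible rate constants satisfies detailed balance can be checked for each reaction cycle with the Wegscheider condition. Around a closed cycle, the product of the forward rate constants must equal the product of the reverse ones. The sketch below assumes a cycle of unimolecular reactions all written in the same direction (for example A ⇌ B ⇌ C ⇌ A). Reactions with other stoichiometries would need the cycle's stoichiometric weights as exponents.

```python
import math


def cycle_log_ratio(kf, kr):
    """ln(prod k+ / prod k-) around a closed cycle of unimolecular reactions,
    each written in the same direction around the cycle. Zero means detailed
    balance (the Wegscheider condition) holds."""
    return sum(math.log(f) - math.log(r) for f, r in zip(kf, kr))


def satisfies_detailed_balance(kf, kr, tol=1e-9):
    return abs(cycle_log_ratio(kf, kr)) < tol


print(satisfies_detailed_balance([2.0, 3.0, 1.0], [1.0, 2.0, 3.0]))  # True
print(satisfies_detailed_balance([2.0, 3.0, 2.0], [1.0, 2.0, 3.0]))  # False
```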

### Line 300

S1 Table. S2 Table and S3 Table --> Table S1. Table S2 and Table S3

According to the PLOS ONE guidelines, supporting information files must be named and referred to as S1 Table, S2 Table, etc.

Reference: https://journals.plos.org/plosone/s/supporting-information

### Lines 339, 343, 344, 345

model of EGFR-Ras-MAPK

and MAKP cascase

Since MAPK cascade includes

Many missing "the"s throughout the text

We have added the missing “the”s.

### Line 350

Due to the limited size ...

This would make more sense in section 2.1.1

We have moved the text to section 2.3.1 (lines 409-411).

### Line 354

for inconsistencies among the values of similarly annotated components and parameters.

What is meant by "inconsistencies" here? Please be precise and give examples.

We have replaced the term “inconsistencies” with “mismatches” to make the text clearer. We have explained it in the flowchart steps with a general example.

(487-491) This step prompts the user to choose a value for the identically annotated entities found in step 5. For instance, if a chemical species is present in more than one model (identically annotated in all the models) and has different initial concentrations, the user is asked to select one of the values or insert a new one for that specific chemical species.
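The mismatch check described above can be sketched as follows, assuming a hypothetical layout in which each model maps annotations to values. `find_mismatches` flags identically annotated parameters whose values differ across models. `resolve` delegates the choice to a callback, which an interactive tool would implement as a user prompt. Names and values are illustrative only.

```python
import math


def find_mismatches(models, rel_tol=1e-9):
    """Return {annotation: [(model, value), ...]} for identically annotated
    parameters whose values differ between models."""
    seen = {}
    for model, params in models.items():
        for annotation, value in params.items():
            seen.setdefault(annotation, []).append((model, value))
    return {
        a: vals for a, vals in seen.items()
        if any(not math.isclose(vals[0][1], v, rel_tol=rel_tol) for _, v in vals[1:])
    }


def resolve(mismatches, choose):
    """choose(annotation, candidates) returns the value to keep; an
    interactive tool would prompt the user here."""
    return {a: choose(a, vals) for a, vals in mismatches.items()}


models = {
    "model_1": {("initial amount", "S"): 1.0, ("initial amount", "T"): 0.5},
    "model_2": {("initial amount", "S"): 2.0, ("initial amount", "T"): 0.5},
}
flagged = find_mismatches(models)
print(flagged)  # {('initial amount', 'S'): [('model_1', 1.0), ('model_2', 2.0)]}
chosen = resolve(flagged, lambda annotation, candidates: candidates[0][1])
print(chosen)   # {('initial amount', 'S'): 1.0}
```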

We have also explained a case-specific example in value mismatches in the Discussion section (following text).

(735-743) Merging components across models might raise mismatches in their parameters. Here, RShGS in the EGFR and Ras activation models, and Ras in the Ras activation and MAPK models, were merged. These species have different initial values and/or thermodynamic constants in their corresponding models. In such cases, our framework flags the differing values for the same species. This is resolved by asking the user to either select one of the values or insert a new value for the flagged parameter. Since the user may not have the relevant expertise, we aim in future to provide users with an evaluation of the ambiguous parameter across multiple models available on PMR. This will give the user a better awareness of the range of values for uncertain parameters.

### Line 510

For biochemical reactions, if the parameters are thermodynamically inconsistent, they are converted into bond graph compatible ones.

Where is this process explained?

We have rephrased the text to make it clearer.

(712-714) Thereafter, all the biochemical reactions (reversible or irreversible) in the reference models are converted into bond graph compatible ones (Section 2.2.1).

### All figures

This could be just a proof issue, but the figures are all rasterised, and at a low resolution.

The separately available figure files (tiff) are all of high resolution.

Reviewer #3:

In their manuscript “A semantics, energy-based approach to automate biomodel composition”, Shahidi et al. describe a new framework for combining biochemical network models based on a mathematical representation in the form of bond graphs. They describe the method and how it is designed to guarantee thermodynamic correctness of the resulting models, and illustrate the procedure with an example case, combining two existing signaling pathway models into a larger, consistent model.

The manuscript is extremely well and clearly written and was a pleasure to read. I think that the method will be very useful. Since it has already been implemented for CellML models, and an implementation for SBML models is conceivable, it has the potential for broad applications in biochemical pathway and network modeling.

I did not check the code.

I have no substantial criticisms. Below I list a few minor points to improve the manuscript, mostly about clarification of words. I leave it to the authors to decide which of these points they would like to account for.

Finally, I would like to express my condolences to the authors for the passing of Professor Crampin. It must have been painful for you to complete the work without your colleague.

----------------------------------------------------------------------------------------------------------------

Title: The term “bond graph” could be mentioned in the paper title.

We decided not to include the term bond graph in the title since we wanted to emphasise the utility of incorporating energy into models of biochemistry rather than focusing on the specific methodology.

2: “Physicians”: I think it’s a dream of modelers that their models will be used by physicians, but I think we’re usually still far from this.

We have removed Physicians from the sentence.

27 (and elsewhere): “Energy-based modeling framework”: since “energy-based” can mean many things, it would be good to explain this term very clearly and explicitly early on.

We have modified the following text to be clearer.

(38-40) .... an energetic and multi-physics framework that explicitly models energy to ensure adherence to the laws of physics and is executable in multi-physics modelling.

60: “provides a reliable and consistent framework that first conserves energy” again, not very clear. The need to satisfy thermodynamic Wegscheider conditions does not exactly arise from energy conservation (first law of thermodynamics), but is also related to the second law, and basically the fact that Gibbs free energy is a thermodynamic potential. So “conserves energy” is a bit imprecise, and maybe not very well understandable.

We have modified the following text to be clearer.

(116-118) ... provides a reliable and consistent framework that first tracks energy transfer; secondly ensures that reactions can only operate in the direction of decreasing chemical potential; ...

101: “Due to the hierarchical feature of bond graphs,” also, “hierarchical” is not very clearly explained. I guess it refers to the usage of symbolic templates and (“inside” them) the actual network-like model structures. But these are just two layers, and “hierarchical” sounds like there were many hierarchy layers.

Although the current model composition uses only two layers of hierarchy, bond graphs also support deeper hierarchies, which benefits model integration in general (see the following text).

(146-147) Bond graphs allow more than two levels of hierarchy which supports model integration.

109: “A common effort between the components is shown by a ‘0’ junction, while a ‘1’ junction shows a common flow, and the energy is conserved and travels between components bidirectionally through bonds (shown by harpoons).” This explanation is not very easy to get, please explain in more detail (e.g. mentioning little examples?)

We have explained this with an example in the following text. We have given the equations in S1 Text.C rather than in the main text so as not to divert from the flow of the manuscript.

(204-209) “Fig 2 illustrates an example of composing together two reactions in bond graphs. Our framework recognises that the Ce : C component is the same in both reactions and merges them. When two components from two modules are merged, the conservation equation at their corresponding ‘0 : u’ junction changes. S1 Text.C details the conservation laws and constitutive equations in each reaction separately as well as in the case where both reactions are combined to create the composition.”
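How the conservation equation at a merged 0 junction changes can be sketched in Python. The reactions A -> C and C -> D are hypothetical stand-ins, not necessarily the reactions in Fig 2, and matching species by name stands in for matching by semantic annotation. Each module is a map from reaction to stoichiometric coefficients; the 0 junction of a species sums coefficient times flux over every reaction it takes part in.

```python
def merge_modules(*modules):
    """Union of the reactions of several modules. Species with the same name
    (standing in for the same annotation) become one merged Ce component."""
    merged = {}
    for module in modules:
        for rxn, stoich in module.items():
            if rxn in merged:
                raise ValueError(f"duplicate reaction id {rxn!r}")
            merged[rxn] = dict(stoich)
    return merged


def junction_terms(reactions):
    """For each species, list the (reaction, coefficient) terms summed at its
    0 junction, i.e. dx/dt = sum(coefficient * flux)."""
    terms = {}
    for rxn, stoich in reactions.items():
        for species, coeff in stoich.items():
            terms.setdefault(species, []).append((rxn, coeff))
    return terms


def species_rates(terms, fluxes):
    return {s: sum(c * fluxes[r] for r, c in ts) for s, ts in terms.items()}


module_1 = {"r1": {"A": -1, "C": 1}}  # hypothetical reaction A -> C
module_2 = {"r2": {"C": -1, "D": 1}}  # hypothetical reaction C -> D

print(junction_terms(module_1)["C"])  # [('r1', 1)]: dC/dt = v1 before merging
merged_terms = junction_terms(merge_modules(module_1, module_2))
print(merged_terms["C"])  # [('r1', 1), ('r2', -1)]: dC/dt = v1 - v2 after merging
rates = species_rates(merged_terms, {"r1": 2.0, "r2": 0.5})
```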

147: “Notice that the connectivity matrix is symmetric” at first, not clear if the connectivity matrix is always symmetric (by definition) or just happens to be symmetric in this example application.

We have explained this in the following text.

(430-431) Connectivity matrices are symmetric for undirected (bidirectional) networks and asymmetric for directed networks.
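The symmetry statement can be checked with a short Python sketch. The module names M1 to M3 and the links between them are hypothetical, and building the matrix from (source, target) pairs is an assumption about one way to construct a connectivity matrix, not a description of the authors' code.

```python
def connectivity_matrix(nodes, links, directed=False):
    """Build a 0/1 connectivity matrix from a list of (source, target) links."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in links:
        matrix[index[source]][index[target]] = 1
        if not directed:
            matrix[index[target]][index[source]] = 1  # bidirectional link
    return matrix


def is_symmetric(matrix):
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


nodes = ["M1", "M2", "M3"]  # hypothetical modules
links = [("M1", "M2"), ("M2", "M3")]
undirected = connectivity_matrix(nodes, links)
directed = connectivity_matrix(nodes, links, directed=True)
print(is_symmetric(undirected), is_symmetric(directed))  # True False
```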

Section 2.2 explains the usage of bond graph modeling of biochemical reactions, but it remains unclear how parameters, rate law formulae, and other data attached to the nodes will be treated during model composition. Is an enzymatic rate law a property attached to the reaction node?

We believe this has been addressed in the comment on line 109.

What if in model combination, the same reaction is described (in the two models) in different ways, e.g. with or without an allosteric inhibitor? Is this just a choice between data attached to the reaction node, or a choice between different structures of the bond graph?

This type of issue will be dealt with under post-composition adjustments. We argue that this type of issue arises from different applications of certain models and the scope of the work. Our hierarchical composition approach provides a solution to this problem by providing the option of readily removing or adding components to the composed model. We have added the following text to highlight this issue.

(758-762) There are situations where different representations of a certain reaction or process are available through the literature. For example, a reaction might be described with or without an allosteric inhibitor. This arises from different applications for different versions of a model and the scope of the studies. In such cases, one has to decide which version of the model they want to use in model composition.

247: “As such, we consider Ras protein to be the mutual species in both pathways.” Would there be additional complications if models are connected by several species (e.g. closing thermodynamic loops that were not present in the initial models, but in the combined model)?

This is a problem in traditional kinetic models, as one may not realise that a closed thermodynamic loop has been introduced. In bond graphs, by contrast, it is impossible to generate such infeasible models. The modeller would be forced to make changes to one or more of the initial models, which was the case in this study.

(70-79) “In the context of biochemistry, modellers widely use traditional kinetic models. However, in general, kinetic models are not thermodynamically consistent (i.e. energy conserving) unless the parameters satisfy certain detailed balance constraints. Specifically, detailed balance constraints are required to ensure that biochemical loops have zero flux (i.e. dissipate no energy) at equilibrium. These detailed balance constraints become increasingly difficult to derive as biochemical networks become larger. Because bond graph models assign a chemical potential to each species, they automatically adhere to detailed balance constraints. Hence, parameters can be modified without violating thermodynamic consistency [27]. This ensures that model composition respects the constraints on thermodynamics for biochemical systems.”
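Why assigning a chemical potential to each species enforces detailed balance can be illustrated with the Wegscheider condition: around a closed loop of reactions, the product of equilibrium constants must equal 1. The Python sketch below assumes μ = RT ln(Kx) for each species and a hypothetical loop A -> B -> C -> A; all numbers are placeholders. With species-level constants the condition holds automatically, whereas independently chosen kinetic rate constants can violate it.

```python
import math


def equilibrium_constant(K, reactant, product):
    """Equilibrium constant of reactant -> product when each species carries a
    chemical potential mu_i = RT ln(K_i x_i): zero affinity at equilibrium gives
    K_r x_r = K_p x_p, so x_p / x_r = K_r / K_p."""
    return K[reactant] / K[product]


# Placeholder species thermodynamic constants (illustrative only).
K = {"A": 2.0, "B": 0.3, "C": 5.0}
loop = [("A", "B"), ("B", "C"), ("C", "A")]
bond_graph_product = math.prod(equilibrium_constant(K, r, p) for r, p in loop)

# A kinetic model with independently chosen forward/reverse rate constants.
k_forward = [1.0, 2.0, 3.0]
k_reverse = [1.0, 1.0, 1.0]
kinetic_product = math.prod(kf / kr for kf, kr in zip(k_forward, k_reverse))

print(math.isclose(bond_graph_product, 1.0))  # True: Wegscheider condition holds
print(kinetic_product)  # 6.0, not 1: the condition is violated
```

The species constants cancel pairwise around the loop, which is why the product is 1 whatever values are chosen.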

In the bond graph model, saturable rate laws were described by irreversible Michaelis-Menten kinetics. Would it also be possible to use reversible Michaelis-Menten kinetics? Or do reversible reactions have to be modelled by mass-action kinetics, for mathematical reasons? (I guess the answer to the latter question is no; maybe it would be good to point this out?)

We have added the following text to explain the possibility of using reversible Michaelis-Menten kinetics and how we can represent it in BondGraphTools.

(200-203) Reversible Michaelis-Menten kinetics can also be represented using bond graphs [22]. However, because the default Re components in BondGraphTools follow mass-action kinetics, we have chosen to approximate Michaelis-Menten kinetics using elementary mass-action reactions (see [12]).
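The idea of approximating a Michaelis-Menten rate with elementary mass-action steps can be sketched in plain Python. This is an illustration under assumed, placeholder parameters, not the scheme used in [12] or in the paper: it integrates E + S <-> C <-> E + P with forward Euler and compares the product formation rate with the quasi-steady-state rate Vmax·S/(Km + S), where Vmax = k2·E0 and Km = (k-1 + k2)/k1. Setting `km2 = 0` makes the catalytic step irreversible for the comparison; a bond graph version would keep it reversible.

```python
def simulate_elementary(E0, S0, k1, km1, k2, km2=0.0, t_end=1.0, dt=1e-4):
    """Forward-Euler integration of E + S <-> C <-> E + P with mass-action rates."""
    E, S, C, P = E0, S0, 0.0, 0.0
    for _ in range(round(t_end / dt)):
        v1 = k1 * E * S - km1 * C  # binding:   E + S <-> C
        v2 = k2 * C - km2 * E * P  # catalysis: C <-> E + P
        E += dt * (v2 - v1)
        S -= dt * v1
        C += dt * (v1 - v2)
        P += dt * v2
    return E, S, C, P


def michaelis_menten_rate(S, E0, k1, km1, k2):
    """Quasi-steady-state rate Vmax*S/(Km + S), Vmax = k2*E0, Km = (km1 + k2)/k1."""
    Km = (km1 + k2) / k1
    return k2 * E0 * S / (Km + S)


# Placeholder parameters with E0 << S0 (the quasi-steady-state regime).
E0, S0, k1, km1, k2 = 0.01, 10.0, 10.0, 1.0, 1.0
E, S, C, P = simulate_elementary(E0, S0, k1, km1, k2)
v_elementary = k2 * C  # product formation rate of the elementary scheme
v_mm = michaelis_menten_rate(S, E0, k1, km1, k2)
```

After the short binding transient, the two rates agree closely, and total enzyme E + C is conserved by construction.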

“The chemostats” - I guess the word refers to species with fixed and given concentrations (sometimes called “external metabolites” in kinetic modelling)? Since the same word is often used in biology in a different meaning (a device with fixed and given concentrations in the INFLOWING medium, not in the bioreactor itself), it would be good to add a short explanation (just say what “chemostat” means in this work).

We have explained “chemostats” in the following text.

(175-177) Following the definition in [51], species with fixed concentrations are called chemostats (CS) in bond graph terminology. Such species have a constant chemical potential [22].
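To make the chemostat definition concrete, the Python sketch below holds two species at fixed concentration by zeroing their rates of change, so their chemical potentials μ = RT ln(Kx) stay constant while an intermediate species settles to a steady state that carries a sustained flux. The pathway A <-> B <-> C, the parameter values, the temperature and the helper names are illustrative assumptions, and BondGraphTools itself is not used.

```python
import math

R, T = 8.314, 310.0  # gas constant in J/(mol K); assumed temperature in K


def chemical_potential(K, x):
    return R * T * math.log(K * x)


def flux(kappa, K_in, x_in, K_out, x_out):
    """Mass-action Re flux for a unimolecular step: kappa*(K_in*x_in - K_out*x_out)."""
    return kappa * (K_in * x_in - K_out * x_out)


# Hypothetical pathway A <-> B <-> C with A and C treated as chemostats.
K = {"A": 1.0, "B": 1.0, "C": 1.0}
x = {"A": 2.0, "B": 0.0, "C": 0.5}
chemostats = {"A", "C"}
mu_A_start = chemical_potential(K["A"], x["A"])

dt = 0.01
for _ in range(2000):
    v1 = flux(1.0, K["A"], x["A"], K["B"], x["B"])
    v2 = flux(1.0, K["B"], x["B"], K["C"], x["C"])
    dxdt = {"A": -v1, "B": v1 - v2, "C": v2}
    for s in chemostats:
        dxdt[s] = 0.0  # fixed concentration, hence fixed chemical potential
    for s in x:
        x[s] += dt * dxdt[s]

print(round(x["B"], 6), round(v1, 6))  # 1.25 0.75: a sustained steady-state flux
```

Without the chemostats, the same network would relax to equilibrium with zero net flux; holding A and C fixed is what keeps the system driven.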

220: “Hence, a symbolic bond graph module for a cycle could be created and reused.” is “could” past or conditional? Please rephrase.

We have rephrased the following sentence to be clearer.

(335-336) Hence, we created a symbolic bond graph module for a single cycle and reused this template for the other four cycles.
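Reusing one symbolic template for several cycles can be sketched as a function that stamps out a module with cycle-specific reaction and parameter names. Everything here is an assumption for illustration: the function name, the parameter naming scheme, the omission of the kinase and phosphatase enzymes, and the layout of the five cycles as the single and double phosphorylation steps of a three-tier MAPK cascade.

```python
def cycle_template(cycle, unphosphorylated, phosphorylated):
    """Hypothetical symbolic module for one phosphorylation cycle: a forward
    (kinase) and a reverse (phosphatase) reaction, each with its own symbolic
    rate parameter. Enzymes are omitted to keep the sketch short."""
    return {
        f"{cycle}_phos": {
            "stoich": {unphosphorylated: -1, phosphorylated: 1},
            "kappa": f"kappa_{cycle}_phos",
        },
        f"{cycle}_dephos": {
            "stoich": {phosphorylated: -1, unphosphorylated: 1},
            "kappa": f"kappa_{cycle}_dephos",
        },
    }


# Assumed layout of the five cycles in a three-tier MAPK cascade.
cycles = [
    ("c1", "MKKK", "MKKKP"),
    ("c2", "MKK", "MKKP"),
    ("c3", "MKKP", "MKKPP"),
    ("c4", "MK", "MKP"),
    ("c5", "MKP", "MKPP"),
]
cascade = {}
for name, unphos, phos in cycles:
    cascade.update(cycle_template(name, unphos, phos))

print(len(cascade))  # 10 reactions from five instances of one template
```

Because the template only renames symbols, a fix made to it propagates to every cycle, which is the main benefit of building one module and reusing it.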

545: “As an improvement to our previous approach [17], the present framework overcame the aforementioned limitations: ..” these are indeed great achievements. Congratulation!

We thank the reviewer for their congratulations.

583: “Eventually, the generated mathematical equations in the bond graph environment can be converted into CellML for simulation and reproducibility.” Would the results models again have the form of a “normal model”, or do they still look very “bond-graph like”, e.g. with non-biological components representing junctions? (And I have the same question for a (potential, future) conversion to SBML models).

We have explained the form of the converted bond graph model in CellML in the following text.

(822-825) The regenerated bond graph model encoded in CellML will lose its graphical structure and the model will be expressed as a system of ODEs. Since we can convert the exported bond graph ODEs into MathML format, the biochemical equations would also be expressible in SBML.

We have explained the potential conversion to SBML models in the following text.

(782-791) Here, we have selected models encoded in CellML because CellML can deal with models that are note purely biochemical, but the approach can be applied to models in other formats, such as SBML, as long as they can include a semantic description of the system being modelled. While symbolic templates are required to apply our approach to CellML models, this step is not required for SBML models. This is because the reactant(s)-reaction-product(s) relationships are explicitly defined within SBML models while this information is not clearly provided in CellML models. In this paper, we aimed to illustrate a possible way to convert CellML biomodels into bond graphs and automatically compose them. In future, we intend to apply the same method on SBML models in a more automated way.

References: “Paynter H. Analysis and Design of Engineering Systems/Paynter HM;.” reference incomplete

We have fixed the reference (reference [18]).

Fig 7: The fonts are a bit small

We have increased the text size for Fig 8 in the new version of the manuscript (Fig 7 in the old version of the manuscript).

Fig 15: in the legend for subfigure C, maybe mention the little dip in the curve and how it is caused?

Since we have added a third module to our composed model, some simulations in the new version differ from the old version of the manuscript. The dip in Fig 15 (old version) is not noticeable in Fig 17 (new version). Instead, we have explained the appearance of a peak in Fig 15.B (new version) in the following text.

(663-666) The peak in the activation of MKKKP in Fig 15.B corresponds to an initial rise in MKKKP concentration from the upstream MKKK where it is immediately consumed by the downstream species to activate MKKPP and MKPP.

Finally, I would like to mention some “historical predecessors” of this work, which tried to establish ways to build biochemical network models from simple “standard elements” while taking thermodynamic feasibility into account.

Ederer M. and Gilles E.D. (2007), Thermodynamically Feasible Kinetic Models of Reaction Networks, Biophysical Journal, Volume 92, Issue 6, 1846-1857.

Stanford N.J., Lubitz T., Smallbone K., Klipp E., Mendes P., Liebermeister W. (2013), Systematic construction of kinetic models from genome-scale metabolic networks, PLoS ONE 8(11): e79195.

While the present work is certainly more elegant, it may make sense to cite these earlier works.

We have added the above-mentioned works in References 15 & 16 and cited them in the following text.

(31-34) Several formulations and frameworks have been developed to ensure biochemical models follow the laws of thermodynamics ([15–17]) but most of them are purely mathematical and are difficult to implement for model composition.

Furthermore, the adjustment of parameters to become thermodynamically feasible seems to resemble parameter balancing (which is used in the Stanford et al paper and could also be cited).

We have compared our approach to parameter balancing in the following text.

(763-770) Our parameter optimisation method is similar to the parameter balancing method utilised by Stanford et al. in [16] in using the thermodynamic constants. While parameter balancing is based on assumptions about typical ranges of parameters and probability distributions [73], our parameter optimisation technique concerns the replication of the model performance with the least square error. In the future, we can utilise other techniques such as parameter balancing in our approach to incorporate the experimentally measured values of parameters and create more realistic bond graph models.
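A least-squares fit of bond graph parameters to reference behaviour can be sketched for the simplest case, a single reaction A <-> B whose bond graph flux is v = κ(K_A·x_A − K_B·x_B). This is an illustrative stand-in, not the authors' optimisation routine: the reference data are synthetic, generated from a placeholder kinetic rate law, and fixing K_A = 1 is an assumed choice of reference, since only the products a = κK_A and b = κK_B are identifiable from fluxes alone.

```python
def fit_reversible_flux(samples):
    """Least-squares fit of v ~ a*x_A - b*x_B from (x_A, x_B, v) samples
    by solving the 2x2 normal equations (columns x_A and -x_B)."""
    s11 = sum(xa * xa for xa, _, _ in samples)
    s12 = sum(-xa * xb for xa, xb, _ in samples)
    s22 = sum(xb * xb for _, xb, _ in samples)
    t1 = sum(xa * v for xa, _, v in samples)
    t2 = sum(-xb * v for _, xb, v in samples)
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("samples do not identify both parameters")
    a = (t1 * s22 - s12 * t2) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


# Synthetic reference data from a placeholder kinetic rate law v = 3*x_A - 1*x_B.
points = [(1.0, 0.0), (2.0, 1.0), (0.5, 2.0), (3.0, 3.0)]
samples = [(xa, xb, 3.0 * xa - 1.0 * xb) for xa, xb in points]
a, b = fit_reversible_flux(samples)

# a = kappa*K_A and b = kappa*K_B; fixing K_A = 1 (assumed reference) pins the rest.
K_A = 1.0
kappa = a / K_A
K_B = b / kappa
print(round(kappa, 6), round(K_B, 6))  # 3.0 0.333333
```

Parameter balancing would instead combine such data with prior distributions over typical parameter ranges; the plain least-squares form above only reproduces the reference behaviour.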

Attachment

Submitted filename: Response to Reviewers.pdf

Decision Letter 1

Lutz Brusch

10 May 2022

PONE-D-21-39103R1

A semantics, energy-based approach to automate biomodel composition

PLOS ONE

Dear Dr. Shahidi,

Thank you for your careful revision. After feedback from all three original reviewers, I decided for a final minor revision by text edits that should further improve the presentation of your method and results. Therefore, I invite you to submit a revised version of the manuscript that may address the new suggestions by the reviewers, see below. The final decision can then be taken quickly after your re-submission.

Please submit your revised manuscript as soon as possible and by Jun 24 2022 11:59PM the latest. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Lutz Brusch, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: N/A

Reviewer #3: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have improved the manuscript significantly and I recommend publication. They have addressed the majority of the points I raised, including what I perceived as critical flaws.

In terms of results, they have added a detailed Ras-Sos model to complement their other two models, ultimately enabling a biochemically meaningful model composition. They have added simulation results from the composed model demonstrating the utility of composition. These improvements address the major complaint I had with the content of the previous manuscript.

In terms of exposition, they have more clearly explained the problem of thermodynamic consistency in reaction models and how bond graph tools can address them. They have explained bond graph terminology more clearly, they have added more explanation to bond graph composition, and also explained how bond graph models relate to ODE models and simulation. They have clarified semantic annotations, ontologies and elucidated the steps of the composition pipeline. They have more directly addressed potential problems with their approach, and discuss future directions. They have discussed related work like energy-based rule-based models.

In terms of narrative and readability, they have structured the manuscript so it reads better, with introduction and summary sentences for each paragraph and section. They have also addressed run-on sentences, passive voice usage and other copy-editing issues. They have improved some of the figures and added more detail to the captions.

Reviewer #2: The new version of the text is much improved and adequately addresses my major concerns. The github repo looks good too, but could perhaps benefit from a link to BondGraphTools (or a mention that it's pip installable).

Reviewer #3: Dear authors,

Thank you for answering my comments and for the modifications you made. You addressed all the questions I raised, but some of your points could still be explained more clearly. I won't insist on any one of them. I'm listing them below for your information and leave it to you to decide which changes you would like to make.

Sincerely,

Your reviewer #3

--------------------------------------------------------------------------------------

"We have modified the following text to be clearer.

(38-40) .... an energetic and multi-physics framework that explicitly models energy to ensure adherence to the laws of physics and is executable in multi-physics modeling."

-> For me, the term "multi-physics framework" is still a bit vague, you could briefly explain what you mean by it in this specific context; also "energetic" and "models energy" will not be clear to some readers, because "energy", without further specification, can mean many things.

We have modified the following text to be clearer.

(116-118) ... provides a reliable and consistent framework that first tracks energy transfer; secondly ensures that reactions can only operate in the direction of decreasing chemical potential; ...

-> Again, "energy transfer" is not clear and should be explained.

Section 2.2 explains the usage of bond graph modeling of biochemical reactions, but it remains unclear how parameters, rate law formulae, and other data attached to the nodes will be treated during model composition. Is an enzymatic rate law a property attached to the reaction node?

We believe this has been addressed in the comment on line 109.

-> I don't see how this is addressed in the comment on line 109, maybe you can explain this more explicitly.

-> The font size in Figure 8 is now ok, but the font size in figure 6C could still be increased

583: “Eventually, the generated mathematical equations in the bond graph environment can be converted into CellML for simulation and reproducibility.” Would the results models again have the form of a “normal model”, or do they still look very “bond-graph like”, e.g. with non-biological components representing junctions? (And I have the same question for a (potential, future) conversion to SBML models).

We have explained the form of the converted bond graph model in CellML in the following text.

(822-825) The regenerated bond graph model encoded in CellML will lose its graphical structure and the model will be expressed in a system of ODEs. Since we can convert the exported bond graph ODEs into MathML format, the biochemical equations would be also expressible in SBML.

-> Thank you for the clarification. I can see that it is expressible in SBML; but does it still have the "natural" structure of an SBML model, with concentrations changes described by stoichiometric coefficients and reaction fluxes (described in "reaction" elements), or will it just be a collection of ODEs? Please clarify.

(782-791) Here, we have selected models encoded in CellML because CellML can deal with models that are note purely biochemical,

-> typo "note"

We have added the above-mentioned works in References 15 & 16 and cited them in the following text.

(31-34) Several formulations and frameworks have been developed to ensure biochemical models follow the laws of thermodynamics ([15–17]) but most of them are purely mathematical and are difficult to implement for model composition.

-> I don't think they would be difficult to implement for model composition (at least, if some standardised rate laws are used), I think the main point here is that they HAVEN'T been implemented for model composition.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Michael Clerx

Reviewer #3: Yes: Wolfram Liebermeister

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Jun 3;17(6):e0269497. doi: 10.1371/journal.pone.0269497.r004

Author response to Decision Letter 1


12 May 2022

Response to Reviewers

A semantics, energy-based approach to automate biomodel composition

We thank the reviewers for taking the time to read our revised manuscript. We have included the latest suggestions from Reviewer#2 and Reviewer#3 as detailed below. The reviewer comments will be shown in black, our responses in green and quotations from the revised manuscript in blue.

Reviewer #2:

The new version of the text is much improved and adequately addresses my major concerns. The github repo looks good too, but could perhaps benefit from a link to BondGraphTools (or a mention that it's pip installable).

We have added a link to BondGraphTools installation steps on our GitHub repository (Readme file).

Reviewer #3:

Dear authors,

Thank you for answering my comments and for the modifications you made. You addressed all the questions I raised, but some of your points could still be explained more clearly. I won't insist on any one of them. I'm listing them below for your information and leave it to you to decide which changes you would like to make.

Sincerely,

Your reviewer #3

(38-40) .... an energetic and multi-physics framework that explicitly models energy to ensure adherence to the laws of physics and is executable in multi-physics modelling."

-> For me, the term "multi-physics framework" is still a bit vague, you could briefly explain what you mean by it in this specific context; also "energetic" and "models energy" will not be clear to some readers, because "energy", without further specification, can mean many things.

We have added an example of “multi-physics” systems and “energetic modelling” in the following text.

(39-42) ... an energetic and multi-physics framework that explicitly models energy (expressing kinetic rate laws of biochemical reactions in terms of chemical energy level differences [17]) and ensures adherence to the laws of physics and is executable in multi-physics modelling (such as cardiomyocyte electromechanical coupling).

(116-118) ... provides a reliable and consistent framework that first tracks energy transfer; secondly ensures that reactions can only operate in the direction of decreasing chemical potential; ...

-> Again, "energy transfer" is not clear and should be explained.

We have modified the following text for clarity.

(119-121) ... provides a reliable and consistent framework that is consistent with energy conservation; secondly ensures that reactions can only operate in the direction of decreasing chemical potential; ...

Section 2.2 explains the usage of bond graph modeling of biochemical reactions, but it remains unclear how parameters, rate law formulae, and other data attached to the nodes will be treated during model composition. Is an enzymatic rate law a property attached to the reaction node?

We believe this has been addressed in the comment on line 109.

-> I don't see how this is addressed in the comment on line 109, maybe you can explain this more explicitly.

We have demonstrated this with an example given in Fig 2 (showing how merging in bond graphs occurs graphically) and how the rate law formulae and conservation laws change during model composition in S1 Text.C.

Yes, the enzymatic rate law is a property attached to the reaction node but instead of relating the reaction fluxes to concentrations, bond graphs relate fluxes to chemical potentials. We have discussed this in lines 40-41 of the manuscript.
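The point that a reaction node relates flux to chemical potentials rather than directly to concentrations can be sketched in Python. The sketch assumes the Marcelin-de Donder form commonly used for Re components, v = κ(exp(A_f/RT) − exp(A_r/RT)), with A_f and A_r the summed chemical potentials μ = RT ln(Kx) of reactants and products; the reaction A + B <-> C and all numbers are placeholders. It checks that this reduces to mass action in K·x and that the flux always runs down the chemical potential gradient, so v·(A_f − A_r) ≥ 0.

```python
import math

R, T = 8.314, 310.0  # gas constant in J/(mol K); assumed temperature in K


def mu(K, x):
    """Chemical potential of a Ce component: mu = RT ln(K x)."""
    return R * T * math.log(K * x)


def re_flux(kappa, reactants, products):
    """Re flux in Marcelin-de Donder form: kappa*(exp(Af/RT) - exp(Ar/RT)).
    `reactants`/`products` are lists of (K, x); repeat an entry for
    stoichiometry > 1. Returns the flux and the net affinity Af - Ar."""
    Af = sum(mu(K, x) for K, x in reactants)
    Ar = sum(mu(K, x) for K, x in products)
    v = kappa * (math.exp(Af / (R * T)) - math.exp(Ar / (R * T)))
    return v, Af - Ar


# Hypothetical A + B <-> C with placeholder parameters.
v, affinity = re_flux(0.5, [(1.0, 2.0), (2.0, 1.5)], [(0.5, 3.0)])
mass_action = 0.5 * (1.0 * 2.0 * 2.0 * 1.5 - 0.5 * 3.0)
print(math.isclose(v, mass_action), v * affinity >= 0)  # True True
```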

-> The font size in Figure 8 is now ok, but the font size in figure 6C could still be increased

We have increased the font size in Figure 6.C.

583: “Eventually, the generated mathematical equations in the bond graph environment can be converted into CellML for simulation and reproducibility.” Would the results models again have the form of a “normal model”, or do they still look very “bond-graph like”, e.g. with non-biological components representing junctions? (And I have the same question for a (potential, future) conversion to SBML models).

We have explained the form of the converted bond graph model in CellML in the following text.

(822-825) The regenerated bond graph model encoded in CellML will lose its graphical structure and the model will be expressed in a system of ODEs. Since we can convert the exported bond graph ODEs into MathML format, the biochemical equations would be also expressible in SBML.

-> Thank you for the clarification. I can see that it is expressible in SBML; but does it still have the "natural" structure of an SBML model, with concentrations changes described by stoichiometric coefficients and reaction fluxes (described in "reaction" elements), or will it just be a collection of ODEs? Please clarify.

We believe that the conversion from bond graph models of biochemical systems to the natural structure of SBML models is possible for the following reasons:

1. The constitutive equations for Re components (reactions) in bond graphs are expressed in terms of chemical potential (energy) differences and hence automatically account for energy conservation. These equations can be used directly as kinetic law expressions in SBML “reaction” elements, without any change to the mathematical equations derived from the bond graph.

2. The Ce components and TF transformers in bond graphs represent the “species” and “stoichiometry” in SBML models, respectively. Therefore, the parameters specific to these bond graph elements can be transferred to their corresponding SBML elements.

3. Just as reactant(s)-reaction-product(s) relationships can be extracted from SBML models, the same relationships can be deduced from bond graph models of biochemical systems and inserted into SBML models.
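The mapping described in points 1-3 can be sketched with Python's standard `xml.etree.ElementTree`: each Ce component becomes a `<species>`, each Re component a `<reaction>`, and each signed TF ratio a reactant or product `stoichiometry`. This is a hypothetical skeleton, not a complete or validated SBML Level 3 document: several required attributes (e.g. `hasOnlySubstanceUnits`, `boundaryCondition`) are left out, and the `<kineticLaw>` that would carry the MathML form of the Re equation is omitted for brevity. The function name and the example network are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"


def bond_graph_to_sbml_skeleton(ce_species, re_reactions):
    """Hypothetical sketch: Ce -> <species>, Re -> <reaction>, TF ratio -> stoichiometry.

    ce_species:   list of species ids, one per Ce component
    re_reactions: {reaction id: {species id: signed TF ratio}},
                  negative for reactants, positive for products
    """
    sbml = ET.Element("sbml", xmlns=SBML_NS, level="3", version="1")
    model = ET.SubElement(sbml, "model", id="composed")
    comps = ET.SubElement(model, "listOfCompartments")
    ET.SubElement(comps, "compartment", id="cell", constant="true")
    species_list = ET.SubElement(model, "listOfSpecies")
    for sid in ce_species:
        ET.SubElement(species_list, "species", id=sid, compartment="cell")
    reaction_list = ET.SubElement(model, "listOfReactions")
    for rid, stoich in re_reactions.items():
        # Re components describe thermodynamically reversible reactions
        rxn = ET.SubElement(reaction_list, "reaction", id=rid, reversible="true")
        sides = (("listOfReactants", [(s, c) for s, c in stoich.items() if c < 0]),
                 ("listOfProducts", [(s, c) for s, c in stoich.items() if c > 0]))
        for tag, refs in sides:
            if refs:  # only emit non-empty lists
                container = ET.SubElement(rxn, tag)
                for sid, coeff in refs:
                    ET.SubElement(container, "speciesReference",
                                  species=sid, stoichiometry=str(abs(coeff)))
        # <kineticLaw> with the MathML Re equation is omitted in this sketch
    return sbml


doc = bond_graph_to_sbml_skeleton(["A", "B", "C"], {"r1": {"A": -1, "B": -1, "C": 1}})
print(ET.tostring(doc, encoding="unicode"))
```

The output keeps the “natural” SBML layout (species, reactions, stoichiometric references) rather than a flat list of ODEs, which is the structure the reviewer asked about.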

We have modified the following text to include a summary of the above-mentioned ideas.

(825-830) The regenerated bond graph model encoded in CellML will lose its graphical structure and will be expressed as a system of ODEs. Since we can convert the exported bond graph ODEs into MathML format, the biochemical equations would also be expressible in SBML. The structure of such SBML models will be preserved, since the required parameters, rate laws, and reactant(s)-reaction-product(s) relationships are extractable from the generated bond graph model.

(782-791) Here, we have selected models encoded in CellML because CellML can deal with models that are note purely biochemical,

-> typo "note"

We thank the reviewer for pointing out the typo and have corrected it.

(31-34) Several formulations and frameworks have been developed to ensure biochemical models follow the laws of thermodynamics ([15–17]) but most of them are purely mathematical and are difficult to implement for model composition.

-> I don't think they would be difficult to implement for model composition (at least, if some standardised rate laws are used), I think the main point here is that they HAVEN'T been implemented for model composition.

We have included the reviewer’s comment in the following text to make it clearer.

(31-35) Several formulations and frameworks have been developed to ensure biochemical models follow the laws of thermodynamics ([15–17]), but most of them are purely mathematical and are often difficult to implement for model composition due to non-standardised rate laws and the lack of a graphical structure that allows components to be easily appended or deleted.

Attachment

Submitted filename: Response to Reviewers.pdf

Decision Letter 2

Lutz Brusch

23 May 2022

A semantics, energy-based approach to automate biomodel composition

PONE-D-21-39103R2

Dear Dr. Shahidi,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Lutz Brusch, Ph.D.

Academic Editor

PLOS ONE


Acceptance letter

Lutz Brusch

25 May 2022

PONE-D-21-39103R2

A semantics, energy-based approach to automate biomodel composition

Dear Dr. Shahidi:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Lutz Brusch

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Ultrasensitivity in MAPK cascade.

    For an input Ras kinase concentration of 3 × 10⁻⁵ μM, the concentration changes of the activated kinases (MKKKP, MKKPP, and MKPP) show that the signal is amplified through each layer.

    (TIF)

    S2 Fig. The normalised activation of kinases in the MAPK cascade module for different input amounts (Ras).

    (TIF)

    S1 Table. Reactant(s) and product(s) of each step in EGFR pathway and the reaction rate equations.

    Steps 4, 8, and 16 are irreversible reactions, which are approximated by mass action kinetics. κᵢ (i ∈ {Step}) in the reaction rate equations represent the reaction rate constants, Kₓ (x ∈ {Reactants, Products}) is the thermodynamic constant of each species, and qₓ (x ∈ {Reactants, Products}) is the concentration amount of each species.

    (PDF)

    S2 Table. Original and modified parameters of the species in the EGFR pathway model.

    (PDF)

    S3 Table. Original and modified parameters of the reactions in the EGFR pathway model.

    (PDF)

    S4 Table. Reactant(s) and product(s) of each step in the Ras activation pathway and the reaction rate equations.

    Steps 2 and 4 are irreversible reactions, which are approximated by mass action kinetics. κᵢ (i ∈ {Step}) in the reaction rate equations represent the reaction rate constants, Kₓ (x ∈ {Reactants, Products}) is the thermodynamic constant of each species, and qₓ (x ∈ {Reactants, Products}) is the concentration amount of each species.

    (PDF)

    S5 Table. Original and modified parameters of the species in the Ras activation pathway model.

    (PDF)

    S6 Table. Original and modified parameters of the reactions in the Ras activation pathway model.

    (PDF)

    S1 Text. Supplementary material.

    Appendix A: Connectivity matrix example. Appendix B: Parameter estimation for step 4 in the EGFR pathway model. Appendix C: An example of composing two reactions in bond graphs. Fig A: An example network with its connectivity matrix. Fig B: The irreversible Michaelis-Menten kinetics and the approximating reversible mass action kinetics for step 4 in the EGFR signalling pathway model.

    (PDF)

    Attachment

    Submitted filename: REVIEW.pdf

    Attachment

    Submitted filename: Response to Reviewers.pdf

    Attachment

    Submitted filename: Response to Reviewers.pdf

    Data Availability Statement

    The reference MAPK cascade model is available from: https://github.com/mic-pan/Modularity-SysBio. The reference model of the EGFR pathway is available from: https://models.physiomeproject.org/e/47f/kholodenko_demin_moehren_hoek_1999.cellml/docgen. All the model files for this manuscript are available on GitHub: https://github.com/Niloofar-Sh/EGFR_MAPK.


    Articles from PLoS ONE are provided here courtesy of PLOS
