PLOS Computational Biology. 2021 Aug 5;17(8):e1009227. doi: 10.1371/journal.pcbi.1009227

Relating simulation studies by provenance—Developing a family of Wnt signaling models

Kai Budde 1,*, Jacob Smith 2, Pia Wilsdorf 1, Fiete Haack 1, Adelinde M Uhrmacher 1
Editor: Pedro Mendes 3
PMCID: PMC8407594  PMID: 34351901

Abstract

For many biological systems, a variety of simulation models exist. A new simulation model is rarely developed from scratch; rather, it revises and extends an existing one. A key challenge, however, is to decide which model might be an appropriate starting point for a particular problem and why. To answer this question, we need to identify the entities and activities that contributed to the development of a simulation model. Therefore, we exploit the provenance data model, PROV-DM, of the World Wide Web Consortium and, building on previous work, continue developing a PROV ontology for simulation studies. Based on a case study of 19 Wnt/β-catenin signaling models, we identify crucial entities and activities as well as useful metadata both to capture the provenance information from individual simulation studies and to relate these studies to form a family of models. The approach is implemented in WebProv, a web application for inserting and querying provenance information. Our specialization of PROV-DM contains the entities Research Question, Assumption, Requirement, Qualitative Model, Simulation Model, Simulation Experiment, Simulation Data, and Wet-lab Data as well as activities referring to building, calibrating, validating, and analyzing a simulation model. We show that most Wnt simulation models are connected to other Wnt models by using (parts of) these models. However, the overlap, especially regarding the Wet-lab Data used for calibration or validation of the models, is small. Making these aspects of developing a model explicit and queryable is an important step for assessing and reusing simulation models more effectively. Exposing this information helps to integrate a new simulation model within a family of existing ones and may lead to the development of more robust and valid simulation models. We hope that our approach becomes part of a standardization effort and that modelers adopt the benefits of provenance when considering or creating simulation models.

Author summary

We revise a provenance ontology for simulation studies of cellular biochemical models. Provenance information is useful for understanding the creation of a simulation model because it not only contains information about the entities and activities that have led to a simulation model but also their relations, all of which can be visualized. It provides additional structure by explicitly recording research questions, assumptions, and requirements and relating them along with data, qualitative models, simulation models, and simulation experiments through a small set of predefined but extensible activities.

We have applied our concept to a family of 19 Wnt signaling models and implemented a web-based tool (WebProv) to store the provenance information from these studies. The resulting provenance graph visualizes the storyline of the simulation studies: it shows the creation and calibration of simulation models and the successive attempts at validation and extension, and, beyond an individual simulation study, how the Wnt models are related. Thereby, the steps and sources that contributed to a simulation model are made explicit.

Our approach complements other approaches aimed at facilitating the reuse and assessment of simulation products in systems biology such as model repositories as well as annotation and documentation guidelines.

Introduction

Mechanistic, biochemical models are implemented and interrogated to deepen the understanding of biological systems. These models are usually the results of simulation studies that include phases of refinement and extension of simulation models together with the execution of diverse in silico (simulation) experiments.

A plethora of work has emerged over the last two decades to support the execution and documentation of simulation studies (e.g., modeling and simulation life cycles [1], workflows [2], conceptual models [3]). Depending on the application domain, different modeling approaches have their own documentation guidelines [4–6]. In the case of systems biology, the “Minimum Information Requested in the Annotation of Biochemical Models (MIRIAM)” [7] and the “Minimum Information About a Simulation Experiment (MIASE)” [8] are two community standards used for documenting simulation models and corresponding simulation experiments. A recent perspective by Porubsky et al. (2020) [9] looks at all stages of a biochemical simulation study and at tools supporting their reproducibility. When looking at an entire simulation study and at the generation process of the included simulation model, these guidelines provide some indication about what information might be useful for documenting a complete simulation study as well as for establishing relationships between different simulation models.

This is of particular interest when several simulation models for a system under consideration exist, offering different perspectives on the system, answering different questions, or reflecting the data and information available at the time of generation. Model repositories such as BioModels [10, 11], JWS Online [12], or the Physiome Model Repository 2 (PMR2) [13] provide different means to retrieve and use simulation models. For example, querying the BioModels database for biochemical and cellular simulation models that contain proteins such as Wnt, Janus kinase (Jak), or mitogen-activated protein kinase (MAPK), which are associated with corresponding signaling pathways, returns 22 simulation models for Wnt, 12 simulation models for Jak and 139 simulation models for MAPK (as of January 2021). This already shows that MAPK is an intensively studied signaling pathway. However, there is no way to easily compare these simulation models or examine their relationships to each other. Sometimes these relationships are represented in a model relationship map, such as the one created by Ajmera et al. (2013) [14] for diabetes models. Tools such as BiVeS [15] are helpful to compare different versions of one particular simulation model, but comparing different models—even of the same system—is a difficult task because the syntax of these models (e.g., the names of the species), as well as their reactions, might be completely different. Instead of analyzing the similarities and differences in the specifications of simulation models to infer possible relationships between these simulation models, we will focus on context information, such as how a simulation model has been generated.

Larger models in particular are usually not built from scratch [16]. In general, simulation models are the outcome of extensive as well as interactive model and data generation activities. These include, in addition to executing various simulation experiments and successive model refinements, the adaptation of already existing models, for example, by composition or extension [16–18]. Therefore, the complexity of a model grows over time as researchers add parts to the model or refine it. Keeping track of these generation processes is the subject of provenance.

Provenance provides “information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness” [19]. Thus, it can be applied to many fields of science, art, and technology, including biochemical and cellular simulation models. By exploiting standardized provenance data models, such as PROV-DM, this information is presented in a structured, queryable, and graphical form [20, 21]. In addition to providing crucial information about the generation of an individual simulation model and, thus, facilitating its reuse, provenance can be applied to reveal and exploit relationships between different simulation models.

In this publication, we will identify and structure relevant information needed to capture the provenance of simulation studies and elucidate how a family of simulation models can be established through relating the models to each other. As a case study, we will concentrate on 19 simulation studies of the Wnt/β-catenin signaling pathway. Among the different Wnt signaling pathways, the canonical Wnt or Wnt/β-catenin signaling pathway is the most intensively studied one, in vitro [22] as well as in silico [23]. This pathway is considered to be one of the key pathways in development and regeneration, including cell fate, cell proliferation, cell migration and adult homeostasis [24, 25]. In deregulated forms, it is involved in human cancers and developmental disorders [26, 27].

Our case study refers to the Wnt/β-catenin signaling pathway only, which we call Wnt throughout the text. We will present and use our web-based tool WebProv to store, display and query provenance information from these simulation studies. Different queries and analyses of the family of 19 Wnt simulation models will then be used for finding further insights into the family of Wnt signaling simulation models.

Materials and methods

Provenance of simulation models

Provenance data model

We consider the types and relations defined by the PROV data model, PROV-DM [19]. PROV-DM includes the types entity, activity, and agent as well as the relations WasGeneratedBy, Used, WasInformedBy, WasDerivedFrom, WasAttributedTo, WasAssociatedWith, and ActedOnBehalfOf. Provenance information is usually depicted as a directed, acyclic graph in which the arrowheads point towards the sources or predecessors of an entity, activity, or agent, thus depicting its origin. For our case study, we focus only on the types entity and activity and on the relations WasGeneratedBy and Used. The reasons for this decision are explained in the Results and discussion section.
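To make this reading of the graph concrete, the following minimal Python sketch (not WebProv's implementation; the class and method names are hypothetical) stores only the two relations used here, Used and WasGeneratedBy, with every edge pointing from a node to its source. Collecting all nodes reachable along these edges then yields the complete origin of an entity. The node names follow Fig 3, but the edges are an illustrative subset, and the use of RQ1 by BSM1 is an assumption.

```python
from collections import defaultdict


class ProvGraph:
    """Directed graph whose edges point from a node to its source (PROV style)."""

    def __init__(self):
        # node id -> list of (relation, source node id) pairs
        self.sources = defaultdict(list)

    def was_generated_by(self, entity, activity):
        # Entity -> Activity: the activity produced the entity.
        self.sources[entity].append(("wasGeneratedBy", activity))

    def used(self, activity, entity):
        # Activity -> Entity: the activity consumed the entity.
        self.sources[activity].append(("used", entity))

    def ancestors(self, node):
        """Return every node reachable by following edges towards the sources."""
        seen, stack = set(), [node]
        while stack:
            current = stack.pop()
            for _, source in self.sources[current]:
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen


g = ProvGraph()
g.used("BSM1", "RQ1")  # assumed edge, for illustration only
g.used("BSM1", "WD1")
g.was_generated_by("SM1", "BSM1")
print(sorted(g.ancestors("SM1")))  # ['BSM1', 'RQ1', 'WD1']
```

Because the graph is acyclic, the traversal terminates; the `seen` set additionally prevents revisiting nodes that are reachable along several paths.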

Provenance ontology for simulation studies

Recently, Ruscheinski et al. (2018) [20] have applied PROV-DM for capturing provenance information from entire simulation studies and initiated a definition of a PROV-DM ontology for these studies. Important ingredients of this ontology have been identified to be “a) specific types of entities (e.g., data, theories, simulation experiments, and simulation models), b) specific roles between these entities (e.g., used as input, used for calibration, used for validation, used for adaptation, used for extension, used for composition), c) specific refinement of activities (i.e., successive refinement of activities down to a level where simulation experiment specifications define activities and thus are ready to be executed), and d) specific inference strategies (e.g., warnings if the same data has been used both by calibration and validation activities, or the option to reuse validation experiments among model descendants to check consistency)” [20]. The adaptation and application of this ontology for capturing the essential information of our case study is presented in the Results and discussion section.
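The first inference strategy quoted above can be sketched as a set intersection. The Python snippet below is a hypothetical illustration, not part of the ontology by Ruscheinski et al.: it uses the entity and activity abbreviations of Table 2 and assumes that the graph is available as a list of (activity, entity) pairs for the used relation plus a mapping from node ids to types. The Simulation Model entities are filtered out, since a model is legitimately used by both kinds of activities.

```python
# Abbreviations follow Table 2: CSM/VSM are calibrating/validating activities,
# SD/WD are Simulation Data and Wet-lab Data entities.
DATA_TYPES = {"SD", "WD"}


def reused_data(used_edges, node_types):
    """Return data entities used by both a calibration and a validation activity.

    used_edges: (activity_id, entity_id) pairs of the PROV 'used' relation.
    node_types: mapping from node id to its type abbreviation.
    """
    calibration, validation = set(), set()
    for activity, entity in used_edges:
        if node_types.get(entity) not in DATA_TYPES:
            continue  # e.g. Simulation Models are expected to be used by both
        if node_types.get(activity) == "CSM":
            calibration.add(entity)
        elif node_types.get(activity) == "VSM":
            validation.add(entity)
    return calibration & validation


# Hypothetical graph: WD2 is used for calibrating SM1 and validating SM2.
node_types = {"CSM1": "CSM", "VSM1": "VSM", "SM1": "SM", "SM2": "SM",
              "WD1": "WD", "WD2": "WD", "WD3": "WD"}
used = [("CSM1", "SM1"), ("CSM1", "WD1"), ("CSM1", "WD2"),
        ("VSM1", "SM2"), ("VSM1", "WD2"), ("VSM1", "WD3")]
for entity in sorted(reused_data(used, node_types)):
    print(f"warning: {entity} is used for both calibration and validation")
```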

Collecting provenance information

In order to gather all relevant information, the publications as well as the supporting materials (as they often contain model and experiment descriptions) were read thoroughly. Referenced publications were checked as well whenever they appeared to be important for the development of the simulation model. All information that resembled provenance entities was marked. While reading a study, a first sketch of a possible provenance graph was made. Afterwards, a revision of all markings helped to finalize the graph and to remove duplicate entities. Often, authors described their simulation study chronologically, which made it easy to determine the path of its development, but sometimes the connections of the entities had to be inferred from the context. In general, tracing provenance information from an entire simulation study in retrospect involved some interpretation of the results presented in the publication.

Implementing the PROV-DM ontology: WebProv

We have developed WebProv, a web-based tool that can be used to store, access, and display provenance information from simulation studies. It allows one to insert and query provenance information through a web interface as frontend and a graph-based database as backend. The frontend uses Vue, a popular JavaScript framework, along with D3.js, a JavaScript visualization library, to create the visualizations and power the node/relationship editor. As scalability was not a concern when designing the tool, all graph data is sent to the frontend when the website is first opened, allowing the frontend to perform approximate string matching and explore the entire graph without additional queries to the database. Although this reduces the responsibilities of the backend, it still provides an interface for loading different types of nodes, updating data, and importing/exporting JSON data from Neo4j. Furthermore, the backend can load a set of nodes and relationships from JSON into Neo4j on startup to initialize the database.

The tool can also be installed locally for testing and replicability purposes. Details about its installation, as well as the code, can be found on GitHub. An informative video showing the usage of WebProv is also available on YouTube.

Provenance nodes

The main concept of WebProv is the Neo4j Provenance Node and the dependency graph created from related Provenance Nodes using Neo4j relationships. A Provenance Node represents an entity or activity and, therefore, must have a classification (e.g., Simulation Model or Building Simulation Model), which, according to our PROV ontology, defines the types of relationships that can be formed with other nodes. For example, the Simulation Model entity can only be created by a Building Simulation Model or a Calibrating Simulation Model activity (see the Results and discussion section for details). These classifications can be easily changed or extended if necessary. Additionally, Study nodes store information about a particular study (a reference to the study and the name of the signaling pathway it is concerned with) and group a set of Provenance Nodes together. Finally, InformationField nodes allow us to attach zero or more key–value pairs to a Provenance Node to store further information. Which information should be entered for each entity type is described in the Results and discussion section.
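The node types described above can be mirrored in plain Python to show how a Study groups Provenance Nodes and how InformationFields attach key–value pairs. This is a sketch under assumptions: WebProv itself stores these as Neo4j nodes and relationships, and the class names, field names, and the "Reference" key below are hypothetical. The BioModels ID is the one listed for Lee et al. (2003) in the note to Table 1.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class InformationField:
    key: str
    value: str


@dataclass
class ProvenanceNode:
    node_id: str
    classification: str  # e.g. "Simulation Model" or "Building Simulation Model"
    description: str = ""
    information: List[InformationField] = field(default_factory=list)


@dataclass
class Study:
    reference: str  # e.g. "Lee et al. (2003)"
    signaling_pathway: str  # e.g. "Wnt"
    nodes: List[ProvenanceNode] = field(default_factory=list)

    def add(self, node: ProvenanceNode) -> ProvenanceNode:
        """Group a Provenance Node under this study and return it."""
        self.nodes.append(node)
        return node


def study_to_json(study: Study) -> str:
    # asdict() recurses into the nested dataclasses stored in the lists.
    return json.dumps(asdict(study), ensure_ascii=False, indent=2)


lee = Study("Lee et al. (2003)", "Wnt")
sm2 = lee.add(ProvenanceNode("SM2", "Simulation Model", "Calibrated simulation model"))
sm2.information.append(InformationField("Reference", "BIOMD0000000658"))
print(study_to_json(lee))
```

Keeping the information fields as a list rather than fixed attributes reflects the "zero or more key–value pairs" design: new kinds of metadata can be added without changing the node type.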

Queries

WebProv allows two types of queries: text queries and queries in Cypher, Neo4j’s query language. The text query field performs a fuzzy search of the data contained within the nodes. If successful, it returns a set of nodes that contain the given phrase, and the user can choose to add any of these nodes to the graph. Alternatively, the Cypher field passes the query as a string to the backend, which forwards it to the Neo4j database. Since arbitrary queries can be performed, the frontend attempts to parse the returned results as Provenance Nodes using io-ts. If successful, these nodes are automatically shown on the graph. This method allows more complex queries; in particular, subgraphs can be extracted. Thus, structural information can be accessed.
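The text query can be approximated by sliding the search phrase over each node's text and keeping the best similarity score. The sketch below uses Python's standard difflib module as a stand-in; it is not the matching algorithm of WebProv's JavaScript frontend, the threshold of 0.8 is an arbitrary assumption, and the node texts are shortened paraphrases for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_search(descriptions, phrase, threshold=0.8):
    """Return ids of nodes whose text contains an approximate match of `phrase`.

    descriptions: mapping from node id to the searchable text of that node.
    Each window of the text with the same length as the phrase is compared
    case-insensitively; a node matches if its best similarity ratio reaches
    `threshold` (1.0 means an exact substring match).
    """
    phrase = phrase.lower()
    width = len(phrase)
    hits = []
    for node_id, text in descriptions.items():
        text = text.lower()
        starts = range(max(1, len(text) - width + 1))
        best = max(SequenceMatcher(None, phrase, text[i:i + width]).ratio()
                   for i in starts)
        if best >= threshold:
            hits.append(node_id)
    return hits


descriptions = {
    "RQ1": "Are the two scaffold proteins, APC and Axin, necessary, and do their roles differ?",
    "A1": "Dsh, TCF, and GSK3β are degraded very slowly",
}
print(fuzzy_search(descriptions, "scaffold protien"))  # ['RQ1'] despite the typo
```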

Wnt signaling models

A comprehensive list of published simulation studies that deal with or include the Wnt signaling pathway is found in Table 1. Out of the 31 simulation studies ([28–58]) which we have found, we have chosen to collect provenance information from 19 studies, shown in bold in Table 1. Some of these Wnt models have already been discussed in previous reviews [23, 59]. We have included all Wnt simulation studies where simulation models were stored in BioModels (6 studies) as well as all Wnt simulation studies published by our group (4 studies). The remaining nine studies were selected randomly.

Table 1. Wnt simulation models (as of Feb. 1, 2021); those included in this study are marked with an asterisk (*).

Study | BioModels | MA | SA | Scale | Add. compartments | Add. pathways/models
[28]* | yes | ODE | det | SC | – | –
[29]* | – | ODE | det | SC | – | –
[30]* | – | ODE | det | SC | – | –
[31]* | – | PDE | det | TOL | – | –
[32]* | yes | ODE | det | SC | Nuc | MAPK/ERK
[33]* | – | ODE | det | SC | – | Notch
[34]* | yes | ODE | det | SC | – | –
[35]* | – | ODE | det | SC | – | –
[36]* | yes | ODE | det | SC | Nuc | Notch, MAPK/ERK
[37] | – | PDE | det&stoch | MC | – | E-cadherin
[38] | – | ODE | det | SC | – | –
[39]* | – | ODE | det&stoch | MC | – | Cell cycle, E-cadherin
[40] | – | ODE | det | SC | – | –
[41]* | – | ODE | det | SC | – | –
[42] | – | PDE | stoch | MC | – | –
[43] | – | ODE | det | SC | – | MAPK/ERK
[44] | – | PDE | det&stoch | MC | – | Notch
[45]* | – | ODE | det | SC | – | –
[46]* | yes | Rule | det&stoch | SC | Nuc | Cell cycle
[47] | – | ODE | det | SC | Nuc | –
[48]* | – | ODE | det | SC | Nuc | Notch
[49]* | – | ODE | stoch | SC | Nuc, GA | E-cadherin
[50] | – | ODE | det | SC | Nuc | –
[51]* | – | Rule | stoch | SC | Nuc, Mem, LR | ROS
[52] | – | ODE | det | SC | Nuc | –
[53]* | yes | ODE | det | SC | – | MAPK/ERK, PI3K/Akt
[54] | – | Bool | det | SC | – | PI3K/AKT, MAPK/ERK, Rho/Rac
[55] | – | ODE | det | SC | – | –
[56]* | – | Rule | det | SC | Nuc, End, Mem, LR | –
[57]* | – | Rule | stoch | SC | Nuc | ROS
[58] | – | ODE | det&stoch | MC | – | Cell cycle, Hippo

The table lists published simulation studies of the Wnt signaling pathway, showing the references, the availability of the simulation models in BioModels, the modeling approaches (MA), the simulation approaches (SA), the scale of the models, additional compartments, and additional pathways or sub-models included. The authors of the studies marked with an asterisk are: [28]: Lee et al. (2003), [29]: Krüger and Heinrich (2004), [30]: Cho et al. (2006), [31]: Sick et al. (2006), [32]: Kim et al. (2007), [33]: Rodríguez-González et al. (2007), [34]: van Leeuwen et al. (2007), [35]: Wawra et al. (2007), [36]: Goldbeter and Pourquié (2008), [39]: van Leeuwen et al. (2009), [41]: Mirams et al. (2010), [45]: Kogan et al. (2012), [46]: Mazemondet et al. (2012), [48]: Wang et al. (2013), [49]: Chen et al. (2014), [51]: Haack et al. (2015), [53]: Padala et al. (2017), [56]: Haack et al. (2020), and [57]: Staehlke et al. (2020). The BioModels IDs of the simulation models available in BioModels are: [28]: BIOMD0000000658, [32]: BIOMD0000000149, [34]: MODEL2001090001, [36]: BIOMD0000000201, [46]: MODEL1303140000, [53]: BIOMD0000000648. The modeling approaches (MA) are: ODE-based (ODE), PDE-based (PDE), rule-based or reaction-based (Rule), and Boolean network model (Bool). The simulation approaches (SA) are: det (deterministic), stoch (stochastic). The scale may be single cell (SC), multicell (MC), or a more abstract tissue/organ level (TOL). Every simulation model contains at least one compartment, usually the cytosol. We also denote additional compartments in which reactions may take place and into or out of which some species may shuttle: Nucleus (Nuc), Membrane (Mem), Endosome (End), Golgi apparatus (GA), Lipid Raft (LR). Models without shuttling are considered to have only one compartment even though they describe processes in different places, for example, in the cytosol, nucleus, and at the cell membrane.

All models include Wnt ligands and Wnt receptors (implicitly or explicitly) as well as the Wnt signal transducer protein β-catenin, with two exceptions: the simulation model by Sick et al. (2006) [31] contains only Wnt and its antagonist Dkk, and the model by Rodríguez-González et al. (2007) [33] lacks β-catenin. The number of species, without considering compounds or different attributes or states such as the phosphorylation state, ranges from 2 (in [31]) to 30 (in [53]). The dimension of a system may be higher if a model contains compounds or different states of the species. A schematic representation of the Wnt signaling model including relevant species and interactions from all 19 surveyed studies is shown in Fig 1.

Fig 1. Combined overview of all qualitative Wnt (sub-)models found within the 19 Wnt simulation studies.


Depicted are the components (species, compartments, and reactions) of the canonical Wnt signaling pathway (a) and its crosstalk with other signaling pathways (b) found within the 19 Wnt simulation studies. Note that the overview is a simplified and condensed representation. Interactions are simplified and some components of the submodels that do not directly affect the Wnt signaling pathway are omitted. Activated/phosphorylated proteins are indicated by (*). Inactive/unphosphorylated states of proteins have been omitted when possible. Submodels involving membrane-mediated processes, such as receptor/ligand interactions, destruction complex recruitment and endocytosis [31, 45, 51, 56], or cadherin-mediated cell adhesion [34, 39, 49] are incorporated in (a). Submodels involving crosstalk with ERK/FGF/PI3K/Akt [32, 36, 53], Notch [33, 36, 48], and ROS/Dvl-mediated pathways [51, 57] are shown in the lower panels of (b), respectively.

Results and discussion

In order to provide useful information about a set of simulation models as a kind of family, we need to answer which information regarding these models and their development processes is needed and how to describe it. Based on our earlier work on provenance of simulation models, we refine a specialization of the PROV Data Model (PROV-DM) and, thus, define a PROV ontology that is capable of both relating simulation models and reporting their generation processes. We also examine the level of detail, or granularity, that is necessary to capture relevant information of the provenance of simulation studies.

First, we will introduce and discuss our specialization of PROV-DM for cellular biochemical simulation models. This part contains descriptions and examples of all entity and activity types used in our provenance data model. Fig 2 and Table 2 provide overviews of these entity and activity types and should be consulted when skipping the first section.

Fig 2. UML class diagram of provenance entities in WebProv.


We have identified the following entities to be useful for providing provenance information from simulation studies in the field of systems biology: Research Question, Assumption, Requirement, Qualitative Model, Simulation Model, Simulation Experiment, Simulation Data, and Wet-lab Data. The requested information for each entity type is kept minimal for demonstration purposes and can easily be extended. The Study (Reference) contains information about the publication, for instance, “Lee et al. (2003)”, and determines which study an entity belongs to. The Description contains a brief explanation of a particular entity and may be a cited text from the publication. Furthermore, entity references should ideally consist of a digital object identifier (DOI) to make the artifact associated with the particular entity unambiguously accessible. Additional information can always be entered in the “Further Information” part of WebProv.

Table 2. Entities, activities and allowed relations in our PROV-DM specialization.

Entity: wasGeneratedBy (Activity)
Research Question (RQ): –
Assumption (A): –
Requirement (R): –
Qualitative Model (QM): –
Simulation Model (SM): BSM | CSM
Simulation Experiment (SE): CSM | VSM | ASM
Simulation Data (SD): CSM | VSM | ASM
Wet-lab Data (WD): –

Activity: used (Entity)
Building Simulation Model (BSM): RQ | SM, {RQ | A | R | QM | SM | SE | SD | WD}
Calibrating Simulation Model (CSM): SM, {A | R | SD | WD}
Validating Simulation Model (VSM): SM, {A | R | SD | WD}
Analyzing Simulation Model (ASM): SM, {A | SD | WD}

Left column: Specified PROV-DM entity and activity types used for capturing provenance information from simulation studies. Right column: Relations of PROV-DM used for capturing provenance information from simulation studies as well as allowed connections of entities/activities from the first column. The PROV relation wasGeneratedBy connects entities with activities; used connects activities with entities. The Research Question, Assumption, Requirement, Qualitative Model, and Wet-lab Data entities are included in the provenance graph without their origins, thus without an activity generating them. The generation of the Simulation Model, Simulation Experiment, and Simulation Data are explicitly shown in the provenance graphs. For example, a Simulation Model can be created or updated based on a Building Simulation Model or Calibrating Simulation Model activity—the alternative is denoted by “|”. Regarding the activities, each activity has at least one entity it depends on (Research Question or Simulation Model). Other entity types are optional and several or none of each of them may be used by one particular activity—denoted by {…} in the extended Backus–Naur form (EBNF).
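Because Table 2 is essentially a small grammar over typed nodes, it can be encoded as data and checked mechanically. The following Python sketch is one possible encoding (the dictionary and function names are hypothetical and not taken from WebProv): for every activity it looks for the mandatory first entity (Research Question or Simulation Model) and then checks that all remaining used entities belong to the optional {…} part. The example graph is loosely modeled on Fig 3 and is illustrative only.

```python
# Table 2 as data. Entity types missing from GENERATED_BY (RQ, A, R, QM, WD)
# are included without a generating activity.
GENERATED_BY = {
    "SM": {"BSM", "CSM"},
    "SE": {"CSM", "VSM", "ASM"},
    "SD": {"CSM", "VSM", "ASM"},
}
HEAD_USED = {"BSM": {"RQ", "SM"}, "CSM": {"SM"}, "VSM": {"SM"}, "ASM": {"SM"}}
OPTIONAL_USED = {
    "BSM": {"RQ", "A", "R", "QM", "SM", "SE", "SD", "WD"},
    "CSM": {"A", "R", "SD", "WD"},
    "VSM": {"A", "R", "SD", "WD"},
    "ASM": {"A", "SD", "WD"},
}


def check_graph(node_types, generated_by, used):
    """Return a list of violations of the Table 2 constraints.

    node_types: node id -> type abbreviation; generated_by: (entity, activity)
    pairs; used: (activity, entity) pairs.
    """
    errors = []
    for entity, activity in generated_by:
        if node_types[activity] not in GENERATED_BY.get(node_types[entity], set()):
            errors.append(f"{entity} cannot be generated by {activity}")
    for activity, kind in node_types.items():
        if kind not in HEAD_USED:
            continue  # entity nodes carry no 'used' constraint
        used_kinds = [node_types[e] for a, e in used if a == activity]
        head = next((k for k in used_kinds if k in HEAD_USED[kind]), None)
        if head is None:
            errors.append(f"{activity} must use one of {sorted(HEAD_USED[kind])}")
            continue
        used_kinds.remove(head)  # what remains is the optional {...} part
        for k in used_kinds:
            if k not in OPTIONAL_USED[kind]:
                errors.append(f"{activity} may not use an entity of type {k}")
    return errors


node_types = {"RQ1": "RQ", "WD1": "WD", "R1": "R", "BSM1": "BSM",
              "SM1": "SM", "CSM1": "CSM", "SM2": "SM"}
generated_by = [("SM1", "BSM1"), ("SM2", "CSM1")]
used = [("BSM1", "RQ1"), ("BSM1", "WD1"), ("CSM1", "SM1"), ("CSM1", "R1")]
print(check_graph(node_types, generated_by, used))  # []
print(check_graph(node_types, generated_by + [("WD1", "CSM1")], used))
# ['WD1 cannot be generated by CSM1']
```

Removing only one occurrence of the mandatory entity mirrors the EBNF reading: a calibration activity that used two Simulation Models would be flagged, because a second SM is not part of its optional set.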

Second, we will discuss our findings, applying our specialization of PROV-DM and demonstrate the relationships as well as specific features of the provenance information from the 19 Wnt simulation studies covered in this publication.

Further steps towards a PROV-DM ontology for cellular biochemical simulation models

We have revised and refined the specialization of PROV-DM introduced by Ruscheinski et al. (2018) [20]. For capturing provenance information from simulation studies of cellular biochemical simulation models and relating these, we define and use a) specific types of entities and activities and b) specific relations with their roles and constraints. During the process of collecting provenance information from the studies, we identified the types and relations as well as the information that was useful for describing them. Our final set of entities, activities, and relations is shown in Table 2. Each entity and activity has already been mentioned for provenance, modeling or documentation purposes, or experiment design of simulation studies [1, 3–5, 8, 20, 21, 60–63], but they have not all been used together.

In the following section, we will describe these entities, activities, and relations and discuss the information that should be included in WebProv. For each entity and activity, we will show examples of our specialization with provenance information obtained from the provenance graph of the simulation study by Lee et al. (2003) [28], shown in Fig 3, which also includes additional entities from three other studies [64–66].

Fig 3. Provenance graph of the study by Lee et al. (2003) [28].


Besides the entities and activities that make up the provenance information from the study (see legend), additional entities from three other studies [64–66], which were used by Lee et al. (2003), are shown. The colors of the ellipses show different entity types, and the borders of the rectangles visualize different activity types. The gray areas separate the individual studies. The graph displays, for example, that the Building Simulation Model activity BSM1 used, among others, the entity WD1 of type Wet-lab Data from Lee et al. (2001) [64]. This activity then generated the Simulation Model SM1.

Provenance entities

For every Provenance Node (entity or activity), we require the following details to be provided: a) (PROV-DM) Type, b) Study (Reference), c) Description.

The (PROV-DM) Type declares the type of entity or activity. The Study (Reference) contains the name of the study a Provenance Node belongs to. The Description contains some textual explanation of the Provenance Node. For some entities, we are asking for further information, as seen in Fig 2, which we will elaborate on.

Research Question [RQ]:

The research question (or research objective or problem formulation) determines the goal of the research presented in a publication. For simulation studies, it typically forms the starting point of the modeling and simulation life cycle [1, 67] and is key to interpreting its outputs such as simulation data or a simulation model.

As for the provenance example shown in Fig 3, Lee et al. (2003) questioned in RQ1 the necessity of “the two scaffold proteins, APC and Axin” (for Wnt signaling) and whether “their roles differ” [28]. This research question determines a minimum number of model constituents and guides the modeler towards possible simulation experiments to be executed.

Assumption [A]:

We define assumptions of a simulation model to be all statements that refer to abstractions or specializations of the described model. Assumptions typically deal with the input of a model (e.g., assume the concentration of x to be constant or let the initial value of y be …) and may set the boundaries of the system under consideration or partially explain the thoughts behind a simulation model—always with the research question in mind.

In order to facilitate the analysis of assumptions, we adopted the Systems Biology Ontology (SBO) [68] to categorize the assumptions. SBO provides “structured controlled vocabularies, comprised of commonly used modeling terms and concepts” [69] and is primarily used to “describe the entities used in computational modeling (in the domain of systems biology)” [68]. By using SBO, we are trying to answer which part of the model contains assumptions rather than what was assumed.

In the provenance example, three assumptions with three different categories could be identified. A1, for instance, reads “Dsh, TCF, and GSK3β are degraded very slowly, we assume that their concentrations remain constant throughout the timecourse of a Wnt signaling event” [28] and was matched to ID 362 (Concentration conservation law) of SBO.

Requirement [R]:

Requirements define properties that the results of a simulation model need to show. These may be used for the purpose of calibrating or validating a simulation model. They also direct the modeler towards adaptation of a model if the requirements are not met. We do not consider other kinds of requirements (e.g., the need to use specific tools or approaches in performing a simulation study).

Typically, simulation data needs to be compared with real-world data, in our case wet-lab data. These real-world measurements determine the species of interest, which should be part of the Requirement entity. Therefore, we record the main species considered by the requirement as well as its type (either qualitative or quantitative) and connect the Requirement to the wet-lab data it relates to. The list of main species will make it easier to compare, interrelate, and reuse simulation models, as these species determine the focus of the model.
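Recording main species makes such comparisons computable. As a hedged illustration, the Python sketch below (hypothetical data layout and function names) measures the overlap of the main species recorded in two studies' Requirement entities with the Jaccard index. Only the species of R1 from Lee et al. (2003) are taken from this section; the second study is invented for the example.

```python
def main_species(requirements):
    """Union of the main species recorded in a list of Requirement entities."""
    species = set()
    for requirement in requirements:
        species.update(requirement["main_species"])
    return species


def species_overlap(reqs_a, reqs_b):
    """Jaccard index of the main species covered by two studies' requirements."""
    a, b = main_species(reqs_a), main_species(reqs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


lee_2003 = [{"id": "R1", "type": "quantitative",
             "main_species": ["Axin", "β-catenin", "GSK3β"]}]
other = [{"id": "R1", "type": "qualitative",  # hypothetical second study
          "main_species": ["β-catenin", "TCF"]}]
print(species_overlap(lee_2003, other))  # 0.25
```

Here only β-catenin is shared among four distinct species, giving 1/4; comparing requirements pairwise across all studies in the same way would give a simple overlap matrix for a model family.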

We were able to identify one requirement R1 in the provenance example of Lee et al. (2003). The quantitative requirement that “Axin stimulates the phosphorylation of β-catenin by GSK3β at least 24,000-fold” [28] actually refers to the wet-lab data WD1 obtained in another study by Dajani et al. (2003) [65]. Its main species are Axin, β-catenin, and GSK3β.

Qualitative Model [QM]:

We define the qualitative model to be a network diagram, such as a reaction scheme (chemical reaction network diagram), which contains the entities of the system (e.g., species) and their interactions. This diagram may be presented in a formal (e.g., using the Systems Biology Graphical Notation (SBGN) [70] or a Boolean network diagram [71]) or informal way. All textual descriptions of a simulation model that do not include quantitative information (e.g., reaction rate constants, initial values) can also be part of the qualitative model. It should be noted that the qualitative model is sometimes also called the conceptual model [72], whereas in other publications, the qualitative model forms part of the conceptual model [3].

We record a reference to the qualitative model, which, for example, could be a reference to a figure in the publication or, ideally, a DOI. Furthermore, we denote a list of species and compartments used in the model. Multiple compartments require a shuttling of species from one compartment to another, and every compartment should, ideally, have an area (for the transfer rate, or for concentrations in a two-dimensional compartment) as well as a volume (for a three-dimensional compartment) [73].
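Why area and volume are both needed can be seen from a minimal two-compartment shuttling model. In the Python sketch below (hypothetical parameter values; passive transport with a permeability k is an assumption, not a model from the surveyed studies), the flux across the shared interface scales with the area and is an amount per unit time, while each compartment's volume converts that flux into a concentration change. With these choices the total amount c_cyt * v_cyt + c_nuc * v_nuc is conserved and both concentrations approach the same equilibrium.

```python
def simulate_shuttling(c_cyt, c_nuc, v_cyt, v_nuc, area, k, dt=0.01, steps=1000):
    """Explicit Euler integration of passive shuttling between two compartments.

    The flux through the shared interface, k * area * (c_cyt - c_nuc), is an
    amount per unit time; dividing it by each compartment's volume turns it
    into a concentration change, so the total amount is conserved.
    """
    for _ in range(steps):
        flux = k * area * (c_cyt - c_nuc)
        c_cyt -= dt * flux / v_cyt
        c_nuc += dt * flux / v_nuc
    return c_cyt, c_nuc


# Hypothetical values: cytosol four times larger than the nucleus.
c_cyt, c_nuc = simulate_shuttling(c_cyt=1.0, c_nuc=0.0, v_cyt=4.0, v_nuc=1.0,
                                  area=1.0, k=1.0)
print(round(c_cyt, 3), round(c_nuc, 3))  # 0.8 0.8
print(round(c_cyt * 4.0 + c_nuc * 1.0, 9))  # 4.0 (amount is conserved)
```

If the volumes were omitted and the same concentration change applied to both compartments, the amount would no longer be conserved whenever the volumes differ, which is exactly the information a compartment annotation has to supply.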

In our provenance example of Lee et al. (2003), QM1 contains a qualitative model in the form of a chemical reaction network diagram which can be directly accessed via a DOI. It represents reactions of the species Wnt, Dsh, GSK3, Axin, APC, β-catenin, and TCF within a cell extract.

Simulation Model [SM]:

The simulation model is the actual mathematical or computational model [74] that can be executed by a suitable tool. In most cases in our domain, the simulation model contains equations (for ODE/PDE systems) or, in some cases, reaction rules (for rule-based systems). Integral parts of these quantitative simulation models are the parameter values and the initial condition. The simulation model could also be described in another form (e.g., in a quantitative process algebra [75, 76] or with a combination of multiple formalisms [77]). Formal approaches that describe a system through qualitative models (e.g., Boolean models [78] or Petri nets [79]) come with their own means of analysis and are assigned to the Simulation Model entity, as they are executable models. Usually, a parameter table complements the description of the simulation model and gives information about the parameter values and their origin.

A new Simulation Model entity is created whenever the reaction network changes or after a simulation model has been calibrated, which typically means that the set of parameter values and the initial condition have been (re-)defined. A validation activity (by itself) does not alter the simulation model, although a failed validation activity is likely to induce a change of the simulation model (see, for instance, [56]).
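The rule for when a new Simulation Model entity appears in the provenance graph can be written as a small decision function. This is a sketch under our own naming (`Activity`, `creates_new_model_entity`). It also reflects that a Building Simulation Model activity always yields a Simulation Model, as defined further below.

```python
from enum import Enum, auto


class Activity(Enum):
    BUILDING = auto()
    CALIBRATING = auto()
    VALIDATING = auto()
    ANALYZING = auto()


def creates_new_model_entity(activity: Activity,
                             network_changed: bool = False,
                             calibration_succeeded: bool = False) -> bool:
    """Decide whether a new Simulation Model entity should be recorded."""
    # A building activity always yields a model; so does any change of the network.
    if activity is Activity.BUILDING or network_changed:
        return True
    # A successful calibration (re-)defines parameter values and initial condition.
    if activity is Activity.CALIBRATING:
        return calibration_succeeded
    # Validation or analysis alone leaves the model unchanged; a failed validation
    # may later trigger a separate building activity, which is recorded on its own.
    return False
```

For example, `creates_new_model_entity(Activity.VALIDATING)` returns `False`, while a successful calibration returns `True`.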

Again, we rely on a reference to access the simulation model. This should be a link to the simulation model in BioModels or a DOI pointing to the description of the simulation model. Ideally, the model is presented in a structured and widely accepted format such as SBML [80] or CellML [81].

As for the provenance example, the calibrated simulation model of Lee et al. (2003), SM2, can be found in BioModels.

Simulation Experiment [SE]:

The simulation experiment is an execution of the simulation model. Ideally, it can be linked to a complete experiment specification (e.g., as a SED-ML [82] or SESSL [83] file or simply as the execution code in a general purpose programming language) and to documentation in a standard format that applies reporting guidelines such as MIASE for simulation experiments [8]. Different simulation experiments might be used for the analysis, calibration, and validation of a simulation model.

To further structure the set of applied simulation experiments, we distinguish simulation experiments by whether they are used for optimization, sensitivity analysis, perturbation, parameter scan, steady-state analysis, or time course analysis. This list is neither complete nor are the categories disjoint, and, given a different set of simulation studies, they will likely be subject to renaming, extension, and refinement.

We define optimization experiments as all experiments in which an implicit or explicit objective function is used. This includes parameter estimation as well as manual parameter fitting experiments. If these succeed, the Calibrating Simulation Model activity will produce a new (calibrated) Simulation Model. In a sensitivity analysis, more than one parameter value is changed and some kind of sensitivity coefficient is calculated. We interpret experiments in which the value of one (or more) parameters is changed to a single other value, for example, to mimic a knock-out experiment, as perturbations. Parameter scans vary at least one parameter value within a given range. A steady-state analysis aims at identifying the steady state of a system. We use the term time course analysis for the analysis of trajectories without varying parameter values.
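The category definitions above can be turned into a rough rule-based tagger. This is a sketch only: the `ExperimentDescription` fields are a hypothetical summary of an experiment that a curator would fill in, and the rules are our literal reading of the definitions. As stated in the text, the categories are not disjoint, so the function returns a set.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ExperimentDescription:
    """Hypothetical curator-provided summary of one simulation experiment."""
    uses_objective_function: bool = False
    computes_sensitivity: bool = False
    targets_steady_state: bool = False
    # parameter name -> number of alternative values tried (0 = kept at nominal value)
    varied: Dict[str, int] = field(default_factory=dict)


def categorize(exp: ExperimentDescription) -> Set[str]:
    """Return every matching experiment category (categories may overlap)."""
    cats: Set[str] = set()
    changed = {p: n for p, n in exp.varied.items() if n > 0}
    if exp.uses_objective_function:
        cats.add("optimization")
    if exp.computes_sensitivity and sum(changed.values()) > 1:
        cats.add("sensitivity analysis")   # >1 value changed plus a coefficient
    if any(n == 1 for n in changed.values()):
        cats.add("perturbation")           # parameter set to one other value
    if any(n > 1 for n in changed.values()):
        cats.add("parameter scan")         # parameter varied over a range
    if exp.targets_steady_state:
        cats.add("steady-state analysis")
    if not changed and not exp.uses_objective_function and not exp.targets_steady_state:
        cats.add("time course analysis")   # trajectories, no parameter variation
    return cats
```

A knock-out mimicked by setting one (hypothetical) degradation rate to a single new value is tagged as a perturbation, while varying it over ten values is tagged as a parameter scan.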

Eventually, an ontology about the various experiment types and analysis methods and their use in simulation studies will be crucial as simulation experiments play a central role in the provenance of simulation models. This would also help to exploit the provenance information effectively, for example, for automatically generating simulation experiments [84].

In the case of Lee et al. (2003), several simulation experiments have been executed. For example, SE1 contains a parameter scan performed to validate the simulation model. However, the paper gives no further details; therefore, no reference could be included in the entity (the reference is “not available”).

Note that we have not included a Wet-lab Experiment entity. Our focus is on the result of the Wet-lab Experiment (i.e., the Wet-lab Data) and its role within the simulation study (e.g., being used in a Building Simulation Model, Calibrating Simulation Model, or Validating Simulation Model activity).

Simulation Data [SD] and Wet-lab Data [WD]:

Data is the result of an experiment. In our case, it can stem either from wet-lab or from simulation experiments. It includes a reference to a plot or table or, ideally, to a database containing the raw data. Because simulation data is generated by a simulation experiment, a link between the two needs to be established. In the case of a simulation experiment that serves the role of validation, information about the success of the validation facilitates the interpretation of the simulation model and of further activities based on it. As no independent Wet-lab Experiment entity is supported, details about the wet-lab experiment can be summarized in the description of the Wet-lab Data or provided by referencing, for example, a research protocol on Protocols.io [85]. The type of the wet-lab experiment (in vitro or in vivo) as well as the organism and organ/tissue/cell line used should be recorded.

In the provenance example, Lee et al. (2003) executed in vitro wet-lab experiments with Xenopus egg extract and showed in WD1 that the “turnover of GSK3β, Dsh, and TCF is relatively slow” [28]. The data from this wet-lab experiment is not shown in the publication. The simulation data SD2 contains the results of the successful validation of the simulation model SM2 and is presented in Figure 2 of their publication. Fig 4 shows how WebProv visualizes the provenance graph and the metadata of SD2.

Fig 4. Screenshot of WebProv.


This screenshot shows the provenance graph of the study by Lee et al. (2003) [28] with additional entities from three other studies [64–66], which are automatically colored differently. The node SD2 has been clicked, which opens a box on the right with the stored and editable metadata.

Provenance activities and relations

The provenance graph is formed by explicitly relating entities and activities. This is done by declaring which entities are being used or which entities are being generated by which activities. We currently distinguish four activities: building, calibrating, validating, and analyzing the simulation model.

Products of activities (i.e., entities) are connected to these activities via the relation wasGeneratedBy. For example, Simulation Experiments or Simulation Data may be the result of all but the Building Simulation Model activity. Activities are connected to entities via the relation used. For example, the Calibrating Simulation Model activity may use the Simulation Model as the object to be calibrated, some Simulation Data or Wet-lab Data for calibration, and Requirements to confirm the calibration results. All connections that we currently distinguish are shown in Table 2.
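To make the two relations concrete, here is a minimal sketch that stores `used` and `wasGeneratedBy` edges as plain tuples and answers the question "what contributed to this entity?" by walking the edges backwards. The node IDs are a simplified subset of the Lee et al. (2003) example, restricted to relations named in this section; the list-based storage is our own choice and not how WebProv stores its graph.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (activity, entity): the activity used the entity
USED: List[Tuple[str, str]] = [
    ("BSM1", "QM1"), ("BSM1", "R1"),
    ("CSM1", "SM1"),
    ("VSM1", "SM2"), ("VSM1", "WD6"), ("VSM1", "WD7"),
]
# (entity, activity): the entity wasGeneratedBy the activity
WAS_GENERATED_BY: List[Tuple[str, str]] = [
    ("SM1", "BSM1"),
    ("SE1", "CSM1"), ("SD1", "CSM1"), ("SM2", "CSM1"),
    ("SD2", "VSM1"),
]


def ancestors(node: str) -> Set[str]:
    """All entities and activities that transitively contributed to `node`."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for activity, entity in USED:
        parents[activity].add(entity)    # an activity depends on what it used
    for entity, activity in WAS_GENERATED_BY:
        parents[entity].add(activity)    # an entity depends on what generated it
    seen: Set[str] = set()
    stack = [node]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen
```

Asking for the ancestors of SD2 returns the validation activity VSM1, the model SM2 with its calibration and building history, and the wet-lab data WD6 and WD7.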

It should be noted that provenance activities in simulation studies can be defined at various levels of granularity. We have opted for a rather coarse-grained approach that identifies only the crucial activities of a simulation study without explicitly denoting how an activity has used a specific entity. Thus, we aggregate activities as much as possible and leave out intermediate steps, focusing on the entities rather than on the activities. Once provenance information is recorded automatically during the course of a simulation study, a higher level of detail could be achieved, and an abstraction-based filter could be applied to zoom out to our level of granularity [2].

Building Simulation Model [BSM]:

The Building Simulation Model activity, also called model derivation [86], can use all entity types, as any entity described above can contribute to the model building process, but it needs to have at least one link to a Research Question or a Simulation Model. The only result of the Building Simulation Model activity is a Simulation Model entity. Not every update of a simulation model within a simulation study is reflected in the provenance graph; only those changes to the model that the authors consider essential are.

In our provenance graph of the study by Lee et al. (2003), two Building Simulation Model activities are shown. BSM1 uses wet-lab data, the research question, the qualitative model, a requirement, and assumptions to develop a “provisional reference state model” [28], which forms the not-yet-calibrated simulation model of the study. The Building Simulation Model activity BSM2 extends the simulation model SM2.

Calibrating Simulation Model [CSM]:

The calibration of a simulation model is used to determine parameter values. Sometimes, switching parts of a model on or off (e.g., individual rules or model components) or choosing between entire models can also be interpreted as a discrete parameter value to be determined using methods of model selection [87]. This activity uses a Simulation Model and typically needs reference data (Wet-lab Data or Simulation Data) for the parameter estimation procedure and produces a specification or documentation of a Simulation Experiment as well as a Simulation Data entity. If the calibration is successful, the result of this activity will always be a (calibrated) Simulation Model. Ideally, it also takes an explicit requirement into account, which, in some cases, if formally defined, can also be used for calibrating the simulation model [88, 89]. It may also use an Assumption.

In the case of the activity CSM1 from Lee et al. (2003), several wet-lab data sets from their own experiments (WD1–WD5) as well as from Salic et al. (2000) [66] (WD1–WD3) are used to calibrate the model SM1. This produces a Simulation Experiment SE1, the corresponding Simulation Data SD1, and the calibrated Simulation Model SM2.

Validating Simulation Model [VSM]:

The validation of a simulation model tests its validity (with regard to some requirements). Unlike in calibration activities, the result is typically a binary answer, yes or no, which may be determined based on a specific distance measure and error threshold. The activity uses a Simulation Model and traditionally relies on reference data (Wet-lab Data or Simulation Data) that has not been used for calibration. For example, Simulation Data from other studies may be used for the intercomparison of simulation models when performing the same simulation experiment. Additionally, the required behavior can be formally specified in a Requirement (e.g., in a temporal logic) and checked automatically [90, 91]. The Validating Simulation Model activity may also use an Assumption. It produces at least one entity of type Simulation Experiment as well as the corresponding Simulation Data entity. The Simulation Data entity of a validation experiment records whether the validation has been successful.
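The binary verdict "based on a specific distance measure and error threshold" can be illustrated with a short function. The choice of a range-normalized root-mean-square error and the threshold value are our assumptions for the sketch; the text does not prescribe a particular measure, and the analyzed studies often state none.

```python
import math
from typing import Sequence


def validate(simulated: Sequence[float], measured: Sequence[float],
             threshold: float) -> bool:
    """Binary validation verdict: range-normalized RMSE at or below `threshold`.

    Assumes `simulated` and `measured` are sampled at the same time points.
    """
    if len(simulated) != len(measured) or not measured:
        raise ValueError("need paired, non-empty series")
    rmse = math.sqrt(sum((s - m) ** 2 for s, m in zip(simulated, measured))
                     / len(measured))
    scale = (max(measured) - min(measured)) or 1.0  # avoid division by zero
    return rmse / scale <= threshold
```

The resulting boolean is the kind of information that, in the scheme above, would be stored with the Simulation Data entity of the validation experiment.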

In our provenance example of Lee et al. (2003), the simulation model SM2 is validated in the activity VSM1 by comparing it with their own in vitro measurements shown in WD6 and WD7. Neither a distance measure nor an explicit requirement is mentioned.

Analyzing Simulation Model [ASM]:

Similar to validation and calibration experiments, this activity provides insights into the simulation model and thus also into the system under study. The activity uses a Simulation Model and creates a Simulation Experiment as well as resulting Simulation Data. The use of Assumptions, Simulation Data, or Wet-lab Data is optional and might give an indication about the purpose of an analysis. The Analyzing Simulation Model activity aggregates all simulation experiments that are not explicitly aimed at calibration and validation.
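The four activity definitions imply constraints on which entity types an activity may use and must generate. The sketch below paraphrases them into a checker. It is a strict reading of the prose in this section, not a copy of Table 2, and the type labels and dictionary layout are our own.

```python
from typing import Dict, List, Set

RQ, A, R, QM = "ResearchQuestion", "Assumption", "Requirement", "QualitativeModel"
SM, SE, SD, WD = "SimulationModel", "SimulationExperiment", "SimulationData", "WetLabData"
ALL_TYPES = {RQ, A, R, QM, SM, SE, SD, WD}

# Strict reading of the activity definitions (illustrative only).
RULES: Dict[str, Dict[str, Set[str]]] = {
    "BSM": {"may_use": ALL_TYPES, "use_one_of": {RQ, SM},
            "may_generate": {SM}, "must_generate": {SM}},
    "CSM": {"may_use": {SM, SD, WD, R, A}, "use_one_of": {SM},
            # the calibrated model is only generated if calibration succeeds
            "may_generate": {SE, SD, SM}, "must_generate": {SE, SD}},
    "VSM": {"may_use": {SM, SD, WD, R, A}, "use_one_of": {SM},
            "may_generate": {SE, SD}, "must_generate": {SE, SD}},
    "ASM": {"may_use": {SM, SD, WD, A}, "use_one_of": {SM},
            "may_generate": {SE, SD}, "must_generate": {SE, SD}},
}


def check_activity(kind: str, used: Set[str], generated: Set[str]) -> List[str]:
    """Return human-readable violations; an empty list means no rule is broken."""
    rule = RULES[kind]
    problems: List[str] = []
    if not used & rule["use_one_of"]:
        problems.append(f"{kind} must use one of {sorted(rule['use_one_of'])}")
    if used - rule["may_use"]:
        problems.append(f"{kind} may not use {sorted(used - rule['may_use'])}")
    if generated - rule["may_generate"]:
        problems.append(f"{kind} may not generate {sorted(generated - rule['may_generate'])}")
    if rule["must_generate"] - generated:
        problems.append(f"{kind} must generate {sorted(rule['must_generate'] - generated)}")
    return problems
```

For instance, a validation activity that claims to generate a new Simulation Model is reported, reflecting that validation by itself does not alter the model.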

Lee et al. (2003) analyze both the calibrated simulation model SM2 and the extended simulation model SM3 by applying parameter scans, perturbations, and sensitivity analyses, resulting in the Simulation Experiments SE3–SE11 and the Simulation Data SD3–SD11.

Extensibility and applicability of the approach

All of these entities, activities, and relations show major steps in the development of a simulation model and, as we will see in the following section, help to interrelate different simulation studies. Still, PROV-DM would allow even more detail. We have not yet considered the PROV-DM type agent in our approach because this information appeared less relevant in the analyzed simulation studies. In the future, the provenance information could include the name of the agent an entity is attributed to, the agent an activity is associated with, or the agent on whose behalf another agent acted [19]. This would be of particular relevance if models are not only validated but also accredited, which typically involves a different group of people than those who developed the simulation model [92].

We have also not included direct connections between two activities or between two entities, such as a model being derived from another model. Thus, we have not included the following relations: a) wasInformedBy, which relates an activity to another one, and b) wasDerivedFrom, which describes a direct transformation (update) of an entity into a new one. However, these relations can partly be inferred from the existing relations. For example, if a simulation model has been generated by a Building Simulation Model activity that used another simulation model, the former has been derived from the latter. Likewise, a failed validation activity that is followed by a Building Simulation Model activity clearly holds some information for the latter.
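The inference described above can be sketched directly on `used` and `wasGeneratedBy` edges. `wasInformedBy` follows the PROV-DM reading that one activity used an entity generated by another. For `wasDerivedFrom`, we apply the rule to any activity that both uses and generates a Simulation Model; this covers the Building example in the text and, by our assumption, calibration as well. The example edges follow the Lee et al. (2003) model lineage (SM1 to SM2 by calibration, SM2 to SM3 by extension).

```python
from typing import Dict, List, Set, Tuple


def infer_relations(used: List[Tuple[str, str]],
                    generated_by: List[Tuple[str, str]],
                    entity_type: Dict[str, str]
                    ) -> Tuple[Set[Tuple[str, str]], Set[Tuple[str, str]]]:
    """Infer (activity, informant) and (new_model, old_model) pairs."""
    generator = {e: a for e, a in generated_by}  # each entity is generated once
    informed: Set[Tuple[str, str]] = set()
    derived: Set[Tuple[str, str]] = set()
    for activity, entity in used:
        src = generator.get(entity)
        if src is not None and src != activity:
            informed.add((activity, src))  # activity used an output of src
    for new, activity in generated_by:
        if entity_type.get(new) != "SimulationModel":
            continue
        for act, old in used:
            if act == activity and entity_type.get(old) == "SimulationModel":
                derived.add((new, old))  # model produced from a used model
    return informed, derived


used = [("CSM1", "SM1"), ("BSM2", "SM2")]
generated_by = [("SM1", "BSM1"), ("SM2", "CSM1"), ("SE1", "CSM1"), ("SM3", "BSM2")]
types = {"SM1": "SimulationModel", "SM2": "SimulationModel",
         "SM3": "SimulationModel", "SE1": "SimulationExperiment"}
informed, derived = infer_relations(used, generated_by, types)
```

Here `derived` contains `("SM2", "SM1")` and `("SM3", "SM2")`, and `informed` records that BSM2 was informed by CSM1, which in turn was informed by BSM1.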

In our experience, it is best to capture provenance information manually or (semi-)automatically during the modeling (and simulation) process. This could be done, for example, within an artifact-based workflow system [2]. However, this would rely on a fixed life cycle definition (i.e., constraints regarding the allowed activities). Other approaches automatically analyze scripts, either with [93] or without [94] the help of user annotations. The latter has potential for a fully automatic and transparent provenance capture; however, it is difficult to implement, highly system-dependent, and only accounts for provenance information contained explicitly in the scripts, thus leaving out important information such as the research question, assumptions, or qualitative models. WebProv, in contrast, is a standalone, system-independent tool. Although it requires user input or a valid JSON input file, it provides great flexibility regarding the information to be captured.

Exploring the provenance of and among the Wnt simulation models of 19 simulation studies

Based on the entities and activities that were identified and defined above, we have recorded the provenance information from the 19 studies shown in bold in Table 1 as well as the provenance of entities from other publications that were used by the 19 studies. The references to the additionally used studies are found in S1 Appendix. The complete provenance information is presented in S1 Data. Screenshots and presentations of the provenance information can also be found on GitHub. We will now discuss the observations we have made during the process of capturing the provenance information and later show how the studies and simulation models relate to each other.

Provenance of individual Wnt simulation models

It is important to remember that we have manually collected all provenance information (entities, activities, and relations), as described in the Materials and methods section. Collecting this information based on publications only is a demanding task and requires some interpretation, as natural language descriptions tend to be ambiguous. Also, the nonlinear structure of the text (it is, after all, not a lab protocol) makes it hard to identify the relations between the entities and activities as well as the order of their execution. This would likely hamper an effective use of text mining or machine learning methods to complement or replace the manual work. Therefore, provenance information should be collected during the simulation study, ideally without intervention by the modeler.

The Research Question was usually repeated within the abstract and throughout the introduction and discussion sections. Sometimes, there was more than one research question to be answered. In this case, we have combined these into a single entity.

Many Assumptions were introduced by the word “assume” or its derivatives. Other expressions such as “hypothesis”, “is believed”, “consider”, “approximate”, “simplify”, “suggest”, “suppose”, or “propose” were also used by the authors to mark an assumption. However, not every sentence containing one of these words was identified to be an assumption of the simulation model. Occasionally, there were also assumptions which did not use one of the key words from above. Furthermore, two out of 19 studies did not explicitly state assumptions ([29, 53]). Generally, identifying assumptions involves many uncertainties. On the one hand, the authors might not have stated all assumptions made during the derivation of the simulation model. On the other hand, we could have easily missed an assumption or interpreted statements as assumptions that were not meant as such by the authors. Consequently, the assumptions might look different if the authors had defined them themselves.
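The cue words listed above could support, but not replace, manual curation. The sketch below flags candidate sentences using word stems derived from that list. The naive sentence splitter and the stems are our assumptions. As the text notes, matches are only candidates: some flagged sentences are not assumptions, and some assumptions use none of these words.

```python
import re
from typing import List

# Stems derived from the cue words listed above; every match needs manual review.
CUE = re.compile(
    r"\b(assum\w*|hypothes\w*|believed|consider\w*|approximat\w*|"
    r"simplif\w*|suggest\w*|suppos\w*|propos\w*)\b",
    re.IGNORECASE,
)


def candidate_assumptions(text: str) -> List[str]:
    """Split text into sentences (naively) and keep those containing a cue word."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if CUE.search(s)]


example = ("We assume that Axin is degraded quickly. "
           "The model contains seven species. "
           "Diffusion is approximated by a first-order transfer.")
```

On this invented example, the first and third sentences are flagged for review.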

In order to further analyze the assumptions, we categorized them using the Systems Biology Ontology (SBO) [68]. However, the assumptions could not always be unambiguously matched to SBO terms, and some assumptions dealing with biological mechanisms are not covered by SBO. For example, the autocrine signaling assumed by Mazemondet et al. (2012) [46] cannot be expressed in SBO. Some assumptions also include more than one detail, which is reflected by multiple SBO categories per assumption. In this case, the assumption entity is duplicated so that every assumption entity receives its own category. The categorization of the 106 collected assumptions shows that the three most frequent categories deal with kinetic constants (13 times), transport (9 times), and equivalence (8 times). The result of the categorization can be found in S1 Table.
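The duplication rule (one category per assumption entity) and the subsequent tally can be sketched as follows. The ID scheme (`A1.1`, `A1.2`, …) and the example assumptions are hypothetical; the category labels reuse names mentioned in the text, but no SBO term identifiers are implied.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Assumption:
    id: str
    text: str
    category: str  # exactly one SBO-based category label per entity


def split_by_category(aid: str, text: str, categories: List[str]) -> List[Assumption]:
    """Duplicate an assumption so that every copy carries exactly one category."""
    return [Assumption(f"{aid}.{i}", text, c) for i, c in enumerate(categories, 1)]


def tally(assumptions: List[Assumption]) -> Counter:
    """Count how often each category occurs across all assumption entities."""
    return Counter(a.category for a in assumptions)


# Hypothetical assumptions, for illustration only
entities = (split_by_category("A1", "hypothetical text", ["kinetic constant", "transport"])
            + split_by_category("A2", "hypothetical text", ["equivalence"])
            + split_by_category("A3", "hypothetical text", ["kinetic constant"]))
```

With these inputs, three assumptions become four entities, and "kinetic constant" is counted twice.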

In many cases, Requirements were not given explicitly in a formal way or even as textual descriptions. We could only identify Requirements in the publications of Lee et al. (2003) [28], Wang et al. (2013) [48], Chen et al. (2014) [49], and Haack et al. (2020) [56]. The lack of Requirements was especially obvious when calibration or validation experiments were carried out without explicitly explaining the objective function.

In the surveyed publications, it was common to include a reaction scheme of the simulation model, which we referred to in the Qualitative Model entities. When recording all species, we disregarded di- or multimeric compounds established by monomers already mentioned. We also ignored different states of the species (e.g., phosphorylation states). In all provenance graphs but the one by Mirams et al. (2010) [41], at least one Qualitative Model was used by a Building Simulation Model activity to produce the first Simulation Model. Mirams et al. (2010) have directly worked with the simulation model SM2 from Lee et al. (2003).

The Simulation Models were either part of the manuscript or, more often, part of the supporting material. In 13 out of 19 cases, the model was a system of ordinary differential equations. Two simulation studies used PDEs and four used a rule-based approach (see Table 1). Although the Wnt signaling pathway has also been used to illustrate features of rule-based modeling [95, 96], only a few simulation models have been developed based on a rule-based approach. This might partly be because support for thorough experimentation with rule-based models, including calibration and validation, has only become available during the last decade [97–99].

We categorized all 145 Simulation Experiments that we found according to the experiment type they served (see the UML class diagram shown in Fig 2). The results of the categorization are shown in Fig 5 and the details in S2 Table. Most Simulation Experiments were parameter scans, followed by time course analyses and perturbations. None of the 19 simulation studies used steady-state analysis alone; it was always applied together with another type of experiment, which we then recorded because it was more specific. The detection of steady states is typically part of an optimization, parameter scan, perturbation, or sensitivity analysis, because steady-state values are often the starting and end points of each simulation and are used for calculations.
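The recording rule described above (report the more specific type whenever steady-state analysis co-occurs with another type) and the tally behind a chart like Fig 5 can be sketched as follows. The input sets are hypothetical; the real categorization is given in S2 Table.

```python
from collections import Counter
from typing import Iterable, Set

STEADY = "steady-state analysis"


def recorded_categories(categories: Set[str]) -> Set[str]:
    """Drop steady-state analysis when a more specific category also applies."""
    if STEADY in categories and len(categories) > 1:
        return categories - {STEADY}
    return set(categories)


def tally(experiments: Iterable[Set[str]]) -> Counter:
    """Count recorded categories over all simulation experiments."""
    counts: Counter = Counter()
    for cats in experiments:
        counts.update(recorded_categories(cats))
    return counts


# Hypothetical input, for illustration only
counts = tally([{"parameter scan", STEADY}, {"time course analysis"}, {"parameter scan"}])
```

In this invented example, steady-state analysis is never recorded because it always co-occurs with a parameter scan, mirroring the situation reported for the 19 studies.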

Fig 5. Categories of the simulation experiments conducted within the analyzed 19 Wnt signaling studies.


All of these simulation experiments have been categorized. Most simulation experiments were parameter scans.

Sometimes, simulation or wet-lab experiments have been conducted, but the corresponding Simulation Data or Wet-lab Data is not shown. Instead, the experiments are briefly described and thus appear without references in the provenance graphs. Authors often indicate this by stating “data not shown”. For example, Lee et al. (2003) state that unpublished measurements showed that the “turnover of GSK3β, Dsh, and TCF is relatively slow” [28]. Usually, both Simulation Data and Wet-lab Data are shown in figures in the studies or published in tables or figures as part of the supplemental material. In recent years, more and more journals, such as PLOS Computational Biology, have been recommending (but not requiring) adherence to checklists of the FAIRsharing [100] portal when reporting data, with FAIR standing for findable, accessible, interoperable, and reusable [101].

In the case of Simulation Data, the focus lies on FAIR simulation models and experiments, as it should be possible to easily regenerate the data. This could be achieved, for example, by publishing a COMBINE archive [61], which is a “single file that aggregates all data files and information necessary to reproduce a simulation study in computational biology” [102]. However, the publications we have analyzed date back as far as 17 years, so most data has not been published in a FAIR way.

When looking at Wet-lab Data, six out of 19 publications recorded their own wet-lab data [28, 31, 32, 45, 51, 57]. All other simulation studies either used wet-lab data from other publications or did not explicitly refer to wet-lab data at all [29, 30, 33, 34, 39, 41, 53]. The latter was only possible because the authors relied on other Simulation Models and their respective parameter and initial values. Interestingly, the wet-lab data obtained by the 19 Wnt signaling studies, and by the other studies used in the 19 studies, stemmed from different organisms and cell lines. Besides human and murine cell lines (11 studies each), Xenopus, rat, hamster, and kangaroo rat were used as model organisms. The experiments included, among others, (tumor) cell lines from the kidney (BHK, HEK293, PtK2), bone (MG-63), cervix (HeLa), brain (neural progenitor cells), and fibroblasts (L cells, NIH/3T3). All cell lines directly used in the studies are represented by the colored rectangles in Fig 6. The cell lines and organisms that were included in the simulation studies are shown in Table 3.

Fig 6. Provenance graph of all Wnt/β-catenin simulation studies considered here (black outlines) as well as additionally used studies (gray outlines).


The colors indicate the cell lines used in wet-lab experiments of that study (see legend below graph). Gray boxes represent pure Wnt/β-catenin simulation studies without acquiring wet-lab data. White boxes display publications used by some of the Wnt/β-catenin simulation studies that are either text books or simulation studies without published wet-lab data. The figure was created using the R package DiagrammeR [103]. The references to the additionally used studies are found in S1 Appendix.

Table 3. Cell lines and organisms used building the simulation models.
Reference | Egg extract | Embryo | Fibroblasts L cells | Fibroblasts NIH/3T3 cells | Mammary gland (C57MG) cells | Osteoblast-like cells (MG-63) | Presomitic mesoderm (PSM) cells | Skin | Cervical cancer epithelial (HeLa S3) cells | Colon carcinoma RKO cells | Embryonic kidney epithelial (HEK 293) cells | Neural progenitor (ReNcell VM) cells | Baby hamster kidney (BHK) cells | Epithelial kidney cells (PtK2) | Pheochromocytoma cells | Not applicable
[28] x
[29] x
[30] x
[31] x
[32] o x
[33] x x
[34] x
[35] x x x
[36] x x
[39] o
[41] x
[45] x x
[46] x x
[48] o x
[49] x x x
[51] x x x x x x x
[53] o x x
[56] o x x x o x o o x
[57] x
Cell line/organ: Xenopus | Mouse | Human | Hamster | Kangaroo rat | Rat

The top row shows the origin of the cells used by the 19 studies. Not applicable means that the parameter values were obtained from theoretical calculations or considerations, from textbooks, or without a concrete reference to a wet-lab study. The colors denote the organisms of the cell line/organ used (see bottom row). An “x” denotes direct use; an “o” denotes indirect use through (parts of) a referenced simulation model.

When looking at the CSM and VSM activities, we find that only a few simulation models were both calibrated and validated based on wet-lab data, namely those of Lee et al. (2003) [28], Kogan et al. (2012) [45], and Haack et al. (2015) [51]. Some authors only calibrated their simulation models ([33, 46, 48, 49, 53, 57]), and in two studies ([30, 56]) the models were only validated (having assumed plausible ranges for parameter values). The simulation models developed in [29, 32, 34–36, 39, 41] were neither calibrated nor validated explicitly, because they used parameter values from other studies. Sick et al. (2006) [31] used arbitrary parameter values in their simulation models.

Some studies develop a story line where a simulation model has been successively refined, extended, or composed ([28, 33, 34, 36, 41, 46, 48, 49, 51, 56]). These studies are characterized by a Simulation Model being used by a Building Simulation Model activity, both nodes of the same study.

Some simulation studies resulted in the development of multiple simulation models that are neither extensions nor compositions but rather form a revision or alternative to other simulation models developed in the same study: [33, 39, 41, 46, 48, 49]. This can be seen in the provenance graph of a single study when the last simulation model is not connected by a directed path to other simulation models of that study or when the simulation models are part of disjoint branches of the provenance graph. For example, in Mazemondet et al. (2012) [46], the core model of the Wnt/β-catenin signaling pathway has been calibrated with wet-lab data from Lee et al. (2003) [28], but this calibrated simulation model was not used and instead, a new calibration with wet-lab data from another study took place. Two simulation studies ([35, 41]) show disconnected graphs. This shows that these researchers considered, built, and analyzed multiple simulation models independently of each other.
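Both observations in the two paragraphs above can be checked mechanically on a single study's provenance graph: a story line shows up as a Building activity that used a Simulation Model of the same study, and independently developed models show up as separate connected components. The sketch assumes that node IDs carry their type as a prefix (`BSM…`, `SM…`), which is a convention of this example only, and that all nodes belong to one study. The small graph is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def components(nodes: Set[str], edges: List[Tuple[str, str]]) -> List[Set[str]]:
    """Weakly connected components of one study's provenance graph."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:  # edge direction is irrelevant for connectivity
        adj[a].add(b)
        adj[b].add(a)
    seen: Set[str] = set()
    comps: List[Set[str]] = []
    for start in sorted(nodes):
        if start in seen:
            continue
        comp: Set[str] = set()
        stack = [start]
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def has_story_line(used: List[Tuple[str, str]]) -> bool:
    """True if a Building activity used a Simulation Model (same study assumed)."""
    return any(act.startswith("BSM") and ent.startswith("SM") for act, ent in used)


# Hypothetical study with two independently built models
nodes = {"BSM1", "SM1", "ASM1", "BSM2", "SM2"}
edges = [("SM1", "BSM1"), ("ASM1", "SM1"), ("SM2", "BSM2")]
```

This hypothetical study splits into two components and, since no Building activity uses a model, has no story line.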

Beyond individuals: A family of Wnt simulation models

Whereas before we have looked at the properties of individual simulation studies, we are now going to investigate the interrelations between the 19 Wnt/β-catenin signaling simulation models. We will identify features that transform a set of simulation models into a family.

Fig 6 shows an overview of all simulation studies considered in our research by zooming out of the individual provenance graphs. All but two simulation models are (indirectly) connected to the model proposed by Lee et al. (2003) [28], which was the first validated simulation model of the canonical Wnt signaling pathway. This shows that models are usually not built independently from one another but are often extensions or revisions of previously published models, or reuse parts of their reactions or parameter values. The two exceptions not linked to Lee et al. (2003) [28] are the simulation models by Sick et al. (2006) [31] and by Rodríguez-González et al. (2007) [33]. Sick et al. (2006) use a reaction-diffusion system (Gierer-Meinhardt equations [104]) to model the interplay between Wnt and its antagonist Dkk with just two equations. Rodríguez-González et al. (2007) consider Wnt and Axin, two key players of the Wnt/β-catenin signaling pathway, but only in the context of the Notch signaling pathway; thus, the model contains only a fragment of the canonical Wnt pathway.
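The "all but two" observation amounts to a reachability query on a study-level graph in which two studies are linked when one uses an entity of the other. The sketch below uses only links named in this section (Mirams et al. (2010) reusing SM2 of Lee et al. (2003), and Mazemondet et al. (2012) using wet-lab data from Lee et al. (2003)); the study labels are our shorthand, and the complete link set would come from S1 Data.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def connected_to(root: str, links: Iterable[Tuple[str, str]]) -> Set[str]:
    """Studies linked to `root` through any chain of reuse (direction ignored)."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen, queue = {root}, deque([root])
    while queue:  # breadth-first search
        for nxt in adj[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


studies = {"Lee2003", "Mirams2010", "Mazemondet2012", "Sick2006", "RodriguezGonzalez2007"}
links = [("Mirams2010", "Lee2003"), ("Mazemondet2012", "Lee2003")]
unconnected = studies - connected_to("Lee2003", links)
```

On this partial link set, `unconnected` contains exactly the two exceptions named above, Sick et al. (2006) and Rodríguez-González et al. (2007).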

We have also included the cell lines/tissue used for wet-lab experiments within each study. As already seen in Table 3, we observe that different simulation studies use data or models obtained using other cell lines. This may be valid as the Wnt/β-catenin signaling pathway is evolutionarily conserved [24], which means that data can be shared. Still, care must always be taken when using, for instance, parameter values determined with one cell line in a study that uses another cell line.

When looking at the same graph using a circular layout, we observe four clusters of two or more studies, as shown in S1 Fig. We have also colored the studies according to the additional pathways they include and observe that the clusters separate the studies depending on these additional cellular mechanisms. The central cluster includes the Wnt model by Lee et al. (2003) [28] as well as the studies [30, 34, 35, 41]. A second cluster forms around the simulation studies [45, 46, 51, 56, 57], which either use the same wet-lab data from [105, 106] or include the cell cycle [46] or ROS [51, 57]. A third cluster includes the pathways of Notch [33, 48] and Notch + MAPK/ERK [36]. Even though the layout algorithm places [29] in this cluster, in terms of content it rather belongs to the central cluster. A fourth cluster forms around studies that include MAPK/ERK [32] or MAPK/ERK + PI3K/Akt [53]. All other models are not part of a cluster and are either completely disconnected from the other studies [31] or include E-cadherin and the cell cycle [39] or just E-cadherin [49].

When comparing Fig 6 and S1 Fig, we find that the wet-lab data from just three studies, namely [105–107], have been reused by simulation studies. On the one hand, this is surprising, as wet-lab data can be reused for parameter estimation or model validation. On the other hand, when authors use parts of or entire simulation models published by others, they do not necessarily cite the references that were used to obtain the parameter and initial values that come with the model. Thus, a direct connection from the new simulation model to the wet-lab data used by another simulation study is not made.
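Finding reused wet-lab data is a simple grouping query over study-to-data links. In the minimal sketch below, the study and data labels are hypothetical. Because it only counts explicit links, it reproduces the blind spot described above: if a study reuses a model without re-citing that model's data sources, the reuse of those data remains invisible.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def reused_wetlab_sources(uses: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each wet-lab data source to its using studies; keep sources used by >= 2."""
    users: Dict[str, Set[str]] = defaultdict(set)
    for study, source in uses:  # explicit (study, data source) links only
        users[source].add(study)
    return {src: s for src, s in users.items() if len(s) >= 2}


# Hypothetical links, for illustration only
uses = [("S1", "W1"), ("S2", "W1"), ("S2", "W2"), ("S3", "W3")]
```

Here only `W1` is reported, because it is the only source linked to two different studies.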

Conclusion

Provenance of simulation models provides information about how a simulation model has been generated and about the steps and various sources that contributed to its generation. Here, we have developed a specialization of PROV-DM focusing on entities and activities. It builds on an earlier PROV-DM specialization in which Simulation Model, Simulation Experiment, Simulation Data, and Wet-lab Data have been identified as crucial entities of simulation studies [20]. Additionally, we have taken knowledge of modeling and simulation life cycles [1] into account and identified the Research Question, Assumptions, Requirements, and the Qualitative Model as important ingredients of the provenance of simulation models. We also distinguish between Building Simulation Model, Calibrating Simulation Model, Validating Simulation Model, and Analyzing Simulation Model activities and connect the entities and activities by using the relations wasGeneratedBy and used. In our definitions of the entities and activities, we aimed at the minimal level of detail, or granularity, of the provenance graph needed to understand the course of a simulation study. We also kept the necessary metadata of the entities and activities to a minimum that conveys both the main idea of the simulation study and the content of each entity and activity. For storing, visualizing, and querying the provenance information, we have created the web-based tool WebProv, which allows storing (customized) metadata and references for each entity and activity.

To examine our specialization of PROV-DM, we used the extensive analysis of 19 simulation studies of the canonical Wnt signaling pathway as a case study. We were able to show explicitly that most studies are connected to one or more other Wnt simulation studies by using (parts of) their simulation models, in addition to various data from wet-lab studies. Our results show the outstanding role of the Wnt simulation model by Lee et al. (2003) [28] as the origin of most other models in our survey. Thus, we could reveal a family of Wnt signaling models.
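One way to see how a single model can emerge as the origin of a family is to count, for each model, how many other models build on it directly or indirectly. The reuse edges below are invented for illustration and do not reproduce the 19 Wnt studies. The dictionary `builds_on` and the function `ancestors` are hypothetical names. A model without ancestors or descendants corresponds to a study that is not connected to any other.

```python
# Hypothetical reuse edges (not the 19 Wnt studies):
# each model maps to the earlier models it builds on.
builds_on = {
    "M0": [],
    "M1": ["M0"],
    "M2": ["M0"],
    "M3": ["M1"],
    "M4": ["M2", "M3"],
    "M5": [],
}


def ancestors(model):
    """All models that `model` builds on, directly or indirectly (depth-first)."""
    stack, seen = list(builds_on[model]), set()
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(builds_on[parent])
    return seen


# Count, for every model, how many other models descend from it.
descendant_count = {m: 0 for m in builds_on}
for m in builds_on:
    for a in ancestors(m):
        descendant_count[a] += 1

origin = max(descendant_count, key=descendant_count.get)
isolated = [m for m in builds_on if not ancestors(m) and descendant_count[m] == 0]
print("origin:", origin, "descendants:", descendant_count[origin])
print("not connected to any other model:", isolated)
```

In this toy example, M0 is an ancestor of four of the other five models, and M5 is the only model without any connection.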

In conclusion, provenance information adds value to the existing list of documentation requirements and could complement and enrich the effort of “harmonizing semantic annotations for computational models in biology” [108]. Together with community standards and ontologies, provenance information opens up further possibilities for reusing and analyzing simulation models, for example, to help with model selection, model merging, or model difference detection. Of course, to be fully accepted, our specialization of PROV-DM should be subject to a standardization initiative. We think that WebProv, or a similar tool, would be a valuable extension to model repositories such as BioModels, as one could see where a simulation model comes from, whether other models are connected to it, and in which way they are connected. This would help researchers quickly interpret the increasing number of published simulation models and find a suitable one for their own research.

Supporting information

S1 Appendix. Additional references for entities used by Wnt simulation studies.

We show the references to additional studies that contain entities used by some of the 19 Wnt simulation studies.

(PDF)

S1 Data. Complete provenance information from 19 Wnt simulation studies.

This file contains the provenance information from the 19 analyzed simulation studies of the Wnt signaling pathway. It was exported from WebProv and may be imported into another instance of the tool.

(JSON)

S1 Fig. Provenance graph of all 19 Wnt/β-catenin simulation studies and their depending studies using a circular layout.

Studies which include additional pathways have been colored.

(PDF)

S1 Table. Categorized assumptions.

We present the results of the categorization of all assumptions found in the 19 simulation studies using SBO. We have also added information about the keywords that accompanied the assumptions.

(CSV)

S2 Table. Categorized simulation experiments.

We present the results of the categorization of the simulation experiments found in the 19 simulation studies using our categories.

(CSV)

Acknowledgments

We thank Nadja Schlungbaum for her excellent technical support.

Data Availability

All relevant data are within the manuscript and its Supporting information files. Additional material is available at https://github.com/SFB-ELAINE/SI_Provenance_Wnt_Family.

Funding Statement

This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) (Grant no. SFB 1270/1—299150580: K.B., F.H.; Grant no. 320435134: P.W.) and by the Deutscher Akademischer Austauschdienst (German Academic Exchange Service) through the Research Internships in Science and Engineering (RISE) program (Grant no. 57467143: J.S.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Balci O. A life cycle for modeling and simulation. SIMULATION. 2012;88(7):870–883. doi: 10.1177/0037549712438469
2. Ruscheinski A, Wilsdorf P, Dombrowsky M, Uhrmacher AM. Capturing and Reporting Provenance Information of Simulation Studies Based on an Artifact-Based Workflow Approach. In: Proceedings of the 2019 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation. SIGSIM-PADS ’19. New York, NY, USA: Association for Computing Machinery; 2019. p. 185–196.
3. Wilsdorf P, Haack F, Uhrmacher A. Conceptual Models in Simulation Studies: Making it explicit. In: Proceedings of the 2020 Winter Simulation Conference. WSC ’20. IEEE Press; 2020.
4. Monks T, Currie CSM, Onggo BS, Robinson S, Kunc M, Taylor SJE. Strengthening the reporting of empirical simulation studies: Introducing the STRESS guidelines. Journal of Simulation. 2018; p. 1–13.
5. Erdemir A, Guess TM, Halloran J, Tadepalli SC, Morrison TM. Considerations for reporting finite element analysis studies in biomechanics. Journal of Biomechanics. 2012;45(4):625–633. doi: 10.1016/j.jbiomech.2011.11.038
6. Grimm V, Berger U, DeAngelis DL, Polhill JG, Giske J, Railsback SF. The ODD protocol: a review and first update. Ecological modelling. 2010;221(23):2760–2768. doi: 10.1016/j.ecolmodel.2010.08.019
7. Le Novère N, Finney A, Hucka M, Bhalla US, Campagne F, Collado-Vides J, et al. Minimum information requested in the annotation of biochemical models (MIRIAM). Nature biotechnology. 2005;23(12):1509. doi: 10.1038/nbt1156
8. Waltemath D, Adams R, Beard DA, Bergmann FT, Bhalla US, Britten R, et al. Minimum Information About a Simulation Experiment (MIASE). PLoS Computational Biology. 2011;7(4):e1001122. doi: 10.1371/journal.pcbi.1001122
9. Porubsky VL, Goldberg AP, Rampadarath AK, Nickerson DP, Karr JR, Sauro HM. Best Practices for Making Reproducible Biochemical Models. Cell Systems. 2020;11(2):109–120. doi: 10.1016/j.cels.2020.06.012
10. Malik-Sheriff RS, Glont M, Nguyen TVN, Tiwari K, Roberts MG, Xavier A, et al. BioModels—15 years of sharing computational models in life science. Nucleic Acids Research. 2020;48(D1):D407–D415. doi: 10.1093/nar/gkz1055
11. Glont M, Nguyen TVN, Graesslin M, Hälke R, Ali R, Schramm J, et al. BioModels: expanding horizons to include more modelling approaches and formats. Nucleic Acids Research. 2018;46(D1):D1248–D1253. doi: 10.1093/nar/gkx1023
12. Olivier BG, Snoep JL. Web-based kinetic modelling using JWS Online. Bioinformatics. 2004;20(13):2143–2144. doi: 10.1093/bioinformatics/bth200
13. Yu T, Lloyd CM, Nickerson DP, Cooling MT, Miller AK, Garny A, et al. The Physiome Model Repository 2. Bioinformatics. 2011;27(5):743–744. doi: 10.1093/bioinformatics/btq723
14. Ajmera I, Swat M, Laibe C, Le Novère N, Chelliah V. The impact of mathematical modeling on the understanding of diabetes and related complications. CPT: Pharmacometrics & Systems Pharmacology. 2013;2(7):54. doi: 10.1038/psp.2013.30
15. Scharm M, Wolkenhauer O, Waltemath D. An algorithm to detect and communicate the differences in computational models describing biological systems. Bioinformatics. 2016;32(4):563–570. doi: 10.1093/bioinformatics/btv484
16. Cvijovic M, Almquist J, Hagmar J, Hohmann S, Kaltenbach HM, Klipp E, et al. Bridging the gaps in systems biology. Molecular Genetics and Genomics. 2014;289(5):727–734. doi: 10.1007/s00438-014-0843-3
17. Peng D, Warnke T, Haack F, Uhrmacher AM. Reusing simulation experiment specifications in developing models by successive composition—a case study to support developing models by successive extension. Simulation Modelling Practice and Theory. 2017;68:33–53. doi: 10.1016/j.simpat.2016.07.006
18. Peng D, Warnke T, Haack F, Uhrmacher AM. Reusing simulation experiment specifications to support developing models by successive extension. Simulation Modelling Practice and Theory. 2016;68:33–53. doi: 10.1016/j.simpat.2016.07.006
19. Belhajjame K, B’Far R, Cheney J, Coppens S, Cresswell S, Gil Y, et al. PROV-DM: The PROV Data Model. World Wide Web Consortium (W3C); 2013.
20. Ruscheinski A, Gjorgevikj D, Dombrowsky M, Budde K, Uhrmacher AM. Towards a PROV Ontology for Simulation Models. In: International Provenance and Annotation Workshop. Springer; 2018. p. 192–195.
21. Ruscheinski A, Uhrmacher AM. Provenance in Modeling and Simulation Studies: Bridging Gaps. In: Proceedings of the 2017 Winter Simulation Conference. WSC ’17. IEEE Press; 2017.
22. MacDonald BT, Tamai K, He X. Wnt/β-Catenin Signaling: Components, Mechanisms, and Diseases. Developmental Cell. 2009;17(1):9–26. doi: 10.1016/j.devcel.2009.06.016
23. Lloyd-Lewis B, Fletcher AG, Dale TC, Byrne HM. Toward a quantitative understanding of the Wnt/β-catenin pathway through simulation and experiment. Wiley Interdisciplinary Reviews: Systems Biology and Medicine. 2013;5(4):391–407.
24. Steinhart Z, Angers S. Wnt signaling in development and tissue homeostasis. Development. 2018;145(11):dev146589. doi: 10.1242/dev.146589
25. Giles RH, van Es JH, Clevers H. Caught up in a Wnt storm: Wnt signaling in cancer. Biochimica et Biophysica Acta (BBA)—Reviews on Cancer. 2003;1653(1):1–24. doi: 10.1016/S0304-419X(03)00005-2
26. Clevers H, Nusse R. Wnt/β-catenin signaling and disease. Cell. 2012;149(6):1192–1205. doi: 10.1016/j.cell.2012.05.012
27. Logan CY, Nusse R. The Wnt Signaling Pathway in Development and Disease. Annual Review of Cell and Developmental Biology. 2004;20(1):781–810. doi: 10.1146/annurev.cellbio.20.010403.113126
28. Lee E, Salic A, Krüger R, Heinrich R, Kirschner MW. The Roles of APC and Axin Derived from Experimental and Theoretical Analysis of the Wnt Pathway. PLoS Biology. 2003;1(1):e10. doi: 10.1371/journal.pbio.0000010
29. Krüger R, Heinrich R. Model reduction and analysis of robustness for the Wnt/β-catenin signal transduction pathway. Genome Informatics. 2004;15(1):138–148.
30. Cho KH, Baek S, Sung MH. Wnt pathway mutations selected by optimal β-catenin signaling for tumorigenesis. FEBS Letters. 2006;580(15):3665–3670. doi: 10.1016/j.febslet.2006.05.053
31. Sick S, Reinker S, Timmer J, Schlake T. WNT and DKK determine hair follicle spacing through a reaction-diffusion mechanism. Science. 2006;314(5804):1447–1450. doi: 10.1126/science.1130088
32. Kim D, Rath O, Kolch W, Cho KH. A hidden oncogenic positive feedback loop caused by crosstalk between Wnt and ERK pathways. Oncogene. 2007;26(31):4571–4579. doi: 10.1038/sj.onc.1210230
33. Rodríguez-González J, Santillán M, Fowler A, Mackey MC. The segmentation clock in mice: interaction between the Wnt and Notch signalling pathways. Journal of theoretical biology. 2007;248(1):37–47. doi: 10.1016/j.jtbi.2007.05.003
34. van Leeuwen IMM, Byrne HM, Jensen OE, King JR. Elucidating the interactions between the adhesive and transcriptional functions of β-catenin in normal and cancerous cells. Journal of Theoretical Biology. 2007;247(1):77–102. doi: 10.1016/j.jtbi.2007.01.019
35. Wawra C, Kühl M, Kestler HA. Extended analyses of the Wnt/β-catenin pathway: Robustness and oscillatory behaviour. FEBS Letters. 2007;581(21):4043–4048. doi: 10.1016/j.febslet.2007.07.043
36. Goldbeter A, Pourquié O. Modeling the segmentation clock as a network of coupled oscillations in the Notch, Wnt and FGF signaling pathways. Journal of Theoretical Biology. 2008;252(3):574–585. doi: 10.1016/j.jtbi.2008.01.006
37. Ramis-Conde I, Drasdo D, Anderson AR, Chaplain MA. Modeling the influence of the E-cadherin-β-catenin pathway in cancer cell invasion: a multiscale approach. Biophysical journal. 2008;95(1):155–165. doi: 10.1529/biophysj.107.114678
38. Goentoro L, Kirschner MW. Evidence that fold-change, and not absolute level, of β-catenin dictates Wnt signaling. Molecular cell. 2009;36(5):872–884. doi: 10.1016/j.molcel.2009.11.017
39. van Leeuwen IMM, Mirams GR, Walter A, Fletcher A, Murray P, Osborne J, et al. An integrative computational model for intestinal tissue renewal. Cell Proliferation. 2009;42(5):617–636. doi: 10.1111/j.1365-2184.2009.00627.x
40. Jensen PB, Pedersen L, Krishna S, Jensen MH. A Wnt Oscillator Model for Somitogenesis. Biophysical Journal. 2010;98(6):943–950. doi: 10.1016/j.bpj.2009.11.039
41. Mirams GR, Byrne HM, King JR. A multiple timescale analysis of a mathematical model of the Wnt/β-catenin signalling pathway. Journal of Mathematical Biology. 2010;60(1):131–160. doi: 10.1007/s00285-009-0262-y
42. Murray PJ, Kang JW, Mirams GR, Shin SY, Byrne HM, Maini PK, et al. Modelling Spatially Regulated β-Catenin Dynamics and Invasion in Intestinal Crypts. Biophysical Journal. 2010;99(3):716–725. doi: 10.1016/j.bpj.2010.05.016
43. Shin SY, Rath O, Zebisch A, Choo SM, Kolch W, Cho KH. Functional Roles of Multiple Feedback Loops in Extracellular Signal-Regulated Kinase and Wnt Signaling Pathways That Regulate Epithelial-Mesenchymal Transition. Cancer Research. 2010;70(17):6715–6724. doi: 10.1158/0008-5472.CAN-10-1377
44. Buske P, Galle J, Barker N, Aust G, Clevers H, Loeffler M. A Comprehensive Model of the Spatio-Temporal Stem Cell and Tissue Organisation in the Intestinal Crypt. PLoS Computational Biology. 2011;7(1):e1001045. doi: 10.1371/journal.pcbi.1001045
45. Kogan Y, Halevi-Tobias KE, Hochman G, Baczmanska AK, Leyns L, Agur Z. A new validated mathematical model of the Wnt signalling pathway predicts effective combinational therapy by sFRP and Dkk. Biochemical Journal. 2012;444(1):115–125. doi: 10.1042/BJ20111887
46. Mazemondet O, John M, Leye S, Rolfs A, Uhrmacher AM. Elucidating the Sources of β-Catenin Dynamics in Human Neural Progenitor Cells. PLoS ONE. 2012;7(8):e42792. doi: 10.1371/journal.pone.0042792
47. Schmitz Y, Rateitschak K, Wolkenhauer O. Analysing the impact of nucleo-cytoplasmic shuttling of β-catenin and its antagonists APC, Axin and GSK3 on Wnt/β-catenin signalling. Cellular Signalling. 2013;25(11):2210–2221. doi: 10.1016/j.cellsig.2013.07.005
48. Wang Hy, Huang Yx, Qi Yf, Zhang Y, Bao Yl, Sun Lg, et al. Mathematical models for the Notch and Wnt signaling pathways and the crosstalk between them during somitogenesis. Theoretical Biology and Medical Modelling. 2013;10(1):27. doi: 10.1186/1742-4682-10-27
49. Chen J, Xie ZR, Wu Y. Computational Modeling of the Interplay between Cadherin-Mediated Cell Adhesion and Wnt Signaling Pathway. PLoS ONE. 2014;9(6):e100702. doi: 10.1371/journal.pone.0100702
50. Tan C, Gardiner BS, Hirokawa Y, Smith DW, Burgess AW. Analysis of Wnt signaling β-catenin spatial dynamics in HEK293T cells. BMC Systems Biology. 2014;8(1):44. doi: 10.1186/1752-0509-8-44
51. Haack F, Lemcke H, Ewald R, Rharass T, Uhrmacher AM. Spatio-temporal Model of Endogenous ROS and Raft-Dependent WNT/Beta-Catenin Signaling Driving Cell Fate Commitment in Human Neural Progenitor Cells. PLoS Computational Biology. 2015;11(3):e1004106. doi: 10.1371/journal.pcbi.1004106
52. MacLean AL, Rosen Z, Byrne HM, Harrington HA. Parameter-free methods distinguish Wnt pathway models and guide design of experiments. Proceedings of the National Academy of Sciences. 2015;112(9):2652–2657. doi: 10.1073/pnas.1416655112
53. Padala RR, Karnawat R, Viswanathan SB, Thakkar AV, Das AB. Cancerous perturbations within the ERK, PI3K/Akt, and Wnt/β-catenin signaling network constitutively activate inter-pathway positive feedback loops. Molecular BioSystems. 2017;13(5):830–840. doi: 10.1039/C6MB00786D
54. Siegle L, Schwab JD, Kühlwein SD, Lausser L, Tümpel S, Pfister AS, et al. A Boolean network of the crosstalk between IGF and Wnt signaling in aging satellite cells. PLoS ONE. 2018;13(3):e0195126. doi: 10.1371/journal.pone.0195126
55. Cavallo JC, Scholpp S, Flegg MB. Delay-driven oscillations via Axin2 feedback in the Wnt/β-catenin signalling pathway. Journal of Theoretical Biology. 2020;507:110458. doi: 10.1016/j.jtbi.2020.110458
56. Haack F, Budde K, Uhrmacher AM. Exploring the mechanistic and temporal regulation of LRP6 endocytosis in canonical WNT signaling. Journal of Cell Science. 2020;133(15).
57. Staehlke S, Haack F, Waldner AC, Koczan D, Moerke C, Mueller P, et al. ROS Dependent Wnt/β-Catenin Pathway and Its Regulation on Defined Micro-Pillars—A Combined In Vitro and In Silico Study. Cells. 2020;9(8):1784. doi: 10.3390/cells9081784
58. Ward D, Montes Olivas S, Fletcher A, Homer M, Marucci L. Cross-talk between Hippo and Wnt signalling pathways in intestinal crypts: Insights from an agent-based model. Computational and Structural Biotechnology Journal. 2020;18:230–240. doi: 10.1016/j.csbj.2019.12.015
59. Kofahl B, Wolf J. Mathematical modelling of Wnt/β-catenin signalling. Biochemical Society Transactions. 2010;38(5):1281–1285. doi: 10.1042/BST0381281
60. Yilmaz L, Chakladar S, Doud K. The Goal-Hypothesis-Experiment Framework: A Generative Cognitive Domain Architecture for Simulation Experiment Management. In: Proceedings of the 2016 Winter Simulation Conference. WSC ’16. IEEE Press; 2016. p. 1001–1012.
61. Bergmann FT, Adams R, Moodie S, Cooper J, Glont M, Golebiewski M, et al. COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics. 2014;15(1):369–377. doi: 10.1186/s12859-014-0369-z
62. Carusi A, Burrage K, Rodríguez B. Bridging experiments, models and simulations: an integrative approach to validation in computational cardiac electrophysiology. American Journal of Physiology-Heart and Circulatory Physiology. 2012. doi: 10.1152/ajpheart.01151.2011
63. Lorig F, Lebherz DS, Berndt JO, Timm IJ. Hypothesis-Driven Experiment Design in Computer Simulation Studies. In: Proceedings of the 2017 Winter Simulation Conference. WSC ’17. IEEE Press; 2017.
64. Lee E, Salic A, Kirschner MW. Physiological regulation of β-catenin stability by Tcf3 and CK1ε. Journal of Cell Biology. 2001;154(5):983–994. doi: 10.1083/jcb.200102074
65. Dajani R, Fraser E, Roe SM, Yeo M, Good VM, Thompson V, et al. Structural basis for recruitment of glycogen synthase kinase 3beta to the axin-APC scaffold complex. The EMBO Journal. 2003;22(3):494–501. doi: 10.1093/emboj/cdg068
66. Salic A, Lee E, Mayer L, Kirschner MW. Control of Beta-Catenin Stability: Reconstitution of the Cytoplasmic Steps of the Wnt Pathway in Xenopus Egg Extracts. Molecular Cell. 2000; p. 10.
67. Robinson S. Conceptual modelling for simulation Part I: definition and requirements. Journal of the Operational Research Society. 2008;59(3):278–290. doi: 10.1057/palgrave.jors.2602368
68. Courtot M, Juty N, Knüpfer C, Waltemath D, Zhukova A, Dräger A, et al. Controlled vocabularies and semantics in systems biology. Molecular Systems Biology. 2011;7(1):543. doi: 10.1038/msb.2011.77
69. Juty N, le Novère N. In: Dubitzky W, Wolkenhauer O, Yokota H, Cho KH, editors. Systems biology ontology. Springer-Verlag; New York; 2013. p. 2063–2063.
70. Le Novère N, Hucka M, Mi H, Moodie S, Schreiber F, Sorokin A, et al. The Systems Biology Graphical Notation. Nature Biotechnology. 2009;27(8):735–741. doi: 10.1038/nbt.1558
71. Glass L, Kauffman SA. The logical analysis of continuous, non-linear biochemical control networks. Journal of Theoretical Biology. 1973;39(1):103–129. doi: 10.1016/0022-5193(73)90208-7
72. Torres NV, Santos G. The (Mathematical) Modeling Process in Biosciences. Frontiers in Genetics. 2015;6:354. doi: 10.3389/fgene.2015.00354
73. Hofmeyr JHS. Kinetic modelling of compartmentalised reaction networks. Biosystems. 2020;197:104203. doi: 10.1016/j.biosystems.2020.104203
74. Fisher J, Henzinger TA. Executable cell biology. Nature Biotechnology. 2007;25(11):1239–1249. doi: 10.1038/nbt1356
75. Ciocchetta F, Hillston J. Bio-PEPA: A framework for the modelling and analysis of biological systems. Theoretical Computer Science. 2009;410(33-34):3065–3084. doi: 10.1016/j.tcs.2009.02.037
76. Boemo MA, Cardelli L, Nieduszynski CA. The Beacon Calculus: A formal method for the flexible and concise modelling of biological systems. PLoS computational biology. 2020;16(3):e1007651. doi: 10.1371/journal.pcbi.1007651
77. Karr JR, Takahashi K, Funahashi A. The principles of whole-cell modeling. Current opinion in microbiology. 2015;27:18–24. doi: 10.1016/j.mib.2015.06.004
78. Wang RS, Saadatpour A, Albert R. Boolean modeling in systems biology: an overview of methodology and applications. Physical Biology. 2012;9(5):055001. doi: 10.1088/1478-3975/9/5/055001
79. Chaouiya C. Petri net modelling of biological networks. Briefings in Bioinformatics. 2007;8(4):210–219. doi: 10.1093/bib/bbm029
80. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics. 2003;19(4):524–531. doi: 10.1093/bioinformatics/btg015
81. Lloyd CM, Halstead MD, Nielsen PF. CellML: its future, present and past. Progress in biophysics and molecular biology. 2004;85(2-3):433–450. doi: 10.1016/j.pbiomolbio.2004.01.004
82. Köhn D, Le Novère N. In: Heiner M, Uhrmacher AME, editors. SED-ML—An XML Format for the Implementation of the MIASE Guidelines. vol. 5307. Springer Berlin Heidelberg; 2008. p. 176–190.
83. Ewald R, Uhrmacher AM. SESSL: A domain-specific language for simulation experiments. ACM Transactions on Modeling and Computer Simulation. 2014;24(2):1–25. doi: 10.1145/2567895
84. Wilsdorf P, Haack F, Budde K, Ruscheinski A, Uhrmacher AM. Conducting systematic, partly automated simulation studies–Unde Venis et Quo Vadis. AIP Conference Proceedings. 2020;2293(1):020001. doi: 10.1063/5.0026939
85. Teytelman L, Stoliartchouk A, Kindler L, Hurwitz BL. Protocols.io: Virtual Communities for Protocol Development and Discussion. PLOS Biology. 2016;14(8):1–6. doi: 10.1371/journal.pbio.1002538
86. Vera J, Lischer C, Nenov M, Nikolov S, Lai X, Eberhardt M. Mathematical Modelling in Biomedicine: A Primer for the Curious and the Skeptic. International Journal of Molecular Sciences. 2021;22(2). doi: 10.3390/ijms22020547
87. Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf MPH. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of The Royal Society Interface. 2009;6(31):187–202. doi: 10.1098/rsif.2008.0172
88. Palaniappan SK, Gyori BM, Liu B, Hsu D, Thiagarajan P. Statistical model checking based calibration and analysis of bio-pathway models. In: International Conference on Computational Methods in Systems Biology. Springer; 2013. p. 120–134.
89. Mitra ED, Suderman R, Colvin J, Ionkov A, Hu A, Sauro HM, et al. PyBioNetFit and the biological property specification language. IScience. 2019;19:1012–1036. doi: 10.1016/j.isci.2019.08.045
90. Jha SK, Clarke EM, Langmead CJ, Legay A, Platzer A, Zuliani P. A bayesian approach to model checking biological systems. In: International conference on computational methods in systems biology. Springer; 2009. p. 218–234.
91. Agha G, Palmskog K. A Survey of Statistical Model Checking. ACM Transactions on Modeling and Computer Simulation. 2018;28(1). doi: 10.1145/3158668
92. Balci O. Verification, Validation and Accreditation of Simulation Models. In: Proceedings of the 29th Conference on Winter Simulation. WSC ’97. USA: IEEE Computer Society; 1997. p. 135–141.
93. McPhillips T, Song T, Kolisnik T, Aulenbach S, Belhajjame K, Bocinsky K, et al. YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts; 2015.
94. Murta L, Braganholo V, Chirigati F, Koop D, Freire J. noWorkflow: Capturing and Analyzing Provenance of Scripts. In: Ludäscher B, Plale B, editors. Provenance and Annotation of Data and Processes. Cham: Springer International Publishing; 2015. p. 71–83.
95. Boutillier P, Maasha M, Li X, Medina-Abarca HF, Krivine J, Feret J, et al. The Kappa platform for rule-based modeling. Bioinformatics. 2018;34(13):i583–i592. doi: 10.1093/bioinformatics/bty272
96. Groß A, Kracher B, Kraus JM, Kühlwein SD, Pfister AS, Wiese S, et al. Representing dynamic biological networks with multi-scale probabilistic models. Communications biology. 2019;2(1):1–12. doi: 10.1038/s42003-018-0268-3
97. Warnke T, Helms T, Uhrmacher AM. Reproducible and flexible simulation experiments with ML-Rules and SESSL. Bioinformatics. 2017;34(8):1424–1427. doi: 10.1093/bioinformatics/btx741
98. Thomas BR, Chylek LA, Colvin J, Sirimulla S, Clayton AHA, Hlavacek WS, et al. BioNetFit: a fitting tool compatible with BioNetGen, NFsim and distributed computing environments. Bioinformatics. 2015;32(5):798–800. doi: 10.1093/bioinformatics/btv655
99. Sorokin A, Sorokina O, Douglas Armstrong J. In: Hlavacek W, editor. RKappa: Software for Analyzing Rule-Based Models. Springer; New York; 2019. p. 363–390.
100. Sansone SA, McQuilton P, Rocca-Serra P, Gonzalez-Beltran A, Izzo M, Lister AL, et al. FAIRsharing as a community approach to standards, repositories and policies. Nature Biotechnology. 2019;37(4):358–367. doi: 10.1038/s41587-019-0080-8
101. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. 2016;3(1):160018. doi: 10.1038/sdata.2016.18
102. Scharm M, Waltemath D. A fully featured COMBINE archive of a simulation study on syncytial mitotic cycles in Drosophila embryos. F1000Research. 2016;5:2421. doi: 10.12688/f1000research.9379.1
103. Iannone R. DiagrammeR: Graph/Network Visualization; 2020. Available from: https://CRAN.R-project.org/package=DiagrammeR.
104. Gierer A, Meinhardt H. A theory of biological pattern formation. Kybernetik. 1972;12(1):30–39. doi: 10.1007/BF00289234
105. Bafico A, Liu G, Yaniv A, Gazit A, Aaronson SA. Novel mechanism of Wnt signalling inhibition mediated by Dickkopf-1 interaction with LRP6/Arrow. Nature Cell Biology. 2001;3(7):683–686. doi: 10.1038/35083081
106. Hannoush RN. Kinetics of Wnt-driven β-catenin stabilization revealed by quantitative and temporal imaging. PLoS ONE. 2008;3(10). doi: 10.1371/journal.pone.0003498
107. Dequéant ML, Glynn E, Gaudenz K, Wahl M, Chen J, Mushegian A, et al. A Complex Oscillating Network of Signaling Genes Underlies the Mouse Segmentation Clock. Science. 2006;314(5805):1595–1598. doi: 10.1126/science.1133141
108. Neal ML, König M, Nickerson D, Mısırlı G, Kalbasi R, Dräger A, et al. Harmonizing semantic annotations for computational models in biology. Briefings in Bioinformatics. 2019;20(2):540–550. doi: 10.1093/bib/bby087
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1009227.r001

Decision Letter 0

Jason M Haugh, Pedro Mendes

30 Mar 2021

Dear Mr. Budde,

Thank you very much for submitting your manuscript "Relating simulation studies by provenance—Developing a family of Wnt signaling models" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Pedro Mendes, PhD

Associate Editor

PLOS Computational Biology

Jason Haugh

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Much has been made recently of the importance of reproducibility of scientific research. In biological simulation studies, such as the ones this paper considers, this means providing the models and the analysis instructions in standard machine-readable forms. This paper takes this a step further and looks at the issue of provenance for these models, namely, how models are related to previous models and experimental studies. In particular, the authors look at a family of 19 models of the Wnt signaling pathway, which they manually link together using the PROV-DM ontology. To construct these relationships, they have developed a web tool, WebProv, to link studies with PROV-DM types and relations. As the authors point out, extracting these relationships from published studies is a highly laborious process that requires many assumptions along the way. Ideally, in the future, these provenance networks should be developed at the time of model construction during the simulation study.

This paper is an important demonstration of both the process of creating provenance networks for simulation studies and their utility. The prototype software tool presented should help facilitate similar activities in the future. The proof of concept presented in this paper should become a tutorial for others who would undertake this task for their models and simulation studies. The key remaining issue is how to motivate and help others to create this information. This is not a problem that the authors can solve, but rather one to which the community and the journal publishers should devote time and energy in order to further improve the reproducibility of science.

Reviewer #2: This is an interesting paper that looks in detail at 19 Wnt-related modeling papers. As a practicing modeler myself, I found the most interesting pages to be those starting on page 14 (Provenance of individual Wnt simulation models), which discuss the various issues encountered, problems with the current ontologies, etc. I think this is the most important part of the paper from the point of view of the PLOS Comp Bio readership; such analyses have not previously been done as exhaustively as this one.

Some of the material, particularly the descriptions of the provenance entries (which seem to dominate the paper), could be summarized in a table and the textual component moved to an appendix. This would allow the reader to get straight to the most interesting parts of the paper.

Captions: The captions of some of the figures could be improved.

Fig 2: This caption starts with the word ‘additionally’, which doesn’t sit well. Also, the caption is too short, given that this is probably one of the more important figures. It took me a while to realize what the terms ASM, CSM, etc. meant (they were in Table 2). I would spell out these abbreviations (ASM, CSM, VSM, BSM) in the caption (Table 2 can remain unchanged); this will save the reader from having to search for their meaning. The caption should also include one sentence on how to read the figure. I know that earlier on the authors explain what an arrow means, but that was some pages away, and since PLOS Comp Bio readers are generally not computer scientists, I would add that explanation of the arrow to the caption as well.

Fig 3: In general, the notation used in UML diagrams is not familiar to most modelers, but in this case the diagram looks simple enough that it seems fairly self-explanatory. No action required.

Minor: Typo in the first sentence of the caption: ‘prociding’. I’m not sure what that word means; it is probably a typo, but I am not sure which word should be there instead.

Software: As it stands, it is likely that very few people will use WebProv, because it requires far too much work to install. Does it also require a backend server?

The tool looks useful, so why make it difficult to get hold of?

What I would recommend is to move everything, if possible, to the client (including the database, which doesn’t seem large) and host it as a GitHub Pages project so that when a user clicks on the URL the application shows up with no installation necessary (which I think is one of the main attractions of web software); see https://pages.github.com/. I strongly recommend something like this; otherwise your work will not have the impact it should.

Documentation: There is no easily accessible documentation for the software. It looks like users are expected to download the GitHub repository and then select the index.html file in Docs. It would be better to host proper documentation (e.g., on Read the Docs) linked from the GitHub account itself. The only documentation link in the README takes the user to the Process Manager 2 page.

Minor:

1. Introduction, second paragraph, first sentence, ‘conduction’, is that the right word?

2. There is no reference to the YouTube video on page 4 (footnote 2); also put the GitHub URL there, since the GitHub repository is mentioned.

Reviewer #3: In this manuscript, the authors present a continuation of previous work that uses PROV-DM to represent the provenance of biosimulation studies. The authors then use this data model to encode the provenance of a number of Wnt signalling models from the literature, and use the encoded provenance knowledge to examine the relationships between these published studies. The authors have also developed an open-source web-based tool providing a graphical user interface for creating, editing, exploring, and searching the provenance knowledge encoded in this manner. This seems to be a very useful approach for capturing provenance knowledge of systems biology modelling studies, and it appears to be extensible to capture richer provenance semantics as collection and recording methods improve in future and as the approach is applied in different domains or to different types of models.

The authors should be commended for making all the software and data used in this manuscript freely available and documented in a manner sufficient to enable others to repeat the analysis presented here.

I suggest the entire manuscript is thoroughly proof-read as some of the grammar and word choices are a bit unusual. "cell biological systems" in the abstract is one example that could be tidied up.

As the authors state, the knowledge they have extracted from the literature and encoded in the example provenance graph used in this work makes a useful contribution to the community of potential users of these Wnt signalling models. I wonder whether the authors have any plans or thoughts on integrating this knowledge into a community repository, perhaps in a way that others could contribute to. For the subset of models that are available in the Biomodels database, for example, could the provenance knowledge be contributed back to the database?

Following that thought, some of the provenance knowledge captured here is similar to that represented in the Biomodels database using the "isDerivedFrom" predicate in the SBML model annotations (see, for example, the analysis of diabetes models presented in https://dx.doi.org/10.1038%2Fpsp.2013.30). Have the authors compared this knowledge for the subset of Wnt models available in the Biomodels database to see whether patterns of model evolution similar to (although less semantically rich than) those in the analysis presented in this manuscript are present?

Using the SBO to annotate the assumptions seems an odd choice to me. Looking at Table S1, it seems that the SBO terms give a very high-level annotation of the type of model entity mentioned in the assumption, but they don't provide any semantics about what the assumption is. Looking at assumptions annotated with SBO:0000009 (kinetic constant), for example, a user can search for assumptions that have something to do with a kinetic constant, but this doesn't help to examine whether it is an assumption based on time-scale analysis (e.g., row 3) or perhaps just an assumption that a certain behaviour holds (e.g., row 13). I wonder if something like the Evidence and Conclusion Ontology (https://evidenceontology.org/) might provide a source of more meaningful terms for annotating assumptions. I may simply be missing something here, so perhaps a bit more explanation of how the SBO annotations are being used to annotate assumptions would help clarify things (or this could be future work that extends the current work with enriched semantics).

The authors define a minimal set of PROV-DM entity and activity types that they have found useful for capturing provenance information of simulation studies when extracting provenance knowledge from the published literature. This minimal set does seem sufficient for the Wnt signalling demonstration presented here, and the authors briefly explore how this set could be expanded in future. But I worry that the Wet-lab Data entity seems under-specified and perhaps less useful than it could be. While I understand that the source of experimental data is often not clearly described in the literature, with the recent growth of platforms like https://www.protocols.io/, which enable scientists to provide rich descriptions of their protocols in a reusable manner, I wonder whether the authors have considered how to incorporate that type of knowledge into their provenance graphs.

Minor comments

--------------

It may not be obvious to the reader exactly what PROV is when first mentioned in the abstract.

The assertion in the abstract that this provenance information is all that is required to answer the question of an "appropriate starting point" is perhaps overstating things. The provenance information contributes to that answer, but it is not the only knowledge that is required to make an informed decision.

Figure 3 caption: "prociding" - perhaps meant to be providing?

I completely agree with the authors that provenance information should be collected during the simulation study, but I wonder whether the authors have given any thought to how their WebProv tool could be used as part of a typical modelling lifecycle to encourage modellers to do so.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Chris J. Myers

Reviewer #2: No

Reviewer #3: Yes: David P Nickerson

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1009227.r003

Decision Letter 1

Jason M Haugh, Pedro Mendes

29 Jun 2021

Dear Mr. Budde,

We are pleased to inform you that your manuscript 'Relating simulation studies by provenance—Developing a family of Wnt signaling models' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Pedro Mendes, PhD

Associate Editor

PLOS Computational Biology

Jason Haugh

Deputy Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #2: None

Reviewer #3: I thank the authors for their detailed response to my original review.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

Reviewer #3: None

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: Yes: David P. Nickerson

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1009227.r004

Acceptance letter

Jason M Haugh, Pedro Mendes

23 Jul 2021

PCOMPBIOL-D-21-00298R1

Relating simulation studies by provenance—Developing a family of Wnt signaling models

Dear Dr Budde,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Andrea Szabo

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    Supplementary Materials

    S1 Appendix. Additional references for entities used by Wnt simulation studies.

    We show the references to additional studies that contain entities used by some of the 19 Wnt simulation studies.

    (PDF)

    S1 Data. Complete provenance information from 19 Wnt simulation studies.

    This file contains the provenance information from the 19 analyzed simulation studies of the Wnt signaling pathway. It was exported from WebProv and may be imported into another instance of the tool.

    (JSON)

    S1 Fig. Provenance graph of all 19 Wnt/β-catenin simulation studies and their depending studies using a circular layout.

    Studies which include additional pathways have been colored.

    (PDF)

    S1 Table. Categorized assumptions.

    We present the results of the categorization, using SBO, of all assumptions found in the 19 simulation studies. We have also added information about the keywords that accompanied the assumptions.

    (CSV)

    S2 Table. Categorized simulation experiments.

    We present the results of the categorization of the simulation experiments found in the 19 simulation studies using our categories.

    (CSV)

    Attachment

    Submitted filename: response.pdf

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting information files. Additional material is available at https://github.com/SFB-ELAINE/SI_Provenance_Wnt_Family.
