A Novel Method to Verify Multilevel Computational Models of Biological Systems Using Multiscale Spatio-Temporal Meta Model Checking

Ovidiu Pârvu; David Gilbert

2016 May 17;11(5):e0154847. doi: 10.1371/journal.pone.0154847

Editor: Attila Csikász-Nagy

PMCID: PMC4871515 PMID: 27187178

Abstract

Insights gained from multilevel computational models of biological systems can be translated into real-life applications only if the model correctness has been verified first. One of the most frequently employed in silico techniques for computational model verification is model checking. Traditional model checking approaches only consider the evolution of numeric values, such as concentrations, over time and are appropriate for computational models of small scale systems (e.g. intracellular networks). However, to gain a systems level understanding of how biological organisms function it is essential to consider more complex large scale biological systems (e.g. organs). Verifying computational models of such systems requires capturing how both numeric values and properties of (emergent) spatial structures (e.g. area of multicellular population) change over time and across multiple levels of organization, which are not considered by existing model checking approaches. To address this limitation we have developed a novel approximate probabilistic multiscale spatio-temporal meta model checking methodology for verifying multilevel computational models relative to specifications describing the desired/expected system behaviour. The methodology is generic and supports computational models encoded using various high-level modelling formalisms because it is defined relative to time series data and not the models used to generate it. In addition, the methodology can be automatically adapted to case study specific types of spatial structures and properties using the spatio-temporal meta model checking concept. To automate the computational model verification process we have implemented the model checking approach in the software tool Mule (http://mule.modelchecking.org). Its applicability is illustrated against four systems biology computational models previously published in the literature encoding the rat cardiovascular system dynamics, the uterine contractions of labour, the Xenopus laevis cell cycle and the acute inflammation of the gut and lung. Our methodology and software will enable computational biologists to efficiently develop reliable multilevel computational models of biological systems.

Introduction

Multilevel computational models of complex biological systems are abstract representations of living systems that span multiple levels of organization. They encode the hierarchical organization of biological systems explicitly, and therefore enable reasoning about how events initiated at one level of organization reflect across multiple levels of organization. In systems biology [1, 2], multilevel (also commonly referred to as multiscale [3]) computational models can be employed for gaining a better understanding of the underlying mechanisms of living systems, and to generate new hypotheses for driving experimental studies. Similarly, in systems medicine it is argued [4] that multilevel computational models could facilitate delivering personalized treatments by providing a patient specific understanding of how diseases and their treatment reflect across multiple levels of organization [5].

However any insights gained from model simulation results can be successfully translated into real-life applications only if the correctness of the models has been verified first. Computational models of biological systems can be validated either in the in vitro environment by checking if the model simulation results can be reproduced experimentally, or in the in silico environment by verifying if the model simulation results conform to a formal specification describing the desired/expected system behaviour. An in silico approach that automates the process of verifying models relative to formal specifications is called model checking [6, 7]; see S1 Text for a brief description of model checking. Due to the complex, stochastic nature of biological systems only approximate probabilistic model checking approaches are considered throughout this paper.

Validating multilevel computational models in the in vitro environment is challenging because there is a need for experimental data from all levels of organization and the interactions between different levels, which is often not available. Moreover in vitro validation procedures need to account for the variability inherent in biological systems [8, 9] which can be of different orders of magnitude at different levels. Conversely, verifying multilevel computational models in the in silico environment is challenging because there is a lack of model checking approaches that can explicitly distinguish between different levels of organization. Existing model checking approaches can be employed to verify submodels corresponding to each level of organization individually without the possibility of referring to interactions between different levels.

In this paper we address this issue by developing a novel multiscale model checking methodology for automatically verifying multilevel computational models relative to given specifications. Our approach is generic and supports computational models encoded using various high-level modelling formalisms because it is defined relative to time series data representing the model simulation results and not the models themselves. Moreover our methodology could be potentially employed for analysing time series data recorded in the wet-lab as well. This could enable checking if a computational model correctly describes a physical system, or that a physical system correctly implements an in silico design, but this is beyond the scope of this paper.

Both spatial and non-spatial computational models can be verified using our approach. The specifications against which the computational models are verified can describe both how numeric values (e.g. concentration of protein X) and properties of (emergent) spatial structures, called spatial entities, (e.g. area of multicellular population) are expected to change over time and across multiple levels of organization. For instance, assuming we would like to verify a computational model describing tumour growth, the specification could state that if the concentration of protein X in a cancerous cell rises above a certain threshold level (e.g. 0.8 M), then the cell will divide and the cellular density or area of the tumour (structure) will increase.

Assuming that the computational model considered is spatial, the types of spatial entities and their properties, called spatial measures, can differ between case studies. For instance, given a tumour growth computational model one might be interested in how the area of the tumour structure changes over time, whereas in the case of a migrating multicellular population the position of the population over time could be of interest.

We defined an abstraction of our approach, called multiscale spatio-temporal meta model checking that enables the automatic reconfiguration of the model checking methodology according to case study specific spatial entity types and measures. The spatio-temporal meta model checking approach resembles the meta-programming [10] concept from computer science where an abstract type is defined that acts as a template for creating specific type instances tailored to particular applications. Our spatio-temporal meta model checking approach is not restricted to biologically relevant spatial entity types and properties, and therefore could be employed to adapt the methodology to case studies from other fields of science. However we do not illustrate this in this paper. Due to the intended general applicability of the approach, and the fact that hierarchical systems in multiple domains of science (e.g. astrophysics, energy, engineering, environmental science and materials science [11]) are commonly referred to as multiscale, our approach is called multiscale rather than multilevel spatio-temporal meta model checking.

To enable the automatic verification of multilevel computational models of biological systems relative to formal specifications we have implemented the model checking method in the software tool Mule which is made freely available online (http://mule.modelchecking.org) in binary and source code format. Moreover a Docker [12] image has been created that provides a self-contained environment for running Mule without additional setup on all major operating systems.

We illustrate the applicability of Mule by verifying the correctness of four multilevel computational models previously published in the literature. The models considered are of different complexity, have been encoded using different modelling formalisms and software, are deterministic, stochastic or hybrid, and encode space explicitly or not. The case studies corresponding to the four multilevel computational models are the rat cardiovascular system dynamics [13], the uterine contractions of labour [14], the Xenopus laevis cell cycle [15], and the acute inflammation of the gut and lung [16]. The formal specifications against which the models are verified were derived from the original papers introducing the models. The main reason for this is that in the following we focus on describing the model verification methodology and not on presenting novel biologically relevant results.

In brief, the main contributions of our paper are:

- Definition of a multiscale spatio-temporal model checking methodology for verifying multilevel computational models of biological systems relative to formal specifications describing the desired/expected system behaviour.
- Definition of the spatio-temporal meta model checking concept which enables automatically reconfiguring the methodology according to case study specific spatial entity types and measures.
- Implementation of the multiscale spatio-temporal meta model checking approach in the freely available software Mule. Both Bayesian and frequentist model checking algorithms can be employed to verify multilevel computational models (considering user-defined error bounds).
- Illustrative examples of how to verify multilevel computational models of biological systems using multiscale spatio-temporal meta model checking.

Related work

In computational (systems) biology, model checking approaches have been employed for model verification [17–32], parameter estimation/synthesis [33–42], model construction (i.e. both model parameters and structure/topology) [43, 44], and robustness computation (considering various perturbations) [39, 44–47]; see recent review papers [48–50] for a more detailed description.

One common characteristic of these model checking approaches is that they only consider how numeric values (e.g. concentrations) change over time. They are appropriate for small scale systems where the spatial domain is usually not represented explicitly (e.g. cell cycle [23, 27, 32, 36, 44, 46, 51], gene expression/regulatory networks [20, 35, 39, 52, 53], signalling pathways [17, 22, 25, 28–30, 38, 46, 54–56]). These approaches cannot be directly employed to verify spatial computational models, because they do not consider how spatial properties change over time, nor multilevel computational models, because they do not distinguish between different levels of organization.

In previous work [57] we have defined a model checking methodology which enables verifying computational models of biological systems with respect to both how numeric values and spatial properties change over time. However the main limitation of this approach is that it cannot explicitly distinguish between different levels of organization and therefore cannot be employed to verify multilevel computational models of biological systems. Moreover the types of spatial entities and measures are hardcoded in the methodology and cannot be reconfigured according to the model verification requirements of different case studies.

Methods

Using the novel model checking approach introduced in this paper multilevel computational models of biological systems can be verified relative to formal specifications as described by the workflow depicted in Fig 1, which comprises four steps:

Fig 1 — The first step (1) in the workflow is using biological observations and/or information from the literature to construct the multilevel computational model of the biological system considered. Next (2) the model is simulated to produce time series data in which spatial entities from multiple scales are automatically detected and analysed using a multiscale spatio-temporal analysis module. Then (3) the specification against which the model is verified is translated from natural language to a formal multiscale spatio-temporal language called PBLMSTL. Finally (4) using the model checker Mule the model is automatically verified relative to the given PBLMSTL specification considering the processed time series data representing the modelled system behaviour. If the model is declared incorrect relative to the given specification then it is updated and the steps (2) and (4) are repeated.

(1) Model construction: Using biological observations and/or relevant references from the literature to construct the computational model.
(2) Multiscale spatio-temporal analysis: Each time the model is simulated time series data are generated in which spatial entities from multiple scales are automatically detected and analysed.
(3) Formal specification: The specification of the system is mapped from natural language into formal logic.
(4) Model checking: The model checker takes as input the processed time series data (representing the behaviour of the modelled system) and the formal specification, and verifies if the model is correct relative to the specification using the model checking algorithm chosen by the user (e.g. frequentist statistical model checking). In the case that the model is incorrect it is updated and verified again.

Model construction

The biological systems considered here are assumed to be inherently complex, stochastic, and to span multiple levels of organization [58], where different levels of organization correspond to different spatio-temporal scales. Moreover we assume in the following that biological systems which are multilevel (i.e. span multiple levels of biological organization) are inherently multiscale (i.e. span multiple spatio-temporal scales). Therefore the terms multiscale and multilevel are used interchangeably in this paper. However, since our methodology is “multiscale” instead of “multilevel” we will refer to “scales” rather than “levels” when describing it. The multiscale system representation is assumed to be hierarchical, with the most coarse-grained scales represented at the top of the hierarchy and the most fine-grained scales at the bottom. Time can be represented either in a discrete (using non-negative integer values) or continuous (using non-negative real values) manner. Whenever space is represented explicitly, we assume throughout, similarly to our previous work [57], that it is discretised and represented in pseudo-3D i.e. 2D space in which pile up is allowed, where the degree of pile up for each spatial position is computed using a density measure (e.g. representing cellular density). However adapting the methodology to other numbers of spatial dimensions requires minor changes which are described later. Furthermore we consider that the behaviour of such systems can be represented as sequences of discrete states where the system probabilistically transitions between states only when an event (e.g. a biochemical reaction) occurs.

Such systems are usually represented using high-level modelling languages (e.g. agent based models, cellular automata etc.), examples of which are given in the Results section. However, for model checking purposes, the behaviour of the computational models is usually described using an equivalent low level representation (e.g. a state transition system). The main reason for this is to enable defining the model checking algorithms relative to a single common rather than multiple different model representations.

Low level modelling formalisms often employed to encode systems that have the above mentioned properties are stochastic discrete-event systems (SDES) [59] when no constraint is imposed on the representation of time, respectively discrete-time/continuous-time Markov chains (DTMC/CTMC) when time is assumed to be discrete/continuous. One limitation of SDESs (and DTMCs/CTMCs) is that they do not explicitly distinguish between how numeric and spatial properties of the system change over time and across multiple scales. An extension of SDESs called stochastic spatial discrete-event systems (SSpDES) was defined in [57] to enable explicitly differentiating between numeric and spatial properties. However, similarly to SDESs, SSpDESs do not enable distinguishing between different scales.

In order to address this issue a multiscale extension of SSpDESs called Multiscale Stochastic Spatial Discrete Event Systems, or MSSpDES for short, is defined next. Formally an MSSpDES $M$ is a 9-tuple 〈S, T, μ, NSV, SpSV, NV, CSpV, MA, SVSS〉 where:

- S = {s₀, s₁, …, s_k} is the set containing all possible states of the system.
- T is the set representing time and it is typically equal to the set of non-negative integer numbers in case of a discrete-time representation (i.e. T = ℤ₊), respectively the set of non-negative real numbers in case of a continuous-time representation (i.e. T = ℝ₊).
- μ is a probability measure employed to compute the probability of the system to transition along the sequences of states described by a collection of model simulation traces. In case of biological systems it is often assumed that the Markov (memoryless) property holds i.e. the probability of the system to transition between states depends only on the current and not on previous states. Considering this assumption, if a discrete-time representation is employed then μ is defined similarly as for DTMCs [60] relative to a transition probability function P: S × S → [0, 1] which records the probability of transitioning between any two states s_i, s_j ∈ S. Conversely, if a continuous-time representation is employed then μ is defined similarly as for CTMCs [61] considering a transition rate matrix Q: S × S → ℝ which records the rate at which a system transitions between any two states s_i, s_j ∈ S and from which the corresponding state transition probabilities can be derived.
- NSV = {nsv₁, nsv₂, …, nsv_l} is the set of numeric state variables describing the state of the system.
- SpSV = {spsv₁, spsv₂, …, spsv_m} is the set of spatial state variables describing the state of the system.
- NV: S × NSV → ℝ is the numeric value assignment function employed to compute for a given state of the system s ∈ S the value val_NSV ∈ ℝ of the numeric state variable nsv ∈ NSV, where val_NSV = NV(s, nsv).
- CSpV = {SpV₁, SpV₂, …, SpV_n} is the collection of spatial value assignment functions, where each spatial value assignment function SpV_i ∈ CSpV, $SpV_{i} : S \times SpSV \to \mathbb{R}^{m_{i} \times n_{i}}$, is employed to compute for a given state of the system s ∈ S the value $val_{SpSV} \in \mathbb{R}^{m_{i} \times n_{i}}$ of spatial state variable spsv ∈ SpSV that corresponds to a discretised spatial domain of size m_i × n_i, where val_SpSV = SpV_i(s, spsv).
- MA = (V_MA, E_MA) is the multiscale architecture graph encoding the hierarchical multiscale structure of the system under consideration.
- SVSS: NSV ∪ SpSV → V_MA is the state variable scale and subsystem assignment function which associates each state variable sv ∈ NSV ∪ SpSV with a vertex v_scsubsys ∈ V_MA encoding a particular scale and subsystem, where v_scsubsys = SVSS(sv).
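For the continuous-time case, the definition of μ above only states that transition probabilities can be derived from the rate matrix Q. One standard way of doing so is the embedded jump chain of a CTMC, in which the probability of jumping from s_i to s_j is the rate Q(s_i, s_j) divided by the total exit rate of s_i. The sketch below illustrates this under stated assumptions: the three states and their rates are hypothetical, Q is stored as a dictionary of off-diagonal rates, and treating a state without outgoing rates as a self-loop is a convention chosen here. The paper does not specify which derivation Mule uses.

```python
def jump_probabilities(Q):
    """Embedded jump-chain probabilities derived from off-diagonal rates.

    Q maps each state to a dict {successor: rate}; this layout is an
    assumption made for the sketch, not the paper's data structure.
    """
    P = {}
    for state, row in Q.items():
        # Total rate of leaving `state` (self-rates are ignored).
        exit_rate = sum(rate for succ, rate in row.items() if succ != state)
        if exit_rate == 0:
            # No way out: model the state as absorbing (convention).
            P[state] = {state: 1.0}
        else:
            P[state] = {
                succ: rate / exit_rate
                for succ, rate in row.items()
                if succ != state and rate > 0
            }
    return P


# Hypothetical rates, for illustration only.
Q = {
    "s0": {"s1": 2.0, "s2": 6.0},
    "s1": {"s2": 1.0},
    "s2": {},
}
P = jump_probabilities(Q)
print(P["s0"])  # {'s1': 0.25, 's2': 0.75}
```

Each row of the resulting P sums to one, so it can play the role of the transition probability function used in the discrete-time case.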

The multiscale architecture graph MA = (V_MA, E_MA) is employed to formally encode the hierarchical top-down structure of multiscale systems and is represented as a rooted (directed) tree, where V_MA represents the set of vertices and E_MA the set of directed edges. The main reason for choosing the rooted directed tree representation is that its structure is inherently hierarchical and therefore similar to the organization of biological organisms. We assume throughout that vertices higher in the tree correspond to coarse-grained scales, and vertices lower in the tree correspond to fine-grained scales. Each vertex v ∈ V_MA is encoded as a tuple (sc, subsys) where subsys represents a particular biological subsystem (e.g. heart) and sc its corresponding scale (e.g. organ). Both scales and subsystems are recorded by the MA graph to enable distinguishing between different scales (e.g. organ and cellular), and/or different subsystems (e.g. heart and liver) corresponding to the same scale (e.g. organ). Directed edges (v, v_i) ∈ E_MA, $i = \overline{1, m}$, link the biological subsystem represented by vertex v to all its m constituent subsystems from finer-grained scales represented by vertices v_i.

The assumption made here is that biological systems can be decomposed in a top-down manner from coarse-grained (e.g. population/organism) to fine-grained (e.g. intracellular/molecular) scales. Moreover at each scale (e.g. organ) one or multiple biological subsystems (e.g. heart and kidney) could be explicitly considered. The number and type of biological subsystems and/or scales considered differs depending on the biological question addressed. A description of how to construct the MA graph corresponding to a given biological system is given in S2 Text.

Considering that the MA graph is represented as a rooted directed tree, a strict partial order < can be defined over the set of vertices V_MA, where v₁ < v₂, for all v₁, v₂ ∈ V_MA, if v₁ ≠ v₂ and the unique path from the root to v₁ passes through v₂ (i.e. v₂ is a proper ancestor of v₁). Similarly a non-strict partial order ≤ can be defined over V_MA, where v₁ ≤ v₂ if the unique path from the root to v₁ passes through v₂, or v₁ = v₂. One of the main practical benefits of defining these partial orders is that they enable writing expressions for referring to all subsystems v_i of a system v_j (v_i ≤ v_j), and all ancestor/parent systems v_k of a subsystem v_l (v_l < v_k) in a concise manner. Therefore such expressions could be employed to write shorter formal specifications against which the computational models are verified.
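The two partial orders can be illustrated with a minimal sketch, assuming the MA tree is stored as a child-to-parent map. The vertices below (a rat organism with heart and liver organs and one cellular subsystem) are hypothetical and chosen only for illustration, as are the function names.

```python
# Hypothetical MA tree: child vertex -> parent vertex (the root has no entry).
# Each vertex is a (scale, subsystem) tuple, as in the text.
RAT = ("Organism", "Rat")
HEART = ("Organ", "Heart")
LIVER = ("Organ", "Liver")
MYOCYTE = ("Cellular", "Cardiomyocyte")

PARENT = {HEART: RAT, LIVER: RAT, MYOCYTE: HEART}
VERTICES = set(PARENT) | set(PARENT.values())


def ancestors(v):
    """Yield the vertices on the path from v's parent up to the root."""
    while v in PARENT:
        v = PARENT[v]
        yield v


def lt(v1, v2):
    """Strict order v1 < v2: v2 is a proper ancestor of v1."""
    return v2 in ancestors(v1)


def leq(v1, v2):
    """Non-strict order v1 <= v2: v1 equals v2 or v1 < v2."""
    return v1 == v2 or lt(v1, v2)


def subsystems(v):
    """All vertices u with u <= v, i.e. v and everything below it."""
    return {u for u in VERTICES if leq(u, v)}


print(lt(MYOCYTE, RAT))   # True: the rat is an ancestor of the myocyte
print(lt(HEART, LIVER))   # False: siblings are incomparable
print(sorted(subsystems(HEART)))
```

Siblings such as the heart and the liver are incomparable, which is why the relation is a partial rather than a total order.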

A simple illustrative example of how to construct a (discrete-time) MSSpDES model for a biological system spanning multiple levels of organization is given below.

Example 1 Simple illustrative example of how to construct an MSSpDES model

Let us assume that we would like to model the movement (considering the von Neumann neighbourhood relation) of a unicellular microorganism in a fixed size environment (here a discretised rectangular grid of size 2 × 2). In order to move, the cell requires energy which it can chemically convert from an abstractly denoted nutrient A; the chemical reaction for converting A to energy is A → Energy. If nutrient A is available intracellularly then it can be converted directly to energy. Otherwise it has to be assimilated from the environment first; the cell can only assimilate nutrients from the position of the discretised space which it currently occupies. The cell moves with probability 20%, converts A to energy with probability 30%, and assimilates A from the environment with probability 50%. When only one of these actions is possible in a given state, it is taken with probability 100%.

Although the system considered in this example is much simpler than a real-life one, it suffices to illustrate the principles of abstractly representing a multiscale stochastic spatial discrete-event system. Throughout this example a discrete time representation is employed.

The spatial state variables employed to describe the behaviour of the system are Cell (encoding the position of the cell in the discretised space) and A_extracellular (representing the distribution of nutrient A in the environment). Conversely the employed numeric state variables are A_intracellular (encoding the intracellular availability of nutrient A) and Energy (representing the cell’s energy supply). The considered subsystems and corresponding scales are energy production reaction network at the intracellular scale, microorganism at the cellular scale, and growth media at the environment scale. State variables associated with the energy production reaction network (intracellular scale) are A_intracellular and Energy, respectively Cell with the microorganism (cellular scale), and A_extracellular with the growth media (environment scale). In the initial state (S₀) of the system, depicted in Fig 2, the cell is positioned in the lower right part of the environment, A_extracellular is uniformly distributed across the entire environment (A_extracellular[i, j] = 1, for all $i, j = \overline{1, 2}$), and the initial levels of A_intracellular and Energy are zero.

Fig 2 — *Cell* and A_extracellular are the spatial state variables representing the position of the cell, respectively distribution of nutrient A in the environment. A_intracellular and *Energy* represent the intracellular availability of nutrient A, respectively energy.

Starting from the initial state S₀ the system can (in)directly transition to any of the states depicted in Fig 3.

Given that in S₀ the cell has no supplies of intracellular nutrient A or energy, the only possible action is for it to assimilate A from its environment (S₀ → S₁, probability 100%). Since only one supply of nutrient A is available the only possible next action is to convert the newly gained intracellular A supply to energy (S₁ → S₂, probability 100%). Once a supply of energy is available the cell can move either above (S₂ → S₄) or to its left (S₂ → S₃). The probability of moving to either of the neighbouring positions is therefore equal to 100% / 2 = 50%. Continuing from either state S₃ or S₄ the cell will try to assimilate new A nutrient supplies, which can be converted to energy and then used to move in the environment. This process is repeated multiple times until the cell reaches a state in which it has no A nutrients available extracellularly/intracellularly, respectively no supplies of energy (i.e. S₁₀, S₁₁, S₁₈, S₁₉, S₂₅, S₂₆). In such cases the cell becomes dormant and the system reaches its final state.

Using the notations above we formally define the corresponding MSSpDES model $M$ and (state) transition probability function P as follows:

$M$ = 〈S, T, μ, NSV, SpSV, NV, CSpV, MA, SVSS〉, where:
- S = {S₀, S₁, S₂, S₃, S₄, S₅, S₆, S₇, S₈, S₉, S₁₀, S₁₁, S₁₂, S₁₃, S₁₄, S₁₅, S₁₆, S₁₇, S₁₈, S₁₉, S₂₀, S₂₁, S₂₂, S₂₃, S₂₄, S₂₅, S₂₆}.
- T = ℤ₊ is the set representing time.
- μ is the function used to compute the probability associated with a set of paths Paths(S₀) starting from S₀ having a common finite prefix σ_finite = {s₀, s₁, …, s_n}, which means that for all σ ∈ Paths(S₀), $\sigma[i] = \sigma_{finite}[i] = s_{i}$, $i = \overline{0, n}$, where σ[i] denotes the i-th state in σ. The probability value corresponding to Paths(S₀) is computed by multiplying the probabilities of the state transitions associated with the common finite path prefix σ_finite. For instance given the finite state sequence σ_finite = {S₀, S₁, S₂, S₃, S₅, S₇, S₁₀}, μ({σ ∈ Paths(S₀) | σ[i] = σ_finite[i], 0 ≤ i ≤ 6}) = P(S₀, S₁) ⋅ P(S₁, S₂) ⋅ P(S₂, S₃) ⋅ P(S₃, S₅) ⋅ P(S₅, S₇) ⋅ P(S₇, S₁₀), where the probability values P(S_i, S_j) with S_i, S_j ∈ S are recorded by the transition probability function P provided below.
- NSV = {A_intracellular, Energy}, and NV is the function used to compute the value of A_intracellular and Energy in a given state of a computation path. The values of the numeric state variables for each state (e.g. NV(S₀, Energy) = 0) are depicted in Fig 3 and therefore will not be explicitly restated here.
- SpSV = {Cell, A_extracellular}, and CSpV = {SpV} is the collection containing the spatial value assignment function SpV used to evaluate Cell and A_extracellular in a given state of a computation path. The values of the spatial state variables for each state (e.g. SpV(S₀, Cell) = [0, 0; 0, 1]) are depicted in Fig 3 and therefore will not be explicitly restated here.
- MA is the multiscale architecture graph depicted in Fig 4 encoding the hierarchical organization of the considered subsystems, namely the growth media (environment scale), the microorganism (cellular scale) and the energy production reaction network (intracellular scale).
- SVSS is the state variable scale and subsystem assignment function which associates state variables to particular subsystems encoded as vertices in the MA graph. The values returned by SVSS for the considered state variables are: SVSS(A_intracellular) = (Intracellular, EnergyProductionReactionNetwork), SVSS(Energy) = (Intracellular, EnergyProductionReactionNetwork), SVSS(Cell) = (Cellular, Microorganism), and SVSS(A_extracellular) = (Environment, GrowthMedia).
P is the transition probability function which records the probability of transitioning between any two states of the system s_i, s_j ∈ S. Due to page size constraints it is not possible to represent P explicitly. Instead only its non-zero entries are given below: P(S₀, S₁) = 100%, P(S₁, S₂) = 100%, P(S₂, S₃) = 50%, P(S₂, S₄) = 50%, P(S₃, S₅) = 100%, P(S₄, S₆) = 100%, P(S₅, S₇) = 100%, P(S₆, S₈) = 100%, P(S₇, S₉) = 50%, P(S₇, S₁₀) = 50%, P(S₈, S₁₁) = 50%, P(S₈, S₁₂) = 50%, P(S₉, S₁₃) = 100%, P(S₁₂, S₁₄) = 100%, P(S₁₃, S₁₅) = 100%, P(S₁₄, S₁₆) = 100%, P(S₁₅, S₁₇) = 50%, P(S₁₅, S₁₈) = 50%, P(S₁₆, S₁₉) = 50%, P(S₁₆, S₂₀) = 50%, P(S₁₇, S₂₁) = 100%, P(S₂₀, S₂₂) = 100%, P(S₂₁, S₂₃) = 100%, P(S₂₂, S₂₄) = 100%, P(S₂₃, S₂₅) = 50%, P(S₂₃, S₂₆) = 50%, P(S₂₄, S₂₅) = 50%, P(S₂₄, S₂₆) = 50%.
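The non-zero entries of P listed above are enough to evaluate μ for any finite path prefix. The following sketch encodes them as a dictionary keyed by pairs of state indices (so that `(2, 3)` stands for the transition S₂ → S₃) and multiplies the transition probabilities along the prefix used in the definition of μ. The function and variable names are illustrative choices made here, not part of Mule.

```python
import math

# Non-zero entries of P from Example 1; (i, j) stands for S_i -> S_j.
P = {
    (0, 1): 1.0, (1, 2): 1.0, (2, 3): 0.5, (2, 4): 0.5,
    (3, 5): 1.0, (4, 6): 1.0, (5, 7): 1.0, (6, 8): 1.0,
    (7, 9): 0.5, (7, 10): 0.5, (8, 11): 0.5, (8, 12): 0.5,
    (9, 13): 1.0, (12, 14): 1.0, (13, 15): 1.0, (14, 16): 1.0,
    (15, 17): 0.5, (15, 18): 0.5, (16, 19): 0.5, (16, 20): 0.5,
    (17, 21): 1.0, (20, 22): 1.0, (21, 23): 1.0, (22, 24): 1.0,
    (23, 25): 0.5, (23, 26): 0.5, (24, 25): 0.5, (24, 26): 0.5,
}
STATES = range(27)


def prefix_probability(prefix):
    """mu of all paths sharing `prefix`: product of its transition probabilities."""
    return math.prod(P.get(pair, 0.0) for pair in zip(prefix, prefix[1:]))


# States without outgoing transitions are the final (dormant) states.
FINAL = {s for s in STATES if all(src != s for src, _ in P)}

# Outgoing probabilities of every non-final state should sum to one.
row_sums = {
    s: sum(p for (src, _), p in P.items() if src == s)
    for s in STATES
    if s not in FINAL
}

print(prefix_probability([0, 1, 2, 3, 5, 7, 10]))  # 0.25
print(sorted(FINAL))  # [10, 11, 18, 19, 25, 26]
```

The prefix S₀, S₁, S₂, S₃, S₅, S₇, S₁₀ contains two 50% branching points, which gives 0.5 ⋅ 0.5 = 0.25, and the states without outgoing transitions coincide with the final states named in the example.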

Fig 4 — Each vertex in the graph (e.g. (Environment, GrowthMedia)) corresponds to a subsystem (e.g. growth media) and its associated scale (e.g. environment). Directed edges between vertices (e.g. ((Environment, GrowthMedia), (Cellular, Microorganism))) indicate how one subsystem from a coarse-grained scale (e.g. (Environment, GrowthMedia)) can be decomposed in one or multiple subsystems from more fine-grained scales (e.g. (Cellular, Microorganism)).

In spite of the simplicity of the scenario described above the same model development principles apply to more complex multiscale real-life systems. However due to the inherent complexity of such systems the size of the state space is expected to be larger.

The main reason for encoding multiscale stochastic biological systems using a low-level modelling formalism such as MSSpDES is to enable our model checking approach to be employed for the general class of SDESs, which MSSpDESs extend, instead of restricting it to a particular high-level modelling formalism.

Although MSSpDES models are restricted to a two-dimensional spatial representation (see the codomain of the spatial value assignment functions SpV_i ∈ CSpV), extending the models from a two-dimensional to, for instance, a three-dimensional spatial representation requires only replacing the codomain $\mathbb{R}^{m_{i} \times n_{i}}$ of each SpV_i ∈ CSpV with $\mathbb{R}^{m_{i} \times n_{i} \times p_{i}}$.

MSSpDESs are multiscale extensions of SSpDESs 〈S, Tr, μ, NSV, SpSV, NV, SpV〉, where the semantics of S, μ, NSV, SpSV and NV is preserved, the transition rates matrix Tr was replaced by the set T representing time and the state transition probabilities are defined by a transition probability function P for discrete-time systems, respectively are derived from a transition rates matrix Q for continuous-time systems. The single spatial value assignment function SpV in an SSpDES is replaced by CSpV, the MA graph is defined to explicitly encode the hierarchical representation of the systems under consideration, and SVSS is introduced to associate state variables with particular scales and subsystems encoded as vertices in the MA graph. The main advantage of defining MSSpDESs as extensions of SSpDESs is backwards compatibility. SSpDESs can be encoded as MSSpDESs where the set T and probability measure μ are defined accordingly, CSpV contains a single element SpV, and the MA graph contains only one vertex to which all state variables are assigned using SVSS. Due to this, multiple SSpDESs employing the same representation of time can be easily integrated into a single MSSpDES by defining the set T and probability measure μ accordingly, gathering all spatial value assignment functions SpV into a single collection, constructing a corresponding MA graph, mapping state variables to appropriate vertices in the graph and adding interactions between submodels.

Multiscale spatio-temporal analysis

Detection and analysis of spatial entities

Let us denote execution traces (or time series data) generated by MSSpDES models as σ = {(s₀, t₀), (s₁, t₁), …}, where s₀, s₁, … represent the states of the execution trace and t₀, t₁, … the time durations spent in each corresponding state. Typically in case of a continuous-time representation the time durations are represented by non-negative real values t₀, t₁, … ∈ ℝ₊, whereas in case of a discrete-time representation by non-negative integer values t₀, t₁, … ∈ ℤ₊.

Given an execution trace σ = {(s₀, t₀), (s₁, t₁), …}, a numeric state variable nsv and a spatial state variable spsv, it is possible to reason about how the values of nsv and spsv change over time by evaluating them for each state in σ using NV(s₀, nsv), NV(s₁, nsv), …, respectively SpV(s₀, spsv), SpV(s₁, spsv), …. Although the sequence SpV(s₀, spsv), SpV(s₁, spsv), … describes how the entire discretised spatial domain $DSD = \mathbb{R}^{m_{spsv} \times n_{spsv}}$ corresponding to spsv changes over time, we are interested in reasoning about how emergent spatial structures, called spatial entities, identified by subsets of positions in DSD change over time. For instance assuming that spsv records the cellular density in a 2D environment DSD and that we would like to reason about spatial entities denoting multicellular populations, then only the subsets comprising at least x (e.g. x = 20) neighbouring positions in DSD having the cellular density value greater than 0 would be considered. To reason about such spatial entities there is a need for an additional processing step which automatically detects and analyses how the spatial entities change over time.

This processing step is denoted as the multiscale spatio-temporal analysis and its associated workflow is depicted in Fig 5. The first step in the workflow is to split up the time series data corresponding to all spatial state variables such that each resulting time subseries corresponds to a single subsystem and scale. Next each time subseries is passed to a uniscale spatio-temporal analysis module which automatically detects, analyses and annotates spatial entities with their corresponding scale and subsystem. Finally, during the last step the collections of detected spatial entities are merged such that spatial entities corresponding to the same time point are grouped together.
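The three workflow steps (split, analyse, merge) can be illustrated with a minimal Python sketch. It is not Mule's implementation: the record layout `(time, scaleAndSubsystem, grid)`, the function name `multiscale_analysis` and the stand-in analysis function `count_occupied` are assumptions introduced here purely for illustration.

```python
def multiscale_analysis(records, analyse):
    """Split records per scale and subsystem, analyse each subseries, merge per time point.

    records: iterable of (time, scale_and_subsystem, grid) tuples (hypothetical layout).
    analyse: uniscale analysis function mapping a grid to a list of entity dicts.
    """
    # Step 1: split the time series so each subseries covers one scale and subsystem.
    by_scale = {}
    for time, scale, grid in records:
        by_scale.setdefault(scale, []).append((time, grid))

    # Steps 2 and 3: analyse every subseries, annotate the detected entities with
    # their scale and subsystem, and group entities belonging to the same time point.
    merged = {}
    for scale, subseries in by_scale.items():
        for time, grid in subseries:
            bucket = merged.setdefault(time, [])
            for entity in analyse(grid):
                bucket.append({"scaleAndSubsystem": scale, **entity})
    return dict(sorted(merged.items()))


def count_occupied(grid):
    """Stand-in uniscale analysis: report one entity whose area is the number of positive positions."""
    occupied = sum(1 for row in grid for value in row if value > 0)
    return [{"area": occupied}] if occupied else []


records = [
    (0, "Organ.Liver", [[0, 1], [1, 1]]),
    (0, "Tissue.DamagedLiverTissue", [[0, 0], [0, 2]]),
    (1, "Organ.Liver", [[0, 0], [0, 0]]),
]
result = multiscale_analysis(records, count_occupied)
```

Time points at which no spatial entity is detected are kept with an empty collection, which reflects the fact that the number of spatial entities may change over time.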

The uniscale spatio-temporal analysis module assumes that the problem of detecting and analysing spatial entities at a given time point is transformed into an image processing problem. This transformation is possible because the spatial domain is assumed to be discretised and (the value of) each position in the discretised space can be mapped to (the intensity of) a pixel in an image. One of the main advantages of this is that existing image processing approaches for detecting and analysing objects in images can be directly reused.

We define parameterized detection and analysis modules for two generic types of spatial entities, namely regions and clusters [57].

Regions represent subsets of neighbouring positions in the discretised space (considering the Moore neighbourhood relation) with associated values (e.g. concentrations) above a user-defined threshold. For instance considering a computational model that encodes the evolution of a population of cells in a 2D environment, regions could represent patches of neighbouring cells where the cellular density is greater than a user-defined value. More formally a region R is defined with respect to a state s and spatial state variable spsv as a subset $\{0, 1\}^{m_{spsv} \times n_{spsv}}$ (i.e. positions of the discretised space included in R are marked with 1, all others with 0) of neighbouring positions in SpV(s, spsv) such that for all positions of the discretised space (i, j) ∈ R marked with 1, the corresponding value SpV(s, spsv)[i, j] ≥ THRESHOLD, and the number of positions included in R is greater than ϵ_size, where THRESHOLD ∈ ℝ, ϵ_size ∈ ℕ are user-defined parameters. The module for detecting and analysing regions is an implementation of Algorithm 1 in [57] using image processing functions from the open source Computer Vision library OpenCV [62].
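The region definition above can be made concrete with a short Python sketch. It deliberately does not reproduce Algorithm 1 of [57] or the OpenCV based module; instead it uses a plain breadth-first flood fill over the Moore neighbourhood, and the function name `detect_regions` and the example grid are assumptions made for illustration only.

```python
from collections import deque


def detect_regions(grid, threshold, eps_size):
    """Return regions as sets of (i, j) positions.

    A region is a maximal group of Moore-neighbouring positions whose value is
    >= threshold; only regions with more than eps_size positions are kept.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for i in range(rows):
        for j in range(cols):
            if seen[i][j] or grid[i][j] < threshold:
                continue
            # Breadth-first flood fill over the 8 Moore neighbours.
            seen[i][j] = True
            queue, region = deque([(i, j)]), set()
            while queue:
                r, c = queue.popleft()
                region.add((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and not seen[nr][nc]
                                and grid[nr][nc] >= threshold):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            if len(region) > eps_size:
                regions.append(region)
    return regions


# Hypothetical cellular density values on a 3 x 4 discretised space.
density = [
    [0.0, 0.6, 0.0, 0.0],
    [0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.9],
]
regions = detect_regions(density, threshold=0.5, eps_size=1)
```

In this example the positions (0, 1) and (1, 0) are diagonal (Moore) neighbours and form one region, whereas the isolated position (2, 3) is discarded because its size does not exceed ϵ_size = 1.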

Conversely clusters represent subsets of neighbouring regions in the discretised space where the maximum distance between two neighbouring regions is bounded above by a user-defined threshold. For instance considering again the computational model encoding the evolution of a population of cells, clusters could represent groups of patches of cells where the distance between neighbouring patches is less than or equal to a user-defined threshold value. Clusters are computed using an improved version of the DBSCAN algorithm [63]. The output of this algorithm depends on the given set of regions REG, the pseudometric d used to compute the distance between any two regions in REG, the maximum distance ϵ_distance between two neighbouring regions, and the minimum number of regions ϵ_size neighbouring a core region, where a region is denoted as core if its number of neighbouring regions is greater than or equal to ϵ_size. The pseudometric d considered here is defined with respect to a set of regions REG, d: REG × REG → ℝ₊, $d(A, B) = \sqrt{(x_{B} - x_{A})^{2} + (y_{B} - y_{A})^{2}}$, where (x_A, y_A) and (x_B, y_B) are the centroids of regions A, respectively B. Moreover two regions REG₁, REG_n ∈ REG are called density-reachable if there exists a sequence of regions REG₁, REG₂, …, REG_n ∈ REG, where i ≥ 1 and n ≥ 2, such that for all i < n, REG_i is a core region and REG_{i+1} is a neighbour of REG_i. Using the notations above a cluster C is defined as a maximal subset $\{0, 1\}^{m_{1} \times n_{1}} \times \{0, 1\}^{m_{2} \times n_{2}} \times \dots \times \{0, 1\}^{m_{p} \times n_{p}}$ (i.e. regions’ positions included in C are marked with 1, all others with 0) of the given set of regions REG = {REG₁, REG₂, …, REG_p} such that all regions in C are density-reachable from an arbitrary core region of C [63].
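As an illustration of the definitions above, the following Python sketch groups regions into clusters with the textbook DBSCAN procedure applied to region centroids. It is not the improved DBSCAN variant used in Mule; the function name `cluster_regions` is hypothetical, and the sketch assumes that a region is not counted among its own neighbours when deciding whether it is a core region.

```python
import math


def cluster_regions(centroids, eps_distance, eps_size):
    """Group regions (given by their centroids) into clusters of region indices."""
    n = len(centroids)

    def dist(a, b):
        # Euclidean distance between the centroids of regions a and b.
        return math.hypot(centroids[b][0] - centroids[a][0],
                          centroids[b][1] - centroids[a][1])

    # Assumption: a region is not its own neighbour.
    neighbours = [[j for j in range(n) if j != i and dist(i, j) <= eps_distance]
                  for i in range(n)]
    is_core = [len(neighbours[i]) >= eps_size for i in range(n)]

    assigned = [False] * n
    clusters = []
    for i in range(n):
        if assigned[i] or not is_core[i]:
            continue
        # Collect every region that is density-reachable from core region i.
        cluster, stack = set(), [i]
        assigned[i] = True
        while stack:
            k = stack.pop()
            cluster.add(k)
            if not is_core[k]:
                continue  # non-core regions join the cluster but do not extend it
            for j in neighbours[k]:
                if not assigned[j]:
                    assigned[j] = True
                    stack.append(j)
        clusters.append(cluster)
    return clusters


# Hypothetical centroids of four detected regions.
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
clusters = cluster_regions(centroids, eps_distance=1.5, eps_size=1)
```

Here the first three regions are chained by pairwise distances of 1.0 and form a single cluster, while the fourth region has no neighbour within ϵ_distance and is therefore not assigned to any cluster.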

Each detected region/cluster is characterized by a set of general quantitative spatial measures that enable describing how the spatial entity changes over time. A description of the set of spatial measures considered is given in Table 1.

Table 1. Description of the spatial measures considered.

Name	Values	Description
clusteredness	[0, 1]	Indicates if regions contain holes (clusteredness <1) or not (clusteredness = 1), respectively measures if the average distance between all positions considered in a cluster is small (clusteredness →1) or large (clusteredness →0).
density	[0, 1]	Computes the average value associated with the discretised spatial positions defining a region/cluster.
area	ℝ₊	Represents the number of positions in the discretised space associated with a region/cluster.
perimeter	ℝ₊	Represents the length of the outer contour of a region, respectively the convex hull of a cluster.
distance from the origin	ℝ₊	Computes the minimum distance between the outer contour of a region, respectively the convex hull of a cluster, and the centre point of the discretised spatial domain.
angle	[0, 360] (degrees)	Determined by the lines that pass through the discretised spatial domain’s centre point and are tangent to a region’s outer contour, respectively cluster’s convex hull.
triangle/rectangle/circle measure	[0, 1]	Indicates if the shape of the region’s outer contour, respectively cluster’s convex hull, is similar to a triangle/rectangle/circle (triangle/rectangle/circle measure →1) or not (triangle/rectangle/circle measure →0).
centroid Ox/Oy coordinate	ℝ₊	Represents the Ox/Oy coordinate of the geometric centre of the region’s outer contour, respectively cluster’s convex hull.


Each spatial measure considered has a name (column “Name”), an associated range of valid values (column “Values”) and a corresponding description (column “Description”). In case of spatial measures which have similar semantics the table rows have been merged and the spatial measure names are separated by the “/” symbol (see last two table rows).

The spatial entity types and measures were chosen relative to the case studies considered here. Therefore depending on case study specific requirements different sets of spatial entity types and/or measures may need to be employed. For instance, extending the spatial representation from two to three dimensions requires employing appropriate types of spatial entities (e.g. 3D structure) and measures (e.g. volume), and updating the multiscale spatio-temporal analysis module (implementation) accordingly. Moreover (the value corresponding to) each position in the discretised space is mapped to (the intensity of) a voxel, rather than a pixel in an image. The model checking approach is adapted automatically to different spatial entity types and/or measures using the spatio-temporal meta model checking concept described later.

The output of the multiscale spatio-temporal analysis is time series data describing how the values of the spatial measures considered change over time for each detected spatial entity, scale and subsystem.

Multiscale Spatial Temporal Markup Language

The MSSpDES model simulation results are represented by time series data produced by the multiscale spatio-temporal analysis and time series data describing the evolution over time of numeric state variables values.

To represent these model simulation results in a uniform manner which facilitates exchange of data sets and integration of software tools a corresponding standard data representation format is required. To the best of our knowledge such a standard data representation format does not exist.

One of the main requirements for the data representation format is that it supports recording different numbers of values at different time points because the collection of (emergent) spatial entities considered could potentially change over time. Traditional tabular (e.g. csv) representation formats are not suitable because they assume that the number of recorded values (or columns) is constant throughout the entire time series. Moreover defining a representation format similar to csv that does not annotate numeric values with their meaning could be potentially difficult to interpret.

For portability, structuring and readability purposes an eXtensible Markup Language (XML) based standard representation format is defined called Multiscale Spatial Temporal Markup Language (MSTML). The rules and constraints for the structure of MSTML files are formalised in XML Schema Definition (xsd) files. The latest version of the MSTML format is made available at http://mule.modelchecking.org/standards, a description of the format is given in S3 Text, and an example of an MSTML formatted file is depicted in Listing 1.

Listing 1. An example MSTML file recording multiscale spatio-temporal time series data.

<?xml version="1.0" encoding="utf-8"?>
<experiment>
    <timepoint value="1">
        <spatialEntity spatialType="cluster" scaleAndSubsystem="Organ.Liver">
            <clusteredness>0.01</clusteredness>
            <density>0.4</density>
            <area>15</area>
            <perimeter>28</perimeter>
            <distanceFromOrigin>81</distanceFromOrigin>
            <angle>10.5</angle>
            <triangleMeasure>0.5</triangleMeasure>
            <rectangleMeasure>1.0</rectangleMeasure>
            <circleMeasure>0.1</circleMeasure>
            <centroidX>703.4999</centroidX>
            <centroidY>118.087</centroidY>
        </spatialEntity>
        <numericStateVariable scaleAndSubsystem="Cellular.Hepatocyte">
            <name>dysfunction</name>
            <value>0.1</value>
        </numericStateVariable>
    </timepoint>
    …
</experiment>
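To illustrate how such a file could be consumed by downstream tools, the sketch below reads an MSTML document with Python's standard `xml.etree.ElementTree` module. The element and attribute names are taken from Listing 1; the function name `read_mstml`, the shortened sample document and the returned dictionary layout are assumptions made for illustration, and the sketch does not validate the input against the MSTML xsd files.

```python
import xml.etree.ElementTree as ET

# Shortened sample modelled on Listing 1 (only two spatial measures are kept).
MSTML = b"""<?xml version="1.0" encoding="utf-8"?>
<experiment>
    <timepoint value="1">
        <spatialEntity spatialType="cluster" scaleAndSubsystem="Organ.Liver">
            <area>15</area>
            <perimeter>28</perimeter>
        </spatialEntity>
        <numericStateVariable scaleAndSubsystem="Cellular.Hepatocyte">
            <name>dysfunction</name>
            <value>0.1</value>
        </numericStateVariable>
    </timepoint>
</experiment>"""


def read_mstml(xml_bytes):
    """Parse MSTML content into a list with one dictionary per time point."""
    root = ET.fromstring(xml_bytes)
    timepoints = []
    for tp in root.findall("timepoint"):
        entities = []
        for se in tp.findall("spatialEntity"):
            # Every child element of a spatial entity is one spatial measure.
            measures = {child.tag: float(child.text) for child in se}
            entities.append({
                "spatialType": se.get("spatialType"),
                "scaleAndSubsystem": se.get("scaleAndSubsystem"),
                "measures": measures,
            })
        numeric = {}
        for nsv in tp.findall("numericStateVariable"):
            key = (nsv.get("scaleAndSubsystem"), nsv.findtext("name"))
            numeric[key] = float(nsv.findtext("value"))
        timepoints.append({
            "time": float(tp.get("value")),
            "entities": entities,
            "numeric": numeric,
        })
    return timepoints


trace = read_mstml(MSTML)
```

Because each time point carries its own list of spatial entities, the number of recorded values may differ between time points, which is the requirement that motivated an XML based format instead of a tabular one.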

For model checking purposes the number of MSTML files #MSTML generated for an MSSpDES model, assuming fixed parameter values, depends on whether the model is deterministic (#MSTML = 1) or stochastic (#MSTML ≥ 1), and on whether the required level of confidence for the model checking result is high (e.g. 99%) or low (e.g. 70%).

To determine the correctness of a model the model checker verifies if its behaviour captured by a corresponding set of MSTML files conforms to a given formal specification.

Formal specification

The temporal logic employed to write the formal specification needs to enable reasoning about how values of numeric state variables and/or spatial measures, which are the state variables considered, are expected to change over time and multiple scales.

To the best of our knowledge the only formal language for reasoning about numeric and spatial properties corresponding to computational models of biological systems is called Bounded Linear Spatial Temporal Logic (BLSTL), which we have previously introduced in [57]. One of the main limitations of BLSTL is that it does not enable different scales to be explicitly distinguished. Therefore it is not possible to relate how changes at one scale reflect at another scale and vice versa.

Bounded Linear Multiscale Spatial Temporal Logic

To address the issue of relating changes between scales we define the Bounded Linear Multiscale Spatial Temporal Logic (BLMSTL) which enables explicitly distinguishing between state variables corresponding to different scales and subsystems. Throughout it is assumed that the scales and subsystems considered are the same as the ones defined in the MA graph of the corresponding MSSpDES model. Although MSSpDESs can be employed to represent both discrete- and continuous-time stochastic discrete-event systems, the semantics of a temporal logic usually varies with the considered representation of time. Therefore in this paper we restrict the semantics of BLMSTL to a continuous-time representation (similarly to CSL [64] and in contrast to BLSTL). However adapting BLMSTL to a discrete-time representation requires changing only the semantics of the time dependent operators, whereas the definition of all other atomic propositions (related to different scales and subsystems, numeric state variables, and spatial entities) is preserved.

BLMSTL enables reasoning about how collections, or more formally bags, of spatial measures values from one time point, and collections of numeric state variables and spatial measures values corresponding to multiple time points change over time using statistical functions. Transfer relations between state variables from the same and/or different scales are encoded using standard arithmetic functions. An informal natural language description of the most relevant BLMSTL features is given below; see S4 Text for a formal definition of the BLMSTL syntax and semantics.

Similarly to BLSTL, BLMSTL employs temporal and Boolean operators for describing how a system changes over time, respectively for composing simple logic statements into more complex ones. BLMSTL atomic propositions enable describing relations between numeric state variables and/or spatial measures associated to subsets of spatial entities.

Numeric state variables are specified by their name (e.g. heartBeat) and their associated scale and subsystem (e.g. (organ, heart)); the corresponding BLMSTL notation for specifying scales and subsystems is scale.subsystem (e.g. organ.heart). Conversely spatial measures associated with subsets of spatial entities are specified by their spatial measure type (e.g. area), associated spatial entity type (e.g. regions) and their corresponding scale and subsystem. Similarly to MSTML the sets of spatial entity types and spatial measures considered are SET_considered = {clusters, regions}, respectively SM_considered = {clusteredness, density, area, perimeter, distanceFromOrigin, angle, triangleMeasure, rectangleMeasure, circleMeasure, centroidX, centroidY}.

Instead of considering all spatial entities of a given type it is possible to select only a subset of spatial entities by imposing constraints over the spatial measure values (e.g. spatial entities with area > 10), by using subset operators \ (difference), ∩ (intersection) and ∪ (union), or specifying one or multiple scales and subsystems using the partial orders < and ≤ defined over the set of vertices V_MA (e.g. spatial entities whose corresponding scale and subsystem < (organ, heart)).

The resulting collection of spatial measures values corresponding to multiple spatial entities (e.g. value of the area for all detected spatial entities) can be described using unary (e.g. mean), binary (e.g. covariance) or binary quantile (e.g. percentile) statistical functions. These statistical functions can be additionally employed to reason about collections of numeric state variables and spatial measures values corresponding to multiple time points (e.g. the value of numeric state variable X for all time points in the time interval [0, 100]). By considering different numbers of time points for different state variables it is possible, for instance, to describe how values corresponding to one time point (and a coarse-grained scale) relate to other values corresponding to multiple time points (and a fine-grained scale), or vice versa.

Transfer functions defined over state variables from different scales can be encoded using unary (e.g. square root) and binary (e.g. add) arithmetic functions. For instance if the value of a state variable sv_cg from a coarse-grained scale is equal to the arithmetic mean of four state variables sv_fg₁, sv_fg₂, sv_fg₃, sv_fg₄ from a more fine-grained scale, this can be written as sv_cg = (sv_fg₁+sv_fg₂+sv_fg₃+sv_fg₄)/4; in BLMSTL “+” and “/” would be replaced by the arithmetic functions add, respectively div.

Illustrative examples of statements written both in natural language and BLMSTL are given below. For simplicity the number of scales and subsystems explicitly specified is two in all examples.

Natural language: Always during the time interval [0, 95] if the concentration of EGFR (corresponding to scale and subsystem (Intracellular, RasERKPathway)) increases over 20 M, then the cancerous cell (corresponding to scale and subsystem (Cellular, Cancerous)) will divide i.e. the cell count will increase.

BLMSTL: G [0, 95] (({EGFR}(scaleAndSubsystem = Intracellular.RasERKPathway) > 20) ⇒ (d(count(density(filter(regions, scaleAndSubsystem = Cellular.Cancerous)))) > 0)).

Natural language: If the concentration of drug X (corresponding to scale and subsystem (Organism, Human)) eventually increases during time interval [5, 10], then the area of the aorta cross section (corresponding to scale and subsystem (OrganSystem, Aorta)) will be larger during time interval [10, 30] than [0, 10].

BLMSTL: (F [5, 10] d({X}(scaleAndSubsystem = Organism.Human)) > 0) ⇒ (min([10, 30] min(area(filter(regions, scaleAndSubsystem = OrganSystem.Aorta)))) > max([0, 10] max(area(filter(regions, scaleAndSubsystem = OrganSystem.Aorta))))).

Natural language: Always during the time interval [0, 100] the liver dysfunction measure (corresponding to scale and subsystem (Organ, Liver)) is equal to the average density of damaged liver tissues (corresponding to scales and subsystems ≤ (Tissue, DamagedLiverTissue)). The assumption made here is that the density value represents the degree of damage suffered by the liver tissue.

BLMSTL: G [0, 100] ({LiverDysfunction}(scaleAndSubsystem = Organ.Liver) = avg(density(filter(regions, scaleAndSubsystem ≤ Tissue.DamagedLiverTissue)))).
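To give an intuition of how such statements are evaluated against a single execution trace, the following Python sketch checks a simplified version of the third example. It is not the BLMSTL semantics defined in S4 Text: the trace is assumed to be a list of (time, state) pairs, the bounded operators are assumed to quantify over the recorded time points falling inside the interval, an empty collection of densities is assumed to falsify the equality, and all names (`globally`, `eventually`, the state keys) are hypothetical.

```python
import math
from statistics import mean


def globally(trace, start, end, predicate):
    """G [start, end]: the predicate holds at every recorded time point in the interval."""
    return all(predicate(state) for time, state in trace if start <= time <= end)


def eventually(trace, start, end, predicate):
    """F [start, end]: the predicate holds at some recorded time point in the interval."""
    return any(predicate(state) for time, state in trace if start <= time <= end)


def dysfunction_matches_density(state):
    """{LiverDysfunction} = avg(density(regions)) for the already filtered regions."""
    densities = state["damaged_tissue_region_densities"]
    # Assumption: the equality is false when no region was detected.
    return bool(densities) and math.isclose(state["LiverDysfunction"], mean(densities))


# Hypothetical execution trace with three time points.
trace = [
    (0, {"LiverDysfunction": 0.2, "damaged_tissue_region_densities": [0.1, 0.3]}),
    (50, {"LiverDysfunction": 0.5, "damaged_tissue_region_densities": [0.5]}),
    (100, {"LiverDysfunction": 0.4, "damaged_tissue_region_densities": [0.2, 0.4, 0.6]}),
]

holds = globally(trace, 0, 100, dysfunction_matches_density)
ever_above = eventually(trace, 0, 100, lambda s: s["LiverDysfunction"] > 0.45)
always_above = globally(trace, 0, 100, lambda s: s["LiverDysfunction"] > 0.45)
```

The comparison uses a floating point tolerance rather than exact equality, which is a design choice of this sketch and not something prescribed by the logic.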

To enable the explicit encoding of the probability with which a BLMSTL statement is expected to hold, a probabilistic extension of BLMSTL called Probabilistic Bounded Linear Multiscale Spatial Temporal Logic is defined.

Probabilistic Bounded Linear Multiscale Spatial Temporal Logic

A Probabilistic Bounded Linear Multiscale Spatial Temporal Logic (PBLMSTL) property ϕ is a logic property of the form P_⋈θ[ψ] where ⋈ ∈ {<, ≤, ≥, >}, θ ∈ (0, 1) and ψ is a BLMSTL property.

An illustrative example of a natural language probabilistic statement mapped into PBLMSTL is given below:

Natural language: The probability is greater than 0.99 that always during the time interval [0, 95] if the concentration of EGFR (corresponding to scale and subsystem (Intracellular, RasERKPathway)) increases over 20 M, then the cancerous cell (corresponding to scale and subsystem (Cellular, Cancerous)) will divide i.e. the cell count will increase.

PBLMSTL: P > 0.99 [G [0, 95] (({EGFR}(scaleAndSubsystem = Intracellular.RasERKPathway) > 20) ⇒ (d(count(density(filter(regions, scaleAndSubsystem = Cellular.Cancerous)))) > 0))].

A PBLMSTL property ϕ ≡ P_⋈θ[ψ] holds for an MSSpDES $M$ if and only if the probability of ψ to hold for a model simulation is ⋈θ. Therefore in order to determine the truth value of a PBLMSTL property ϕ the likelihood of ψ being true needs to be computed.

Model checking

The multiscale spatio-temporal model checking problem is to automatically verify if an MSSpDES $M$ satisfies a PBLMSTL property ϕ.

In order to solve the model checking problem only approximate probabilistic model checking approaches are considered throughout. As illustrated in Table 2 the approaches considered are either Bayesian or frequentist, and estimate or hypothesis testing based; a brief description of each approach was given in our previous work [57] and will not be restated here.

Table 2. Considered approximate probabilistic model checking approaches.

Name	Type	Input	Description	Sample size	Ref.
Chernoff-Hoeffding bounds based	FE	ϵ, δ	The absolute difference between the estimated p and true p′ probability of ψ to hold is greater than ϵ with probability less than δ (i.e. P[|p − p′| > ϵ] < δ).	$n = \frac{4}{ϵ^{2}} \log\left(\frac{2}{δ}\right)$	[65]
Improved frequentist statistical hypothesis testing	FH	α, β	Wald’s sequential probability ratio test [66] is employed to decide if the null hypothesis H₀ is rejected in favour of the alternative hypothesis H₁ considering the upper bounds on the probability of type I and type II errors α, respectively β.	The value of n is determined during the execution of the model checking approach considering α, β and the number and order of MSTML files against which ψ evaluates true; see ([67] [p. 21]) for an approach on how to compute an upper bound for n.	[59, 68]
Probabilistic black-box	FH	-	The p-value associated with the null and alternative hypotheses H₀, respectively H₁ is computed after evaluating the n MSTML files against ψ. The hypothesis with the lowest corresponding p-value holds.	n > 0	[69, 70]
Bayesian mean and variance based	BE	α, β, T	The probability ρ and variance ν of ψ to hold are estimated considering the given MSTML files and the Beta prior parameters α and β. New MSTML files are evaluated against ψ until the condition ν < T holds.	The value of n is determined during the execution of the model checking approach considering α, β, T and the number and order of MSTML files against which ψ evaluates true.	[71]
Bayesian statistical hypothesis testing	BH	α, β, T	A measure $B$ of confidence in the null hypothesis H₀ relative to the alternative hypothesis H₁ is computed considering the Beta prior parameters α and β. New MSTML files are evaluated against ψ until either $B > T$ or $B < 1 / T$ .	The value of n is determined during the execution of the model checking approach considering α, β, T and the number and order of MSTML files against which ψ evaluates true.	[72, 73]


Each table body row corresponds to a different approximate probabilistic model checking approach. The columns from left to right record the name, type (i.e. F—Frequentist, B—Bayesian, E—Estimate, H—Hypothesis testing), input parameters (excluding ϕ and MSTML files), description, sample size (i.e. n) and reference corresponding to a model checking approach. The null (i.e. H₀) and alternative (i.e. H₁) hypotheses represent ϕ (e.g. P_>θ[ψ]), respectively the opposite of ϕ (e.g. P_≤θ[ψ]). Bayesian methods consider prior knowledge when deciding if a logic property holds. Conversely frequentist approaches assume that no prior knowledge is available. All methods except probabilistic black-box take as input a user-defined upper bound on the approximation error. They request additional model simulations until the result is sufficiently accurate. Conversely probabilistic black-box model checking takes a fixed number of model simulations as input and computes a p-value as the confidence measure of the result.
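As a worked example of the first row of Table 2, the sketch below computes the sample size prescribed by the Chernoff-Hoeffding bound and decides a property of the form P_>θ[ψ] from the resulting estimate. It assumes that log denotes the natural logarithm and that the result is rounded up to the next integer; the function names are hypothetical and the list of trace verdicts is made up for illustration.

```python
import math


def chernoff_hoeffding_sample_size(epsilon, delta):
    """Number of traces n = (4 / epsilon^2) * log(2 / delta), rounded up.

    Assumption: log is the natural logarithm.
    """
    return math.ceil(4.0 / epsilon ** 2 * math.log(2.0 / delta))


def estimate_probability(verdicts):
    """Fraction of traces (e.g. MSTML files) against which psi evaluated true."""
    return sum(verdicts) / len(verdicts)


def holds_greater_than(verdicts, theta):
    """Decide P_{> theta}[psi] from the estimated probability."""
    return estimate_probability(verdicts) > theta


n = chernoff_hoeffding_sample_size(epsilon=0.05, delta=0.01)

# Made-up verdicts: psi holds for 95 out of 100 traces.
verdicts = [True] * 95 + [False] * 5
decision = holds_greater_than(verdicts, theta=0.9)
```

For ϵ = 0.05 and δ = 0.01 this gives n = 8478 traces, which shows why the required number of MSTML files can be computed statically for this approach: it depends only on ϵ and δ, not on the verdicts obtained.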

By means of approximate probabilistic model checking approaches the verification of a PBLMSTL specification against an MSSpDES model is guaranteed to terminate. Therefore the corresponding multiscale spatio-temporal model checking problem is well-defined; see S5 Text for a formal proof. Intuitively the main idea behind the proof is to show that in order to verify an MSSpDES model the number of required model simulations is finite, and that the number of time points considered for each model simulation is bounded. Therefore the PBLMSTL specification is evaluated against a finite number of time points and model simulations, which can be done in a finite number of steps.

Spatio-temporal meta model checking

One of the main limitations of our methodology, as described up to this point, is that the evolution over time of spatial properties can be described only with respect to the predefined collections of spatial entity types SET_considered = {clusters, regions} and spatial measures SM_considered = {clusteredness, density, area, perimeter, distanceFromOrigin, angle, triangleMeasure, rectangleMeasure, circleMeasure, centroidX, centroidY}.

In order to overcome this limitation and enable automatically reconfiguring the methodology according to case study specific spatial entity types and measures, we define a generalized version of the multiscale spatio-temporal model checking methodology called multiscale spatio-temporal meta model checking in which SET_considered and SM_considered are replaced with meta collections of spatial entity types SET, and spatial measures SM, defined as follows:

$\begin{array}{ll} SET = \{ & sety \mid sety \text{ is a spatial entity type for which there} \\ & \text{exists a corresponding spatial detection mechanism } f_{sety}, \\ & f_{sety} : SpSV^{p} \to \{0, 1\}^{m_{1} \times n_{1}} \times \{0, 1\}^{m_{2} \times n_{2}} \times \dots \times \{0, 1\}^{m_{p} \times n_{p}}, \\ & \text{which detects sets of spatial entities } SE \text{ of type } sety \text{ in the} \\ & \text{discretised spatial domain} \}. \end{array}$

Considering the spatial state variable tuples spsvt ∈ SpSV^p, f_sety computes which positions of the discretised space are occupied (1) by spatial entities or not (0); see [57] for examples of spatial detection mechanisms corresponding to the spatial entity types clusters and regions.
SM = {sm | sm is a spatial measure, sm: SE → SMV ⊆ ℝ, where SE is a set of spatial entities and SMV is the corresponding domain of valid spatial measure values}; similarly see [57] for examples of spatial measures corresponding to the spatial entity types clusters and regions.

These collections are called meta because they provide only a description of the conditions which should hold for each spatial entity type and spatial measure but do not explicitly define instances thereof.

The multiscale spatio-temporal meta model checking methodology enables the creation of different multiscale spatio-temporal model checking methodology instances by replacing SET and SM with case study specific collections of spatial entity types and spatial measures. These instances can then be used to verify corresponding MSSpDES models. For instance, in order to verify computational models considering a 3D representation of space a corresponding model checking methodology instance could be created that replaces SET and SM with SET_3D = {cuboid, cylinder, sphere} and SM_3D = {volume, centroidX, centroidY, centroidZ}.

A graphical description of the workflow employed to create multiscale spatio-temporal model checking methodology instances is given in Fig 6. For simplicity a single multiscale model checking methodology instance is considered throughout this paper corresponding to the collections of spatial entity types and measures SET_considered, respectively SM_considered.

Whenever creating new multiscale model checking methodology instances there is an additional need to define corresponding image processing functions for automatically detecting and analysing spatial entities in time series data. However such functions can often be defined based on existing approaches from the image processing literature.

Finally following on from S5 Text, when verifying an MSSpDES model relative to a formal PBLMSTL specification, the number of required model simulations and the number of required state transitions for each model simulation do not depend directly on the considered collections of spatial entity types and spatial measures. Therefore regardless of the considered instances of SET and SM the multiscale spatio-temporal model checking problem is well-defined.

Implementation

The multiscale spatio-temporal meta model checking approach was implemented in the model checking software Mule which enables automatically verifying multilevel computational models of biological systems relative to formal specifications; the model checker name is a concatenation of the first two and the last two letters of the word “Multiscale”. For efficiency purposes Mule was implemented in C++ and supports all approximate probabilistic model checking approaches described in Table 2.

Depending on the approximate probabilistic model checking approach employed the number of MSTML files required to verify if the computational model is valid relative to a PBLMSTL specification is computed differently. In case of Chernoff-Hoeffding bounds based and probabilistic black-box model checking approaches the number of required MSTML files can be computed before running Mule (i.e. statically). Conversely in case of the improved frequentist and Bayesian statistical hypothesis testing, and Bayesian mean and variance based model checking approaches the number of required MSTML files is determined only during the execution of Mule (i.e. dynamically). To support generating MSTML files on-demand Mule can take as input the path to a script (in our case a Bash script) that simulates a computational model and stores the resulting output in MSTML files; run Mule with the command line argument --help for more execution details.
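The dynamic case can be illustrated with a minimal Python sketch of a Bayesian mean and variance based loop. It is not Mule's implementation: it assumes the standard Beta-Bernoulli update (a Beta(α, β) prior becomes Beta(α + k, β + n − k) after k of n traces satisfy ψ), and the callable `next_outcome`, which stands in for "simulate the model, write an MSTML file and evaluate ψ against it", as well as the safety limit `max_traces`, are hypothetical.

```python
def bayesian_mean_variance(next_outcome, alpha, beta, variance_threshold,
                           max_traces=100000):
    """Request new traces until the posterior variance drops below the threshold.

    Returns (posterior mean, posterior variance, number of traces used).
    """
    a, b = float(alpha), float(beta)
    used = 0
    while True:
        # Mean and variance of the Beta(a, b) distribution.
        mean = a / (a + b)
        variance = a * b / ((a + b) ** 2 * (a + b + 1))
        if variance < variance_threshold or used >= max_traces:
            return mean, variance, used
        # Hypothetical stand-in: generate one more trace and evaluate psi against it.
        if next_outcome():
            a += 1
        else:
            b += 1
        used += 1


# Made-up stream of verdicts in which psi holds for every trace.
verdicts = iter([True] * 20)
mean, variance, used = bayesian_mean_variance(
    lambda: next(verdicts), alpha=1, beta=1, variance_threshold=0.01)
```

With a uniform Beta(1, 1) prior and T = 0.01 the loop stops after seven traces, at which point the posterior is Beta(8, 1). The number of traces is known only once the loop ends because it depends on the verdicts observed so far.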

The workflow for generating multiscale spatio-temporal model checker instances was implemented as described in Fig 7. The main idea behind the implementation is to use two compilation (or translation) steps instead of one. The first compilation step takes a description of the spatial entity types and measures as input and produces C++ source code as output. The second compilation step translates the generated C++ source code into binary (i.e. executable) format. Conceptually this approach is called “meta” because Mule is an abstract multiscale spatio-temporal (meta) model checker that can be instantiated according to case study specific spatial entity types and measures. From a practical point of view the user modifies only the description of the spatial entity types and measures, while the source code and the corresponding executables are generated automatically.
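The first compilation step can be pictured with the following Python sketch, which turns a list of spatial measure names into a C++ enumeration. It is purely illustrative: the input format accepted by Mule, the generator used and the generated C++ code are not described here, so the function name `generate_cpp_enum`, the enumeration name and the output layout are all assumptions.

```python
def generate_cpp_enum(enum_name, values):
    """Generate the text of a C++ scoped enumeration from a list of names."""
    lines = [f"enum class {enum_name} : unsigned int {{"]
    for value in values:
        # Capitalise the first letter only, keeping the rest of the name intact.
        lines.append(f"    {value[0].upper() + value[1:]},")
    lines.append("};")
    return "\n".join(lines)


# Hypothetical description of the spatial measures of a 3D model checker instance.
spatial_measures_3d = ["volume", "centroidX", "centroidY", "centroidZ"]
cpp_source = generate_cpp_enum("SpatialMeasureType", spatial_measures_3d)
```

The generated text would then be handed to a C++ compiler in the second step, so that the chosen spatial measures become part of the executable rather than being looked up at runtime.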

The main advantage of the workflow depicted in Fig 7 is that it enables the considered spatial entity types and measures to be compiled into the model checking executable instead of being (dynamically) loaded at runtime, which could negatively impact the model checker performance.

Mule was implemented as an offline model checker and takes as input model simulation traces rather than the computational models used to generate them. Using trace analysis each model simulation trace is evaluated against the PBLMSTL specification. The trace analysis results corresponding to multiple model simulation traces are used by the employed model checking approach to determine if the PBLMSTL specification holds for the model.

The main advantage of implementing Mule as an offline model checker is that it is decoupled from the specific modelling formalisms employed to encode the computational models. Consequently Mule can be employed to verify computational models encoded using various modelling formalisms provided that the corresponding computational models satisfy the constraints of an MSSpDES model without requiring the explicit translation of the computational models to MSSpDES. In addition given that Mule takes simulation traces (i.e. time series data) as input it can be employed to evaluate PBLMSTL specifications both against time series data generated in silico or recorded in vitro. Conversely the main disadvantages of Mule are that the computational models need to be constructed and simulated using external tools, and the model simulation output needs to be stored in or translated to MSTML format. To generate model simulations on demand Mule needs to be able to execute the model simulator from the command line.

In contrast to Mule, inline approximate probabilistic model checkers (e.g. COSMOS [74], PLASMA [75], PRISM [76], UPPAAL-SMC [77], Ymer [78]) are integrated modelling and verification environments that can be employed not only to verify, but also to construct and simulate computational models. In addition inline model checkers are usually more efficient than their offline counterparts, because model simulations can be generated on-demand, in-memory and potentially stopped early (i.e. as soon as the considered logic statement is accepted/rejected). However inline model checkers typically require explicitly encoding computational models in the model checker specific modelling formalism, and they cannot be employed to evaluate formal specifications against time series data recorded in vitro.

Both the source code and the executable corresponding to the Mule instance employed throughout this paper are made freely available online at http://mule.modelchecking.org; this Mule instance is defined with respect to the collection of spatial entity types SET_considered and spatial measures SM_considered. Moreover a corresponding Docker image has been created providing a self-contained environment for executing/updating model checker instances which can be run on all major operating systems without additional setup (except installing the freely available software Docker).

Results

We illustrate the applicability of the model checker based on four multiscale systems biology case studies published in the literature. The case studies were chosen such that the corresponding computational models are of different types (i.e. deterministic/hybrid/stochastic), span different levels of organization (e.g. cellular/organ) and are encoded using different modelling formalisms (e.g. ordinary differential equations/cellular automata) and software (e.g. Morpheus/NetLogo); see Table 3 for a brief comparison of the multilevel computational models considered.

Table 3. Considered multilevel systems biology computational models against which the proposed model checking methodology and implementation were validated.

	M1	M2	M3	M4
Description	Rat cardiovascular system dynamics	Uterine contractions of labour	Xenopus laevis cell cycle	Acute inflammation of the gut and lung
Model type	Deterministic	Deterministic	Hybrid	Stochastic
Modelling formalism(s)	Ordinary differential equations (ODE)	Cellular automata (CA)	ODEs + Cellular Potts model (CPM)	Agent based modelling (ABM)
Modelling software	JSim	Mathematica	Morpheus	NetLogo
Explicit spatial representation	N	Y	Y	Y
Levels of organization	Cellular + Organ system	Cellular + Tissue	Intracellular + Cellular	Cellular + Tissue + Organ
Case study reference	[13]	[14]	[15]	[16]
Model download link	http://virtualrat.org/sites/default/files/downloads/Workflow_Model_Files_12April2012.zip	http://s3-eu-west-1.amazonaws.com/files.figshare.com/1720626/Supporting_Information_S1	http://imc.zih.tu-dresden.de/wiki/morpheus/doku.php?id=examples:multiscale#odes_in_cpm_cellscell_cycle_and_proliferation	http://bionetgen.org/SCAI-wiki/images/7/7d/GutLungAxis2.1.nlogo


Each model (M1–M4) has an associated description and type (i.e. deterministic, stochastic or hybrid), was encoded using specific modelling formalisms and software, represents space explicitly or not (Y—Yes, N—No), spans different levels of organization, and has a corresponding reference paper and download link.

Since Mule is implemented as an offline model checker and all approximate probabilistic model checking algorithms employed here (see Table 2) are defined relative to simulation traces, the computational models M1–M4 were not explicitly translated to an MSSpDES representation. Instead the computational models encoded using high-level modelling formalisms were simulated and the simulation output was stored in MSTML files. These MSTML files were then provided as input to the model checker Mule. There are two main reasons for employing the computational models encoded in high-level modelling formalisms (as developed by their original authors) instead of MSSpDES. First of all simulating an MSSpDES computational model on a computer requires defining an MSSpDES operational semantics, which was not given here. Secondly approximations inherent to the translation of computational models between different modelling formalisms could potentially impact the outcome of the model checker execution.

In the case of the deterministic continuous-state computational model M1, an alternative approach, which is not considered here, would have been to translate M1 into a stochastic discrete-state computational model. Using the approach described by Wilkinson [79], and under the assumption that the volume of the media containing the species in the model is known, concentrations can be converted into discrete numbers of molecules, and deterministic kinetic rate constants into stochastic ones. The main reason for not translating M1 into a stochastic model is that we want to illustrate that Mule can be employed to verify existing deterministic continuous-state computational models relative to PBLMSTL specifications without the need to initially alter the models. The probability that a PBLMSTL specification holds for the deterministic continuous-state model M1 is either 1 (i.e. true) or 0 (i.e. false).
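For illustration only, the conversion that was not applied here can be sketched with the standard mass-action relations: a concentration in mol/L is multiplied by Avogadro's number and the volume to obtain a molecule count, and rate constants are rescaled according to reaction order. The function names are hypothetical, only reaction orders 0 to 2 are handled, and the example concentration and volume are arbitrary values rather than parameters of M1.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def molecule_count(concentration_molar: float, volume_litres: float) -> int:
    """Convert a concentration in mol/L into a whole number of molecules."""
    return round(concentration_molar * AVOGADRO * volume_litres)


def stochastic_rate_constant(
    k: float, order: int, volume_litres: float, same_species: bool = False
) -> float:
    """Convert a deterministic mass-action rate constant into a stochastic one.

    order 0: c = k * nA * V
    order 1: c = k
    order 2: c = k / (nA * V), doubled when both reactants are the same species
    """
    scale = AVOGADRO * volume_litres
    if order == 0:
        return k * scale
    if order == 1:
        return k
    if order == 2:
        return (2.0 if same_species else 1.0) * k / scale
    raise ValueError("only reaction orders 0, 1 and 2 are handled in this sketch")


# Arbitrary example: 1 micromolar in a volume of 1 femtolitre.
print(molecule_count(1e-6, 1e-15))
```

The dependence on the volume in both functions is why the conversion requires the volume of the media to be known.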

The natural language and corresponding formal specifications, against which the models were verified, have been derived from the original papers introducing the case studies. Quotes from the original papers have been employed to create initial natural language statements describing the expected system behaviour. The initial natural language statements were then rephrased to match the constructs and structure typical of formal PBLMSTL statements; the resulting statements are called rephrased natural language statements. Finally the rephrased natural language statements were manually mapped into corresponding PBLMSTL statements. Where insufficient information was available (e.g. probabilities) the numeric values employed in the formal specification are quantitative approximations of the corresponding natural language descriptions (e.g. with high probability ⇒ 0.9). The main purpose of the PBLMSTL statements considered is to illustrate the expressivity of the methodology and not to predict previously unknown biologically relevant properties. For reproducibility purposes the mapping between quotes from the original papers, derived natural language statements and corresponding PBLMSTL specifications is documented in the supplementary materials.

The model checking approach employed to verify the deterministic computational models (M1 and M2) was probabilistic black-box model checking, because it places no lower bound on the required number of model simulations and is therefore suitable for computational models which are simulated only once. Conversely for the verification of the hybrid (M3) and stochastic (M4) computational models improved frequentist statistical hypothesis testing was employed, setting the values of both input parameters α (i.e. probability of type I errors) and β (i.e. probability of type II errors) to 5%. Consequently a single model simulation was considered for each of the computational models M1 and M2, whereas the number of model simulations considered for M3 and M4 was variable and determined by the values of the input parameters α and β.
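To illustrate why the number of simulations is variable when error bounds α and β are fixed in advance, the sketch below uses Wald's sequential probability ratio test. This classical test is a stand-in chosen for brevity; it is not the improved frequentist algorithm employed by Mule. The indifference region half-width `delta`, the function name and the `sample` callback (one call standing for one simulation plus trace evaluation) are assumptions made for the example.

```python
import math
from typing import Callable, Optional, Tuple


def sprt(
    sample: Callable[[], bool],
    theta: float,
    delta: float = 0.05,
    alpha: float = 0.05,
    beta: float = 0.05,
    max_samples: int = 10_000,
) -> Tuple[Optional[bool], int]:
    """Test H0: p >= theta + delta against H1: p <= theta - delta.

    Returns (verdict, number of samples used); verdict is True if H0 is
    accepted, False if H1 is accepted, None if the sample budget ran out.
    """
    p0, p1 = theta + delta, theta - delta
    accept_h1 = math.log((1 - beta) / alpha)  # upper stopping boundary
    accept_h0 = math.log(beta / (1 - alpha))  # lower stopping boundary
    log_ratio = 0.0  # log of P(data | H1) / P(data | H0)
    for n in range(1, max_samples + 1):
        if sample():
            log_ratio += math.log(p1 / p0)
        else:
            log_ratio += math.log((1 - p1) / (1 - p0))
        if log_ratio >= accept_h1:
            return False, n
        if log_ratio <= accept_h0:
            return True, n
    return None, max_samples


# Toy samplers: a property that always holds and one that never holds.
verdict_true, used_true = sprt(lambda: True, theta=0.9)
verdict_false, used_false = sprt(lambda: False, theta=0.9)
print(verdict_true, used_true, verdict_false, used_false)
```

The test stops as soon as the accumulated evidence crosses one of the two boundaries, so the number of samples depends on α, β and the observed outcomes instead of being fixed beforehand.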

All approximate probabilistic model checking approaches supported by Mule (see Table 2) were previously introduced by other authors and are not directly dependent on PBLMSTL. Therefore a comparison between the different model checking approaches, although interesting, goes beyond the scope of this paper.

The computational models have been simulated, analysed and verified using the same regular desktop computer (Linux x64, Intel Core i5-2500 CPU @1.6 GHz, 16 GB DDR3 RAM). To assess the performance of the approach, execution times have been recorded for all relevant steps of the model checking workflow.

Finally, for comparison purposes, the case studies and the corresponding computational models will not be described individually but in parallel considering the steps of the model checking workflow (i.e. model construction, multiscale spatio-temporal analysis, formal specification, model checking).