Skip to main content
Bioinformatics logoLink to Bioinformatics
. 2023 Nov 8;39(11):btad674. doi: 10.1093/bioinformatics/btad674

FitMultiCell: simulating and parameterizing computational models of multi-scale and multi-cellular processes

Emad Alamoudi 1, Yannik Schälte 2,3,4, Robert Müller 5, Jörn Starruß 6, Nils Bundgaard 7, Frederik Graw 8,9,10, Lutz Brusch 11, Jan Hasenauer 12,13,14,
Editor: Pier Luigi Martelli
PMCID: PMC10666203  PMID: 37947308

Abstract

Motivation

Biological tissues are dynamic and highly organized. Multi-scale models are helpful tools to analyse and understand the processes determining tissue dynamics. These models usually depend on parameters that need to be inferred from experimental data to achieve a quantitative understanding, to predict the response to perturbations, and to evaluate competing hypotheses. However, even advanced inference approaches such as approximate Bayesian computation (ABC) are difficult to apply due to the computational complexity of the simulation of multi-scale models. Thus, there is a need for a scalable pipeline for modeling, simulating, and parameterizing multi-scale models of multi-cellular processes.

Results

Here, we present FitMultiCell, a computationally efficient and user-friendly open-source pipeline that can handle the full workflow of modeling, simulating, and parameterizing for multi-scale models of multi-cellular processes. The pipeline is modular and integrates the modeling and simulation tool Morpheus and the statistical inference tool pyABC. The easy integration of high-performance infrastructure allows to scale to computationally expensive problems. The introduction of a novel standard for the formulation of parameter inference problems for multi-scale models additionally ensures reproducibility and reusability. By applying the pipeline to multiple biological problems, we demonstrate its broad applicability, which will benefit in particular image-based systems biology.

Availability and implementation

FitMultiCell is available open-source at https://gitlab.com/fitmulticell/fit.

1 Introduction

Biological tissues are complex entities composed of cells and extracellular components. Tissues occur in different developmental stages and compositions, are highly dynamic and often heavily structured. Specific tissue properties are relevant for a broad range of processes, including tissue homeostasis, viral infection, and tumor development and treatment. For the experimental analysis of biological tissues, imaging techniques are widely used. Common approaches include light and fluorescence microscopy, but more recently also imaging mass cytometry, spatial transcriptomics and related methods are employed [see Lewis et al. (2021) for a review]. These experimental techniques provide a variety of quantitative information about biological tissues. Yet, mechanisms underlying specific pattern formation or tissue dynamics often remain elusive. To address these aspects, computational modeling has established itself as a key element to obtain a comprehensive understanding of causal relationships in multi-cellular spatio-temporal systems. Computational models of multi-cellular processes usually capture multiple spatial and temporal scales and describe the emergence of the system’s behavior based on individual building blocks, e.g. individual cells and their interactions. There are several modeling approaches, including discrete, continuous, and hybrid model formalisms (Anderson and Quaranta 2008, Waclaw et al. 2015, Meyer et al. 2020). Cells and their interactions can, for instance, be described using (energy-based) Cellular Potts Models (CPM) or (force-based) vertex models. Several software packages have been developed, including MCell (Kerr et al. 2008), FLAME (Richmond et al. 2010), CompuCell3D (Swat et al. 2012), Chaste (Mirams et al. 2013), Morpheus (Starruß et al. 2014), and PhysiCell (Ghaffarizadeh et al. 2018) to simplify and standardize the demanding task of model formulation and implementation. Yet, while computational modeling has substantially improved the understanding of multi-cellular systems, it remains a challenging task (Fletcher and Osborne 2022).

For multi-scale models of multi-cellular processes, a key challenge is parameter estimation, the process in which values for the unknown model parameters are determined by fitting the model simulations to experimentally observed data. Parameter estimation is necessary to obtain quantitative models of processes, to analyse processes, to compare competing hypotheses about processes, and to predict the dynamics of processes (e.g. in response to perturbations) (Toni et al. 2011, MacLean et al. 2014, Jagiella et al. 2017, Imle et al. 2019, Durso-Cain et al. 2021, Joslyn et al. 2021). Systematic, rigorous and uncertainty-aware parameter estimation is only just becoming accessible for multi-scale models with advanced methods and growing computational resources. Reasons for this are that (i) the simulation of multi-scale models accounts for different biophysical processes (e.g. intra- and extracellular signaling as well as physical interaction), which requires computationally efficient simulation algorithms (e.g. hybrid discrete-continuum approaches), and that (ii) the stochasticity of most models (e.g. due to randomness arising from small cell numbers) necessitates repeated simulations.

To cope with the challenges of parameter estimation for advanced computational models, approximate Bayesian computation (ABC) methods have been developed and applied (Pritchard et al. 1999, Beaumont et al. 2002, Sisson et al. 2018). These methods generate samples from an approximation of the Bayesian parameter posterior distribution, without evaluating a likelihood function which can quickly become inaccessible for complex stochastic models. To enable the application of ABC methods to multi-scale models, they have been parallelized on high-performance computing (HPC) infrastructure (Johnston et al. 2014, Sottoriva et al. 2015, Babtie and Stumpf 2017, Jagiella et al. 2017). Generic implementations are provided by, in particular, pyABC (Klinger et al. 2018), ABCpy (Dutta et al. 2017), and ELFI (Kangasrääsiö et al. 2016). Yet, while simulation and inference tools are available, the parameter estimation for multi-scale models of multi-cellular processes still requires a high level of technical expertise. The available tools are not interfaced and code from published application examples is often difficult to reuse and extend. Thus, there is a need for a platform that facilitates and streamlines the entire workflow, from the construction of multi-scale models of various types based on biological principles, to systematic uncertainty-aware data-driven parameter estimation (Hasenauer et al. 2015). A first software tool tackling this problem is the spatial model editor (SME) (https://github.com/spatial-model-editor) (Keegan et al. 2023). However, this only supports partial differential equation models of reaction-diffusion systems, while stochastic and multi-scale models cannot be considered and parameter uncertainty analysis is not supported.

In this work, we introduce the FitMultiCell pipeline, an open-source, user-friendly, and scalable end-to-end platform that integrates modeling, simulation, and parameter estimation, to simplify the analysis of multi-scale and multi-cellular systems. FitMultiCell integrates the state-of-the-art tools Morpheus (Starruß et al. 2014) for model building and simulation, and pyABC (Schälte et al. 2022) for parameter estimation. It builds on an extension of the PEtab standard (Schmiester et al. 2021) for the specification of parameter estimation problems that we additionally introduce in this manuscript. We demonstrate and evaluate the FitMultiCell pipeline using models of viral infection, tumor growth, and liver regeneration.

2 Methods

2.1 Problem description

We consider the problem of developing quantitative computational models of multi-cellular processes. These models might account for a broad range of biochemical and biophysical processes, including

  • cellular signal transduction, metabolism, and gene regulation,

  • cell movement, proliferation and death, and

  • cell–cell communication (e.g. via direct cell–cell-contact areas or the secretion/uptake of biochemical substances).

The spectrum of modeling frameworks for such integrated processes is broad and the models are often stochastic, like the behaviors and decisions by individual biological cells are inherently stochastic. Mathematically, we can write any such model as

y=M(θ,ξ),

where y denotes the vector of observed properties and θ denotes the vector of unknown properties of biochemical, biophysical, or observation processes (e.g. reaction rates). Process noise—arising, for instance, from low molecular numbers—as well as measurement noise is described via the vector of random variables ξ. Marginalizing the model simulation over the random variables ξ yields the likelihood π(y|θ), which is the conditional probability of observing y given θ.

Quantitative mathematical modeling requires the inference of the unknown parameters θ from data. In this study, we consider Bayesian inference and aim to approximate the Bayesian posterior distribution

π(θ|yobs)π(yobs|θ)π(θ)

of the unknown model parameters for experimentally observed data yobs and prior knowledge π(θ). The posterior distribution encodes all available information about the model parameter and, hence, allows for the assessment of parameter and prediction uncertainties as well as for the design of validation experiments. Experimental data used for the parameterization of the considered models are often obtained using imaging techniques. These can provide information about, e.g. spatial profile, temporal changes and changes between conditions. Overall, the spectrum of experimental setups is broad.

2.2 FitMultiCell pipeline

To facilitate quantitative computational modeling of multi-cellular processes, we developed the FitMultiCell pipeline (Fig. 1). This pipeline supports its users in

Figure 1.

Figure 1.

Overview of the FitMultiCell pipeline. First row: User perspective on the biological system, experiment, model formulation, and parameter estimation problem. Second row: Tool and format overview. Third row: Features and details for all components of the workflow. Fourth row: Visualization of the ABC-SMC parameter estimation algorithm.

  1. formulating the modeling problem, i.e.

    1. defining computational models and

    2. defining data obtained from experiments,

  2. estimating unknown model parameters, and

  3. evaluating parameter and model predictions (including planning validation experiments).

This pipeline streamlines the parameter estimation by providing a tight integration of tools for model specification and simulation, and parameter estimation and reporting. This is achieved using a Python interface with a broad spectrum of functionalities. The current version supports (i) multi-scale modeling and simulation using Morpheus (Starruß et al. 2014) and (ii) parameter estimation and uncertainty quantification using pyABC (Klinger et al. 2018). These state-of-the-art tools cover modeling applications; yet, other modeling, simulation, and parameter estimation tools can be easily interfaced. Besides this Python interface, FitMultiCell offers a GUI to control pipeline execution and to analyse results. It also provides a collection of scripts and encapsulation of Morpheus that streamline pipeline execution on large computer clusters and cloud resources, making them accessible to non-expert users. Morpheus has been extended by GUI features for easily selecting parameters for estimation (auto-generated and interactive model graph to guide the user to relevant model parameters and automatic XPath creation to reference selected parameters) and by major performance improvements including adaptive parallel solvers for intracellular and partial differential equation (PDE) sub-models and the statistically exact multicore parallelization of the CPM formalism (see Supplementary data). Regarding pyABC, the construction of summary statistics has been simplified and robust adaptive distance functions were added. Lastly, PEtab-MS was developed to ensure reproducibility and ease of problem definition, often eliminating the need for coding.

The FitMultiCell pipeline enables parameter estimation for application problems with multiple experimental conditions, datasets, and data types. Furthermore, it offers a broad range of built-in visualization and analysis tools. To facilitate reproducibility and reusability, the FitMultiCell pipeline supports the MorpheusML and the PEtab-MS standards.

In the following, we describe the key components of the FitMultiCell pipeline and their features.

2.2.1 Standardization

The FitMultiCell pipeline uses standardized data formats to ensure interoperability, reproducibility, and reusability—important aspects of the FAIR principles (Wilkinson et al. 2016). The multi-cellular models are encoded using MorpheusML, an established XML-based standard at https://doi.org/10.25504/FAIRsharing.78b6a6. The parameter estimation problems are encoded using the PEtab-MS format, a newly developed tsv-file-based standard.

We developed PEtab-MS as an extension of the Parameter Estimation tabular format (PEtab) (Schmiester et al. 2021), which was developed in the field of ordinary differential equation (ODE) based modeling and is already supported by various simulators and parameter estimation toolboxes. PEtab-MS conserves most core components of PEtab, including the tables defining model parameters and experimental conditions. Yet, PEtab-MS allows for model expression using MorpheusML, flexible data tables (including references to image files), as well as functions for computing summary statistics. These aspects are essential for multi-cellular processes and were not captured by PEtab. The specification of PEtab-MS and a tool for the validation of files can be found at https://gitlab.com/fitmulticell/libpetab-python-MS.

As imaging data are often processed to obtain informative summary statistics, the FitMultiCell pipeline includes a class definition for the construction of summary statistics. Several common summary statistics for imaging data are already provided (see Section 3) and customized statistics can be implemented. Furthermore, it is possible to automatically construct informative summary statistics (Fearnhead and Prangle 2012, Schälte and Hasenauer 2022).

2.2.2 Simulation

The FitMultiCell pipeline is designed for the simulation-based analysis of multi-cellular processes. Accordingly, we allow for simulation of advanced models M for cells and tissues (using cellular automata and cellular Potts models), extracellular concentration fields (using partial differential equations) and intracellular dynamics (using ordinary or stochastic differential equations or continuous-time Markov jump processes).

The FitMultiCell pipeline implements a comprehensive interface to Morpheus, including parameter mapping and simulation result extraction. This renders the functionalities of Morpheus (e.g. model construction, simulation, and visualization) as well as features such as a comprehensive GUI for rapid prototyping available within the FitMultiCell pipeline. Morpheus is broadly applicable, widely used for a variety of biological problems, and computationally efficient [see, e.g. Köhn-Luque et al. (2011), Imle et al. (2019), Vu et al. (2019) and currently more than 80 models shared in the Morpheus model repository].

Additionally, the modular architecture of the FitMultiCell pipeline allows for the interfacing of additional modeling and simulation toolboxes, as well as user-provided simulation codes.

2.2.3 Inference

The FitMultiCell pipeline is designed for simulation-based inference, a class of approaches circumventing the evaluation of the likelihood function. This is important for the study of stochastic multi-cellular processes, but also allows for the application to deterministic models.

The FitMultiCell pipeline implements a comprehensive interface to pyABC (Schälte et al. 2022), including parameter and condition mapping for multi-experiment and multi-data type inference. This renders the state-of-the-art approximate Bayesian computation sequential Monte Carlo (ABC-SMC) method easily accessible to users of the FitMultiCell pipeline. ABC-SMC methods approximate the posterior distribution by constructing a sequence of particles which resembles the data (in terms of summary statistics) more and more closely (Toni and Stumpf 2010). Well-tested adaptation methods, e.g. for acceptance thresholds, proposal distributions, and population sizes, render pyABC accessible to non-expert users. Furthermore, pyABC allows for exact inference (Schälte and Hasenauer 2020) and we recently implemented automatic data normalization, summary statistics construction, and measurement noise handling to enable the robust estimation of parameters of multi-scale models (Schälte et al. 2021, Schälte and Hasenauer 2022). For the combined use of Morpheus and pyABC, we implemented an early rejection mechanism that discards simulations as early as possible, e.g. if they exceed an upper time limit. This makes the analysis robust to unexpectedly long-running simulations, e.g. due to an excessive number of cells.

We provided the interface to pyABC as this tool has been successfully used in a broad spectrum of applications, e.g. cardiac electrophysiology (Cantwell et al. 2019), human atrial cells (Houston et al. 2020), cancer (Colom et al. 2021), gene expression (Coulier et al. 2021), universe expansion (Bernardo and Said 2021), and bee colonies (Minucci et al. 2021). This underlines its applicability to a broad spectrum of inference tasks for multi-cellular processes. Additionally, the modular architecture of the FitMultiCell pipeline allows for the interfacing of additional inference tools, as well as user-provided inference algorithms.

2.2.4 Distributed execution

Parameter estimation often requires thousands to millions of stochastic model simulations, which is computationally demanding for complex multi-cellular processes. Distributed execution of the computational tasks is thus paramount (Fig. 2). Within the FitMultiCell pipeline, parallelization can happen on three levels to efficiently exploit HPC infrastructure:

Figure 2.

Figure 2.

Illustration of the parallel framework of FitMultiCell. Inference level: A main process manages the pyABC analysis workflow and SQL data storage. Population level: In each ABC-SMC generation, the main process delegates to a number of distributed worker processes the task of sampling parameters (from proposal distributions p(x)) and synthetic datasets, until sufficiently many fulfil the generation-specific acceptance criterion (such as the acceptance thresholds E). Work distribution, communication, and storage of intermediate results across various computational nodes and processing units are managed via a central Redis server. Particle level: Each single simulation of synthetic data (or groups of simulations for related perturbation scenarios in a multi-experiment setting) is performed using Morpheus, optionally with shared-memory parallelization via OpenMP.

  • “Individual simulations” can be parallelized within the employed simulation toolbox. Morpheus supports the use of multiple threads using OpenMP. In addition to ODEs and PDEs, Morpheus also provides a parallel and exact solver for CPMs.

  • “Individual summary statistics evaluations” can be parallelized within the FitMultiCell pipeline if multiple individual simulations are required. This is for instance the case if experimental replicates are available for stochastic processes and multiple experimental conditions are considered. As the respective simulations are independent, they can be trivially parallelized across multiple threads.

  • “Parameter estimation” can be parallelized by parameter estimation toolbox. PyABC supports single-machine multicore execution and multi-machine distributed execution. A main process manages the communication across SMC generations and post-processing of accepted particle populations, while the computation-heavy simulations and summary statistic evaluations are delegated to a set of parallel worker processes (see Fig. 2). pyABC provides two parallelization strategies, static and dynamic scheduling, which distribute work across computational resources, to minimize the overall CPU time, or the wall-time, respectively (Klinger et al. 2018).

To facilitate the setup of the required server-worker architecture, the FitMultiCell pipeline comes with bash scripts that handle the resource allocation and communication between nodes, and which can be used to run the pipeline on HPC clusters with minimum code modification. While parallelization is available at multiple levels, the most efficient resource use will often be achieved by dedicating many workers to the particle population and setting Morpheus to single-threaded simulation mode.

2.2.5 Analysis and visualization

The FitMultiCell pipeline provides a GUI for the analysis of the result of the parameter estimation process (see Fig. 3). This GUI can load the shareable SQL database generated by pyABC, but is flexible enough to visualize the results for other sampling tools after the result files were reformatted. Among other things, the GUI allows for the visualization of (i) samples using uni- and bi-variate plots, (ii) sampling diagnostics like the decrease of the acceptance threshold over generations, and (iii) the ability to generate the code for any plot that is generated. A detailed documentation for the GUI is available on pyABC (https://pyabc.readthedocs.io/en/latest/visualization.html) and FitMultiCell (https://fitmulticell.readthedocs.io/en/latest/youtube.html) documentation pages.

Figure 3.

Figure 3.

Several screenshots of the visualization and diagnostics GUI. (A) The main screen, (B) stacked bar plot of the number of required samples per generation, (C) 2D kernel density estimation for samples from approximate posterior distribution, (D) parallel coordinate plot for the development of the credibility intervals over generations, (E) line plot of the epsilon threshold per generation, and (F) line plot of the effective sample size per generation.

2.3 Implementation and development

The FitMultiCell pipeline is written in Python and available on GitLab (https://gitlab.com/fitmulticell/fit) and Zenodo (https://doi.org/10.5281/zenodo.10040172) under a BSD-3-Clause license. It is being developed by contributors from three institutions and others are invited to contribute. Code quality is monitored via unit tests and continuous integration. It can be installed from the Python Package Index (PyPI). Detailed documentation of the platform is available at https://fitmulticell.readthedocs.io, which covers installation, setting up the modeling and estimation problem, and running the parameter estimation process, including on HPC and cloud infrastructures.

To ensure ease of use, FitMultiCell pipeline integrates currently only tools which are easy to install and available under permissive licenses:

3 Results

To assess the performance of the FitMultiCell pipeline, we evaluated it using a broad spectrum of test and application examples. In the following, we present its core properties for a few selected models of multi-cellular processes. Several other applications to various multi-cellular problems and different data types have been published separately and are not included here [see Boutillon et al. (2022), Bundgaard et al. (2022)].

All the following results were obtained on the JUWELS standard compute nodes of the Supercomputing Center in Jülich, Germany. Details on the technical specifications are provided in the Supplementary data.

3.1 Standards supported by FitMultiCell pipeline allow for re-implementation of published application problems

The FitMultiCell pipeline allows for the standardized description of models, datasets and parameter estimation problems. To ensure that the supported standards are practically useful, we assessed to which degree we can implement already published application problems.

We considered three applications capturing different biological and technical problems: (M1) a model of virus transmission via cell-free virions and cell-to-cell contact (Kumberger et al. 2018); (M2) a model of tumor spheroid growth (Jagiella et al. 2017); and (M3) a model describing the mechano-sensing of the metabolic status during liver regeneration (Meyer et al. 2020). The models M1 and M2 were originally not available in MorpheusML and were re-implemented for the purpose of this study, while M3 was already published in MorpheusML. For model M1, Morpheus allowed for a one-to-one re-implementation of the original setup up to the numerical simulation schemes. For model M2, some reformulations were necessary. Here, we aimed to (i) resemble the biological processes (and subprocesses) contained in the original models, (ii) preserve the meaning of the model parameters, and (iii) achieve similar dynamics for key process characteristics. The most substantial change was the replacement of the representation of cells and modeling based on an unstructured grid by a cellular Potts model. To confirm the validity of the re-implementation, we assessed a broad spectrum of model characteristics, including statistics over space and time (such as radial profiles and growth curves for model M2). We observed a good resemblance for model M1—as expected—and minor difference for model M2 (results not shown). For all applications, we created the PEtab-MS description of the parameter estimation problems. PEtab-MS was flexible enough to describe all parameter estimation problems. This suggests that the standards employed in the FitMultiCell pipeline are sufficient for a broad spectrum of applications. Details on the application problems are provided below and in the Supplementary Section 3. Furthermore, the Morpheus and PEtab-MS files provide a comprehensive standardized description.

3.2 Parallelization in FitMultiCell pipeline provides substantial wall-time reduction

As the efficient use of computing resources is of crucial importance for parameter estimation, we evaluated the parallel efficiency achieved using the FitMultiCell pipeline. We considered a model of the spread of a viral infection within a tissue distinguishing between virus transmission via cell-free virions and direct cell-to-cell contact (Kumberger et al. 2018) (Fig. 4A and Supplementary Section 3.1). In the FitMultiCell pipeline, parallelization is available within and across individual simulation particles. For the considered problems, we parallelize exclusively on the population level to provide the most efficient resource usage; however, this may depend on the specific problem.

Figure 4.

Figure 4.

Performance of the parallelized FitMultiCell pipeline. (A) Illustration of the used model M1 for viral transmission within the host tissue. (B) Wall-time of two different fit scenarios across different numbers of used compute nodes, each consisting of 48 CPU cores. (C) Parallel efficiency of the FitMultiCell pipeline. The dashed line indicates perfect efficiency, i.e. a wall-time directly inversely proportional to the number of nodes, compared to single-node execution. For each choice of nodes, three consecutive ABC-SMC generations were run, with population sizes of N = 1000 (Scenario A) and N = 10 000 (Scenario B).

To assess the parallel efficiency, we performed parameter estimation using the ABC-SMC algorithm implemented in pyABC with the dynamic scheduling option. The algorithms were run for three generations with two different population sizes: (Scenario A) 1000 particles; and (Scenario B) 10 000 particles. As higher population sizes improve the posterior approximation but are computationally more demanding, both scenarios are relevant. We performed this task using 1, 2, 4, 8, and 16 compute nodes, each consisting of 48 CPU cores.

For both Scenarios A and B, we observed a substantial reduction in wall-time when using higher numbers of nodes, to a fraction of the single-node execution wall-time which already uses 48 CPU cores (Fig. 4B). Yet, the parallel efficiency (PE), defined as

PE={wall-time for run on single compute node}{wall-time for run on i compute nodes}i,

dropped (Fig. 4C). For the case of 16 nodes (768 cores), we observed a parallel efficiency of 49% and 75% in Scenario A and Scenario B, respectively. The higher parallel efficiency in Scenario B can be explained by the longer time required for individual generations, which reduces the effect of the idle time that usually arises at the end of each generation due to differences in computation times for individual particles [see also Klinger et al. (2018)].

In summary, our evaluation confirmed an overall good scaling of the FitMultiCell pipeline, yielding a wall-time reduction of several 10-fold compared to a single-node execution and several hundred-fold compared to single-core execution. As the number of proposed particles required to generate a specific number of accepted particles increases over generations, the parallel efficiency for production runs with the usual 15 to 40 generations should be higher than what has been observed here.

3.3 FitMultiCell facilitates the study of heterogeneous datasets via automatically tuned algorithms

To assess the benefit of the tight integration of modeling and parameter estimation tools in the FitMultiCell pipeline, and the availability of advanced inference algorithms, we considered the model of tumor spheroid growth (Jagiella et al. 2017) (Fig. 5 (left)). The original model was used to study growth control mechanisms and the effect of nutrient supply (Jagiella et al. 2016, 2017). Yet, the source code of this multi-cellular model entangled with its simulator is highly specific and no longer maintained, complicating the use of this model in further studies.

Figure 5.

Figure 5.

Result of FitMultiCell pipeline for the model of tumor spheroid growth. Left: Illustration of the model. Middle: Experimental data (gray) and model fit (green). The fit for ABC-SMC generation 1 (T1), generation 25 (T25) and final generation 51 (T51) is indicated using the 95% credibility interval of the accepted model simulations. Right: Credible intervals for the seven estimated parameters obtained using the original implementation (Jagiella et al. 2017) (gray) and the FitMultiCell pipeline (green).

We implemented the model in MorpheusML and made it available through a public model repository (https://identifiers.org/morpheus/M0007). The cell population was described using a stochastic CPM which accounts for cell division and death as well as cell–cell interactions, while the concentrations of extracellular substances were described using PDEs. The estimation problem was encoded using PEtab-MS, following the original publication by Jagiella et al. (2017). We selected the same seven unknown parameters that were also considered in Jagiella et al. (2017) based on a sensitivity analysis and biological relevance. This ensures comparability of our results against the reference publication. We considered all available datasets: a time-course for the tumor spheroid radius determined by bright field microscopy; and snapshots for radial profiles of markers for proliferating cells as well as extracellular matrix (ECM) abundance determined from fluorescence microscopy on day 17.

In our previous publication considering the parameter estimation problem (Jagiella et al. 2017), we manually defined weights to quantify the relative importance of different summary statistics for the heterogeneous dataset. As this was a time-consuming process, here we employed a fully automatic approach for summary statistics weighting based on inverse regression (see Supplementary Section 3.2.1 for details), which is available via the FitMultiCell pipeline. The specification of the parameter estimation task does only require a few lines of Python code given MorpheusML and PEtab-MS files.

We ran the ABC-SMC algorithm with a population size of 500 particles and set as stopping condition a maximum wall-time of 48 h, within which the pipeline was able to finish 46 generations. The comparison of simulated and observed summary statistics showed that the model is able to fit the data accurately, with substantially improving fit quality in later ABC-SMC generations (Fig. 5 (middle)). Similar to the original study, the exceptions lie in the model’s overestimation of the fraction of proliferating cells and underestimation of the ECM density at short distances from the rim. This corresponds to the findings of the original analysis (Jagiella et al. 2017). The analysis of credible intervals reveals that with the exception of the initial fraction of quiescent cells qinit, all parameters are identifiable with small uncertainties. The parameter estimates and uncertainty intervals obtained using our model are in excellent agreement with those in the original analysis (Fig. 5 (right)), even matching the distribution of the least certain parameter qinit. Hence, the FitMultiCell pipeline was able to reproduce the original results, while simplifying implementation and eliminating the need for manual tuning.

3.4 FitMultiCell pipeline facilitates parameter estimation for new applications

The parameters of multi-scale processes are often still adapted manually, e.g. due to the difficulty of setting up proper parameter estimation and its computational cost. As this involves manual work, it can lead to non-reproducible results and does not provide any information about parameter uncertainties. We assessed whether the FitMultiCell pipeline can easily replace manual parameter tuning. We studied a model describing the mechano-sensing of the metabolic status during liver regeneration after partial hepatectomy in mice, simulating the reaction network dynamics in hepatocytes along the central-portal axis of a liver lobule by a spatial array of coupled ODEs with spatially heterogeneous inputs (Meyer et al. 2020). For this model, the parameters were previously determined manually using data for two observables: the concentration of the YAP protein in the nucleus of hepatocytes (NYAP), and the total concentration of YAP protein in hepatocytes (TYAP), see Fig. 6 (left).

Figure 6.

Figure 6.

Result of FitMultiCell pipeline for the model of mechano-sensing of the metabolic status during liver regeneration. Left: Illustration of the model. Middle: Experimental data (gray) and simulation results (green) for different combinations of tissue location (from 1 to 10) and time point (t = 0 to t = 5.0 days). The fit for ABC-SMC generation 1 (T1), generation 7 (T7) and final generation 15 (T15) is indicated using the 95% credibility interval of the accepted model simulations. Right: Credible intervals for the 14 estimated parameters compared to the result of Meyer et al. (2020) (circle). Note that the four parameters k1, k5, k7, k8 were not estimated in Meyer et al. (2020), therefore no comparison is available for those.

The model was already implemented in MorpheusML and is available from a public model repository (https://identifiers.org/morpheus/M7990). Yet, to avoid the simplifying assumptions that were used in the original study due to lack of appropriate parameter estimation tools and computational resources, we (i) dropped the quasi-steady-state assumption of the intracellular dynamics and performed a full spatio-temporal simulation, (ii) extended the biochemical model by an observer model of the microscopy setup that introduces additional scaling parameters between protein concentration and observed fluorescence intensity, and (iii) included a statistical model for the measurement noise. Complementary, we created the PEtab-MS files encoding the estimation problem for the 14 unknown parameters. Previously, due to the employed quasi-steady state assumption, only the ratios of forward and backward rate constants were identifiable.

We ran the ABC-SMC inference with a population size of 1000 particles and a wall-time limit of 24 h. To save computational resources, we employed an early rejection strategy to reject particles based on a maximum runtime of 15 min for individual simulations not matching the data. The simulated trajectories in later generations of the inference process fitted the data for TYAP and NYAP mostly well (Fig. 6 (middle)). An exception was the pericentral locations (CV-PV zones 1–5) at the last time point t = 5.0 days, where the simulations couldn’t match the low concentrations present in the data, hinting at a transition to homeostasis and a time-dependence of some parameters over longer time scales of many days. The credible intervals indicate that all 14 parameters are identifiable (Fig. 6 (right)). Interestingly, the parameter describing the fluorescence intensity normalization (k14) was estimated close to one, which justifies the previous assumption that both microscopy-based observables can be treated with the same normalization (Meyer et al. 2020). Moreover, the ratios of rate constants for reversible reactions were found to match very well with the previous point estimates using the quasi-steady state assumption in (Meyer et al. 2020). Extending previous results, we here obtained estimates for the four rate constants k1, k5, k7, k8 and quantified the uncertainty of each estimate. Hence, our study underpinned previous conclusions (Meyer et al. 2020), but also demonstrated that the FitMultiCell pipeline allows directly for an uncertainty-aware study of multi-cellular processes.

4 Discussion

Quantitative data-based modeling of multi-cellular processes is challenging, because the models are often stochastic and computationally demanding. Furthermore, due to a lack of standards, the reusability of codes and pipelines is limited. Motivated by these issues, we developed the FitMultiCell pipeline, which covers the entire workflow of model development, simulation, and systematic parameter inference (on HPC or cloud infrastructures) based on standardized input formats. Thereby, it contributes to the accelerated testing of biological hypotheses.

We used the FitMultiCell pipeline to study various application problems. This demonstrated that the proposed pipeline is widely applicable and can recover and confirm previous results. Further, we showed that it can replace manual parameter tuning, thus accelerating and solidifying the quantitative modeling of multi-cellular processes. Its modular implementation scales to HPC infrastructures and thereby facilitates inference for computationally expensive problems.

While the FitMultiCell pipeline already provides the necessary features, there are multiple directions in which it can be developed further. (i) Statistical inference for computationally expensive models can become challenging even when using massive parallelization. To address this, the use of cheaper surrogate models could be investigated for guiding parameter search and reducing the number of model simulations (Prangle 2016, Prescott and Baker 2021). Furthermore, one could make use of the idle time at the end of generations, e.g. by pre-sampling the next generation and subsequently correcting for bias. (ii) Methods for the automatic construction of summary statistics need to be explored. While it is possible to construct summary statistics semi-automatically (Blum et al. 2013) for the type of image data frequently accompanying models of multi-cellular processes, automatic construction based on machine learning approaches, more specifically convolutional networks with prior information gained from transfer learning, could substantially improve statistics quality at decreased training cost. (iii) Extension of the FitMultiCell pipeline to additional modeling, simulation and parameter estimation frameworks will increase its applicability. While Morpheus and pyABC cover already a wide range of applications, an integration of further simulation tools [e.g. Swat et al. (2012), Mirams et al. (2013), Ghaffarizadeh et al. (2018), Merks et al. (2011)] and inference tools [e.g. Dutta et al. (2017), Kangasrääsiö et al. (2016)] could facilitate the use of complementary methods and thus grow the application spectrum further.

In conclusion, we illustrated that standardized workflows for the quantitative modeling of multi-cellular processes are feasible. The FitMultiCell pipeline and the standard PEtab-MS provide starting points for further development. Already in its current form, a broad spectrum of projects can profit from them to achieve systematic and scalable inference of unknown parameters.

Supplementary Material

btad674_Supplementary_Data

Acknowledgements

We acknowledge fruitful discussions with colleagues at Morpheus.lab and the Gauss Centre for Supercomputing e. V. (www.gauss-centre.eu) for providing computing time on the GCS Supercomputer JUWELS [Jülich Supercomputing Centre 2019] at Jülich Supercomputing Centre (JSC).

Contributor Information

Emad Alamoudi, Life and Medical Sciences Institute, University of Bonn, Bonn 53113, Germany.

Yannik Schälte, Life and Medical Sciences Institute, University of Bonn, Bonn 53113, Germany; Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Neuherberg 85764, Germany; Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Technische Universität München, Garching 85748, Germany.

Robert Müller, Center of Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden 01062, Germany.

Jörn Starruß, Center of Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden 01062, Germany.

Nils Bundgaard, BioQuant—Center for Quantitative Biology, Heidelberg University, Heidelberg 69120, Germany.

Frederik Graw, BioQuant—Center for Quantitative Biology, Heidelberg University, Heidelberg 69120, Germany; Interdisciplinary Center for Scientific Computing, Heidelberg University, Heidelberg 69120, Germany; Department of Medicine 5, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen 91054, Germany.

Lutz Brusch, Center of Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden 01062, Germany.

Jan Hasenauer, Life and Medical Sciences Institute, University of Bonn, Bonn 53113, Germany; Institute of Computational Biology, Helmholtz Zentrum München—German Research Center for Environmental Health, Neuherberg 85764, Germany; Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Technische Universität München, Garching 85748, Germany.

Conflict of interest

None declared.

Funding

This work was supported by the German Federal Ministry of Education and Research (BMBF) [FitMultiCell grant nos. 031L0159B and 031L0159C, EMUNE grant nos. 031L0293C and 031L0293X]. J.H. acknowledges support by the German Research Foundation (DFG) under Germany’s Excellence Strategy (Project-ID 390873048—EXC 2151 and Project-ID 390685813—EXC 2047) and via the Schlegel Professorship for J.H. F.G. was additionally supported by the Chica and Heinz Schaller Foundation. Y.S. acknowledges financial support by the German Research Foundation (DFG) (Project-ID 432325352 - SFB 1454 Metaflammation) and the Joachim Herz Stiftung. L.B. was supported by the BMBF [LiSyM-Cancer/031L0258A].

References

  1. Anderson ARA, Quaranta V.. Integrative mathematical oncology. Nat Rev Cancer 2008;8:227–34. [DOI] [PubMed] [Google Scholar]
  2. Babtie AC, Stumpf MPH.. How to deal with parameters for whole-cell modelling. J R Soc Interface 2017;14: 20170237. 10.1098/rsif.2017.0237. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Beaumont MA, Zhang W, Balding DJ. et al. Approximate Bayesian computation in population genetics. Genetics 2002;162:2025–35. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Bernardo RC, Said JL.. Towards a model-independent reconstruction approach for late-time Hubble data. J Cosmol Astropart Phys 2021;2021:027. [Google Scholar]
  5. Blum MG, Nunes MA, Prangle D, Sisson SA et al. A comparative review of dimension reduction methods in approximate Bayesian computation. Stat Sci 2013;28:189–208. [Google Scholar]
  6. Boutillon A, Escot S, Elouin A. et al. Guidance by followers ensures long-range coordination of cell migration through α-catenin mechanoperception. Dev Cell 2022;57:1529–44.e5. 10.1016/j.devcel.2022.05.001. [DOI] [PubMed] [Google Scholar]
  7. Bundgaard N, Harmel C, Imle A. et al. Inferring cell motility in complex environments with incomplete tracking data. bioRxiv, 10.1101/2022.02.10.479738, 2022, preprint: not peer reviewed. [DOI]
  8. Cantwell CD, Mohamied Y, Tzortzis KN. et al. Rethinking multiscale cardiac electrophysiology with machine learning and predictive modelling. Comput Biol Med 2019;104:339–51. 10.1016/j.compbiomed.2018.10.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Colom B, Herms A, Hall MWJ. et al. Mutant clones in normal epithelium outcompete and eliminate emerging tumours. Nature 2021;598:510–4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Coulier A, Hellander S, Hellander A. A multiscale compartment-based model of stochastic gene regulatory networks using hitting-time analysis. J Chem Phys 2021;154:184105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Durso-Cain K, Kumberger P, Schälte Y. et al. HCV spread kinetics reveal varying contributions of transmission modes to infection dynamics. Viruses 2021;13:1308. 10.3390/v13071308. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Dutta R, Schoengens M, Onnela J. et al. Abcpy: A user-friendly, extensible, and parallel library for approximate Bayesian computation. In: Proceedings of the Platform for Advanced Scientific Computing Conference, PASC ’17, New York, NY: ACM, 2017, 8:1–8:9. ISBN 978-1-4503-5062-4. 10.1145/3093172.3093233. [DOI] [Google Scholar]
  13. Fearnhead P, Prangle D.. Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. J R Stat Soc B 2012;74:419–74. [Google Scholar]
  14. Fletcher AG, Osborne JM.. Seven challenges in the multiscale modeling of multicellular tissues. WIREs Mech Dis 2022;14:e1527. 10.1002/wsbm.1527. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Ghaffarizadeh A, Heiland R, Friedman SH. et al. Physicell: an open source physics-based cell simulator for 3-d multicellular systems. PLoS Comput Biol 2018;14:e1005991. 10.1371/journal.pcbi.1005991. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Hasenauer J, Jagiella N, Hross S. et al. Data-driven modelling of biological multi-scale processes. J Coupled Syst Multiscale Dyn 2015;3:101–21. [Google Scholar]
  17. Houston C, Marchand B, Engelbert L. et al. Reducing complexity and unidentifiability when modelling human atrial cells. Philos Trans A Math Phys Eng Sci 2020;378:20190339. 10.1098/rsta.2019.0339. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Imle A, Kumberger P, Schnellbächer ND. et al. Experimental and computational analyses reveal that environmental restrictions shape HIV-1 spread in 3D cultures. Nat Commun 2019;10:2144. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Jagiella N, Müller B, Müller M. et al. Inferring growth control mechanisms in growing multi-cellular spheroids of NSCLC cells from spatial-temporal image data. PLoS Comput Biol 2016;12:e1004412. 10.1371/journal.pcbi.1004412. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Jagiella N, Rickert D, Theis FJ. et al. Parallelization and high-performance computing enables automated statistical inference of multi-scale models. Cell Syst 2017;4:194–206.e9. [DOI] [PubMed] [Google Scholar]
  21. Johnston ST, Simpson MJ, McElwain DLS. et al. Interpreting scratch assays using pair density dynamics and approximate Bayesian computation. Open Biol 2014;4:140097. 10.1098/rsob.140097. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Joslyn LR, Kirschner DE, Linderman JJ. et al. Calipro: a calibration protocol that utilizes parameter density estimation to explore parameter space and calibrate complex biological models. Cell Mol Bioeng 2021;14:31–47. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Kangasrääsiö A, Lintusaari J, Skytén et al. ELFI: Engine for Likelihood-Free Inference. In: NIPS 2016 Workshop on Advances in Approximate Bayesian Inference, Barcelona, Spain, 2016.
  24. Keegan L, Andriushchenko P, Schreiner H. Spatial-model-editor/spatial-model-editor: 1.4.0, 2023. 10.5281/zenodo.8316368. [DOI]
  25. Kerr RA, Bartol TM, Kaminsky B. et al. Fast Monte Carlo simulation methods for biological reaction-diffusion systems in solution and on surfaces. SIAM J Sci Comput 2008;30:3126–49. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Klinger E, Rickert D, Hasenauer J. et al. pyABC: distributed, likelihood-free inference. Bioinformatics 2018;34:3591–3. [DOI] [PubMed] [Google Scholar]
  27. Köhn-Luque A, de Back W, Starruss J. et al. Early embryonic vascular patterning by matrix-mediated paracrine signalling: a mathematical model study. PLoS One 2011;6:e24175. 10.1371/journal.pone.0024175. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Kumberger P, Durso-Cain K, Uprichard SL. et al. Accounting for space—quantification of cell-to-cell transmission kinetics using virus dynamics models. Viruses 2018;10:200. 10.3390/v10040200. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Lewis SM, Asselin-Labat M-L, Nguyen Q. et al. Spatial omics and multiplexed imaging to explore cancer biology. Nat Methods 2021;18:997–1012. [DOI] [PubMed] [Google Scholar]
  30. MacLean AL, Filippi S, Stumpf MPH. et al. The ecology in the hematopoietic stem cell niche determines the clinical outcome in chronic myeloid leukemia. Proc Natl Acad Sci USA 2014;111:3883–8. https://doi.org/pnas.1317072111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Merks RMH, Guravage M, Inzé D. et al. Virtualleaf: an open-source framework for cell-based modeling of plant tissue growth and development. Plant Physiol 2011;155:656–66. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Meyer K, Morales-Navarrete H, Seifert S. et al. Bile canaliculi remodeling activates Yap via the actin cytoskeleton during liver regeneration. Mol Syst Biol 2020;16:e8985. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Minucci JM, Curry R, DeGrandi-Hoffman G. et al. Inferring pesticide toxicity to honey bees from a field-based feeding study using a colony model and Bayesian inference. Ecol Appl 2021;31:e02442. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Mirams GR, Arthurs CJ, Bernabeu MO. et al. Chaste: an open source C++ library for computational physiology and biology. PLoS Comput Biol 2013;9:e1002970. 10.1371/journal.pcbi.1002970. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Prangle D. Lazy ABC. Stat Comput 2016;26:171–85. [Google Scholar]
  36. Prescott TP, Baker RE.. Multifidelity approximate Bayesian computation with sequential Monte Carlo parameter sampling. SIAM/ASA J Uncertain Quantif 2021;9:788–817. [Google Scholar]
  37. Pritchard JK, Seielstad MT, Perez-Lezaun A. et al. Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol Biol Evol 1999;16:1791–8. [DOI] [PubMed] [Google Scholar]
  38. Richmond P, Walker D, Coakley S. et al. High performance cellular level agent-based simulation with FLAME for the GPU. Brief Bioinform 2010;11:334–47. 10.1093/bib/bbp073. [DOI] [PubMed] [Google Scholar]
  39. Schälte Y, Alamoudi E, Hasenauer J. et al. Robust adaptive distance functions for approximate Bayesian inference on outlier-corrupted data. bioRxiv, 10.1101/2021.07.29.454327, 2021, preprint: not peer reviewed. [DOI]
  40. Schälte Y, Hasenauer J.. Efficient exact inference for dynamical systems with noisy measurements using sequential approximate Bayesian computation. Bioinformatics 2020;36:i551–9. 10.1093/bioinformatics/btaa397. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Schälte Y, Hasenauer J. Informative and adaptive distances and summary statistics in sequential approximate Bayesian computation. bioRxiv, 10.1101/2022.03.18.484896, 2022, preprint: not peer reviewed. [DOI] [PMC free article] [PubMed]
  42. Schälte Y, Klinger E, Alamoudi E. et al. pyabc: efficient and robust easy-to-use approximate Bayesian computation. JOSS 2022;7:4304. 10.21105/joss.04304. [DOI] [Google Scholar]
  43. Schmiester L, Schälte Y, Bergmann FT. et al. PEtab—interoperable specification of parameter estimation problems in systems biology. PLoS Comput Biol 2021;17:e1008646. 10.1371/journal.pcbi.1008646. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Sisson SA, Fan Y, Beaumont M. et al. Handbook of Approximate Bayesian Computation. Chapman and Hall/CRC, Boca Raton FL, 2018. [Google Scholar]
  45. Sottoriva A, Kang H, Ma Z. et al. A Big Bang model of human colorectal tumor growth. Nat Genet 2015;47:209–16. 10.1038/ng.3214. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Starruß J, de Back W, Brusch L. et al. Morpheus: a user-friendly modeling environment for multiscale and multicellular systems biology. Bioinformatics 2014;30:1331–2. 10.1093/bioinformatics/btt772. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Swat MH, Thomas GL, Belmonte JM. et al. Multi-scale modeling of tissues using CompuCell3D. Methods Cell Biol 2012;110:325–66. 10.1016/B978-0-12-388403-9.00013-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Toni T, Jovanovic G, Huvet M. et al. From qualitative data to quantitative models: analysis of the phage shock protein stress response in Escherichia coli. BMC Syst Biol 2011;5:69. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Toni T, Stumpf MPH.. Simulation-based model selection for dynamical systems in systems and population biology. Bioinformatics 2010;26:104–10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Vu HT-K, Mansour S, Kücken M. et al. Dynamic polarization of the multiciliated planarian epidermis between body plan landmarks. Dev Cell 2019;51:526–42.e6. 10.1016/j.devcel.2019.10.022. [DOI] [PubMed] [Google Scholar]
  51. Waclaw B, Bozic I, Pittman ME. et al. A spatial model predicts that dispersal and cell turnover limit intratumour heterogeneity. Nature 2015;525:261–4. 10.1038/nature14971. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Wilkinson MD, Dumontier M, Aalbersberg IJJ. et al. The fair guiding principles for scientific data management and stewardship. Sci Data 2016;3:160018. 10.1038/sdata.2016.18. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

btad674_Supplementary_Data

Articles from Bioinformatics are provided here courtesy of Oxford University Press

RESOURCES