Summary
Bayesian inference is an important method in the life and natural sciences for learning from data. It provides information about parameter and prediction uncertainties. Yet, generating representative samples from the posterior distribution is often computationally challenging. Here, we present an approach that lowers the computational complexity of sample generation for dynamical models with scaling, offset, and noise parameters. The proposed method is based on the marginalization of the posterior distribution. We provide analytical results for a broad class of problems with conjugate priors and show that the method is suitable for a large number of applications. Subsequently, we demonstrate the benefit of the approach for applications from the field of systems biology. We report an improvement of up to 50-fold in the effective sample size per unit of time. As the scheme is broadly applicable, it will facilitate Bayesian inference in different research fields.
Subject areas: Biological sciences, Mathematical biosciences, Systems biology
Highlights
• Bayesian posterior distribution inference for mathematical models can be challenging
• Our mathematical approach applies marginalization to reduce parameter dimensionality
• Our method increases the effective sample size per unit of time for all tested models
• Particularly beneficial for multi-modal posterior problems
Introduction
Mathematical models are important tools for understanding and predicting the dynamics of many processes, such as signal processing in biological systems,1,2,3 patient progression,4,5 and epidemics.6,7 However, the parameters of mathematical models are in general unknown and need to be inferred from experimental data. This is an inherently challenging problem and is complicated by the fact that, in addition to the dynamical properties of interest (e.g., rate constants and initial conditions), characteristics of the measurement process may also be unknown. In systems biology, most measurement techniques, including western blotting,8 fluorescence microscopy,9 and mass spectrometry,10 are not fully quantitative but provide only relative information. Moreover, there is often an unknown offset and/or noise level.11 Accordingly, unknown observation parameters, such as scaling factors but also offsets and noise levels, have to be estimated along with parameters of the mathematical models.12,13,14
Bayesian inference is often used to estimate unknown parameters.15,16,17 A particularly common approach is to employ Markov chain Monte Carlo (MCMC) algorithms, such as (adaptive) Metropolis-Hastings,18 Hamiltonian Monte Carlo methods,19,20 and parallel tempering,21 to generate representative samples from the posterior distribution. Yet, with an increasing number of unknown parameters, the application of MCMC algorithms becomes challenging.22 This is a bottleneck that leaves sampling methods on the edge of computational feasibility. In principle, the challenge can be addressed by reducing the dimensionality of the sampling problem, e.g., by marginalizing over nuisance parameters (as, e.g., demonstrated in cosmology23). However, there is no generic and broadly applicable framework.
In frequentist inference, a template for the reduction of the dimensionality of parameter estimation problems has been provided.14,24,25 Here, hierarchical optimization approaches have been developed to determine the maximum likelihood estimate. These methods exploit that the observation parameters can be computed analytically for a given set of model parameters. It has been shown that this benefits the convergence of optimization methods and the computational efficiency, while providing the same results (see, e.g., Loos et al.24). Yet, these concepts cannot be directly translated to Bayesian inference as we are not interested in only optimal point estimates, but in (marginal) posterior distributions over parameters.
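The analytical step exploited by hierarchical optimization can be illustrated for the simplest case of a single observable with scaling, offset, and additive Gaussian noise. The following Python sketch is an illustration under that assumption and not the implementation of the cited works; the function name `optimal_observation_parameters` and the toy values are hypothetical. For a fixed model simulation, the maximum likelihood scaling and offset follow from ordinary linear regression, and the noise variance from the mean squared residual.

```python
def optimal_observation_parameters(y_obs, h_sim):
    """Maximum likelihood s, b, sigma2 for y = s*h + b + eps, eps ~ N(0, sigma2).

    h_sim holds the simulated observable for fixed model parameters; the
    estimates are the ordinary least-squares solution of a linear regression.
    """
    n = len(y_obs)
    h_mean = sum(h_sim) / n
    y_mean = sum(y_obs) / n
    s_hh = sum((h - h_mean) ** 2 for h in h_sim)
    if s_hh == 0.0:
        raise ValueError("simulation is constant; s and b are not identifiable")
    s_hy = sum((h - h_mean) * (y - y_mean) for h, y in zip(h_sim, y_obs))
    s = s_hy / s_hh
    b = y_mean - s * h_mean
    sigma2 = sum((y - (s * h + b)) ** 2 for h, y in zip(h_sim, y_obs)) / n
    return s, b, sigma2


# Hypothetical toy data generated with s = 2, b = 1 and no noise.
h_sim = [0.0, 0.5, 1.0, 1.5]
y_obs = [2.0 * h + 1.0 for h in h_sim]
s_hat, b_hat, sigma2_hat = optimal_observation_parameters(y_obs, h_sim)
```

An outer optimizer would then only need to search over the model parameters, calling such a function after each simulation.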
In this manuscript, we introduce a generic method for improving sampling efficiency by marginalizing over observation parameters. We provide analytical results for the marginalization over complex posterior distributions for dynamical biological processes, described, e.g., by ordinary differential equations (ODEs), with a broad class of observation models. The marginalization yields a lower dimensional posterior for MCMC sampling. Samples of the original posterior can be obtained by subsequent sampling of the observation parameters conditioned on the remaining parameters. To illustrate the properties of the proposed approach, we benchmark its performance with a collection of published models, including models for which currently available sampling strategies are computationally infeasible. We demonstrate that the proposed method achieves higher sampling efficiencies by reducing the auto-correlation of the samples and increasing the transition probabilities between posterior modes. Indeed, it renders computationally infeasible sampling problems feasible, increasing the set of problems that can be tackled using Bayesian inference.
Results
Many model structures allow for analytical marginalization of parameters and sampling in lower dimensional space
To facilitate Bayesian inference for mathematical models with observation parameters, we developed and implemented a marginalization-based sampling approach (Figure 1). The approach allows for inferring the parameters of mathematical models, such as ODEs and partial differential equation models, from data via observation models with scaling, offset, and noise parameters. For a mathematical model with parameter θ and time- and parameter-dependent states x(t, θ), consider the case of a one-dimensional observable h(x(t, θ), θ) with additive Gaussian measurement noise and observation model
$$\bar{y} = s \cdot h(x(t, \theta), \theta) + b + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2) \qquad \text{(Equation 1)}$$
in which ȳ describes the measured quantity, s is the scaling factor, b is the offset, and σ² is the variance of the measurement noise. A collection of measurements ȳ_k at time points t_k, with k = 1, …, N, is denoted as data D. Following Bayes’ theorem, the posterior distribution of the parameters given the data is
$$p(\theta, s, b, \sigma^2 \mid D) = \frac{p(D \mid \theta, s, b, \sigma^2)\, p(\theta, s, b, \sigma^2)}{p(D)} \qquad \text{(Equation 2)}$$
in which p(D | θ, s, b, σ²) denotes the likelihood, p(θ, s, b, σ²) denotes the prior distribution, and p(D) denotes the marginal probability.
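As a minimal illustration of the likelihood for an observation model with scaling, offset, and additive Gaussian noise, the following Python sketch evaluates the log-likelihood for a given simulation of the observable. The function name `log_likelihood` and the toy values are hypothetical, and the simulated observable is passed in as a plain list rather than computed from an ODE model.

```python
import math


def log_likelihood(y_obs, h_sim, s, b, sigma2):
    """Gaussian log-likelihood of y = s*h + b + eps with eps ~ N(0, sigma2)."""
    if sigma2 <= 0.0:
        return -math.inf  # the noise variance must be positive
    n = len(y_obs)
    rss = sum((y - (s * h + b)) ** 2 for y, h in zip(y_obs, h_sim))
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - 0.5 * rss / sigma2


# Hypothetical toy values: the data equal 2*h + 0.5 exactly.
h_sim = [1.0, 2.0, 3.0]
y_obs = [2.5, 4.5, 6.5]
ll = log_likelihood(y_obs, h_sim, s=2.0, b=0.5, sigma2=1.0)
```

In the standard approach, a sampler would explore this function jointly over the model parameters (which determine `h_sim`) and the three observation parameters.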
Figure 1.
Standard and marginalization-based Markov chain Monte Carlo sampling
(A) Illustration of the general marginalization concept.
(B) Standard approach.
(C) Marginalization-based approach depicting: (Step 1) the sequential integration of the observation parameters s, b, and σ² to evaluate the marginal likelihood p(D | θ), and (Step 2) the (optional) conditional sampling of the marginalized observation parameters.
The standard approach is to use MCMC methods to obtain representative samples from the joint posterior distribution for model parameters θ and observation parameters s, b, and σ² (Equation 2) for subsequent analysis (Figure 1B). All parameters are sampled jointly, disregarding their nature (Figure 1B); in particular, note that the state x(t, θ) and the value of the observation map h(x(t, θ), θ) depend only on θ and not on s, b, or σ². This approach is often challenging and even infeasible for models with large datasets since the number of observation parameters can easily exceed the number of model parameters (see, e.g., Bachmann et al. and Raimúndez et al.26,27).
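To simplify the sampling process, we propose a marginalization-based approach, which exploits a decomposition of the sampling problem in two steps (Figure 1C). In Step 1, we consider the marginalization of the posterior distribution (Equation 2) with respect to the observation parameters s, b, and σ², yielding
$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}$$
with p(D | θ) as the marginal likelihood given by
$$p(D \mid \theta) = \int p(D \mid \theta, s, b, \sigma^2)\, p(s, b, \sigma^2)\, \mathrm{d}s\, \mathrm{d}b\, \mathrm{d}\sigma^2 \qquad \text{(Equation 3)}$$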
assuming that the prior can be written as p(θ, s, b, σ²) = p(θ) p(s, b, σ²). For various choices of noise models and prior distributions (in particular conjugate priors), this marginal likelihood can be computed in closed form. This is for instance the case for the combination of additive Gaussian noise with a Normal-Inverse-Gamma-distributed joint prior for s, b, and σ², whose hyperparameters might depend on θ. For this case, the marginal likelihood is available in closed form (Equation 4; see Tables S1 and S2 and the supplemental data), expressed in terms of the hyperparameters and a parameter-dependent constant C.
As the Normal-Inverse-Gamma prior is a conjugate prior for additive Gaussian noise, the marginal likelihood is analytically tractable. There are various other cases, including multiplicative Gaussian noise and even distributions with outliers. For the latter, Laplacian noise has been shown to be more robust against measurement outliers.28 Tables S1 and S2 summarize ten practically relevant cases for which we obtained closed-form expressions, and we are certain that many more are possible. For details on the derivation of all individual results (including two cases for Laplace distributed noise), we refer to the supplemental data.
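The principle of conjugate marginalization can be illustrated with a deliberately simplified case that is not the expression derived in this work: scaling and offset are assumed known (absorbed into the simulation), and only the noise variance is integrated out under an Inverse-Gamma prior. The Python sketch below compares the standard closed-form result with a trapezoidal-rule integration; all function names, hyperparameter values, and data are hypothetical.

```python
import math


def log_marginal_likelihood(y_obs, h_sim, alpha, beta):
    """log p(D | theta) with sigma2 ~ Inverse-Gamma(alpha, beta) integrated out."""
    n = len(y_obs)
    rss = sum((y - h) ** 2 for y, h in zip(y_obs, h_sim))
    return (
        -0.5 * n * math.log(2.0 * math.pi)
        + alpha * math.log(beta)
        - math.lgamma(alpha)
        + math.lgamma(alpha + 0.5 * n)
        - (alpha + 0.5 * n) * math.log(beta + 0.5 * rss)
    )


def log_marginal_likelihood_numeric(y_obs, h_sim, alpha, beta,
                                    u_min=-15.0, u_max=15.0, steps=4000):
    """Same integral evaluated by the trapezoidal rule in u = log(sigma2)."""
    n = len(y_obs)
    rss = sum((y - h) ** 2 for y, h in zip(y_obs, h_sim))

    def integrand(u):
        inv_sigma2 = math.exp(-u)
        log_lik = -0.5 * n * (math.log(2.0 * math.pi) + u) - 0.5 * rss * inv_sigma2
        log_prior = (alpha * math.log(beta) - math.lgamma(alpha)
                     - (alpha + 1.0) * u - beta * inv_sigma2)
        # "+ u" is the log-Jacobian of the substitution sigma2 = exp(u).
        return math.exp(log_lik + log_prior + u)

    du = (u_max - u_min) / steps
    total = 0.5 * (integrand(u_min) + integrand(u_max))
    total += sum(integrand(u_min + i * du) for i in range(1, steps))
    return math.log(total * du)


# Hypothetical simulation and data for one fixed model parameter vector.
h_sim = [1.0, 2.0, 3.0, 4.0]
y_obs = [1.1, 1.9, 3.2, 3.9]
analytic = log_marginal_likelihood(y_obs, h_sim, alpha=2.0, beta=1.0)
numeric = log_marginal_likelihood_numeric(y_obs, h_sim, alpha=2.0, beta=1.0)
```

The closed form needs a handful of arithmetic operations, whereas the numerical version evaluates the integrand thousands of times, which mirrors why analytical expressions are attractive inside a sampler.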
Given the marginalized likelihood function p(D | θ) and the prior p(θ), the posterior distribution of the parameters of the mathematical model can be sampled using MCMC and related methods. The sampling can be performed in the space of θ, as the observation parameters are implicitly considered (Figure 1C).
The samples of model parameters θ from p(θ | D) allow for the assessment of the model properties and their uncertainties. In this regard, there is no difference between sampling the marginalized posterior distribution and projecting the full posterior distribution onto the θ component. However, tasks like the assessment and plotting of the model-data mismatch also require the posterior of the observation parameters. These can be obtained by sampling from the conditional distribution p(s, b, σ² | D, θ). As the observation parameters only influence the observation model (Equation 1) and not the calculation of state x(t, θ) and observable map h(x(t, θ), θ), the conditional distribution can be expressed in closed form and sampled efficiently. For the aforementioned case, a matching sample of observation parameters for a given model parameter θ can be obtained by drawing from Gamma and Normal distributions whose parameters, including the constant C, are evaluated for model parameter θ. This conditional sampling can be proven to provide the same correlation structure as directly sampling the full posterior distribution. For details on the derivation of the conditional sampling for the observation parameters we refer to the supplemental data. As the conditional sampling can be performed independently and does not require model simulation, it is computationally efficient. For additional observation models see Tables S1 and S2.
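The conditional sampling step can be sketched for a simplified case in which only the noise variance is marginalized under an Inverse-Gamma prior (scaling and offset are assumed known). This is an illustrative assumption, not the full Normal-Inverse-Gamma scheme of this work, and the names `sample_noise_variance`, `alpha`, and `beta` are hypothetical. By conjugacy, the conditional distribution of the variance is again Inverse-Gamma, so a draw is the reciprocal of a Gamma-distributed precision.

```python
import random


def sample_noise_variance(y_obs, h_sim, alpha, beta, rng):
    """Draw sigma2 from Inverse-Gamma(alpha + n/2, beta + rss/2)."""
    n = len(y_obs)
    rss = sum((y - h) ** 2 for y, h in zip(y_obs, h_sim))
    shape = alpha + 0.5 * n
    rate = beta + 0.5 * rss
    # The precision 1/sigma2 is Gamma distributed; gammavariate expects
    # a shape and a scale, so the scale is the reciprocal of the rate.
    precision = rng.gammavariate(shape, 1.0 / rate)
    return 1.0 / precision


# Hypothetical simulation for one sampled model parameter vector.
h_sim = [1.0, 2.0, 3.0, 4.0]
y_obs = [1.1, 1.9, 3.2, 3.9]
rng = random.Random(0)
draws = [sample_noise_variance(y_obs, h_sim, 2.0, 1.0, rng) for _ in range(20000)]
mean_sigma2 = sum(draws) / len(draws)
```

In a full workflow, one such draw would be made for each sampled model parameter vector, reusing the stored simulation so that no additional model evaluation is needed.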
In summary, a broad spectrum of parameter estimation problems can be reformulated by performing an analytically tractable marginalization of their observation parameters. Sampling of this lower dimensional posterior distribution for the model parameters θ in combination with conditional sampling for the observation parameters allows the construction of samples from the full posterior distribution. Accordingly, the original sampling problem is decomposed in two sub-problems, of which the conditional sampling is optional.
Marginalization-based approach yields same results at lower computational cost
To compare the performance for the standard and marginalization-based approach, we performed a range of studies using (i) a simple test problem and (ii) published models and datasets.
As a simple test problem we considered a model of a reversible conversion reaction process, A ⇌ B. This process was considered in various other publications28,29 and can be described using a two-dimensional system of ODEs, with the concentrations of A and B as state variables. Here, we considered that the abundance of B is measured up to an unknown scaling, offset, and noise level. Accordingly, the mathematical model possesses two model parameters, the forward rate from A to B and the backward rate from B to A, and three observation parameters, the scaling s, the offset b, and the noise variance σ² (Table 1). A detailed description of the model is provided in the STAR methods section.
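For orientation, a reversible conversion reaction with mass-action kinetics has a closed-form solution, which the following Python sketch evaluates. The rate names `k1` and `k2`, the initial conditions, and the function names are assumptions made for illustration; the exact setup used in this study is given in the STAR methods section.

```python
import math


def conversion_reaction(t, k1, k2, a0=1.0, b0=0.0):
    """Concentrations (A, B) at time t for A <-> B with rates k1 (A->B), k2 (B->A).

    Solves dA/dt = -k1*A + k2*B and dB/dt = k1*A - k2*B analytically.
    """
    total = a0 + b0
    b_eq = total * k1 / (k1 + k2)  # steady-state concentration of B
    b = b_eq + (b0 - b_eq) * math.exp(-(k1 + k2) * t)
    return total - b, b


def observable(t, k1, k2, s, offset):
    """Noise-free measurement of B up to scaling and offset."""
    return s * conversion_reaction(t, k1, k2)[1] + offset


# Hypothetical parameter values.
a_start, b_start = conversion_reaction(0.0, k1=0.8, k2=0.6)
a_late, b_late = conversion_reaction(100.0, k1=0.8, k2=0.6)
```

Because the observable is linear in the scaling and offset, different combinations of these two parameters only stretch and shift the same trajectory of B.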
Table 1.
Key numbers and features of the considered toy and benchmark models
Model ID | Model parameters | Scaling parameters | Offset parameters | Noise parameters | Description | Reference |
---|---|---|---|---|---|---|
Toy | 2 | 1 | 1 | 1 | Conversion reaction | – |
M1 | 13 | 3 | – | – | EGF-AKT pathway | Fujita et al.37 |
M2 | 6 | 3 | – | 3 | STAT5 dimerization | Boehm et al.38 |
M3 | 3 | 1 | – | 1 | mRNA transfection | Leonhardt et al.39 |
M4 | 26 | 31 | – | – | Gastric cancer signaling | Villaverde et al.40 |
The numbers of unknown model parameters, unknown scaling parameters, unknown offset parameters, and unknown noise parameters, which are effectively sampled, are reported.
In the first step, we used the model to assess the correctness of the analytical marginalized likelihood (Equation 4) by comparing it with numerical integration of Equation 3. The results show a perfect match for a range of different parameter values (Figure 2A). Yet, the evaluation of the analytical marginalized likelihood was five orders of magnitude faster than the numerical integration (Figure 2B), which highlights the importance of the analytical derivations. In the second step, we performed 100 independent MCMC sampling runs for the standard and marginalization-based approach. The runs employed a state-of-the-art adaptive Metropolis-Hastings method.18 We found a superior performance of the marginalization-based approach, as the observed effective sample size per unit of time was twice as high as for the standard approach (Figure 2C). This indicates that the marginalization-based approach facilitates the mixing of the MCMC chains even for simple problems and, hence, provides a more efficient exploration of the posterior. Moreover, the model fit for the best sample found (i.e., maximizing the posterior) coincided for both approaches (Figure 2D), as did the marginal distributions for the two model parameters (Figures 2E and 2F) and the conditionally sampled observation parameters (Figures 2G–2I).
Figure 2.
Evaluation of the standard and marginalization-based approach for the toy model
(A) Comparison of analytical vs. numerical integration.
(B) Time comparison of analytical vs. numerical integration.
(C) Effective sample size per unit of time for 100 independent runs.
(D) Model fit of the best sample found during sampling from the standard (orange) and marginalization-based (purple) approach.
(E–I) Parameter marginal posterior distributions computed using a kernel density estimate for the two model parameters (E and F) and the conditionally sampled observation parameters: (G) scaling factor s, (H) offset b, and (I) noise variance σ².
Following the promising results for the test problem, we evaluated the performance of the proposed marginalization-based approach for three already published models and datasets (Table 1 and STAR methods section). The models M1 to M3 describe cellular processes: (M1) epidermal growth factor (EGF)-induced protein kinase B (AKT) signaling; (M2) phosphorylation-dependent STAT5 dimerization; and (M3) mRNA transfection. The numbers of model and observation parameters differ, and so do the observation functions. Accordingly, different closed-form expressions for the marginalized likelihood function are used (Tables S1 and S2). More importantly, the full posterior distributions exhibit different characteristics, ranging for instance from uni- to bimodal.
For the considered application problems, the marginalization of the observation parameters reduced the dimensionality of the sampling problems by up to 50% (ranging from 19% to 50%) (Figure 3A). The validity of the analytical expressions for marginalized likelihoods was again confirmed using numerical integration (Figure S1). To evaluate the impact of this reduction on the sampling efficiency, we performed 50 independent MCMC sampling runs using the parallel tempering algorithm with 10 temperatures.21 All the runs were initialized at parameter values maximizing the posterior probability which were found using multi-start optimization.12 For M1 and M2, these maximum a posteriori (MAP) estimates were unique, while for M3, two MAP estimates were found with identical posterior values. The sampling was run for iterations. Further details are provided in the STAR methods section. The high number of iterations allowed all MCMC runs of the standard and marginalized problem to converge according to the Geweke test.30 Yet, the marginalization-based approach achieved a higher effective sample size per unit of computation time than the standard approach (Figure 3B). The improvement was problem dependent and ranged from 2 (M1 and M2) to nearly 50 (M3) times higher efficiency in the marginalization-based approach. As the computation time was similar, the core reason for this is a reduction in the auto-correlation length (Figure 3C). The model fits for the best sample found were identical for both approaches (Figures 3D, S2 and S3) as well as the parameter marginal distributions (Figures S4–S6).
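The relation between auto-correlation and effective sample size can be made concrete with a small Python sketch. The estimator below, which truncates the auto-correlation sum at the first non-positive value, is one simple choice assumed here for illustration; the estimator used in this study is described in the STAR methods section, and all names and chains below are hypothetical.

```python
import random


def integrated_autocorrelation_time(chain):
    """Estimate tau = 1 + 2 * sum of positive-lag autocorrelations."""
    n = len(chain)
    mean = sum(chain) / n
    centered = [x - mean for x in chain]
    variance = sum(c * c for c in centered) / n
    if variance == 0.0:
        raise ValueError("chain is constant")
    tau = 1.0
    for lag in range(1, n):
        rho = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n * variance)
        if rho <= 0.0:  # truncate at the first non-positive autocorrelation
            break
        tau += 2.0 * rho
    return tau


def effective_sample_size(chain):
    """Number of independent samples carrying the same information as the chain."""
    return len(chain) / integrated_autocorrelation_time(chain)


# Hypothetical chains: independent draws versus a strongly correlated AR(1) process.
rng = random.Random(1)
n_samples = 5000
independent = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
correlated = [0.0]
for _ in range(n_samples - 1):
    correlated.append(0.9 * correlated[-1] + rng.gauss(0.0, 1.0))

ess_independent = effective_sample_size(independent)
ess_correlated = effective_sample_size(correlated)
```

Dividing such an estimate by the wall-clock time of the run gives the effective sample size per unit of time; with similar run times, a shorter auto-correlation length translates directly into a higher value.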
Figure 3.
Evaluation of the standard and marginalization-based approach for the benchmark models
Models M1–M3 are shown from left to right.
(A) Number of sampled parameters.
(B) Effective sample size per unit of time.
(C) Auto-correlation length.
(D) Model fit of the best sample found during sampling. A subset of the experimental data is shown for M1 and M2. Complete datasets and parameter marginal distributions are depicted in Figures S2–S6. A comparison of analytical vs. numerical integration is shown in Figure S1.
In summary, test and application problems demonstrate the acceleration potential of the marginalization-based approach. The improvement was problem specific, with no clear dependence on the degree of dimensionality reduction, but in all cases substantial.
Marginalization-based approach improves transition rates between posterior modes
To understand for which problems the marginalization-based approach is expected to achieve a large acceleration, we considered the model M3. The posterior distribution for M3 is bimodal, and a simple explanation for the acceleration would have been that the bimodality is eliminated. Yet, this is not the case as the bimodality is related to a symmetry in model parameters. Numerical simulations as well as analytical results reveal that the observable trajectory remains unchanged when the mRNA and protein degradation rates are interchanged. As long as the optimal point is not located on the line of equal degradation rates, standard and marginalized posterior are bimodal.
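The symmetry can be checked numerically with a short Python sketch. It assumes a commonly used two-state form of the mRNA transfection model, dm/dt = -δ·m and dp/dt = k·m - β·p with p = 0 at the transfection time; this ODE form, the assignment of δ to mRNA and β to protein degradation, and the parameter values are assumptions for illustration, while the model actually used is specified in the STAR methods section.

```python
import math


def protein_level(t, k_m0, beta, delta, t0=0.0):
    """Protein abundance for dm/dt = -delta*m, dp/dt = k*m - beta*p, p(t0) = 0.

    k_m0 is the product of the translation rate and the initial mRNA amount.
    The closed form below requires beta != delta.
    """
    if t <= t0:
        return 0.0
    tau = t - t0
    return k_m0 / (delta - beta) * (math.exp(-beta * tau) - math.exp(-delta * tau))


# Hypothetical parameter values; the second trajectory swaps the two rates.
times = [0.5 * i for i in range(1, 21)]
original = [protein_level(t, 5.0, beta=0.2, delta=0.8) for t in times]
swapped = [protein_level(t, 5.0, beta=0.8, delta=0.2) for t in times]
max_difference = max(abs(a - b) for a, b in zip(original, swapped))
```

Swapping the rates changes the sign of both the prefactor and the difference of exponentials, so the product is unchanged; any likelihood built on this trajectory therefore has two equally good modes unless the rates coincide.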
We hypothesized that the large efficiency improvement is related to a lower minimum energy path for the transitions in the marginalized posterior. To assess this, we computed the minimum energy paths31 for the standard (Figures 4A and 4B) and marginalized posterior (Figures 4C and 4D) (see details in the STAR methods section). To our surprise, the minimum energy path is almost identical for both approaches (Figure 4E). Hence, there is at least no difference in the minimum energy path.
Figure 4.
Comparison of the minimum energy path for model M3
(A–D) Landscape of the optimized (A and B) posterior and (C and D) marginalized posterior for different fixed values of the model parameters β and δ. The difference with respect to the maximal posterior value is depicted.
(E) Transition coordinates for the minimum energy path.
In order to understand the improvement observed for runs of adaptive parallel tempering methods, we performed 10 runs of a single-chain adaptive Metropolis algorithm18 with iterations for exploring the posterior (). We expected this to simplify the interpretation. Yet, the adaptive Metropolis algorithm was essentially unable to transition between the two modes of the posterior, meaning that efficiency improvements could not be assessed with reasonable computation time (see in Figure 5A). To assess the relative complexity of the sampling problem for standard and marginalization-based approach, we repeated the evaluation with the single-chain adaptive Metropolis algorithm for the tempered posterior, keeping the temperature fixed for a specific run. We found that the marginalization-based approach allows already at lower temperatures for transitions between the modes unlike the standard sampling approach (Figures 5A, S8, and S9). For temperatures such as , the standard approach showed an average number of only 5 transitions between the modes with many runs only sampling from a single mode (Figures 5B and 5C), while for the marginalization-based approach on average transitions occurred (Figures 5D and 5E). As the minimum barrier energy is conserved also for higher temperatures (Figure S7), this increase in the transition rate by four orders of magnitude for the same algorithm implies a lower overall complexity of the marginalization-based sampling problem.
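How mode transitions can be counted at a fixed temperature is sketched below in Python for a hypothetical one-dimensional bimodal target. The sketch uses a plain (non-adaptive) random-walk Metropolis sampler, tempers the whole target density rather than only the likelihood, and defines a transition by the chain reaching the opposite side of a threshold; all of these are simplifying assumptions and not the setup of this study.

```python
import math
import random


def log_target(x):
    """Log density (up to a constant) of an equal mixture of N(-4, 1) and N(4, 1)."""
    a = -0.5 * (x - 4.0) ** 2
    b = -0.5 * (x + 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def count_mode_transitions(temperature, n_iter=20000, step=1.0, seed=0):
    """Random-walk Metropolis on target**(1/T); count switches between modes."""
    rng = random.Random(seed)
    x = 4.0
    current = log_target(x) / temperature
    mode, transitions = 1, 0
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        candidate = log_target(proposal) / temperature
        # Metropolis acceptance on the log scale; 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < candidate - current:
            x, current = proposal, candidate
        # Count a transition only once the chain is well inside the other mode.
        if mode == 1 and x < -2.0:
            mode, transitions = -1, transitions + 1
        elif mode == -1 and x > 2.0:
            mode, transitions = 1, transitions + 1
    return transitions


transitions_cold = count_mode_transitions(temperature=1.0)
transitions_hot = count_mode_transitions(temperature=8.0)
```

Raising the temperature flattens the barrier between the modes, so the same sampler crosses far more often; comparing such counts between two formulations of a posterior at equal temperature is the kind of diagnostic described above.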
Figure 5.
Quantification of the transitions between the posterior modes for different temperatures T for model M3
(A) Number of transitions per iterations for a range of temperatures for the standard (orange) and marginalization-based (purple) approach. A total of 10 chains per temperature value are depicted.
(B and D) Marginal distribution computed using a kernel density estimate and (C,E) parameter trace for the model parameter β of a representative chain obtained with the (B,C) standard and (D and E) marginalization-based approach for .
(F and G) Direct transitions between the posterior modes of a representative chain along with the minimum energy path obtained with the (F) standard and (G) marginalization-based approach for . See also Figures S7–S9.
As the increased transition rate is not caused by an altered energy path, we studied the transition paths. This revealed that the employed single-chain algorithm facilitates jumps over the valley in the objective function (Figures 5F and 5G), meaning that it transitions between high-probability regions around the local optima. These direct transitions appear at a high rate for the marginalization-based approach (Figure 5G), while they rarely happen for the standard approach (Figure 5F). For the latter, most transitions are along low-energy paths with posterior probabilities dropping below the minimum energy path. Accordingly, the transition behavior is more efficient for the marginalization-based approach than for the standard approach.
In summary, the in-depth study of the mRNA transfection model (M3) showed that the marginalization-based approach can achieve substantial accelerations as the structure of the sampling problem is simplified, e.g., by facilitating transitions between modes. The improvements are related to the interplay of sampling approach and problem geometry. In particular for challenging (e.g., multi-modal) problems a much greater improvement could be observed.
Marginalization-based approach enables Bayesian inference for large models
As the marginalization-based approach appeared beneficial for challenging problems, we assessed in a next step whether it enables Bayesian inference for problems for which standard approaches did not provide reproducible results in a reasonable time frame. Specifically, we considered an ODE model for signal transduction in gastric cancer cells (cell line MKN1) that was developed to unravel response and resistance markers.27 This model possesses in total 57 unknown parameters, of which 26 are model parameters and 31 are observation parameters (Table 1, M4).
The application of the marginalization-based approach resulted in a reduction of the dimensionality of the sampling problem by over 50% (Figure 6A). For the 26 model parameters which remain to be sampled, we compared the marginal likelihoods as computed using the previously derived analytical formulas and numerical integration (Figure 6B). The agreement of the results (Pearson correlation ) confirmed the correctness of our analytical integration.
Figure 6.
Convergence of the marginalization-based approach for model M4
(A) Number of sampled parameters.
(B) Scatterplot for the agreement of analytical and numerical integration.
(C and D) Model fit of the best sample found during sampling with the standard (orange) and marginalization-based (purple) approach for (C) a subset of the experimental data, represented as mean ± standard deviation, and (D) the complete dataset in the form of a scatterplot. (E–H) Results from adaptive Metropolis (top) and parallel tempering (bottom) are shown.
(E and F) Parameter marginal posterior distribution obtained using the (E) standard and (F) marginalization-based approach computed using a kernel density estimate for model parameter .
(G and H) Dimensionality reduction for all samples from all runs for the (G) standard and (H) marginalization-based approach using the UMAP representation. Different shades correspond to individual runs. The UMAPs were constructed using the Python package umap.32 See also Figures S10–S12.
To determine the parameters of the model, we performed sampling using standard and marginalization-based approach. The adaptive Metropolis-Hastings algorithm18 and the adaptive parallel tempering algorithm21 employed in the previous sections were run 10 times with different starting points and random seeds for iterations for the adaptive Metropolis-Hastings and iterations for the adaptive parallel tempering algorithm. We found that while all runs in the marginalization-based approach (and for both sampling algorithms) successfully finished within a run time limit of 7 days, only 7 out of 10 runs successfully finished for the standard approach for each sampling algorithm. The MAP estimates observed in the different runs provided similar fits (Figures 6C and 6D). In contrast, the marginal distributions of the model parameters differed, with the marginalization-based approach mostly providing broader parameter distributions than the standard approach (Figures 6E and 6F). The assessment of the reproducibility of the marginal distributions revealed a high variability between different runs performed using the standard approach (Figures 6E and S10). On the contrary, for the marginalization-based approach a good agreement between runs was observed (Figures 6F and S11), indicating reproducibility. To verify that the behavior observed for the individual parameters is maintained in the full parameter space, we analyzed the overall agreement of all parameter samples across all runs for the standard and marginalization-based approach by visualizing the samples using the uniform manifold approximation and projection (UMAP) representation.32 We found that the individual runs of the standard approach represent individual clusters in the UMAP (Figure 6G), while the individual runs of the marginalization-based approach were indistinguishable (Figure 6H). This finding was supported by the distribution of the nearest neighbors (Figure S12). 
This revealed that: (i) in the marginalization-based approach all the individual runs sample from the same distribution and (ii) the standard approach failed for both algorithms considered here.
The study of the model of signal processing in gastric cancer cells revealed that the marginalization-based approach allows for reproducible sampling in problems where the standard approach failed. While for the marginalization-based approach all runs provided consistent results, the standard approach failed to converge within an average central processing unit (CPU) time of 150 h, rendering its application impracticable. Furthermore, our study provides improved estimates for the parameters (Figure S13) of important processes of a drug used in clinical practice.
In summary, the application of our marginalization-based approach to Bayesian inference for models with relative measurement data shows consistently that our approach yields the same marginal distributions for the parameters as the standard approach, while being considerably more efficient in exploring the parameter space and enabling Bayesian inference for larger models, which was not possible before with the standard approach.
Discussion
Bayesian inference for models of biological processes requires the consideration of parameters of the dynamical systems as well as the measurement process. The unknown scaling factors, offsets, and noise levels often constitute a large fraction of the overall parameters.12 This complicates sampling and can render the generation of representative samples practically infeasible. Here, we address this challenge by showing that an (analytical) marginalization of the posterior is possible even for dynamical models, for common observation and noise models, and for plausible priors. We provide analytical results for additive normal, multiplicative log-normal, and additive Laplace noise for different choices of unknown parameters and priors. This approach allows for the construction of a sample from the full posterior by (i) sampling a marginalized posterior for the parameters of the dynamical systems and (ii) conditional sampling of the observation parameters.
We evaluated the performance of our marginalization-based approach and compared it to the standard approach for four published models, with differences in their complexity. This revealed an increased effective sample size per unit of time, and increased transition probabilities between posterior modes. The marginalization-based approach was more efficient than the standard approach for all considered problems but, more importantly, it also enabled the assessment of the posterior distribution for larger models for which the standard approach failed to converge in the considered time frame. Interestingly, there was no strong relation between the reduction of the problem dimensionality and the improvement in efficiency. The improvement seems to depend rather on the characteristics of the marginalized posterior and the interplay of these characteristics with the employed sampling algorithm. This is consistent with previous findings for hierarchical optimization,25 where a minimal reduction of the problem dimensionality was shown to substantially improve the conditioning of the optimization problem. Based on our observations we expect the sampling behavior to benefit substantially even from the removal of a small number of parameters, as (i) the likelihood value is often very sensitive to them, which produces narrow rims in the posterior distribution, and as (ii) the removal of a small number of parameters can result in a substantially increased probability to jump between modes. The latter was observed for the model of mRNA transfection. A review of the PEtab benchmark collection33 showed that 20 out of 30 dynamical models used in systems biology and medicine possess unknown observation parameters. Hence, a large number of modeling projects could profit from the approach.
The approach presented here is not limited to relative measurement data but is also applicable to absolute measurements, for which the noise parameters would still have to be inferred (Tables S1 and S2). We provide the detailed derivation in the supplemental data. Accordingly, our approach can be used for combinations of relative and absolute data. It is also applicable to measurement process functions and noise models other than the ones considered here. We hypothesize that an extension to correlated noise is also possible, but this remains to be assessed.
The choice of conjugate priors for the marginalized parameters eased the analytical derivation of the marginal posterior. This implies in our case that observable and noise parameters are not independent under the prior. Mostly, this is not a problem since both parameters are related to the measurement process. However, in some cases the parameters might be known to be independent, and other prior distributions must then be considered. It should be noted that the concept of marginalization is not restricted to integrals that are analytically solvable; numerical integration schemes can also be considered. This would increase the required computation time (as observed in Figure 2B), but very likely the improved mixing properties would be maintained. Whether the improved mixing outweighs the increased computational cost will be problem dependent, but this is not unlikely (and would have been the case for the mRNA transfection model (M3)), as the numerical integration over observation parameters does not require numerical simulations of the model. In future research projects this question should be tackled via comprehensive benchmarking. Similarly, while in this manuscript only cases were presented in which the conditional sampling of observation parameters was straightforward due to the use of conjugate priors, the approach is also applicable if this does not hold. In this case the sampling of the parameters θ is not impaired, but MCMC sampling or rejection sampling might need to be used to obtain samples for the observation parameters.
The proposed method was beneficial in combination with adaptive Metropolis-Hastings and adaptive parallel tempering algorithms. We expect that the same will hold true for sampling algorithms exploiting gradient information, such as Hamiltonian Monte Carlo sampling.19,20 As the marginal likelihood is differentiable, merely the derivation and implementation of the gradient are required. The use of methods that exploit the Riemann geometry of the parameter space of statistical models, e.g., the Metropolis-adjusted Langevin algorithm,34 might be slightly more involved, as it requires the derivation of the marginalized Fisher information matrix. While we assume that this matrix can be derived in closed form or at least be accurately approximated, the corresponding results are not yet available. Alternatively, automatic differentiation could be employed to obtain gradients.35 Assessing the impact of posterior marginalization on the performance of these samplers, as well as of other sampling methods, would be highly beneficial but is beyond the scope of this work.
In this study, we focused on the assessment of parameter uncertainties for ODE models. Yet, as the marginalization-based approach provides a complete parameter sample, it also facilitates the evaluation of prediction uncertainties.16 Accordingly, we expect that it might contribute to resolving reliability problems of Bayesian prediction uncertainty analysis encountered in recent studies.36 Furthermore, the proposed approach is not limited to ODEs but is directly applicable to other deterministic models, e.g., partial differential equations.
In summary, the marginalization-based approach provides a new tool for Bayesian inference for models with observation-related parameters. It substantially benefits the efficiency of sampling-based approaches and renders the generation of representative posterior samples for large models possible. As it is agnostic to the structure of the underlying dynamical model, it is widely applicable to mathematical models from different research fields, such as engineering, physics, and ecology.
Limitations of the study
This study has three main limitations. The first limitation is the number of models considered. Results obtained on four published models may not extrapolate to all models, and exceptions may occur. However, when selecting our candidate models, we tried to cover different degrees of complexity and different structures. The same applies to the sampling algorithms used. Second, our approach should in principle be applicable to other model types, such as partial differential equations. While we expect similar results, this remains to be evaluated. Lastly, the specification of such a “constrained” observation model is another limitation. Ideally, the approach could be combined with automatic-differentiation schemes to flexibly facilitate the use of multiple observation models.
STAR★Methods
Key resources table
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Deposited data | | |
| Code for mathematical modeling, analysis and visualization used in the manuscript | This paper | https://doi.org/10.5281/zenodo.7199473 |
| Software and algorithms | | |
| Python version 3.10 | Python Software Foundation | https://www.python.org |
| AMICI (Python package) | GitHub | https://github.com/AMICI-dev/AMICI |
| pyPESTO (Python package) | GitHub | https://github.com/ICB-DCM/pyPESTO |
| Other | | |
| Benchmark PEtab model repository collection | GitHub | https://github.com/Benchmarking-Initiative/Benchmark-Models-PEtab |
Resource availability
Lead contact
Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Jan Hasenauer (jan.hasenauer@uni-bonn.de).
Materials availability
This study did not generate new unique reagents.
Method details
Mechanistic modeling of biological systems
We consider models based on ODEs of the form

$$\dot{x}(t) = f(x(t), \theta), \quad x(t_0) = x_0(\theta),$$

in which the vector field $f$ determines the temporal evolution of the states $x(t) \in \mathbb{R}^{n_x}$. The unknown model parameters, which are estimated from the measurements, are denoted by $\theta \in \mathbb{R}^{n_\theta}$. Usually, θ includes reaction rate constants and initial amounts of species. Here, $n_x$ is the total number of modeled species, and $n_\theta$ the total number of model parameters. The states $x$ and model parameters θ are linked to the observables $y \in \mathbb{R}^{n_y}$ via the observation map $h$, i.e., $y(t) = h(x(t), \theta)$, where $n_y$ is the total number of observables. The observables are the measured properties of the model. Most measurement techniques only provide relative information about the absolute concentrations of interest8,9 and, frequently, measurements are noise corrupted. Hence, to obtain the measurements, (i) the model observables must be rescaled by introducing scaling factors and offsets, and (ii) the model must also capture experimental errors by defining a noise model. Most commonly, independent and additive Gaussian distributed noise models are assumed,

$$\bar{y}_{j,i} = s_j \cdot h_j(x(t_i), \theta) + b_j + \varepsilon_{j,i}, \quad \varepsilon_{j,i} \sim \mathcal{N}(0, \sigma_j^2), \qquad \text{(Equation 5)}$$

with observable index $j = 1, \ldots, n_y$, time index $i = 1, \ldots, n_t$, scaling factors $s_j$, offsets $b_j$, and noise parameters $\sigma_j$. Here, $n_t$ denotes the total number of time points. These parameters are often unknown and, therefore, also need to be estimated along with the unknown model parameters. Other common noise assumptions include log-normal distributed noise models11 and Laplace distributed noise models.28 In this study, we focus on the case of additive Gaussian noise (Equation 5), but implementations for log-normal and Laplace distributed noise models are provided in Tables S1 and S2 and the supplemental data.

We denote the set of all measurements as $\mathcal{D} = \{\bar{y}_{j,i}\}_{j=1,\ldots,n_y;\ i=1,\ldots,n_t}$.
Benchmark models
For the evaluation of the marginalization-based approach, we employed five models in total (one toy model and four published models, M1–M4) and their corresponding datasets (Table 1).
Toy: Model of a conversion reaction
The conversion reaction model was introduced previously28 and describes a reversible chemical reaction that converts a biochemical species A to a species B and B back to A, each direction with its own rate constant (Figure 2). We modified the observation model to include scaling and offsets. For the evaluation of the proposed method, we generated one artificial dataset, which is depicted in Figure 2D. For details on the model structure and synthetic data generation, we refer to the supplemental data.
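As an illustration of such a toy setup, the sketch below simulates a reversible conversion reaction using its analytical solution and generates relative, noisy measurements with a scaling factor and an offset. It is a minimal sketch under stated assumptions, not the data-generation script of this study: the rate constant names `k1` and `k2`, the initial amounts, the choice of B as the observed species, and all numerical values are hypothetical.

```python
import math
import random


def conversion_reaction(t, k1, k2, a0=1.0, b0=0.0):
    """Analytical solution of dA/dt = -k1*A + k2*B, dB/dt = k1*A - k2*B."""
    total = a0 + b0
    a_eq = k2 / (k1 + k2) * total  # equilibrium amount of A
    a = a_eq + (a0 - a_eq) * math.exp(-(k1 + k2) * t)
    return a, total - a


def simulate_measurements(times, k1, k2, scaling, offset, sigma, seed=0):
    """Relative, noisy measurements of B: scaling * B(t) + offset + noise."""
    rng = random.Random(seed)
    data = []
    for t in times:
        _, b = conversion_reaction(t, k1, k2)
        data.append(scaling * b + offset + rng.gauss(0.0, sigma))
    return data


# Hypothetical settings for one artificial dataset.
times = [0.5 * i for i in range(11)]
measurements = simulate_measurements(
    times, k1=0.8, k2=0.6, scaling=2.0, offset=1.0, sigma=0.05
)
```

Because only the scaled and shifted signal is observed, the scaling factor, offset, and noise level would have to be estimated or marginalized together with the rate constants.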
M1: Model of EGF-dependent AKT pathway
The model of the EGF-dependent AKT pathway was introduced previously37 and possesses in total 16 unknown parameters: 13 model parameters and 3 scaling factors (Table 1, M1). The available experimental data comprise 144 data points under 6 different experimental conditions for 3 observables. For each data point, the corresponding variance of the measurement noise is provided, so it does not need to be estimated. The complete dataset is depicted in Figure S2.
M2: Model of STAT5 dimerization
The model of STAT5 dimerization was introduced previously38 and possesses in total 9 unknown parameters: 6 model parameters and 3 noise parameters. To test the proposed method, we added 3 scaling factors to this model, one per observable (Table 1, M2). The available experimental data comprise 48 data points for 3 observables. The complete dataset is depicted in Figure S3.
M3: Model of mRNA transfection
The model for mRNA transfection was introduced previously39 and possesses in total 5 unknown parameters: 3 model parameters, 1 scaling factor, and 1 noise parameter (Table 1, M3). The complete dataset is depicted in Figure 3D. For further details of the model structure, we refer to the supplemental data.
M4: Model of gastric cancer signaling
The model for gastric cancer signaling was introduced previously.27 Here, we considered the Cetuximab responder cell line MKN1. The available experimental data for the responder cell line comprised 303 data points under 106 different experimental conditions for 31 observables. For each data point, the corresponding variance of the measurement noise was provided, so it did not need to be estimated.
For all models we used the parameter ranges and prior distributions introduced in the original publications. The priors are mostly uninformative.
Parameter optimization
To determine the maximum a posteriori (MAP) estimates, we minimized the negative log-posterior function. This minimization was performed using multi-start local optimization, an approach that was previously shown to be reliable.12,40 For local optimization, we used the trust-region optimizer fides.41 Parameters were log-transformed to improve numerical properties.40,42,43 We generated 100 starting points for local optimization, except for model M4, for which we used 500 starting points.
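The multi-start strategy can be sketched in a few lines of Python. The example below is only an illustration: it replaces the trust-region optimizer fides with a simple derivative-free compass search, uses a hypothetical bimodal objective in place of a negative log-posterior, and assumes a single parameter on a log10 scale. All names are hypothetical.

```python
import random


def compass_search(objective, x0, step=0.5, tol=1e-8, max_iter=10_000):
    """Derivative-free local minimization by shrinking coordinate steps."""
    x, fx = list(x0), objective(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                candidate = x.copy()
                candidate[i] += delta
                f_candidate = objective(candidate)
                if f_candidate < fx:
                    x, fx, improved = candidate, f_candidate, True
                    break
        if not improved:
            step *= 0.5  # no better neighbor: refine the step size
    return x, fx


def multi_start(objective, lower, upper, n_starts, seed=0):
    """Run a local search from stratified random starts; sort by objective."""
    rng = random.Random(seed)
    width = (upper - lower) / n_starts
    results = []
    for k in range(n_starts):
        x0 = [lower + (k + rng.random()) * width]  # one start per stratum (1D)
        results.append(compass_search(objective, x0))
    return sorted(results, key=lambda r: r[1])


def neg_log_posterior(x):
    """Hypothetical bimodal objective in x = log10(parameter)."""
    return (x[0] ** 2 - 1.0) ** 2 + 0.3 * x[0]


results = multi_start(neg_log_posterior, lower=-2.0, upper=2.0, n_starts=10)
best_x, best_f = results[0]
```

Sorting the local optima by objective value makes it easy to check how many starts reached the same best value, which is a common informal indicator that the global optimum was found.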
Bayesian parameter inference
To perform Bayesian parameter inference, we used MCMC sampling following the pipeline presented previously.44 As for parameter optimization, sampling was performed using log-transformed parameters. The MAP estimates for the full problem (i.e., without marginalization) were used to initialize the MCMC chains44: all runs of the standard sampling approach were initialized using the full optimal parameter vector (found using multi-start local optimization), while all runs of the marginalization-based sampling approach were initialized using the corresponding subset of model parameters θ. Note that for the runs of the marginalization-based sampling approach, the MAP estimate for the marginalized problem could also have been used; yet, differences were minor, and the chosen approach allowed us to match runs of the standard and the marginalization-based sampling approach. The parameter posterior distribution was sampled using the adaptive Metropolis18 and parallel tempering45,46 algorithms implemented in the Python toolbox pyPESTO.47 For the parallel tempering algorithm, we used 10 chains. For all runs of the parallel tempering algorithm, we initialized the first chain, which samples the posterior, with the best optimization result found using multi-start local optimization, the second chain with the second best optimization result, and so on.
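For readers unfamiliar with adaptive Metropolis sampling, the following one-dimensional Python sketch shows the core idea: the proposal variance is adapted to the variance of the chain history, and the chain is started at the MAP estimate. It is a simplified illustration, not the pyPESTO implementation; the target density, the warm-up length, and all names are hypothetical.

```python
import math
import random


def adaptive_metropolis(log_post, x0, n_samples, seed=0, init_var=1.0, eps=1e-6):
    """1D adaptive Metropolis: proposal variance follows the chain history."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    mean, m2 = 0.0, 0.0  # running mean and sum of squared deviations
    chain = []
    for _ in range(n_samples):
        n = len(chain)
        if n < 10:
            prop_var = init_var  # fixed proposal during warm-up
        else:
            prop_var = 2.4**2 * m2 / (n - 1) + eps  # scaled sample variance
        x_new = x + rng.gauss(0.0, math.sqrt(prop_var))
        lp_new = log_post(x_new)
        if rng.random() < math.exp(min(0.0, lp_new - lp)):  # Metropolis accept
            x, lp = x_new, lp_new
        chain.append(x)
        # Welford update of the running mean and variance
        delta = x - mean
        mean += delta / len(chain)
        m2 += delta * (x - mean)
    return chain


def log_post(x):
    """Hypothetical target: Gaussian with mean 1.0 and standard deviation 0.5."""
    return -0.5 * ((x - 1.0) / 0.5) ** 2


# Start the chain at the (here known) MAP estimate x = 1.0.
chain = adaptive_metropolis(log_post, x0=1.0, n_samples=20_000)
```

In the marginalization-based setting, `log_post` would be the marginal log-posterior over θ only, so the chain explores a lower-dimensional space.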
Convergence after burn-in was assessed using the Geweke test,30 and the auto-correlation length was estimated using Sokal’s adaptive truncated periodogram estimator.48 Both methods are implemented in pyPESTO, and we refer to the respective original publications for technical details. The effective sample size is given by

$$n_{\mathrm{eff}} = \frac{n}{1 + 2 \sum_{\tau=1}^{\infty} \hat{\rho}(\tau)},$$

where $n$ is the number of samples remaining after discarding the burn-in period, and $\hat{\rho}(\tau)$ is the estimated auto-correlation at lag τ.
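The following Python sketch estimates an effective sample size from a one-dimensional chain. It assumes the common definition ESS = n / (1 + 2 Σ ρ(τ)) and truncates the sum at the first non-positive auto-correlation estimate, which is a simplification and not Sokal's adaptive estimator as implemented in pyPESTO. The test chain and all names are hypothetical.

```python
import random


def autocorrelation(chain, lag):
    """Sample auto-correlation of a 1D chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    cov = sum(
        (chain[i] - mean) * (chain[i + lag] - mean) for i in range(n - lag)
    ) / n
    return cov / var


def effective_sample_size(chain):
    """ESS = n / (1 + 2 * sum of auto-correlations), with the sum truncated
    at the first non-positive lag (a simplification, not Sokal's estimator)."""
    n = len(chain)
    rho_sum = 0.0
    for lag in range(1, n):
        rho = autocorrelation(chain, lag)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# Hypothetical test chain: AR(1) process with coefficient 0.5, for which
# 1 + 2 * sum(rho) = 3, so the ESS should be close to n / 3.
rng = random.Random(1)
chain = [0.0]
for _ in range(4999):
    chain.append(0.5 * chain[-1] + rng.gauss(0.0, 1.0))

ess = effective_sample_size(chain)
```

Dividing such an estimate by the wall-clock sampling time gives the effective sample size per unit of time used to compare sampling approaches.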
For all models, the prior hyperparameters for both sampling approaches were the same as used for optimization.
Tempering scheme for the posterior analysis
The posteriors for the standard and the marginalization-based approach were tempered to assess transition characteristics (Figure 5). We used tempered versions of the full posterior and of the marginal posterior, each with temperature T.
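The effect of tempering can be illustrated with a short Python sketch. It assumes one common convention, in which the whole unnormalized posterior is raised to the power 1/T, so that the log-posterior is divided by T; the exact tempered posteriors used in this study may differ. The bimodal target and all names are hypothetical.

```python
import math


def log_posterior(x):
    """Hypothetical bimodal log-density: equal mixture of two unit-variance
    Gaussian bumps centered at x = -3 and x = 3 (unnormalized)."""
    return math.log(
        0.5 * math.exp(-0.5 * (x - 3.0) ** 2)
        + 0.5 * math.exp(-0.5 * (x + 3.0) ** 2)
    )


def tempered_log_posterior(x, temperature):
    """Tempering (assumed convention): divide the log-posterior by T."""
    return log_posterior(x) / temperature


def barrier_height(temperature):
    """Log-density difference between x = 3 (near a mode) and x = 0 (valley)."""
    return tempered_log_posterior(3.0, temperature) - tempered_log_posterior(
        0.0, temperature
    )


barrier_t1 = barrier_height(1.0)  # untempered posterior
barrier_t4 = barrier_height(4.0)  # flatter landscape at higher temperature
```

At higher temperatures the barrier between the modes shrinks in proportion to 1/T, which makes transitions between modes more likely.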
Acknowledgments
This work was supported by the German Federal Ministry of Education and Research (Grant no. 031L0159C; J.H.), the University of Bonn (via the Schlegel Professorship; J.H.), the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy EXC 2047/1 - 390685813 (E.R., M.F., J.H.); EXC 2151 - 390873048 (E.R., J.H.); TRR 333/1 - 450149205 (E.R., J.H.); SFB 1454 - 432325352 (M.F.); and 443187771 (J.H.).
Author contributions
Conceptualization: J.H., E.R.; Methodology: J.H., E.R., M.F.; Software: E.R.; Formal analysis: E.R.; Investigation: E.R., M.F.; Data curation: E.R.; Writing – original draft: J.H., E.R.; Writing – review and editing: all authors; Visualization: E.R.; Supervision: J.H., E.R.; Funding acquisition: J.H.
Declaration of interests
The authors declare no competing interests.
Published: September 28, 2023
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.isci.2023.108083.
Supplemental information
The supplemental data contain a step-by-step guide on how to obtain all marginalization cases considered in the main manuscript, as well as additional ones.
Data and code availability
- This paper analyzes existing, publicly available data. The accession numbers for the datasets are listed in the key resources table.
- All original code has been deposited at Zenodo and is publicly available as of the date of publication. DOIs are listed in the key resources table.
- Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
References
- 1. Kitano H. Systems biology: A brief overview. Science. 2002;295:1662–1664. doi: 10.1126/science.1069492.
- 2. Klipp E., Herwig R., Kowald A., Wierling C., Lehrach H. Wiley-VCH; 2005. Systems Biology in Practice.
- 3. Schöberl B., Pace E.A., Fitzgerald J.B., Harms B.D., Xu L., Nie L., Linggi B., Kalra A., Paragas V., Bukhalid R., et al. Therapeutically targeting ErbB3: A key node in ligand-induced activation of the ErbB receptor–PI3K axis. Sci. Signal. 2009;2:ra31. doi: 10.1126/scisignal.2000352.
- 4. Fey D., Halasz M., Dreidax D., Kennedy S.P., Hastings J.F., Rauch N., Munoz A.G., Pilkington R., Fischer M., Westermann F., et al. Signaling pathway models as biomarkers: Patient-specific simulations of JNK activity predict the survival of neuroblastoma patients. Sci. Signal. 2015;8:ra130. doi: 10.1126/scisignal.aab0990.
- 5. Hass H., Masson K., Wohlgemuth S., Paragas V., Allen J.E., Sevecka M., Pace E., Timmer J., Stelling J., MacBeath G., et al. Predicting ligand-dependent tumors from multi-dimensional signaling features. NPJ Syst. Biol. Appl. 2017;3:27. doi: 10.1038/s41540-017-0030-3.
- 6. Giordano G., Blanchini F., Bruno R., Colaneri P., Di Filippo A., Di Matteo A., Colaneri M. Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy. Nat. Med. 2020;26:855–860. doi: 10.1038/s41591-020-0883-7.
- 7. Zhao S., Chen H. Modeling the epidemic dynamics and control of COVID-19 outbreak in China. Quant. Biol. 2020;8:11–19. doi: 10.1007/s40484-020-0199-0.
- 8. Renart J., Reiser J., Stark G.R. Transfer of proteins from gels to diazobenzyloxymethyl-paper and detection with antisera: A method for studying antibody specificity and antigen structure. Proc. Natl. Acad. Sci. USA. 1979;76:3116–3120. doi: 10.1073/pnas.76.7.3116.
- 9. Sanderson M.J., Smith I., Parker I., Bootman M.D. Fluorescence Microscopy. Cold Spring Harb. Protoc. 2014;2014:pdb.top071795. doi: 10.1101/pdb.top071795.
- 10. Blasi T., Feller C., Feigelman J., Hasenauer J., Imhof A., Theis F.J., Becker P.B., Marr C. Combinatorial histone acetylation patterns are generated by motif-specific reactions. Cell Syst. 2016;2:49–58. doi: 10.1016/j.cels.2016.01.002.
- 11. Kreutz C., Bartolome Rodriguez M.M., Maiwald T., Seidl M., Blum H.E., Mohr L., Timmer J. An error model for protein quantification. Bioinformatics. 2007;23:2747–2753. doi: 10.1093/bioinformatics/btm397.
- 12. Raue A., Schilling M., Bachmann J., Matteson A., Schelker M., Kaschek D., Hug S., Kreutz C., Harms B.D., Theis F.J., et al. Lessons learned from quantitative dynamical modeling in systems biology. PLoS One. 2013;8. doi: 10.1371/journal.pone.0074335.
- 13. Degasperi A., Fey D., Kholodenko B.N. Performance of objective functions and optimisation procedures for parameter estimation in system biology models. NPJ Syst. Biol. Appl. 2017;3:20. doi: 10.1038/s41540-017-0023-2.
- 14. Weber P., Hasenauer J., Allgöwer F., Radde N. Vol. 44. 2011. Parameter estimation and identifiability of biological networks using relative data; pp. 11648–11653. (Proc. of the 18th IFAC World Congress).
- 15. Xu T.-R., Vyshemirsky V., Gormand A., von Kriegsheim A., Girolami M., Baillie G.S., Ketley D., Dunlop A.J., Milligan G., Houslay M.D., Kolch W. Inferring signaling pathway topologies from multiple perturbation measurements of specific biochemical species. Sci. Signal. 2010;3:ra20.
- 16. Raue A., Kreutz C., Theis F.J., Timmer J. Joining forces of Bayesian and frequentist methodology: A study for inference in the presence of non-identifiability. Philos. T. Roy. Soc. A. 2013;371:20110544. doi: 10.1098/rsta.2011.0544.
- 17. Hug S., Raue A., Hasenauer J., Bachmann J., Klingmüller U., Timmer J., Theis F.J. High-dimensional Bayesian parameter estimation: Case study for a model of JAK2/STAT5 signaling. Math. Biosci. 2013;246:293–304. doi: 10.1016/j.mbs.2013.04.002.
- 18. Haario H., Saksman E., Tamminen J. An adaptive Metropolis algorithm. Bernoulli. 2001;7:223–242.
- 19. Graham M.M., Storkey A.J. Proc. of Conference on Uncertainty in Artificial Intelligence. 2017. Continuously tempered Hamiltonian Monte Carlo.
- 20. Hoffman M.D., Gelman A. The No-U-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 2014;15:1593–1623.
- 21. Łącki M.K., Miasojedow B. State-dependent swap strategies and automatic reduction of number of temperatures in adaptive parallel tempering algorithm. Stat. Comput. 2015;26:951–964.
- 22. Bellman R.E. Princeton University Press; 1961. Adaptive Control Processes.
- 23. Taylor A.N., Kitching T.D. Analytic methods for cosmological likelihoods. Mon. Not. Roy. Astron. Soc. 2010;408:865–875.
- 24. Loos C., Krause S., Hasenauer J. Hierarchical optimization for the efficient parameterization of ODE models. Bioinformatics. 2018;34:4266–4273. doi: 10.1093/bioinformatics/bty514.
- 25. Schmiester L., Schälte Y., Fröhlich F., Hasenauer J., Weindl D. Efficient parameterization of large-scale dynamic models based on relative measurements. Bioinformatics. 2020;36:594–602. doi: 10.1093/bioinformatics/btz581.
- 26. Bachmann J., Raue A., Schilling M., Böhm M.E., Kreutz C., Kaschek D., Busch H., Gretz N., Lehmann W.D., Timmer J., Klingmüller U. Division of labor by dual feedback regulators controls JAK2/STAT5 signaling over broad ligand range. Mol. Syst. Biol. 2011;7:516. doi: 10.1038/msb.2011.50.
- 27. Raimúndez E., Keller S., Zwingenberger G., Ebert K., Hug S., Theis F.J., Maier D., Luber B., Hasenauer J. Model-based analysis of response and resistance factors of cetuximab treatment in gastric cancer cell lines. PLoS Comput. Biol. 2020;16. doi: 10.1371/journal.pcbi.1007147.
- 28. Maier C., Loos C., Hasenauer J. Robust parameter estimation for dynamical systems from outlier-corrupted data. Bioinformatics. 2017;33:718–725. doi: 10.1093/bioinformatics/btw703.
- 29. Hasenauer J., Hasenauer C., Hucho T., Theis F.J. ODE constrained mixture modelling: A method for unraveling subpopulation structures and dynamics. PLoS Comput. Biol. 2014;10. doi: 10.1371/journal.pcbi.1003686.
- 30. Geweke J. Vol. 4. 1992. Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments; pp. 169–193. (Bayesian Statistics).
- 31. Henkelman G., Jónsson H. Improved tangent estimate in the nudged elastic band method for finding minimum energy paths and saddle points. J. Chem. Phys. 2000;113:9978–9985.
- 32. McInnes L., Healy J., Saul N., Großberger L. UMAP: Uniform Manifold Approximation and Projection. J. Open Source Softw. 2018;3:861.
- 33. Schmiester L., Schälte Y., Bergmann F.T., Camba T., Dudkin E., Egert J., Fröhlich F., Fuhrmann L., Hauber A.L., Kemmer S., et al. PEtab—interoperable specification of parameter estimation problems in systems biology. PLoS Comput. Biol. 2021;17:e1008646. doi: 10.1371/journal.pcbi.1008646.
- 34. Girolami M., Calderhead B. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. Roy. Stat. Soc. 2011;B73:123–214.
- 35. Paszke A., Gross S., Chintala S., Chanan G., Yang E., DeVito Z., Lin Z., Desmaison A., Antiga L., Lerer A. Automatic differentiation in PyTorch. Proc. of the 31st Conference on Neural Information Processing Systems. NIPS; 2017.
- 36. Villaverde A.F., Raimúndez E., Hasenauer J., Banga J.R. A comparison of methods for quantifying prediction uncertainty in systems biology. IFAC-PapersOnLine. 2019;52:45–51.
- 37. Fujita K.A., Toyoshima Y., Uda S., Ozaki Y.i., Kubota H., Kuroda S. Decoupling of receptor and downstream signals in the Akt pathway by its low-pass filter characteristics. Sci. Signal. 2010;3:ra56. doi: 10.1126/scisignal.2000810.
- 38. Boehm M.E., Adlung L., Schilling M., Roth S., Klingmüller U., Lehmann W.D. Identification of isoform-specific dynamics in phosphorylation-dependent STAT5 dimerization by quantitative mass spectrometry and mathematical modeling. J. Proteome Res. 2014;13:5685–5694. doi: 10.1021/pr5006923.
- 39. Leonhardt C., Schwake G., Stögbauer T.R., Rappl S., Kuhr J.T., Ligon T.S., Rädler J.O. Single-cell mRNA transfection studies: Delivery, kinetics and statistics by numbers. Nanomedicine. 2014;10:679–688. doi: 10.1016/j.nano.2013.11.008.
- 40. Villaverde A.F., Fröhlich F., Weindl D., Hasenauer J., Banga J.R. Benchmarking optimization methods for parameter estimation in large kinetic models. Bioinformatics. 2019;35:830–838. doi: 10.1093/bioinformatics/bty736.
- 41. Fröhlich F., Sorger P.K. Fides: Reliable trust-region optimization for parameter estimation of ordinary differential equation models. PLoS Comput. Biol. 2022;18. doi: 10.1371/journal.pcbi.1010322.
- 42. Hass H., Loos C., Raimúndez-Álvarez E., Timmer J., Hasenauer J., Kreutz C. Benchmark problems for dynamic modeling of intracellular processes. Bioinformatics. 2019;35:3073–3082. doi: 10.1093/bioinformatics/btz020.
- 43. Kreutz C. New concepts for evaluating the performance of computational methods. IFAC-PapersOnLine. 2016;49:63–70.
- 44. Ballnus B., Hug S., Hatz K., Görlitz L., Hasenauer J., Theis F.J. Comprehensive benchmarking of Markov chain Monte Carlo methods for dynamical systems. BMC Syst. Biol. 2017;11:63. doi: 10.1186/s12918-017-0433-1.
- 45. Vousden W.D., Farr W.M., Mandel I. Dynamic temperature selection for parallel tempering in Markov chain Monte Carlo simulations. Mon. Not. Roy. Astron. Soc. 2016;455:1919–1937.
- 46. Miasojedow B., Moulines E., Vihola M. An adaptive parallel tempering algorithm. J. Comput. Graph Stat. 2013;22:649–664.
- 47. Schälte Y., Fröhlich F., Jost P.J., Vanhoefer J., Pathirana D., Stapor P., Lakrisenko P., Wang D., Raimúndez E., Merkt S., et al. pyPESTO: A modular and scalable tool for parameter estimation for dynamic models. Preprint at arXiv. 2021. doi: 10.48550/arXiv.2305.01821.
- 48. Sokal A. Functional Integration. NATO ASI Series, 361. Springer; 1997. Monte Carlo Methods in Statistical Mechanics: Foundations and New Algorithms.