. 2023 May 22;18(5):e0285836. doi: 10.1371/journal.pone.0285836

Informative and adaptive distances and summary statistics in sequential approximate Bayesian computation

Yannik Schälte 1,2,3, Jan Hasenauer 1,2,3,*
Editor: Nattapol Aunsri
PMCID: PMC10202307  PMID: 37216372

Abstract

Calibrating model parameters on heterogeneous data can be challenging and inefficient. This holds especially for likelihood-free methods such as approximate Bayesian computation (ABC), which rely on the comparison of relevant features in simulated and observed data and are popular for otherwise intractable problems. To address this problem, methods have been developed to scale-normalize data, and to derive informative low-dimensional summary statistics using inverse regression models of parameters on data. However, while approaches only correcting for scale can be inefficient on partly uninformative data, the use of summary statistics can lead to information loss and relies on the accuracy of employed methods. In this work, we first show that the combination of adaptive scale normalization with regression-based summary statistics is advantageous on heterogeneous parameter scales. Second, we present an approach employing regression models not to transform data, but to inform sensitivity weights quantifying data informativeness. Third, we discuss problems for regression models under non-identifiability, and present a solution using target augmentation. We demonstrate improved accuracy and efficiency of the presented approach on various problems, in particular robustness and wide applicability of the sensitivity weights. Our findings demonstrate the potential of the adaptive approach. The developed algorithms have been made available in the open-source Python toolbox pyABC.

1 Introduction

Mechanistic models are important tools in systems biology and many other research areas to describe and study real-world systems, allowing researchers to understand underlying mechanisms [1, 2]. Commonly, they are subject to parameters that need to be estimated by comparison of model outputs to observed data [3]. The Bayesian framework allows doing so by combining the likelihood of data and prior information on parameters. However, for complex stochastic models, e.g. those used in systems biology to describe multi-cellular systems, evaluating the likelihood is often computationally infeasible [4, 5]. Therefore, likelihood-free methods such as approximate Bayesian computation (ABC) have been developed [6, 7]. Put briefly, ABC circumvents likelihood evaluation by simulating data and accepting these depending on their proximity to observed data, according to a distance measure and an acceptance threshold. In this way, it generates samples from an approximation to the posterior distribution. ABC is frequently combined with a sequential Monte Carlo scheme (ABC-SMC) [8, 9], which allows gradually reducing the acceptance threshold while maintaining high acceptance rates.

ABC relies on the comparison of relevant features in simulated and observed data. [10] demonstrates superior performance of distances that adaptively weight model outputs to normalize contributions on different scales, exploiting the structure of ABC-SMC algorithms. In [11], we extend this approach to outlier-corrupted data. However, an implicit assumption of scale normalization is that all model outputs are similarly informative of the parameters. When this assumption is violated, scale normalization can worsen performance, e.g. by inflating the impact of data points underlying only background noise. Therefore, it would be preferable to either only consider informative statistics, or to account for informativeness in the weighting scheme.

Especially for noise-corrupted high-dimensional data, often lower-dimensional summary statistics are employed [12]. Various methods to construct such statistics have been developed, e.g. via subset selection or auxiliary likelihoods [13, 14]. In a popular class of approaches, inverse regression models of parameters on simulated data have been used as statistics [15–17]. Here, by “inverse” we mean that the summary statistics map from (functions of) simulated data back to (functions of) the parameters, i.e. in the inverse direction to the forward mechanistic model. Such regression models can be heuristically motivated as summarizing the information in the data in a single value per parameter. In addition, [15] argue that the resulting summary statistics effectively approximate posterior means, which conserves the true posterior mean in the ABC analysis.

To evaluate proximity of regression-based statistics, e.g. Euclidean distances have been used, or weighted Euclidean distances using weights based on calibration samples [15]. However, here essentially the same problems apply that motivated the use of adaptive weighting [10], shifted from the level of data to the level of parameters, or regression approximations thereof. In fact, the approach by [10] is particularly applicable to regression-based statistics, as all outputs are informative. A further problem with regression-based statistics is that a unique inverse mapping from data to parameters may not always exist even in the noise-free limit, resulting e.g. in a multi-modal posterior distribution, or plateaus of combinations of parameter values achieving the same posterior value. This is in particular the case when the observable model outputs and data are not rich enough to allow to structurally identify the parameters globally [18].

In this work, we present two approaches combining the concepts of adaptive distances and regression models. First, we integrate summary statistics learning in an ABC-SMC framework with scale-normalizing adaptive distances. Second, the focus of this work, we employ regression models not to transform data, but in order to inform additional sensitivity weights that account for informativeness. Moreover, we discuss the problem of non-identifiability of the inverse mapping, and present a solution using augmented regression targets. On a dedicated test problem exhibiting multiple problematic features such as partly uninformative data, heterogeneous data and parameter scales, and non-identifiability, we demonstrate how both scale-normalizing distances [10], and regression-based summary statistics [15] fail to approximate the true posterior distribution. Then, we demonstrate substantially improved performance of the newly introduced approaches. We evaluate the proposed methods on further test problems, including a systems biology application example and outlier-corrupted data, demonstrating in particular robustness as well as wide applicability of the sensitivity-weighted distance.

2 Methods

2.1 Background

In this section, we give required background knowledge on the underlying methodology.

2.1.1 Approximate Bayesian computation

In Bayesian inference, the likelihood π(y_obs|θ) of observing data y_obs ∈ ℝ^{n_y} under model parameters θ ∈ ℝ^{n_θ} is combined with prior information π(θ), giving the posterior π(θ|y_obs) ∝ π(y_obs|θ) ⋅ π(θ). Here and throughout this manuscript, n_var denotes the dimension of a given variable var. We assume that while numerical evaluation of π(y_obs|θ) is infeasible, the model is generative, i.e. allows to simulate data y ∼ π(y|θ). The core principle of ABC consists of three steps [6]:

  1. Sample parameters θ ∼ π(θ).

  2. Simulate data y ∼ π(y|θ).

  3. Accept (θ, y) if d(y, yobs) ≤ ε.

Here, the distance d: ℝ^{n_y} × ℝ^{n_y} → ℝ_{≥0} compares simulated and observed data, and ε ≥ 0 is an acceptance threshold. This is repeated until sufficiently many, say N, particles have been accepted.
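The three steps above can be condensed into a minimal rejection sampler. The following sketch uses only NumPy; all names (`abc_rejection`, `prior_sample`, `simulate`) are illustrative placeholders, not pyABC API:

```python
import numpy as np

def abc_rejection(prior_sample, simulate, distance, y_obs, eps, n_accept):
    """Minimal ABC rejection sampler (sketch; the function arguments are
    hypothetical placeholders for the user's prior, model, and distance)."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample()          # 1. sample parameters from the prior
        y = simulate(theta)             # 2. simulate data from the model
        if distance(y, y_obs) <= eps:   # 3. accept if close to observed data
            accepted.append(theta)
    return np.array(accepted)

# toy example: y ~ N(theta, 1), observed y_obs = 0, uniform prior
rng = np.random.default_rng(0)
samples = abc_rejection(
    prior_sample=lambda: rng.uniform(-5, 5),
    simulate=lambda th: rng.normal(th, 1.0),
    distance=lambda y, y_obs: abs(y - y_obs),
    y_obs=0.0, eps=0.5, n_accept=200,
)
```

The accepted parameters approximate the posterior; shrinking `eps` tightens the approximation at the cost of a lower acceptance rate, which is exactly the trade-off ABC-SMC addresses.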

For high-dimensional data, the comparison is often in terms of summary statistics s: ℝ^{n_y} → ℝ^{n_s}, as d(s(y), s(y_obs)) ≤ ε, with d: ℝ^{n_s} × ℝ^{n_s} → ℝ_{≥0} and typically n_s ≪ n_y. Denoting by π(s|θ) ∝ ∫ I[s(y) = s] π(y|θ) dy the intractable summary statistics likelihood, with I the indicator function, and s_obs = s(y_obs), the population of accepted particles then constitutes a sample from the approximate posterior distribution

π_ABC(θ|s_obs) ∝ ∫ I[d(s, s_obs) ≤ ε] π(s|θ) ds ⋅ π(θ),

where π_ABC(s_obs|θ) ∝ ∫ I[d(s, s_obs) ≤ ε] π(s|θ) ds can be interpreted as an approximation to the likelihood.

For ε → 0, it holds under mild assumptions that π_ABC(θ|s(y_obs)) → π(θ|s(y_obs)) ∝ π(s(y_obs)|θ) π(θ) in an appropriate sense [19]. Compared to likelihood-based sampling, ABC introduces two approximation errors [20, Chapter 1]. First, it accepts not only particles with y = y_obs, which occur for continuous models with probability zero, but also proximate ones according to d. Second, only for sufficient statistics, π(θ|s_obs) ≡ π(θ|y_obs), is the original posterior recovered in the limit ε → 0. In practice, s is however usually insufficient, only capturing essential information about y in a low-dimensional representation.

In this work, we will denote by y either the raw data, or assume it to already incorporate a mapping to (manually crafted) summary statistics. We will assess y in ABC directly via the use of weighted distance metrics, or on top of y automatically derive regression-based summary statistics s.

2.1.2 Sequential importance sampling

As the above vanilla ABC algorithm, also called ABC-Rejection, exhibits a trade-off between decreasing the acceptance threshold ε to improve the posterior approximation, and maintaining high acceptance rates, it is frequently combined with a sequential Monte Carlo (SMC) importance sampling scheme [8, 9]. In ABC-SMC, a series of particle populations P_t = {(θ_i^t, y_i^t, w_i^t)}_{i≤N}, t = 1, …, n_t, is generated, with acceptance thresholds ε_1 > … > ε_{n_t}, targeting successively better posterior approximations. Particles for generation t are sampled from a proposal distribution g_t(θ) ≫ π(θ) based on the previous generation’s accepted particles P_{t−1}, e.g. via a kernel density estimate, with initially g_1(θ) = π(θ). The importance weights w_i^t are the corresponding non-normalized Radon-Nikodym derivatives, w_t(θ) = π(θ)/g_t(θ).

Algorithm 1 A basic ABC-SMC algorithm.

initialize ε_1 via calibration samples, let g_1(θ) = π(θ)

for t = 1, …, n_t do

while less than N acceptances do

  sample parameter θgt(θ)

  simulate data yπ(y|θ)

  accept θ if d(y, yobs) ≤ εt

end while

 compute weights w_i^t = π(θ_i^t)/g_t(θ_i^t), for accepted parameters {θ_i^t}_{i≤N}

 normalize weights W_i^t = w_i^t / Σ_j w_j^t

 update gt+1 and εt+1 based on particles from generation t

end for

 output: weighted samples {(θ_i^{n_t}, W_i^{n_t})}_{i≤N}

The underlying ABC-SMC algorithm (Algorithm 1) used throughout this work is based on [21], using an adaptive threshold scheme based on the median of distances in the previous generation [22] and multivariate normal proposal distributions with adaptive covariance matrix [23], see [24, 25] for details. There exist various ABC-SMC sampler variants [20], e.g. in some cases different threshold schemes [26] or proposal distributions [23] may be beneficial. The distances and summary statistics presented in this work are mostly independent of the sequential sampler specifics.
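The median-based adaptive threshold update referenced above can be sketched in a few lines; the function name is illustrative, not pyABC API:

```python
import numpy as np

def next_threshold(prev_distances):
    """Median-based adaptive threshold scheme (sketch of the update used
    in Algorithm 1): the next acceptance threshold is the median of the
    accepted distances of the previous generation."""
    return float(np.median(prev_distances))

eps_next = next_threshold([0.2, 0.5, 0.9, 1.4, 2.0])  # -> 0.9
```

Since roughly half of the previous generation's accepted particles fall below the median, this schedule halves the effective acceptance region per generation while keeping acceptance rates workable.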

2.1.3 Adaptive distances

A common choice of distance d is a weighted Minkowski distance

d(y, y_obs) = ‖r ⋅ (y − y_obs)‖_p = (Σ_{i_y=1}^{n_y} |r_{i_y} ⋅ (y_{i_y} − y_{obs,i_y})|^p)^{1/p}, (1)

with p ≥ 1 and weights r_{i_y}. Frequently, simply unit weights r ≡ 1 are used (e.g. [15–17, 21]). However, model outputs can be on, and vary across, different scales, in which case highly variable ones dominate the acceptance decision. This can be corrected for by the choice of weights r_{i_y} in (1), commonly as inversely proportional to a measure of variability,

r_{i_y} = 1/σ_{i_y}, (2)

with σ_{i_y} e.g. given via the median absolute deviation (MAD) from the sample median [27]. To define weights, calibration samples can be used (e.g. [7, 15]). However, [10] demonstrates that in an ABC-SMC framework, the relative variability of model outputs in later generations can differ considerably from pre-calibration. Thus, they propose an iteratively updated distance d_t, defining weights for generation t based on all samples generated in generation t−1.
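The MAD-based scale weights (2) can be sketched as follows, computed from all simulations of the previous generation; `mad_weights` is a hypothetical helper, not the pyABC implementation:

```python
import numpy as np

def mad_weights(ys):
    """Scale weights r_i = 1/MAD_i from a population of simulated data.

    ys: array of shape (n_samples, n_y), containing all simulations
    (accepted and rejected) from the previous ABC-SMC generation."""
    med = np.median(ys, axis=0)
    mad = np.median(np.abs(ys - med), axis=0)  # per-output median absolute deviation
    return 1.0 / mad

# outputs on very different scales receive scale-normalized contributions
rng = np.random.default_rng(1)
ys = np.column_stack([rng.normal(0, 1, 1000), rng.normal(0, 100, 1000)])
r = mad_weights(ys)  # weight of the first output ~100x that of the second
```

In the adaptive scheme, this computation is repeated before each generation, so the weights track how the relative variability of the outputs changes as the populations concentrate.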

In [11], we demonstrate the L2 norm used in (1) in [10] to be sensitive to data outliers, and show an L1 norm to be more robust on both outlier-corrupted and outlier-free data. To further reduce the impact of outliers, we complement MAD, as a measure of sample variability, by the median absolute deviation to the observed value, as a measure of deviation, giving a normalization term PCMAD (“perhaps use the combined median absolute deviation (CMAD) to the population median (MAD) and to the observation (MADO), or only use MAD”; see [11] for details).

2.1.4 Regression-based summary statistics

The comparison of simulations and data in ABC is often in terms of low-dimensional, informative summary statistics. The “semi-automatic ABC” approach by [15] uses the outputs of a regression model s: ℝ^{n_y} → ℝ^{n_θ}, predicting parameters from simulated data (Fig 1):

Fig 1. Concept visualization.


While, given parameters θ, the mechanistic model π(y|θ) simulates data y, which are then compared to the observed data (black) via some distance d and threshold ε, we employ regression models to learn an inverse mapping s, to either construct summary statistics, or define sensitivity weights for distance calculation.

  1. In an ABC pilot run, determine a high-density posterior region H.

  2. Generate a population P = {(θ_i, y_i)}_{i≤Ñ} ∼ π(y|θ) I[θ ∈ H], for some Ñ ∈ ℕ.

  3. Train a regressor model s: ℝ^{n_y} → ℝ^{n_θ}, y ↦ θ, on P.

  4. Run the actual ABC analysis using s as summary statistics.

In step 4, the distance operates on s(y). Step 1 aims to find a good training region, and can be skipped for informative priors [15]. In [17], H = [0.5 θ̃, 2 θ̃] is used, around a literature value θ̃, based on manual experimentation, which is in practice only applicable if reliable references exist. In [16], step 1 is omitted, using the prior directly, in one case constrained to an identifiable region. In step 3, [15] employ a linear regression (LR) model on potentially augmented data. [16] and [17] respectively use neural networks (NN) and Gaussian processes (GP) instead, aiming at a more accurate description of non-linear relationships, and further process automation. The sufficient performance of LR observed in [15] may be due to the substantial time spent in the pilot run, identifying a high-density region where a linear approximation suffices, while e.g. [16] observe a clearly better posterior approximation with NN, and [17] better model predictions with GP.

A theoretical justification of regression-based summary statistics is that the regression model serves as an approximation to the posterior mean, s(y) ≈ E_{π(θ|y)}[θ], using which as statistic ensures that the ABC posterior approximation recovers the actual posterior mean as ε → 0, see [15, 16], or S1 File, Theorem 1.
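The construction can be illustrated with a linear inverse model fitted by least squares on simulated parameter-data pairs. This is a toy sketch with invented dimensions, not the semi-automatic ABC implementation of [15]:

```python
import numpy as np

rng = np.random.default_rng(2)

# simulate training pairs (theta_i, y_i) from prior and model
n_train = 500
thetas = rng.uniform(-1, 1, size=(n_train, 1))        # parameters from the prior
ys = thetas + rng.normal(0, 0.1, size=(n_train, 10))  # 10 noisy outputs per theta

# fit a linear inverse model y -> theta via least squares (with intercept)
X = np.hstack([ys, np.ones((n_train, 1))])
coef, *_ = np.linalg.lstsq(X, thetas, rcond=None)

def s(y):
    """Regression-based summary statistic: predicted parameter(s) for data y."""
    return np.hstack([y, np.ones((len(y), 1))]) @ coef

y_obs = np.full((1, 10), 0.3)
s_obs = s(y_obs)  # in the ABC run, d(s(y), s_obs) replaces d(y, y_obs)
```

Here the 10-dimensional data are compressed to a single statistic per parameter, which is the dimension reduction that makes the subsequent ABC acceptance step efficient.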

2.2 Adaptive and informative regression-based distances and summary statistics

In this section, we describe the novel methods introduced in this work.

2.2.1 Integrating summary statistics learning and adaptive distances

In previous studies, the regression approach from Section 2.1.4 was used together with uniform distance weights, or weights pre-calibrated on a previous run [15–17]. However, the same problems that motivated the adaptive approach in [10] apply to the regression model outputs, which approximate the underlying parameters: Parameters varying on larger scales dominate the analysis without scale adjustment, with potentially changing levels of variability over ABC-SMC generations.

We propose to combine the regression-based summary statistics from Section 2.1.4 with the weight adaptation from Section 2.1.3. The regression model can be pre-trained as previously done. Here, we however suggest to increase efficiency and automation by integrating the training into the actual ABC-SMC run (Algorithm 2). We begin by using an adaptively scale-normalized distance on the full model outputs. Then, in a generation t_train ≥ 1, the regression model s: ℝ^{n_y} → ℝ^{n_θ} is trained on all particles {(θ_i^{t_train−1}, y_i^{t_train−1})}_{i≤Ñ}, Ñ ≥ N, generated in the previous generation. From t ≥ t_train onward, the regression model outputs s(y) are used as summary statistics, also here using a scale-normalized distance with iteratively adjusted weights. Like for adaptive distance weight calculation, the training samples also include rejected ones. First, this increases the training sample size, and second, it gives a representative sample from the joint distribution of data and parameters, focusing on a high-density region, but not confined to y ≈ y_obs.

The delay of regression model training until after a few generations serves to focus on a high-density region, similar to [15], such that simpler regression models provide a sufficient description. While [15] update the prior to a typical range of values observed in the pilot run, we consider the prior as part of the problem formulation, and thus do not update it. In generations t ≥ t_train, the proposal distributions g_t will usually suggest values mostly within the training domain anyway.

Algorithm 2 ABC-SMC algorithm with regression-based summary statistics or sensitivity-weighted distances.

initialize ε_1, σ_{i_y}^1 via calibration samples, let g_1(θ) = π(θ)

for t = 1, …, n_t do

while less than N acceptances do

  sample parameter θgt(θ)

  simulate data yπ(y|θ)

  if t < ttrain then

   accept θ if d_t(y, y_obs) ≤ ε_t, where d_t uses scale weights r_{i_y}^t = 1/σ_{i_y}^t

  else if s is used as summary statistics then

   accept θ if d_t(s(y), s(y_obs)) ≤ ε_t, where d_t uses scale weights r_{i_s}^t = 1/σ_{i_s}^t

  else if using s only to define sensitivity weights q_{i_y} then

   accept θ if d_t(y, y_obs) ≤ ε_t, where d_t uses scale and sensitivity weights r_{i_y}^t = q_{i_y}^t/σ_{i_y}^t

  end if

end while

 compute importance weights w_i^t = π(θ_i^t)/g_t(θ_i^t), for accepted parameters {θ_i^t}_{i≤N}

 normalize importance weights W_i^t = w_i^t / Σ_j w_j^t

if t + 1 == ttrain then

  train regression model s on all particles from generation t

  if using s to weight model outputs then

   define sensitivity weights q_1, …, q_{n_y} via s

  end if

end if

  update gt+1 and εt+1 based on particles from generation t

 update scale normalizations σ_{i_y}^{t+1} or σ_{i_s}^{t+1} based on all particles from generation t

end for

output: weighted samples {(θ_i^{n_t}, W_i^{n_t})}_{i≤N}

2.2.2 Regression-based sensitivity weights

The adaptive scale-normalized distance approach from [10, 11], operating on the full data without summary statistics, is not ideal if data points are not similarly informative. The regression approach from Section 2.1.4 is one solution to focus on informative statistics. However, it performs a complex transformation of the model outputs, which can hinder interpretation, and can perform badly if the regression model is inaccurate. In this section, we present an alternative approach, using the regression model to inform additional weights on the full data, instead of constructing summary statistics. The idea is to weight a data point by how informative it is of the underlying parameters. We quantify informativeness via the sensitivity of the posterior expectation of parameters, or transformations thereof, given observed data y_obs, to data perturbations. As in Section 2.2.1, we use a regression model to describe the inverse mapping from data to parameters.

Specifically, before a generation t_train, we train a regression model s: ℝ^{n_y} → ℝ^{n_θ} on samples from the previous generation. As regression model inputs, we use normalized simulations y/σ^{t_train}, with σ^{t_train} the measure of scale used for distance scale normalization, e.g. MAD. Further, we z-score normalize the regression model targets θ, in order to render the model scale-independent. Then, we calculate the sensitivity matrix

S = ∇_y s(y_obs) ∈ ℝ^{n_y × n_θ} (3)

at the observed data. To robustly approximate derivatives, we employ central finite differences with automatic step size control [28]. We define the sensitivity weight of model output i_y as

q_{i_y} = Σ_{i_θ=1}^{n_θ} |S_{i_y,i_θ}| / Σ_{j_y=1}^{n_y} |S_{j_y,i_θ}|, (4)

i.e. as the sum over the absolute sensitivities of all parameters with respect to the model output, normalized per parameter to level their impact. The normalization can be omitted, but including it yields more conservative weights: when all sensitivities with respect to some parameter are small, weights are distributed more evenly, accounting for the fact that the regression model may be inaccurate. Here, S_{i_y,i_θ} denotes the sensitivity of the i_θ-th regression output with respect to the i_y-th input.

The final weight used in the distance (1) is then given as the product of scale weight (2) and sensitivity weight (4),

r_{i_y} = q_{i_y}/σ_{i_y}, (5)

with here σ_{i_y} e.g. again given via MAD, or, also taking bias into account, PCMAD. This separate treatment of scale and sensitivity weights allows to e.g. include the outlier correction from [11] in the scale correction, but not in the normalized data used for regression model training, which would lead to inversely re-scaled sensitivities. Thus, we can simultaneously account for informativeness and outliers. As long as r_{i_y} ≠ 0 for all weights, the original posterior π(θ|y_obs) can be conceptually recovered for ε → 0 [10, 19], i.e. no information is lost, unlike for insufficient summary statistics, while practical convergence is clearly weight-dependent.
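Eqs. (3) and (4) can be sketched as follows. For simplicity, a fixed finite-difference step replaces the automatic step size control of [28], and `sensitivity_weights` is an illustrative name:

```python
import numpy as np

def sensitivity_weights(s, y_obs, n_theta, h=1e-3):
    """Sensitivity weights q_iy from a trained regression model s: y -> theta.

    Central finite differences approximate S = ds/dy at y_obs (Eq. 3);
    absolute sensitivities, normalized per parameter, are summed (Eq. 4).
    A fixed step h is used here instead of automatic step size control."""
    n_y = len(y_obs)
    S = np.zeros((n_y, n_theta))
    for i in range(n_y):
        e = np.zeros(n_y)
        e[i] = h
        S[i, :] = (s(y_obs + e) - s(y_obs - e)) / (2 * h)  # central difference
    abs_S = np.abs(S)
    col_sums = abs_S.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # guard against all-zero sensitivities
    return (abs_S / col_sums).sum(axis=1)  # q_iy, Eq. (4)

# toy inverse model: theta1 ~ y[0], theta2 ~ mean of y[1:3]; y[3] uninformative
s_toy = lambda y: np.array([y[0], 0.5 * (y[1] + y[2])])
q = sensitivity_weights(s_toy, y_obs=np.zeros(4), n_theta=2)  # -> [1, 0.5, 0.5, 0]
```

The uninformative output receives weight zero, while the two outputs jointly informative of the second parameter share its unit weight, matching the per-parameter normalization described above.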

2.2.3 Optimal summary statistics to recover distribution features

A problem with inverse regression models of parameters on data is that such a mapping may not exist. For example, consider a quadratic model y ∼ N(θ², 0.1²), with prior θ ∼ U[−1, 1], and observed data y_obs = 0.7. Given the data, the underlying parameter cannot be uniquely identified. As an inverse mapping y ↦ θ does not exist globally, a regression model s: y ↦ θ cannot extract a meaningful relationship. Indeed, the problem is symmetric in θ, such that the posterior mean is E_{π(θ|y)}[θ] = 0, using which as summary statistic, as in [12, 15], would correctly recover the true posterior mean. However, it would fail to describe the posterior shape at all. More generally, an informative inverse mapping from data back to parameters can only exist when the forward model θ ↦ π(⋅|θ) is injective, i.e. the model is structurally identifiable in the limit of infinite data [29, 30]. We would like to remark that this lack of identifiability of true parameters given data also stands in the way of theoretical asymptotic results obtained for ABC methods regarding convergence of point estimators in the large-data limit in [31] (Condition 4.(ii)) and the limiting shape of posteriors and posterior means in [32] (Assumption 3.(ii)).
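The failure of plain inverse regression under this symmetry is easy to reproduce numerically. The following toy sketch fits least-squares slopes for the targets θ and θ² on data from the quadratic model:

```python
import numpy as np

rng = np.random.default_rng(3)

# quadratic toy model: y ~ N(theta^2, 0.1^2), theta ~ U[-1, 1]
theta = rng.uniform(-1, 1, 5000)
y = rng.normal(theta**2, 0.1)

def slope(target, y):
    """Least-squares slope of target on y, with intercept."""
    X = np.column_stack([y, np.ones_like(y)])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return float(coef[0])

# regressing theta on y finds (almost) no relationship: slope near 0;
# regressing theta^2 on y recovers a clear linear relationship: slope near 1
slope_theta = slope(theta, y)
slope_theta2 = slope(theta**2, y)
```

Since Cov(θ, y) = E[θ³] = 0 under the symmetric prior, the fitted statistic for θ is essentially constant, whereas the augmented target θ² depends linearly on y, which is the motivation for the target augmentation introduced next.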

In general, non-identifiability can be tackled in various ways, e.g. by fixing some parameters to constant values, reparameterization, or by generating additional data to break the symmetry [18]. However, this requires an in-depth and iterative model analysis. Even for deterministic ordinary differential equation based models, this is already a challenging task [33], while ABC is commonly applied to highly complex stochastic models. Further, one ends up with an altered inference problem, which renders the analysis of prediction uncertainties more involved [33].

Here we propose an approach that is able to work with the original, unaltered problem formulation, including potential non-identifiabilities. We solve the problem by considering transformations λ(θ) of the parameters, e.g. higher-order moments λ(θ) = (θ^1, …, θ^k), which may be better described as functions of the data, or identifiable in the first place. In the above example, it suffices to consider θ², giving a linear mapping y ↦ θ² and breaking the symmetry. While the use of parameter transformations as regression model targets is heuristically reasonable, their use can be theoretically further justified: Employing as summary statistics posterior expectations of transformations of the parameters,

s(y) = E_{π(θ|y)}[λ(θ)],

allows under mild assumptions to recover the corresponding posterior expectations for ε → 0,

lim_{ε→0} E_{π_{ABC,ε}}[λ(Θ) | s(y_obs)] = E[λ(Θ) | Y = y_obs],

see Theorem 1 in S1 File for details.

Obviously, conditional posterior expectations are hardly available in practice. However, we may interpret the above regression-based summary statistics as approximations, aiming at a sufficiently accurate description of the underlying expectations by the regression model. Thus, we propose to use λ(θ) as targets, both for summary statistics (Section 2.2.1), and sensitivity weights (Section 2.2.2).

We would like to remark that here we augment the regression targets θ, while e.g. [12, 15] instead augment the regressors y, e.g. by higher-order moments y ↦ (y^1, …, y^4). Their approach allows to employ simple regression models, such as linear ones, on an increased feature space, which can be guided by model analysis and prior experimentation. However, it conceptually cannot solve the problem tackled by our approach, that some parameters cannot be described as functions of the data (or transformations thereof). Of course, it would also be possible to augment regressors and regression targets simultaneously.

2.3 Implementation

We implemented all presented methods in the open-source Python package pyABC (https://github.com/icb-dcm/pyabc) [25], interfacing in particular scikit-learn regression models [34]. The code underlying the application study is on GitHub (https://github.com/yannikschaelte/study_abc_slad), with a snapshot of code and data on Zenodo (http://doi.org/10.5281/zenodo.5522919).

3 Results

We evaluated the performance of the proposed methods on various test problems.

3.1 Distances and summary statistics

As the distance to compare model outputs or summary statistics, we considered, given its robust performance in [11], an L1 norm, with adaptive MAD weights when employing scale normalization (denoted “Ada.+MAD”). Acceptance in generation t was based only on d_t, not on previous acceptance criteria, for ease of implementation and because no substantial differences were observed for an L1 norm in [11].

As regression models, we considered LR and NN. We trained the regression model after 40% of the simulation budget. For comparison, we also considered training the regression model before the initial generation, t_train = 1, based on samples from the prior (“Init”). NN models were considered with a single hidden layer of dimension [(n_y + n_θ)/2], with ReLU activation function, using Adam stochastic gradient descent for optimization, and early stopping to avoid overfitting, with a 10% validation set. Both regression models were computationally efficient compared to the full ABC-SMC analyses, with run-times on the order of milliseconds (LR) or a few seconds (NN).
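In scikit-learn terms, the described NN setup could look roughly as follows. This is a sketch under assumptions: the dimensions and training data are invented, and the exact hyperparameters of the study may differ:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

n_y, n_theta = 10, 4
hidden = (n_y + n_theta) // 2  # single hidden layer of dimension (n_y + n_theta)/2

nn = MLPRegressor(
    hidden_layer_sizes=(hidden,),
    activation="relu",          # ReLU activation function
    solver="adam",              # Adam stochastic gradient descent
    early_stopping=True,        # stop when the validation score stagnates
    validation_fraction=0.1,    # 10% validation set
)

# fit the vector-valued inverse mapping y -> theta on simulated pairs
rng = np.random.default_rng(4)
thetas = rng.normal(size=(400, n_theta))
ys = thetas @ rng.normal(size=(n_theta, n_y)) + rng.normal(0, 0.1, (400, n_y))
nn.fit(ys, thetas)
```

`MLPRegressor` handles the multi-output target directly, matching the single vector-valued regression model described below.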

When employing parameter augmentation (Section 2.2.3), we used the first four moments, λ(θ) = (θ^1, …, θ^4) (“P4”). We considered both regression to define summary statistics (“Stat”, Section 2.2.1) and sensitivity weights (“Sensi”, Section 2.2.2). We used a single regression model to learn the full vector-valued mapping y ↦ λ(θ), allowing to leverage synergies, as opposed to learning a separate model per regression target.

For example, L1+Ada.+MAD+StatNN denotes an analysis using consistently an adaptive distance with MAD-normalized weights, and using a neural network to construct summary statistics after 40% of the total simulation budget, with regression targets λ(θ) = θ. L1+Ada.+MAD+SensiLR+P4 uses an adaptive distance with scale-normalizing weights via MAD, and a linear model to define further sensitivity weights, with regression targets λ(θ) = (θ^1, …, θ^4), and L1+StatLR uses a linear model for summary statistics construction, but uniform distance weights.

3.2 Performance on dedicated demonstration problem

To illustrate the different problems addressed in this work, we constructed a demonstration problem with four parameters and five types of data, constituting a joint data vector y = (y_1, …, y_5) ∈ ℝ^17:

  • y_1 ∼ N(θ_1, 0.1²) is informative of θ_1, with a relatively wide prior θ_1 ∼ U[−7, 7],

  • y_2 ∼ N(θ_2, 100²) is informative of θ_2, with prior θ_2 ∼ U[−700, 700],

  • y_3 ∼ N(θ_3, 4·100²)^{⊗4} ∈ ℝ^4 is informative of θ_3, with prior θ_3 ∼ U[−700, 700],

  • y_4 ∼ N(θ_4², 0.1²) is informative of θ_4, with prior θ_4 ∼ U[−1, 1], but quadratic in the parameter,

  • y_5 ∼ N(0, 10)^{⊗10} ∈ ℝ^{10} is uninformative.
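A simulator consistent with this description can be sketched as follows. Assumptions: the second arguments of N(⋅,⋅) above are read as variances (hence the square roots), and the function name is illustrative:

```python
import numpy as np

def simulate(theta, rng):
    """Simulator sketch for the demonstration problem, y in R^17.

    theta = (theta1, ..., theta4); the variance parametrization of the
    model description above is assumed, so standard deviations are the
    square roots of the stated variances."""
    y1 = rng.normal(theta[0], 0.1, 1)             # informative of theta1
    y2 = rng.normal(theta[1], 100.0, 1)           # informative of theta2
    y3 = rng.normal(theta[2], 200.0, 4)           # 4 replicates, sd = sqrt(4*100^2)
    y4 = rng.normal(theta[3] ** 2, 0.1, 1)        # quadratic in theta4
    y5 = rng.normal(0.0, np.sqrt(10.0), 10)       # uninformative background
    return np.concatenate([y1, y2, y3, y4, y5])

rng = np.random.default_rng(5)
y = simulate([0.0, 0.0, 0.0, 0.7], rng)  # joint data vector in R^17
```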

The model dynamics are purposely simple, such that inverse mappings can be captured easily. The problem exhibits the following potentially problematic features:

  • A substantial part of the data, y5, is uninformative, such that approaches ignoring data informativeness may converge slower.

  • Both data and parameters are on different scales, such that approaches comparing data, or, via regression-based summary statistics, parameters, without normalization focus on large-scale variables. Further, e.g. the prior of θ1 is relatively wide, preventing pre-calibration.

  • y4 is quadratic in θ4, such that first-order regression models cannot capture a meaningful relationship.

  • While y_2, y_3 are such that the posteriors of θ_2, θ_3 are identical, in solely scale-normalized approaches the impact of y_3 on the distance value is roughly four times as high as that of y_2, resulting in uneven convergence.

We studied the demonstration problem with synthetic data y_obs,1, y_obs,2, y_obs,3, y_obs,5 ≡ 0 and y_obs,4 = 0.7, using a population size of N = 4e3 with a total budget of 1e6 simulations per run. Marginal posterior approximations obtained using selected distances and summary statistics are shown in Fig 2.

Fig 2.


ABC marginal posterior approximations obtained using regression-based summary statistics (top, “Stat”) or sensitivity weights (bottom, “Sensi”) on the demonstration problem, using an underlying L1 norm with uniform or MAD scale-normalized distance weights (“Ada.+MAD”), using a linear (“LR”) or a neural network (“NN”) regression model, and in some cases augmented regression targets θ^1, …, θ^4 (“P4”).

Solely scale-normalized distances without informativeness assessment converge slowly

The scale-normalized adaptive distance L1+Ada.+MAD correctly captured all posterior modes and shapes, in particular the bi-modality of θ4, however with large variances, because the uninformative model outputs y5 were considered on the same scale as the informative ones (Fig 2 bottom). Further, while the true posteriors of θ2 and θ3 coincide, L1+Ada.+MAD assigned a substantially wider variance to θ2, as only a single model output, y2, is informative of it, while four are of θ3, all on the same normalized scale.

Non-scale-normalized distances converge unevenly

The analyses L1+Stat{LR/NN} without scale normalization described the posteriors of θ_2 and θ_3, which are on the same scale, accurately, however yielded substantially wider variances for θ_1 (Fig 2 top), because θ_1, used as regression target, varies on a smaller scale. In contrast, all analyses employing scale normalization described θ_1, θ_2, and θ_3 roughly similarly well, with the exception of L1+Ada.+MAD, as outlined above.

Regression models not accounting for non-identifiability cannot capture posterior

All analyses employing regression models but using the non-augmented regression targets λ(θ) = θ failed to describe the bi-modal distribution of θ_4, because a global mapping y_4 ↦ θ_4 does not exist. In comparison, analyses considering higher-order regression targets (“P4”) captured the bi-modality, as for this problem a linear mapping y_4 ↦ θ_4^2 exists, as well as a quadratic one y_4 ↦ θ_4^4.

Novel approaches fit all parameters well

The analyses L1+Ada.+MAD+{Stat{LR/NN}/Sensi{LR/NN}}+P4 combining all methods introduced in this work, i.e. scale normalization, informativeness assessment via regression-based summary statistics or sensitivity weights, and regression target augmentation, provided the overall best description of all posterior marginals, with roughly homogeneously small variances. Advantages of NN over LR were not observed.

Estimates for θ_3 were consistently slightly worse with L1+Ada.+MAD+Sensi{LR/NN} than with L1+Ada.+MAD+Stat{LR/NN}. This can be explained by the latter approaches employing a one-dimensional interpolation of y_3 ∈ ℝ^4, and thus e.g. an approximation of the sufficient statistic (1/4)·Σ_{i=1}^{4} y_{3,i}. Meanwhile, approaches that only weight, but do not transform, are more subject to random noise. This illustrates that when low-dimensional sufficient statistics exist and are accurately captured, employing explicit dimension reduction can be superior to mere re-weighting.

Sensitivity weights permit further insights

In Fig 3, normalized absolute sensitivities (4) of parameters with respect to model outputs are visualized. Overall, both regression models captured the relationship of model outputs and parameters well, and assigned large, albeit not completely homogeneous, sensitivity weights to y1, …, y4, and lower ones to y5, with roughly q1 ≈ q2 ≈ ∑_{i=1}^{4} q_{3,i}. The description provided by NN was overall slightly better than by LR, assigning lower weights to y5, and capturing the non-linear mappings θ1^2 ↦ y1 and θ1^4 ↦ y1 better. As seen above, LR nevertheless sufficed to yield good posterior approximations. Sensitivities of θ4^1 and θ4^3 were, as expected, comparably small with respect to all variables.
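The per-target normalization behind the visualized weights can be sketched as follows (a simplified reimplementation in the spirit of (3)–(4); the exact scheme is given in the Methods):

```python
def sensitivity_weights(abs_S):
    """Given abs_S[j][k] = |∂s_j/∂y_k| (rows: regression targets,
    columns: data coordinates), normalize each row to unit sum and
    accumulate the column sums into per-coordinate weights q_k, so
    that every target contributes equally regardless of its scale."""
    q = [0.0] * len(abs_S[0])
    for row in abs_S:
        total = sum(row)
        if total == 0.0:
            continue  # target insensitive to all data coordinates: skip
        for k, v in enumerate(row):
            q[k] += v / total
    return q
```

For a linear regression model, the sensitivity matrix is simply the coefficient matrix, so the weights are cheap to obtain once the model is fitted.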

Fig 3.


Exemplary normalized absolute data-parameter sensitivities for the demonstration problem, using LR (left) and NN (right), for all augmented regression targets θ^1, …, θ^4 with θ = (θ1, …, θ4) (respectively on the right), with respect to all data coordinates y (respectively on the left). The absolute sensitivity matrix |S| (3) was normalized per regression target, as in (4). The widths of lines connecting data and parameters, and of the corresponding endpoints, are proportional to their respective values. In particular, the heights of the respective left end-points are proportional to the assigned sensitivity weights q_i^y (4). Data types, e.g. y5,1, …, y5,10, and parameters with their exponents, e.g. θ1^1, …, θ1^4, are grouped by colors.

The weight assigned to y4 was roughly half the ones assigned to y1, y2 and y3, because θ4^1 and θ4^3 could not be accurately described. Correspondingly, the variance of θ4 was slightly wider under sensitivity-weighted analyses, compared to using summary statistics (Fig 2). This could be improved by not employing parameter-wise normalization in (4), which however makes the analysis less robust to regression model misspecification, or by an alternative normalization.

An analysis such as the one performed here may generally allow evaluating regression model plausibility and obtaining insights into parameter-data relationships, e.g. identifying uninformative data.

3.3 Performance on general test problems

To evaluate robustness and general performance of the proposed methods, we next considered six test problems T1–6, not tailored to the challenges discussed in Section 3.2. Core model properties, as well as the employed ABC-SMC population sizes N and total simulation budgets, are given in Table 1.

Table 1. Test model properties: Identifier, short description, number of parameters nθ and data points ny, population size N and maximum number of model simulations after which an analysis was terminated.

ID Description nθ ny N Max. sim.
T1 Conversion reaction ODE model 2 10 1000 250000
T2 One informative and one uninformative variable 1 2 1000 25000
T3 g-and-k distribution order statistics, small 4 7 1000 250000
T4 Lotka-Volterra Markov jump process model, small 3 32 500 125000
T5 g-and-k distribution order statistics, large 4 100 1000 250000
T6 Lotka-Volterra Markov jump process model, large 3 200 500 125000

T1, T3, and T4 are problems M3, M4, and M5 from [11], respectively an ODE model of a conversion reaction, and, based on application examples in [10], g-and-k distribution samples, and a Markov jump process model of a Lotka-Volterra predator-prey process. T2 consists of two observables, thereof y1 ∼ N(θ, 0.1^2) informative and y2 ∼ N(0, 1^2) uninformative, with wide prior θ ∼ N(0, 100^2), also from [10]. T5 and T6 are variations of T3 and T4 with higher-dimensional data, based on application examples in [15]. T5 employs 100 order statistics out of 10,000 samples from a g-and-k distribution, with U[0, 10] priors on the four parameters A, B, g, k, considering ground truth values (A, B, g, k) = (3, 1, 2, 0.5). T6 employs noise-free observations of predators and prey at 200 evenly-spaced time-points over the interval [0, 20], estimating the three reaction rate coefficients on linear scale, considering tight independent priors θ1 ∼ U[0, 2], θ2 ∼ U[0, 0.1], θ3 ∼ U[0, 1], and ground truth values (θ1, θ2, θ3) = (0.5, 0.0025, 0.3). We ran 10 repetitions of different inference scenarios on problems T1–6 on different data sets. To measure fit quality, we reported root mean square errors (RMSE) of the weighted posterior samples from the last ABC-SMC generation, with respect to ground truth parameters (note that all problems considered here are uni-modal). The results are visualized in Fig 4.
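The reported fit-quality metric can be computed as follows (a straightforward sketch of the weighted RMSE for a single parameter):

```python
from math import sqrt

def weighted_rmse(samples, weights, theta_true):
    """RMSE of a weighted posterior sample for one parameter,
    sqrt(sum_i w_i (θ_i - θ*)^2 / sum_i w_i), where the weights
    are the importance weights of the final ABC-SMC generation."""
    norm = sum(weights)
    return sqrt(sum(w * (t - theta_true) ** 2
                    for w, t in zip(weights, samples)) / norm)
```

For uni-modal problems such as T1–6, this ground-truth-based metric is a reasonable proxy for posterior accuracy.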

Fig 4. Median RMSE (smaller is better) for the parameters of models T1–6 (columns) obtained for 15 inference methods (rows), using an L1 distance, either uniformly weighted if unspecified, or with adaptive MAD scale normalization (“Ada.+MAD”).


As regression models we considered linear regression (“LR”) and neural networks (“NN”), both to define summary statistics (“Stat”) and sensitivity weights (“Sensi”). Some inference settings further used parameter transformations λ(θ) = (θ^1, …, θ^4) as regression targets (“P4”). In some settings, the regression model was trained before the initial generation (“Init”), or after 40% of the simulation budget if unspecified. The first row contains solely scale-normalized L1+Ada.+MAD as a reference, followed by two blocks of four rows using summary statistics, using firstly LR and secondly NN, and then by two blocks of three using sensitivity weights, using firstly LR and secondly NN. Reported values are medians over 10 replicates, with horizontal grey error lines indicating MAD.

Delay of regression model training advantageous on complex models

For the considered LR and NN models, regression model training on prior samples (“Init”) gave substantially worse results for most problems than training after 40% of the simulation budget. One reason may be that only N prior samples were used for training, compared to potentially more samples, including rejected ones, in later generations. However, even when using only N training samples in the later-trained approach (not shown here), results were better than when training on the prior. Thus, an explanation is that after multiple generations the bulk of samples is restricted to a high-density region, in which a simpler model is sufficiently accurate. This empirically justifies the approach of [15] of using a pilot run to constrain parameters. [16], who base their regression model on the prior, use firstly more complex NN models, and secondly up to 1e6 training samples, far more than the total simulation budget of entire analyses here. An exception was T2, where sometimes initial training improved performance. This can be explained by the globally linear parameter-data mapping, such that accurate regression models can easily be learned and thereafter be beneficial.

Scale normalization improves performance for regression-based summary statistics

As the comparison of L1+Stat{LR/NN} and L1+Ada.+MAD+Stat{LR/NN} shows, the use of scale normalization improved performance for many problems, particularly for T5+6, while it was roughly similar for T1. An exception was again T2, where in fact a uniformly weighted L1 distance would be preferable over L1+Ada.+MAD at least in the first generations, as the uninformative observable happens to vary less there. It should further be remarked that overall, L1+Ada.+MAD performed quite robustly across all problems, including high-dimensional problems T5+6, and was consistently beaten by the methods accounting for informativeness only on T2. This again highlights the importance of scale normalization, and suggests that on these problems there were no overall uninformative data points (except T2) confounding the analysis.

Sensitivity-weighted distances perform highly robustly

The approaches L1+Ada.+MAD+Sensi{LR/NN}(+P4) using regression models to define sensitivity weights performed reliably, with RMSE values generally not far higher, and in some cases consistently lower, than those obtained by L1+Ada.+MAD. This indicates that, while the sensitivity weighting could in those cases not improve performance, as sole scale normalization was already efficient, the approach is highly robust. In some cases, specifically T2, which had one clearly uninformative statistic, and arguably T5, which is a high-dimensional collection of order statistics, the sensitivity weighting improved performance. In other cases, specifically T1, T3, T4, and T6, RMSE values for some parameters decreased but slightly increased for others, indicating that the weighting scheme re-prioritized data points, while no overall uninformative ones could be disregarded.

Regression-based summary statistics can be superior but also less robust

In various cases, e.g. when trained in the initial generation, and consistently for T4, as well as using LR on T6, summary statistics were inferior to both L1+Ada.+MAD and sensitivity weights. Arguably, in those cases the regression model was not accurate enough to allow using its outputs as low-dimensional summaries. However, in some cases, specifically for T1, and two parameters of T6 using NN, RMSE values obtained using summary statistics were smaller than with both L1+Ada.+MAD and sensitivity weights. This again indicates that if the lower-dimensional summary statistics representation is accurate and informative of the parameters, then its use can be beneficial and superior to mere re-weighting.

No clear preference for regression model or target augmentation

For both regression-based summary statistics and sensitivity weights, we found overall no clear preference for LR or NN, with LR more robust in many cases, but NN clearly preferable in some. Further, the use of augmented parameters as regression targets did not substantially worsen, but also did not notably improve, performance for any test problem; however, it performed worse e.g. on T2, which has a clear linear mapping, such that the consideration of higher-order moments may have complicated the inference. This indicates that using augmented parameters as regression targets is robust, but that, if further information is available, a restriction to e.g. first or second order may be beneficial.

3.4 Performance on application example

Next, we considered an agent-based model of tumor spheroid growth (model M6 from [11]), considering both outlier-free and outlier-corrupted data. We employed the same simulated data as in [11], a population size of N = 500, and a computational budget of 150,000 simulations per analysis. Given its computational complexity with ABC-SMC wall times on the order of several hours even with parallelization on hundreds of CPU cores, we had to restrict our analysis of this problem to only a few selected approaches: Besides the reference L1+Ada.+PCMAD, we employed, given the robust performance of LR before, L1+Ada.+PCMAD+SensiLR(+P4) using sensitivity weights, L1+Ada.+PCMAD+StatLR(+P4) using summary statistics, both with and without augmented regression targets λ(θ) = (θ^1, …, θ^4), further L1+Ada.+PCMAD+StatNN and L1+Ada.+PCMAD+SensiNN+P4 using NN. Here, to facilitate outlier detection, we used PCMAD instead of MAD.

Sensitivity weights identify uninformative model outputs

Using regression models to define sensitivity weights improved performance on the tumor model with outlier-free data over L1+Ada.+PCMAD, giving lower variances for the division rate and depth parameters, with otherwise similar results (Fig 5 top), and accepted simulations closely matching the observed data (Fig 6 top, simulations). No differences could be observed between using only the parameters, or also higher-order moments, as regression targets.

Fig 5. Posterior marginals for 5 of the 7 model parameters of the tumor problem that show interesting behavior.


Without (top) and with (bottom) outliers. Ground truth parameters are indicated by vertical grey dotted lines. Plot boundaries are the employed uniform prior ranges, except for the ECM production rate, which is zoomed in for visibility.

Fig 6. Fits, scale and sensitivity weights for the tumor problem on outlier-free (top) and outlier-corrupted (bottom) data.


The respective upper rows show the observed data (black), and, for each approach, 20 accepted simulated data sets (light lines) as well as the sample means (darker lines) from the last ABC-SMC generation. The respective middle rows show the scale weights assigned to each data point in the last generation, normalized to unit sum, and the bottom rows the sensitivity weights, respectively only for distances employing such weights, and operating on the full data.

On this problem, regression-based summary statistics performed substantially worse, which may indicate that the employed regression models did not provide a sufficiently informative low-dimensional representation (L1+Ada.+PCMAD+StatLR(+P4), Fig 5 top); accordingly, accepted simulations visibly failed to match the observed data (Fig 6 top, 1st row).

The overall structure of sensitivity weights assigned via LR with and without parameter augmentation, as well as NN, was roughly consistent across multiple runs (Fig 6 top, 3rd row). Low weights were assigned to the fraction of proliferating cells at large distances to the rim, indicating these to be uninformative, and counteracting the large weights resulting from scale normalization (Fig 6 top, 2nd row). While the sensitivity weights exhibit some variability between adjacent points and across runs, consistent and reasonable overall patterns can be observed.

Robust on outlier-corrupted data

Using sensitivity weights improved performance also on outlier-corrupted data (Fig 5 bottom). Given its previously good performance, here we only considered L1+Ada.+PCMAD+SensiLR+P4. Accepted simulations in the final generation matched the observed data more closely than for L1+Ada.+PCMAD (Fig 6 bottom, 1st row). The PCMAD scheme assigned low weights to outliers, independent of the regression-based sensitivity weights (Fig 6 bottom, 2nd and 3rd row). Thus, the combination of both methods allowed simultaneously accounting for outliers and informativeness. For an in-depth study of adaptive scale normalization in the presence of data outliers, see [11].
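Conceptually, combining both weight types amounts to weighting each coordinate by both factors inside the distance (a sketch assuming a simple product combination; the precise weighted-distance definition is given in the Methods):

```python
def combined_weighted_l1(x, y, scale_w, sensi_w):
    """L1 distance combining adaptive scale weights (e.g. from PCMAD,
    which down-weights outlier-corrupted coordinates) with
    regression-based sensitivity weights (which down-weight
    uninformative coordinates)."""
    return sum(s * q * abs(a - b)
               for a, b, s, q in zip(x, y, scale_w, sensi_w))
```

A coordinate thus only contributes substantially if it is both on a comparable normalized scale and informative about the parameters.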

4 Discussion

In this work, we discussed problems arising in ABC (1) from partly uninformative data for scale-normalized distances, (2) from heterogeneous parameter scales for regression-based summary statistics, and (3) from parameter non-identifiability for regression model adequacy. To tackle these problems, we presented multiple solutions: First, we suggested employing adaptive scale-normalizing distances on top of regression-based summary statistics, to homogenize the impact of parameters. Second, as an alternative to the first solution, we introduced novel sensitivity weights derived from regression models, measuring the informativeness of data on parameters. Third, we introduced augmented regression targets to overcome parameter non-identifiability.

We showed substantial improvements of the novel methods over established approaches on a simple demonstration problem. For the sensitivity-weighted distances, we showed robust performance on various further test problems, in particular on a complex systems biological application problem. Yet, there are numerous ways in which the presented methods can be improved:

While simple linear models often sufficed, especially when trained on a high-density region, in some cases more complex models were superior, indicating the presence of non-linear data-parameter relationships. A systematic investigation of alternative and more complex regression model types, e.g. neural networks tailored to the respective data types, as well as model selection among competing regression models, would be useful. To select among competing model candidates, an efficient yet robust model selection scheme with out-of-sample evaluation, e.g. based on cross validation, might be useful. The employed regression model should reflect and exploit any symmetries or structures in the data. For example, for image and sequence data convolutional and recurrent networks would be prime candidates, for i.i.d. data permutation invariant networks [35]. More complex regression models and robust estimators require larger training sets (which could be mitigated by transfer learning). While increasing the training set is straightforward, as it only requires continued sampling from the forward model, there is a cost trade-off of the actual ABC inference and regression model training, which remains to be investigated.

While in many cases delaying regression model training to later generations and a high-density region was advantageous, for simple models we observed benefits of early regression. Besides training prior to the first generation, we only considered training the regression model after 40% of the simulation budget, based on the heuristic that the ABC-SMC algorithm should have had time to converge to a higher posterior density region, while still leaving a substantial simulation budget to leverage the learned mapping. However, in practice good training times will clearly be problem-specific. Criteria on if and when to train or update regression models, also repeatedly, would be of interest.

While augmenting regression targets was essential to render inverse regression model based approaches applicable to non-identifiable forward models, one could further study the ramifications of the increased summary statistics dimension, as well as methods to automatically adjust the set of applied transformations. In particular, for multivariate problems it may be necessary to also consider combinations of parameters to render the inverse problem identifiable. While we showed that our approach also performed robustly in the presence of outliers for a given model, its applicability in general under model misspecification could be studied further [36, 37].

It should be noted that in calculating sensitivity weights we make the assumption that the employed regression model is (sub-)differentiable, which clearly holds for the regression models employed in this work (see also S1 File, Section 3). Further, we implicitly assume that the employed smooth regression model provides an accurate approximation of the underlying expectations it aims to describe. In practice, it should be carefully evaluated whether such an approximation appears valid for the given forward model. In the first place, we use the derived sensitivity matrix as a “heuristic” to quantify informativeness for weighting. However, it may fail in this function as a heuristic if the underlying quantities are inadequately captured. In particular, it provides a biased approximation of the derivative it intends to approximate when in the underlying expectations integration and differentiation cannot be exchanged in finite samples. Studying the consequences of such conceptual and practical discrepancies as well as investigating alternative measures of sensitivity could improve robustness and applicability of the method.
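When a regression model is (sub-)differentiable but exposes no analytic gradient, the sensitivity matrix can, for illustration, be approximated by central finite differences (a hypothetical generic fallback; for the linear models used in this work, S is simply the coefficient matrix):

```python
def finite_diff_sensitivities(model, y, eps=1e-6):
    """Approximate S[j][k] = ∂s_j(y)/∂y_k by central differences,
    where model(y) returns the vector of predicted regression targets
    for a data vector y."""
    n_out = len(model(y))
    S = []
    for j in range(n_out):
        row = []
        for k in range(len(y)):
            y_plus = list(y); y_plus[k] += eps
            y_minus = list(y); y_minus[k] -= eps
            row.append((model(y_plus)[j] - model(y_minus)[j]) / (2 * eps))
        S.append(row)
    return S
```

Such a numerical approximation inherits the caveats discussed above: it is only as reliable as the smoothness and accuracy of the underlying regression model.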

This work may be regarded as an extension of the approaches of [10, 11] as well as [15]. An alternative weighting scheme is presented by [38], who maximize a distance between samples from the prior and the posterior approximation. While using a different notion of informativeness and a specific underlying sampler, a comparison in terms of efficiency, robustness to outliers, and information gain would be of interest. A core problem with model selection in ABC is that insufficient summary statistics can lead to inconsistent Bayes factors [39]. Thus, approaches such as our adaptive weights accounting for scale and informativeness, or also [38, 40], hold the potential to facilitate unbiased ABC model selection even on larger data sets.

We have shown here that adaptive distance metrics accounting for both scale and informativeness facilitate an efficient analysis. We envisage that they may prove valuable also in combination with many orthogonal methods in an ABC-SMC framework. For example, a combination with optimal thresholding schemes [26], as well as with multifidelity approaches using surrogates to even further reduce the number of evaluations of expensive simulators [41], appears promising.

All methods presented in this work have been implemented in the Python package pyABC, facilitating their straightforward application. We anticipate that such approaches, which automatically normalize and extract or weight features of interest without extensive manual tuning, will substantially improve performance of ABC methods on a wide range of application problems.

Supporting information

S1 File. Contains all the supporting text and figures.

(PDF)

Acknowledgments

We thank Dennis Prangle and Marc Vaisband for fruitful discussions, Emad Alamoudi for help with the HPC setup, and acknowledge the Gauss Centre for Supercomputing for providing computing time on the GCS Supercomputer JUWELS [42] at Jülich Supercomputing Centre.

Data Availability

The code underlying the application study is on GitHub (https://github.com/yannikschaelte/study_abc_slad), a snapshot of code and data on Zenodo (http://doi.org/10.5281/zenodo.5522919). All methods developed in this contribution have been implemented, tested and documented in the open-source Python package pyABC (https://github.com/icb-dcm/pyabc).

Funding Statement

This work was supported by the German Federal Ministry of Education and Research (BMBF) (FitMultiCell/031L0159 and EMUNE/031L0293) and the German Research Foundation (DFG) under Germany’s Excellence Strategy (EXC 2047 390873048 and EXC 2151 390685813) and a Schlegel Professorship for JH. YS acknowledges support by the Joachim Herz Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Gershenfeld NA, Gershenfeld N. The nature of mathematical modeling. Cambridge university press; 1999. [Google Scholar]
  • 2. Kitano H. Systems Biology: A Brief Overview. Science. 2002;295(5560):1662–1664. doi: 10.1126/science.1069492 [DOI] [PubMed] [Google Scholar]
  • 3. Tarantola A. Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM; 2005.
  • 4. Tavaré S, Balding DJ, Griffiths RC, Donnelly P. Inferring coalescence times from DNA sequence data. Genetics. 1997;145(2):505–518. doi: 10.1093/genetics/145.2.505 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Hasenauer J, Jagiella N, Hross S, Theis FJ. Data-Driven Modelling of Biological Multi-Scale Processes. J Coupled Syst Multiscale Dyn. 2015;3(2):101–121. doi: 10.1166/jcsmd.2015.1069 [DOI] [Google Scholar]
  • 6. Pritchard JK, Seielstad MT, Perez-Lezaun A, Feldman MW. Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol Biol Evol. 1999;16(12):1791–1798. doi: 10.1093/oxfordjournals.molbev.a026091 [DOI] [PubMed] [Google Scholar]
  • 7. Beaumont MA, Zhang W, Balding DJ. Approximate Bayesian Computation in Population Genetics. Genetics. 2002;162(4):2025–2035. doi: 10.1093/genetics/162.4.2025 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Sisson SA, Fan Y, Tanaka MM. Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci. 2007;104(6):1760–1765. doi: 10.1073/pnas.0607208104 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Del Moral P, Doucet A, Jasra A. Sequential Monte Carlo samplers. J R Stat Soc B. 2006;68(3):411–436. doi: 10.1111/j.1467-9868.2006.00553.x [DOI] [Google Scholar]
  • 10. Prangle D. Adapting the ABC Distance Function. Bayesian Analysis. 2017;12(1):289–309. doi: 10.1214/16-BA1002 [DOI] [Google Scholar]
  • 11. Schälte Y, Alamoudi E, Hasenauer J. Robust adaptive distance functions for approximate Bayesian inference on outlier-corrupted data. bioRxiv. 2021;. [Google Scholar]
  • 12. Blum MG, Nunes MA, Prangle D, Sisson SA. A comparative review of dimension reduction methods in approximate Bayesian computation. Stat Sci. 2013;28(2):189–208. doi: 10.1214/12-STS406 [DOI] [Google Scholar]
  • 13. Nunes MA, Balding DJ. On Optimal Selection of Summary Statistics for Approximate Bayesian Computation. Stat Appl Genet Mol. 2010;9(1). [DOI] [PubMed] [Google Scholar]
  • 14. Drovandi CC, Pettitt AN, Faddy MJ. Approximate Bayesian computation using indirect inference. Journal of the Royal Statistical Society: Series C (Applied Statistics). 2011;60(3):317–337. [Google Scholar]
  • 15. Fearnhead P, Prangle D. Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. J R Stat Soc B. 2012;74(3):419–474. doi: 10.1111/j.1467-9868.2011.01010.x [DOI] [Google Scholar]
  • 16. Jiang B, Wu Ty, Zheng C, Wong WH. Learning summary statistic for approximate Bayesian computation via deep neural network. Statistica Sinica. 2017; p. 1595–1618. [Google Scholar]
  • 17. Borowska A, Giurghita D, Husmeier D. Gaussian process enhanced semi-automatic approximate Bayesian computation: parameter inference in a stochastic differential equation system for chemotaxis. Journal of Computational Physics. 2021;429:109999. doi: 10.1016/j.jcp.2020.109999 [DOI] [Google Scholar]
  • 18. Wieland FG, Hauber AL, Rosenblatt M, Tönsing C, Timmer J. On structural and practical identifiability. Current Opinion in Systems Biology. 2021;25:60–69. doi: 10.1016/j.coisb.2021.03.005 [DOI] [Google Scholar]
  • 19. Barber S, Voss J, Webster M. The rate of convergence for approximate Bayesian computation. Electronic Journal of Statistics. 2015;9(1):80–105. doi: 10.1214/15-EJS988 [DOI] [Google Scholar]
  • 20. Sisson SA, Fan Y, Beaumont M. Handbook of approximate Bayesian computation. Chapman and Hall/CRC; 2018. [Google Scholar]
  • 21. Toni T, Stumpf MPH. Simulation-based model selection for dynamical systems in systems and population biology. Bioinf. 2010;26(1):104–110. doi: 10.1093/bioinformatics/btp619 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Drovandi CC, Pettitt AN. Estimation of parameters for macroparasite population evolution using approximate Bayesian computation. Biometrics. 2011;67(1):225–233. doi: 10.1111/j.1541-0420.2010.01410.x [DOI] [PubMed] [Google Scholar]
  • 23. Filippi S, Barnes CP, Cornebise J, Stumpf MP. On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo. Stat Appl Genet Mol. 2013;12(1):87–107. [DOI] [PubMed] [Google Scholar]
  • 24. Klinger E, Hasenauer J. A scheme for adaptive selection of population sizes in Approximate Bayesian Computation—Sequential Monte Carlo. In: Feret J, Koeppl H, editors. Computational Methods in Systems Biology. CMSB 2017. vol. 10545 of Lecture Notes in Computer Science. Springer, Cham; 2017. p. 128–144.
  • 25. Klinger E, Rickert D, Hasenauer J. pyABC: distributed, likelihood-free inference. Bioinf. 2018;34(20):3591–3593. [DOI] [PubMed] [Google Scholar]
  • 26. Silk D, Filippi S, Stumpf MPH. Optimizing threshold-schedules for sequential approximate Bayesian computation: Applications to molecular systems. Stat Appl Genet Mol Biol. 2013;12(5):603–618. doi: 10.1515/sagmb-2012-0043 [DOI] [PubMed] [Google Scholar]
  • 27. Csilléry K, François O, Blum MG. abc: an R package for approximate Bayesian computation (ABC). Methods in ecology and evolution. 2012;3(3):475–479. doi: 10.1111/j.2041-210X.2011.00179.x [DOI] [PubMed] [Google Scholar]
  • 28. Raue A, Schilling M, Bachmann J, Matteson A, Schelke M, Kaschek D, et al. Lessons learned from quantitative dynamical modeling in systems biology. PLoS ONE. 2013;8(9):e74335. doi: 10.1371/journal.pone.0074335 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Raue A, Kreutz C, Maiwald T, Bachmann J, Schilling M, Klingmüller U, et al. Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics. 2009;25(25):1923–1929. doi: 10.1093/bioinformatics/btp358 [DOI] [PubMed] [Google Scholar]
  • 30. Lehmann EL, Casella G. Theory of point estimation. Springer Science & Business Media; 2006. [Google Scholar]
  • 31. Li W, Fearnhead P. On the asymptotic efficiency of approximate Bayesian computation estimators. Biometrika. 2018;105(2):285–299. doi: 10.1093/biomet/asx078 [DOI] [Google Scholar]
  • 32. Frazier DT, Martin GM, Robert CP, Rousseau J. Asymptotic properties of approximate Bayesian computation. Biometrika. 2018;105(3):593–607. doi: 10.1093/biomet/asy027 [DOI] [Google Scholar]
  • 33. Villaverde AF, Pathirana D, Fröhlich F, Hasenauer J, Banga JR. A protocol for dynamic model calibration. Briefings in Bioinformatics. 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12:2825–2830. [Google Scholar]
  • 35. Radev ST, Mertens UK, Voss A, Ardizzone L, Köthe U. BayesFlow: Learning complex stochastic models with invertible neural networks. IEEE Transactions on Neural Networks and Learning Systems. 2020;. [DOI] [PubMed]
  • 36. Frazier DT, Robert CP, Rousseau J. Model misspecification in approximate Bayesian computation: consequences and diagnostics. J R Stat Soc B. 2020;. doi: 10.1111/rssb.12356 [DOI] [Google Scholar]
  • 37. Schmon SM, Cannon PW, Knoblauch J. Generalized posteriors in approximate Bayesian computation. arXiv preprint arXiv:2011.08644. 2020;.
  • 38. Harrison JU, Baker RE. An automatic adaptive method to combine summary statistics in approximate Bayesian computation. PloS one. 2020;15(8):e0236954. doi: 10.1371/journal.pone.0236954 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Didelot X, Everitt RG, Johansen AM, Lawson DJ. Likelihood-free estimation of model evidence. Bayesian analysis. 2011;6(1):49–76. doi: 10.1214/11-BA602 [DOI] [Google Scholar]
  • 40. Bernton E, Jacob PE, Gerber M, Robert CP. Approximate Bayesian computation with the Wasserstein distance. J Roy Stat Soc B (Statistical Methodology). 2019;81(2):235–269. doi: 10.1111/rssb.12312 [DOI] [Google Scholar]
  • 41. Prescott TP, Baker RE. Multifidelity approximate Bayesian computation with sequential Monte Carlo parameter sampling. SIAM-ASA J Uncertain Quantif. 2021;9(2):788–817. doi: 10.1137/20M1316160 [DOI] [Google Scholar]
  • 42. Jülich Supercomputing Centre. JUWELS: Modular Tier-0/1 Supercomputer at the Jülich Supercomputing Centre. Journal of large-scale research facilities. 2019;5(A135). [Google Scholar]

Decision Letter 0

Nattapol Aunsri

21 Nov 2022

PONE-D-22-20368: Informative and adaptive distances and summary statistics in sequential approximate Bayesian computation (PLOS ONE)

Dear Dr. Schälte,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

ACADEMIC EDITOR: I see that this work has a potential for publication in PLOS ONE. But I see that several issues posted by all reviewers must be clearly addressed before reconsideration.


Kind regards,

Nattapol Aunsri, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse. New software must comply with the Open Source Definition.

3. Please update your submission to use the PLOS LaTeX template. The template and more information on our requirements for LaTeX submissions can be found at http://journals.plos.org/plosone/s/latex.

4. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. 


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Please see the attached comments.

Reviewer #2: Review report of ``Informative and adaptive distances and summary statistics in sequential approximate Bayesian computation''

by Yannik Schalte and Jan Hasenauer

Summary: In this paper the authors present a method to adaptively reweight terms in the distance metric used in approximate Bayesian computation. This reweighting approach exploits sequential Monte Carlo (SMC) for ABC by using early iterations to learn the inverse mapping from the data or summary statistics to parameters. For the training steps they consider both statistical (i.e., linear regression) and machine learning (i.e., neural network) models. The key elements of their method are demonstrated on a specially constructed toy problem and then tested on a range of common benchmark models for ABC methods. Consistently, their new approach is shown to have greater statistical efficiency and, more importantly, better robustness properties in the situation where data is corrupted by outliers.

I thoroughly enjoyed reading this paper as it addresses a very important problem for approximate Bayesian computation, that is, the choice of distance metric and summary statistics. It was also refreshing to see the problem addressed for the cases when the inverse mapping does not exist due to identifiability concerns. While the statistical efficiency improvements are not always substantial, the robustness properties of this approach, along with its applicability to non-identifiable models, make this a very useful piece of work. I therefore recommend it for publication pending some minor revisions based on my comments below.

1. Regarding the main contribution (Algorithm 2), I have a few minor comments:

1.1 What tuning is involved in selecting the value for t_train? In the paper it seems that the only choices considered are t_train = 1 and t_train such that 40% of the computational budget is used. How was the 40% budget chosen, and what other considerations are important for choosing t_train?

1.2 Throughout, only L1 is used based on its robustness properties. It would be interesting to see an example of this adaptive scheme under L2 or another Lp in the presence of outliers. Given the discussed robustness properties of the adaptive reweighting, presumably this would better highlight this property compared to other approaches with L2 or other Lp distances.

1.3 I believe there is a typo in the second line of Algorithm 2 (for t = t_train, ..., n_t do), as the conditions in the subsequent if statements (i.e., "if t < t_train then" and "if t+1 == t_train") will always be false.

2. In section 2.2.3, when using multiple parameter moments to account for identifiability, I am not completely clear on the implementation. Are you a) fitting k summaries independently (i.e., s_i from y ~ theta^i), or b) using all the moments to obtain a single summary (i.e., s from y ~ theta + theta^1 + ... + theta^k)?

3. For the toy example, it is not completely clear how the synthetic dataset is constructed. Is each observation a) a 5-d vector (with each component from the respective distribution), or b) a scalar randomly chosen from one of the components? If the answer is b), is each component equally likely, or are some rare events?

4. The following questions relate to the results:

4.1 In the results table in Figure 4, the vanilla L1+Ada.+MAD works surprisingly well in some cases. Can you provide some insight into this? I also presume that these results are for equivalent computational budgets.

4.2 For the tumor problem under the corruption of outliers, why are only L1+Ada.+PCMAD+StatLR+P4 and L1+Ada.+PCMAD+SensiLR+P4 applied?

5. There are a few discussion points that could be worth considering:

5.1 How do you deal with the case when rare "outlier" events are part of the data-generating process?

5.2 Beyond corruption by outliers, there are other kinds of misspecification that are structural (e.g., using an SIR model when the real process is SEIR). Could your approach be used to identify this?

5.3 I believe this adaptive scheme could be particularly valuable for multifidelity methods, which exploit approximate models and appropriately correct for bias. Could your method be applied to automatically construct summaries/distances for each model fidelity, such that the ROC properties of the approximations are improved (and therefore there is less need to simulate the exact model)? This could be particularly useful within adaptive multifidelity schemes such as

https://doi.org/10.1137/20M1316160

https://arxiv.org/abs/2112.11971

https://doi.org/10.1016/j.jcp.2022.111543

6. Some minor typographical issues I found throughout suggest a careful proof-read:

6.1 Line 30 "allowing to understand"

6.2 Line 33 "allows doing so"

6.3 Line 37 "In a nutshell" is a bit informal suggest "Put briefly"

6.4 Line 39 "This way," -> "In this way,"

6.5 Line 44 "demonstrate" -> "demonstrates"

6.6 Lines 54-55 sentence is a bit awkward, suggest rephrasing.

6.7 throughout section 2: define all maths symbols e.g., n_y n_s and n_theta are never defined

6.8 Line 105 "Monte-Carlo" -> "Monte Carlo"

6.9 Line 214 define PCMAD

Reviewer #3: Using NNs to improve the summary statistics of ABC is a nice idea. I posed three questions in my review of the manuscript concerning non-identifiability, the curse of dimensionality, and conditions for model selection.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Marcos A. Capistran

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: JPONE_D-22-20368_report.pdf

Attachment

Submitted filename: review.pdf

PLoS One. 2023 May 22;18(5):e0285836. doi: 10.1371/journal.pone.0285836.r002

Author response to Decision Letter 0


27 Jan 2023

Dear reviewers,

We thank you very much for your in-depth assessment of our manuscript. We have addressed all comments and incorporated changes which definitely improved the quality of our manuscript. Please find attached our one-by-one response, as well as a revised version of the manuscript.

Kind regards,

Yannik Schaelte

Attachment

Submitted filename: Response_Informative.pdf

Decision Letter 1

Nattapol Aunsri

28 Feb 2023

PONE-D-22-20368R1

Informative and adaptive distances and summary statistics in sequential approximate Bayesian computation

PLOS ONE

Dear Dr. Schälte,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

ACADEMIC EDITOR: Please consider all reviewer's comments carefully before submitting your revised manuscript.

Please submit your revised manuscript by Apr 14 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Nattapol Aunsri, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Please see the attached comments.

Reviewer #2: I am very pleased with the responses by the authors. All my questions and comments have been addressed.

I believe this will be a very useful paper to the systems biology community, and statistical computing more broadly.

Reviewer #3: The authors have addressed all my concerns. I appreciate their careful and insightful replies. In particular, the one about model selection.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Marcos A. Capistran

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: JPONE_D-22-20368_report_R1.pdf

PLoS One. 2023 May 22;18(5):e0285836. doi: 10.1371/journal.pone.0285836.r004

Author response to Decision Letter 1


17 Apr 2023

Dear reviewers,

Thank you very much for your previous feedback. We hope to have addressed the remaining comments of Reviewer 1.

Best,

Yannik Schaelte

Attachment

Submitted filename: Response2_Informative.pdf

Decision Letter 2

Nattapol Aunsri

3 May 2023

Informative and adaptive distances and summary statistics in sequential approximate Bayesian computation

PONE-D-22-20368R2

Dear Dr. Schälte,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Nattapol Aunsri, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have adequately addressed my remaining concerns. I have no remaining comments.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

Acceptance letter

Nattapol Aunsri

11 May 2023

PONE-D-22-20368R2

Informative and adaptive distances and summary statistics in sequential approximate Bayesian computation

Dear Dr. Hasenauer:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Nattapol Aunsri

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Contains all the supporting text and figures.

    (PDF)

    Attachment

    Submitted filename: JPONE_D-22-20368_report.pdf

    Attachment

    Submitted filename: review.pdf

    Attachment

    Submitted filename: Response_Informative.pdf

    Attachment

    Submitted filename: JPONE_D-22-20368_report_R1.pdf

    Attachment

    Submitted filename: Response2_Informative.pdf

    Data Availability Statement

    The code underlying the application study is on GitHub (https://github.com/yannikschaelte/study_abc_slad), a snapshot of code and data on Zenodo (http://doi.org/10.5281/zenodo.5522919). All methods developed in this contribution have been implemented, tested and documented in the open-source Python package pyABC (https://github.com/icb-dcm/pyabc).


    Articles from PLOS ONE are provided here courtesy of PLOS

    RESOURCES