F1000Prime Reports. 2014 Jul 17;6:60. doi: 10.12703/P6-60

Approximate Bayesian inference for complex ecosystems

Michael PH Stumpf
PMCID: PMC4136695  PMID: 25152812

Abstract

Mathematical models have been central to ecology for nearly a century. Simple models of population dynamics have allowed us to understand fundamental aspects underlying the dynamics and stability of ecological systems. What has remained a challenge, however, is to meaningfully interpret experimental or observational data in light of mathematical models. Here, we review recent developments, notably in the growing field of approximate Bayesian computation (ABC), that allow us to calibrate mathematical models against available data. Estimating demographic parameters from population data remains a formidable statistical challenge. We aim to give a flavor and overview of ABC and its applications in population biology and ecology, and we eschew a detailed technical discussion in favor of a general account of the advantages and potential pitfalls this framework offers to population biologists.

Introduction

Theoretical population biology has been crucial for our understanding of ecosystems [1]. Mathematical models can explain elegantly what might appear as bewilderingly complex variations in species abundances. Seminal work starting in the early 20th century [2-4] has, in fact, become so familiar to population biologists and beyond that today we are hardly surprised to see complex oscillatory patterns or complex dependencies of population dynamics on a myriad of environmental and demographic factors [5]. Many of these phenomena can straightforwardly be explained in terms of relatively simple population dynamics models. The success of these models has also meant that ecological ideas are coming to pervade the analysis of other interacting systems, including cancer [6], stem cells [7,8], and even the banking system [9,10], all of which are characterized by the interactions between different entities that affect the overall dynamics of the system and its stability.

Simple models are beguiling: they shape our intuition and allow us to explain trends in data. In many important scenarios, however, different factors come together, with sometimes complex patterns resulting from their interplay. Understanding realistic systems, subject to a multitude of internal and external factors, is therefore hard [11,12]. This is further complicated in situations where models are used to make predictions or to assess different types of interventions in silico prior to their implementation in, for example, conservation biology.

These challenges are not unique to theoretical ecology, of course, and recent years have seen concerted efforts to tackle the so-called inverse problem: estimating parameters of a model from data [13]; choosing from among a set of plausible candidate models the model that is best able to explain the data [14]; or inferring mechanistic or statistical dependencies between the different state variables making up a system (in an ecological setting, these would, for example, be the species considered in the model). Below, we consider population dynamical models in which a vector, x = (x_1, x_2, …, x_N), describes the abundances of the N species in the ecosystem. These abundances are assumed to change as a result of interactions among the species (and potentially external factors) according to some rate laws,

dx/dt = f(x; Θ),    (1)

where, with slight abuse of notations, we will also implicitly allow for stochastic dynamics. The community matrix of the ecological system (1) is, of course, given by

A = ∂f(x; Θ)/∂x,

which captures the ecological relationships among the species. Finally, the (vector-valued) parameter Θ denotes the typically unknown demographic and system parameters (for example, birth, death, and migration rates) as well as parameters characterizing the interactions between and within species.
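To make the rate-law formulation of Equation 1 and the community matrix concrete, the sketch below simulates an entirely hypothetical two-species Lotka-Volterra system and estimates A = ∂f/∂x at the final state by central finite differences. All parameter values, variable names, and the choice of a hand-rolled Runge-Kutta integrator are illustrative assumptions, not anything prescribed by the text.

```python
import numpy as np

def f(x, theta):
    """Rate law f(x; Theta) for a hypothetical prey-predator pair:
    prey growth a, predation b, predator death c, conversion d."""
    a, b, c, d = theta
    prey, pred = x
    return np.array([a * prey - b * prey * pred,
                     d * prey * pred - c * pred])

def step_rk4(x, theta, dt):
    """One fourth-order Runge-Kutta step of dx/dt = f(x; Theta)."""
    k1 = f(x, theta)
    k2 = f(x + 0.5 * dt * k1, theta)
    k3 = f(x + 0.5 * dt * k2, theta)
    k4 = f(x + dt * k3, theta)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def simulate(x0, theta, dt=0.01, steps=2000):
    """Integrate the system forward and return the full trajectory."""
    traj = np.empty((steps + 1, len(x0)))
    traj[0] = x0
    for k in range(steps):
        traj[k + 1] = step_rk4(traj[k], theta, dt)
    return traj

def community_matrix(x, theta, h=1e-6):
    """A = df/dx, estimated by central finite differences at state x."""
    n = len(x)
    A = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        A[:, j] = (f(x + e, theta) - f(x - e, theta)) / (2 * h)
    return A

theta = (1.0, 0.1, 1.5, 0.075)            # invented demographic parameters
traj = simulate(np.array([10.0, 5.0]), theta)
A = community_matrix(traj[-1], theta)     # interactions at the final state
```

At any state with positive abundances, the off-diagonal signs of A recover the ecological relationship: predation depresses prey (A[0, 1] < 0) and feeds predators (A[1, 0] > 0).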

Below, we will discuss methods that allow us to infer the parameters, Θ, and to choose between different candidate models (for example, f_1, f_2, …, f_K). The statistical toolset that we will discuss, centered primarily around ABC [15,16], complements traditional mathematical approaches that have been used in theoretical population biology to great effect since the 1950s. The aim here, however, rather than to focus on general mathematical laws governing the behavior and fate of natural populations, is to make models as specific as possible to a given problem, to identify the key factors driving an ecosystem's dynamics, or to make predictions about the future of an ecosystem.

There are well-defined statistical frameworks to deal with parameter inference. Model selection, the process of comparing the ability of different models to explain some data, continues to attract the attention of statisticians and domain experts across scientific disciplines [14]. But for many challenging real-world problems, conventional statistical approaches quickly become computationally too cumbersome. This class of problems includes many stochastic processes, highly structured populations, and cases where different types of data need to be considered jointly. Often, it is still straightforward to establish simulation models (real-world problems tend, in general, to defy purely analytical approaches), even when the corresponding likelihoods are intractable.

Arguably, many of the most contentious problems in population biology (or science in general) fall into this category. A model abstracts from reality what are known or believed to be the essential features of a real natural (or technological or social) system. This fact alone has in the past added to some controversies: as “all models are wrong” [17], it is necessary to identify the best model that captures and allows us to quantitatively and qualitatively understand the dynamics of the real system. Thus, we need statistical tools that allow us to deal with complex systems, many of which are expected to stretch conventional statistics. Below, we discuss a viable alternative that maintains most if not all of the advantages of the Bayesian inferential apparatus but can be extended to problems defying conventional statistics.

Model calibration and parameter estimation

Given a model, f(x; Θ), and some data, D = {d_1, …, d_n}, we need to infer the parameters, Θ, from the data. The likelihood [18] is defined as the probability of obtaining the data D given a parameter value Θ,

L(Θ) = Pr(D|Θ).    (2)

This is the central quantity in likelihood-based inference; crucially, the likelihood contains all the information about the parameter that can be extracted from the data D. In Bayesian inference [19], the likelihood is combined with the prior distribution of Θ, Pr(Θ), which captures what is or can be known about the parameter before the data have been seen; together they give rise to the posterior distribution,

Pr(Θ|D) = Pr(D|Θ) Pr(Θ) / Pr(D).    (3)

Here, Pr(D) denotes the evidence. It is often thought of as a normalization constant but does in fact contain information about the ability of a model to describe the data.

Obtaining the posterior distribution, or a sample from it, is computationally demanding. In general, computing the evidence Pr(D), which is typically a high-dimensional integral, is complicated. Sometimes, the focus, therefore, may shift from consideration of the whole (posterior) distribution to the maximum (mode) of the posterior distribution; this maximum a posteriori estimate is the Bayesian equivalent to the maximum likelihood estimate.

To extract the additional information contained in the full posterior distribution, a wealth of computational statistical approaches has been developed. Markov chain Monte Carlo (MCMC) methods have become the main workhorses of computational Bayesian statistics and allow us to generate samples from the posterior distribution. Recent years have witnessed increased interest in these and related methods, such as population and sequential Monte Carlo techniques, but even the most sophisticated approaches reach their limits when the number of parameters or the complexity of the model increases. The first problem, the so-called curse of dimensionality, is shared by all statistical inference procedures.

The second problem is more interesting. We may ask, for example, whether there are simpler versions, M_a(Θ_a), of the model under consideration, M_0(Θ), that would still allow us to draw meaningful, verifiable (or falsifiable) mechanistic insights from the available data, despite the simplification or coarse-graining (by simplification we typically mean that the dimension of the parameter vector is smaller in the simplified model, that is, |Θ_a| < |Θ|). In principle, this might appear to exacerbate the statistical problem, for we would have to find computationally affordable and sufficiently discriminatory ways of deciding if and when a simpler model M_a(Θ_a) is a good approximation to the original model, M_0(Θ). We will return to this point below. First, however, we discuss ABC methods, which form an alternative approach to tackling statistically challenging problems in a Bayesian framework and have become a popular alternative to conventional (or exact) Bayesian inference in many applications, especially in evolutionary, population, and systems biology.

Approximate Bayesian computation

In ABC, we stay as close as possible [16] to the model of interest but forgo evaluation of the likelihood in favor of a comparison between simulated and real data [15,20,21]. For many systems, the likelihood becomes computationally intractable, either because of the complexity of the model or because of the detailed nature of the data; nevertheless, the underlying model can still be simulated. The principal insight of ABC is that we can consider

Pr(D|Θ) = lim_{ε→0} Pr(Δ[D, D_Θ] ≤ ε | Θ),    (4)

where D_Θ is data obtained by simulating from our model with parameter Θ, Δ[·,·] is a distance function that can be chosen flexibly to suit the problem at hand, and ε is a tolerance threshold that reflects the desired accuracy of our inference. The essential problem is that, for any complicated problem, it is practically impossible to reproduce the precise dataset, D, by simulating from the model, even if we know the true parameter (we ignore the artificial case of deterministic dynamics with no observational noise). By increasing the threshold ε, our inference becomes more approximate, but the chance of obtaining a simulated dataset for which Δ[D, D_Θ] < ε increases.

The comparison of real and simulated data is particularly straightforward for ecological time-series data, for example [22]. Here, D might take the form of vectors of population abundances, x_t, for n species collected at t = 1, …, m time-points. In this case, the Euclidean (or any other vector) norm provides a suitable distance. The analysis of dynamical systems is thus relatively simple in an ABC framework. Generalization to compartmental or spatio-temporal models (or both) is equally possible [23]: if we can simulate data efficiently, we can appeal to the Bayesian inference formalism via ABC (keeping in mind the nature of the approximation and the tolerance threshold ε).
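For time-series data, the distance Δ can simply be the Euclidean norm between the stacked observed and simulated abundance matrices. A minimal sketch, in which the abundance values and the tolerance are invented for illustration:

```python
import numpy as np

def distance(obs, sim):
    """Euclidean norm Delta[D, D_Theta] between two
    (m time-points x n species) abundance arrays."""
    return np.sqrt(np.sum((obs - sim) ** 2))

# Invented abundances for m = 3 time-points and n = 2 species.
obs = np.array([[10.0, 5.0], [12.0, 6.0], [14.0, 8.0]])
sim = obs + 0.5                  # a simulated dataset off by 0.5 everywhere

d = distance(obs, sim)           # sqrt(6 * 0.25) = sqrt(1.5)
accept = d < 2.0                 # compare against a tolerance epsilon
```

Any other vector norm (L1, weighted, per-species normalized) can be dropped in without changing the surrounding ABC machinery.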

Instead of comparing the data directly, we can compare aspects of the data, such as summary statistics. This has been one of the main advantages of, as well as one of the sources of contention for, ABC inference. We call a statistic, s, of the data sufficient if and only if

Pr(D|Θ,s)=Pr(D|s).

In this case, we can replace the data by the sufficient statistic without any loss of information about the parameter, Θ. The attraction of using sufficient summary statistics lies in the fact that their dimension, d_S, is typically much smaller than the dimension of the data itself, d_D (in the above example of n species sampled at m time-points, d_D = m × n); that is, d_S ≪ d_D. Especially in population genetics, which has inspired the rise of ABC methods since the late 1990s, the use of summary statistics has been popular (see, for example, [24-28]). With the use of summary statistics, the likelihood can be written as
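The dimension reduction d_S ≪ d_D can be illustrated with a toy statistic map: per-species mean, variance, and lag-1 autocorrelation. These particular statistics are an illustrative choice of mine and are not claimed to be sufficient for any given model.

```python
import numpy as np

def summaries(x):
    """Compress an (m x n) abundance time-series into a few per-species
    statistics: mean, variance, and lag-1 autocorrelation.
    Illustrative choices only; sufficiency is not guaranteed."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    xc = x - mean
    ac1 = (xc[:-1] * xc[1:]).sum(axis=0) / (xc ** 2).sum(axis=0)
    return np.concatenate([mean, var, ac1])

rng = np.random.default_rng(0)
data = rng.random((100, 3))   # d_D = 100 x 3 = 300 numbers
s = summaries(data)           # d_S = 9, so d_S << d_D
```

In an ABC run, the distance would then be computed between summaries(obs) and summaries(sim) rather than between the full datasets.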

Pr(D|Θ) = lim_{ε→0} Pr(Δ[s(D), s(D_Θ)] ≤ ε | Θ),    (5)

with potentially an appropriate change in the distance function and ε.

Although Equation 5 works very well for parameter inference if s is sufficient, it is important to note that sufficient statistics are few and far between for any real-world problem. Unfortunately, ABC requires appropriate sufficient statistics (or comparisons of the data directly, as in the case of time-series problems). There have been attempts to generate collections of statistics that together fulfil sufficiency properties [28-33], but these are computationally expensive in their own right.

So far, we have implicitly considered ABC in a simple rejection framework: (a) we sample a parameter from a suitable prior, (b) we simulate the model for that parameter, and (c) we compare the simulated and the real data (or their respective summary statistics) and accept the parameter as a draw from the ABC posterior if the distance is below some threshold. Steps (a) to (c) are repeated until a sufficiently large number of parameter values have been accepted. The posterior in this case is represented as a sum over indicator functions, Pr_ABC(Θ|D) ∝ Σ_{i=1}^{N} 1(Θ_i), where 1(Θ_i) = 1 if Δ[D, D_{Θ_i}] ≤ ε (or Δ[s(D), s(D_{Θ_i})] ≤ ε when summary statistics are used in the inference process) and 0 otherwise.
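Steps (a) to (c) can be sketched directly. The model here is a toy of my own choosing (Poisson counts with unknown rate, a uniform prior, and the sample mean as summary statistic), not anything specific to the ecological models above.

```python
import numpy as np

rng = np.random.default_rng(1)

def model(theta, n=50):
    """Toy stochastic model: n Poisson-distributed counts with rate theta
    (a stand-in for, say, repeated abundance counts)."""
    return rng.poisson(theta, size=n)

obs = model(4.0)   # pretend these are the observed data (true theta = 4)

def rejection_abc(obs, n_accept=200, eps=0.5):
    """(a) draw theta from a U(0, 10) prior, (b) simulate,
    (c) accept if the distance between summary statistics is below eps."""
    accepted = []
    while len(accepted) < n_accept:
        theta = rng.uniform(0.0, 10.0)          # (a) sample from the prior
        sim = model(theta)                      # (b) simulate
        if abs(sim.mean() - obs.mean()) < eps:  # (c) Delta[s(D), s(D_theta)]
            accepted.append(theta)
    return np.array(accepted)

posterior_sample = rejection_abc(obs)
```

The accepted values cluster around the true rate; shrinking eps sharpens the approximation at the cost of many more rejected simulations, which is exactly the inefficiency that motivates the sequential schemes discussed next.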

This framework is as simple as it is impractical: like all rejection samplers, it is limited to small problems involving fewer than a handful of parameters. It has been possible to construct ABC-MCMC samplers [21], but the real workhorses of most ABC approaches to real-world problems are based on sequential Monte Carlo (SMC) approaches [22,34]. ABC-SMC has become a very popular field of research (arguably inspiring more detailed analysis also of exact SMC samplers), and recent developments are allowing us to tackle larger and more complicated systems [35]. The most widely used flavor of ABC-SMC proceeds by constructing a set of intermediate distributions that start from the prior and increasingly resemble the posterior. To do so, a sequence of decreasing thresholds, ε_1 > ε_2 > … > ε_K (with ε_K = ε), is defined, and the sequence of distributions is constructed by sampling parameter vectors from the previous distribution (or the prior in the first step), perturbing them by using some perturbation kernel, and accepting those parameter vectors for which the distance between real and simulated data falls below the threshold ε_k. The choice of thresholds and the nature of the perturbation kernels determine the computational efficiency and runtime of the inference, but both can be tuned to speed up the process and tackle larger problems [36,37].
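A condensed sketch of this threshold-schedule scheme, under assumptions of my own: the same toy Poisson model and U(0, 10) prior as above, a Gaussian perturbation kernel with fixed width, a hand-picked schedule of three thresholds, and the standard importance-weight correction (prior density divided by the kernel-weighted mixture over the previous population).

```python
import numpy as np

rng = np.random.default_rng(2)

def model(theta, n=50):
    return rng.poisson(theta, size=n)       # toy model, true theta = 4

s_obs = model(4.0).mean()                   # observed summary statistic

def abc_smc(eps_schedule=(2.0, 1.0, 0.5), n_part=200, sigma=0.5):
    """Populations against decreasing thresholds eps_1 > ... > eps_K,
    with a U(0, 10) prior and a Gaussian perturbation kernel."""
    # First population: plain rejection against the loosest threshold.
    parts = []
    while len(parts) < n_part:
        theta = rng.uniform(0.0, 10.0)
        if abs(model(theta).mean() - s_obs) < eps_schedule[0]:
            parts.append(theta)
    parts = np.array(parts)
    weights = np.full(n_part, 1.0 / n_part)

    for eps in eps_schedule[1:]:
        new_parts, new_w = [], []
        while len(new_parts) < n_part:
            # Resample from the previous population, then perturb.
            theta = rng.choice(parts, p=weights) + sigma * rng.normal()
            if not 0.0 < theta < 10.0:      # zero prior density: reject
                continue
            if abs(model(theta).mean() - s_obs) < eps:
                # Kernel density (up to a constant that cancels on
                # normalization) of the proposal under the old population.
                kern = np.exp(-0.5 * ((theta - parts) / sigma) ** 2)
                new_parts.append(theta)
                new_w.append(0.1 / np.sum(weights * kern))  # prior / mixture
        parts = np.array(new_parts)
        weights = np.array(new_w) / np.sum(new_w)
    return parts, weights

parts, weights = abc_smc()
post_mean = float(np.sum(weights * parts))
```

In practice the thresholds and kernel width would be adapted from the populations themselves (for example, via distance quantiles), as in the tuning schemes cited above [36,37].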

Model selection and checking

So far, we have assumed that a single model describes our system of interest. The Bayesian framework readily provides us with credible intervals for parameters, but it is also possible to assign probabilities for different models to be the correct model, conditional on the available data and the set of competing models, M_i ∈ M = {M_1, …, M_K}. In the likelihood framework, the comparison of general (that is, non-nested) models is possible only through the use of information criteria; in the Bayesian framework, the posterior probability of a model is given, analogously to Equation 3, by [14]

Pr(M_i|D) = Pr(D|M_i) Pr(M_i) / Pr(D).    (6)

Here, Pr(D|M_i) is known as the marginal likelihood of model M_i. In principle, Bayesian model selection allows us to compare any number of arbitrary models. An additional advantage is that selection via the marginal likelihood automatically strikes a balance between the ability of a model to reproduce or explain the observed data, the complexity of the model, and the robustness of the inference.

Equation 6 can be interpreted in the ABC framework [22,38,39], and ABC model selection has been an area of great interest and activity [40-45]. Although model selection is indeed straightforward if experimental and simulated data are compared directly, it has been shown that it becomes unreliable when summary statistics are compared instead of the data [46,47]: summary statistics are sufficient for model selection for only a very restricted set of problems. Constructing sets of statistics that are sufficient for model selection (they must be sufficient for every model considered and across the models; this is an area of active research [48,49]) is possible in principle but computationally enormously demanding.
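When data (or data-like features) can be compared directly, model selection can be folded into the rejection sampler by treating the model index as an extra discrete parameter: draw (M, Θ) jointly, accept when the simulation is close to the observations, and read off model posterior probabilities from the accepted frequencies. The two candidate models, the priors, and the mean-and-dispersion distance below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

obs = rng.poisson(4.0, size=50)   # observed counts, generated by "model 0"

def simulate(m, theta, n=50):
    """Two hypothetical candidates: M1 Poisson(theta),
    M2 geometric (shifted to start at 0) with mean theta."""
    if m == 0:
        return rng.poisson(theta, size=n)
    return rng.geometric(1.0 / (1.0 + theta), size=n) - 1

def abc_model_selection(obs, n_accept=300, eps=0.5):
    """Draw (M, Theta) jointly; accepted model frequencies
    approximate Pr(M_i | D)."""
    counts = np.zeros(2)
    while counts.sum() < n_accept:
        m = rng.integers(2)                     # uniform model prior
        theta = rng.uniform(0.0, 10.0)          # uniform parameter prior
        sim = simulate(m, theta)
        # Compare mean and spread: the candidates differ in dispersion.
        d = np.hypot(sim.mean() - obs.mean(),
                     np.sqrt(sim.var()) - np.sqrt(obs.var()))
        if d < eps:
            counts[m] += 1
    return counts / counts.sum()

post = abc_model_selection(obs)   # post[0] should dominate
```

Because the geometric model cannot match both the mean and the dispersion of Poisson data, virtually all accepted draws carry model index 0, mirroring the selection-by-simulation logic of Equation 6.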

In many ecological problems, however, we deal with spatio-temporal time-series data, for which model selection is possible. Our aim in such cases is typically to identify the most promising mechanistic descriptions of a complex system. If no single model emerges from such a comparison, then we need to investigate those models that have comparably high marginal likelihoods. Simulations from the respective model posteriors can then be used, for example, to develop more discriminatory experimental designs that allow us to further distinguish among these models [50]. This, too, is an area of continuing importance for ABC.

Applicability of approximate Bayesian computation: an outlook

ABC methods were born out of a need to tackle problems that defy conventional statistical methodologies. It has become clear, however, that whenever suitable Bayesian alternatives that deal with the proper likelihood are available, ABC is computationally more expensive. The reason is primarily that the representation of the posterior (as a weighted sum over Dirac δ functions) is not very efficient. So when alternatives are available, they ought to be used. In parallel to their role in computationally demanding applications, ABC techniques have, more recently, also attracted attention as an inferential framework in their own right [16,51]. From this, interesting new approaches to deal with real-world problems may well emerge [52].

In conclusion, ABC-based methods are best suited to those problems for which other likelihood-based (or exact) Bayesian inference procedures do not yet exist. This still includes a host of challenging and interesting problems: many stochastic and highly structured spatio-temporal problems in ecology, epidemiology, and evolutionary genetics clearly fall into this category. The recent developments discussed above mean that ABC has become a viable new way of tackling computationally demanding parameter inference problems. Given a model, as long as we can simulate it, ABC gives us a handle on approximate posterior distributions, which can then be interrogated further. Sensitivity and robustness analyses, as well as predictions of future behavior or of the likely effects of interventions or perturbations, can be carried out by simulating the model with parameters sampled from the posterior. There is enormous scope for basing the exploration of, for example, policy or conservation measures on the available data in this way. ABC has, for example, been used in experimental design [50,53] and in synthetic biology [14,54] to generate designs of molecular pathways that exhibit certain types of behavior. In such cases, we replace the observed data, D, by a representation of the desired behavior (such as the desired abundance of a species), and the inference procedure is used to identify the scenario under which we are most likely to observe this outcome. Such predictions then reflect the best available evidence in light of the data and the model.

As an aside, it is worth keeping in mind that the technical challenges of statistical inference and modeling can often be minor compared with the difficulties in communicating the results to policy makers or the general public. Many of the most pressing problems in ecology have become highly emotive topics as they nearly always involve a conflict between parties that have very different priorities (see, for example, [45,55]). In many complicated situations, the nuance and cautiousness that accompany how we present such analyses could be taken for wavering or lack of reliability. Here, however, ABC, with its explicit focus on simulation, may even have an advantage, as the underlying rationale is so straightforwardly explained and easy to understand.

Abbreviations

ABC

approximate Bayesian computation

MCMC

Markov chain Monte Carlo

SMC

sequential Monte Carlo

Disclosures

The author declares that he has no disclosures.

The electronic version of this article is the complete one and can be found at: http://f1000.com/prime/reports/b/6/60

References

  • 1. May RM. Stability and complexity in model ecosystems. Princeton: Princeton University Press; 1973.
  • 2. Lotka AJ. Analytical note on certain rhythmic relations in organic systems. Proc Natl Acad Sci USA. 1920;6:410–5. doi: 10.1073/pnas.6.7.410.
  • 3. Volterra V. Fluctuations in the abundance of a species considered mathematically. Nature. 1926;118:558–60. doi: 10.1038/118558a0.
  • 4. Wilson EO, MacArthur RH. The Theory of Island Biogeography. Princeton: Princeton University Press; 1967.
  • 5. Pascual M. Computational ecology: from the complex to the simple and back. PLoS Comput Biol. 2005;1:101–5. doi: 10.1371/journal.pcbi.0010018.
  • 6. Siegmund KD, Marjoram P, Shibata D. Modeling DNA methylation in a population of cancer cells. Stat Appl Genet Mol Biol. 2008;7:Article 18. doi: 10.2202/1544-6115.1374.
  • 7. Powell K. Stem-cell niches: it's the ecology, stupid! Nature. 2005;435:268–70. doi: 10.1038/435268a.
  • 8. MacLean AL, Lo Celso C, Stumpf MPH. Population dynamics of normal and leukaemia stem cells in the haematopoietic stem cell niche show distinct regimes where leukaemia will be controlled. J R Soc Interface. 2013;10:20120968. doi: 10.1098/rsif.2012.0968.
  • 9. May RM, Arinaminpathy N. Systemic risk: the dynamics of model banking systems. J R Soc Interface. 2010;7:823–38. doi: 10.1098/rsif.2009.0359.
  • 10. Haldane AG, May RM. Systemic risk in banking ecosystems. Nature. 2011;469:351–5. doi: 10.1038/nature09659. http://f1000.com/prime/13322956
  • 11. Pimm SL. The complexity and stability of ecosystems. Nature. 1984;307:321–6. doi: 10.1038/307321a0.
  • 12. Jansen VAA, Kokkoris GD. Complexity and stability revisited. Ecology Letters. 2003;6:498–502. doi: 10.1046/j.1461-0248.2003.00464.x.
  • 13. Moles CG, Mendes P, Banga JR. Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Res. 2003;13:2467–74. doi: 10.1101/gr.1262503.
  • 14. Kirk P, Thorne T, Stumpf MPH. Model selection in systems and synthetic biology. Curr Opin Biotechnol. 2013;24:767–74. doi: 10.1016/j.copbio.2013.03.012.
  • 15. Csilléry K, Blum MGB, Gaggiotti OE, François O. Approximate Bayesian Computation (ABC) in practice. Trends Ecol Evol. 2010;25:410–8. doi: 10.1016/j.tree.2010.04.001.
  • 16. Wilkinson RD. Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Stat Appl Genet Mol Biol. 2013;12:129–41. doi: 10.1515/sagmb-2013-0010. http://f1000.com/prime/718414656
  • 17. Box GEP. Sampling and Bayes' inference in scientific modelling and robustness. J R Stat Soc Ser A. 1980;143:383–430. doi: 10.2307/2982063.
  • 18. Cox DR. Principles of Statistical Inference. Cambridge: Cambridge University Press; 2006.
  • 19. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. London: Chapman & Hall; 2003.
  • 20. Beaumont MA, Zhang W, Balding DJ. Approximate Bayesian computation in population genetics. Genetics. 2002;162:2025–35. doi: 10.1093/genetics/162.4.2025.
  • 21. Marjoram P, Molitor J, Plagnol V, Tavaré S. Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci USA. 2003;100:15324–8. doi: 10.1073/pnas.0306899100.
  • 22. Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf MPH. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface. 2009;6:187–202. doi: 10.1098/rsif.2008.0172.
  • 23. Liepe J, Taylor H, Barnes CP, Huvet M, Bugeon L, Thorne T, Lamb JR, Dallman MJ, Stumpf MPH. Calibrating spatio-temporal models of leukocyte dynamics against in vivo live-imaging data using approximate Bayesian computation. Integr Biol (Camb). 2012;4:335–45. doi: 10.1039/c2ib00175f.
  • 24. Chan YL, Anderson CNK, Hadly EA. Bayesian estimation of the timing and severity of a population bottleneck from ancient DNA. PLoS Genet. 2006;2:e59. doi: 10.1371/journal.pgen.0020059.
  • 25. Hickerson MJ, Stahl E, Takebayashi N. msBayes: pipeline for testing comparative phylogeographic histories using hierarchical approximate Bayesian computation. BMC Bioinformatics. 2007;8:268. doi: 10.1186/1471-2105-8-268.
  • 26. Wegmann D, Leuenberger C, Excoffier L. Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics. 2009;182:1207–18. doi: 10.1534/genetics.109.102509.
  • 27. Rosvold J, Røed KH, Hufthammer AK, Andersen R, Stenøien HK. Reconstructing the history of a fragmented and heavily exploited red deer population using ancient and contemporary DNA. BMC Evol Biol. 2012;12:191. doi: 10.1186/1471-2148-12-191.
  • 28. Soubeyrand S, Carpentier F, Guiton F, Klein EK. Approximate Bayesian computation with functional statistics. Stat Appl Genet Mol Biol. 2013;12:17–37. doi: 10.1515/sagmb-2012-0014.
  • 29. Nunes MA, Balding DJ. On optimal selection of summary statistics for approximate Bayesian computation. Stat Appl Genet Mol Biol. 2010;9:Article 34. doi: 10.2202/1544-6115.1576.
  • 30. Jung H, Marjoram P. Choice of summary statistic weights in approximate Bayesian computation. Stat Appl Genet Mol Biol. 2011;10. doi: 10.1111/j.1467-9868.2011.01010.x.
  • 31. Fearnhead P, Prangle D. Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. J R Stat Soc Series B Stat Methodol. 2012;74:419–74. doi: 10.1534/genetics.112.143164.
  • 32. Aeschbacher S, Beaumont MA, Futschik A. A novel approach for choosing summary statistics in approximate Bayesian computation. Genetics. 2012;192:1027–47. doi: 10.1515/sagmb-2012-0050.
  • 33. Nakagome S, Fukumizu K, Mano S. Kernel approximate Bayesian computation in population genetic inferences. Stat Appl Genet Mol Biol. 2013;12:667–78. doi: 10.1073/pnas.0607208104.
  • 34. Sisson SA, Fan Y, Tanaka MM. Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci USA. 2007;104:1760–5. doi: 10.1038/nprot.2014.025.
  • 35. Liepe J, Kirk P, Filippi S, Toni T, Barnes CP, Stumpf MPH. A framework for parameter estimation and model selection from experimental data in systems biology using approximate Bayesian computation. Nat Protoc. 2014;9:439–56. doi: 10.1515/sagmb-2012-0043.
  • 36. Silk D, Filippi S, Stumpf MPH. Optimizing threshold-schedules for sequential approximate Bayesian computation: applications to molecular systems. Stat Appl Genet Mol Biol. 2013;12:603–18. doi: 10.1515/sagmb-2012-0069.
  • 37. Filippi S, Barnes CP, Cornebise J, Stumpf MPH. On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo. Stat Appl Genet Mol Biol. 2013;12:87–107. doi: 10.1093/bioinformatics/btp619.
  • 38. Toni T, Stumpf MPH. Simulation-based model selection for dynamical systems in systems and population biology. Bioinformatics. 2010;26:104–10. doi: 10.1534/genetics.109.109058.
  • 39. Leuenberger C, Wegmann D. Bayesian computation and model selection without likelihoods. Genetics. 2010;184:243–52. doi: 10.1111/j.1365-294X.2010.04783.x.
  • 40. Peter BM, Wegmann D, Excoffier L. Distinguishing between population bottleneck and population subdivision by a Bayesian model choice procedure. Mol Ecol. 2010;19:4648–60. doi: 10.1371/journal.pgen.1001036.
  • 41. Morelli G, Didelot X, Kusecek B, Schwarz S, Bahlawane C, Falush D, Suerbaum S, Achtman M. Microevolution of Helicobacter pylori during prolonged infection of single hosts and within families. PLoS Genet. 2010;6:e1001036. doi: 10.1111/j.1755-0998.2012.03153.x.
  • 42. Estoup A, Lombaert E, Marin J, Guillemaud T, Pudlo P, Robert CP, Cornuet J. Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics. Mol Ecol Resour. 2012;12:846–55. doi: 10.1098/rsif.2012.0220.
  • 43. Thorne T, Stumpf MPH. Graph spectral analysis of protein interaction network evolution. J R Soc Interface. 2012;9:2653–66. doi: 10.1371/journal.pcbi.1002835.
  • 44. Ratmann O, Donker G, Meijer A, Fraser C, Koelle K. Phylodynamic inference and model assessment with approximate Bayesian computation: influenza as a case study. PLoS Comput Biol. 2012;8:e1002835. doi: 10.1111/mec.12264.
  • 45. Mardulyn P, Goffredo M, Conte A, Hendrickx G, Meiswinkel R, Balenghien T, Sghaier S, Lohr Y, Gilbert M. Climate change and the spread of vector-borne diseases: using approximate Bayesian computation to compare invasion scenarios for the bluetongue virus vector Culicoides imicola in Italy. Mol Ecol. 2013;22:2456–66. doi: 10.1214/11-BA602.
  • 46. Didelot X, Everitt RG, Johansen AM, Lawson DJ. Likelihood-free estimation of model evidence. Bayesian Analysis. 2011;6:49–76. doi: 10.1073/pnas.1102900108.
  • 47. Robert CP, Cornuet J, Marin J, Pillai NS. Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA. 2011;108:15112–7. doi: 10.1007/s11222-012-9335-7.
  • 48. Barnes CP, Filippi S, Stumpf MPH, Thorne T. Considerate approaches to constructing summary statistics for ABC model selection. Statistics and Computing. 2012;22:1181–97. doi: 10.1515/sagmb-2013-0012.
  • 49. Prangle D, Fearnhead P, Cox MP, Biggs PJ, French NP. Semi-automatic selection of summary statistics for ABC model choice. Stat Appl Genet Mol Biol. 2014;13:67–82. doi: 10.1111/biom.12081.
  • 50. Drovandi CC, Pettitt AN. Bayesian experimental design for models with intractable likelihoods. Biometrics. 2013;69:937–48. doi: 10.1073/pnas.0807882106.
  • 51. Ratmann O, Andrieu C, Wiuf C, Richardson S. Model criticism based on likelihood-free inference, with an application to protein network evolution. Proc Natl Acad Sci USA. 2009;106:10576–81. doi: 10.1073/pnas.1208827110.
  • 52. Mengersen KL, Pudlo P, Robert CP. Bayesian computation via empirical likelihood. Proc Natl Acad Sci USA. 2013;110:1321–6. doi: 10.1371/journal.pcbi.1002888. http://f1000.com/prime/718441389
  • 53. Liepe J, Filippi S, Komorowski M, Stumpf MPH. Maximizing the information content of experiments in systems biology. PLoS Comput Biol. 2013;9:e1002888. doi: 10.1073/pnas.1017972108.
  • 54. Barnes CP, Silk D, Sheng X, Stumpf MPH. Bayesian design of synthetic biological systems. Proc Natl Acad Sci USA. 2011;108:15190–5. doi: 10.1021/es1030432.
  • 55. Rieckermann J, Anta J, Scheidegger A, Ort C. Assessing wastewater micropollutant loads with approximate Bayesian computations. Environ Sci Technol. 2011;45:4399–406. doi: 10.1021/es1030432.
