Abstract
We describe a strategy for Markov chain Monte Carlo analysis of non-linear, non-Gaussian state-space models involving batch analysis for inference on dynamic, latent state variables and fixed model parameters. The key innovation is a Metropolis-Hastings method for the time series of state variables based on sequential approximation of filtering and smoothing densities using normal mixtures. These mixtures are propagated through the non-linearities using an accurate, local mixture approximation method, and we use a regenerating procedure to deal with potential degeneracy of mixture components. This provides accurate, direct approximations to sequential filtering and retrospective smoothing distributions, and hence a useful construction of global Metropolis proposal distributions for simulation of posteriors for the set of states. This analysis is embedded within a Gibbs sampler to include uncertain fixed parameters. We give an example motivated by an application in systems biology. Supplemental materials provide an example based on a stochastic volatility model as well as MATLAB code.
Keywords: Bayesian computation; forward filtering, backward sampling; non-linear state-space models; regenerating mixture procedure; smoothing in state-space models; systems biology
1 Introduction
Motivated by problems of fitting, and assessing the fit of, structured non-linear dynamic models to time series data arising in studies of cellular networks in systems biology, we revisit the problem of Bayesian inference on latent, time-evolving states and fixed model parameters in a state-space model context. In recent years, this general area has seen the development of a number of customized Monte Carlo methods, but existing approaches do not yet provide the kind of comprehensive, robust, generally and automatically applicable computational methods needed for repeated batch analysis in routine application. These criteria for effective statistical computation are central to our primary motivating context of dynamic cellular networks, an applied field that is beginning to grow rapidly as relevant high-resolution time series data on genetic circuitry become increasingly available in single-cell and related studies (Rosenfeld et al., 2005; Golightly and Wilkinson, 2005; Wilkinson, 2006; Wang et al., 2009) and synthetic bioengineering (Tan et al., 2007). With this motivation, we have built on the best available analytic and Monte Carlo tools for non-linear dynamic models to generate a novel adaptive mixture modelling method that is embedded in an overall MCMC strategy for posterior computation; the resulting methodology satisfies these criteria while also being computationally efficient, and has generated highly effective MCMC analyses, from the viewpoint of convergence, in a range of example models we have studied.
Given a specified model and a series of observations over a fixed time interval, we are interested in summaries of the posterior distribution for the full series of corresponding state vectors as well as fixed model parameters; this is the batch analysis. Sequential filtering and retrospective smoothing using Monte Carlo is central to our general, non-linear forward-filtering, backward sampling (FFBS) approach that extends the profoundly useful FFBS methodology of conditionally linear, normal models introduced in Carter and Kohn (1994) and Frühwirth-Schnatter (1994); see also West and Harrison (1997), chapter 15. Filtering is the canonical setting for sequential particulate methods (West, 1992; Gordon et al., 1993; West, 1993b; Chen and Liu, 2000; Doucet et al., 2001; Liu and West, 2001); retrospective smoothing analysis has been explored in such contexts, in terms of both marginal (Kitagawa, 1996; Hürzeler and Künsch, 1998; Doucet et al., 2000) and joint (Godsill et al., 2004) smoothing approaches. We do not, however, use particle filtering methods; though useful and interesting, the well-known shortcomings of these methods, including the key issue of particle attrition, are currently simply limiting from the viewpoint we have of robust, stable, and automatically applicable methods for a range of non-linear models. Previous approaches using MCMC in non-linear non-Gaussian state-space models include Carlin et al. (1992), Geweke and Tanizaki (1999), and Stroud et al. (2003). In particular, Stroud et al. (2003) suggest sampling latent states in blocks from an auxiliary mixture model for use as a Metropolis proposal. The main difference of our method is that our MCMC scheme does not condition on mixture component indicators. Closer to our perspective is Ravines et al. 
(2007) who, in the class of dynamic generalized linear models (West and Harrison, 1997, chapter 14), develop a global Metropolis-Hastings analysis in which proposal distributions for state vectors are generated from analytic approximations to filtering and smoothing distributions that are known to be accurate, and hence can be expected to lead to reasonable acceptance rates. We develop this perspective using adaptive mixture approximations to filtering distributions that apply widely. This leads to accurate, direct analytic approximations to the smoothed distributions for the full set of states in a batch analysis, and hence to effective Monte Carlo that uses these approximations as proposal distributions. This sampling strategy for latent states is embedded in an overall MCMC that couples in samplers for fixed model parameters to define a complete analysis.
Section 2 introduces the state-space model context, focusing on non-linear models with additive Gaussian noise. Section 3 reviews mixture model approximations in state-space models, and develops a regenerating procedure to improve the utility of Gaussian sum mixtures in adaptive mixture modelling for analytic approximation to sequential filtering and smoothing. Section 4 then embeds the mixture analysis in an overall MCMC as the novel Metropolis proposal method for latent states. Section 5 illustrates the analyses with non-linear models relevant in systems biological studies of dynamic cellular networks. Section 5.2 provides a full Bayesian analysis of a model with fixed parameters and a two-dimensional state vector in a systems biology example. Comments on, and comparisons with, prior methods are included throughout. Section 6 provides summary comments.
2 State-Space Model and FFBS Analysis
Begin with the Markovian state-space model (West and Harrison, 1997)
yt = ft(xt) + νt,    xt = gt(xt−1) + ωt,    (1)
where xt is the unobserved state of the system at time t, yt is the observation at time t, ft(·) and gt(·) are known, non-linear observation and evolution functions, and νt ~ N(0, Vt) and ωt ~ N(0, Wt) are independent and mutually independent Gaussian observation and evolution noise terms, respectively. Initially, we assume that any model parameters in the non-linear functions or noise variances are known. The development can be extended well beyond additive, normal noise terms, but for specificity we focus on that structure here.
We use s:t to denote the consecutive times s, s + 1, …, t for any s and t > s, so that xs:t = {xs, …, xt} and so forth. Based on the batch of data y1:T, the main goal is simulation of the set of states x0:T from the implied posterior
p(x0:T|y1:T) ∝ p(x0) ∏t=1:T p(yt|xt) p(xt|xt−1),    (2)
where p(x0) is the density of the initial state and p(yt|xt) and p(xt|xt−1) are defined by equation (1). This is done using the FFBS strategy:
FF: For each t = 1:T in sequence, sequentially process the datum yt to update numerical summaries of the filtering densities p(xt|y1:t) at time t.
BS: Simulate the joint distribution in equation (2) via the implied backward compositional form
p(x0:T|y1:T) = p(xT|y1:T) ∏t=1:T p(xt−1|xt, y1:(t−1)).    (3)
That is:
a. draw xT ~ p(xT|y1:T) and set t = T;
b. draw xt−1 ~ p(xt−1|xt, y1:(t−1));
c. reduce t to t − 1 and return to step b; stop when t = 0.
This generates the full joint sample x0:T in reverse order. All steps of FFBS depend fundamentally on the structure of the joint densities
p(yt, xt, xt−1|y1:(t−1)) = p(yt|xt) p(xt|xt−1) p(xt−1|y1:(t−1)).    (4)
In particular, filtering relies on the ability to compute and summarize
p(xt|y1:t) ∝ ∫ p(yt|xt) p(xt|xt−1) p(xt−1|y1:(t−1)) dxt−1,    (5)
while backward sampling relies on the ability to simulate from
p(xt−1|xt, y1:(t−1)) ∝ p(xt|xt−1) p(xt−1|y1:(t−1)),    (6)
derived from the bivariate margin of equation (4).
the bivariate margin of equation (4). In linear, Gaussian models, these distributions are all Gaussian; in non-linear models, the implied computations require some form of approximation.
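To fix notation in the tractable special case, the following minimal Python sketch (our own illustration; all function and variable names are ours, not drawn from the supplemental MATLAB code) implements exact FFBS for a univariate linear Gaussian model with identity observation function, yt = xt + νt and xt = φxt−1 + ωt:

```python
import math
import random

def ffbs_linear_gaussian(y, phi, V, W, m0, C0, rng=None):
    """FFBS for the linear Gaussian model y_t = x_t + nu_t,
    x_t = phi * x_{t-1} + omega_t: a special case of equation (1)."""
    rng = rng if rng is not None else random.Random()
    T = len(y)
    m, C = [m0], [C0]
    a, R = [0.0] * (T + 1), [0.0] * (T + 1)
    # FF: Kalman recursions give p(x_t | y_{1:t}) = N(m_t, C_t) exactly.
    for t in range(1, T + 1):
        a[t] = phi * m[t - 1]            # prior mean at time t
        R[t] = phi * phi * C[t - 1] + W  # prior variance at time t
        Q = R[t] + V                     # one-step forecast variance
        A = R[t] / Q                     # adaptive (Kalman gain) coefficient
        m.append(a[t] + A * (y[t - 1] - a[t]))
        C.append(R[t] - A * A * Q)
    # BS: draw x_T, then x_{t-1} | x_t backwards, as in equation (3).
    x = [0.0] * (T + 1)
    x[T] = rng.gauss(m[T], math.sqrt(C[T]))
    for t in range(T, 0, -1):
        B = C[t - 1] * phi / R[t]
        h = m[t - 1] + B * (x[t] - a[t])
        H = C[t - 1] - B * B * R[t]
        x[t - 1] = rng.gauss(h, math.sqrt(H))
    return x
```

The non-linear analysis of the following sections replaces the exact Gaussian recursions above with mixture approximations, but retains exactly this forward-then-backward structure.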
3 Normal Mixture Model Approximations
3.1 Background and Notation
Our strategy is based on approximation of the sequentially updated distributions of states via mixtures of many, very precise normal components. Mixtures have been used broadly in dynamic modelling, for both model specification and computational methods, especially in adaptive multi-process models and to represent model uncertainty in terms of multiple models analyzed in parallel (chapter 12 and references in West and Harrison, 1997; Fearnhead and Meligkotsidou, 2007). The basic idea of normal mixture approximation in non-linear state-space models in fact goes back several decades to at least Harrison and Stevens (1971) in statistics and Sorenson and Alspach (1971) in engineering, the latter using the term Gaussian sum for direct analytic approximations to non-linear models; see also Alspach and Sorenson (1972) and Harrison and Stevens (1976). Our method here is a direct extension of the original Gaussian sum approximation idea now embedded in the Markov chain Monte Carlo framework. The approach builds on the concept of using mixtures of many precise normal components to approximate sequences of posterior distributions for sets of states as the conditioning data is updated; in essence, this revisits and revises earlier adaptive importance sampling approaches (West, 1992, 1993a,b) to be based on far more efficient – computationally and statistically – Metropolis accept/reject methods.
By way of notation, we denote a normal mixture distribution for a random variate z by
z ~ Nm(p1:J, m1:J, C1:J),  i.e.,  p(z) = Σj=1:J pj N(z|mj, Cj),
where N(μ, σ2) indicates a Gaussian distribution with mean μ and standard deviation σ.
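For concreteness, this notation corresponds to the following short Python helpers (our own sketch; names are ours):

```python
import math
import random

def mixture_pdf(z, p, m, C):
    """Density of z ~ Nm(p_{1:J}, m_{1:J}, C_{1:J}): sum_j p_j N(z | m_j, C_j)."""
    return sum(pj * math.exp(-0.5 * (z - mj) ** 2 / Cj) / math.sqrt(2 * math.pi * Cj)
               for pj, mj, Cj in zip(p, m, C))

def mixture_draw(p, m, C, rng=None):
    """Draw from the mixture: pick component j with probability p_j, then N(m_j, C_j)."""
    rng = rng if rng is not None else random.Random()
    j = rng.choices(range(len(p)), weights=p)[0]
    return rng.gauss(m[j], math.sqrt(C[j]))
```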
3.2 Mixtures in State-Space Models
Suppose at time t − 1 the density p(xt−1|y1:(t−1)) is – either exactly or approximately – given by
p(xt−1|y1:(t−1)) = Σj=1:J pt−1,j N(xt−1|mt−1,j, Ct−1,j),  i.e.,  (xt−1|y1:(t−1)) ~ Nm(pt−1,1:J, mt−1,1:J, Ct−1,1:J).
Then the key trivariate density of equation (4) is
p(yt, xt, xt−1|y1:(t−1)) = Σj=1:J pt−1,j N(yt|ft(xt), Vt) N(xt|gt(xt−1), Wt) N(xt−1|mt−1,j, Ct−1,j).    (7)
Suppose that the component variances Ct−1,1:J are very small relative to the variances Vt, Wt and inversely related to the local gradients of the regression and evolution functions ft(·), gt(·) in equation (1); this generally requires a large value of J. Then variation of component j of the summand in equation (7) is heavily restricted to the implied, small region around xt−1 = mt−1,j and we can accurately approximate gt(·) and ft(·) with local linearizations valid in that small region. The two lead terms in summand j are replaced by the local normal, linear forms N(xt|at,j + gt′(mt−1,j)(xt−1 − mt−1,j), Wt) with at,j = gt(mt−1,j), and N(yt|ft,j + ft′(at,j)(xt − at,j), Vt) with ft,j = ft(at,j). This immediately reduces equation (7) to a mixture of trivariate normals, so that all marginals and conditionals are computable as normal mixtures. In particular, the key distributions for filtering and smoothing are:
The approximation to equation (5) for forward filtering is
p(xt|y1:t) ≈ Σj=1:J pt,j N(xt|mt,j, Ct,j),    (8)
having elements mt,j = at,j + At,j(yt − ft,j) and Ct,j = Rt,j − At,j²Qt,j, where Rt,j = gt′(mt−1,j)²Ct−1,j + Wt, Qt,j = ft′(at,j)²Rt,j + Vt and At,j = Rt,jft′(at,j)/Qt,j. The component probabilities are updated via pt,j ∝ pt−1,jN(yt|ft,j, Qt,j).
The approximation to equation (6) for backward sampling is
p(xt−1|xt, y1:(t−1)) ≈ Σj=1:J qt,j N(xt−1|ht,j, Ht,j),    (9)
having elements ht,j = mt−1,j + Bt,j(xt − at,j) and Ht,j = Ct−1,j − Bt,j²Rt,j, where Bt,j = Ct−1,jgt′(mt−1,j)/Rt,j and Rt,j = gt′(mt−1,j)²Ct−1,j + Wt; the component probabilities are qt,j ∝ pt−1,jN(xt|at,j, Rt,j).
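The component computations for equations (8) and (9) can be sketched directly. The following univariate Python functions are our own illustration (names and argument conventions are ours); `fp` and `gp` denote the user-supplied derivatives of the observation and evolution functions:

```python
import math
import random

def norm_pdf(z, mean, var):
    return math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def filter_step(y, p, m, C, f, g, fp, gp, V, W):
    """One forward-filtering update as in equation (8), univariate case.
    (p, m, C) summarize p(x_{t-1} | y_{1:(t-1)})."""
    pn, mn, Cn = [], [], []
    for pj, mj, Cj in zip(p, m, C):
        a = g(mj)                     # a_{t,j} = g_t(m_{t-1,j})
        R = gp(mj) ** 2 * Cj + W      # R_{t,j}
        fj = f(a)                     # f_{t,j} = f_t(a_{t,j})
        Q = fp(a) ** 2 * R + V        # Q_{t,j}
        A = R * fp(a) / Q             # A_{t,j}
        mn.append(a + A * (y - fj))
        Cn.append(R - A * A * Q)
        pn.append(pj * norm_pdf(y, fj, Q))  # p_{t,j} before normalization
    s = sum(pn)
    return [w / s for w in pn], mn, Cn

def backward_step(xt, p, m, C, g, gp, W, rng=None):
    """One backward-sampling draw from the mixture form of equation (9)."""
    rng = rng if rng is not None else random.Random()
    q, h, H = [], [], []
    for pj, mj, Cj in zip(p, m, C):
        a = g(mj)
        R = gp(mj) ** 2 * Cj + W
        B = Cj * gp(mj) / R                 # B_{t,j}
        q.append(pj * norm_pdf(xt, a, R))   # q_{t,j} before normalization
        h.append(mj + B * (xt - a))         # h_{t,j}
        H.append(Cj - B * B * R)            # H_{t,j}
    j = rng.choices(range(len(q)), weights=q)[0]
    return rng.gauss(h[j], math.sqrt(H[j]))
```

With linear f and g and a single component, `filter_step` reduces to the usual Kalman filter update, which provides a simple sanity check.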
For large J and small enough Ct−1,j, the implied filtering computations will provide good approximations to the true model analysis, and have been used quite widely in applications in engineering and elsewhere for some years. Smoothing computations based on the approximations are direct, but have been less widely used and exploited to date. Our strategy is to embed these mixture computations in an overall MCMC, using them to define a global Metropolis proposal distribution for p(x0:T |y1:T). As a nice by-product, the observed Metropolis acceptance rates also provide an indirect assessment of the adequacy of the Gaussian method as a direct analytic approximation, though our interest is its use in obtaining exact posterior samples.
3.3 Two-State Example
To fix ideas, Figure 1 shows aspects of an example with gt(x) = 0.1x³ + sin(5x), Wt = 0.2 and in which (xt−1|y1:(t−1)) is a J = 50 component mixture with resulting density graphed in the figure. The comparison between the bivariate contours of the exact and mixture approximation of p(xt, xt−1|y1:(t−1)) demonstrates the efficacy of the method in this highly non-linear model. Evidently, the mixture model is an accurate representation of the true non-linear model, though the joint mixture density is in fact very slightly more diffuse – a good attribute from the viewpoint of the goal of generating a useful Metropolis proposal distribution. Approximation accuracy increases with the number of normal mixture components used so long as the variances Ct−1,j fall off appropriately as J increases, as we discuss further below.
Figure 1.
Bivariate and marginal distributions for two states in a model with gt(x) = 0.1x³ + sin(5x) and Wt = 0.2, and where p(xt−1|y1:(t−1)) is a J = 50 component mixture with density graphed on the horizontal axis of each frame. The left pane shows exact bivariate density contours and, on the vertical axis, the implied margin for (xt|y1:(t−1)). The right pane shows the corresponding 50-component mixture approximations. In each frame, the dashed line shows the evolution function gt(·). (A color version of this figure is available in the electronic version of this article.)
3.4 Regenerating Mixtures
The basic mixture approximation strategy can work well when the component means are spread out, the component variances are small, and the component probabilities are approximately equal. The component means being spread out leads to desirable wide-ranging evaluations of the non-linear evolution and observation functions. Very small component variances improve the validity of the local linear approximations to the non-linear functions over increasingly small regions. Balanced component probabilities ensures that each component contributes to the mixture after propagation through the non-linearities; if only a few components dominate, all other components are effectively irrelevant and the overall strategy will collapse.
These properties are explicitly maintained using a novel regenerating procedure shown in Table 1. Suppose we wish to approximate an arbitrary density p(x) using an equally-weighted mixture of Gaussians with means set at the j/(J + 1) quantiles of p(x) for j = 1:J, and with component variances constant and chosen so that the variance of the mixture equals that of p(x). For large J, this satisfies the above desiderata for mixture approximations in our context, and this idea is used to map any given mixture distribution to one with any number of components having these characteristics.
Table 1.
Regenerating procedure to approximate an arbitrary density p(x) with a mixture of Gaussians Nm(p1:J, m1:J, C1:J).
1. Set pj = 1/J for j ∈ {1, …, J}.
2. Set mj equal to the j/(J + 1) quantile of p(x) for j ∈ {1, …, J}; that is, mj solves ∫−∞..mj p(x) dx = j/(J + 1).
3. Set Cj = C for all j, with C chosen so that the variance of Nm(p1:J, m1:J, C1:J) equals V(x), the variance under p(x); that is, C = V(x) − Σj=1:J pj(mj − m̄)², where m̄ = Σj=1:J pjmj.
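The steps of the regenerating procedure can be sketched as follows for a univariate mixture. This Python sketch is our own illustration (names are ours); the quantiles are found by bisection on the mixture cdf, and the common variance is guarded against degeneracy by a small floor:

```python
import math

def mixture_cdf(x, p, m, C):
    return sum(pj * 0.5 * (1.0 + math.erf((x - mj) / math.sqrt(2.0 * Cj)))
               for pj, mj, Cj in zip(p, m, C))

def regenerate(p, m, C, J):
    """Map the mixture Nm(p, m, C) to J equally weighted components with means
    at the j/(J+1) quantiles and a common variance preserving the overall
    mixture variance (steps 1-3 of Table 1)."""
    lo = min(m) - 10.0 * math.sqrt(max(C))
    hi = max(m) + 10.0 * math.sqrt(max(C))
    means = []
    for j in range(1, J + 1):
        u, a, b = j / (J + 1), lo, hi      # step 2: quantile by bisection
        for _ in range(80):
            mid = 0.5 * (a + b)
            if mixture_cdf(mid, p, m, C) < u:
                a = mid
            else:
                b = mid
        means.append(0.5 * (a + b))
    mu = sum(pj * mj for pj, mj in zip(p, m))
    var = sum(pj * (Cj + (mj - mu) ** 2) for pj, mj, Cj in zip(p, m, C))
    mbar = sum(means) / J
    spread = sum((mj - mbar) ** 2 for mj in means) / J
    Cc = max(var - spread, 1e-12)          # step 3: guard against degeneracy
    return [1.0 / J] * J, means, [Cc] * J
```

By construction, the regenerated mixture has equal weights, ordered and well-spread means, and (up to the floor on Cc) the same variance as the input mixture.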
In our model of equation (1), suppose an initial prior p(x0) is defined as a mixture, either directly or by applying the regenerating procedure of Table 1 to an original prior with a large value of J. We proceed through the sequential updating analysis, now at each step using the regenerating procedure when necessary to revise, balance and hence improve the overall adequacy of the approximation at each stage. Depending on the model and data, this regeneration may be needed to approximate the prior p(xt|xt−1, y1:(t−1)) and posterior p(xt|y1:t) at each step. Although equation (7) is not satisfied in these cases, equations (8) and (9) are still relevant, so a proposal can be drawn and the Metropolis acceptance probability computed.
4 Metropolis MCMC
4.1 Adaptive Mixture Model Metropolis for States
The mixture modelling strategy defines a computationally feasible method for evaluating and sampling from a useful approximation to the full joint posterior density of states of equation (2) and, in reverse form, equation (3). Forward filtering computations apply to sequentially update the mixture forms p(xt|y1:t) over t = 1:T using equation (8) and the regenerating procedure. This is followed by backward sampling over t = T, T − 1, …, 0 using the mixture forms of equation (9). Write q(x0:T |y1:T) for the implied joint density of states from this analysis; that is, q(·|y1:T) has the form of the reverse equation (3) in which each p(·| ·) is replaced by the corresponding mixture density.
We treat the analysis via Metropolis-Hastings MCMC. With a current sample of states x0:T, apply the FFBS to generate a new, candidate draw x*0:T from the proposal distribution with density q(x0:T|y1:T). This is assessed via the standard accept/reject test, accepting with probability
α = min{1, w(x*0:T)/w(x0:T)},    (10)
where w(·) = p(·|y1:T)/q(·|y1:T), p(·|y1:T) is the posterior of equation (2) defined by the model of equation (1), and q(·|y1:T) is the product of the mixture densities of equation (9). With no unknown fixed parameters, the value of q at the current state x0:T is known from the previous MCMC iteration; otherwise, the densities needed to evaluate q(x0:T|y1:T) must be recalculated by repeating equation (9) along x0:T.
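The accept/reject test of equation (10) is conveniently computed in log space. The following Python sketch is our own illustration (names are ours); `log_target` evaluates log p(x0:T|y1:T) up to a constant from equation (2), and the corresponding log q would come from the mixture densities used in backward sampling:

```python
import math
import random

def log_target(x, y, f, g, V, W, logp_x0):
    """log p(x_{0:T} | y_{1:T}) up to an additive constant, from equation (2)."""
    lp = logp_x0(x[0])
    for t in range(1, len(x)):
        lp += -0.5 * (y[t - 1] - f(x[t])) ** 2 / V
        lp += -0.5 * (x[t] - g(x[t - 1])) ** 2 / W
    return lp

def mh_accept(logp_cur, logq_cur, logp_prop, logq_prop, rng=None):
    """Accept/reject test of equation (10) in log space, with w = p/q."""
    rng = rng if rng is not None else random.Random()
    log_ratio = (logp_prop - logq_prop) - (logp_cur - logq_cur)
    # tiny offset guards log(0) on the half-open interval [0, 1)
    return math.log(rng.random() + 1e-300) < min(0.0, log_ratio)
```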
This is a global MCMC, applying to the full set of consecutive states, that will generally define an ergodic Markov chain on x0:T based on everywhere-positivity of both p(x0:T|y1:T) and q(x0:T|y1:T). As q is expected to provide a good global approximation, the resulting MCMC can be expected to perform well, and, as mentioned above, the acceptance rates provide some indication of the accuracy of the approximation. Evidently, acceptance rates can generally be expected to decay with increasing time series length T. The experiences of Ravines et al. (2007) in the simpler DGLM context, and of our own group in this and related model contexts, bear out the utility of the method. Some additional comments and numerical comparisons of acceptance rates appear below. The overall procedure for latent state sampling is termed the adaptive mixture modelling Metropolis method (AM4) and is provided in Table 2. In this algorithm, regenerate refers to the procedure in Table 1. Depending on the model and prior, these regeneration steps may be unnecessary.
Table 2.
AM4 algorithm to sample from the full posterior of states given in (2) for the model of equation (1).
4.2 Combined MCMC for States and Fixed Model Parameters
Practical applications involve models with fixed, uncertain parameters as well as the latent states, and a complete analysis embeds the above simulator for states within an overall MCMC that also includes parameters (e.g., see West and Harrison, 1997, section 15.2). With a vector of parameters θ, extend the model notation to
yt = ft(xt|θ) + νt,    xt = gt(xt−1|θ) + ωt,
where, now, the initial prior p(x0|θ) may involve elements of θ as may the variances Vt, Wt (one key case being constant, unknown variances that are then elements of θ).
The overall computational strategy is then to apply the above state sampler at each stage of an overall MCMC conditional on θ, and to couple this with sampling of θ values using the implied distribution p(θ|x0:T, y1:T) at each step of the chain. Since θ is changing at each step in the MCMC, the filtered distributions, i.e. component probabilities, means, and variances in equation (8), are recomputed at each iteration of the MCMC for joint sampling of x0:T. Depending on the model form and priors specified for θ, sampling fixed parameters will typically be performed in terms of a series of Gibbs sampling steps, perhaps with some blocking of subsets of parameters. Under a specified prior p(θ), the complete conditional posterior for any subset of elements θi given the remaining elements θ−i is
p(θi|θ−i, x0:T, y1:T) ∝ p(θ) p(x0|θ) ∏t=1:T p(yt|xt, θ) p(xt|xt−1, θ).    (11)
Sometimes this conditional can be sampled directly; a key example is when V = Vt, W = Wt and θi = (V, W): under independent inverse gamma priors, the above conditional posterior is the product of independent inverse gammas. In other cases, resampling some elements θi will use random-walk Metropolis-Hastings methods involving an accept/reject test, i.e., a standard Metropolis-within-Gibbs series of moves. So long as the prior density p(θ) can be directly and easily evaluated up to a constant, such moves are easy to implement since the terms in equation (11) can be trivially evaluated at any point θ. Our example in Section 5.2 illustrates this overall strategy.
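The direct-sampling case for constant variances can be sketched as follows. This Python sketch is our own illustration (names and prior parameterization, IG(a, b) with density proportional to v^−(a+1)e^−b/v, are ours):

```python
import random

def sample_variances(x, y, f, g, aV, bV, aW, bW, rng=None):
    """Gibbs step for constant variances V = V_t, W = W_t under independent
    inverse gamma priors IG(aV, bV) and IG(aW, bW); the full conditionals
    implied by equation (11) are again independent inverse gammas."""
    rng = rng if rng is not None else random.Random()
    T = len(y)
    ssy = sum((y[t - 1] - f(x[t])) ** 2 for t in range(1, T + 1))
    ssx = sum((x[t] - g(x[t - 1])) ** 2 for t in range(1, T + 1))
    # If G ~ Gamma(shape, scale=1), then b / G ~ IG(shape, b).
    V = (bV + 0.5 * ssy) / rng.gammavariate(aV + 0.5 * T, 1.0)
    W = (bW + 0.5 * ssx) / rng.gammavariate(aW + 0.5 * T, 1.0)
    return V, W
```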
5 Examples
5.1 Illuminating Example
Consider the example model with
yt = xt²/20 + νt,    xt = xt−1/2 + 25xt−1/(1 + xt−1²) + 8cos(1.2t) + ωt,    (12)
where Vt = V = 10, Wt = W = 1 and, initially, x0 ~ N (0, 10). This model was originally introduced by Andrade Netto et al. (1978) and has since been studied by Kitagawa (1987); West (1993b); Gordon et al. (1993); Hürzeler and Künsch (1998); Doucet et al. (2000). Interest in this model has stemmed from the non-linear nature found in both the observation and evolution equations. In addition, the squared term in the observation equation introduces a bimodal likelihood for xt whenever yt > 0. As will be seen, this causes multi-modality in the resulting smoothed distribution for the states.
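As a concrete reference, this benchmark model is easily simulated; the following short Python sketch (our own illustration) generates states and observations:

```python
import math
import random

def simulate_benchmark(T, V=10.0, W=1.0, seed=0):
    """Simulate the model of equation (12) with x_0 ~ N(0, 10)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, math.sqrt(10.0))]
    y = []
    for t in range(1, T + 1):
        mean = x[-1] / 2.0 + 25.0 * x[-1] / (1.0 + x[-1] ** 2) + 8.0 * math.cos(1.2 * t)
        x.append(mean + rng.gauss(0.0, math.sqrt(W)))     # evolution equation
        y.append(x[-1] ** 2 / 20.0 + rng.gauss(0.0, math.sqrt(V)))  # observation
    return x, y
```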
Simulated states and observations with T = 100 appear in Figure 2. Using J = 1, 000 and regenerating p(xt|xt−1, y1:(t−1)) and p(xt|y1:t) at each t, the resulting filtering distributions are shown for selected time points in Figure 3. This figure shows filtering densities that display markedly non-Gaussian behavior, the bimodality being induced by the lack of identification of the sign of xt from the data alone.
Figure 2.
Model of equation (12): Simulated latent states and observations.
Figure 3.
Model of equation (12): Filtering densities for selected time points that display markedly non-Gaussian behavior.
AM4 applied to the 101-dimensional set of states x0:T generated 100,000 samples following 100,000 burn-in steps, achieving acceptance rates of around 20%. Figure 4 shows histograms of the sampled states at the same time points as in Figure 3, again evidencing the high degree of non-Gaussianity.
Figure 4.
Model of equation (12): Histograms of the smoothed samples for the same time points shown in Figure 3.
Figure 5 compares AM4 to the standard MCMC method of Carlin et al. (1992), where univariate states are sampled, and Kitagawa (1987), which evaluates the smoothed density on a grid. For state x58 all three methods achieve equivalent results, but for state x72 the Carlin et al. method never visits the left-most mode of the distribution. Therefore the autocorrelations for Carlin et al. are misleadingly low.
Figure 5.
Model of equation (12): Traceplots, autocorrelation functions, and histograms for two different states comparing AM4 (‘o’), Carlin et al. (1992) (‘+’), and Kitagawa (1987) (‘−’). (A color version of this figure is available in the electronic version of this article.)
More interesting are the bivariate smoothing distributions for (xt, xt−1|y1:T) for each t, obtained simply from marginal samples from the MCMC. Figure 6 shows a scatterplot of the posterior samples for x71 versus x72, while Figure 7 shows a smoothed density estimate for this same data. These two figures display a high degree of non-Gaussian behavior and the reconstructions are simply not obtainable under standard linearization methods or easily, if at all, via other numerical approaches. Standard MCMC methods such as sampling univariate states or a multivariate random walk have difficulty escaping the modes shown in Figure 7. Particle filtering approaches such as Godsill et al. (2004) can produce results with similar multimodal posteriors. But, for models with unknown fixed parameters such as those to follow, these methods quickly suffer from the added dimensionality.
Figure 6.
Model of equation (12): Scatterplot of MCMC samples for x71:72.
Figure 7.
Model of equation (12): Reconstruction of the bivariate density p(x71:72|y1:100).
5.2 Example Motivated by Pathway Studies in Systems Biology
Discrete-time models have been gaining popularity for modelling biochemical pathways. Initial uses were aimed at approximate parameter inference of stochastic differential equations (Golightly and Wilkinson, 2005, 2006b,a; Wilkinson, 2006). More recently discrete-time models have been suggested as alternatives to ordinary and stochastic differential equations (Gadkar et al., 2005a,b; He et al., 2007, 2008). Here we analyze a discrete-time model of a biological system. Derivation of this model can be found in Niemi (2009).
Consider a biological system that has two proteins of interest: an activator and a target. Two molecules of the activator and two molecules of the target can combine to form a tetramer. This tetramer can then enhance the production of the target. This type of system is very common in human cells, and particularly in gene pathways that control cell developmental processes that play key roles in cancer when deregulated (Sears et al., 1997; Nevins, 1998; Bild et al., 2006). Activators themselves often have an oscillating pattern – related to progression through cell cycles. As a synthetic example that mirrors this structure, suppose that we obtain noisy measurements of the activator and target proteins
ya,t = at + νa,t,    yx,t = xt + νx,t,
at = (1 − φ)μit + φat−1 + ωa,t,
xt = ψxt−1 + (k + αat−1²xt−1²)/(β + at−1²xt−1²) + ωx,t,    (13)
where it ∈ {on, off} indexes the known experimental condition at time t, the noise terms are again independent and mutually independent, and p(x0|D0) = N(1.5, 0.5); here at and xt indicate mean fluorescence levels of the activator and target proteins, respectively, at time t. The tetramer binding of these proteins is represented through the a²x² term, since the exponents are determined by the number of molecules of each component. To recreate the oscillating pattern seen in activators, experimental conditions are controlled such that the activator can be modeled as an autoregressive process with two distinct means, μon and μoff, where the state it is known at all time points. Figure 8 provides a pictorial representation of the experimental setup and synthetic gene circuit.
Figure 8.

Model of equation (13): Depiction of a tetramer experiment. A. Experimental setup with switched glass burets containing chemical solutions to control the on-off state of the activator. The desired solution flows into a channeled microscopy slide. Bacteria containing plasmids with a synthetic gene circuit are adhered to the gray area of the channel. B. A synthetic gene circuit containing three chemical species: an activator, a target, and a tetramer, A2X2. Arrows indicate that the activator and target are produced, degraded, and can form tetramers. Tetramers can decay back to the activator and target or can enhance the production of the target.
Typically, evolution equations of this kind are built from separate production and decay functions. Consider first the evolution equation for the target protein: the decay term ψxt−1 indicates that, from one time point to the next, 100(1 − ψ)% of the protein decays on average; decay is linear in x and independent of the activator. The production function, (k + αa²x²)/(β + a²x²), has a logistic form in xt and also varies with the activator. This logistic form implies a minimal level of target protein production of k/β and a maximal level of α. Figure 9 shows examples for high, medium, and low levels of the activator. The figure is easiest to interpret by choosing an x and then checking whether the production line is above or below the decay line; the increase or decrease in x will be, on average, proportional to the difference between these two lines. The autoregressive form of the activator's evolution equation is a simple reparameterization of constitutive production, μi(1 − φ), and decay, (1 − φ), linear in a. This parameterization has the interpretation that μi is the steady-state mean of the activator under the on and off experimental conditions, which can be accurately estimated in steady-state experiments.
Figure 9.
Model of equation (13): Production and decay functions for various levels of the activator. (A color version of this figure is available in the electronic version of this article.)
Of particular interest in these systems is the true level of the target since it may affect genes downstream in the overall biological pathway. In order to provide accurate estimates of the target protein, we need to account for uncertainty present in fixed parameters as well as the activator level. The analysis performed through MCMC is decomposed into Gibbs and Metropolis steps. The steps are all univariate with the exception of the draws for the latent states of the activator a0:T and the target x0:T. With appropriate priors, most fixed parameters are available as Gibbs steps. The full conditional for β is unavailable and hence we use a random walk Metropolis. The joint draw for a0:T is available through the standard FFBS augmented with a Metropolis-Hastings step to account for the target’s evolution equation. The diagonal evolution error structure allows for sampling x0:T through AM4.
Informative priors are used for all fixed parameters either to truncate the parameters to reasonable regions or to provide information on plausible biological knowledge. For example, parameters in the evolution of the target protein are products of chemical kinetic reaction parameters. These parameters are all positive and therefore their products are also positive. Truly informative priors are provided for some parameters including steady-state means of the activator in the on and off states, which could be measured with accuracy in steady-state experiments.
The full MCMC analysis was performed using J = 10 mixture components, with regeneration of both the prior and posterior at each time step, saving 50,000 iterations after 5,000 burn-in iterations. The Metropolis steps achieved acceptance probabilities of 34%, 46%, and 8% for β, x0:T, and a0:T, respectively. Figure 10 provides posterior marginal histograms for fixed model parameters as well as their priors and true values. Figure 11 provides posterior medians and pointwise 95% credible intervals for the activator and target protein. In more realistic scenarios, many of the proteins of interest may have no observations. In these situations, the methodology will still work, but generally more information will need to be provided through the priors for meaningful inferences.
Figure 10.
Model of equation (13): Histograms of marginal posterior estimates (shaded histogram), prior (superimposed curve), and true value (‘x’) for fixed model parameters. (A color version of this figure is available in the electronic version of this article.)
Figure 11.
Model of equation (13): Pointwise median (solid line) and 95% credible interval (dashed line) results for the underlying state (dots) and the observed data (circles). The activator is shown in the top pane and the target protein in the bottom. (A color version of this figure is available in the electronic version of this article.)
6 Further Discussion
The adaptive mixture modelling Metropolis method developed and exemplified here represents an efficient, effective, and relatively easily implemented computational strategy for Bayesian inference in non-linear state-space models. For implementation, the method as presented requires only the availability of first-order derivatives of the evolution and observation equations, as with the extended Kalman filter, coupled with simulation routines. Extensions to non-Gaussian models require the first two derivatives of the log-likelihood. The overall approach represents a nice update on the use of Gaussian mixtures as direct analytic approximations and, in a real sense, a completion of a line of computational development for state-space models that stretches back nearly forty years. Many old ideas are good ideas, and the simple strategy of "Metropolizing" a global, analytic approximation to the full posterior distribution of a set of states adds a modern computational touch to a good, older idea. Critically, however, the mixture regeneration concept and strategy introduced here is simply fundamental to practical utility, as otherwise mixture approximations can and often will degenerate as discussed. The broader utility of the overall approach is clear once the state sampler is embedded in a larger MCMC that couples in fixed parameter sampling, as illustrated in our examples.
We have compared this method to other standard approaches to MCMC analysis of state-space models, including univariate state sampling and multivariate random-walk proposals. In the example of Section 5.1, the univariate sampler of Carlin, Polson, and Stoffer (1992) failed to adequately explore the posterior over the set of states, becoming trapped in local modes, and random-walk proposals behaved similarly. Even after adjusting for computational time, AM4 outperformed these standard methods; it is a widely applicable and effective methodology for MCMC analysis of state-space models.
It is worth noting that there are various other uses of adaptive mixture approximations in time series and state-space models (West and Harrison, 1997). One related line of development is that of Chen and Liu (2000), who use mixtures of Gaussians to approximate the filtering densities for each x_t. They then proceed with a particle-based, sequential Monte Carlo approach: conditioning on state particles to induce a linearized system and adjusting the weights to create the filtered densities for x_{t+1}. That approach adds particle-filtering uncertainties to the problem. It is closely allied to the earlier sequential adaptive importance sampling approach of West (1992, 1993a,b), though it replaces the use of iteratively refined mixture approximations to posteriors for states at each time point with particle filtering. Though based on mixtures, the computational focus and method of Chen and Liu (2000) is clearly very different from ours: we generate an accurate, global approximation to the full set of posteriors over states using deterministic mixture approximations, and use it as a global proposal for a Metropolis step within an MCMC framework. This accomplishes fully Bayesian inference for filtering and smoothing analyses in an over-arching framework that includes fixed parameters as well as dynamic states.
On matters of statistical and computational efficiency, we are encouraged by the high acceptance rates in the examples here and in a range of other studies with similar models. A useful by-product of the Metropolis strategy is that the acceptance rate can, in any specific application, be viewed as a benchmark on the inherent accuracy of the underlying analytic mixture-based posterior approximation. The tradeoffs between statistical and computational efficiency relate to the number of mixture components chosen and the length of the observed time series, among other things. As noted, filtering approximations with mixtures can in theory be made arbitrarily accurate by increasing the number of mixture components, at the cost of cpu time increasing linearly with the number of components. Also, as is true much more widely, Metropolis acceptance rates decrease roughly linearly on a log scale with the time series sample size; while this generic issue can be addressed with longer MCMC run lengths and sub-sampling, the standard response to decreased acceptance rates, simply increasing the number of mixture components in the proposal distribution helps too. Tables 3 and 4 provide some insight into this, with empirical estimates of acceptance rates and cpu times in a simple simulated model context. Extensions and novel strategies are clearly needed to handle much longer time series while maintaining acceptance rates at practically useful levels, and our current research is exploring new directions for this.
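The rough log-linear decay of acceptance rates with series length can be seen in a stylized setting. The Python sketch below is a toy illustration under invented assumptions, not the paper's experiment: an independence sampler whose proposal carries a small per-coordinate mismatch from a product-form target sees its acceptance probability decay roughly geometrically in the dimension, mirroring the effect of time series length.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting (an assumption for illustration): target is a T-dimensional
# standard normal; the independence proposal uses a slightly inflated scale
# per coordinate, mimicking a small per-time-point approximation error
# that compounds across the series.
def acceptance_rate(T, scale=1.2, n_iter=4000):
    def log_ratio(x_new, x_old):
        # log of p(x_new) q(x_old) / ( p(x_old) q(x_new) ) for an
        # N(0, I) target and N(0, scale^2 I) proposal.
        lp = -0.5 * (np.sum(x_new ** 2) - np.sum(x_old ** 2))
        lq = -0.5 * (np.sum(x_old ** 2) - np.sum(x_new ** 2)) / scale ** 2
        return lp + lq

    x, acc = np.zeros(T), 0
    for _ in range(n_iter):
        x_new = scale * rng.standard_normal(T)
        if np.log(rng.uniform()) < log_ratio(x_new, x):
            x, acc = x_new, acc + 1
    return acc / n_iter

rates = {T: acceptance_rate(T) for T in (10, 50, 100)}
print(rates)  # acceptance decays roughly geometrically in T
```

The qualitative pattern matches Table 3: for a fixed quality of per-time-point approximation, longer series yield sharply lower acceptance, motivating either richer proposals (more components) or longer runs.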
Table 3.
Empirical mean (sd) % acceptance rates from 100 simulations of the model defined by f_t(x) = x, g_t(x) = sin(x), V_t = 1, W_t = 1, and x_0 ~ N(0, 10). Columns give the length of the time series.

| Number of components | 10 | 50 | 100 | 500 |
|---|---|---|---|---|
| 1 | 0 (4) | 15 (5) | 5 (2) | 0 (1) |
| 5 | 62 (8) | 42 (5) | 27 (6) | 2 (1) |
| 10 | 66 (7) | 44 (5) | 28 (5) | 2 (2) |
| 50 | 74 (6) | 46 (10) | 28 (11) | 1 (2) |
| 100 | 79 (12) | 54 (13) | 34 (14) | 2 (3) |
Table 4.
Empirical mean (sd) computation time in minutes for 10,000 iterations using the model in Table 3. Columns give the length of the time series. Analysis was performed using MATLAB R2007b on a 3.4 GHz Intel Pentium 4 processor.

| Number of components | 10 | 50 | 100 | 500 |
|---|---|---|---|---|
| 1 | 0.4 (.0) | 1.7 (.2) | 3.3 (.3) | 15.9 (1.6) |
| 5 | 0.6 (.1) | 2.5 (.3) | 4.9 (.5) | 24.2 (2.5) |
| 10 | 0.6 (.1) | 2.7 (.3) | 5.2 (.6) | 25.7 (2.7) |
| 50 | 0.8 (.1) | 3.7 (.5) | 7.2 (.9) | 36.2 (4.5) |
| 100 | 1.1 (.2) | 5.2 (.8) | 10.2 (1.5) | 51.1 (7.1) |
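For reference, the benchmark model underlying Tables 3 and 4 can be simulated directly. The sketch below is in Python rather than the paper's MATLAB, and assumes the West and Harrison convention that f_t is the observation function and g_t the evolution function; under the opposite convention, the roles of sin and the identity simply swap.

```python
import numpy as np

rng = np.random.default_rng(42)

# Benchmark model of Table 3 (assuming f_t = observation, g_t = evolution):
#   evolution:    x_t = g_t(x_{t-1}) + w_t = sin(x_{t-1}) + w_t,  w_t ~ N(0, W_t = 1)
#   observation:  y_t = f_t(x_t) + v_t    = x_t + v_t,            v_t ~ N(0, V_t = 1)
#   initial state x_0 ~ N(0, 10)  (variance 10)
def simulate(n):
    x = np.empty(n + 1)
    x[0] = rng.normal(0.0, np.sqrt(10.0))           # x_0 ~ N(0, 10)
    for t in range(1, n + 1):
        x[t] = np.sin(x[t - 1]) + rng.normal(0.0, 1.0)  # non-linear evolution
    y = x[1:] + rng.normal(0.0, 1.0, size=n)            # noisy observations
    return x, y

x, y = simulate(100)
# x holds 101 states (x_0 through x_100); y holds the 100 observations.
```

Repeating such simulations and running the sampler at each combination of mixture size and series length is how empirical acceptance-rate and timing tables like Tables 3 and 4 are assembled.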
Code
The web page http://ftp.stat.duke.edu/WorkingPapers/08-21.html provides freely available MATLAB code implementing the method described here, including support functions and the examples from this paper as templates for other, more general models. (The version of the code used to produce this article is available on the JCGS web page; see the Supplemental Materials section for details.)
Supplementary Material
Acknowledgments
The authors thank the editor and two anonymous reviewers for their constructive suggestions that helped improve this work. We are grateful to Lingchong You and Chee-Meng Tan for discussion of dynamic models in systems biology. We acknowledge support of the National Science Foundation (grants DMS-0342172 and BES- 0625213) and the National Institutes of Health (grants P50-GM081883-01 and NCI U54-CA-112952-01). Any opinions, findings and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of the NSF or NIH.
Footnotes
Supplemental materials are available for download via a single zipped file (am4-supplemental.zip). This file contains the following files:
Non-linear, non-Gaussian models: This file describes an extension of the AM4 framework to non-linear, non-Gaussian dynamic models, with an example based on a stochastic volatility model. (SV.pdf)
MATLAB code for AM4: Folders contain all code and datasets necessary for executing the examples in this paper. A README.txt file gives full details of the code and datasets in those folders.
Contributor Information
Jarad Niemi, Email: jarad@stat.duke.edu.
Mike West, Email: mw@stat.duke.edu.
References
- Alspach DL, Sorenson HW. Non-linear Bayesian estimation using Gaussian sum approximations. IEEE Transactions on Automatic Control. 1972;AC-17:439–448.
- Andrade Netto ML, Gimeno L, Mendes MJ. A new spline algorithm for non-linear filtering of discrete time systems. Proceedings of the 4th IFAC Symposium on Identification and System Parameter Estimation; 1978. pp. 2123–2130.
- Bild AH, Yao G, Chang JT, Wang Q, Potti A, Chasse D, Joshi M, Harpole D, Lancaster JM, Berchuck A, Olson JA, Marks JR, Dressman HK, West M, Nevins J. Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature. 2006;439:353–357. doi: 10.1038/nature04296.
- Carlin BP, Polson NG, Stoffer DS. A Monte Carlo approach to nonnormal and nonlinear state-space modeling. Journal of the American Statistical Association. 1992;87:493–500.
- Carter CK, Kohn R. On Gibbs sampling for state-space models. Biometrika. 1994;81:541–553.
- Chen R, Liu JS. Mixture Kalman filters. Journal of the Royal Statistical Society B. 2000;62:493–508.
- Doucet A, Godsill S, Andrieu C. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing. 2000;10:197–208.
- Doucet A, de Freitas N, Gordon N. Sequential Monte Carlo Methods in Practice. New York: Springer-Verlag; 2001.
- Fearnhead P, Meligkotsidou L. Filtering methods for mixture models. Journal of Computational and Graphical Statistics. 2007;16:586–607.
- Frühwirth-Schnatter S. Data augmentation and dynamic linear models. Journal of Time Series Analysis. 1994;15:183–202.
- Gadkar KG, Gunawan R, Doyle FJ. Iterative approach to model identification of biological networks. BMC Bioinformatics. 2005a;6:155. doi: 10.1186/1471-2105-6-155.
- Gadkar KG, Varner J, Doyle FJ. Model identification of signal transduction networks from data using a state regulator problem. IEE Systems Biology. 2005b:2. doi: 10.1049/sb:20045029.
- Geweke J, Tanizaki H. On Markov chain Monte Carlo methods for nonlinear and non-Gaussian state-space models. Communications in Statistics: Simulation and Computation. 1999;28:867–894.
- Godsill SJ, Doucet A, West M. Monte Carlo smoothing for nonlinear time series. Journal of the American Statistical Association. 2004;99:156–168.
- Golightly A, Wilkinson DJ. Bayesian inference for stochastic kinetic models using a diffusion approximation. Biometrics. 2005;61:781–788. doi: 10.1111/j.1541-0420.2005.00345.x.
- Golightly A, Wilkinson DJ. Bayesian sequential inference for nonlinear multivariate diffusions. Statistics and Computing. 2006a;16:323–338.
- Golightly A, Wilkinson DJ. Bayesian sequential inference for stochastic kinetic biochemical network models. Journal of Computational Biology. 2006b;13:838–851. doi: 10.1089/cmb.2006.13.838.
- Gordon NJ, Salmond DJ, Smith AFM. Novel approach to non-linear/non-Gaussian Bayesian state estimation. IEE Proceedings Part F: Communications, Radar and Signal Processing. 1993;140:107–113.
- Harrison P, Stevens C. A Bayesian approach to short-term forecasting. Operational Research Quarterly. 1971;22:341–362.
- Harrison P, Stevens C. Bayesian forecasting. Journal of the Royal Statistical Society B. 1976;38:205–247.
- He F, Yeung LF, Brown M. Discrete-time model representation for biochemical pathway systems. IAENG International Journal of Computer Science. 2007:34.
- He F, Yeung LF, Brown M. Discrete-time model representations for biochemical pathways. In: Trends in Intelligent Systems and Computer Engineering. US: Springer; 2008. pp. 255–271.
- Hürzeler M, Künsch HR. Monte Carlo approximations for general state-space models. Journal of Computational and Graphical Statistics. 1998;7:175–193.
- Kitagawa G. Non-Gaussian state-space modeling of nonstationary time series. Journal of the American Statistical Association. 1987;82:1032–1041.
- Kitagawa G. Monte Carlo filter and smoother for non-Gaussian non-linear state space models. Journal of Computational and Graphical Statistics. 1996;5:1–25.
- Liu J, West M. Combined parameter and state estimation in simulation-based filtering. In: Doucet A, de Freitas N, Gordon N, editors. Sequential Monte Carlo Methods in Practice. New York: Springer-Verlag; 2001. pp. 197–217.
- Nevins J. Toward an understanding of the functional complexity of the E2F and Retinoblastoma families. Cell Growth and Differentiation. 1998;9:585–593.
- Niemi J. Bayesian analysis and computational methods for dynamic modeling. Ph.D. thesis, Duke University; 2009.
- Ravines RR, Migon HS, Schmidt AM. An efficient sampling scheme for dynamic generalized models. Technical Report #201/2007, Departamento de Métodos Estatísticos, UFRJ; 2007.
- Rosenfeld N, Young J, Alon U, Swain P, Elowitz M. Gene regulation at the single-cell level. Science. 2005;307:1962–1965. doi: 10.1126/science.1106914.
- Sears R, Ohtani K, Nevins JR. Identification of positively and negatively acting elements regulating expression of the E2F2 gene in response to cell growth signals. Molecular and Cellular Biology. 1997;17:5227–5235. doi: 10.1128/mcb.17.9.5227.
- Sorenson HW, Alspach DL. Recursive Bayesian estimation using Gaussian sums. Automatica. 1971;7:465–479.
- Stroud JR, Müller P, Polson NG. Nonlinear state-space models with state-dependent variances. Journal of the American Statistical Association. 2003;98:377–386.
- Tan C, Song H, Niemi J, You L. A synthetic biology challenge: Making cells compute. Molecular BioSystems. 2007;3:343–353. doi: 10.1039/b618473c.
- Wang Q, Niemi J, Tan C, You L, West M. Image segmentation and dynamic lineage analysis in single-cell fluorescent microscopy. Cytometry Part A. 2009; in press. doi: 10.1002/cyto.a.20812.
- West M. Modelling with mixtures (with discussion). In: Bernardo J, Berger J, Dawid A, Smith A, editors. Bayesian Statistics 4. Oxford: Oxford University Press; 1992. pp. 503–524.
- West M. Approximating posterior distributions by mixtures. Journal of the Royal Statistical Society B. 1993a;54:553–568.
- West M. Mixture models, Monte Carlo, Bayesian updating and dynamic models. Computing Science and Statistics. 1993b;24:325–333.
- West M, Harrison J. Bayesian Forecasting and Dynamic Models. 2nd ed. New York: Springer-Verlag; 1997.
- Wilkinson D. Stochastic Modelling for Systems Biology. London: Chapman & Hall/CRC; 2006.