Abstract
An adaptive strategy is proposed for reducing the number of unknowns in the calculation of a proposal distribution in a sequential Monte Carlo implementation of a Bayesian filter for nonlinear dynamics. The idea is to solve only in directions in which the dynamics is expanding, found adaptively; this strategy is suggested by earlier work on optimal prediction. The construction should be of value in data assimilation, for example, in geophysical fluid dynamics.
We consider the following filtering problem. A vector stochastic process x(t) evolves according to an Ito stochastic differential equation:
dx = f(x, t) dt + g dw,   [1]
where x = x(t), f and w are vectors of dimension n, w is a vector Brownian motion with independent components, and g is a matrix, which in the present article will always be diagonal. A vector y(t), which depends both on x(t) and on a noise, is observed; for simplicity we consider only the case in which the observations are made at discrete times tk = kΔ, k = 1, 2,..., with a fixed time increment Δ, so that
y_k = h(x_k, t_k) + G w_k,   [2]
where yk, wk, and h are q-dimensional vectors with q ≤ n, yk = y(tk), and xk = x(tk), the components of wk are Gaussian variables of unit variance and zero mean, and G is a diagonal matrix. The problem is to estimate x(t) given the sequence of the yk values, i.e., for kΔ ≤ t < (k + 1)Δ, evaluate the conditional expectation E[x(t)|y0,..., yk].
If the system (Eq. 1) is linear and the observations depend linearly on the states, x(t), this problem is solved by the Kalman filter (1). In general, one has to determine the full probability distribution Pk = P[x(t)|ȳk] of the variables x(t) in the time interval kΔ ≤ t < (k + 1)Δ, conditioned by ȳk = (y0, y1,..., yk), and then estimate the most likely values of the xk from that distribution (2).
In principle Pk can be determined recursively. Given Pk at time kΔ, P[x(t)| ȳk] for the interval kΔ ≤ t < (k + 1)Δ (i.e., before the next observation) is determined by Eq. 1. Given the probability distribution P[yk+1|x] for obtaining the measurement yk+1, if x(t) is currently in the state x, Pk+1 can be found from Bayes' formula (see, e.g., ref. 3):
P_{k+1} = P[y_{k+1}|x] P[x, t_{k+1}|ȳ_k] / ∫ P[y_{k+1}|x] P[x, t_{k+1}|ȳ_k] dx,   [3]
where P[x, t_{k+1}|ȳ_k], the “proposal distribution,” is the probability distribution of x(t) at time (k + 1)Δ conditioned on the observations up to but not including that time. The resulting filter is the Bayesian filter (see, e.g., refs. 2 and 4).
It is natural to implement the Bayesian filter by sequential Monte Carlo (see, e.g., refs. 2 and 5), so that the evaluation of the normalizing integral in the denominator of Eq. 3 is trivial. Find N samples of Pk at time tk; this produces N samples of x(t), usually known as “particles.” Give all particles an equal weight. The evolution of the particles up to time tk+1 is described by the system (Eq. 1); when each weight is multiplied by the probability P[yk+1|x], where x is the location of the corresponding particle at time tk+1, one obtains a representation of Pk+1 at time tk+1. The catch is that the system (Eq. 1) must be solved repeatedly. One such solution may be expensive enough for the kind of systems one encounters in practice, and the cost quickly becomes unacceptable as the number of dimensions increases, because the number of particles required to describe a distribution grows very rapidly with the number of dimensions (see, e.g., ref. 6).
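A minimal sketch of one cycle of this scheme follows; the function names, the additive observation model (h(x) plus Gaussian noise with diagonal covariance), and the log-weight stabilization are our own illustrative choices, not prescribed by the article.

```python
import numpy as np

def assimilation_step(particles, weights, y_obs, step_fn, n_steps, dt, h_fn, G_diag, rng):
    """One cycle of the sequential Monte Carlo Bayesian filter: evolve each
    particle under the stochastic dynamics, then reweight by the Gaussian
    observation likelihood with diagonal covariance."""
    for _ in range(n_steps):              # evolve all particles to the next observation time
        particles = step_fn(particles, dt, rng)
    resid = y_obs - h_fn(particles)       # observation residuals, shape (N, q)
    log_lik = -0.5 * np.sum((resid / G_diag) ** 2, axis=1)
    weights = weights * np.exp(log_lik - log_lik.max())  # subtract max for stability
    weights = weights / weights.sum()     # normalize so the weights sum to 1
    estimate = weights @ particles        # weighted (posterior-mean) estimate
    return particles, weights, estimate
```

For a linear observation of every component, h_fn is the identity and G_diag holds the observation standard deviations.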
What this article proposes is a partial fix that reduces the size of the system of equations to be solved for evolving the particles. This fix is accomplished by adaptively finding subsets of variables that do not need to be recomputed in the repeated evolutions.
We begin the detailed presentation by a standard implementation of the Bayesian filter and apply it to a stochastic Lorenz model. We then explain how to modify it so that one need not compute in “shrinking” directions. We also present a brief survey that relates the construction to earlier work. Applications to partial differential equations, which require an additional simplification, are then described.
Sequential Monte Carlo Implementation of a Bayesian Filter
We now describe a sequential Monte Carlo implementation of a Bayesian filter for a system of stochastic ordinary differential equations and then apply it to a stochastic Lorenz model according to ref. 4; the detailed comparisons of various approximate models in that article can then be used to assess the model presented here. In Eq. 1 the nonrandom part, dx/dt = f(x,t), is a system of ordinary differential equations, with initial data x(t = 0) = x0. The initial probability distribution is P0 = P(t = 0) = δ(x − x0), where δ is the delta function.
We first integrate the stochastic ordinary differential equations in time once, obtaining a sample evolution that will serve as “experimental data.” We shall call this sample “the experiment.” The problem is to reconstruct this experiment given a set of observations. If the distribution Pk is represented by N particles, these particles can be evolved in time until t = tk+1; if the noise in the observations is white noise and the matrix G is diagonal, as we assume, then
P[y_{k+1}|x] = Π_{i=1}^{q} (2πG_i²)^{–1/2} exp(–(y_{k+1,i} – h_i(x))²/(2G_i²)),   [4]
where the G_i are the diagonal terms in G, which we assume to be nonzero and which can then be taken as positive. The new weights of the particles can be obtained by multiplying their weights before the observation by factors deduced from Eq. 4. The new probability distribution after observation is the empirical distribution given by δ functions at the new locations of the particles, with the new weights divided by a normalizing factor so that the sum of the weights equals 1 (for some theory, see ref. 2). The initial data for the further evolution of particles are obtained by resampling this new empirical distribution; this resampling is essential to the convergence of sequential Monte Carlo. We take the new estimate of the location of the experiment to be the mean of the new empirical distribution.
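The resampling step just mentioned can be implemented in several standard ways; the article does not prescribe one, so the sketch below uses systematic resampling as an example.

```python
import numpy as np

def resample(particles, weights, rng):
    """Systematic resampling: draw N equally weighted particles from the
    weighted empirical distribution (one common choice of scheme)."""
    N = len(weights)
    positions = (rng.random() + np.arange(N)) / N  # one stratified point per interval
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                           # guard against round-off
    idx = np.searchsorted(cumulative, positions)
    return particles[idx], np.full(N, 1.0 / N)
```

After this step all weights are again equal, and the particle cloud concentrates where the posterior mass is.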
Following ref. 4, we apply this algorithm to the Lorenz model driven by white noise:
dx = σ(y – x) dt + g dw₁,
dy = (ρx – y – xz) dt + g dw₂,   [5]
dz = (xy – βz) dt + g dw₃,
where x, y, and z are the components of the vector x and σ, ρ, β, and g are constants. To simplify comparisons, we use the same values for the constants as ref. 4: σ = 10, ρ = 28, β = 8/3 (the original Lorenz values), and g = 0.5; we also use the same initial vector (–5.91652, –5.52332, 24.5723). (Despite all the digits, this initial vector does not seem to have any special properties.) The time interval between data is Δ = 0.48; the variance of the observation noise is 2.
The numerical integrations were carried out by Euler's method with time step Δt = 0.001. In the present case where the parameter g is a constant, the Euler method is a fully first-order accurate method (see ref. 7) and is basically the method used in ref. 4. Note that for the present purpose the accuracy of the integration method is somewhat irrelevant. We are filtering a discrete process by a discrete filter and trying to show that this discrete filter does a good job. This is unaffected by how close the discrete system is to the continuous system. Nevertheless, the time step is small enough for convergence to have been achieved in the runs below.
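A sketch of this integration for the stochastic Lorenz system (Eq. 5), with the constants above; the function names are ours.

```python
import numpy as np

SIGMA, RHO, BETA, G_NOISE = 10.0, 28.0, 8.0 / 3.0, 0.5  # constants from ref. 4

def lorenz_drift(v):
    """Deterministic part of the stochastic Lorenz system (Eq. 5)."""
    x, y, z = v
    return np.array([SIGMA * (y - x), RHO * x - y - x * z, x * y - BETA * z])

def euler_step(v, dt, n_steps, rng):
    """Euler's method with additive noise; fully first-order accurate here
    because the noise amplitude g is constant."""
    for _ in range(n_steps):
        v = v + lorenz_drift(v) * dt + G_NOISE * np.sqrt(dt) * rng.normal(size=3)
    return v
```

With Δt = 0.001 and Δ = 0.48, one observation interval corresponds to 480 of these steps.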
In Fig. 1 we display the z component of two sample solutions of Eq. 5 with the parameters above without any filtering. The figure shows that one sample is not a good guide to the behavior of another, and that accurate localization of the experiment requires data and a filter. In Fig. 2 we display an experiment and its localization by the filter just described. These results are fully consistent with what was found in ref. 4 by another implementation of a Bayesian filter.
Fig. 1.
The z component of two trajectories for the stochastic Lorenz system (showing that a filter is needed).
Fig. 2.
The Bayesian filter for the stochastic Lorenz system. Dots denote data points; the dotted line (almost hidden) is the z component of the experiment; the continuous line is the analysis. N = 200.
Dimensional Reduction for a System of Ordinary Differential Equations
Before describing how to reduce the number of variables in the evaluation of a proposal distribution for the filter of the preceding section, we need a few simple facts about differential systems. Consider the system dx/dt = f(x) (Eq. 1 without noise), and let x(t), x(t) + δx(t) start from the initial data x0, x0 + δx0. For small δx we have (d/dt)δx(t) = Jδx(t), where J = (∂f/∂x) is the Jacobian matrix of f evaluated at x(t). Thus, at time δt, δx(δt) = exp(δtJ)δx0 and, therefore,
δx(t) = Mδx0,   M = T exp(∫ J dt),
where T is the time-ordering operator (see, e.g., ref. 8). The squared length of the vector δx is (δx, δx) = (Mδx0, Mδx0) = (δx0, M*Mδx0) (the asterisk denotes the adjoint matrix). The eigenvalues of M*M are the squares of the amounts by which vectors δx0 in the directions of the eigenvectors are stretched and are related to the Liapounov exponents of the system. If an eigenvalue of M*M is <1, the corresponding eigenvector shrinks, and if the eigenvalue is >1, the corresponding vector expands (as in Fig. 3). The matrix M*M depends on the particular trajectory x(t) whose vicinity we examine. We shall refer to the shrinking and expanding directions at any point as the “stable” and “unstable” directions (see Fig. 3).
Fig. 3.
Expanding and contracting directions.
An obvious idea for reducing the dimensionality of the computation of the proposal distribution in the preceding section is to compute only in expanding directions. Suppose the system we are trying to follow has reached a time t1; construct an estimate of the state of the system at that time from the distribution P. Start a sample of the stochastic differential system (Eq. 1) from the estimate at time t1 and run it from t1 to, say, t2 (how to pick t2 – t1 will be discussed below). We call this sample the “test” run. Evaluate the operators J along this sample and use them to find the matrix M. Find the eigenvalues λi and eigenvectors ei of M*M, order them so that λ1 ≥ λ2 ≥ ··· ≥ λn, and form the matrix R, whose adjoint R* has as columns these eigenvectors, R* = (e1,..., en). Note that the stable and unstable directions are stable under perturbation, so that the stable and unstable directions determined from a single sample can be used to find the stable and unstable directions for all particles if t2 – t1 is small enough (9). This is the key point. The interval t2 – t1 may have to be smaller than the interval between observations, and estimates of the system between observations may have to be found from the proposal distribution unmodified by current observations.
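The computation just described (run the test sample, accumulate M along it, diagonalize M*M) can be sketched as follows; the first-order product approximation of the time-ordered exponential and the function names are ours.

```python
import numpy as np

def stretch_directions(jacobian, traj, dt):
    """Accumulate M as a product of (I + dt J) factors along the test
    trajectory (a first-order discretization of the time-ordered
    exponential), then diagonalize M*M; returns the squared stretch
    factors in decreasing order and the matching eigenvectors."""
    n = traj.shape[1]
    M = np.eye(n)
    for x in traj:
        M = (np.eye(n) + dt * jacobian(x)) @ M  # later times multiply on the left
    lam, vecs = np.linalg.eigh(M.T @ M)         # M*M is symmetric for real M
    order = np.argsort(lam)[::-1]               # lambda_1 >= lambda_2 >= ...
    return lam[order], vecs[:, order]
```

The columns of vecs are the eigenvectors e1,..., en, so R* = vecs and R = vecs.T.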
Now make the change of variables v = Rx; the system (Eq. 1) without noise becomes
dv/dt = Rf(R*v, t).   [6]
Pick the most unstable modes (those for which λi > Λ, where Λ < 1 is a positive number chosen by the user); say this inequality holds for i = 1, 2,..., m < n. For i > m the components vi should lie in narrow intervals near the corresponding components of the experiment we are trying to locate; these n – m components of the particles can be replaced by the corresponding components of the single test flow, and Eq. 6 can be solved only for the vi with i ≤ m. While one is determining R from the test sample one can simultaneously store the vi, i > m, and subsequently substitute them in Eq. 6.
If Eq. 6 is solved by an explicit method, and if the entries of R are rij, one can evaluate the sums ∑i>mrij vi in the evaluation of R*v once for all the components found from the test run. In the evaluation of Rf, one has to evaluate only the first m components. If Eq. 6 is solved by an implicit method with Newton iterations, one has only to solve m by m systems. For a fuller discussion of operation counts, see Conclusions.
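One explicit step of this reduced evolution might look as follows; the partitioning into active and frozen components, the additive noise model, and the variable names are our illustrative choices.

```python
import numpy as np

def reduced_step(v_active, v_frozen, R, f, dt, rng, noise_amp):
    """One explicit step of the reduced system: only the m expanding
    components of v are advanced; the remaining components are held at
    the values stored from the single test run."""
    m = v_active.size
    v = np.concatenate([v_active, v_frozen])  # full vector in the rotated frame
    x = R.T @ v                               # back to the original variables, x = R*v
    dv = (R @ f(x))[:m]                       # only the first m components of Rf are needed
    return v_active + dt * dv + noise_amp * np.sqrt(dt) * rng.normal(size=m)
```

In a production code the sums over the frozen components would be precomputed once per test run rather than re-evaluated per particle.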
The time interval t2 – t1 is to be determined by the user, so that the determination of the stable and unstable directions from the single sample remains valid. If the interval t2 – t1 is short enough for accuracy, one may still weigh the advantages of a shorter interval (less to store) vs. those of a longer interval (fewer eigenvalue calculations).
As an illustration, we display in Fig. 4 a reconstruction of an experiment from data for the Lorenz model, with all parameters as in Fig. 2, by using only a single sampled variable, i.e., with m = 1 in the algorithm just described. Typically, eigenvalues of M*M were found around λ1 = 1.2, λ2 = 0.75, and λ3 = 0.25. Not much loss or gain in accuracy occurs compared with Fig. 2, nor, in this low-dimensional situation, is there a great gain in computer time. The point is, however, that the lower-dimensional solutions are sufficient to obtain a good estimate.
Fig. 4.
The reduced dimension filter for the stochastic Lorenz system (m = 1). Other parameters are as before.
Before moving on to further examples we endeavor to provide some perspective on the construction of the preceding section by contrasting it with some previous work. We were led to this construction by our earlier work on optimal prediction (10–12). In the simplest optimal prediction scheme one averages the equations of motion for some of the variables over the variables that are omitted; the variables that are kept here are the variables vi, i ≤ m, and the variables averaged over are the other components of v; however, if the distribution of these other components is sharply peaked around their mean values, the averaging and an evaluation at a single sample should be close. Furthermore, the simple optimal prediction scheme works best when the variables that are omitted have a small variance, as we make them have here.
A more complete theory of dimensional reduction (10, 11) reveals that, in addition to averaged terms, the reduced system needs a noise term and a memory or dissipation term to be an accurate approximation. The lack of such a term in the reduced model just described is revealed when the proposal distribution it produces is contrasted with the proposal distribution produced by the full model; the former has lower variance. The appropriate noise and dissipation can be deduced from the theory in ref. 11, but we do not reproduce the argument here because in no example we have run was the narrower distribution inadequate to the task.
One difference between the numerical methods in refs. 11 and 12 and the ones here lies in the assumptions about the model; in those earlier articles we aimed at the solution of problems where the full model was so complex that the evaluation of even one sample evolution was out of reach, whereas here we are assuming that one evaluation can be performed and the corresponding stable and unstable directions can be found.
One can also view the construction proposed here as producing an approximate inertial manifold (13, 14) locally in space and locally in time. This observation is indeed what should be expected; optimal prediction reduces to inertial manifold methods when the omitted variables are functions of the retained variables and there is therefore no noise. Furthermore, the retained variables can be viewed as “determining modes” in the sense of refs. 13 and 15 and Kreiss et al. (16). Kreiss et al. (16) pointed out the connection between data assimilation and the theory of determining modes (see the next section for more comments). The algorithm can also be viewed as an approximate marginalization, or approximate “Rao–Blackwellization” in the sense of ref. 17, with the leading variables chosen adaptively.
The singular values of R have been used before in nonlinear filters and other applications, albeit along different lines (see, e.g., ref. 18).
Filters for Partial Differential Equations
We now apply the construction of the preceding section to partial differential equations in a spectral representation but with data in real space. We shall be working a little with the Burgers equation,
u_t + uu_x = νu_xx + F,
but mainly with the Kuramoto–Sivashinski (KS) equation
u_t + uu_x + u_xx + νu_xxxx = F,
where in both cases u is the unknown function, t is the time, x is the spatial variable, ν is a constant parameter (the viscosity), and subscripts denote differentiation. In both cases we take 0 ≤ x ≤ 2π with periodic boundary conditions. The solution is represented as a Fourier series, leading to the obvious equations for the coefficients ak(t):
da_k/dt = (k² – νk⁴)a_k – (ik/2) Σ_j a_j a_{k–j} + F_k,
in the KS case, where Fk is the Fourier representation of the noise, with an analogous equation in the Burgers case. In both cases we assume that the kth component of the noise is F_k = g dw(t)/k² for k ≠ 0, where g is a constant common to all the components and w(t) is a Brownian motion. The divisor k² makes the noise structure in space differentiable 1.5 times. The coefficients ak are different from zero for |k| ≤ n.
For simplicity, we limit ourselves to initial data in which the real part of the ak is zero; the ak values are then purely imaginary for all times t; we write ak = ia′k and henceforth drop the prime. The matrix J = ∂f/∂a = (Jjk) for the KS equation is
J_jk = (j² – νj⁴)δ_jk + j(a_{j–k} – a_{j+k}),
where a_j = –a_{–j} (with a_j = 0 for j = 0 and |j| > n) and δ_jk is the Kronecker delta, and
J_jk = –νj²δ_jk + j(a_{j–k} – a_{j+k})
for the Burgers equation.
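The spectral right-hand side and its Jacobian can be checked numerically. The sketch below works with the real coefficients a_1,..., a_n (a_0 = 0, a_{–k} = –a_k) and uses the explicit form J_jk = (j² – νj⁴)δ_jk + j(a_{j–k} – a_{j+k}); the displayed formula did not survive extraction, so this form is our reconstruction and is best verified against a finite-difference derivative of the right-hand side.

```python
import numpy as np

def ks_rhs(a, nu):
    """Right-hand side for the purely imaginary KS Fourier coefficients
    a_1..a_n (with a_0 = 0 and a_{-k} = -a_k), without the noise term."""
    n = a.size
    def coef(l):
        if l == 0 or abs(l) > n:
            return 0.0
        return a[l - 1] if l > 0 else -a[-l - 1]
    f = np.empty(n)
    for j in range(1, n + 1):
        conv = sum(coef(l) * coef(j - l) for l in range(-n, n + 1))
        f[j - 1] = (j**2 - nu * j**4) * a[j - 1] + 0.5 * j * conv
    return f

def ks_jacobian(a, nu):
    """Explicit Jacobian J_jk = (j^2 - nu j^4) delta_jk + j (a_{j-k} - a_{j+k})."""
    n = a.size
    def coef(l):
        if l == 0 or abs(l) > n:
            return 0.0
        return a[l - 1] if l > 0 else -a[-l - 1]
    J = np.zeros((n, n))
    for j in range(1, n + 1):
        for k in range(1, n + 1):
            J[j - 1, k - 1] = j * (coef(j - k) - coef(j + k))
            if j == k:
                J[j - 1, k - 1] += j**2 - nu * j**4
    return J
```

Because the right-hand side is quadratic in the coefficients, a central finite difference reproduces the Jacobian essentially exactly, which makes the check sharp.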
We solve the ordinary differential equations for the spectral components ak by a backward Euler scheme (for stability in these stiff problems) followed by the addition of suitable Gaussian variables; we are content here with this first-order scheme for the same reasons as in the preceding section. Note that we just presented an explicit formula for the Jacobian matrix to be used in the iterations for the solution of the implicit equations.
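A sketch of such a step; the Newton tolerance, the iteration cap, and the names are illustrative, while the division of the noise by k² follows the text.

```python
import numpy as np

def backward_euler_step(a, dt, rhs, jac, g_amp, rng, tol=1e-10, max_iter=20):
    """Backward Euler step a_new = a + dt f(a_new), solved by Newton
    iterations with the explicit Jacobian, followed by the addition of
    Gaussian increments divided by k^2."""
    n = a.size
    a_new = a.copy()                          # initial Newton guess
    for _ in range(max_iter):
        res = a_new - a - dt * rhs(a_new)     # residual of the implicit equation
        if np.linalg.norm(res) < tol:
            break
        a_new = a_new - np.linalg.solve(np.eye(n) - dt * jac(a_new), res)
    k = np.arange(1, n + 1)
    return a_new + g_amp * np.sqrt(dt) * rng.normal(size=n) / k**2
```

In the reduced filter the same Newton solve is performed on the m active components only, which is where the O(m³) per-iteration cost cited in Conclusions comes from.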
The stochastically driven Burgers equation is not very interesting as a problem to filter because there are neither bifurcations nor chaos, and we display here only one result. In Fig. 5 we display a real-space solution of the Burgers equation with initial data u = sin(x), together with the inverse Fourier transform of the eigenvector of M*M corresponding to the largest eigenvalue; in real space the eigenvector is large in absolute value in those parts of the steepening wave where the wave is stretched, and small where the wave is compressed. The corresponding figure for the KS equation is not as transparent.
Fig. 5.
The leading eigenvector in physical space for the Burgers equation and the corresponding solution. —, The eigenvector; –·–, corresponding solution.
We can further simplify the evolution of the sample solutions when we observe the following (very frequent) conditions: (i) the expanding components are overwhelmingly low-wavenumber components (for an analysis of why this should usually be the case, see, e.g., refs. 13 and 16), and (ii) the dependence of the expanding components on the high-frequency components is weak. Under these conditions one can neglect the components of the expanding modes in the subspace spanned by the modes with k > m̄, with m < m̄ < n; the matrix R can be taken in the form
R = ( R̄  0 )
    ( 0   I ),   [7]
where R̄ is an m̄ by m̄ submatrix and I is the (n – m̄) by (n – m̄) identity. The eigenvectors and eigenvalues of only an m̄ by m̄ matrix have to be found when the test run is made. The parameters m, m̄ are, of course, problem-dependent.
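The block structure of Eq. 7 is simple to assemble; a sketch, with R̄ written as R_bar:

```python
import numpy as np

def block_rotation(R_bar, n):
    """Assemble the rotation of Eq. 7: an mbar-by-mbar block R_bar acting
    on the low wavenumbers, the identity on the remaining n - mbar modes."""
    mbar = R_bar.shape[0]
    R = np.eye(n)
    R[:mbar, :mbar] = R_bar
    return R
```

If R_bar is orthogonal, the assembled R is orthogonal as well, so R* = R.T can still be used to return to the original variables.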
We now present some results obtained with the resulting filter for the KS equation, with ν = 0.085 as in ref. 19, a value for which the dynamics contain a perturbed homoclinic orbit (20). The initial data are again u = sin(x). n = 28 Fourier modes and a time step of 1·10⁻³ are needed for convergence, as is consistent with the observations of ref. 19. Typically, three eigenvalues of M*M were >1, and the others were <0.5, showing that the dynamics were essentially 3D, although 28 modes were needed for the convergence of the solution of the partial differential equation. We suppose that data are collected in physical space at q equidistant points; in the runs below we chose q = n/2; the observation error is assumed to be Gaussian, with variance 0.125 at all points of observation. The numerical experiment showed that for m̄ > 5 the matrix R has the form (Eq. 7) to a good approximation, and thus it is enough to take m = 3 and m̄ = 6; after each test run one has to evaluate and diagonalize a 6 by 6 matrix M*M, and then solve repeatedly a 3D system of stochastic ordinary differential equations.
In Fig. 6 we display an experiment for the KS equation and the result of following it with the filter just described.
Fig. 6.
The reduced-dimension filter for the KS equation. Broken line, experiment; solid line, analysis; N = 200, n = 28, m̄ = 6, and m = 3.
Conclusions
We first assess the savings in computer time that result from the construction we have described. The cost of making the single test run, forming the operator M*M, and then the operator R is negligible in our experience compared with the cost of running the many particles. One can deduce from earlier comments that the cost of evaluating R*v and then Rf in Eq. 6 is to leading order mm̄. The issue is the cost of evaluating the first m components of dv/dt. If the system is linear, our scheme is very economical, but better filters are available. In problems with the quadratic nonlinearity above, with a full J = ∂f/∂x matrix and an explicit scheme, a single evaluation for the nonreduced problem costs O(n) operations, whereas the reduced scheme costs O(m) + O(mm̄) operations. On the other hand, for the same nonlinearity and with an implicit scheme, one finds that the leading cost of a step lies in the iterations for solving the implicit equations, and their cost per iteration goes down from O(n³) to O(m³), although the number of iterations per step does not change noticeably. This makes implicit schemes more attractive here.
We do not expect the number of particles needed to represent the various densities to decrease significantly by our construction, although it sometimes seems to do so, because the construction only reveals and uses lower-dimensional dynamics but does not create them. The moral here is that one should not be afraid to use Bayesian filters for partial differential equations.
We have thus demonstrated that a straightforward analysis of the stable and unstable directions in a dynamical system can sharply reduce the dimension of the system of equations one must solve when one is using a Bayesian filter to track an experiment. The next task is to apply these methods in data assimilation for real-world problems.
Acknowledgments
We thank Prof. Robert Miller for his help and encouragement and Prof. G. I. Barenblatt and Drs. K. Lin and P. Stinis for helpful discussions and comments. This work was supported in part by National Science Foundation Grant DMS 97-32710 and in part by the Office of Science, Office of Advanced Scientific Computing Research, Mathematical, Information, and Computational Sciences Division, Applied Mathematical Sciences Subprogram, U.S. Department of Energy Contract DE-AC03-76SF00098.
Abbreviation: KS, Kuramoto–Sivashinski.
References
- 1. Chui, C. K. & Chen, G. (1987) Kalman Filtering (Springer, Berlin).
- 2. Doucet, A., de Freitas, N. & Gordon, N., eds. (2001) Sequential Monte Carlo Methods in Practice (Springer, New York).
- 3. Jazwinski, A. (1970) Stochastic Processes and Filtering Theory (Academic, New York).
- 4. Miller, R., Carter, E. & Blue, S. (1999) Tellus 51, 167–194.
- 5. Evensen, G. (1994) J. Geophys. Res. 99, 10143–10162.
- 6. Silverman, B. (1986) Density Estimation for Statistics and Data Analysis (Chapman & Hall, London).
- 7. Kloeden, P. & Platen, E. (1992) Numerical Solution of Stochastic Differential Equations (Springer, Berlin).
- 8. Dorfman, J. (1999) An Introduction to Chaos in Nonequilibrium Statistical Mechanics, Cambridge Lecture Notes in Physics (Cambridge Univ. Press, Cambridge, U.K.).
- 9. Lin, K. (2003) Ph.D. thesis (Univ. of California, Berkeley).
- 10. Chorin, A. J., Hald, O. & Kupferman, R. (2000) Proc. Natl. Acad. Sci. USA 97, 2968–2973.
- 11. Chorin, A. J., Hald, O. & Kupferman, R. (2002) Physica D 166, 239–257.
- 12. Chorin, A. J., Kast, A. & Kupferman, R. (1999) Contemp. Math. 238, 53–75.
- 13. Foias, C., Sell, G. & Titi, E. (1989) J. Dynamics Diff. Eqs. 1, 199–224.
- 14. Jolly, M., Kevrekidis, I. & Titi, E. (1990) Physica D 44, 38–60.
- 15. Olson, E. & Titi, E. (2004) J. Stat. Phys., in press.
- 16. Henshaw, W., Kreiss, H. O. & Yström, J. (2003) Multiscale Model. Simul. 1, 119–149.
- 17. Murphy, K. & Russell, S. (2001) in Sequential Monte Carlo Methods in Practice, eds. Doucet, A., de Freitas, N. & Gordon, N. (Springer, New York), pp. 499–515.
- 18. Miller, R. & Ehret, L. (2002) Mon. Weather Rev. 130, 2313–2333.
- 19. Stinis, P. (2004) Multiscale Model. Simul., in press.
- 20. Hyman, J. & Nicolaenko, B. (1986) Physica D 18, 113–126.