Significance
Understanding ensemble-based data assimilation methods, including their performance when applied to high-dimensional nonlinear models with low ensemble size, is a crucial problem in science and engineering. Catastrophic filter divergence is a well-documented but mechanistically mysterious phenomenon whereby ensemble-state estimates explode to machine infinity despite the true state remaining in a bounded region. We provide breakthrough insight into the phenomenon by proposing a simple forecast model that experiences catastrophic filter divergence under all ensemble-based methods. This is the first instance to our knowledge of a forecast model that plainly and rigorously illustrates that simple mechanisms can lead to such a drastic filter malfunction and thereby sheds light on when catastrophic filter divergence should be expected and how it can be avoided.
Keywords: data assimilation, ensemble Kalman filter, filter divergence
Abstract
The ensemble Kalman filter and ensemble square root filters are data assimilation methods used to combine high-dimensional, nonlinear dynamical models with observed data. Ensemble methods are indispensable tools in science and engineering and have enjoyed great success in geophysical sciences, because they allow for computationally cheap low-ensemble-state approximation for extremely high-dimensional turbulent forecast models. From a theoretical perspective, the dynamical properties of these methods are poorly understood. One of the central mysteries is the numerical phenomenon known as catastrophic filter divergence, whereby ensemble-state estimates explode to machine infinity, despite the true state remaining in a bounded region. In this article we provide a breakthrough insight into the phenomenon, by introducing a simple and natural forecast model that transparently exhibits catastrophic filter divergence under all ensemble methods and a large set of initializations. For this model, catastrophic filter divergence is not an artifact of numerical instability, but rather a true dynamical property of the filter. The divergence is not only validated numerically but also proven rigorously. The model cleanly illustrates mechanisms that give rise to catastrophic divergence and confirms intuitive accounts of the phenomenon given in past literature.
With the growing importance of accurate weather forecasting and the expanding availability of geophysical measurements, data assimilation has never been more vital to society. Ensemble-based assimilation methods, including the ensemble Kalman filter (EnKF) (1) and ensemble square root filters (ESRF) (2, 3), are crucial components of data assimilation that are applied ubiquitously across the geophysical sciences (4, 5). Despite their widespread application, the theoretical understanding of these methods remains underdeveloped. Recent efforts have been made to understand the dynamical properties of EnKF/ESRF in the practical setting of high-dimensional turbulent forecast models with low ensemble size, focusing on well-posedness (6) and stability.
One of the main motivations for these theoretical studies was the curious numerical phenomenon known as catastrophic filter divergence (7, 8). In refs. 7 and 8, it was numerically demonstrated that state estimates provided by ensemble-based methods can explode to machine infinity, despite the forecast model’s being dissipative and satisfying the absorbing ball property (9), which demands that the true state be always absorbed by a bounded region of the state space. The genesis of this phenomenon has previously been attributed to an interplay between stiffness in the forecast models and forecast ensemble alignment. Nevertheless, until now there has been no concrete example that transparently illustrates the mechanism behind catastrophic filter divergence. Without an explicit example or concrete understanding of the phenomenon, it is difficult to identify which models are vulnerable to it and how we should modify the ensemble methods to overcome it.
In this article, we bridge the gap by providing an extremely simple forecast model that has the absorbing ball property but nevertheless exhibits catastrophic filter divergence under both EnKF and ESRF methods. The mechanism behind the divergence is straightforward and is explained with a simple diagram (Fig. 1). The divergence is not a pathological example, but rather occurs for a large set of filter initializations and parameter values. We not only demonstrate the blowup numerically but also prove it rigorously in Theorem 2. The mechanism behind the blowup validates the intuition from ref. 8 and involves an interplay between model stiffness and ensemble alignment.
Fig. 1.
(A) The forecast ensemble (green circles) and the truth (red circle). (B) In the analysis step, the posterior ensemble remains in the subspace spanned by the forecast ensemble. Owing to the observation matrix H, all observations will shift the ensemble downward by an amount (approximately) proportional to the x coordinate of the forecast ensemble. (C) The rotation map is applied and its flow lines are shown (blue dashed). (D) The locking map is applied, which locks the ensemble into a new alignment (red dashed line). The ensemble has returned to the aligned state of A, but with larger x value. The increase in x is proportional to the original x value. Catastrophic filter divergence occurs when this cycle repeats itself.
When encountered in practical models, catastrophic filter divergence is often attributed to numerical instabilities. Such numerical instabilities are common in practical geophysical simulations, where Courant–Friedrichs–Lewy conditions are routinely violated (10). We emphasize that for our model catastrophic filter divergence is not a numerical artifact but rather a genuine and mathematically verifiable property of the filters.
EnKF and ESRF Formulation
We now briefly describe the EnKF and ESRF algorithms. Let $\Psi:\mathbb{R}^d\to\mathbb{R}^d$ be the forecast model, with the true model state satisfying $U_n=\Psi(U_{n-1})$ for all integers $n\geq 1$ and with some given (possibly random) initial state $U_0$. At each time step we make an observation $Z_n = HU_n+\xi_n$, where H is the observation matrix and $\xi_n$ are independent and identically distributed (iid) random variables distributed as $\mathcal{N}(0,\Gamma)$; for simplicity we take $\Gamma = I$. The objective of data assimilation is to use the forecast model to combine an estimate of the previous state $U_{n-1}$ with the new observational data $Z_n$ to produce an estimate of the current state $U_n$.
In both EnKF and ESRF algorithms, an ensemble $\{V_n^{(k)}\}_{k=1}^K$ is used as an empirical estimate of the posterior distribution of the state $U_n$ given the history of observations $Z_1,\dots,Z_n$. The empirical mean of the ensemble is a useful estimate of the state of the model and the empirical covariance matrix provides a quantification of uncertainty.
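The EnKF algorithm is the iteration of two steps, the forecast step and the analysis step. In the forecast step, the time $n-1$ posterior ensemble $\{V_{n-1}^{(k)}\}_{k=1}^K$ is evolved to the forecast ensemble $\{\hat V_n^{(k)}\}_{k=1}^K$, where $\hat V_n^{(k)} = \Psi(V_{n-1}^{(k)})$. The primary function of the forecast ensemble is to represent uncertainty in the forecast model; this uncertainty is quantified via the empirical covariance matrix $\hat C_n$:
$$\hat C_n = \frac{1}{K-1}\sum_{k=1}^K \left(\hat V_n^{(k)} - \overline{\hat V}_n\right)\left(\hat V_n^{(k)} - \overline{\hat V}_n\right)^T, \tag{1}$$
where $\overline{\hat V}_n = \frac{1}{K}\sum_{k=1}^K \hat V_n^{(k)}$. In the analysis step, the time n observation $Z_n$ is assimilated with the forecast ensemble to produce the posterior ensemble $\{V_n^{(k)}\}_{k=1}^K$. The assimilation update of each ensemble member is described by a Kalman-type update in a possibly nonlinear setting. The update uses a perturbed observation $Z_n^{(k)} = Z_n + \xi_n^{(k)}$, where $\xi_n^{(k)}$ is an iid sequence distributed identically to $\xi_n$. The Kalman update is then given by
$$V_n^{(k)} = \hat V_n^{(k)} - \hat C_n H^T\left(I + H\hat C_n H^T\right)^{-1}\left(H\hat V_n^{(k)} - Z_n^{(k)}\right). \tag{2}$$
The noise perturbations are introduced to maintain the classical Kalman prior–posterior covariance relation
$$C_n = \hat C_n - \hat C_n H^T\left(I + H\hat C_n H^T\right)^{-1}H\hat C_n \tag{3}$$
in an average sense, where $C_n$ denotes the sample covariance of the posterior ensemble.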
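The analysis step described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: it assumes unit observation noise covariance, a covariance normalized by K - 1, and perturbations supplied by the caller; the function name `enkf_analysis`, the matrix `H`, and the ensemble values are hypothetical choices made for the example.

```python
import numpy as np

def enkf_analysis(forecast, H, z, perturbations):
    """Perturbed-observation EnKF analysis step (unit observation noise covariance assumed).

    forecast:      (K, d) array, one forecast ensemble member per row
    H:             (q, d) observation matrix
    z:             (q,) observation
    perturbations: (K, q) array, one observation perturbation per member
    """
    K = forecast.shape[0]
    spread = forecast - forecast.mean(axis=0)       # deviations from the ensemble mean
    C = spread.T @ spread / (K - 1)                 # empirical forecast covariance
    S = np.eye(H.shape[0]) + H @ C @ H.T            # innovation covariance I + H C H^T
    gain = np.linalg.solve(S, H @ C).T              # C H^T S^{-1}, using symmetry of C and S
    innovations = forecast @ H.T - (z + perturbations)
    return forecast - innovations @ gain.T

# Two members aligned on the vertical line x = 5 (illustrative values).
rng = np.random.default_rng(0)
forecast = np.array([[5.0, 1.0], [5.0, -1.0]])
H = np.array([[1.0, 0.0], [0.5, 0.2]])              # hypothetical observation matrix
z = np.zeros(2)
posterior = enkf_analysis(forecast, H, z, rng.normal(size=(2, 2)))
# The covariance has no spread in x, so the update cannot change the x coordinates.
```

Because the update increments lie in the range of the forecast covariance, an ensemble aligned on a vertical line stays on that line after the analysis step, whatever the observation happens to be.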
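The ESRF algorithms considered in this article are the ensemble transform Kalman filter (ETKF) (2) and the ensemble adjustment Kalman filter (EAKF) (3). Both filters use the same forecast step as EnKF but differ from EnKF (and from each other) in the analysis step. In both ETKF and EAKF, the posterior ensemble mean $\bar V_n$ is updated from the forecast mean $\overline{\hat V}_n$ via
$$\bar V_n = \overline{\hat V}_n - \hat C_n H^T\left(I + H\hat C_n H^T\right)^{-1}\left(H\overline{\hat V}_n - Z_n\right), \tag{4}$$
where $\hat C_n$ is the forecast covariance matrix. Given the updated mean, to compute the update for each ensemble member it is sufficient to compute the posterior ensemble spread matrix $S_n = [V_n^{(1)} - \bar V_n, \dots, V_n^{(K)} - \bar V_n]$. This is computed using the similarly defined forecast spread matrix $\hat S_n$. Any reasonable choice of $S_n$ should satisfy the Kalman covariance identity
$$C_n = \hat C_n - \hat C_n H^T\left(I + H\hat C_n H^T\right)^{-1}H\hat C_n,$$
where, by definition, $C_n = (K-1)^{-1}S_n S_n^T$. The ETKF algorithm achieves Eq. 3 by setting $S_n = \hat S_n T_n$, where $T_n$ is the transform matrix defined by
$$T_n = \left(I + (K-1)^{-1}\hat S_n^T H^T H \hat S_n\right)^{-1/2}.$$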
The EAKF algorithm achieves Eq. 3 by setting , where is the adjustment matrix defined by
Here is the singular value decomposition (SVD) of and is the diagonalization of , and indicates pseudo inverse of a matrix. For more details on EnKF and ESRF see ref. 4.
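For the ETKF, the covariance identity can be checked numerically. The sketch below assumes the standard symmetric transform with unit observation noise covariance, T = (I + (K-1)^{-1} Ŝ^T H^T H Ŝ)^{-1/2}, and covers only the ETKF spread update, not the EAKF. The function name `etkf_spread_update`, the matrix `H`, and the ensemble values are hypothetical.

```python
import numpy as np

def etkf_spread_update(S_hat, H):
    """Symmetric ETKF spread update S = S_hat @ T (unit observation noise covariance assumed).

    S_hat: (d, K) forecast spread matrix, columns are deviations from the forecast mean
    H:     (q, d) observation matrix
    """
    K = S_hat.shape[1]
    HS = H @ S_hat
    M = np.eye(K) + HS.T @ HS / (K - 1)             # symmetric positive definite
    w, E = np.linalg.eigh(M)
    T = (E / np.sqrt(w)) @ E.T                      # M^{-1/2} via the eigendecomposition
    return S_hat @ T

rng = np.random.default_rng(0)
V_hat = rng.normal(size=(2, 6))                     # d = 2, K = 6 (illustrative values)
S_hat = V_hat - V_hat.mean(axis=1, keepdims=True)
H = np.array([[1.0, 0.0], [0.5, 0.2]])              # hypothetical observation matrix
S = etkf_spread_update(S_hat, H)

C_hat = S_hat @ S_hat.T / (6 - 1)                   # forecast covariance
C_post = S @ S.T / (6 - 1)                          # covariance of the updated spread
C_kalman = C_hat - C_hat @ H.T @ np.linalg.solve(
    np.eye(2) + H @ C_hat @ H.T, H @ C_hat)         # right-hand side of the Kalman identity
```

Under these assumptions `C_post` and `C_kalman` agree up to rounding error, and the columns of `S` still sum to zero, so the update leaves the posterior mean untouched.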
From this point on, for our specific model-filter setup we will assume K to be even and divide the ensemble members into two groups and , where . The difference between the two groups is only in how the ensemble members are initialized. As will become apparent later, this division serves as a convenient way of extending a catastrophic divergence result to an arbitrary even K.
Emergence of Catastrophic Filter Divergence
When filtering a dissipative nonlinear system with EnKF or ESRF, the root of catastrophic filter divergence must be in the analysis step. The dissipation guarantees that the forecast step will reduce the energy of the ensemble, so the only opportunity to increase the energy is in the analysis step. In fact, the Kalman update formulas, Eqs. 2 and 4, have no general mechanisms that preserve the energy, and we will see that alignment of forecast ensemble members can lead to large increases of energy in the analysis step.
In refs. 7 and 8, the authors argue that catastrophic filter divergence is strongly associated with alignment of forecast ensemble members. Specifically, in ref. 8 the forecast model is a five-dimensional Lorenz-96 model with one observed variable. Alignment of the forecast ensemble is caused by stiffness in the forecast ordinary differential equation, as evidenced by a strongly negative Lyapunov exponent. When the ensemble aligns in a subspace perpendicular to the observed direction, the analysis update can shift the ensemble (within the subspace) to points that lie on higher energy trajectories of the forecast model. Because the observation is perpendicular to the ensemble subspace, the observation does not contain any useful information and hence the analysis update can degrade the performance of the filter. In the next forecast step, the stiffness of the forecast leads to realignment of the ensemble, but on much higher energy trajectories than in the previous forecast step.
In the next section, we construct a simple and concrete forecast model that exhibits the energy growth described above. By an appropriate choice of parameters in the model, we can further prove rigorously that catastrophic filter divergence occurs.
A Forecast and Observational Model
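The forecast model $\Psi$ is defined as the composition of two maps, a linear rotation map $\Psi_{\mathrm{rot}}$ and a “locking map” $\Psi_{\mathrm{lock}}$. Hence, from this point on we will refer to $\Psi$ as the rotation-lock map. The map $\Psi_{\mathrm{rot}}$ is simply the linear transformation
$$\Psi_{\mathrm{rot}}(x,y) = \rho\begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}$$
for some angle $\theta$ and with $\rho\in(0,1)$. Hence, the dynamics described by $\Psi_{\mathrm{rot}}$ is anticlockwise rotation by angle θ with attraction toward the origin, with contraction parameter ρ. The map $\Psi_{\mathrm{rot}}$ defines the macroscopic dynamics of $\Psi$ and moreover ensures the dissipation of energy for $\Psi$. The locking map $\Psi_{\mathrm{lock}}$ is defined by mapping the input vector to the nearest member of the grid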
To be precise, we have
where r is the function that rounds off a real number to the nearest integer, with the “tie-breaker” constraint for any integer n. The map refines the microscopic structure of ; it generates the stiffness that forces the ensemble to align perfectly in a nontrivial subspace of the state space. In this 2D setting, we will see that the nontrivial subspace is simply a vertical line.
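The forecast model is then given by
$$\Psi = \Psi_{\mathrm{lock}}\circ\Psi_{\mathrm{rot}}, \tag{5}$$
where $\Psi_{\mathrm{rot}}$ is the rotation map and $\Psi_{\mathrm{lock}}$ is the locking map; hence we first apply the rotation and then lock the resulting vector to the grid. The model is given by
$$U_n = \Psi(U_{n-1}).$$
We will typically choose $U_0 = (0,0)$ as initialization for our truth model, which, due to the definition of the round-off function r, is still a fixed point for the dynamics $\Psi$.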
At large scales, the locking map makes very little difference to the qualitative and quantitative behavior of the forecast. Indeed, the following result shows that the forecast model still satisfies energy dissipation and consequently has the absorbing ball property (9). The proof is contained in Supporting Information.
Proposition 1.
The nonlinear system generated by the rotation-lock map in 5 satisfies an energy principle:
for all and hence satisfies the absorbing ball property. Moreover, the origin is a fixed point of .
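The qualitative content of Proposition 1 can be explored numerically. The following Python sketch assumes, purely for illustration, that the locking map rounds each coordinate to the nearest integer (the grid Z^2, with NumPy's round-half-to-even tie-breaking) and uses illustrative values of θ and ρ; the grid, the tie-breaking rule, and the parameter values used in the article may differ, and all function names are hypothetical. Under this assumption the rounding error is at most √(1/2) in norm, so each step satisfies |Ψ(u)| ≤ ρ|u| + √(1/2), which is a dissipation bound of the same flavor as the energy principle.

```python
import numpy as np

def rotation_map(u, theta, rho):
    """Anticlockwise rotation by theta combined with contraction by rho (0 < rho < 1)."""
    c, s = np.cos(theta), np.sin(theta)
    return rho * np.array([[c, -s], [s, c]]) @ u

def locking_map(u):
    """ASSUMPTION: snap to the nearest point of the integer grid Z^2.
    np.rint breaks ties toward the even integer, which need not match the article."""
    return np.rint(u)

def rotation_lock(u, theta, rho):
    """Forecast map: rotate first, then lock to the grid."""
    return locking_map(rotation_map(u, theta, rho))

theta, rho = 1.5, 0.5                  # illustrative values, not the article's parameters
start = np.array([1000.0, -700.0])     # a state far from the origin
u = start.copy()
norms = []
for _ in range(100):
    u = rotation_lock(u, theta, rho)
    norms.append(np.linalg.norm(u))
```

Starting far from the origin, the recorded norms decay until the state enters a small ball around the origin and stays there, while the origin itself is mapped to itself.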
The observational model is of the standard form , with the observation matrix
and where the observation noise are iid random variables. Hence, the observation matrix is full rank, but when ε is small the condition number becomes quite large. In particular, the y observation becomes very poor when ε is small. When the forecast ensemble is aligned in the y direction with an erroneous x coordinate, the Kalman update with matrix H cannot correct this error but will instead magnify the error and transport it to the y coordinate. This is the catalyst for the exponential explosion of the EnKF/EAKF ensembles.
The Energy Blow-Up Cycle
With the rotation-lock dynamics concretely defined, the mechanism behind filter divergence can now be explained graphically. In Fig. 1 we illustrate one cycle of this mechanism; each iteration of the cycle leads to an exponential increase in the ensemble energy. In this illustration there are only two ensemble members, hence we will simply denote them and . The same picture holds equally well for the case with arbitrary members and will be explained at the end of the section. The truth is chosen to be the trivial solution for all times n.
In Fig. 1A we show the forecast ensemble (green circles) and the truth signal (red circle). The forecast ensemble members have respective coordinates and . Owing to the locking map it is natural for the forecast ensemble to be aligned in such a way. In Fig. 1B, an observation is made and the analysis step is performed; hence, the green dots now represent the updated posterior ensemble members. Owing to the configuration of the ensemble and the choice of observation matrix, the actual location of the observation is not particularly relevant. In fact, the analysis step will always move the ensemble members downward within the space spanned by the ensemble (black dashed line) by an amount approximately proportional to . Indeed, from the analysis update (Eq. 2), we can explicitly calculate
where we ignore terms. The details of this computation are given in Supporting Information before Eq. S2 with . In Fig. 1C, we perform the first half of the forecast map by applying the rotation . Here must be chosen in such a way that and get rotated to the attracting region of some grid points , , respectively. In Fig. 1D, we complete the forecast step by applying . Because the rotation leaves the ensemble members in the appropriate attracting regions, we will have and . Hence, we are in a position similar to that in Fig. 1A, but having increased the value of the x coordinate in both ensembles. Each time this cycle occurs, the energy of the ensemble increases exponentially. By tuning the algorithm, we can make this cycle repeat itself many times, resulting in catastrophic filter divergence.
In the case of more than two ensemble members, , the picture is largely unchanged. In Fig. 1A the forecast ensemble members are clustered together at the upper green dot and at the lower green dot. As with the two member case, the locking mechanism makes this a natural forecast configuration. In Fig. 1B, when the analysis step is performed, the ensemble members are all shifted down along the ensemble subspace (black dashed line), but now the clusters break up due to the presence of additional perturbative noise. However, the model is tuned in such a way that the declustering will be undone by the locking map applied in Fig. 1D, and hence the ensemble returns to the same initial picture, but on higher energy trajectories.
Mathematically Rigorous Divergence
In this section, we will specify the precise choices of parameters in the model that guarantee catastrophic filter divergence and we will list the rigorous mathematical results. The proofs can be found in Supporting Information.
Before giving the precise statement of catastrophic filter divergence, we first explain the flavor of the result. Let be an arbitrary integer and let . We will show that there is an exponentially increasing sequence , such that with probability the EnKF/ESRF ensemble is bounded below by for every . In particular, by choosing parameters appropriately, we can ensure that the ensemble will diverge exponentially quickly for an arbitrarily long time N and with arbitrarily high probability .
We will parameterize the forecast model and EnKF/ESRF algorithm in the following way. First, we pick the angle θ such that
This choice of rotation ensures the ensemble transportation illustrated by Fig. 1C takes place. We also require that ρ be a rational number with and that , but . The requirement of rationality is a technicality in the proof that is clearly of no consequence for numerical implementation. Both the rationality condition and the restrictions on can be replaced with a much more general (but much more technical) condition. In particular, one can use any and replace the rationality condition with a more specific -dependent condition. For the sake of exposition we will defer the fully general treatment to Supporting Information.
We define the truth signal with and hence for all . The posterior ensemble members will be initialized as
| [6] |
with any satisfying and with any . This choice of initialization ensures the ensemble starts from Fig. 1C and has iterative exponential growth. Owing to the freedom in and , this initialization is not a pathological case, but rather a positive volume set. The exponentially increasing positive sequence is defined by the recursion
With the above choice of it is easy to check that is indeed increasing exponentially (verified in Supporting Information). In the following, denotes the probability measure for the probability space on which the observational noise is defined.
Theorem 2.
Let denote either the EnKF or ESRF ensemble with the forecast model and initialization described above. For any integer and any , we can find such that for any we have
In particular, with probability the ensemble is increasing exponentially with rate at least for at least N consecutive steps.
The proof works exactly as the illustration in Fig. 1. The parameters have been chosen in such a way that the cycle described in Fig. 1 will repeat itself N times, with probability . The details are given in Supporting Information.
Numerical Evidence
Here we use elementary numerical experiments to demonstrate the claimed catastrophic filter divergence phenomenon. We use EnKF ensemble members to filter the rotation-lock system with contraction ratio . With , the angle of rotation θ converges to . From Proposition 1 one can deduce that the attractors of lie inside the bounded set
The true model is initialized at the fixed point . If the EnKF is working properly, the filter ensemble will be attracted to 0 or at least stay close to .
According to Theorem 2, with ε sufficiently small, if we start the ensemble from positions given by 6 with and , the filter ensemble will diverge to machine infinity exponentially fast for a finite number of steps N, with high probability. Here we test the claim with a range of ε and . Fig. 2 shows a log-plot of the energy of the first ensemble member against iteration count n with ε taken from the range . The dashed line indicates the upper bound for the attractors of the forecast model . The experiment confirms the claim of Theorem 2; in Fig. 2, for ε that are sufficiently small (), the energy of the ensemble grows exponentially fast for significantly many iterations (>70) to an extremely high level ($10^{15}$ to $10^{21}$), despite the attractors of the system being in a bounded region (with energy ≤25.31). With the exception of , taking ε smaller leads to a longer continuation of exponential growth, which is in agreement with Theorem 2. In the case, the ensemble does accurately track the true signal and no divergence occurs. This is a testament to the fact that, even though moderately small choices, like , can lead to exponential growth to several orders of magnitude, it cannot be guaranteed unless ε is taken very small.
Fig. 2.
Exponential energy growth of one ensemble member of EnKF. The rotation-lock model is filtered using ensembles, with initialization , and ε takes values from 0.1 to 0.002.
In Fig. 3 we show a log-plot of a similar experiment, but with fixed and chosen from the range . On closer examination, even though the filter is stable with , choosing an initialization that is only one or two orders of magnitude higher can lead to many orders of magnitude in exponential growth.
Fig. 3.
Exponential energy growth of one ensemble member of EnKF. The rotation-lock system is filtered using ensembles, and , and takes values from 10 to 500.
We have also tested different combinations of with ε. For small ε such as , the long-lasting exponential growth of ensemble energy is extremely stable with respect to changes in other parameters. For relatively large ε, the exponential growth phenomenon is rather unstable. Take for example. In the setting of Fig. 2, there is no filter divergence, but when the initialization parameter is chosen differently from 20 to 500, there is exponential growth of length ranging from 15 to 90 steps. As above, this is another testament to the fact that, although sustained exponential growth occurs with moderately small ε, it must be taken very small to guarantee filter divergence.
The numerical experiments above have also been performed with both EAKF and ETKF methods and the results are unchanged. In the EAKF setting, one must make the seemingly innocuous assumption that the diagonalization of an already diagonal matrix is the trivial one . In some cases, numerical packages will not adhere to this rule. Interestingly, a nontrivial choice of diagonalization will lead to stabilization of the filter in the EAKF case.
Discussion
In the preceding text we have constructed an elementary forecast model that exhibits the absorbing ball property and that nevertheless leads to drastic malfunction of ensemble-based Kalman filters. In particular, even when the true model signal is stationary at the origin, the filter estimate diverges exponentially fast to infinity in a phenomenon known as catastrophic filter divergence (7, 8). The mechanism that creates this filter malfunction is very intuitive and is illustrated by a simple picture. Moreover, the mechanism builds on generic dynamical properties that are true of much more complex dissipative systems. Namely, the locking mechanism present in our model can be reproduced in more general models via stiffness due to strongly negative Lyapunov exponents, as present in the five-dimensional Lorenz-96 model (8). Our claims are not only intuitively clear but are backed by rigorous results guaranteeing that the filter will diverge with arbitrarily high probability. The theoretical results are in complete agreement with the numerical experiments provided.
The benefits of this work extend beyond simply proving that catastrophic filter divergence is a genuine mathematical phenomenon and not a numerical artifact. By documenting a simple mechanism that leads to catastrophic filter divergence, we better understand what types of systems are susceptible to malfunction. In particular, the user should be cautious of catastrophic filter divergence in unstable nonlinear systems with strongly negative Lyapunov exponents, because this can lead to an ensemble alignment mechanism, as found in refs. 7 and 8. An interesting project for future research is to develop the intuition illustrated here into a test to identify and correct catastrophic filter divergence before it destroys the state estimate entirely.
Moreover, the present article implies that the stability properties of ensemble-based filters, including boundedness on an infinite time horizon and geometric ergodicity, depend delicately on both the dynamical properties of the nonlinear system and the form of observations. In fact, the authors have discovered a simple criterion for nonlinear dissipative systems that guarantees the prescribed stability properties. This criterion is known as observable energy dissipation and essentially requires that the model dissipate energy in the observed directions. The present article suggests that this criterion is sharp; the rotation-locking map is an example of a forecast model that satisfies an energy criterion but not an observable energy criterion and for which ensemble-based methods exhibit no stability properties whatsoever. The reason this model–filter combination does not satisfy the observable energy criterion is simply the large condition number of the matrix H. More details on the criterion and the relation to the condition number can be found in ref. 11.
The catastrophic filter divergence experienced here can be easily averted through additive variance inflation. Intuitively, additive variance inflation will cause the ensemble to dealign itself and the cycle described by Fig. 1 will not iterate. In ref. 12, the authors have used the intuition presented in this article to design a notion of adaptive additive covariance inflation that not only guarantees stability of ensemble-based filters for the above forecast model, but indeed for any model that possesses an energy dissipation principle.
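The effect of additive inflation on an aligned ensemble can be illustrated with a short Python sketch. It assumes one common form of additive covariance inflation, replacing the forecast covariance C by C + λI with a constant λ > 0; this is not the adaptive scheme of ref. 12, and `lam`, `H`, the ensemble values, and the function names are hypothetical. Unit observation noise covariance is assumed in the gain.

```python
import numpy as np

def forecast_covariance(forecast):
    """Empirical covariance of a (K, d) ensemble, normalized by K - 1."""
    K = forecast.shape[0]
    spread = forecast - forecast.mean(axis=0)
    return spread.T @ spread / (K - 1)

def inflate_additively(C, lam):
    """Additive covariance inflation C + lam * I (lam is a hypothetical tuning constant)."""
    return C + lam * np.eye(C.shape[0])

def kalman_gain(C, H):
    """Kalman gain C H^T (I + H C H^T)^{-1}, assuming unit observation noise covariance."""
    S = np.eye(H.shape[0]) + H @ C @ H.T
    return np.linalg.solve(S, H @ C).T

aligned = np.array([[5.0, 1.0], [5.0, -1.0]])   # ensemble aligned on the line x = 5
H = np.array([[1.0, 0.0], [0.5, 0.2]])          # hypothetical observation matrix
C = forecast_covariance(aligned)
C_inflated = inflate_additively(C, lam=0.1)

gain_aligned = kalman_gain(C, H)                # first row vanishes: x cannot be corrected
gain_inflated = kalman_gain(C_inflated, H)      # first row is nonzero after inflation
```

Without inflation the covariance of the aligned ensemble has rank one and the gain has no component acting on the x coordinate, so the erroneous x value is never corrected. After inflation the covariance is full rank and the observations can pull the x coordinate back toward the truth.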
Acknowledgments
D.K. is supported as a Courant instructor. This work was supported by Multidisciplinary University Research Initiative Grant N-000-1412-10912 (to A.J.M. and X.T.T.). A.J.M. is the principal investigator, and X.T.T. is supported as a postdoctoral fellow.
Footnotes
The authors declare no conflict of interest.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1511063112/-/DCSupplemental.
References
- 1. Evensen G. The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dyn. 2003;53(4):343–367.
- 2. Bishop CH, Etherton BJ, Majumdar SJ. Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon Weather Rev. 2001;129(3):420–436.
- 3. Anderson JL. An ensemble adjustment Kalman filter for data assimilation. Mon Weather Rev. 2001;129(12):2884–2903.
- 4. Majda AJ, Harlim J. Filtering Complex Turbulent Systems. Cambridge Univ Press; Cambridge, UK: 2012.
- 5. Kalnay E. Atmospheric Modeling, Data Assimilation, and Predictability. Cambridge Univ Press; Cambridge, UK: 2003.
- 6. Kelly D, Law KJ, Stuart AM. Well-posedness and accuracy of the ensemble Kalman filter in discrete and continuous time. Nonlinearity. 2014;27:2579–2603.
- 7. Majda AJ, Harlim J. Catastrophic filter divergence in filtering nonlinear dissipative systems. Commun Math Sci. 2008;8:27–43.
- 8. Gottwald G, Majda AJ. A mechanism for catastrophic filter divergence in data assimilation for sparse observation networks. Nonlinear Process Geophys. 2013;20:705–712.
- 9. Temam R. Infinite Dimensional Dynamical Systems in Mechanics and Physics. Applied Mathematical Sciences, Vol 68, eds Marsden JE, Sirovich L, John F. Springer; New York: 1997.
- 10. Grote MJ, Majda AJ. Stable time filtering of strongly unstable spatially extended systems. Proc Natl Acad Sci USA. 2006;103(20):7548–7553. doi: 10.1073/pnas.0602385103.
- 11. Tong XT, Majda AJ, Kelly D. Nonlinear stability and ergodicity of ensemble based Kalman filters. 2015. arXiv:1507.08307.
- 12. Tong XT, Majda AJ, Kelly D. Nonlinear stability of the ensemble Kalman filter with adaptive covariance inflation. 2015. arXiv:1507.08319.