Published in final edited form as: IEEE Trans Signal Process. 2022 Jan 14;70:686–700. doi: 10.1109/tsp.2022.3143471

Projection Filtering with Observed State Increments with Applications in Continuous-Time Circular Filtering

Anna Kutschireiter 1,*, Luke Rast 1, Jan Drugowitsch 1

Abstract

Angular path integration is the ability of a system to estimate its own heading direction from potentially noisy angular velocity (or increment) observations. Non-probabilistic algorithms for angular path integration, which rely on a summation of these noisy increments, do not appropriately take into account the reliability of such observations, which is essential for appropriately weighing one’s current heading direction estimate against incoming information. In a probabilistic setting, angular path integration can be formulated as a continuous-time nonlinear filtering problem (circular filtering) with observed state increments. The circular symmetry of heading direction makes this inference task inherently nonlinear, thereby precluding the use of popular inference algorithms such as Kalman filters, rendering the problem analytically inaccessible. Here, we derive an approximate solution to circular continuous-time filtering, which integrates state increment observations while maintaining a fixed representation through both state propagation and observational updates. Specifically, we extend the established projection-filtering method to account for observed state increments and apply this framework to the circular filtering problem. We further propose a generative model for continuous-time angular-valued direct observations of the hidden state, which we integrate seamlessly into the projection filter. Applying the resulting scheme to a model of probabilistic angular path integration, we derive an algorithm for circular filtering, which we term the circular Kalman filter. Importantly, this algorithm is analytically accessible, interpretable, and outperforms an alternative filter based on a Gaussian approximation.

Keywords: Bayesian methods, nonlinear filtering, circular filtering, sensor fusion, continuous-time estimation, stochastic processes

I. Introduction

A compass is an immensely useful tool for a traveler trying to find their way in a barren and featureless landscape. Absent such a tool, the traveler must employ dead reckoning, using what they roughly know about how often and how much they have turned, and summing up those turns to maintain an internal sense of their spatial orientation. Angular path integration, i.e., estimation of heading direction or orientation based on angular self-motion cues, plays an essential role in spatial navigation of humans, other animals and robots [1, 2]. Imperfect sensors make angular path integration an inherently noisy process, and inevitably lead to an accumulation of error in the heading estimate over time. Other cues, such as those from visual landmarks, can help correct the estimate’s error, despite being also noisy and ambiguous. Importantly, properly combining path integration with these external cues requires a reliability-weighted update of the orientation estimate. Computing with uncertainties in such a strategic way is a hallmark of dynamic Bayesian inference, and calls for a probabilistic description. Our goal in this work is to derive a dynamic probabilistic algorithm for angular path integration.

It is well known that many organisms are able to maintain an internal compass which they update by self-motion cues. Since the discovery of orientation-selective head-direction cells in rats [3], and, more recently, the heading direction circuit in Drosophila [4], theoretical efforts to unravel the mechanism of angular path integration in the brain have highlighted the role of angular velocity observations of self-initiated turns [5, 6], e.g., from proprioceptive or vestibular feedback, or from visual flow. Current theories suggest that these biological systems implement angular path integration by neural network motifs called ring attractors [7, 6]. Such networks maintain a heading direction estimate through sustained neural activity, but lack the ability to simultaneously represent the estimate’s certainty. However, the question of whether such biological systems indeed only operate with single point estimates or instead perform probabilistic inference has been hampered by the lack of a probabilistic algorithm for path integration in the brain. The reason for this is the complex set of conditions that are required of such an algorithm: (i) the state-space is circular, (ii) the path integration must operate with a continuous time stream of inputs, (iii) it must maintain a fixed representation of the underlying probability distribution, in line with the expectation that, within a certain area of computation, the brain maintains a similarly fixed representation e.g., in terms of a parametric [8] or sampling-based [9] representation, and (iv) in addition to angular velocity observations, (noisy) direct angular observations (e.g., visual landmarks) may be present, which need to be integrated accordingly.

One approach to designing an algorithm that satisfies these conditions is to consider it in the broader context of continuous-time filtering, which aims to continuously update a posterior distribution over a dynamically evolving hidden state variable from noisy observational data. From condition (i), a circular state-space implies that the underlying filtering task is nonlinear, which precludes the use of popular linear schemes such as the Kalman filter [10, 11]. Furthermore, the solution to this so-called circular filtering problem is analytically intractable [12], and needs to be approximated. The approximation we derive here goes beyond existing circular filtering algorithms [13, 14, 15] (see also the review by Kurz et al. [12]) by both considering increment observations — like observations of angular velocity — and by addressing a continuous stream of observations (condition (ii)). Furthermore, while these previous approaches changed the representations used between prediction and update steps, ours uses a fixed representation, satisfying our condition (iii).

A recent promising approach to circular filtering [16] supports continuous-time state transitions, but is limited to discrete-time and direct (rather than increment) observations. Their method is based on projection filtering [17, 18], a rigorous approach that combines nonlinear filtering with information geometry. The idea behind projection filtering is to approximate the posterior between consecutive discrete-time observations with a parametric distribution. This approximation is chosen to minimize the distance between the true and the approximated posterior, as measured by the Fisher metric. By updating only the values of the parameters, this approach automatically keeps a fixed representation for the posterior in terms of a parametric distribution. Furthermore, if the approximated posterior and the emission probabilities of the observations are conjugate, updating posterior parameters in light of further discrete-time direct observations is straightforward. Projection filtering can be generalized to hidden processes that evolve on arbitrary submanifolds of Euclidean space [19], which makes it applicable to the circular filtering problem. Two challenges hamper the direct application of projection filters to angular path integration. First, no variants currently exist that handle increment observations, either in discrete or continuous time, as we would require to process angular velocity observations. In fact, increment observations have generally received little attention in the filtering literature (but see [20]). Second, there is currently no framework that combines projection-filtering with angular-valued continuous-time observations. We will address both challenges in this work.

We introduce a novel continuous-time nonlinear filtering algorithm based on projection filtering, that includes increment observations and can be applied to circular filtering and to angular path integration, and that meets conditions (i)-(iv) outlined above. To do so, we first describe the general nonlinear filtering problem with observed increment observations in Euclidean space in Section II. In Section III, we review the projection filtering framework [17, 18] as an approximate solution for nonlinear filtering, and extend this approach to account for increment observations. We demonstrate that applying this framework to a linear filtering problem recovers the generalized Kalman filter. In Section IV, we revisit the continuous-time circular filtering problem. Therein, we first derive a probabilistic algorithm for angular path integration, i.e., when only increment observations are present. We then account for direct angular-valued observations, in addition to increment observations, by proposing a generative model based on a constant information-rate criterion that supports seamless inclusion into the filtering algorithm. Combining all of the above, we finally retrieve, as a special case of the general framework, a circular filtering algorithm for Gaussian-type increment and angular direct observations, which we term the circular Kalman filter. We demonstrate in numerical simulations that this algorithm performs comparably to an asymptotically-exact particle filter, and outperforms a Gaussian approximation in the estimation of both heading direction as well as its associated certainty.

II. The filtering problem with increment observations

We consider multivariate filtering with observations generated by increments of the hidden state, rather than the hidden state itself. We assume that the hidden state variable $X_t \in \mathbb{R}^N$ evolves according to a stochastic differential equation (SDE) of the form:

$dX_t = f(X_t, t)\,dt + \Sigma_x^{1/2}\,dW_t$, (1)

with $W_t$ an $\mathbb{R}^N$-Brownian motion (BM) process, a vector-valued drift function $f: \mathbb{R}^N \times \mathbb{R} \to \mathbb{R}^N$, and a matrix $\Sigma_x^{1/2} \in \mathbb{R}^{N \times N}$, which determines the error covariance of the hidden state process. In the following, we will use the shorthand $f_t(x) := f(x,t)$ or skip its argument completely (whenever there is no notational ambiguity). The density $\tilde{p}_t(x) := p(X_t = x)$ of this stochastic process evolves according to a partial differential equation, the Fokker–Planck equation (FPE):

$d\tilde{p}_t = \mathcal{L}^\dagger[\tilde{p}_t]\,dt$, (2)

with

$\mathcal{L}^\dagger[\tilde{p}_t(x)] = -\sum_{i=1}^N \frac{\partial}{\partial x_i}\left(f_i(x,t)\,\tilde{p}_t(x)\right) + \frac{1}{2}\sum_{i,j}^N (\Sigma_x)_{ij}\,\frac{\partial^2}{\partial x_i \partial x_j}\tilde{p}_t(x)$. (3)

Equivalently, expectations of a scalar test function $\phi_t := \phi(X_t)$ with respect to $\tilde{p}_t(x)$ evolve according to

$d\mathbb{E}[\phi_t] = \mathbb{E}[\mathcal{L}[\phi_t]]\,dt$, (4)

with the propagator

$\mathcal{L}[\phi_t(x)] = f_t(x)^T\,\nabla\phi_t(x) + \frac{1}{2}\mathrm{Tr}\left(H_\phi\,\Sigma_x\right)$, (5)

where $\nabla$ denotes the gradient with respect to $x$, $\mathrm{Tr}(\cdot)$ denotes the trace operator, and $H_\phi$ is the Hessian matrix with $(H_\phi)_{ij} = \frac{\partial^2 \phi_t}{\partial x_i \partial x_j}$. Note that $\mathcal{L}$ and $\mathcal{L}^\dagger$ are adjoint operators with respect to the $L^2$ inner product.

We assume that the hidden state process $X_t$ in (1) cannot be observed directly, but instead is partially observed through the process $dU_t$, which is governed by the infinitesimal state increments $dX_t$:

$dU_t = C\,dX_t + \Sigma_u^{1/2}\,dV_t = C f_t(X_t)\,dt + C\Sigma_x^{1/2}\,dW_t + \Sigma_u^{1/2}\,dV_t$, (6)

where $V_t$ is an $\mathbb{R}^M$-BM process, and $C \in \mathbb{R}^{M \times N}$. The matrix $\Sigma_u \in \mathbb{R}^{M \times M}$ determines the level of noise in the increment observations. Due to its dependency on the increment $dX_t$, the observation process $dU_t$ is effectively governed by two noise sources, $dW_t$ and $dV_t$. The first is correlated with the noise in the hidden state dynamics; the second is independent of it. This is in contrast to classical filtering problems, which consider the noise in hidden states and observations to be independent (cf. Appendix V-A).
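As a concrete illustration of this generative model, the following minimal sketch (our addition, not the authors' code) simulates Eqs. (1) and (6) with an Euler–Maruyama discretization; the linear drift and all parameter values are illustrative assumptions.

```python
# Minimal sketch: simulate the state (1) and increment observations (6)
# with Euler--Maruyama. Drift, matrices and parameters are illustrative.
import numpy as np

rng = np.random.default_rng(seed=0)
N, M = 2, 2                      # state / observation dimensions
dt, T = 1e-3, 10.0

A = np.array([[-1.0, 0.5],
              [-0.5, -1.0]])     # assumed linear drift f(x, t) = A x
C = np.eye(M, N)
Sx_half = 0.3 * np.eye(N)        # Sigma_x^{1/2}
Su_half = 0.1 * np.eye(M)        # Sigma_u^{1/2}

X = np.zeros(N)
U = np.zeros(M)
for _ in range(int(T / dt)):
    dW = np.sqrt(dt) * rng.standard_normal(N)
    dV = np.sqrt(dt) * rng.standard_normal(M)
    dX = A @ X * dt + Sx_half @ dW   # state increment, Eq. (1)
    U += C @ dX + Su_half @ dV       # increment observation dU_t, Eq. (6);
    X += dX                          # note dU shares the noise dW with dX
```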

If $U_{0:t} = \{U_\tau : \tau < t\}$ denotes the filtration generated by the process $U_t$, then the filtering problem is to compute the posterior density $p_t(x) = p(X_t = x \mid U_{0:t})$ or, equivalently, the posterior expectation $\mathbb{E}[\phi_t] := \mathbb{E}[\phi(X_t) \mid U_{0:t}]$. In an uncorrelated noise setting, the Kushner–Stratonovich equation describes the temporal evolution of the posterior expectation $\mathbb{E}[\phi_t]$ [21, 22, 23] (Appendix V-A). To find a similar formal solution for filtering with observed state increments (i.e., correlated noise), we can account for the correlations between observations and the hidden state by introducing a slight modification in the Kushner–Stratonovich equation, resulting in a generalized Kushner–Stratonovich equation (gKSE) [23, Chapter 3.8] (cf. Nüsken et al. [20]). For our particular problem, we show in Appendix V-B that the dynamics of posterior expectations satisfy

$d\mathbb{E}[\phi_t] = \mathbb{E}[\mathcal{L}[\phi_t]]\,dt + \left(\mathrm{cov}(\phi_t, f_t) + \Sigma_x\,\mathbb{E}[\nabla\phi_t]\right)^T C^T\tilde{\Sigma}_u^{-1}\left(dU_t - C\,\mathbb{E}[f_t]\,dt\right)$, (7)

with $\mathrm{cov}(\phi_t, f_t) = \mathbb{E}[\phi_t f_t] - \mathbb{E}[\phi_t]\,\mathbb{E}[f_t]$, and

$\tilde{\Sigma}_u = C\Sigma_x C^T + \Sigma_u$. (8)

Note that the right-hand side of (7) does not only depend on $\mathbb{E}[\phi_t]$, but also on $\mathbb{E}[\phi_t f_t]$, $\mathbb{E}[\nabla\phi_t]$ and other expectations of potentially nonlinear functions, which in general cannot be computed from $\mathbb{E}[\phi_t]$ alone. Therefore, in order to completely characterize the probabilistic solution, we would need one equation (7) for every moment of the posterior, with each moment corresponding to a specific choice of $\phi_t$. Thus, except for a few very specific generative models, such as linear ones, the dynamics of posterior expectations in (7) will result in a system of an infinite number of coupled SDEs, which in general is analytically intractable.

Remark 1.

Equation (7) is the gKSE if state increments are the only available type of observations. In Appendix V-B, we extend it to the Kushner–Stratonovich equation when both state increments and (Gaussian-type) direct observations are present.

In what follows, we will approximately solve the continuous-time filtering problem with observed state increments by projecting the gKSE onto a submanifold of parametric densities with a finite number of parameters, resulting in a finite system of coupled SDEs for these parameters. For this, we will first review the general projection method, which so far has only been applied to classical filtering problems with uncorrelated state and observation noise, and then extend this framework to filtering problems with observed hidden state increments.

III. Projection filtering for observed continuous-time state increments

A. The general projection filtering method

Projection filtering is a method for approximate nonlinear filtering that is based on differential geometry. In this subsection, we outline briefly the differential geometric setup and derivation of the projection filter, and refer the reader to the seminal papers on projection filters [17, 24, 18], or the intuitive introduction to the subject matter presented in [25], for more detailed derivations and in-depth discussion of the method.

In general, we can interpret the solution of a (stochastic) differential equation (such as the FPE (2)) as a (stochastic) vector field on an infinite-dimensional function space $\mathcal{M}$ of probability density functions $p_t$. If the vector field is stochastic (which is usually the case in nonlinear filtering [22]), we consider it to be given in Stratonovich form

$dp_t = \mathcal{A}[p_t]\,dt + \mathcal{B}[p_t] \circ dU_t$, (9)

which is the standard choice for stochastic calculus on manifolds. Let us further assume a parametrization $p_\theta(x) := p(x; \theta)$ with a finite set of parameters $\theta = \{\theta_1, \dots, \theta_m\} \in \Theta$, such that the solution to (9) is reasonably well approximated by these parametrized densities.

Projection filtering provides a solution to the filtering problem by evolving the parameters $\theta$, thereby constraining the approximate posterior to evolve on a finite-dimensional submanifold $S = \{p_\theta(x);\ \theta \in \Theta\}$ of $\mathcal{M}$, rather than on $\mathcal{M}$ itself. This implies that we have to project the vector field in (9) onto the tangent space $T_\theta S$,

$T_\theta S = \mathrm{Span}\left(\frac{\partial p_\theta}{\partial\theta_1}, \dots, \frac{\partial p_\theta}{\partial\theta_m}\right) \subset L^1$, (10)

where the $\frac{\partial p_\theta}{\partial\theta_i}$ denote the basis vectors of this tangent space. Intuitively, an orthogonal projection minimizes at each time step the distance between the true posterior $p_t$ and its approximation $p_\theta$ with respect to a Riemannian metric which, for probability distributions, corresponds to the Fisher metric [26]

$g_{ij} = \mathbb{E}_{p_\theta}\left[\frac{\partial \log p_\theta(x)}{\partial\theta_i}\,\frac{\partial \log p_\theta(x)}{\partial\theta_j}\right]$. (11)

This allows us to use the general orthogonal projection formula

$\Pi_\theta[Z] = \sum_{ij} g^{ij}\left\langle Z,\ \frac{\partial p_\theta}{\partial\theta_i}\right\rangle_\theta \frac{\partial p_\theta}{\partial\theta_j}$, (12)

where $\Pi_\theta$ denotes the projection operator, the $g^{ij}$ are the components of the inverse Fisher metric, and

$\langle Z_1, Z_2\rangle_\theta := \int_\Omega dx\,\frac{Z_1(x)\,Z_2(x)}{p_\theta(x)}, \qquad Z_1, Z_2 \in T_\theta S$ (13)

is the inner product that is associated with the Fisher metric (see Proposition 1 in Appendix V-C).

To find the parameter updates resulting from this projection, we apply the projection (12) to the dynamics of the probability density $p_t$ in (9). Since our approximate dynamics evolve along tangent vectors to the manifold parametrized by $\theta$, the posterior $p_\theta$ will stay on this manifold. Hence, the left-hand side of (9) can be written in terms of the basis vectors of $T_\theta S$ by using the chain rule:

$\Pi_\theta[dp_\theta] = \sum_j \frac{\partial p_\theta}{\partial\theta_j}\,d\theta_j$. (14)

Further, by letting the projection act on the right-hand side of (9), and consecutively comparing coefficients in front of the basis vectors $\frac{\partial p_\theta}{\partial\theta_j}$, we find the following Stratonovich SDEs for the parameters of the projected density:

$d\theta_j = \sum_i g^{ij}\left\langle \mathcal{A}[p_\theta],\ \frac{\partial p_\theta}{\partial\theta_i}\right\rangle_\theta dt + \sum_i g^{ij}\left\langle \mathcal{B}[p_\theta],\ \frac{\partial p_\theta}{\partial\theta_i}\right\rangle_\theta \circ dU_t$. (15)

This is the result of Brigo et al. [18, Theorem 4.3].

To facilitate the comparison to the gKSE (7), we further slightly rewrite this SDE:

$d\theta_j = \sum_i g^{ij}\left(\int_\Omega dx\,\mathcal{A}[p_\theta]\,\frac{\partial \log p_\theta}{\partial\theta_i}\right)dt + \sum_i g^{ij}\left(\int_\Omega dx\,\mathcal{B}[p_\theta]\,\frac{\partial \log p_\theta}{\partial\theta_i}\right) \circ dU_t$ (16)
$= \sum_i g^{ij}\,\mathbb{E}_\theta\left[\mathcal{A}^\dagger\left[\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right]dt + \sum_i g^{ij}\,\mathbb{E}_\theta\left[\mathcal{B}^\dagger\left[\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right] \circ dU_t$, (17)

where $\mathbb{E}_\theta[\cdot]$ here denotes the expectation with respect to the projected density $p_\theta$, and $\mathcal{A}^\dagger$ and $\mathcal{B}^\dagger$ denote the adjoints of $\mathcal{A}$ and $\mathcal{B}$ with respect to the $L^2$ scalar product. Rewriting the SDE in such a way allows us to immediately identify the operators $\mathcal{A}^\dagger$ and $\mathcal{B}^\dagger$ on the right-hand side of this equation with the operators used for propagating expectations rather than densities.

To illustrate how to identify the operator $\mathcal{A}^\dagger$ concretely, let us consider the filtering problem without any observations, i.e., a simple diffusion, which is formally solved by the Fokker–Planck equation (2). Noting that the operator $\mathcal{L}^\dagger$ propagates the density through time, we identify $\mathcal{A} = \mathcal{L}^\dagger$ and, for the adjoint, $\mathcal{A}^\dagger = \mathcal{L}$. Thus, the projection can be immediately determined from the time evolution of the expectation in Eq. (4), with $\phi_t = \frac{\partial \log p_\theta}{\partial\theta_i}$:

$d\theta_j = \sum_i g^{ij}\,\mathbb{E}_\theta\left[\mathcal{L}\left[\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right]dt$. (18)

If state increments are observed, expectations are propagated by the gKSE (7), and identification of the operators $\mathcal{A}^\dagger$ and $\mathcal{B}^\dagger$ is possible after transforming the gKSE to Stratonovich form, as we will see in the next section. Since this is usually easier than transforming the generalized Kushner equation for the posterior density into Stratonovich form, Eq. (17) is a more convenient choice for the parameter dynamics than Eq. (15).

Remark 2.

The derivation in the seminal papers on projection filtering [17, 18] follows a slightly different route but leads to the same result (15). We also refer the reader to the very accessible derivation presented in [25].

B. Projection with observed state increments

As we have seen, the adjunction between the SDE evolution operators for densities and for the associated expectations allowed us to use the expectation's evolution equation to derive the projection filter for a specific problem (18). The same applies if the evolution of the density (or, equivalently, that of the expectations) is a stochastic differential equation, as is the case for the KSE and the gKSE (7), as long as these are given in Stratonovich form. This allows us to formulate the projection filter for filtering with observed state increments:

Theorem 1.

The projection filter for the filtering problem with observed state increments is given by the following SDE for the parameters $\theta$ of a projected density $p_\theta$:

$d\theta_j = \sum_i g^{ij}\left[\mathbb{E}_\theta\left[\tilde{\mathcal{L}}\left[\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right] - \frac{1}{2}\left(\mathbb{E}_\theta\left[\left\|\tilde{\Sigma}_u^{-1/2}Cf_t\right\|^2\frac{\partial \log p_\theta}{\partial\theta_i}\right] + \mathbb{E}_\theta\left[\mathrm{Tr}\left(\tilde{\Sigma}_u^{-1}C J_f\Sigma_x C^T\right)\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right)\right]dt + \sum_i g^{ij}\left(\mathbb{E}_\theta\left[\frac{\partial \log p_\theta}{\partial\theta_i}\,f_t\right] + \Sigma_x\,\mathbb{E}_\theta\left[\nabla\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right)^T C^T\tilde{\Sigma}_u^{-1} \circ dU_t$, (19)

with $(J_f)_{ij} = \frac{\partial f_i}{\partial x_j}$ denoting the Jacobian matrix, and with the shorthand modified generator

$\tilde{\mathcal{L}}[\phi_t] := \left(\nabla\phi_t(x)\right)^T C^{-1}\Sigma_u\tilde{\Sigma}_u^{-1}C\,f_t(x) + \frac{1}{2}\mathrm{Tr}\left(H_\phi\,\Sigma_x C^{-1}\Sigma_u\tilde{\Sigma}_u^{-1}C\right)$. (20)
Proof.

As a first step, let us rewrite the gKSE (7) in Stratonovich form (Corollary 3 in V-B):

$d\mathbb{E}[\phi_t] = \left[\mathbb{E}[\tilde{\mathcal{L}}[\phi_t]] - \frac{1}{2}\left(\mathrm{cov}\left(\phi_t,\ \left\|\tilde{\Sigma}_u^{-1/2}Cf_t\right\|^2\right) + \mathrm{Tr}\left(\tilde{\Sigma}_u^{-1}C\,\mathrm{cov}(\phi_t, J_f)\,\Sigma_x C^T\right)\right)\right]dt + \left[\mathrm{cov}(\phi_t, f_t) + \Sigma_x\,\mathbb{E}[\nabla\phi_t]\right]^T C^T\tilde{\Sigma}_u^{-1} \circ dU_t$. (21)

This equation allows us to identify the operators $\mathcal{A}^\dagger$ and $\mathcal{B}^\dagger$ in Eq. (17) as the operations acting on $\phi_t$ in front of $dt$ and $dU_t$, respectively. By letting these operators act on $\phi_t = \frac{\partial \log p_\theta}{\partial\theta_j}$ in Eq. (17), and evaluating the expectations under the projected density $p_\theta$, we obtain SDEs for the desired parameters. These can be further simplified by using

$\mathbb{E}_\theta\left[\frac{\partial \log p_\theta}{\partial\theta_j}\right] = \frac{\partial}{\partial\theta_j}\int_\Omega dx\,p_\theta(x) = 0$,

which yields (19). □

The 1D special case follows directly from (19).

Corollary 1.

For univariate filtering problems, i.e., a filtering problem with $N = M = 1$ in (1) and (6), with $C = c$, $\Sigma_x = \sigma_x^2$, $\Sigma_u = \sigma_u^2$ and $\tilde{\Sigma}_u = \tilde{\sigma}_u^2 = c^2\sigma_x^2 + \sigma_u^2$, the projection filter with observed state increments reads:

$d\theta_j = \sum_i g^{ij}\left[\frac{\sigma_u^2}{\tilde{\sigma}_u^2}\,\mathbb{E}_\theta\left[\mathcal{L}\left[\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right] - \frac{c^2}{2\tilde{\sigma}_u^2}\left(\mathbb{E}_\theta\left[f_t^2\,\frac{\partial \log p_\theta}{\partial\theta_i}\right] + \sigma_x^2\,\mathbb{E}_\theta\left[\frac{\partial f_t}{\partial x}\,\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right)\right]dt + \sum_i g^{ij}\left[\frac{c}{\tilde{\sigma}_u^2}\left(\mathbb{E}_\theta\left[f_t\,\frac{\partial \log p_\theta}{\partial\theta_i}\right] + \sigma_x^2\,\mathbb{E}_\theta\left[\frac{\partial}{\partial x}\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right)\right] \circ dU_t$. (22)

C. Projection on exponential family distributions

Analogous to [18], it is possible to derive explicit filter equations for the natural parameters of a projected exponential family distribution. Consider the following exponential family parametrization:

$p_\theta(x) := \exp\left(\theta^T T(x) - \Psi(\theta)\right)$, (23)

where $\theta$ is the vector of natural or canonical parameters, $T(x)$ is the vector of sufficient statistics, and $\exp(\Psi(\theta))$ is the normalization.

Corollary 2.

The projection filter for the filtering problem with observed state increments is given by the following SDE for the natural parameters $\theta$ of a projected density $p_\theta$ belonging to the exponential family:

$d\theta_j = \sum_i g^{ij}\left[\mathbb{E}_\theta\left[\tilde{\mathcal{L}}[T_i(x)]\right] - \frac{1}{2}\left(\mathbb{E}_\theta\left[\left\|\tilde{\Sigma}_u^{-1/2}Cf_t\right\|^2\left(T_i(x) - \eta_i\right)\right] + \mathbb{E}_\theta\left[\mathrm{Tr}\left(\tilde{\Sigma}_u^{-1}C J_f\Sigma_x C^T\right)\left(T_i(x) - \eta_i\right)\right]\right)\right]dt + \sum_i g^{ij}\left(\mathbb{E}_\theta\left[\left(T_i(x) - \eta_i\right)f_t\right] + \Sigma_x\,\mathbb{E}_\theta\left[\nabla T_i(x)\right]\right)^T C^T\tilde{\Sigma}_u^{-1} \circ dU_t$, (24)

where $\eta_i = \mathbb{E}_\theta[T_i(x)]$ is the posterior expectation of the sufficient statistic $T_i(x)$. The parameters $\eta$ are sometimes referred to as dual or expectation parameters.

Proof.

Making use of the duality relation between natural and expectation parameters for exponential families,

$\frac{\partial}{\partial\theta_i}\log p_\theta(x) = T_i(x) - \frac{\partial}{\partial\theta_i}\Psi(\theta) = T_i(x) - \eta_i$, (25)

and the fact that $\nabla_x\,\eta_i = 0$, Eq. (24) follows directly from the projection filter with observed state increments (19). □

Example 1

(Generalized Kalman–Bucy filter). In order to demonstrate the general approach, let us consider a model with linear state dynamics

$dX_t = aX_t\,dt + \sigma_x\,dW_t$, (26)
$dU_t = c\,dX_t + \sigma_u\,dV_t$. (27)

Here, we will show that a projection on a Gaussian manifold with

$p_\theta(x) = \mathcal{N}(x;\ \mu_t, \sigma_t^2)$ (28)

results in dynamics for the parameters $\mu_t$ and $\sigma_t^2$ that are consistent with the generalized Kalman–Bucy filter for observed state increments [20, Section 4.2]. In fact, since Eqs. (26) and (27) are both linear, the posterior density is Gaussian and thus the projection filter becomes exact. For this particular problem, the projection filter reads:

$d\theta_j = \sum_i g^{ij}\left[\frac{\sigma_u^2}{\tilde{\sigma}_u^2}\,\mathbb{E}_\theta\left[aX_t\,\frac{\partial}{\partial x}\frac{\partial \log p_\theta}{\partial\theta_i} + \frac{\sigma_x^2}{2}\frac{\partial^2}{\partial x^2}\frac{\partial \log p_\theta}{\partial\theta_i}\right] - \frac{c^2 a^2}{2\tilde{\sigma}_u^2}\,\mathbb{E}_\theta\left[X_t^2\,\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right]dt + \sum_i g^{ij}\left[\frac{c}{\tilde{\sigma}_u^2}\left(a\,\mathbb{E}_\theta\left[X_t\,\frac{\partial \log p_\theta}{\partial\theta_i}\right] + \sigma_x^2\,\mathbb{E}_\theta\left[\frac{\partial}{\partial x}\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right)\right] \circ dU_t$. (29)

We will use this to determine SDEs for the parameters $\mu_t$ and $\sigma_t^2$, with $\frac{\partial \log p_\theta}{\partial\mu_t} = \frac{x - \mu_t}{\sigma_t^2}$ and $\frac{\partial \log p_\theta}{\partial(\sigma_t^2)} = -\frac{1}{2\sigma_t^2} + \frac{(x - \mu_t)^2}{2\sigma_t^4}$, respectively, under the Gaussian assumption. First, the components of the Fisher information matrix of a Gaussian parametrized by its expectation parameters are given by

$g_{\mu\mu} = \frac{1}{\sigma_t^2}, \qquad g_{\sigma^2\sigma^2} = \frac{1}{2\sigma_t^4}, \qquad g_{\sigma^2\mu} = g_{\mu\sigma^2} = 0$. (30)

Since the matrix is diagonal, the components of its inverse are $g^{\mu\mu} = g_{\mu\mu}^{-1} = \sigma_t^2$ and $g^{\sigma^2\sigma^2} = g_{\sigma^2\sigma^2}^{-1} = 2\sigma_t^4$. This considerably simplifies the projection in Eq. (29) for the time evolution of $\mu_t$ and $\sigma_t^2$. Explicitly carrying out the expectations in Eq. (29) under the assumed Gaussian density, the SDEs for these parameters read:

$d\mu_t = a\mu_t\,dt + \frac{c}{c^2\sigma_x^2 + \sigma_u^2}\left(a\sigma_t^2 + \sigma_x^2\right)\left(dU_t - ac\,\mu_t\,dt\right)$, (31)
$d\sigma_t^2 = \left[2a\sigma_t^2 + \sigma_x^2 - \frac{c^2}{c^2\sigma_x^2 + \sigma_u^2}\left(a\sigma_t^2 + \sigma_x^2\right)^2\right]dt$. (32)

We found these Itô SDEs from their Stratonovich form by noting, for the first line, that the quadratic variation between $\sigma_t$ and the observation process $U_t$ is zero. In other words, no correction term according to the Wong–Zakai theorem [27] is required, such that both the Itô and Stratonovich forms have the same representation. Note that for a nonlinear generative model this will in general not be the case [18]. Equations (31) and (32) are identical to the generalized Kalman–Bucy filter [20, Eqs. (62) & (65)], thus demonstrating the validity of our approach.

Despite being able to reproduce existing results, the main purpose of the projection filtering approach is to simplify potentially hard filtering problems such that the parameter SDEs become analytically accessible. This becomes particularly useful if a certain parametric form of the posterior is desired, for instance for computational reasons, and can be very appealing if expectations are easily carried out under the assumed posterior and the Fisher matrix is straightforward to invert or even diagonal.
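For concreteness, the following minimal sketch (our addition, under the assumptions of Example 1) integrates the generalized Kalman–Bucy equations (31) and (32) on data generated from Eqs. (26) and (27); all parameter values are illustrative.

```python
# Minimal sketch: generalized Kalman--Bucy filter, Eqs. (31)-(32), run on
# data simulated from Eqs. (26)-(27). Parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(seed=1)
a, c, sig_x, sig_u = -0.5, 1.0, 0.5, 0.2
dt, steps = 1e-3, 10_000
su_tilde2 = c**2 * sig_x**2 + sig_u**2       # \tilde{sigma}_u^2, Eq. (8)

x, mu, var = 0.0, 0.0, 1.0                   # true state; posterior mean/variance
for _ in range(steps):
    # generative model, Eqs. (26)-(27)
    dx = a * x * dt + sig_x * np.sqrt(dt) * rng.standard_normal()
    dU = c * dx + sig_u * np.sqrt(dt) * rng.standard_normal()
    x += dx
    # filter update, Eqs. (31)-(32)
    gain = c * (a * var + sig_x**2) / su_tilde2
    mu += a * mu * dt + gain * (dU - a * c * mu * dt)
    var += (2 * a * var + sig_x**2
            - c**2 * (a * var + sig_x**2)**2 / su_tilde2) * dt
```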

Example 2

(Multivariate Gaussian with diagonal covariance matrix). From a computational perspective, projection on a Gaussian density with diagonal covariance matrix can be advantageous in certain situations. In particular, such a solution only requires equations for $2N$ parameters, instead of $N^2 + N$ for a general Gaussian with Fisher matrix components

$g_{\mu_i,\mu_j} = (\Sigma_t^{-1})_{ij}, \qquad g_{\mu_i,\sigma^2_{nm}} = 0$,
$g_{\sigma^2_{ij},\sigma^2_{nm}} = \frac{1}{4}\left[(\Sigma_t^{-1})_{in}(\Sigma_t^{-1})_{jm} + (\Sigma_t^{-1})_{im}(\Sigma_t^{-1})_{jn}\right]$,

which is in general hard to invert. For a Gaussian with diagonal covariance matrix, these simplify to

$g_{\mu_i,\mu_j} = (\Sigma_t^{-1})_{ij}, \qquad g_{\sigma^2_{ii},\sigma^2_{ii}} = \frac{1}{2}(\Sigma_t^{-1})_{ii}^2 = \frac{1}{2\sigma_{ii}^4}$,

while all other components evaluate to zero, making the Fisher matrix diagonal and straightforward to invert. Since the diagonality of the covariance matrix effectively decouples the dimensions, expectations can be carried out in each dimension separately. Nevertheless, the specific form of the parameter SDEs will crucially depend on the specific form of the nonlinear function $f_t(x)$. For instance, considering the linear case, i.e., $f_t(x) = Ax$, yields

$d\mu_t = A\mu_t\,dt + \left(\Sigma_x + \mathrm{diag}(\sigma_{ii}^2)\,A^T\right)C^T\tilde{\Sigma}_u^{-1}\left(dU_t - CA\mu_t\,dt\right)$,
$\frac{d\sigma_{ii}^2}{dt} = 2\sigma_{ii}^2\left(A - \Sigma_x C^T\tilde{\Sigma}_u^{-1}CA\right)_{ii} - 2\sigma_{ii}^4\left(A^T C^T\tilde{\Sigma}_u^{-1}CA\right)_{ii} + \left(\Sigma_x - \Sigma_x C^T\tilde{\Sigma}_u^{-1}C\Sigma_x\right)_{ii}$.
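A minimal sketch, assuming the linear-case equations as stated above; $A$, $C$ and all noise parameters are illustrative choices, so treat this strictly as a sketch rather than a reference implementation.

```python
# Minimal sketch: one Euler step of the diagonal-Gaussian projection filter
# for the linear case f_t(x) = A x. All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(seed=2)
N = 3
A = -np.eye(N) + 0.2 * rng.standard_normal((N, N))
C = np.eye(N)
Sx = 0.25 * np.eye(N)                        # Sigma_x
Su = 0.04 * np.eye(N)                        # Sigma_u
Sut_inv = np.linalg.inv(C @ Sx @ C.T + Su)   # \tilde{Sigma}_u^{-1}, Eq. (8)
dt = 1e-3

def step(mu, s2, dU):
    """Advance mean mu and diagonal variances s2 by one time step."""
    gain = (Sx + np.diag(s2) @ A.T) @ C.T @ Sut_inv
    dmu = A @ mu * dt + gain @ (dU - C @ A @ mu * dt)
    ds2 = (2 * s2 * np.diag(A - Sx @ C.T @ Sut_inv @ C @ A)
           - 2 * s2**2 * np.diag(A.T @ C.T @ Sut_inv @ C @ A)
           + np.diag(Sx - Sx @ C.T @ Sut_inv @ C @ Sx)) * dt
    return mu + dmu, s2 + ds2

mu, s2 = step(np.zeros(N), np.ones(N), dU=np.zeros(N))
```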

IV. Continuous-time circular filtering

In this section, we will consider continuous-time circular filtering with observed state increments as a concrete application of the framework derived above. We will further extend it to account for quasi continuous-time von Mises-valued observations (formally defined below), to provide a continuous-time generalization of the discrete-time circular filtering problem that is frequently encountered in spatial navigation problems.

A. Assuming observed angular increments only

We assume that the hidden state $X_t$ is parametrized on $S^1$ by the angle $\varphi_t \in [0, 2\pi)$, effectively embedding $S^1$ in $\mathbb{R}^2$ as a unit circle. We further assume that $\varphi_t$ follows a diffusion on the circle:

$d\varphi_t = f(\varphi_t, t)\,dt + \sigma_\varphi\,dW_t$, (33)

where $W_t$ is now an $\mathbb{R}^1$-BM process, and drift and diffusion functions are as defined in Section II. The propagator $\mathcal{L}[\cdot]$ for this process is the same as for the corresponding process in $\mathbb{R}^1$ as given in Eq. (5).

For state processes that evolve on submanifolds of $\mathbb{R}^n$, such as the $S^1$ considered here, Tronarp and Särkkä [19] have shown that the projection filter equations are identical to the case where the state variable $X_t$ evolves in Euclidean space. Since the mathematical operations performed to derive the projection filter with observed state increments in Theorem 1 are essentially the same as in Tronarp and Särkkä [19] for the state diffusion, their result carries over to our problem. Thus, Theorem 1 can straightforwardly be applied to the circular filtering problem by considering a circular projected density $p_\theta(\varphi)$, such as the von Mises or a wrapped normal distribution.

Example 3

(Circular diffusion with observed state increments). In this example, we explicitly model angular path integration as the estimation of a circular diffusion based on observed angular increments. Consider a model where the hidden state evolves according to a Brownian motion on the circle, with noisy observations of its increment

$d\varphi_t = \frac{1}{\sqrt{\kappa_\varphi}}\,dW_t$, (34)
$dU_t = d\varphi_t + \frac{1}{\sqrt{\kappa_u}}\,dV_t$. (35)

Here, $\varphi_t$ could, for instance, correspond to the heading direction of an animal (or a robot) that is navigating in darkness and only has access to self-motion cues $dU_t$, i.e., measurements of angular increments, but not to direct heading cues such as landmark positions. We chose to parametrize the diffusion constants in terms of precisions, $\kappa_\varphi$ and $\kappa_u$, to make units comparable to that of the precision of the projected density, which we will denote $\kappa_t$. Thus, the parameter $\kappa_\varphi$ governs the speed of the hidden state diffusion, and the parameter $\kappa_u$ modulates the reliability of the observation process that is governed by the increments. The gKSE for this model's posterior expectation of a test function $\phi_t := \phi(\varphi_t)$ in Stratonovich form reads:

$d\mathbb{E}[\phi_t] = \frac{1}{2(\kappa_\varphi + \kappa_u)}\,\mathbb{E}\left[\frac{\partial^2}{\partial\varphi^2}\phi_t\right]dt + \frac{\kappa_u}{\kappa_\varphi + \kappa_u}\,\mathbb{E}\left[\frac{\partial}{\partial\varphi}\phi_t\right] \circ dU_t$. (36)

As a result, the projection filter for the parameters $\theta$ becomes (cf. Eq. (19))

$d\theta_j = \frac{1}{\kappa_\varphi + \kappa_u}\sum_i g^{ij}\left[\frac{1}{2}\,\mathbb{E}_\theta\left[\frac{\partial^2}{\partial\varphi^2}\frac{\partial \log p_\theta}{\partial\theta_i}\right]dt + \kappa_u\,\mathbb{E}_\theta\left[\frac{\partial}{\partial\varphi}\frac{\partial \log p_\theta}{\partial\theta_i}\right] \circ dU_t\right]$. (37)

We now want to solve the circular filtering problem with observed state increments by projecting on the von Mises density

$p_{\mu,\kappa}(\varphi) = \mathrm{VM}(\varphi;\ \mu_t, \kappa_t) = \frac{1}{2\pi I_0(\kappa_t)}\exp\left(\kappa_t \cos(\varphi - \mu_t)\right)$, (38)

parametrized by mean $\mu_t$ and precision $\kappa_t$, using Eq. (37). Unlike, e.g., the wrapped normal distribution, which is another popular choice for unimodal circular distributions, the von Mises distribution is an exponential family distribution and could alternatively be written in natural parametrization (23). Here, we chose to parametrize it by $\mu_t$ and $\kappa_t$, as this significantly simplifies the computation of the Fisher metric and its inverse, which appears on the right-hand side of (37). Noting that

$\frac{\partial}{\partial\mu_t}\log p_{\mu,\kappa}(\varphi) = \kappa_t \sin(\varphi - \mu_t)$, (39)
$\frac{\partial}{\partial\kappa_t}\log p_{\mu,\kappa}(\varphi) = -F(\kappa_t) + \cos(\varphi - \mu_t)$, (40)

where $F(\kappa) = \frac{I_1(\kappa)}{I_0(\kappa)}$ denotes a ratio of Bessel functions, the components of the Fisher metric (with respect to the $\mu_t$, $\kappa_t$ parametrization) are given by:

$g_{\mu\mu} = \kappa_t^2\,\mathbb{E}_{\mu,\kappa}\left[\sin^2(\varphi_t - \mu_t)\right] = \kappa_t F(\kappa_t)$, (41)
$g_{\kappa\kappa} = \mathbb{E}_{\mu,\kappa}\left[\left(-F(\kappa_t) + \cos(\varphi_t - \mu_t)\right)^2\right] = 1 - \frac{F(\kappa_t)}{\kappa_t} - F(\kappa_t)^2$, (42)
$g_{\mu\kappa} = g_{\kappa\mu} = 0$. (43)

Since the Fisher metric is diagonal, the components of its inverse are simply $g^{\mu\mu} = g_{\mu\mu}^{-1}$, $g^{\kappa\kappa} = g_{\kappa\kappa}^{-1}$ and $g^{\mu\kappa} = g^{\kappa\mu} = 0$.

Using these $g^{ij}$'s and explicitly computing the expectations on the right-hand side of Eq. (37) with respect to the von Mises approximation, we find the projection filter equations for this model:

$d\mu_t = \frac{\kappa_u}{\kappa_\varphi + \kappa_u}\,dU_t$, (44)
$d\kappa_t = -\frac{1}{2(\kappa_\varphi + \kappa_u)}\,\mathcal{F}(\kappa_t)\,dt$, (45)

where we defined the strictly positive function

$\mathcal{F}(\kappa_t) = \frac{F(\kappa_t)}{1 - \frac{F(\kappa_t)}{\kappa_t} - F(\kappa_t)^2}$, (46)

and found the Itô form of $d\mu_t$ from its Stratonovich form by noting that the noise variance is constant, making this conversion straightforward.

The projection filter defined by Eqs. (44) and (45) for orientation tracking in darkness has an intuitive interpretation: the mean $\mu_t$ is updated according to the angular increment observations, weighted by their reliability, as quantified by $\kappa_u$. Even in the presence of such observations, the estimate's precision, $\kappa_t$, decays towards zero, since $\mathcal{F}(\kappa_t)$ is strictly positive. This decay reflects the accumulation of noise from the increment observations. Very informative angular velocity observations with large $\kappa_u$ may slow the decay, but cannot fully prevent it. In other words, without direct angular observations (which we will introduce in Section IV-B below), the estimate will inevitably become less accurate over time.
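To make the algorithm concrete, the following minimal sketch (our addition, not the paper's code) simulates the generative model (34)–(35) and integrates the filter equations (44)–(45); parameter values follow Figure 1.

```python
# Minimal sketch: von Mises projection filter with increment observations
# only, Eqs. (44)-(45). Parameters as in Fig. 1 (kappa_phi=1, kappa_u=10).
import numpy as np
from scipy.special import i0, i1

rng = np.random.default_rng(seed=3)
k_phi, k_u = 1.0, 10.0
dt, steps = 1e-2, 1000

def calF(kappa):
    """Strictly positive function of Eq. (46)."""
    Fk = i1(kappa) / i0(kappa)                 # Bessel ratio F(kappa)
    return Fk / (1.0 - Fk / kappa - Fk**2)

phi, mu, kappa = 0.0, 0.0, 5.0                 # true state; filter estimate
for _ in range(steps):
    dphi = np.sqrt(dt / k_phi) * rng.standard_normal()       # Eq. (34)
    dU = dphi + np.sqrt(dt / k_u) * rng.standard_normal()    # Eq. (35)
    phi = (phi + dphi) % (2 * np.pi)
    mu = (mu + k_u / (k_phi + k_u) * dU) % (2 * np.pi)       # Eq. (44)
    kappa -= calF(kappa) / (2 * (k_phi + k_u)) * dt          # Eq. (45)
```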

In Figure 1a, we illustrate in an example simulation that, despite the presence of angular increment observations, the estimate slowly drifts away from the true heading $\varphi_t$. As a benchmark we use a particle filter, and further compare mean $\mu_t$ and precision $r_t = F(\kappa_t)$ of the projection filter to those estimated by a Gaussian projection filter approximation (see Appendix V-F for details on these benchmarks). Such a filter relies on the assumption that the hidden state $\varphi_t$ evolves on the real line, and thus leads to a slight deviation in the dynamics of the estimated precision $\kappa_t$.

Fig. 1. Circular filtering with observed angular increments. a) The von Mises projection filter in Eqs. (44) and (45) is able to track a diffusion process on the circle based on observed increments, with mean $\mu_t$ and precision $r_t$ (given by $r_t = I_1(\kappa_t)/I_0(\kappa_t)$) matching those of a particle filter (curves are on top of each other). b) While the precision $\hat{r}_t$ estimated from the deviation between mean $\mu_t$ and true trajectory $\varphi_t$ and the precision $r_t$ estimated by the filter coincide for both the von Mises projection filter and the particle filter, the Gaussian projection filter ("Gauss filter") tends to underestimate its precision. c) Empirical (upper panel) and estimated (lower panel) precision for different values of the observation precision $\kappa_u$ at time $T = 10\,\kappa_\varphi^{-1}$. Note that in the upper panel, the empirical precision $\hat{r}_t$ of the different filters is identical. Parameters for a) and b) are $\kappa_\varphi = 1$, $\kappa_u = 10$; times are in units of $\kappa_\varphi^{-1}$. Simulations in b) and c) were averaged over 5000 runs.

Numerically, our projection filter's performance in this example is indistinguishable from that of the particle filter (Figure 1b, c), and its estimated precision $r_t$ matches exactly the empirical precision evaluated by averaging the estimation error over 5000 simulation runs. The estimated precision of the Gaussian approximation, in contrast, systematically underestimates its precision for large observation reliability, and overestimates it when the angular increment observations become very noisy (Figure 1c).

Example 4

(Higher-order circular distributions). The previous example is one of the simplest examples of a circular filtering problem, and results in an approximated posterior that is always unimodal. This might be insufficient for certain settings in which we would like to consider more sophisticated projected densities. To show that our framework extends beyond such simple models, let us consider a class of circular distributions with exponential-family densities of the form

$p(\varphi) = \frac{1}{Z(a,b)}\exp\left[\sum_{k=1}^K a_k \cos(k\varphi) + b_k \sin(k\varphi)\right]$. (47)

The case K = 1 recovers the previously used von Mises distribution (38), but with a different parametrization. For K = 2, this density is referred to as the generalized von Mises distribution, whose properties have been studied extensively [28, 29]. As a proof of concept, we will now use the projection filter for the natural parameters (Eq. (24)) to project the solution to the generative model in Eqs. (34) and (35) onto a distribution with density (47).

The circular distributions with densities (47) belong to the exponential family, with natural parameter vector $\theta = (a, b)$, where $a = \{a_1, \dots, a_K\}$ and $b = \{b_1, \dots, b_K\}$, and sufficient statistics given by

$T_k^{\cos}(\varphi) = \cos(k\varphi), \qquad T_k^{\sin}(\varphi) = \sin(k\varphi)$. (48)

The corresponding expectation parameters are defined by

$\eta_k^{\cos} = \mathbb{E}_\theta\left[T_k^{\cos}\right], \qquad \eta_k^{\sin} = \mathbb{E}_\theta\left[T_k^{\sin}\right]$. (49)

According to Corollary 2, the projection filter for the natural parameters of an exponential family density requires us to apply the right-hand side of the gKSE (36) to the sufficient statistics. For this, we need to compute

$\mathbb{E}_\theta\left[\frac{\partial}{\partial\varphi}T_k^{\cos}(\varphi)\right] = -k\,\mathbb{E}_\theta[\sin(k\varphi)] = -k\,\eta_k^{\sin}$, (50)
$\mathbb{E}_\theta\left[\frac{\partial^2}{\partial\varphi^2}T_k^{\cos}(\varphi)\right] = -k^2\,\mathbb{E}_\theta[\cos(k\varphi)] = -k^2\,\eta_k^{\cos}$, (51)
$\mathbb{E}_\theta\left[\frac{\partial}{\partial\varphi}T_k^{\sin}(\varphi)\right] = k\,\eta_k^{\cos}$, (52)
$\mathbb{E}_\theta\left[\frac{\partial^2}{\partial\varphi^2}T_k^{\sin}(\varphi)\right] = -k^2\,\eta_k^{\sin}$. (53)

Furthermore, we note that the components of the Fisher matrix $g_{ij}$ are given by

$g_{ij} = \frac{\partial^2}{\partial\theta_i \partial\theta_j}\log Z(a,b)$, (54)

where the $\theta_i$ refer to the $i$th element in the parameter vector $\theta$ that contains the elements of both $a$ and $b$. Inverting this matrix to get the inverse components $g^{ij}$ is in general not straightforward. In fact, already the Fisher metric needs to be computed numerically, as the normalization $Z(a,b)$ is inaccessible in closed form. We will thus treat $G^{-1}$ symbolically and sort the parameters such that $G^{-1}$ is composed of the blocks

$G^{-1} = \begin{pmatrix} \tilde{G}^{c,c} & \tilde{G}^{s,c} \\ \tilde{G}^{s,c} & \tilde{G}^{s,s} \end{pmatrix}$. (55)

Then, the projection filter for the natural parameters can be formally written as

$da = \tilde{G}^{c,c}\left(-\frac{1}{2(\kappa_\varphi + \kappa_u)}\,\mathbf{k}^2 \circ \eta^{\cos}\,dt - \frac{\kappa_u}{\kappa_\varphi + \kappa_u}\,\mathbf{k} \circ \eta^{\sin}\,dU_t\right) + \tilde{G}^{s,c}\left(-\frac{1}{2(\kappa_\varphi + \kappa_u)}\,\mathbf{k}^2 \circ \eta^{\sin}\,dt + \frac{\kappa_u}{\kappa_\varphi + \kappa_u}\,\mathbf{k} \circ \eta^{\cos}\,dU_t\right)$, (56)
$db = \tilde{G}^{s,c}\left(-\frac{1}{2(\kappa_\varphi + \kappa_u)}\,\mathbf{k}^2 \circ \eta^{\cos}\,dt - \frac{\kappa_u}{\kappa_\varphi + \kappa_u}\,\mathbf{k} \circ \eta^{\sin}\,dU_t\right) + \tilde{G}^{s,s}\left(-\frac{1}{2(\kappa_\varphi + \kappa_u)}\,\mathbf{k}^2 \circ \eta^{\sin}\,dt + \frac{\kappa_u}{\kappa_\varphi + \kappa_u}\,\mathbf{k} \circ \eta^{\cos}\,dU_t\right)$, (57)

where we denote with $\mathbf{k} = (1, \dots, K)^T$ the vector of values $k$, $\mathbf{k}^2$ results from the element-wise squaring of $\mathbf{k}$, $\eta$ is the vector of expectation parameters, and $\circ$ denotes the element-wise (Hadamard) product. This example demonstrates that our framework could, in principle, be applied to project the posterior onto more general and inevitably more complicated densities, should the need arise. Although we do not show this here explicitly, an instance where this might increase filtering accuracy is one where the initial density $p_0(\varphi)$ is multimodal. The example also shows that, in general, a projection filtering approach might not be the most practical approach. Here in particular, computation of the Fisher matrix components might only be possible numerically, and in that case is computationally expensive. This highlights the need for a careful choice of the posterior density and its corresponding parametrization, where (ideally) expectations of sufficient statistics are available in closed form.
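To illustrate this numerical route, the following sketch (our illustration, not the paper's code) computes the expectation parameters (49) and the Fisher matrix (54) for the density (47) with K = 2 by quadrature, using the standard exponential-family identity that (54) equals the covariance of the sufficient statistics; the parameter values are arbitrary.

```python
# Minimal sketch: numerical Fisher matrix for the generalized von Mises
# family (47) with K = 2, via quadrature over [0, 2*pi].
import numpy as np

K = 2
theta = np.array([0.8, 0.3, -0.2, 0.5])   # (a_1, a_2, b_1, b_2), arbitrary

phi = np.linspace(0.0, 2 * np.pi, 2001)
k = np.arange(1, K + 1)
# sufficient statistics, Eq. (48): rows cos(k*phi), then sin(k*phi)
T = np.vstack([np.cos(np.outer(k, phi)), np.sin(np.outer(k, phi))])

w = np.exp(theta @ T)                      # unnormalized density, Eq. (47)
p = w / np.trapz(w, phi)                   # normalize Z(a, b) numerically
eta = np.trapz(T * p, phi, axis=1)         # expectation parameters, Eq. (49)

# Fisher matrix (54) = covariance of sufficient statistics under p
G = np.trapz(T[:, None, :] * T[None, :, :] * p, phi, axis=2) - np.outer(eta, eta)
G_inv = np.linalg.inv(G)                   # inverse needed in Eqs. (56)-(57)
```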

B. Quasi continuous-time von Mises observations

So far we have focused on filtering algorithms that rely exclusively on observed angular increments. Such algorithms are bound to accumulate noise, such that their precision will decay to zero in the long run. To counteract this effect, let us now consider how to additionally include observations that are generated directly from the hidden state, rather than only its increments. Specifically, we will in this section propose an observation model for (quasi-)continuous-time von Mises-valued observations $Z_t \in S^1$, which we will refer to as direct angular observations, because they are governed by the hidden state $\varphi_t$ directly. This model will allow us to formulate a von Mises projection filter for both angular observations and angular increment observations in the continuous-time circular filtering problem.

In classical continuous-time filtering settings, continuous-time observations $Y_t$ are usually considered to follow a Gaussian diffusion process, whose drift component is governed by the hidden state $X_t$ [23]. Equivalently, one could consider 'time-discretized' (or 'quasi-continuous') observations $\tilde{Z}_t := \frac{\Delta Y_t}{\Delta t}$ with sampling time step $\Delta t$, according to

$\tilde{Z}_t \sim \mathcal{N}\left(h(X_t),\ \sigma_z^2\,\Delta t^{-1}\right)$, (58)

which is the usual setting of discrete-time filtering (with fixed $\Delta t$), with $h(x)$ being a potentially nonlinear function. Notably, the Fisher information $I(X_t)$ about the state of the hidden variable $X_t$ that is conveyed by these quasi-continuous observations $\tilde{Z}_t$ grows linearly with sampling time step $\Delta t$ (see Proposition 2 in Appendix V-D). The consequence of this scaling is rather intuitive: decreasing the sampling time step $\Delta t$ will result in overall more observations per unit time, which, in turn, are individually less informative about the state $X_t$. This renders the information rate (information per unit time) independent of the chosen time step.

Analogously, we now consider observations that are drawn from a von Mises distribution centered around a nonlinear $S^1$-valued transformation $h: S^1 \to S^1$ of the true hidden state $\varphi_t$,

$Z_t \sim \mathrm{VM}\left(h(\varphi_t),\ \alpha(\kappa_z, \Delta t)\right)$. (59)

We would like this observation model to have the same linear information scaling properties as the Gaussian observations encountered in classical filtering problems, i.e., when hidden state noise and observation noise are uncorrelated. Thus, we need to choose the function $\alpha(\kappa_z, \Delta t)$ such that the information content about the state $\varphi_t$ scales linearly with step size $\Delta t$ and observation precision $\kappa_z$.

Theorem 2.

If $\alpha(\kappa_z, \Delta t)$ is chosen such that

$\alpha(\kappa_z, \Delta t) = \xi^{-1}(\kappa_z\,\Delta t)$, (60)

where $\xi^{-1}$ is the inverse of $\xi(x) = x F(x)$ (and $F(x) = I_1(x)/I_0(x)$ as defined earlier), then the information about the state of the random variable $\varphi_t$ scales linearly with sampling time step and observation precision, i.e., $I(\varphi_t) \propto \kappa_z\,\Delta t$.

Proof.

The information content about the random variable $\varphi_t$ is given by the Fisher information

$I(\varphi_t) = \mathbb{E}_{Z_t}\left[\left(\frac{\partial}{\partial\phi}\log \mathrm{VM}(Z_t;\ h(\phi), \alpha)\right)^2\Big|_{\phi=\varphi_t}\right]$ (61)
$= h'(\varphi_t)^2\,\alpha\,\frac{I_1(\alpha)}{I_0(\alpha)} \propto \alpha\,F(\alpha)$. (62)

We require that the information content per time step $\Delta t$ is constant and proportional to $\kappa_z$, which can be achieved if $\alpha$ varies with $\Delta t$ and $\kappa_z$ according to

$\alpha(\kappa_z, \Delta t)\,F\left(\alpha(\kappa_z, \Delta t)\right) \propto \kappa_z\,\Delta t$. (63)

This holds if $\alpha(\kappa_z, \Delta t) = \xi^{-1}(\kappa_z\,\Delta t)$, with

$\xi(x) := x\,\frac{I_1(x)}{I_0(x)}$. (64) □

Once a step size $\Delta t$ is chosen, $\xi^{-1}(\kappa_z \Delta t)$ can be computed numerically. For sufficiently small $\kappa_z \Delta t$, e.g., in the continuum limit, this function can be approximated by $\xi^{-1}(\kappa_z \Delta t) \approx \sqrt{2\kappa_z \Delta t}$, while it becomes $\xi^{-1}(\kappa_z \Delta t) \approx \kappa_z \Delta t$ for large $\kappa_z \Delta t$ (Figure 2). The latter is consistent with the intuition that, for highly informative observations, the single-observation likelihood is well approximated by a Gaussian (which breaks down in the limit $\Delta t \to 0$).
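As an illustration of this numerical inversion (our sketch, using SciPy's Bessel functions and standard root bracketing; the bracketing interval is our own choice, based on the two asymptotic regimes):

```python
# Minimal sketch: numerically invert xi(x) = x I1(x)/I0(x), Eq. (64), to
# obtain alpha(kappa_z, dt) = xi^{-1}(kappa_z * dt), Eq. (60).
import numpy as np
from scipy.optimize import brentq
from scipy.special import i0, i1

def xi(x):
    return x * i1(x) / i0(x)

def xi_inv(y):
    # xi^{-1}(y) ~ sqrt(2y) for small y and ~ y + 1/2 for large y,
    # so this upper bound safely brackets the root
    hi = max(np.sqrt(2.0 * y), y) + 1.0
    return brentq(lambda x: xi(x) - y, 1e-12, hi)

kappa_z, dt = 100.0, 1e-2
alpha = xi_inv(kappa_z * dt)   # precision of a single direct observation
```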

Fig. 2. Time scaling of quasi-continuous angular observations. Both a) the Fisher information per observation and b) the Fisher information rate are linear and constant, respectively, in the sampling time step $\Delta t$ when using $\alpha(\kappa_z, \Delta t) = \xi^{-1}(\kappa_z \Delta t)$ ("Ideal"). For comparison, we also plot $\alpha(\kappa_z, \Delta t) = \sqrt{2\kappa_z \Delta t}$ (small $\kappa_z \Delta t$ approximation, "Squareroot") and $\alpha(\kappa_z, \Delta t) = \kappa_z \Delta t$ (Gaussian approximation, "Linear"). c) Sample simulation with the constant position of the state $\varphi_t = \varphi$ estimated from quasi-continuous observations with different $\alpha$ functions and sampling time steps $\Delta t$. Ideally, the estimated precision $r_t$ should be independent of the chosen simulation time step $\Delta t$. This is satisfied by all simulations except for the linear approximation for small $\kappa_z \Delta t$ (dark orange), and the square-root approximation for large $\kappa_z \Delta t$ (light green). In these simulations, we used time units of seconds (s), and set $\kappa_\varphi = 100/\mathrm{s}$ and $\kappa_z = 100/\mathrm{s}$ (by design, $\kappa_z$ has units of Fisher information per unit time), without loss of generality. Precision estimates were averaged over 10 simulation runs. Black and grey arrows in panel b) correspond to the two time step sizes shown in panel c).

What we have considered here is in essence a modified discrete-time observation model, which implies that we can take advantage of filtering methods available for circular filtering with discrete-time observations [13, 12]. However, by allowing the precision $\alpha(\kappa_z, \Delta t)$ to vary with the time step, we additionally ensure that the information rate stays constant: decreasing the time step will result in more observations per unit time, which is accounted for by each observation being individually less informative. Thus, the observation model defined in (59) and (60) constitutes a quasi continuous-time observation model.

C. Adding quasi continuous-time observations to the circular projection filter

If the measurement function $h(\varphi)$ in (59) is the identity, we can add the direct observations to our filter by straightforwardly making use of Bayes' theorem at every time step. Specifically, since we assumed our approximated (projected) density to be von Mises at all times, the measurement likelihood

$p(Z_t \mid \varphi_t) = \mathrm{VM}\left(Z_t;\ \varphi_t, \alpha(\kappa_z, \Delta t)\right)$ (65)

is conjugate to the density before the update, $p_t^-(\varphi) := p(\varphi_t = \varphi \mid Z_{0:t-\Delta t}) = \mathrm{VM}(\varphi;\ \mu_t^-, \kappa_t^-)$. In other words, the posterior $p_t(\varphi) := p(\varphi_t = \varphi \mid Z_{0:t})$ is guaranteed to be a von Mises density as well:

$p_t(\varphi) \propto p(Z_t \mid \varphi_t = \varphi)\,p_t^-(\varphi)$ (66)
$= \exp\left[\left(\cos\varphi,\ \sin\varphi\right)\left(\alpha(\kappa_z, \Delta t)\begin{pmatrix}\cos Z_t \\ \sin Z_t\end{pmatrix} + \kappa_t^-\begin{pmatrix}\cos\mu_t^- \\ \sin\mu_t^-\end{pmatrix}\right)\right]$. (67)

As expected from an exponential family distribution, the natural parameters $\theta_t = \kappa_t(\cos\mu_t, \sin\mu_t)^T$ are updated according to

$\theta_t = \theta_t^- + \alpha(\kappa_z, \Delta t)\begin{pmatrix}\cos Z_t \\ \sin Z_t\end{pmatrix}$. (68)

This operation is equivalent to a summation of vectors in $\mathbb{R}^2$, where the natural parameters refer to Euclidean coordinates, and $(\mu, \kappa)$ are the corresponding polar coordinates (Figure 3a). In the continuum limit, we write

$d\theta_t = \sqrt{2\kappa_z\,dt}\begin{pmatrix}\cos Z_t \\ \sin Z_t\end{pmatrix}$, (69)

where we used that $\alpha(\kappa_z, dt) \approx \sqrt{2\kappa_z\,dt}$ for $dt \to 0$. A coordinate transform from $\theta$ to $(\mu, \kappa)$ recovers the update equations for mean $\mu_t$ and precision $\kappa_t$ that result from quasi-continuous time observations:

$d\mu_t = d\operatorname{atan2}(\theta_2, \theta_1) = \frac{\sqrt{2\kappa_z\,dt}}{\kappa_t}\sin(Z_t - \mu_t)$, (70)
$d\kappa_t = d\sqrt{\theta_1^2 + \theta_2^2} = \sqrt{2\kappa_z\,dt}\,\cos(Z_t - \mu_t)$. (71)
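In code, the conjugate update (68) and the conversion back to $(\mu, \kappa)$ amount to a few lines (our sketch; the numbers are illustrative):

```python
# Minimal sketch: one conjugate update step, Eq. (68), as vector addition
# in R^2, then conversion back to polar coordinates (mu, kappa).
import numpy as np

def vm_update(mu, kappa, z, alpha):
    """Fold in a direct observation z with precision alpha."""
    theta = kappa * np.array([np.cos(mu), np.sin(mu)])   # natural parameters
    theta += alpha * np.array([np.cos(z), np.sin(z)])    # Eq. (68)
    return np.arctan2(theta[1], theta[0]) % (2 * np.pi), np.linalg.norm(theta)

# a conflicting observation shortens the vector: certainty drops from 2 to 1
mu_new, kappa_new = vm_update(mu=0.0, kappa=2.0, z=np.pi, alpha=1.0)
```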

Fig. 3. Filtering with quasi-continuous time angular observations. a) Single quasi-continuous time update step (68) with angular observation $Z_t$, where the length of the vector indicates the observation reliability $\alpha(\kappa_z, dt)$. The update step for Bayesian inference on the circle is equivalent to a vector addition in the 2D plane. The lower panel demonstrates that a conflicting observation leads to a decreased certainty of the estimate directly after the update, corresponding to a shorter vector. b) Empirical precision $\hat{r}_t$ (upper panel) and estimated precision $r_t$ (lower panel) of the circular Kalman filter (circKF) and a Gaussian projection filter (Gauss filter), when compared to a particle filter, for different values of the observation precision $\kappa_z$ at time $T = 10\,\kappa_\varphi^{-1}$. Parameters were $\kappa_\varphi = 1$ and $\kappa_u = 1$; times are in units of $\kappa_\varphi^{-1}$. c) Estimated versus empirical precision up to $T = 10\,\kappa_\varphi^{-1}$ for the different filters at $\kappa_z = 10$. The precisions shown in b) and c) are averages across 5000 simulation runs.

The recovered update equations are appealingly simple for identity (or linear) observation functions, as such functions allow us to leverage Bayesian conjugacy properties. For nonlinear observation functions, in contrast, the posterior after the update is in general not in the same class of densities (and in this particular case not a von Mises distribution anymore). To apply the projection filtering framework to project such a nonlinear observation model back onto the desired manifold of densities, one would need to know the vector field for the density $p_t$ that incorporates the observation-induced update, a derivation that is beyond the scope of this work.

Example 5

(The circular Kalman filter). Let us revisit angular path integration, which we introduced in Example 3 as a circular diffusion with observed angular increments, and extend it to include direct angular observations $Z_t$ with likelihood (65). Such angular observations could, for instance, correspond to directly accessible angular cues, such as visual landmarks. Combining Eqs. (44) and (45) with Eqs. (70) and (71) to include the quasi-continuous updates results in

$d\mu_t = \frac{\kappa_u}{\kappa_\varphi + \kappa_u}\,dU_t + \frac{\sqrt{2\kappa_z\,dt}}{\kappa_t}\sin(Z_t - \mu_t)$, (72)
$d\kappa_t = -\frac{1}{2(\kappa_\varphi + \kappa_u)}\,\mathcal{F}(\kappa_t)\,dt + \sqrt{2\kappa_z\,dt}\,\cos(\mu_t - Z_t)$. (73)

Due to the simplicity of these equations, as well as their structural similarity to the generalized Kalman filter, while taking full account of the circular state and measurement space, we coin the SDEs (72) and (73) the circular Kalman filter (circKF).

Considering that $\sqrt{2\kappa_z\,dt}$ is modulated by the direct observations' reliability $\kappa_z$, the final terms in the filter's update equations nicely reflect the reliability-weighting that is already present in classical filtering problems: more reliable direct observations, relative to the current certainty $\kappa_t$, have a stronger impact on the mean $\mu_t$. Furthermore, if the current observation $Z_t$ is similar to the current estimate $\mu_t$, this estimate is hardly updated, while the estimate's certainty $\kappa_t$ increases. In contrast, if direct observations and current estimate are in conflict, the certainty $\kappa_t$ might temporarily decrease (Figure 3a). What makes this last feature particularly interesting is that it only occurs in the circKF, but not in its Euclidean counterpart, the standard Kalman filter. In the latter, an update induced by direct observations always leads to an increase in certainty. Numerically, the circKF features performance close to that of a particle filter over a wide range of parameters, while outperforming a Gaussian projection filter (Figure 3b, c). The reason for the deviation of the circKF from the optimal solution is that the von Mises distribution is still an approximation of the true posterior, which leads to slight deviations in the updates when the direct observations are integrated. These deviations are offset by the more than 10-fold decrease in computation time of the circKF that we observed in simulations: a single run in Figure 3c with the particle filter took 3.14±0.11 s, while it only took 0.113±0.005 s with the circKF on a MacBook Pro (Mid 2019) running a 2.3 GHz 8-core Intel Core i9, using NumPy 1.19.2 on Python 3.9.1.
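For completeness, here is a minimal end-to-end sketch of the circKF (our addition, not the authors' published code), combining Eqs. (72) and (73) with the small-$dt$ approximation $\alpha \approx \sqrt{2\kappa_z\,dt}$; parameters loosely follow Figure 3.

```python
# Minimal sketch: circular Kalman filter (circKF), Eqs. (72)-(73), with
# increment and direct angular observations. Parameters are illustrative.
import numpy as np
from scipy.special import i0, i1

rng = np.random.default_rng(seed=4)
k_phi, k_u, k_z = 1.0, 1.0, 10.0
dt, steps = 1e-2, 1000
alpha = np.sqrt(2 * k_z * dt)            # small-dt limit of xi^{-1}(k_z dt)

def calF(kappa):
    Fk = i1(kappa) / i0(kappa)
    return Fk / (1.0 - Fk / kappa - Fk**2)   # Eq. (46)

phi, mu, kappa = 0.0, 0.0, 1.0
for _ in range(steps):
    dphi = np.sqrt(dt / k_phi) * rng.standard_normal()
    dU = dphi + np.sqrt(dt / k_u) * rng.standard_normal()
    phi = (phi + dphi) % (2 * np.pi)
    Z = rng.vonmises(phi, alpha)         # direct observation, Eq. (59)
    dmu = k_u / (k_phi + k_u) * dU + alpha / kappa * np.sin(Z - mu)   # (72)
    dkap = (-calF(kappa) / (2 * (k_phi + k_u)) * dt
            + alpha * np.cos(Z - mu))                                 # (73)
    mu, kappa = (mu + dmu) % (2 * np.pi), kappa + dkap
```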

V. Conclusion

In this paper, we derived a continuous-time nonlinear filter with observed state increments, based on a projection filtering approach. Using this framework, we revisited the problem of probabilistic angular path integration. By additionally proposing a quasi continuous-time model for von Mises-valued direct observations, we were able to formulate a circular filtering algorithm that accounts for both increment and direct angular observations. Notably, this algorithm fulfills the following four conditions: (i) it operates on a circular state-space and (ii) in continuous time, (iii) it maintains a consistent representation during state propagation and observation update, as ensured by the projection filtering method, and (iv) it performs the proper integration of both increment and direct observations. Even though we have only fully worked out the algorithm for univariate circular filtering problems, we have formulated the overall projection filtering framework for more general multivariate problems. Based on the results in [19], we expect our framework to carry over to multivariate circular filtering problems, such as reference vector tracking on the unit sphere. A possible shortcoming of this approach is that the class of generative models it can deal with is fairly limited. The generalized Kushner–Stratonovich equation is only valid if the error covariance of the observation process does not explicitly depend on the value of the hidden state. This constrains the generative model in the following way: first, we can only allow additive noise in the state process, as any multiplicative noise would enter the increment observation process $U_t$ as state-dependent noise. Second, only linear transformations of state increments can be considered. As demonstrated in Example 4, another shortcoming of the projection-filtering method in general is that this approach is only computationally feasible for problems that are analytically accessible, i.e., where expectations under the projected density can be computed rapidly, or in closed form. In this paper, we thus focused on projected densities where these expectations could be efficiently computed.

Despite these limitations, the analytical accessibility and interpretability of our main result, in particular the circular Kalman filter, make it an attractive algorithm for unimodal circular filtering problems. First, it is straightforward to implement in software. Since its representation stays fixed through all times, it relies on only two equations which can be integrated straightforwardly, e.g., with an Euler–Maruyama scheme [30]. As we have seen in our numerical experiments, this makes this filter much faster than established methods, such as particle filters. Second, the interpretation of the dynamics is intuitively comprehensible. Third, since it is a continuous-time formulation, it automatically scales with respect to the chosen sampling step size (as long as it is sufficiently small). This is an advantage over continuous-discrete filtering problems, which usually consider a fixed sampling step size, and need to be reformulated should the sampling rate of the observations change or vary across time. Lastly, animals navigate the world based on a continuous stream of sensory information, which motivates the use of continuous-time models when trying to understand how the brain operates under uncertainty. Thus, one possible application could be a conceptual description of how the brain performs angular path integration [31].

Acknowledgements

We would like to thank Melanie Basnak and Rachel Wilson (Harvard Medical School) for active and ongoing discussions that motivated the question of probabilistic path integration, and who helped us formulate the questions that we address in this manuscript. We further would like to thank Johannes Bill (Harvard Medical School) and Simone Surace (University of Bern), as well as the anonymous reviewers, for helpful feedback and suggestions on the manuscript.

Funding

A.K. was funded by the Swiss National Science Foundation (grant numbers P2ZHP2 184213 and P400PB 199242). J.D. and L.R. were supported by the Harvard Medical School Dean’s Initiative Award Program for Innovative Grants in the Basic and Social Sciences, and a grant from the National Institutes of Health (NIH/NINDS, 1R34NS123819).

APPENDIX

A. Nonlinear filtering in a nutshell

Let us briefly review the classical nonlinear filtering setting, i.e., filtering with observations that follow a diffusion process with noise uncorrelated to that of the hidden state process, in order to compare this with filtering with observed state increments. In line with standard literature [23], let $Y_t$ denote the process of $\mathbb{R}^M$-valued direct observations. Then the generative model commonly referred to in classical nonlinear filtering is given by

$dX_t = f(X_t, t)\,dt + \Sigma_x^{1/2}\,dW_t$, (74)
$dY_t = h(X_t, t)\,dt + \Sigma_y^{1/2}\,d\tilde{V}_t$, (75)

with $\tilde{V}_t$ and $W_t$ independent standard Brownian motion processes, and $h: \mathbb{R}^N \times \mathbb{R} \to \mathbb{R}^M$ a potentially nonlinear, vector-valued observation function. All other quantities have been defined below (1). Note that the observation process is a diffusion with error covariance $\Sigma_y\,dt$. The goal of nonlinear filtering is to compute the dynamics of the posterior expectations $\mathbb{E}[\phi_t] := \mathbb{E}[\phi(X_t) \mid Y_{0:t}]$, where $Y_{0:t} = \{Y_\tau : \tau < t\}$ denotes the filtration generated by the process $Y_t$. Formally, this is solved by the Kushner–Stratonovich equation (KSE, [23, Theorem 3.30]):

$d\mathbb{E}[\phi_t] = \mathbb{E}[\mathcal{L}[\phi_t]]\,dt + \mathrm{cov}(\phi_t, h_t)^T\,\Sigma_y^{-1}\left(dY_t - \mathbb{E}[h_t]\,dt\right)$, (76)

with $\mathrm{cov}(\phi_t, h_t) = \mathbb{E}[\phi_t h_t] - \mathbb{E}[\phi_t]\,\mathbb{E}[h_t]$.

B. Derivation of the generalized Kushner–Stratonovich equation for the posterior expectation (Eqs. (7) and (21))

Let us revisit the generative model in Eqs. (1) and (6):

$dX_t = f(X_t, t)\,dt + \Sigma_x^{1/2}\,dW_t$, (77)
$dU_t = C f(X_t, t)\,dt + C\Sigma_x^{1/2}\,dW_t + \Sigma_u^{1/2}\,dV_t$, (78)

with $W_t$ and $V_t$ independent vector-valued Brownian motion processes, as defined earlier. Here, we derive the gKSE (7) by treating this model as a correlated noise filtering problem, which allows us to directly apply established results [23, Corollary 3.38].

Lemma 1.

The generalized Kushner–Stratonovich equation (gKSE) for the evolution of the conditional expectation of a test function $\mathbb{E}[\phi_t] = \mathbb{E}[\phi(X_t) \mid U_{0:t}]$ in the presence of state increment observations $dU_t$ is given by (Itô form)

$d\mathbb{E}[\phi_t] = \mathbb{E}[\mathcal{L}[\phi_t]]\,dt + \left(\mathrm{cov}(\phi_t, f_t) + \Sigma_x\,\mathbb{E}[\nabla\phi_t]\right)^T C^T\tilde{\Sigma}_u^{-1}\left(dU_t - C\,\mathbb{E}[f_t]\,dt\right)$. (79)
Proof.

To simplify calculations, we first require that the observation process has an error covariance that equals the identity, and thus we rescale the process $dU_t$ by $\tilde{\Sigma}_u^{-1/2} = (C\Sigma_x C^T + \Sigma_u)^{-1/2}$,

$d\tilde{U}_t = \tilde{\Sigma}_u^{-1/2}\,dU_t$. (80)

Eqs. (77) and (80) define a filtering problem where the noise in the hidden process $X_t$ and the observation process $\tilde{U}_t$ are correlated, with quadratic covariation

$d[X, \tilde{U}^T]_t = \Sigma_x C^T\tilde{\Sigma}_u^{-1/2}\,dt$. (81)

Thus, the result of [23, Corollary 3.38] (KSE for correlated noise filtering problems) is directly applicable to our problem. For more details on [23, Corollary 3.38], we kindly refer to the proof based on the innovations method provided therein. An alternative proof based on the change of measure approach is presented in [20].

Note that the KSE for correlated noise [23, Eq. 3.72] is similar to the classical KSE for uncorrelated noise problems (76), except for a correction term, given by a vector field $\mathcal{B}[\phi]$,

$d\mathbb{E}[\phi_t] = \mathbb{E}[\mathcal{L}[\phi_t]]\,dt + \left(\mathrm{cov}\left(\phi_t,\ \tilde{\Sigma}_u^{-1/2}Cf_t\right) + \mathbb{E}[\mathcal{B}[\phi_t]]\right)^T\left(d\tilde{U}_t - \tilde{\Sigma}_u^{-1/2}C\,\mathbb{E}[f_t]\,dt\right)$. (82)

The vector field $\mathcal{B}[\phi]$ can be read out from the dynamics of the quadratic covariation between the process $\phi_t$ and the rescaled observation process $\tilde{U}_t$. Using Itô's lemma, we can write:

$d\phi_t = (\nabla\phi_t)^T\,dX_t + \frac{1}{2}\mathrm{Tr}(\Sigma_x H_\phi)\,dt$, (83)

and thus

$d[\phi, \tilde{U}^T]_t = (\nabla\phi_t)^T\,d[X, \tilde{U}^T]_t$ (84)
$= (\nabla\phi_t)^T\Sigma_x C^T\tilde{\Sigma}_u^{-1/2}\,dt =: \mathcal{B}[\phi_t]^T\,dt$. (85)

Further identifying $h_t = \tilde{\Sigma}_u^{-1/2}Cf_t$, we find

$d\mathbb{E}[\phi_t] = \mathbb{E}[\mathcal{L}[\phi_t]]\,dt + \left(\mathrm{cov}(\phi_t, f_t) + \Sigma_x\,\mathbb{E}[\nabla\phi_t]\right)^T C^T\tilde{\Sigma}_u^{-1/2}\left(d\tilde{U}_t - \tilde{\Sigma}_u^{-1/2}C\,\mathbb{E}[f_t]\,dt\right)$. (86)

Rescaling $d\tilde{U}_t = \tilde{\Sigma}_u^{-1/2}\,dU_t$ yields (79). □

Remark 3.

By combining (76) and (79) it is possible to derive a generalized Kushner equation when both types of observations are present, i.e., when we consider both the process $Y_t$ (Eq. 75) and the process $U_t$ (Eq. 78) as the observations:

$d\mathbb{E}[\phi_t] = \mathbb{E}[\mathcal{L}[\phi_t]]\,dt + \mathrm{cov}(\phi_t, h_t)^T\Sigma_y^{-1}\left(dY_t - \mathbb{E}[h_t]\,dt\right) + \left(\mathrm{cov}(\phi_t, f_t) + \Sigma_x\,\mathbb{E}[\nabla\phi_t]\right)^T C^T\tilde{\Sigma}_u^{-1/2}\left(d\tilde{U}_t - \tilde{\Sigma}_u^{-1/2}C\,\mathbb{E}[f_t]\,dt\right)$. (87)

In this case, the expectations $\mathbb{E}[\cdot]$ are with respect to the filtrations $Y_{0:t}$ and $U_{0:t}$, i.e., $\mathbb{E}[\phi_t] = \mathbb{E}[\phi(X_t) \mid Y_{0:t}, U_{0:t}]$.

Corollary 3.

In Stratonovich form, the generalized Kushner–Stratonovich equation (gKSE) for the evolution of the conditional expectation of a test function $\mathbb{E}[\phi_t] = \mathbb{E}[\phi(X_t) \mid U_{0:t}]$ in the presence of increment observations $dU_t$ reads:

$d\mathbb{E}[\phi_t] = \left[\mathbb{E}[\tilde{\mathcal{L}}[\phi_t]] - \frac{1}{2}\left(\mathrm{cov}\left(\phi_t,\ \left\|\tilde{\Sigma}_u^{-1/2}Cf_t\right\|^2\right) + \mathrm{Tr}\left(\tilde{\Sigma}_u^{-1}C\,\mathrm{cov}(\phi_t, J_f)\,\Sigma_x C^T\right)\right)\right]dt + \left(\mathrm{cov}(\phi_t, f_t) + \Sigma_x\,\mathbb{E}[\nabla\phi_t]\right)^T C^T\tilde{\Sigma}_u^{-1} \circ dU_t$, (88)

with

$\mathbb{E}[\tilde{\mathcal{L}}[\phi_t]] = \mathbb{E}[\mathcal{L}[\phi_t]] - \mathbb{E}\left[(\nabla\phi_t)^T\Sigma_x C^T\tilde{\Sigma}_u^{-1}Cf_t\right] - \frac{1}{2}\mathrm{Tr}\left(\Sigma_x\,\mathbb{E}[H_\phi]\,\Sigma_x C^T\tilde{\Sigma}_u^{-1}C\right)$. (89)

For one-dimensional problems with $N = M = 1$, $C = c$, $\Sigma_x = \sigma_x^2$, $\Sigma_u = \sigma_u^2$ and $\tilde{\Sigma}_u = \tilde{\sigma}_u^2 = c^2\sigma_x^2 + \sigma_u^2$, (88) simplifies to

$d\mathbb{E}[\phi_t] = \left[\frac{\sigma_u^2}{\tilde{\sigma}_u^2}\,\mathbb{E}[\mathcal{L}[\phi_t]] - \frac{c^2}{2\tilde{\sigma}_u^2}\left(\mathrm{cov}(\phi_t, f_t^2) + \sigma_x^2\,\mathrm{cov}\left(\phi_t, \frac{\partial f_t}{\partial x}\right)\right)\right]dt + \frac{c}{\tilde{\sigma}_u^2}\left[\mathrm{cov}(\phi_t, f_t) + \sigma_x^2\,\mathbb{E}\left[\frac{\partial \phi_t}{\partial x}\right]\right] \circ dU_t$. (90)
Proof.

We convert between Itô and Stratonovich calculus using the Wong–Zakai theorem [27]:

$B_t^T\,dU_t = B_t^T \circ dU_t - \frac{1}{2}\,d[B^T, U]_t$, (91)

where the symbol $\circ$ denotes Stratonovich calculus, and $[B^T, U]_t$ is the quadratic covariation between the processes $U_t$ and $B_t$. We identify (cf. Eq. (79))

$B_t^T = \left(\mathrm{cov}(\phi_t, f_t) + \Sigma_x\,\mathbb{E}[\nabla\phi_t]\right)^T C^T\tilde{\Sigma}_u^{-1}$, (92)

and write

$d\mathbb{E}[\phi_t] = \mathbb{E}[\mathcal{L}[\phi_t]]\,dt - B_t^T C\,\mathbb{E}[f_t]\,dt + B_t^T\,dU_t$ (93)
$= \mathbb{E}[\mathcal{L}[\phi_t]]\,dt - B_t^T C\,\mathbb{E}[f_t]\,dt + B_t^T \circ dU_t - \frac{1}{2}\,d[B^T, U]_t$. (94)

To obtain the change in quadratic covariation $d[B^T, U]_t$, it is helpful to find $dB_t$ by Itô's lemma:

$dB_t^T = \left(d\mathbb{E}[\phi_t f_t] - \mathbb{E}[\phi_t]\,d\mathbb{E}[f_t] - d\mathbb{E}[\phi_t]\,\mathbb{E}[f_t] - d\mathbb{E}[\phi_t]\,d\mathbb{E}[f_t] + d\mathbb{E}[\nabla\phi_t]^T\Sigma_x\right)C^T\tilde{\Sigma}_u^{-1}$. (95)

The evolution of these expectations is obtained by straightforward application of the gKSE (79), substituting $\phi_t$ with the functions $\phi_t f_t$, $f_t$ and $\nabla\phi_t$. This results in terms that multiply $dU_t$, which are those relevant for computing the change of the covariation process $d[B^T, U]_t$. Note further that the quadratic covariation of the observation process evolves according to $d[U^T, U]_t = \mathrm{Tr}(\tilde{\Sigma}_u)\, dt$. Some tedious but straightforward algebra and rearrangement of terms then yield, for the quadratic covariation,

\[ d[B^T, U]_t = \Big[\mathrm{cov}\big(\phi_t, \|\tilde{\Sigma}_u^{-1/2} C f_t\|^2\big) + \mathrm{Tr}\big(C^T \tilde{\Sigma}_u^{-1} C\, \mathrm{cov}(\phi_t, J_f)\, \Sigma_x\big) + 2\, \mathrm{cov}\big((\Sigma_x C^T \tilde{\Sigma}_u^{-1} C f_t)^T, \nabla\phi_t\big) - 2\, \mathrm{cov}(\phi_t, f_t)^T C^T \tilde{\Sigma}_u^{-1} C\, \mathbb{E}[f_t] + \mathrm{Tr}\big(\Sigma_x \mathbb{E}[H_\phi]\, \Sigma_x C^T \tilde{\Sigma}_u^{-1} C\big)\Big]\, dt. \tag{96} \]

Plugging this into (94), and applying some further algebra, yields

\[ d\mathbb{E}[\phi_t] = \Big[\mathbb{E}[\mathcal{L}[\phi_t]] - \mathbb{E}\big[(\nabla\phi_t)^T \Sigma_x C^T \tilde{\Sigma}_u^{-1} C f_t\big] - \tfrac{1}{2}\, \mathrm{Tr}\big(\Sigma_x \mathbb{E}[H_\phi]\, \Sigma_x C^T \tilde{\Sigma}_u^{-1} C\big)\Big]\, dt - \tfrac{1}{2}\Big[\mathrm{cov}\big(\phi_t, \|\tilde{\Sigma}_u^{-1/2} C f_t\|^2\big) + \mathrm{Tr}\big(\tilde{\Sigma}_u^{-1} C\, \mathrm{cov}(\phi_t, J_f)\, \Sigma_x C^T\big)\Big]\, dt + \big(\mathrm{cov}(\phi_t, f_t) + \Sigma_x \mathbb{E}[\nabla\phi_t]\big)^T C^T \tilde{\Sigma}_u^{-1} \circ dU_t. \tag{97} \]

This equation can be simplified further by noting that $I - \Sigma_x C^T \tilde{\Sigma}_u^{-1} C = C^{-1} \Sigma_u \tilde{\Sigma}_u^{-1} C$. Further, we can substitute

\[ \mathbb{E}[\tilde{\mathcal{L}}[\phi_t]] = \mathbb{E}[\mathcal{L}[\phi_t]] - \mathbb{E}\big[(\nabla\phi_t)^T \Sigma_x C^T \tilde{\Sigma}_u^{-1} C f_t\big] - \tfrac{1}{2}\, \mathrm{Tr}\big(\Sigma_x \mathbb{E}[H_\phi]\, \Sigma_x C^T \tilde{\Sigma}_u^{-1} C\big) \tag{98} \]

in the first line, which yields Eq. (88). Equation (90) follows from (88) as the one-dimensional special case. □

C. Fisher metric and scalar product

Proposition 1.

The scalar product defined in Eq. (13),

\[ \langle Z_1, Z_2 \rangle_\theta := \int_\Omega dx\, \frac{Z_1(x)\, Z_2(x)}{p_\theta(x)}, \qquad Z_1, Z_2 \in T_\theta S, \tag{99} \]

is the scalar product on $T_\theta S$ that is associated with the Fisher metric [26],

\[ g_{ij} = \mathbb{E}_{p_\theta}\Big[\frac{\partial \log p_\theta(x)}{\partial \theta_i} \frac{\partial \log p_\theta(x)}{\partial \theta_j}\Big]. \tag{100} \]
Proof.

Consider $Z_1$ and $Z_2$ to correspond to two of the basis vectors of the tangent space $T_\theta S$, i.e., $Z_1 = \frac{\partial p_\theta(x)}{\partial \theta_i}$ and $Z_2 = \frac{\partial p_\theta(x)}{\partial \theta_j}$. Then

\begin{align} \Big\langle \frac{\partial p_\theta}{\partial \theta_i}, \frac{\partial p_\theta}{\partial \theta_j} \Big\rangle_\theta &= \int_\Omega dx\, \frac{1}{p_\theta(x)} \frac{\partial p_\theta(x)}{\partial \theta_i} \frac{\partial p_\theta(x)}{\partial \theta_j} \tag{101}\\ &= \int_\Omega dx\, \frac{\partial \log p_\theta(x)}{\partial \theta_i} \frac{\partial \log p_\theta(x)}{\partial \theta_j}\, p_\theta(x) \tag{102}\\ &= \mathbb{E}\Big[\frac{\partial \log p_\theta(x)}{\partial \theta_i} \frac{\partial \log p_\theta(x)}{\partial \theta_j}\Big] = g_{ij}. \tag{103} \end{align}

This concludes the proof. □
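As a concrete example relevant to the circular setting (our addition, using standard properties of the von Mises distribution $p_{\mu,\kappa}(x) = \exp(\kappa \cos(x - \mu))/(2\pi I_0(\kappa))$), the Fisher metric in the coordinates $\theta = (\mu, \kappa)$ evaluates to

\[ g_{\mu\mu} = \kappa\, \frac{I_1(\kappa)}{I_0(\kappa)}, \qquad g_{\kappa\kappa} = 1 - \frac{I_1(\kappa)}{\kappa\, I_0(\kappa)} - \left(\frac{I_1(\kappa)}{I_0(\kappa)}\right)^2, \qquad g_{\mu\kappa} = 0, \]

where the off-diagonal term vanishes by the symmetry of the distribution around $\mu$.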

D. Information scaling

Proposition 2.

The Fisher information $I(X_t)$ about the hidden state variable $X_t$ that is conveyed by Gaussian-type discrete-time observations,

\[ \tilde{Z}_t \sim \mathcal{N}\big(\tilde{Z}_t;\, g(X_t),\, \sigma_z^2\, \Delta t^{-1}\big), \tag{104} \]

grows linearly with the time step $\Delta t$.

Proof.

The information content about the state $X_t$ that is conveyed by the observation $\tilde{Z}_t$ is given by the Fisher information

\[ I(X_t) = \mathbb{E}_{\tilde{Z}_t}\Big[\big(\partial_x \log \mathcal{N}(\tilde{Z}_t;\, g(x),\, \sigma_z^2\, \Delta t^{-1})\big)^2 \Big|_{x = X_t}\Big] = \frac{g'(x)^2}{\sigma_z^4}\, (\Delta t)^2\, \mathbb{E}_{\tilde{Z}_t}\big[(\tilde{Z}_t - g(x))^2\big] = \frac{g'(x)^2}{\sigma_z^2}\, \Delta t \;\propto\; \Delta t. \]

□
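The linear scaling is easy to verify numerically. Below is a short Monte Carlo check (our addition; the observation function $g = \sin$, the noise scale $\sigma_z = 0.5$, and the evaluation point are arbitrary illustrative choices): the empirical variance of the score should grow proportionally with $\Delta t$.

```python
import numpy as np

rng = np.random.default_rng(0)
g, dg = np.sin, np.cos          # illustrative observation function and its derivative
sigma_z, x_true = 0.5, 1.0      # illustrative noise scale and true state

for dt in [0.01, 0.02, 0.04]:
    var = sigma_z**2 / dt                                  # observation variance ~ 1/dt
    z = rng.normal(g(x_true), np.sqrt(var), size=200_000)  # draw observations
    score = (z - g(x_true)) * dg(x_true) / var             # d/dx log-likelihood at x_true
    # Empirical Fisher information; should be close to dg(x_true)**2 * dt / sigma_z**2
    print(f"dt = {dt:.2f}: I = {score.var():.5f}")
```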

E. Details on numerical simulations

Our numerical simulations in Figures 1 and 3, corresponding to Examples 3 and 5, were based on artificial data generated from the true model equations. In particular, the “true” state φt is a single trajectory from Eq. (34), and observations are drawn at each time point from Eq. (35) and, in the case of Example 5, additionally from Eq. (65). To simulate trajectories and observations, we use the Euler–Maruyama approximation [30]. In this approximation, the time-discretized generative model with fixed time step size Δt in Examples 3 and 5 reads:

\begin{align} \varphi_t &\sim \mathcal{N}\big(\varphi_{t-\Delta t},\, \kappa_\varphi^{-1} \Delta t\big) \mod 2\pi, \tag{105}\\ \Delta U_t &\sim \mathcal{N}\big(\varphi_t - \varphi_{t-\Delta t},\, \kappa_u^{-1} \Delta t\big), \tag{106}\\ Z_t &\sim \mathcal{VM}\big(\varphi_t,\, \xi^{-1}(\kappa_z \Delta t)\big). \tag{107} \end{align}

The same time-discretization scheme was used to numerically integrate the SDEs (44), (45), (72) and (73) for the von Mises parameters $\mu_t$ and $\kappa_t$. Unless stated otherwise, we used $\Delta t = 0.01$ in all our numerical simulations, and give times in units of $\kappa_\varphi$.
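For concreteness, a minimal sketch of this forward simulation is given below (our addition). The helper `xi_inv` stands in for the inverse of the function ξ defined in the main text, which we do not reproduce here; it is replaced by the identity purely so that the sketch runs, and the reliability parameters are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, T = 0.01, 10.0
kappa_phi, kappa_u, kappa_z = 1.0, 1.0, 1.0  # illustrative reliabilities
xi_inv = lambda x: x  # placeholder for the paper's xi^{-1}; identity for illustration only

n = int(T / dt)
phi = np.zeros(n)  # hidden heading, Eq. (105)
dU = np.zeros(n)   # state increment observations, Eq. (106)
Z = np.zeros(n)    # direct angular observations, Eq. (107)

for t in range(1, n):
    dphi = rng.normal(0.0, np.sqrt(dt / kappa_phi))        # heading increment
    phi[t] = (phi[t - 1] + dphi) % (2 * np.pi)             # wrapped heading
    dU[t] = dphi + rng.normal(0.0, np.sqrt(dt / kappa_u))  # noisy increment observation
    Z[t] = rng.vonmises(phi[t], xi_inv(kappa_z * dt))      # von Mises observation
```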

F. Benchmarks for numerical simulations

For our numerical simulations in Figures 1 and 3, corresponding to Examples 3 and 5, we used the following filtering algorithms to compare the circKF against.

1). Particle Filter:

As a benchmark, we used a Sequential Importance Sampling/Resampling particle filter (SIS-PF, [32]), modified to account for state increment observations $\Delta U_t$.

The $N$ particles in the SIS-PF were propagated according to

\[ \pi\big(\varphi_t^{(j)} \mid \varphi_{t-\Delta t}^{(j)}, \Delta U_t\big) = \mathcal{N}\Big(\varphi_{t-\Delta t}^{(j)} + \frac{\kappa_u}{\kappa_u + \kappa_\varphi}\, \Delta U_t,\; \frac{\Delta t}{\kappa_\varphi + \kappa_u}\Big) \mod 2\pi, \tag{108} \]

and each particle j was weighted at each time step according to

\[ w_t^{(j)} = w_{t-\Delta t}^{(j)}\, \mathcal{VM}\big(Z_t;\, \varphi_t^{(j)},\, \xi^{-1}(\kappa_z \Delta t)\big), \tag{109} \]

yielding an SIS scheme for this model that is asymptotically exact in the limit $N \to \infty$. The mean $\mu_t$ and precision $r_t$ of the filtering distribution were determined at each time step by a weighted average on the circle, i.e., the first circular moment:

\[ r_t \exp(i \mu_t) = \sum_{j=1}^N w_t^{(j)} \exp\big(i \varphi_t^{(j)}\big). \tag{110} \]

For a von Mises distribution, the radius $r$ of the first circular moment and the precision parameter $\kappa$ are related via $r = I_1(\kappa)/I_0(\kappa)$, which is why we use $r$ rather than $\kappa$ in our plots.
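Where $\kappa$ is needed from $r$, this relation can be inverted numerically; a simple root-finding sketch (our addition):

```python
from scipy.special import i0e, i1e
from scipy.optimize import brentq

def kappa_from_r(r):
    """Invert r = I1(kappa) / I0(kappa) by root finding (requires 0 < r < 1)."""
    # Exponentially scaled Bessel functions avoid overflow for large kappa.
    return brentq(lambda k: i1e(k) / i0e(k) - r, 1e-8, 1e4)
```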

In our simulations, we used $N = 10^3$ particles if direct angular observations $Z_t$ were present, and $N = 10^4$ if only state increment observations were present. We re-sampled the particles whenever the effective number of particles, $N_{\mathrm{eff}} = \big(\sum_j (w_t^{(j)})^2\big)^{-1}$, fell below $N/2$.
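A compact sketch of one SIS-PF step is shown below (our addition; function and variable names are ours, and `kappa_z_eff` denotes the effective concentration $\xi^{-1}(\kappa_z \Delta t)$, assumed to be computed elsewhere).

```python
import numpy as np
from scipy.special import i0e

def vm_pdf(z, mu, kappa):
    """von Mises density (overflow-safe via the scaled Bessel function)."""
    return np.exp(kappa * (np.cos(z - mu) - 1.0)) / (2 * np.pi * i0e(kappa))

def sis_pf_step(phi, w, dU_t, Z_t, dt, kappa_phi, kappa_u, kappa_z_eff, rng):
    N = len(phi)
    # Propagate with the increment-informed proposal, Eq. (108)
    mean = phi + kappa_u / (kappa_u + kappa_phi) * dU_t
    phi = (mean + rng.normal(0.0, np.sqrt(dt / (kappa_phi + kappa_u)), N)) % (2 * np.pi)
    # Re-weight with the direct angular observation, Eq. (109)
    w = w * vm_pdf(Z_t, phi, kappa_z_eff)
    w = w / w.sum()
    # Re-sample whenever the effective number of particles drops below N/2
    if 1.0 / np.sum(w**2) < N / 2:
        idx = rng.choice(N, size=N, p=w)
        phi, w = phi[idx], np.full(N, 1.0 / N)
    # First circular moment, Eq. (110): mean mu_t and precision r_t
    m = np.sum(w * np.exp(1j * phi))
    return phi, w, np.angle(m) % (2 * np.pi), np.abs(m)
```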

2). Gaussian approximation:

The reference filter, which we refer to as the "Gauss filter", is a heuristic method that assumes the posterior mean $\mu_t$ and variance $\sigma_t^2$ to evolve according to a generalized Kalman–Bucy filter (Eqs. (31) and (32)). Such a filter is often referred to as an "assumed density filter" (ADF) in the literature, which under certain conditions, such as for the circular filtering problem we consider here, becomes fully equivalent to a Gaussian projection filter (see [18, Section 7] for an in-depth discussion). To make the resulting distribution circular, this Gaussian is subsequently approximated by a von Mises distribution via $\kappa_t = \sigma_t^{-2}$, resulting in the following update equations for the model in Example 3:

\begin{align} d\mu_t &= \frac{\kappa_u}{\kappa_\varphi + \kappa_u}\, dU_t, \tag{111}\\ d\kappa_t &= d\big(\sigma_t^{-2}\big) = -\frac{\kappa_t^2}{\kappa_\varphi + \kappa_u}\, dt. \tag{112} \end{align}

For the model used in Example 5, this is combined with the direct angular observations in the same way as for the circKF. Note that in the absence of direct angular observations $Z_t$, the mean dynamics are the same as for the circKF, while $\kappa_t$ deviates (as shown in Figure 1). When direct angular observations are present, this deviation in turn affects the update in the mean dynamics, which leads to worse numerical performance than the circKF.
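For reference, the increments-only Gauss filter amounts to the following few lines (our addition; names and initial values are illustrative):

```python
import numpy as np

def gauss_filter(dU, dt, kappa_phi, kappa_u, mu0=0.0, kappa0=10.0):
    """Heuristic Gaussian benchmark, Eqs. (111)-(112), increment observations only."""
    mu, kappa = mu0, kappa0
    trace = []
    for du in dU:
        mu = (mu + kappa_u / (kappa_phi + kappa_u) * du) % (2 * np.pi)  # Eq. (111)
        kappa = kappa - kappa**2 / (kappa_phi + kappa_u) * dt           # Eq. (112)
        trace.append((mu, kappa))
    return trace
```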

G. Code availability

Jupyter notebooks to generate Figs. 1–3, the underlying simulation data, as well as the Python scripts used to generate these data, have been deposited at Zenodo and are publicly available at https://doi.org/10.5281/zenodo.5820406.

Footnotes

1

As outlined earlier, for a Gaussian setting this would be the update part of the Kushner–Stratonovich equation. To the best of our knowledge, no such equation exists for von Mises-valued observations.

References

  • [1] Heinze S, Narendra A, and Cheung A, "Principles of insect path integration," Current Biology, vol. 28, no. 17, pp. R1043–R1058, Sep. 2018. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6462409/
  • [2] Kreiser R, Cartiglia M, Martel JNP, Conradt J, and Sandamirskaya Y, "A Neuromorphic Approach to Path Integration: A Head-Direction Spiking Neural Network with Vision-driven Reset," in 2018 IEEE International Symposium on Circuits and Systems (ISCAS), May 2018, pp. 1–5.
  • [3] Taube JS, Muller RU, and Ranck JB, "Head-direction cells recorded from the postsubiculum in freely moving rats. I. Description and quantitative analysis," The Journal of Neuroscience, vol. 10, no. 2, pp. 420–435, 1990. http://www.ncbi.nlm.nih.gov/pubmed/2303851
  • [4] Seelig JD and Jayaraman V, "Neural dynamics for landmark orientation and angular path integration," Nature, vol. 521, no. 7551, pp. 186–191, 2015.
  • [5] Skaggs WE, Knierim JJ, Kudrimoti HS, and McNaughton BL, "A Model of the Neural Basis of the Rat's Sense of Direction," Advances in Neural Information Processing Systems, 1995.
  • [6] Turner-Evans D, Wegener S, Rouault H, Franconville R, Wolff T, Seelig JD, Druckmann S, and Jayaraman V, "Angular velocity integration in a fly heading circuit," eLife, vol. 6, p. e23496, May 2017. https://elifesciences.org/articles/23496
  • [7] Xie X, Hahnloser RH, and Seung HS, "Double-ring network model of the head-direction system," Physical Review E, vol. 66, no. 4, 2002.
  • [8] Ma WJ, Beck JM, Latham PE, and Pouget A, "Bayesian inference with probabilistic population codes," Nature Neuroscience, vol. 9, no. 11, pp. 1432–1438, Nov. 2006. http://www.ncbi.nlm.nih.gov/pubmed/17057707
  • [9] Fiser J, Berkes P, Orbán G, and Lengyel M, "Statistically optimal perception and learning: from behavior to neural representations," Trends in Cognitive Sciences, vol. 14, no. 3, pp. 119–130, Mar. 2010. https://linkinghub.elsevier.com/retrieve/pii/S1364661310000045
  • [10] Kalman RE, "A New Approach to Linear Filtering and Prediction Problems," Transactions of the ASME, Journal of Basic Engineering, vol. 82, Series D, pp. 35–45, 1960. http://fluidsengineering.asmedigitalcollection.asme.org/article.aspx?articleid=1430402
  • [11] Kalman RE and Bucy RS, "New Results in Linear Filtering and Prediction Theory," Journal of Basic Engineering, vol. 83, no. 1, pp. 95–108, Mar. 1961. https://asmedigitalcollection.asme.org/fluidsengineering/article/83/1/95/426820/New-Results-in-Linear-Filtering-and-Prediction
  • [12] Kurz G, Gilitschenski I, and Hanebeck UD, "Recursive Bayesian filtering in circular state spaces," IEEE Aerospace and Electronic Systems Magazine, vol. 31, no. 3, pp. 70–87, Mar. 2016. https://ieeexplore.ieee.org/document/7475421
  • [13] Azmani M, Reboul S, Choquel JB, and Benjelloun M, "A recursive fusion filter for angular data," in 2009 IEEE International Conference on Robotics and Biomimetics (ROBIO), 2009, pp. 882–887.
  • [14] Traa J and Smaragdis P, "A Wrapped Kalman Filter for Azimuthal Speaker Tracking," IEEE Signal Processing Letters, vol. 20, no. 12, pp. 1257–1260, 2013.
  • [15] Kurz G, Gilitschenski I, and Hanebeck UD, "Recursive nonlinear filtering for angular data based on circular distributions," in 2013 American Control Conference, 2013, pp. 5439–5445.
  • [16] Tronarp F, Hostettler R, and Särkkä S, "Continuous-Discrete von Mises–Fisher Filtering on S² for Reference Vector Tracking," in 2018 21st International Conference on Information Fusion (FUSION), 2018, pp. 1345–1352.
  • [17] Hanzon B and Hut R, "New Results on the Projection Filter," Research Memorandum, vol. 105, Mar. 1991. http://www.sciencedirect.com/science/article/pii/S0377221796004043
  • [18] Brigo D, Hanzon B, and Le Gland F, "Approximate nonlinear filtering by projection on exponential manifolds of densities," Bernoulli, vol. 5, no. 3, pp. 495–534, 1999. http://projecteuclid.org/euclid.bj/1172617201
  • [19] Tronarp F and Särkkä S, "Continuous-Discrete Filtering and Smoothing on Submanifolds of Euclidean Space," arXiv:2004.09335 [math, stat], Apr. 2020. http://arxiv.org/abs/2004.09335
  • [20] Nüsken N, Reich S, and Rozdeba PJ, "State and Parameter Estimation from Observed Signal Increments," Entropy, vol. 21, no. 5, p. 505, May 2019. https://www.mdpi.com/1099-4300/21/5/505
  • [21] Stratonovich RL, "Conditional Markov Processes," Theory of Probability & Its Applications, vol. 5, no. 2, pp. 156–178, Jan. 1960. http://epubs.siam.org/doi/10.1137/1105015
  • [22] Kushner HJ, "On the Differential Equations Satisfied by Conditional Probability Densities of Markov Processes, with Applications," Journal of the Society for Industrial and Applied Mathematics, Series A: Control, vol. 2, no. 1, pp. 106–119, Jan. 1964. http://epubs.siam.org/doi/10.1137/0302009
  • [23] Bain A and Crisan D, Fundamentals of Stochastic Filtering, ser. Stochastic Modelling and Applied Probability, no. 60. New York: Springer, 2009.
  • [24] Brigo D, Hanzon B, and Le Gland F, "A differential geometric approach to nonlinear filtering: the projection filter," IEEE Transactions on Automatic Control, vol. 43, no. 2, pp. 247–252, 1998. http://ieeexplore.ieee.org/document/661075/
  • [25] van Handel R and Mabuchi H, "Quantum projection filter for a highly nonlinear model in cavity QED," Journal of Optics B: Quantum and Semiclassical Optics, vol. 7, no. 10, pp. S226–S236, Oct. 2005. https://iopscience.iop.org/article/10.1088/1464-4266/7/10/005
  • [26] Amari S, Differential-Geometrical Methods in Statistics, 2nd ed., ser. Lecture Notes in Statistics, no. 28. Berlin; New York: Springer-Verlag, 1990.
  • [27] Wong E and Zakai M, "On the relation between ordinary and stochastic differential equations," International Journal of Engineering Science, vol. 3, no. 2, pp. 213–229, Jul. 1965. https://linkinghub.elsevier.com/retrieve/pii/0020722565900455
  • [28] Gatto R and Jammalamadaka SR, "The generalized von Mises distribution," Statistical Methodology, vol. 4, no. 3, pp. 341–353, Jul. 2007. https://escholarship.org/uc/item/8m91b0p7
  • [29] Gatto R, "Some computational aspects of the generalized von Mises distribution," Statistics and Computing, vol. 18, no. 3, pp. 321–331, Sep. 2008.
  • [30] Kloeden PE and Platen E, Numerical Solution of Stochastic Differential Equations, corr. 3rd printing, ser. Applications of Mathematics, no. 23. Berlin: Springer, 2010.
  • [31] Kutschireiter A, Basnak MA, Wilson RI, and Drugowitsch J, "A Bayesian perspective on the ring attractor for heading-direction tracking in the Drosophila central complex," bioRxiv, Dec. 2021. https://www.biorxiv.org/content/10.1101/2021.12.17.473253v1
  • [32] Doucet A, Godsill S, and Andrieu C, "On sequential Monte Carlo sampling methods for Bayesian filtering," Statistics and Computing, vol. 10, pp. 197–208, 2000.
