Published in final edited form as: IEEE Trans Signal Process. 2022 Jan 14;70:686–700. doi: 10.1109/tsp.2022.3143471

Projection Filtering with Observed State Increments with Applications in Continuous-Time Circular Filtering

Anna Kutschireiter 1,*, Luke Rast 1, Jan Drugowitsch 1

Abstract

Angular path integration is the ability of a system to estimate its own heading direction from potentially noisy angular velocity (or increment) observations. Non-probabilistic algorithms for angular path integration, which rely on a summation of these noisy increments, do not appropriately take into account the reliability of such observations, which is essential for appropriately weighing one’s current heading direction estimate against incoming information. In a probabilistic setting, angular path integration can be formulated as a continuous-time nonlinear filtering problem (circular filtering) with observed state increments. The circular symmetry of heading direction makes this inference task inherently nonlinear, thereby precluding the use of popular inference algorithms such as Kalman filters, rendering the problem analytically inaccessible. Here, we derive an approximate solution to circular continuous-time filtering, which integrates state increment observations while maintaining a fixed representation through both state propagation and observational updates. Specifically, we extend the established projection-filtering method to account for observed state increments and apply this framework to the circular filtering problem. We further propose a generative model for continuous-time angular-valued direct observations of the hidden state, which we integrate seamlessly into the projection filter. Applying the resulting scheme to a model of probabilistic angular path integration, we derive an algorithm for circular filtering, which we term the circular Kalman filter. Importantly, this algorithm is analytically accessible, interpretable, and outperforms an alternative filter based on a Gaussian approximation.

Keywords: Bayesian methods, nonlinear filtering, circular filtering, sensor fusion, continuous-time estimation, stochastic processes

I. Introduction

A compass is an immensely useful tool for a traveler trying to find their way in a barren and featureless landscape. Absent such a tool, the traveler must employ dead reckoning, using what they roughly know about how often and how much they have turned, and summing up those turns to maintain an internal sense of their spatial orientation. Angular path integration, i.e., estimation of heading direction or orientation based on angular self-motion cues, plays an essential role in spatial navigation of humans, other animals and robots [1, 2]. Imperfect sensors make angular path integration an inherently noisy process, and inevitably lead to an accumulation of error in the heading estimate over time. Other cues, such as those from visual landmarks, can help correct the estimate’s error, despite being also noisy and ambiguous. Importantly, properly combining path integration with these external cues requires a reliability-weighted update of the orientation estimate. Computing with uncertainties in such a strategic way is a hallmark of dynamic Bayesian inference, and calls for a probabilistic description. Our goal in this work is to derive a dynamic probabilistic algorithm for angular path integration.

It is well known that many organisms are able to maintain an internal compass which they update by self-motion cues. Since the discovery of orientation-selective head-direction cells in rats [3], and, more recently, the heading direction circuit in Drosophila [4], theoretical efforts to unravel the mechanism of angular path integration in the brain have highlighted the role of angular velocity observations of self-initiated turns [5, 6], e.g., from proprioceptive or vestibular feedback, or from visual flow. Current theories suggest that these biological systems implement angular path integration by neural network motifs called ring attractors [7, 6]. Such networks maintain a heading direction estimate through sustained neural activity, but lack the ability to simultaneously represent the estimate’s certainty. However, the question of whether such biological systems indeed only operate with single point estimates or instead perform probabilistic inference has been hampered by the lack of a probabilistic algorithm for path integration in the brain. The reason for this is the complex set of conditions that are required of such an algorithm: (i) the state-space is circular, (ii) the path integration must operate with a continuous time stream of inputs, (iii) it must maintain a fixed representation of the underlying probability distribution, in line with the expectation that, within a certain area of computation, the brain maintains a similarly fixed representation e.g., in terms of a parametric [8] or sampling-based [9] representation, and (iv) in addition to angular velocity observations, (noisy) direct angular observations (e.g., visual landmarks) may be present, which need to be integrated accordingly.

One approach to designing an algorithm that satisfies these conditions is to consider it in the broader context of continuous-time filtering, which aims to continuously update a posterior distribution over a dynamically evolving hidden state variable from noisy observational data. From condition (i), a circular state-space implies that the underlying filtering task is nonlinear, which precludes the use of popular linear schemes such as the Kalman filter [10, 11]. Furthermore, the solution to this so-called circular filtering problem is analytically intractable [12], and needs to be approximated. The approximation we derive here goes beyond existing circular filtering algorithms [13, 14, 15] (see also the review by Kurz et al. [12]) by both considering increment observations — like observations of angular velocity — and by addressing a continuous stream of observations (condition (ii)). Furthermore, while these previous approaches changed the representations used between prediction and update steps, ours uses a fixed representation, satisfying our condition (iii).

A recent promising approach to circular filtering [16] supports continuous-time state transitions, but is limited to discrete-time and direct (rather than increment) observations. Their method is based on projection filtering [17, 18], a rigorous approach that combines nonlinear filtering with information geometry. The idea behind projection filtering is to approximate the posterior between consecutive discrete-time observations with a parametric distribution. This approximation is chosen to minimize the distance between the true and the approximated posterior, as measured by the Fisher metric. By updating only the values of the parameters, this approach automatically keeps a fixed representation for the posterior in terms of a parametric distribution. Furthermore, if the approximated posterior and the emission probabilities of the observations are conjugate, updating posterior parameters in light of further discrete-time direct observations is straightforward. Projection filtering can be generalized to hidden processes that evolve on arbitrary submanifolds of Euclidean space [19], which makes it applicable to the circular filtering problem. Two challenges hamper the direct application of projection filters to angular path integration. First, no variants currently exist that handle increment observations, either in discrete or continuous time, as we would require to process angular velocity observations. In fact, increment observations have generally received little attention in the filtering literature (but see [20]). Second, there is currently no framework that combines projection-filtering with angular-valued continuous-time observations. We will address both challenges in this work.

We introduce a novel continuous-time nonlinear filtering algorithm based on projection filtering, that includes increment observations and can be applied to circular filtering and to angular path integration, and that meets conditions (i)-(iv) outlined above. To do so, we first describe the general nonlinear filtering problem with observed increment observations in Euclidean space in Section II. In Section III, we review the projection filtering framework [17, 18] as an approximate solution for nonlinear filtering, and extend this approach to account for increment observations. We demonstrate that applying this framework to a linear filtering problem recovers the generalized Kalman filter. In Section IV, we revisit the continuous-time circular filtering problem. Therein, we first derive a probabilistic algorithm for angular path integration, i.e., when only increment observations are present. We then account for direct angular-valued observations, in addition to increment observations, by proposing a generative model based on a constant information-rate criterion that supports seamless inclusion into the filtering algorithm. Combining all of the above, we finally retrieve, as a special case of the general framework, a circular filtering algorithm for Gaussian-type increment and angular direct observations, which we term the circular Kalman filter. We demonstrate in numerical simulations that this algorithm performs comparably to an asymptotically-exact particle filter, and outperforms a Gaussian approximation in the estimation of both heading direction as well as its associated certainty.

II. The filtering problem with increment observations

We consider multivariate filtering with observations generated by increments of the hidden state, rather than the hidden state itself. We assume that the hidden state variable $X_t \in \mathbb{R}^N$ evolves according to a stochastic differential equation (SDE) of the form:

$dX_t = f(X_t, t)\,dt + \Sigma_x^{1/2}\,dW_t$, (1)

with $W_t$ an $\mathbb{R}^N$-Brownian motion (BM) process, a vector-valued drift function $f: \mathbb{R}^N \times \mathbb{R} \to \mathbb{R}^N$, and a matrix $\Sigma_x^{1/2} \in \mathbb{R}^{N \times N}$, which determines the error covariance of the hidden state process. In the following, we will use the shorthand $f_t(x) := f(x,t)$ or skip its argument completely (whenever there is no notational ambiguity). The density $\tilde{p}_t(x) := p(X_t = x)$ of this stochastic process evolves according to a partial differential equation, the Fokker–Planck equation (FPE):

$d\tilde{p}_t = \mathcal{L}^\dagger[\tilde{p}_t]\,dt$, (2)

with

$\mathcal{L}^\dagger[\tilde{p}_t(x)] = -\sum_{i=1}^N \frac{\partial}{\partial x_i}\left(f_i(x,t)\,\tilde{p}_t(x)\right) + \frac{1}{2}\sum_{i,j}^N (\Sigma_x)_{ij}\,\frac{\partial^2}{\partial x_i \partial x_j}\tilde{p}_t(x)$. (3)

Equivalently, expectations of a scalar test function $\phi_t := \phi(X_t)$ with respect to $\tilde{p}_t(x)$ evolve according to

$d\mathbb{E}[\phi_t] = \mathbb{E}[\mathcal{L}[\phi_t]]\,dt$, (4)

with the propagator

$\mathcal{L}[\phi_t(x)] = f_t(x)^T\,\nabla\phi_t(x) + \frac{1}{2}\mathrm{Tr}\left(H_\phi\,\Sigma_x\right)$, (5)

where $\nabla$ denotes the gradient with respect to $x$, $\mathrm{Tr}(\cdot)$ denotes the trace operator, and $H_\phi$ is the Hessian matrix with $(H_\phi)_{ij} = \frac{\partial^2 \phi_t}{\partial x_i \partial x_j}$. Note that $\mathcal{L}$ and $\mathcal{L}^\dagger$ are adjoint operators with respect to the $L^2$ inner product.

We assume that the hidden state process $X_t$ in (1) cannot be observed directly, but instead is partially observed through the process $dU_t$, which is governed by the infinitesimal state increments $dX_t$:

$dU_t = C\,dX_t + \Sigma_u^{1/2}\,dV_t = C f_t(X_t)\,dt + C\Sigma_x^{1/2}\,dW_t + \Sigma_u^{1/2}\,dV_t$, (6)

where $V_t$ is an $\mathbb{R}^M$-BM process, and $C \in \mathbb{R}^{M \times N}$. The matrix $\Sigma_u \in \mathbb{R}^{M \times M}$ determines the level of noise in the increment observations. Due to its dependency on the increment $dX_t$, the observation process $dU_t$ is effectively governed by two noise sources, $dW_t$ and $dV_t$. The first is correlated with the noise in the hidden state dynamics; the second is independent of it. This is in contrast to classical filtering problems, which consider the noise in hidden states and observations to be independent (cf. Appendix V-A).
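As a concrete illustration of this generative model, the following minimal sketch (our addition, not the authors' code) simulates Eqs. (1) and (6) with an Euler–Maruyama discretization; the linear drift and all parameter values are illustrative assumptions.

```python
# Minimal sketch: simulate the state (1) and increment observations (6)
# with Euler--Maruyama. Drift, matrices and parameters are illustrative.
import numpy as np

rng = np.random.default_rng(seed=0)
N, M = 2, 2                      # state / observation dimensions
dt, T = 1e-3, 10.0

A = np.array([[-1.0, 0.5],
              [-0.5, -1.0]])     # assumed linear drift f(x, t) = A x
C = np.eye(M, N)
Sx_half = 0.3 * np.eye(N)        # Sigma_x^{1/2}
Su_half = 0.1 * np.eye(M)        # Sigma_u^{1/2}

X = np.zeros(N)
U = np.zeros(M)
for _ in range(int(T / dt)):
    dW = np.sqrt(dt) * rng.standard_normal(N)
    dV = np.sqrt(dt) * rng.standard_normal(M)
    dX = A @ X * dt + Sx_half @ dW   # state increment, Eq. (1)
    U += C @ dX + Su_half @ dV       # increment observation dU_t, Eq. (6);
    X += dX                          # note dU shares the noise dW with dX
```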

If $U_{0:t} = \{U_\tau : \tau < t\}$ denotes the filtration generated by the process $U_t$, then the filtering problem is to compute the posterior density $p_t(x) = p(X_t = x \mid U_{0:t})$ or, equivalently, the posterior expectation $\mathbb{E}[\phi_t] := \mathbb{E}[\phi(X_t) \mid U_{0:t}]$. In an uncorrelated noise setting, the Kushner–Stratonovich equation describes the temporal evolution of the posterior expectation $\mathbb{E}[\phi_t]$ [21, 22, 23] (Appendix V-A). To find a similar formal solution for filtering with observed state increments (i.e., correlated noise), we can account for the correlations between observations and the hidden state by introducing a slight modification in the Kushner–Stratonovich equation, resulting in a generalized Kushner–Stratonovich equation (gKSE) [23, Chapter 3.8] (cf. Nüsken et al. [20]). For our particular problem, we show in Appendix V-B that the dynamics of posterior expectations satisfy

$d\mathbb{E}[\phi_t] = \mathbb{E}[\mathcal{L}[\phi_t]]\,dt + \left(\mathrm{cov}(\phi_t, f_t) + \Sigma_x\,\mathbb{E}[\nabla\phi_t]\right)^T C^T\tilde{\Sigma}_u^{-1}\left(dU_t - C\,\mathbb{E}[f_t]\,dt\right)$, (7)

with $\mathrm{cov}(\phi_t, f_t) = \mathbb{E}[\phi_t f_t] - \mathbb{E}[\phi_t]\,\mathbb{E}[f_t]$, and

$\tilde{\Sigma}_u = C\Sigma_x C^T + \Sigma_u$. (8)

Note that the right-hand side of (7) does not only depend on $\mathbb{E}[\phi_t]$, but also on $\mathbb{E}[\phi_t f_t]$, $\mathbb{E}[\nabla\phi_t]$ and other expectations of potentially nonlinear functions, which in general cannot be computed from $\mathbb{E}[\phi_t]$ alone. Therefore, in order to completely characterize the probabilistic solution, we would need one equation (7) for every moment of the posterior, with each moment corresponding to a specific choice of $\phi_t$. Thus, except for a few very specific generative models, such as linear ones, the dynamics of posterior expectations in (7) will result in a system of an infinite number of coupled SDEs, which in general is analytically intractable.

Remark 1.

Equation (7) is the gKSE if state increments are the only available type of observations. In Appendix V-B, we extend it to the Kushner–Stratonovich equation when both state increments and (Gaussian-type) direct observations are present.

In what follows, we will approximately solve the continuous-time filtering problem with observed state increments by projecting the gKSE onto a submanifold of parametric densities with a finite number of parameters, resulting in a finite system of coupled SDEs for these parameters. For this, we will first review the general projection method, which so far has only been applied to classical filtering problems with uncorrelated state and observation noise, and then extend this framework to filtering problems with observed hidden state increments.

III. Projection filtering for observed continuous-time state increments

A. The general projection filtering method

Projection filtering is a method for approximate nonlinear filtering that is based on differential geometry. In this subsection, we outline briefly the differential geometric setup and derivation of the projection filter, and refer the reader to the seminal papers on projection filters [17, 24, 18], or the intuitive introduction to the subject matter presented in [25], for more detailed derivations and in-depth discussion of the method.

In general, we can interpret the solution of a (stochastic) differential equation (such as the FPE (2)) as a (stochastic) vector field on an infinite-dimensional function space $\mathcal{M}$ of probability density functions $p_t$. If the vector field is stochastic (which is usually the case in nonlinear filtering [22]), we consider it to be given in Stratonovich form

$dp_t = \mathcal{A}[p_t]\,dt + \mathcal{B}[p_t] \circ dU_t$, (9)

which is the standard choice for stochastic calculus on manifolds. Let us further assume a parametrization $p_\theta(x) := p(x; \theta)$ with a finite set of parameters $\theta = \{\theta_1, \dots, \theta_m\} \in \Theta$, such that the solution to (9) is reasonably well approximated by these parametrized densities.

Projection filtering provides a solution to the filtering problem by evolving the parameters $\theta$, thereby constraining the approximate posterior to evolve on a finite-dimensional submanifold $S = \{p_\theta(x);\ \theta \in \Theta\}$ of $\mathcal{M}$, rather than on $\mathcal{M}$ itself. This implies that we have to project the vector field in (9) onto the tangent space $T_\theta S$,

$T_\theta S = \mathrm{Span}\left(\frac{\partial p_\theta}{\partial\theta_1}, \dots, \frac{\partial p_\theta}{\partial\theta_m}\right) \subset L^1$, (10)

where the $\frac{\partial p_\theta}{\partial\theta_i}$ denote the basis vectors of this tangent space. Intuitively, an orthogonal projection minimizes at each time step the distance between the true posterior $p_t$ and its approximation $p_\theta$ with respect to a Riemannian metric which, for probability distributions, corresponds to the Fisher metric [26]

$g_{ij} = \mathbb{E}_{p_\theta}\left[\frac{\partial \log p_\theta(x)}{\partial\theta_i}\,\frac{\partial \log p_\theta(x)}{\partial\theta_j}\right]$. (11)

This allows us to use the general orthogonal projection formula

$\Pi_\theta[Z] = \sum_{ij} g^{ij}\left\langle Z,\ \frac{\partial p_\theta}{\partial\theta_i}\right\rangle_\theta \frac{\partial p_\theta}{\partial\theta_j}$, (12)

where $\Pi_\theta$ denotes the projection operator, the $g^{ij}$ are the components of the inverse Fisher metric, and

$\langle Z_1, Z_2\rangle_\theta := \int_\Omega dx\,\frac{Z_1(x)\,Z_2(x)}{p_\theta(x)}, \qquad Z_1, Z_2 \in T_\theta S$ (13)

is the inner product that is associated with the Fisher metric (see Proposition 1 in Appendix V-C).

To find the parameter updates resulting from this projection, we apply the projection (12) to the dynamics of the probability density $p_t$ in (9). Since our approximate dynamics evolve along tangent vectors to the manifold parametrized by $\theta$, the posterior $p_\theta$ will stay on this manifold. Hence, the left-hand side of (9) can be written in terms of the basis vectors of $T_\theta S$ by using the chain rule:

$\Pi_\theta[dp_\theta] = \sum_j \frac{\partial p_\theta}{\partial\theta_j}\,d\theta_j$. (14)

Further, by letting the projection act on the right-hand side of (9), and consecutively comparing coefficients in front of the basis vectors $\frac{\partial p_\theta}{\partial\theta_j}$, we find the following Stratonovich SDEs for the parameters of the projected density:

$d\theta_j = \sum_i g^{ij}\left\langle \mathcal{A}[p_\theta],\ \frac{\partial p_\theta}{\partial\theta_i}\right\rangle_\theta dt + \sum_i g^{ij}\left\langle \mathcal{B}[p_\theta],\ \frac{\partial p_\theta}{\partial\theta_i}\right\rangle_\theta \circ dU_t$. (15)

This is the result of Brigo et al. [18, Theorem 4.3].

To facilitate the comparison to the gKSE (7), we further slightly rewrite this SDE:

$d\theta_j = \sum_i g^{ij}\left(\int_\Omega dx\,\mathcal{A}[p_\theta]\,\frac{\partial \log p_\theta}{\partial\theta_i}\right)dt + \sum_i g^{ij}\left(\int_\Omega dx\,\mathcal{B}[p_\theta]\,\frac{\partial \log p_\theta}{\partial\theta_i}\right) \circ dU_t$ (16)
$= \sum_i g^{ij}\,\mathbb{E}_\theta\left[\mathcal{A}^\dagger\left[\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right]dt + \sum_i g^{ij}\,\mathbb{E}_\theta\left[\mathcal{B}^\dagger\left[\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right] \circ dU_t$, (17)

where $\mathbb{E}_\theta[\cdot]$ here denotes the expectation with respect to the projected density $p_\theta$, and $\mathcal{A}^\dagger$ and $\mathcal{B}^\dagger$ denote the adjoints of $\mathcal{A}$ and $\mathcal{B}$ with respect to the $L^2$ scalar product. Rewriting the SDE in such a way allows us to immediately identify the operators $\mathcal{A}^\dagger$ and $\mathcal{B}^\dagger$ on the right-hand side of this equation with the operators used for propagating expectations rather than densities.

To illustrate how to identify the operator $\mathcal{A}^\dagger$ concretely, let us consider the filtering problem without any observations, i.e., a simple diffusion, which is formally solved by the Fokker–Planck equation (2). Noting that the operator $\mathcal{L}^\dagger$ propagates the density through time, we identify $\mathcal{A} = \mathcal{L}^\dagger$ and, for the adjoint, $\mathcal{A}^\dagger = \mathcal{L}$. Thus, the projection can be immediately determined from the time evolution of the expectation in Eq. (4), with $\phi_t = \frac{\partial \log p_\theta}{\partial\theta_i}$:

$d\theta_j = \sum_i g^{ij}\,\mathbb{E}_\theta\left[\mathcal{L}\left[\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right]dt$. (18)

If state increments are observed, expectations are propagated by the gKSE (7), and identification of the operators $\mathcal{A}^\dagger$ and $\mathcal{B}^\dagger$ is possible after transforming the gKSE to Stratonovich form, as we will see in the next section. Since this is usually easier than transforming the generalized Kushner equation for the posterior density into Stratonovich form, Eq. (17) is a more convenient choice for the parameter dynamics than Eq. (15).

Remark 2.

The derivation in the seminal papers on projection filtering [17, 18] follows a slightly different route but leads to the same result (15). We also refer the reader to the very accessible derivation presented in [25].

B. Projection with observed state increments

As we have seen, the adjunction between the SDE evolution operators for densities and for the associated expectations allowed us to use the expectation's evolution equation to derive the projection filter for a specific problem (18). The same applies if the evolution of the density (or, equivalently, that of the expectations) is a stochastic differential equation, as is the case for the KSE and the gKSE (7), as long as these are given in Stratonovich form. This allows us to formulate the projection filter for filtering with observed state increments:

Theorem 1.

The projection filter for the filtering problem with observed state increments is given by the following SDE for the parameters $\theta$ of a projected density $p_\theta$:

$d\theta_j = \sum_i g^{ij}\left[\mathbb{E}_\theta\left[\tilde{\mathcal{L}}\left[\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right] - \frac{1}{2}\left(\mathbb{E}_\theta\left[\left\|\tilde{\Sigma}_u^{-1/2}Cf_t\right\|^2\frac{\partial \log p_\theta}{\partial\theta_i}\right] + \mathbb{E}_\theta\left[\mathrm{Tr}\left(\tilde{\Sigma}_u^{-1}C J_f\Sigma_x C^T\right)\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right)\right]dt + \sum_i g^{ij}\left(\mathbb{E}_\theta\left[\frac{\partial \log p_\theta}{\partial\theta_i}\,f_t\right] + \Sigma_x\,\mathbb{E}_\theta\left[\nabla\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right)^T C^T\tilde{\Sigma}_u^{-1} \circ dU_t$, (19)

with $(J_f)_{ij} = \frac{\partial f_i}{\partial x_j}$ denoting the Jacobian matrix, and with the shorthand modified generator

$\tilde{\mathcal{L}}[\phi_t] := \left(\nabla\phi_t(x)\right)^T C^{-1}\Sigma_u\tilde{\Sigma}_u^{-1}C\,f_t(x) + \frac{1}{2}\mathrm{Tr}\left(H_\phi\,\Sigma_x C^{-1}\Sigma_u\tilde{\Sigma}_u^{-1}C\right)$. (20)
Proof.

As a first step, let us rewrite the gKSE (7) in Stratonovich form (Corollary 3 in V-B):

$d\mathbb{E}[\phi_t] = \left[\mathbb{E}[\tilde{\mathcal{L}}[\phi_t]] - \frac{1}{2}\left(\mathrm{cov}\left(\phi_t,\ \left\|\tilde{\Sigma}_u^{-1/2}Cf_t\right\|^2\right) + \mathrm{Tr}\left(\tilde{\Sigma}_u^{-1}C\,\mathrm{cov}(\phi_t, J_f)\,\Sigma_x C^T\right)\right)\right]dt + \left[\mathrm{cov}(\phi_t, f_t) + \Sigma_x\,\mathbb{E}[\nabla\phi_t]\right]^T C^T\tilde{\Sigma}_u^{-1} \circ dU_t$. (21)

This equation allows us to identify the operators $\mathcal{A}^\dagger$ and $\mathcal{B}^\dagger$ in Eq. (17) as the operations acting on $\phi_t$ in front of $dt$ and $dU_t$, respectively. By letting these operators act on $\phi_t = \frac{\partial \log p_\theta}{\partial\theta_j}$ in Eq. (17), and evaluating the expectations under the projected density $p_\theta$, we obtain SDEs for the desired parameters. These can be further simplified by using

$\mathbb{E}_\theta\left[\frac{\partial \log p_\theta}{\partial\theta_j}\right] = \frac{\partial}{\partial\theta_j}\int_\Omega dx\,p_\theta(x) = 0$,

which yields (19). □

The 1D special case follows directly from (19).

Corollary 1.

For univariate filtering problems, i.e., a filtering problem with $N = M = 1$ in (1) and (6), with $C = c$, $\Sigma_x = \sigma_x^2$, $\Sigma_u = \sigma_u^2$ and $\tilde{\Sigma}_u = \tilde{\sigma}_u^2 = c^2\sigma_x^2 + \sigma_u^2$, the projection filter with observed state increments reads:

$d\theta_j = \sum_i g^{ij}\left[\frac{\sigma_u^2}{\tilde{\sigma}_u^2}\,\mathbb{E}_\theta\left[\mathcal{L}\left[\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right] - \frac{c^2}{2\tilde{\sigma}_u^2}\left(\mathbb{E}_\theta\left[f_t^2\,\frac{\partial \log p_\theta}{\partial\theta_i}\right] + \sigma_x^2\,\mathbb{E}_\theta\left[\frac{\partial f_t}{\partial x}\,\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right)\right]dt + \sum_i g^{ij}\left[\frac{c}{\tilde{\sigma}_u^2}\left(\mathbb{E}_\theta\left[f_t\,\frac{\partial \log p_\theta}{\partial\theta_i}\right] + \sigma_x^2\,\mathbb{E}_\theta\left[\frac{\partial}{\partial x}\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right)\right] \circ dU_t$. (22)

C. Projection on exponential family distributions

Analogous to [18], it is possible to derive explicit filter equations for the natural parameters of a projected exponential family distribution. Consider the following exponential family parametrization:

$p_\theta(x) := \exp\left(\theta^T T(x) - \Psi(\theta)\right)$, (23)

where $\theta$ is the vector of natural or canonical parameters, $T(x)$ is the vector of sufficient statistics, and $\exp(\Psi(\theta))$ is the normalization.

Corollary 2.

The projection filter for the filtering problem with observed state increments is given by the following SDE for the natural parameters $\theta$ of a projected density $p_\theta$ belonging to the exponential family:

$d\theta_j = \sum_i g^{ij}\left[\mathbb{E}_\theta\left[\tilde{\mathcal{L}}[T_i(x)]\right] - \frac{1}{2}\left(\mathbb{E}_\theta\left[\left\|\tilde{\Sigma}_u^{-1/2}Cf_t\right\|^2\left(T_i(x) - \eta_i\right)\right] + \mathbb{E}_\theta\left[\mathrm{Tr}\left(\tilde{\Sigma}_u^{-1}C J_f\Sigma_x C^T\right)\left(T_i(x) - \eta_i\right)\right]\right)\right]dt + \sum_i g^{ij}\left(\mathbb{E}_\theta\left[\left(T_i(x) - \eta_i\right)f_t\right] + \Sigma_x\,\mathbb{E}_\theta\left[\nabla T_i(x)\right]\right)^T C^T\tilde{\Sigma}_u^{-1} \circ dU_t$, (24)

where $\eta_i = \mathbb{E}_\theta[T_i(x)]$ is the posterior expectation of the sufficient statistic $T_i(x)$. The parameters $\eta$ are sometimes referred to as dual or expectation parameters.

Proof.

Making use of the duality relation between natural and expectation parameters for exponential families,

$\frac{\partial}{\partial\theta_i}\log p_\theta(x) = T_i(x) - \frac{\partial}{\partial\theta_i}\Psi(\theta) = T_i(x) - \eta_i$, (25)

and the fact that $\nabla_x\,\eta_i = 0$, Eq. (24) follows directly from the projection filter with observed state increments (19). □

Example 1

(Generalized Kalman–Bucy filter). In order to demonstrate the general approach, let us consider a model with linear state dynamics

$dX_t = aX_t\,dt + \sigma_x\,dW_t$, (26)
$dU_t = c\,dX_t + \sigma_u\,dV_t$. (27)

Here, we will show that a projection on a Gaussian manifold with

$p_\theta(x) = \mathcal{N}(x;\ \mu_t, \sigma_t^2)$ (28)

results in dynamics for the parameters $\mu_t$ and $\sigma_t^2$ that are consistent with the generalized Kalman–Bucy filter for observed state increments [20, Section 4.2]. In fact, since Eqs. (26) and (27) are both linear, the posterior density is Gaussian and thus the projection filter becomes exact. For this particular problem, the projection filter reads:

$d\theta_j = \sum_i g^{ij}\left[\frac{\sigma_u^2}{\tilde{\sigma}_u^2}\,\mathbb{E}_\theta\left[aX_t\,\frac{\partial}{\partial x}\frac{\partial \log p_\theta}{\partial\theta_i} + \frac{\sigma_x^2}{2}\frac{\partial^2}{\partial x^2}\frac{\partial \log p_\theta}{\partial\theta_i}\right] - \frac{c^2 a^2}{2\tilde{\sigma}_u^2}\,\mathbb{E}_\theta\left[X_t^2\,\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right]dt + \sum_i g^{ij}\left[\frac{c}{\tilde{\sigma}_u^2}\left(a\,\mathbb{E}_\theta\left[X_t\,\frac{\partial \log p_\theta}{\partial\theta_i}\right] + \sigma_x^2\,\mathbb{E}_\theta\left[\frac{\partial}{\partial x}\frac{\partial \log p_\theta}{\partial\theta_i}\right]\right)\right] \circ dU_t$. (29)

We will use this to determine SDEs for the parameters $\mu_t$ and $\sigma_t^2$, with $\frac{\partial \log p_\theta}{\partial\mu_t} = \frac{x - \mu_t}{\sigma_t^2}$ and $\frac{\partial \log p_\theta}{\partial(\sigma_t^2)} = -\frac{1}{2\sigma_t^2} + \frac{(x - \mu_t)^2}{2\sigma_t^4}$, respectively, under the Gaussian assumption. First, the components of the Fisher information matrix of a Gaussian parametrized by its expectation parameters are given by

$g_{\mu\mu} = \frac{1}{\sigma_t^2}, \qquad g_{\sigma^2\sigma^2} = \frac{1}{2\sigma_t^4}, \qquad g_{\sigma^2\mu} = g_{\mu\sigma^2} = 0$. (30)

Since the matrix is diagonal, the components of its inverse are $g^{\mu\mu} = g_{\mu\mu}^{-1} = \sigma_t^2$ and $g^{\sigma^2\sigma^2} = g_{\sigma^2\sigma^2}^{-1} = 2\sigma_t^4$. This considerably simplifies the projection in Eq. (29) for the time evolution of $\mu_t$ and $\sigma_t^2$. Explicitly carrying out the expectations in Eq. (29) under the assumed Gaussian density, the SDEs for these parameters read:

$d\mu_t = a\mu_t\,dt + \frac{c}{c^2\sigma_x^2 + \sigma_u^2}\left(a\sigma_t^2 + \sigma_x^2\right)\left(dU_t - ac\,\mu_t\,dt\right)$, (31)
$d\sigma_t^2 = \left[2a\sigma_t^2 + \sigma_x^2 - \frac{c^2}{c^2\sigma_x^2 + \sigma_u^2}\left(a\sigma_t^2 + \sigma_x^2\right)^2\right]dt$. (32)

We found these Itô SDEs from their Stratonovich form by noting, for the first line, that the quadratic variation between $\sigma_t$ and the observation process $U_t$ is zero. In other words, no correction term according to the Wong–Zakai theorem [27] is required, such that both the Itô and Stratonovich forms have the same representation. Note that for a nonlinear generative model this will in general not be the case [18]. Equations (31) and (32) are identical to the generalized Kalman–Bucy filter [20, Eqs. (62) & (65)], thus demonstrating the validity of our approach.

Despite being able to reproduce existing results, the main purpose of the projection filtering approach is to simplify potentially hard filtering problems such that the parameter SDEs become analytically accessible. This becomes particularly useful if a certain parametric form of the posterior is desired, for instance for computational reasons, and can be very appealing if expectations are easily carried out under the assumed posterior and the Fisher matrix is straightforward to invert or even diagonal.
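For concreteness, the following minimal sketch (our addition, under the assumptions of Example 1) integrates the generalized Kalman–Bucy equations (31) and (32) on data generated from Eqs. (26) and (27); all parameter values are illustrative.

```python
# Minimal sketch: generalized Kalman--Bucy filter, Eqs. (31)-(32), run on
# data simulated from Eqs. (26)-(27). Parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(seed=1)
a, c, sig_x, sig_u = -0.5, 1.0, 0.5, 0.2
dt, steps = 1e-3, 10_000
su_tilde2 = c**2 * sig_x**2 + sig_u**2       # \tilde{sigma}_u^2, Eq. (8)

x, mu, var = 0.0, 0.0, 1.0                   # true state; posterior mean/variance
for _ in range(steps):
    # generative model, Eqs. (26)-(27)
    dx = a * x * dt + sig_x * np.sqrt(dt) * rng.standard_normal()
    dU = c * dx + sig_u * np.sqrt(dt) * rng.standard_normal()
    x += dx
    # filter update, Eqs. (31)-(32)
    gain = c * (a * var + sig_x**2) / su_tilde2
    mu += a * mu * dt + gain * (dU - a * c * mu * dt)
    var += (2 * a * var + sig_x**2
            - c**2 * (a * var + sig_x**2)**2 / su_tilde2) * dt
```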

Example 2

(Multivariate Gaussian with diagonal covariance matrix). From a computational perspective, projection on a Gaussian density with diagonal covariance matrix can be advantageous in certain situations. In particular, such a solution only requires equations for $2N$ parameters, instead of $N^2 + N$ for a general Gaussian with Fisher matrix components

$g_{\mu_i,\mu_j} = (\Sigma_t^{-1})_{ij}, \qquad g_{\mu_i,\sigma^2_{nm}} = 0$,
$g_{\sigma^2_{ij},\sigma^2_{nm}} = \frac{1}{4}\left[(\Sigma_t^{-1})_{in}(\Sigma_t^{-1})_{jm} + (\Sigma_t^{-1})_{im}(\Sigma_t^{-1})_{jn}\right]$,

which is in general hard to invert. For a Gaussian with diagonal covariance matrix, these simplify to

$g_{\mu_i,\mu_j} = (\Sigma_t^{-1})_{ij}, \qquad g_{\sigma^2_{ii},\sigma^2_{ii}} = \frac{1}{2}(\Sigma_t^{-1})_{ii}^2 = \frac{1}{2\sigma_{ii}^4}$,

while all other components evaluate to zero, making the Fisher matrix diagonal and straightforward to invert. Since the diagonality of the covariance matrix effectively decouples the dimensions, expectations can be carried out in each dimension separately. Nevertheless, the specific form of the parameter SDEs will crucially depend on the specific form of the nonlinear function $f_t(x)$. For instance, considering the linear case, i.e., $f_t(x) = Ax$, yields

$d\mu_t = A\mu_t\,dt + \left(\Sigma_x + \mathrm{diag}(\sigma_{ii}^2)\,A^T\right)C^T\tilde{\Sigma}_u^{-1}\left(dU_t - CA\mu_t\,dt\right)$,
$\frac{d\sigma_{ii}^2}{dt} = 2\sigma_{ii}^2\left(A - \Sigma_x C^T\tilde{\Sigma}_u^{-1}CA\right)_{ii} - 2\sigma_{ii}^4\left(A^T C^T\tilde{\Sigma}_u^{-1}CA\right)_{ii} + \left(\Sigma_x - \Sigma_x C^T\tilde{\Sigma}_u^{-1}C\Sigma_x\right)_{ii}$.
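A minimal sketch, assuming the linear-case equations as stated above; $A$, $C$ and all noise parameters are illustrative choices, so treat this strictly as a sketch rather than a reference implementation.

```python
# Minimal sketch: one Euler step of the diagonal-Gaussian projection filter
# for the linear case f_t(x) = A x. All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(seed=2)
N = 3
A = -np.eye(N) + 0.2 * rng.standard_normal((N, N))
C = np.eye(N)
Sx = 0.25 * np.eye(N)                        # Sigma_x
Su = 0.04 * np.eye(N)                        # Sigma_u
Sut_inv = np.linalg.inv(C @ Sx @ C.T + Su)   # \tilde{Sigma}_u^{-1}, Eq. (8)
dt = 1e-3

def step(mu, s2, dU):
    """Advance mean mu and diagonal variances s2 by one time step."""
    gain = (Sx + np.diag(s2) @ A.T) @ C.T @ Sut_inv
    dmu = A @ mu * dt + gain @ (dU - C @ A @ mu * dt)
    ds2 = (2 * s2 * np.diag(A - Sx @ C.T @ Sut_inv @ C @ A)
           - 2 * s2**2 * np.diag(A.T @ C.T @ Sut_inv @ C @ A)
           + np.diag(Sx - Sx @ C.T @ Sut_inv @ C @ Sx)) * dt
    return mu + dmu, s2 + ds2

mu, s2 = step(np.zeros(N), np.ones(N), dU=np.zeros(N))
```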

IV. Continuous-time circular filtering

In this section, we will consider continuous-time circular filtering with observed state increments as a concrete application of the framework derived above. We will further extend it to account for quasi continuous-time von Mises-valued observations (formally defined below), to provide a continuous-time generalization of the discrete-time circular filtering problem that is frequently encountered in spatial navigation problems.

A. Assuming observed angular increments only

We assume that the hidden state $X_t$ is parametrized on $S^1$ by the angle $\varphi_t \in [0, 2\pi)$, effectively embedding $S^1$ in $\mathbb{R}^2$ as a unit circle. We further assume that $\varphi_t$ follows a diffusion on the circle:

$d\varphi_t = f(\varphi_t, t)\,dt + \sigma_\varphi\,dW_t$, (33)

where $W_t$ is now an $\mathbb{R}^1$-BM process, and drift and diffusion functions are as defined in Section II. The propagator $\mathcal{L}[\cdot]$ for this process is the same as for the corresponding process in $\mathbb{R}^1$ as given in Eq. (5).

For state processes that evolve on submanifolds of $\mathbb{R}^n$, such as the $S^1$ considered here, Tronarp and Särkkä [19] have shown that the projection filter equations are identical to the case where the state variable $X_t$ evolves in Euclidean space. Since the mathematical operations performed to derive the projection filter with observed state increments in Theorem 1 are essentially the same as in Tronarp and Särkkä [19] for the state diffusion, their result carries over to our problem. Thus, Theorem 1 can straightforwardly be applied to the circular filtering problem by considering a circular projected density $p_\theta(\varphi)$, such as the von Mises or a wrapped normal distribution.

Example 3

(Circular diffusion with observed state increments). In this example, we explicitly model angular path integration as the estimation of a circular diffusion based on observed angular increments. Consider a model where the hidden state evolves according to a Brownian motion on the circle, with noisy observations of its increment

$d\varphi_t = \frac{1}{\sqrt{\kappa_\varphi}}\,dW_t$, (34)
$dU_t = d\varphi_t + \frac{1}{\sqrt{\kappa_u}}\,dV_t$. (35)

Here, $\varphi_t$ could, for instance, correspond to the heading direction of an animal (or a robot) that is navigating in darkness and only has access to self-motion cues $dU_t$, i.e., measurements of angular increments, but not to direct heading cues such as landmark positions. We chose to parametrize the diffusion constants in terms of precisions, $\kappa_\varphi$ and $\kappa_u$, to make units comparable to that of the precision of the projected density, which we will denote $\kappa_t$. Thus, the parameter $\kappa_\varphi$ governs the speed of the hidden state diffusion, and the parameter $\kappa_u$ modulates the reliability of the observation process that is governed by the increments. The gKSE for this model's posterior expectation of a test function $\phi_t := \phi(\varphi_t)$ in Stratonovich form reads:

$d\mathbb{E}[\phi_t] = \frac{1}{2(\kappa_\varphi + \kappa_u)}\,\mathbb{E}\left[\frac{\partial^2}{\partial\varphi^2}\phi_t\right]dt + \frac{\kappa_u}{\kappa_\varphi + \kappa_u}\,\mathbb{E}\left[\frac{\partial}{\partial\varphi}\phi_t\right] \circ dU_t$. (36)

As a result, the projection filter for the parameters $\theta$ becomes (cf. Eq. (19))

$d\theta_j = \frac{1}{\kappa_\varphi + \kappa_u}\sum_i g^{ij}\left[\frac{1}{2}\,\mathbb{E}_\theta\left[\frac{\partial^2}{\partial\varphi^2}\frac{\partial \log p_\theta}{\partial\theta_i}\right]dt + \kappa_u\,\mathbb{E}_\theta\left[\frac{\partial}{\partial\varphi}\frac{\partial \log p_\theta}{\partial\theta_i}\right] \circ dU_t\right]$. (37)

We now want to solve the circular filtering problem with observed state increments by projecting on the von Mises density

$p_{\mu,\kappa}(\varphi) = \mathrm{VM}(\varphi;\ \mu_t, \kappa_t) = \frac{1}{2\pi I_0(\kappa_t)}\exp\left(\kappa_t \cos(\varphi - \mu_t)\right)$, (38)

parametrized by mean $\mu_t$ and precision $\kappa_t$, using Eq. (37). Unlike, e.g., the wrapped normal distribution, which is another popular choice for unimodal circular distributions, the von Mises distribution is an exponential family distribution and could alternatively be written in natural parametrization (23). Here, we chose to parametrize it by $\mu_t$ and $\kappa_t$, as this significantly simplifies the computation of the Fisher metric and its inverse, which appears on the right-hand side of (37). Noting that

$\frac{\partial}{\partial\mu_t}\log p_{\mu,\kappa}(\varphi) = \kappa_t \sin(\varphi - \mu_t)$, (39)
$\frac{\partial}{\partial\kappa_t}\log p_{\mu,\kappa}(\varphi) = -F(\kappa_t) + \cos(\varphi - \mu_t)$, (40)

where $F(\kappa) = \frac{I_1(\kappa)}{I_0(\kappa)}$ denotes a ratio of Bessel functions, the components of the Fisher metric (with respect to the $\mu_t$, $\kappa_t$ parametrization) are given by:

$g_{\mu\mu} = \kappa_t^2\,\mathbb{E}_{\mu,\kappa}\left[\sin^2(\varphi_t - \mu_t)\right] = \kappa_t F(\kappa_t)$, (41)
$g_{\kappa\kappa} = \mathbb{E}_{\mu,\kappa}\left[\left(-F(\kappa_t) + \cos(\varphi_t - \mu_t)\right)^2\right] = 1 - \frac{F(\kappa_t)}{\kappa_t} - F(\kappa_t)^2$, (42)
$g_{\mu\kappa} = g_{\kappa\mu} = 0$. (43)

Since the Fisher metric is diagonal, the components of its inverse are simply $g^{\mu\mu} = g_{\mu\mu}^{-1}$, $g^{\kappa\kappa} = g_{\kappa\kappa}^{-1}$ and $g^{\mu\kappa} = g^{\kappa\mu} = 0$.

Using these $g^{ij}$'s and explicitly computing the expectations on the right-hand side of Eq. (37) with respect to the von Mises approximation, we find the projection filter equations for this model:

$d\mu_t = \frac{\kappa_u}{\kappa_\varphi + \kappa_u}\,dU_t$, (44)
$d\kappa_t = -\frac{1}{2(\kappa_\varphi + \kappa_u)}\,\mathcal{F}(\kappa_t)\,dt$, (45)

where we defined the strictly positive function

$\mathcal{F}(\kappa_t) = \frac{F(\kappa_t)}{1 - \frac{F(\kappa_t)}{\kappa_t} - F(\kappa_t)^2}$, (46)

and found the Itô form of $d\mu_t$ from its Stratonovich form by noting that the noise variance is constant, making this conversion straightforward.

The projection filter defined by Eqs. (44) and (45) for orientation tracking in darkness has an intuitive interpretation: the mean $\mu_t$ is updated according to the angular increment observations, weighted by their reliability, as quantified by $\kappa_u$. Even in the presence of such observations, the estimate's precision, $\kappa_t$, decays towards zero, since $\mathcal{F}(\kappa_t)$ is strictly positive. This decay reflects the accumulation of noise from the increment observations. Very informative angular velocity observations with large $\kappa_u$ may slow the decay, but cannot fully prevent it. In other words, without direct angular observations (which we will introduce in Section IV-B below), the estimate will inevitably become less accurate over time.
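To make the algorithm concrete, the following minimal sketch (our addition, not the paper's code) simulates the generative model (34)–(35) and integrates the filter equations (44)–(45); parameter values follow Figure 1.

```python
# Minimal sketch: von Mises projection filter with increment observations
# only, Eqs. (44)-(45). Parameters as in Fig. 1 (kappa_phi=1, kappa_u=10).
import numpy as np
from scipy.special import i0, i1

rng = np.random.default_rng(seed=3)
k_phi, k_u = 1.0, 10.0
dt, steps = 1e-2, 1000

def calF(kappa):
    """Strictly positive function of Eq. (46)."""
    Fk = i1(kappa) / i0(kappa)                 # Bessel ratio F(kappa)
    return Fk / (1.0 - Fk / kappa - Fk**2)

phi, mu, kappa = 0.0, 0.0, 5.0                 # true state; filter estimate
for _ in range(steps):
    dphi = np.sqrt(dt / k_phi) * rng.standard_normal()       # Eq. (34)
    dU = dphi + np.sqrt(dt / k_u) * rng.standard_normal()    # Eq. (35)
    phi = (phi + dphi) % (2 * np.pi)
    mu = (mu + k_u / (k_phi + k_u) * dU) % (2 * np.pi)       # Eq. (44)
    kappa -= calF(kappa) / (2 * (k_phi + k_u)) * dt          # Eq. (45)
```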

In Figure 1a, we illustrate in an example simulation that, despite the presence of angular increment observations, the estimate slowly drifts away from the true heading $\varphi_t$. As a benchmark we use a particle filter, and further compare mean $\mu_t$ and precision $r_t = F(\kappa_t)$ of the projection filter to those estimated by a Gaussian projection filter approximation (see Appendix V-F for details on these benchmarks). Such a filter relies on the assumption that the hidden state $\varphi_t$ evolves on the real line, and thus leads to a slight deviation in the dynamics of the estimated precision $\kappa_t$.

Fig. 1. Circular filtering with observed angular increments. a) The von Mises projection filter in Eqs. (44) and (45) is able to track a diffusion process on the circle based on observed increments, with mean $\mu_t$ and precision $r_t$ (given by $r_t = I_1(\kappa_t)/I_0(\kappa_t)$) matching those of a particle filter (curves are on top of each other). b) While the precision $\hat{r}_t$ estimated from the deviation between mean $\mu_t$ and true trajectory $\varphi_t$ and the precision $r_t$ estimated by the filter coincide for both the von Mises projection filter and the particle filter, the Gaussian projection filter ("Gauss filter") tends to underestimate its precision. c) Empirical (upper panel) and estimated (lower panel) precision for different values of the observation precision $\kappa_u$ at time $T = 10\,\kappa_\varphi^{-1}$. Note that in the upper panel, the empirical precision $\hat{r}_t$ of the different filters is identical. Parameters for a) and b) are $\kappa_\varphi = 1$, $\kappa_u = 10$; times are in units of $\kappa_\varphi^{-1}$. Simulations in b) and c) were averaged over 5000 runs.

Numerically, our projection filter's performance in this example is indistinguishable from that of the particle filter (Figure 1b, c), and its estimated precision $r_t$ matches exactly the empirical precision evaluated by averaging the estimation error over 5000 simulation runs. The estimated precision of the Gaussian approximation, in contrast, systematically underestimates its precision for large observation reliability, and overestimates it when the angular increment observations become very noisy (Figure 1c).

Example 4

(Higher-order circular distributions). The previous example is one of the simplest examples of a circular filtering problem, and results in an approximated posterior that is always unimodal. This might be insufficient for certain settings in which we would like to consider more sophisticated projected densities. To show that our framework extends beyond such simple models, let us consider a class of circular distributions with exponential-family densities of the form

$p(\varphi) = \frac{1}{Z(a,b)}\exp\left[\sum_{k=1}^K a_k \cos(k\varphi) + b_k \sin(k\varphi)\right]$. (47)

The case K = 1 recovers the previously used von Mises distribution (38), but with a different parametrization. For K = 2, this density is referred to as the generalized von Mises distribution, whose properties have been studied extensively [28, 29]. As a proof of concept, we will now use the projection filter for the natural parameters (Eq. (24)) to project the solution to the generative model in Eqs. (34) and (35) onto a distribution with density (47).

The circular distributions with densities (47) belong to the exponential family, with natural parameter vector $\theta = (a, b)$, where $a = \{a_1, \dots, a_K\}$ and $b = \{b_1, \dots, b_K\}$, and sufficient statistics given by

$T_k^{\cos}(\varphi) = \cos(k\varphi), \qquad T_k^{\sin}(\varphi) = \sin(k\varphi)$. (48)

The corresponding expectation parameters are defined by

$\eta_k^{\cos} = \mathbb{E}_\theta\left[T_k^{\cos}\right], \qquad \eta_k^{\sin} = \mathbb{E}_\theta\left[T_k^{\sin}\right]$. (49)

According to Corollary 2, the projection filter for the natural parameters of an exponential family density requires us to apply the right-hand side of the gKSE (36) to the sufficient statistics. For this, we need to compute

$\mathbb{E}_\theta\left[\frac{\partial}{\partial\varphi}T_k^{\cos}(\varphi)\right] = -k\,\mathbb{E}_\theta[\sin(k\varphi)] = -k\,\eta_k^{\sin}$, (50)
$\mathbb{E}_\theta\left[\frac{\partial^2}{\partial\varphi^2}T_k^{\cos}(\varphi)\right] = -k^2\,\mathbb{E}_\theta[\cos(k\varphi)] = -k^2\,\eta_k^{\cos}$, (51)
$\mathbb{E}_\theta\left[\frac{\partial}{\partial\varphi}T_k^{\sin}(\varphi)\right] = k\,\eta_k^{\cos}$, (52)
$\mathbb{E}_\theta\left[\frac{\partial^2}{\partial\varphi^2}T_k^{\sin}(\varphi)\right] = -k^2\,\eta_k^{\sin}$. (53)

Furthermore, we note that the components of the Fisher matrix $g_{ij}$ are given by

$g_{ij} = \frac{\partial^2}{\partial\theta_i \partial\theta_j}\log Z(a,b)$, (54)

where the $\theta_i$ refer to the $i$th element in the parameter vector $\theta$ that contains the elements of both $a$ and $b$. Inverting this matrix to get the inverse components $g^{ij}$ is in general not straightforward. In fact, already the Fisher metric needs to be computed numerically, as the normalization $Z(a,b)$ is inaccessible in closed form. We will thus treat $G^{-1}$ symbolically and sort the parameters such that $G^{-1}$ is composed of the blocks

$G^{-1} = \begin{pmatrix} \tilde{G}^{c,c} & \tilde{G}^{s,c} \\ \tilde{G}^{s,c} & \tilde{G}^{s,s} \end{pmatrix}$. (55)

Then, the projection filter for the natural parameters can be formally written as

$da = \tilde{G}^{c,c}\left(-\frac{1}{2(\kappa_\varphi + \kappa_u)}\,\mathbf{k}^2 \circ \eta^{\cos}\,dt - \frac{\kappa_u}{\kappa_\varphi + \kappa_u}\,\mathbf{k} \circ \eta^{\sin}\,dU_t\right) + \tilde{G}^{s,c}\left(-\frac{1}{2(\kappa_\varphi + \kappa_u)}\,\mathbf{k}^2 \circ \eta^{\sin}\,dt + \frac{\kappa_u}{\kappa_\varphi + \kappa_u}\,\mathbf{k} \circ \eta^{\cos}\,dU_t\right)$, (56)
$db = \tilde{G}^{s,c}\left(-\frac{1}{2(\kappa_\varphi + \kappa_u)}\,\mathbf{k}^2 \circ \eta^{\cos}\,dt - \frac{\kappa_u}{\kappa_\varphi + \kappa_u}\,\mathbf{k} \circ \eta^{\sin}\,dU_t\right) + \tilde{G}^{s,s}\left(-\frac{1}{2(\kappa_\varphi + \kappa_u)}\,\mathbf{k}^2 \circ \eta^{\sin}\,dt + \frac{\kappa_u}{\kappa_\varphi + \kappa_u}\,\mathbf{k} \circ \eta^{\cos}\,dU_t\right)$, (57)

where we denote with $\mathbf{k} = (1, \dots, K)^T$ the vector of values $k$, $\mathbf{k}^2$ results from the element-wise squaring of $\mathbf{k}$, $\eta$ is the vector of expectation parameters, and $\circ$ denotes the element-wise (Hadamard) product. This example demonstrates that our framework could, in principle, be applied to project the posterior onto more general and inevitably more complicated densities, should the need arise. Although we do not show this here explicitly, an instance where this might increase filtering accuracy is one where the initial density $p_0(\varphi)$ is multimodal. The example also shows that, in general, a projection filtering approach might not be the most practical approach. Here in particular, computation of the Fisher matrix components might only be possible numerically, and in that case is computationally expensive. This highlights the need for a careful choice of the posterior density and its corresponding parametrization, where (ideally) expectations of sufficient statistics are available in closed form.
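To illustrate this numerical route, the following sketch (our illustration, not the paper's code) computes the expectation parameters (49) and the Fisher matrix (54) for the density (47) with K = 2 by quadrature, using the standard exponential-family identity that (54) equals the covariance of the sufficient statistics; the parameter values are arbitrary.

```python
# Minimal sketch: numerical Fisher matrix for the generalized von Mises
# family (47) with K = 2, via quadrature over [0, 2*pi].
import numpy as np

K = 2
theta = np.array([0.8, 0.3, -0.2, 0.5])   # (a_1, a_2, b_1, b_2), arbitrary

phi = np.linspace(0.0, 2 * np.pi, 2001)
k = np.arange(1, K + 1)
# sufficient statistics, Eq. (48): rows cos(k*phi), then sin(k*phi)
T = np.vstack([np.cos(np.outer(k, phi)), np.sin(np.outer(k, phi))])

w = np.exp(theta @ T)                      # unnormalized density, Eq. (47)
p = w / np.trapz(w, phi)                   # normalize Z(a, b) numerically
eta = np.trapz(T * p, phi, axis=1)         # expectation parameters, Eq. (49)

# Fisher matrix (54) = covariance of sufficient statistics under p
G = np.trapz(T[:, None, :] * T[None, :, :] * p, phi, axis=2) - np.outer(eta, eta)
G_inv = np.linalg.inv(G)                   # inverse needed in Eqs. (56)-(57)
```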

B. Quasi continuous-time von Mises observations

So far we have focused on filtering algorithms that rely exclusively on observed angular increments. Such algorithms are bound to accumulate noise, such that their precision will decay to zero in the long run. To counteract this effect, let us now consider how to additionally include observations that are generated directly from the hidden state, rather than only its increments. Specifically, we will in this section propose an observation model for (quasi-)continuous-time von Mises-valued observations $Z_t \in S^1$, which we will refer to as direct angular observations, because they are governed by the hidden state $\varphi_t$ directly. This model will allow us to formulate a von Mises projection filter for both angular observations and angular increment observations in the continuous-time circular filtering problem.

In classical continuous-time filtering settings, continuous-time observations $Y_t$ are usually considered to follow a Gaussian diffusion process, whose drift component is governed by the hidden state $X_t$ [23]. Equivalently, one could consider 'time-discretized' (or 'quasi-continuous') observations $\tilde{Z}_t := \frac{\Delta Y_t}{\Delta t}$ with sampling time step $\Delta t$, according to

$\tilde{Z}_t \sim \mathcal{N}\left(h(X_t),\ \sigma_z^2\,\Delta t^{-1}\right)$, (58)

which is the usual setting of discrete-time filtering (with fixed $\Delta t$), with $h(x)$ being a potentially nonlinear function. Notably, the Fisher information $I(X_t)$ about the state of the hidden variable $X_t$ that is conveyed by these quasi-continuous observations $\tilde{Z}_t$ grows linearly with sampling time step $\Delta t$ (see Proposition 2 in Appendix V-D). The consequence of this scaling is rather intuitive: decreasing the sampling time step $\Delta t$ will result in overall more observations per unit time, which, in turn, are individually less informative about the state $X_t$. This renders the information rate (information per unit time) independent of the chosen time step.

Analogously, we now consider observations that are drawn from a von Mises distribution centered around a nonlinear $S^1$-valued transformation $h: S^1 \to S^1$ of the true hidden state $\varphi_t$,

$Z_t \sim \mathrm{VM}\left(h(\varphi_t),\ \alpha(\kappa_z, \Delta t)\right)$. (59)

We would like this observation model to have the same linear information scaling properties as the Gaussian observations encountered in classical filtering problems, i.e., when hidden state noise and observation noise are uncorrelated. Thus, we need to choose the function $\alpha(\kappa_z, \Delta t)$ such that the information content about the state $\varphi_t$ scales linearly with step size $\Delta t$ and observation precision $\kappa_z$.

Theorem 2.

If $\alpha(\kappa_z, \Delta t)$ is chosen such that

$\alpha(\kappa_z, \Delta t) = \xi^{-1}(\kappa_z\,\Delta t)$, (60)

where $\xi^{-1}$ is the inverse of $\xi(x) = x F(x)$ (and $F(x) = I_1(x)/I_0(x)$ as defined earlier), then the information about the state of the random variable $\varphi_t$ scales linearly with sampling time step and observation precision, i.e., $I(\varphi_t) \propto \kappa_z\,\Delta t$.

Proof.

The information content about the random variable $\varphi_t$ is given by the Fisher information

$I(\varphi_t) = \mathbb{E}_{Z_t}\left[\left(\frac{\partial}{\partial\phi}\log \mathrm{VM}(Z_t;\ h(\phi), \alpha)\right)^2\Big|_{\phi=\varphi_t}\right]$ (61)
$= h'(\varphi_t)^2\,\alpha\,\frac{I_1(\alpha)}{I_0(\alpha)} \propto \alpha\,F(\alpha)$. (62)

We require that the information content per time step $\Delta t$ is constant and proportional to $\kappa_z$, which can be achieved if $\alpha$ varies with $\Delta t$ and $\kappa_z$ according to

$\alpha(\kappa_z, \Delta t)\,F\left(\alpha(\kappa_z, \Delta t)\right) \propto \kappa_z\,\Delta t$. (63)

This holds if $\alpha(\kappa_z, \Delta t) = \xi^{-1}(\kappa_z\,\Delta t)$, with

$\xi(x) := x\,\frac{I_1(x)}{I_0(x)}$. (64) □

Once a step size $\Delta t$ is chosen, $\xi^{-1}(\kappa_z \Delta t)$ can be computed numerically. For sufficiently small $\kappa_z \Delta t$, e.g., in the continuum limit, this function can be approximated by $\xi^{-1}(\kappa_z \Delta t) \approx \sqrt{2\kappa_z \Delta t}$, while it becomes $\xi^{-1}(\kappa_z \Delta t) \approx \kappa_z \Delta t$ for large $\kappa_z \Delta t$ (Figure 2). The latter is consistent with the intuition that, for highly informative observations, the single-observation likelihood is well approximated by a Gaussian (which breaks down in the limit $\Delta t \to 0$).
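As an illustration of this numerical inversion (our sketch, using SciPy's Bessel functions and standard root bracketing; the bracketing interval is our own choice, based on the two asymptotic regimes):

```python
# Minimal sketch: numerically invert xi(x) = x I1(x)/I0(x), Eq. (64), to
# obtain alpha(kappa_z, dt) = xi^{-1}(kappa_z * dt), Eq. (60).
import numpy as np
from scipy.optimize import brentq
from scipy.special import i0, i1

def xi(x):
    return x * i1(x) / i0(x)

def xi_inv(y):
    # xi^{-1}(y) ~ sqrt(2y) for small y and ~ y + 1/2 for large y,
    # so this upper bound safely brackets the root
    hi = max(np.sqrt(2.0 * y), y) + 1.0
    return brentq(lambda x: xi(x) - y, 1e-12, hi)

kappa_z, dt = 100.0, 1e-2
alpha = xi_inv(kappa_z * dt)   # precision of a single direct observation
```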

Fig. 2. Time scaling of quasi-continuous angular observations. Both a) the Fisher information per observation and b) the Fisher information rate are linear and constant, respectively, in the sampling time step $\Delta t$ when using $\alpha(\kappa_z, \Delta t) = \xi^{-1}(\kappa_z \Delta t)$ ("Ideal"). For comparison, we also plot $\alpha(\kappa_z, \Delta t) = \sqrt{2\kappa_z \Delta t}$ (small $\kappa_z \Delta t$ approximation, "Squareroot") and $\alpha(\kappa_z, \Delta t) = \kappa_z \Delta t$ (Gaussian approximation, "Linear"). c) Sample simulation with the constant position of the state $\varphi_t = \varphi$ estimated from quasi-continuous observations with different $\alpha$ functions and sampling time steps $\Delta t$. Ideally, the estimated precision $r_t$ should be independent of the chosen simulation time step $\Delta t$. This is satisfied by all simulations except for the linear approximation for small $\kappa_z \Delta t$ (dark orange), and the square-root approximation for large $\kappa_z \Delta t$ (light green). In these simulations, we used time units of seconds (s), and set $\kappa_\varphi = 100/\mathrm{s}$ and $\kappa_z = 100/\mathrm{s}$ (by design, $\kappa_z$ has units of Fisher information per unit time), without loss of generality. Precision estimates were averaged over 10 simulation runs. Black and grey arrows in panel b) correspond to the two time step sizes shown in panel c).

What we have considered here is in essence a modified discrete-time observation model, which implies that we can take advantage of filtering methods available for circular filtering with discrete-time observations [13, 12]. However, by allowing the precision $\alpha(\kappa_z, \Delta t)$ to vary with the time step, we additionally ensure that the information rate stays constant: decreasing the time step will result in more observations per unit time, which is accounted for by each observation being individually less informative. Thus, the observation model defined in (59) and (60) constitutes a quasi continuous-time observation model.

C. Adding quasi continuous-time observations to the circular projection filter

If the measurement function $h(\varphi)$ in (59) is the identity, we can add the direct observations to our filter by straightforwardly making use of Bayes' theorem at every time step. Specifically, since we assumed our approximated (projected) density to be von Mises at all times, the measurement likelihood

$p(Z_t \mid \varphi_t) = \mathrm{VM}\left(Z_t;\ \varphi_t, \alpha(\kappa_z, \Delta t)\right)$ (65)

is conjugate to the density before the update, $p_t^-(\varphi) := p(\varphi_t = \varphi \mid Z_{0:t-\Delta t}) = \mathrm{VM}(\varphi;\ \mu_t^-, \kappa_t^-)$. In other words, the posterior $p_t(\varphi) := p(\varphi_t = \varphi \mid Z_{0:t})$ is guaranteed to be a von Mises density as well:

$p_t(\varphi) \propto p(Z_t \mid \varphi_t = \varphi)\,p_t^-(\varphi)$ (66)
$= \exp\left[\left(\cos\varphi,\ \sin\varphi\right)\left(\alpha(\kappa_z, \Delta t)\begin{pmatrix}\cos Z_t \\ \sin Z_t\end{pmatrix} + \kappa_t^-\begin{pmatrix}\cos\mu_t^- \\ \sin\mu_t^-\end{pmatrix}\right)\right]$. (67)

As expected from an exponential family distribution, the natural parameters $\theta_t = \kappa_t(\cos\mu_t, \sin\mu_t)^T$ are updated according to

$\theta_t = \theta_t^- + \alpha(\kappa_z, \Delta t)\begin{pmatrix}\cos Z_t \\ \sin Z_t\end{pmatrix}$. (68)

This operation is equivalent to a summation of vectors in $\mathbb{R}^2$, where the natural parameters refer to Euclidean coordinates, and $(\mu, \kappa)$ are the corresponding polar coordinates (Figure 3a). In the continuum limit, we write

$d\theta_t = \sqrt{2\kappa_z\,dt}\begin{pmatrix}\cos Z_t \\ \sin Z_t\end{pmatrix}$, (69)

where we used that $\alpha(\kappa_z, dt) \approx \sqrt{2\kappa_z\,dt}$ for $dt \to 0$. A coordinate transform from $\theta$ to $(\mu, \kappa)$ recovers the update equations for mean $\mu_t$ and precision $\kappa_t$ that result from quasi-continuous time observations:

$d\mu_t = d\operatorname{atan2}(\theta_2, \theta_1) = \frac{\sqrt{2\kappa_z\,dt}}{\kappa_t}\sin(Z_t - \mu_t)$, (70)
$d\kappa_t = d\sqrt{\theta_1^2 + \theta_2^2} = \sqrt{2\kappa_z\,dt}\,\cos(Z_t - \mu_t)$. (71)
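In code, the conjugate update (68) and the conversion back to $(\mu, \kappa)$ amount to a few lines (our sketch; the numbers are illustrative):

```python
# Minimal sketch: one conjugate update step, Eq. (68), as vector addition
# in R^2, then conversion back to polar coordinates (mu, kappa).
import numpy as np

def vm_update(mu, kappa, z, alpha):
    """Fold in a direct observation z with precision alpha."""
    theta = kappa * np.array([np.cos(mu), np.sin(mu)])   # natural parameters
    theta += alpha * np.array([np.cos(z), np.sin(z)])    # Eq. (68)
    return np.arctan2(theta[1], theta[0]) % (2 * np.pi), np.linalg.norm(theta)

# a conflicting observation shortens the vector: certainty drops from 2 to 1
mu_new, kappa_new = vm_update(mu=0.0, kappa=2.0, z=np.pi, alpha=1.0)
```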

Fig. 3. Filtering with quasi-continuous time angular observations. a) Single quasi-continuous time update step (68) with angular observation $Z_t$, where the length of the vector indicates the observation reliability $\alpha(\kappa_z, dt)$. The update step for Bayesian inference on the circle is equivalent to a vector addition in the 2D plane. The lower panel demonstrates that a conflicting observation leads to a decreased certainty of the estimate directly after the update, corresponding to a shorter vector. b) Empirical precision $\hat{r}_t$ (upper panel) and estimated precision $r_t$ (lower panel) of the circular Kalman filter (circKF) and a Gaussian projection filter (Gauss filter), when compared to a particle filter, for different values of the observation precision $\kappa_z$ at time $T = 10\,\kappa_\varphi^{-1}$. Parameters were $\kappa_\varphi = 1$ and $\kappa_u = 1$; times are in units of $\kappa_\varphi^{-1}$. c) Estimated versus empirical precision up to $T = 10\,\kappa_\varphi^{-1}$ for the different filters at $\kappa_z = 10$. The precisions shown in b) and c) are averages across 5000 simulation runs.

The recovered update equations are appealingly simple for identity (or linear) observation functions, as such functions allow us to leverage Bayesian conjugacy properties. For nonlinear observation functions, in contrast, the posterior after the update is in general not in the same class of densities (and in this particular case not a von Mises distribution anymore). To apply the projection filtering framework to project such a nonlinear observation model back onto the desired manifold of densities, one would need to know the vector field for the density $p_t$ that incorporates the observation-induced update, a derivation that is beyond the scope of this work.

Example 5

(The circular Kalman filter). Let us revisit angular path integration, which we introduced in Example 3 as a circular diffusion with observed angular increments, and extend it to include direct angular observations $Z_t$ with likelihood (65). Such angular observations could, for instance, correspond to directly accessible angular cues, such as visual landmarks. Combining Eqs. (44) and (45) with Eqs. (70) and (71) to include the quasi-continuous updates results in

$d\mu_t = \frac{\kappa_u}{\kappa_\varphi + \kappa_u}\,dU_t + \frac{\sqrt{2\kappa_z\,dt}}{\kappa_t}\sin(Z_t - \mu_t)$, (72)
$d\kappa_t = -\frac{1}{2(\kappa_\varphi + \kappa_u)}\,\mathcal{F}(\kappa_t)\,dt + \sqrt{2\kappa_z\,dt}\,\cos(\mu_t - Z_t)$. (73)

Due to the simplicity of these equations, as well as their structural similarity to the generalized Kalman filter, while taking full account of the circular state and measurement space, we coin the SDEs (72) and (73) the circular Kalman filter (circKF).

Considering that $\sqrt{2\kappa_z\,dt}$ is modulated by the direct observations' reliability $\kappa_z$, the final terms in the filter's update equations nicely reflect the reliability-weighting that is already present in classical filtering problems: more reliable direct observations, relative to the current certainty $\kappa_t$, have a stronger impact on the mean $\mu_t$. Furthermore, if the current observation $Z_t$ is similar to the current estimate $\mu_t$, this estimate is hardly updated, while the estimate's certainty $\kappa_t$ increases. In contrast, if direct observations and current estimate are in conflict, the certainty $\kappa_t$ might temporarily decrease (Figure 3a). What makes this last feature particularly interesting is that it only occurs in the circKF, but not in its Euclidean counterpart, the standard Kalman filter. In the latter, an update induced by direct observations always leads to an increase in certainty. Numerically, the circKF features performance close to that of a particle filter over a wide range of parameters, while outperforming a Gaussian projection filter (Figure 3b, c). The reason for the deviation of the circKF from the optimal solution is that the von Mises distribution is still an approximation of the true posterior, which leads to slight deviations in the updates when the direct observations are integrated. These deviations are offset by the more than 10-fold decrease in computation time of the circKF that we observed in simulations: a single run in Figure 3c with the particle filter took 3.14±0.11 s, while it only took 0.113±0.005 s with the circKF on a MacBook Pro (Mid 2019) running a 2.3 GHz 8-core Intel Core i9, using NumPy 1.19.2 on Python 3.9.1.
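For completeness, here is a minimal end-to-end sketch of the circKF (our addition, not the authors' published code), combining Eqs. (72) and (73) with the small-$dt$ approximation $\alpha \approx \sqrt{2\kappa_z\,dt}$; parameters loosely follow Figure 3.

```python
# Minimal sketch: circular Kalman filter (circKF), Eqs. (72)-(73), with
# increment and direct angular observations. Parameters are illustrative.
import numpy as np
from scipy.special import i0, i1

rng = np.random.default_rng(seed=4)
k_phi, k_u, k_z = 1.0, 1.0, 10.0
dt, steps = 1e-2, 1000
alpha = np.sqrt(2 * k_z * dt)            # small-dt limit of xi^{-1}(k_z dt)

def calF(kappa):
    Fk = i1(kappa) / i0(kappa)
    return Fk / (1.0 - Fk / kappa - Fk**2)   # Eq. (46)

phi, mu, kappa = 0.0, 0.0, 1.0
for _ in range(steps):
    dphi = np.sqrt(dt / k_phi) * rng.standard_normal()
    dU = dphi + np.sqrt(dt / k_u) * rng.standard_normal()
    phi = (phi + dphi) % (2 * np.pi)
    Z = rng.vonmises(phi, alpha)         # direct observation, Eq. (59)
    dmu = k_u / (k_phi + k_u) * dU + alpha / kappa * np.sin(Z - mu)   # (72)
    dkap = (-calF(kappa) / (2 * (k_phi + k_u)) * dt
            + alpha * np.cos(Z - mu))                                 # (73)
    mu, kappa = (mu + dmu) % (2 * np.pi), kappa + dkap
```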

V. Conclusion

In this paper, we derived a continuous-time nonlinear filter with observed state increments, based on a projection filtering approach. Using this framework, we revisited the problem of probabilistic angular path integration. By additionally proposing a quasi continuous-time model for von Mises-valued direct observations, we were able to formulate a circular filtering algorithm that accounts for both increment and direct angular observations. Notably, this algorithm fulfills the following four conditions: (i) it operates on a circular state-space and (ii) in continuous time, (iii) it maintains a consistent representation during state propagation and observation update, as ensured by the projection filtering method, and (iv) it performs the proper integration of both increment and direct observations. Even though we have only fully worked out the algorithm for univariate circular filtering problems, we have formulated the overall projection filtering framework for more general multivariate problems. Based on the results in [19], we expect our framework to carry over to multivariate circular filtering problems, such as reference vector tracking on the unit sphere. A possible shortcoming of this approach is that the class of generative models it can deal with is fairly limited. The generalized Kushner–Stratonovich equation is only valid if the error covariance of the observation process does not explicitly depend on the value of the hidden state. This constrains the generative model in the following way: first, we can only allow additive noise in the state process, as any multiplicative noise would enter the increment observation process $U_t$ as state-dependent noise. Second, only linear transformations of state increments can be considered. As demonstrated in Example 4, another shortcoming of the projection-filtering method in general is that this approach is only computationally feasible for problems that are analytically accessible, i.e., where expectations under the projected density can be computed rapidly, or in closed form. In this paper, we thus focused on projected densities where these expectations could be efficiently computed.

Despite these limitations, the analytical accessibility and interpretability of our main result, in particular the circular Kalman filter, make it an attractive algorithm for unimodal circular filtering problems. First, it is straightforward to implement in software. Since its representation stays fixed through all times, it relies on only two equations which can be integrated straightforwardly, e.g., with an Euler–Maruyama scheme [30]. As we have seen in our numerical experiments, this makes this filter much faster than established methods, such as particle filters. Second, the interpretation of the dynamics is intuitively comprehensible. Third, since it is a continuous-time formulation, it automatically scales with respect to the chosen sampling step size (as long as it is sufficiently small). This is an advantage over continuous-discrete filtering problems, which usually consider a fixed sampling step size, and need to be reformulated should the sampling rate of the observations change or vary across time. Lastly, animals navigate the world based on a continuous stream of sensory information, which motivates the use of continuous-time models when trying to understand how the brain operates under uncertainty. Thus, one possible application could be a conceptual description of how the brain performs angular path integration [31].

Acknowledgements

We would like to thank Melanie Basnak and Rachel Wilson (Harvard Medical School) for active and ongoing discussions that motivated the question of probabilistic path integration, and who helped us formulate the questions that we address in this manuscript. We further would like to thank Johannes Bill (Harvard Medical School) and Simone Surace (University of Bern), as well as the anonymous reviewers, for helpful feedback and suggestions on the manuscript.

Funding

A.K. was funded by the Swiss National Science Foundation (grant numbers P2ZHP2 184213 and P400PB 199242). J.D. and L.R. were supported by the Harvard Medical School Dean’s Initiative Award Program for Innovative Grants in the Basic and Social Sciences, and a grant from the National Institutes of Health (NIH/NINDS, 1R34NS123819).

APPENDIX

A. Nonlinear filtering in a nutshell

Let us briefly review the classical nonlinear filtering setting, i.e., filtering with observations that follow a diffusion process with noise uncorrelated to that of the hidden state process, in order to compare this with filtering with observed state increments. In line with standard literature [23], let $Y_t$ denote the process of $\mathbb{R}^M$-valued direct observations. Then the generative model commonly referred to in classical nonlinear filtering is given by

$dX_t = f(X_t, t)\,dt + \Sigma_x^{1/2}\,dW_t$, (74)
$dY_t = h(X_t, t)\,dt + \Sigma_y^{1/2}\,d\tilde{V}_t$, (75)

with $\tilde{V}_t$ and $W_t$ independent standard Brownian motion processes, and $h: \mathbb{R}^N \times \mathbb{R} \to \mathbb{R}^M$ a potentially nonlinear, vector-valued observation function. All other quantities have been defined below (1). Note that the observation process is a diffusion with error covariance $\Sigma_y\,dt$. The goal of nonlinear filtering is to compute the dynamics of the posterior expectations $\mathbb{E}[\phi_t] := \mathbb{E}[\phi(X_t) \mid Y_{0:t}]$, where $Y_{0:t} = \{Y_\tau : \tau < t\}$ denotes the filtration generated by the process $Y_t$. Formally, this is solved by the Kushner–Stratonovich equation (KSE, [23, Theorem 3.30]):

$d\mathbb{E}[\phi_t] = \mathbb{E}[\mathcal{L}[\phi_t]]\,dt + \mathrm{cov}(\phi_t, h_t)^T\,\Sigma_y^{-1}\left(dY_t - \mathbb{E}[h_t]\,dt\right)$, (76)

with $\mathrm{cov}(\phi_t, h_t) = \mathbb{E}[\phi_t h_t] - \mathbb{E}[\phi_t]\,\mathbb{E}[h_t]$.

B. Derivation of the generalized Kushner–Stratonovich equation for the posterior expectation (Eqs. (7) and (21))

Let us revisit the generative model in Eqs. (1) and (6):

$dX_t = f(X_t, t)\,dt + \Sigma_x^{1/2}\,dW_t$, (77)
$dU_t = C f(X_t, t)\,dt + C\Sigma_x^{1/2}\,dW_t + \Sigma_u^{1/2}\,dV_t$, (78)

with $W_t$ and $V_t$ independent vector-valued Brownian motion processes, as defined earlier. Here, we derive the gKSE (7) by treating this model as a correlated noise filtering problem, which allows us to directly apply established results [23, Corollary 3.38].

Lemma 1.

The generalized Kushner–Stratonovich equation (gKSE) for the evolution of the conditional expectation of a test function $\mathbb{E}[\phi_t] = \mathbb{E}[\phi(X_t) \mid U_{0:t}]$ in the presence of state increment observations $dU_t$ is given by (Itô form)

$d\mathbb{E}[\phi_t] = \mathbb{E}[\mathcal{L}[\phi_t]]\,dt + \left(\mathrm{cov}(\phi_t, f_t) + \Sigma_x\,\mathbb{E}[\nabla\phi_t]\right)^T C^T\tilde{\Sigma}_u^{-1}\left(dU_t - C\,\mathbb{E}[f_t]\,dt\right)$. (79)
Proof.

To simplify calculations, we first require that the observation process has an error covariance that equals the identity, and thus we rescale the process $dU_t$ by $\tilde{\Sigma}_u^{-1/2} = (C\Sigma_x C^T + \Sigma_u)^{-1/2}$,

$d\tilde{U}_t = \tilde{\Sigma}_u^{-1/2}\,dU_t$. (80)

Eqs. (77) and (80) define a filtering problem where the noise in the hidden process $X_t$ and the observation process $\tilde{U}_t$ are correlated, with quadratic covariation

$d[X, \tilde{U}^T]_t = \Sigma_x C^T\tilde{\Sigma}_u^{-1/2}\,dt$. (81)

Thus, the result of [23, Corollary 3.38] (KSE for correlated noise filtering problems) is directly applicable to our problem. For more details on [23, Corollary 3.38], we kindly refer to the proof based on the innovations method provided therein. An alternative proof based on the change of measure approach is presented in [20].

Note that the KSE for correlated noise [23, Eq. 3.72] is similar to the classical KSE for uncorrelated noise problems (76), except for a correction term, given by a vector field $\mathcal{B}[\phi]$,

$d\mathbb{E}[\phi_t] = \mathbb{E}[\mathcal{L}[\phi_t]]\,dt + \left(\mathrm{cov}\left(\phi_t,\ \tilde{\Sigma}_u^{-1/2}Cf_t\right) + \mathbb{E}[\mathcal{B}[\phi_t]]\right)^T\left(d\tilde{U}_t - \tilde{\Sigma}_u^{-1/2}C\,\mathbb{E}[f_t]\,dt\right)$. (82)

The vector field $\mathcal{B}[\phi]$ can be read out from the dynamics of the quadratic covariation between the process $\phi_t$ and the rescaled observation process $\tilde{U}_t$. Using Itô's lemma, we can write:

$d\phi_t = (\nabla\phi_t)^T\,dX_t + \frac{1}{2}\mathrm{Tr}(\Sigma_x H_\phi)\,dt$, (83)

and thus

$d[\phi, \tilde{U}^T]_t = (\nabla\phi_t)^T\,d[X, \tilde{U}^T]_t$ (84)
$= (\nabla\phi_t)^T\Sigma_x C^T\tilde{\Sigma}_u^{-1/2}\,dt =: \mathcal{B}[\phi_t]^T\,dt$. (85)

Further identifying $h_t = \tilde{\Sigma}_u^{-1/2}Cf_t$, we find

$d\mathbb{E}[\phi_t] = \mathbb{E}[\mathcal{L}[\phi_t]]\,dt + \left(\mathrm{cov}(\phi_t, f_t) + \Sigma_x\,\mathbb{E}[\nabla\phi_t]\right)^T C^T\tilde{\Sigma}_u^{-1/2}\left(d\tilde{U}_t - \tilde{\Sigma}_u^{-1/2}C\,\mathbb{E}[f_t]\,dt\right)$. (86)

Rescaling $d\tilde{U}_t = \tilde{\Sigma}_u^{-1/2}\,dU_t$ yields (79). □

Remark 3.

By combining (76) and (79) it is possible to derive a generalized Kushner equation when both types of observations are present, i.e., when we consider both the process $Y_t$ (Eq. 75) and the process $U_t$ (Eq. 78) as the observations:

$d\mathbb{E}[\phi_t] = \mathbb{E}[\mathcal{L}[\phi_t]]\,dt + \mathrm{cov}(\phi_t, h_t)^T\Sigma_y^{-1}\left(dY_t - \mathbb{E}[h_t]\,dt\right) + \left(\mathrm{cov}(\phi_t, f_t) + \Sigma_x\,\mathbb{E}[\nabla\phi_t]\right)^T C^T\tilde{\Sigma}_u^{-1/2}\left(d\tilde{U}_t - \tilde{\Sigma}_u^{-1/2}C\,\mathbb{E}[f_t]\,dt\right)$. (87)

In this case, the expectations $\mathbb{E}[\cdot]$ are with respect to the filtrations $Y_{0:t}$ and $U_{0:t}$, i.e., $\mathbb{E}[\phi_t] = \mathbb{E}[\phi(X_t) \mid Y_{0:t}, U_{0:t}]$.

Corollary 3.

In Stratonovich form, the generalized Kushner–Stratonovich equation (gKSE) for the evolution of the conditional expectation of a test function $\mathbb{E}[\phi_t] = \mathbb{E}[\phi(X_t) \mid U_{0:t}]$ in the presence of increment observations $dU_t$ reads:

$d\mathbb{E}[\phi_t] = \left[\mathbb{E}[\tilde{\mathcal{L}}[\phi_t]] - \frac{1}{2}\left(\mathrm{cov}\left(\phi_t,\ \left\|\tilde{\Sigma}_u^{-1/2}Cf_t\right\|^2\right) + \mathrm{Tr}\left(\tilde{\Sigma}_u^{-1}C\,\mathrm{cov}(\phi_t, J_f)\,\Sigma_x C^T\right)\right)\right]dt + \left(\mathrm{cov}(\phi_t, f_t) + \Sigma_x\,\mathbb{E}[\nabla\phi_t]\right)^T C^T\tilde{\Sigma}_u^{-1} \circ dU_t$, (88)

with

$\mathbb{E}[\tilde{\mathcal{L}}[\phi_t]] = \mathbb{E}[\mathcal{L}[\phi_t]] - \mathbb{E}\left[(\nabla\phi_t)^T\Sigma_x C^T\tilde{\Sigma}_u^{-1}Cf_t\right] - \frac{1}{2}\mathrm{Tr}\left(\Sigma_x\,\mathbb{E}[H_\phi]\,\Sigma_x C^T\tilde{\Sigma}_u^{-1}C\right)$. (89)

For one-dimensional problems with $N = M = 1$, $C = c$, $\Sigma_x = \sigma_x^2$, $\Sigma_u = \sigma_u^2$ and $\tilde{\Sigma}_u = \tilde{\sigma}_u^2 = c^2\sigma_x^2 + \sigma_u^2$, (88) simplifies to

$d\mathbb{E}[\phi_t] = \left[\frac{\sigma_u^2}{\tilde{\sigma}_u^2}\,\mathbb{E}[\mathcal{L}[\phi_t]] - \frac{c^2}{2\tilde{\sigma}_u^2}\left(\mathrm{cov}(\phi_t, f_t^2) + \sigma_x^2\,\mathrm{cov}\left(\phi_t, \frac{\partial f_t}{\partial x}\right)\right)\right]dt + \frac{c}{\tilde{\sigma}_u^2}\left[\mathrm{cov}(\phi_t, f_t) + \sigma_x^2\,\mathbb{E}\left[\frac{\partial \phi_t}{\partial x}\right]\right] \circ dU_t$. (90)
Proof.

We convert between Itô and Stratonovich calculus using the Wong–Zakai theorem [27]:

$B_t^T\,dU_t = B_t^T \circ dU_t - \frac{1}{2}\,d[B^T, U]_t$, (91)

where the symbol $\circ$ denotes Stratonovich calculus, and $[B^T, U]_t$ is the quadratic covariation between the processes $U_t$ and $B_t$. We identify (cf. Eq. (79))

$B_t^T = \left(\mathrm{cov}(\phi_t, f_t) + \Sigma_x\,\mathbb{E}[\nabla\phi_t]\right)^T C^T\tilde{\Sigma}_u^{-1}$, (92)

and write

$d\mathbb{E}[\phi_t] = \mathbb{E}[\mathcal{L}[\phi_t]]\,dt - B_t^T C\,\mathbb{E}[f_t]\,dt + B_t^T\,dU_t$ (93)
$= \mathbb{E}[\mathcal{L}[\phi_t]]\,dt - B_t^T C\,\mathbb{E}[f_t]\,dt + B_t^T \circ dU_t - \frac{1}{2}\,d[B^T, U]_t$. (94)

To obtain the change in quadratic covariation $d[B^T, U]_t$, it is helpful to find $dB_t$ by Itô's lemma:

$dB_t^T = \left(d\mathbb{E}[\phi_t f_t] - \mathbb{E}[\phi_t]\,d\mathbb{E}[f_t] - d\mathbb{E}[\phi_t]\,\mathbb{E}[f_t] - d\mathbb{E}[\phi_t]\,d\mathbb{E}[f_t] + d\mathbb{E}[\nabla\phi_t]^T\Sigma_x\right)C^T\tilde{\Sigma}_u^{-1}$. (95)

The evolution of these expectations is obtained by straightforward application of the gKSE (79), substituting $\phi_t$ with the functions $\phi_t f_t$, $f_t$ and $\nabla\phi_t$. This results in terms that multiply $dU_t$, which are those relevant for computing the change of the covariation process $d[B^T, U]_t$. Note further that the quadratic covariation of the observation process evolves according to $d[U^T, U]_t = \mathrm{Tr}(\tilde{\Sigma}_u)\, dt$. Some tedious but straightforward algebra and rearrangement of terms then yield, for the quadratic covariation,

\[ d[B^T, U]_t = \Big[\mathrm{cov}\big(\phi_t, \|\tilde{\Sigma}_u^{-1/2} C f_t\|^2\big) + \mathrm{Tr}\big(C^T \tilde{\Sigma}_u^{-1} C\, \mathrm{cov}(\phi_t, J_f)\, \Sigma_x\big) + 2\, \mathrm{cov}\big((\Sigma_x C^T \tilde{\Sigma}_u^{-1} C f_t)^T, \nabla\phi_t\big) - 2\, \mathrm{cov}(\phi_t, f_t)^T C^T \tilde{\Sigma}_u^{-1} C\, \mathbb{E}[f_t] + \mathrm{Tr}\big(\Sigma_x \mathbb{E}[H_\phi]\, \Sigma_x C^T \tilde{\Sigma}_u^{-1} C\big)\Big]\, dt. \tag{96} \]

Plugging this into (94), and applying some further algebra, yields

\[ d\mathbb{E}[\phi_t] = \Big[\mathbb{E}[\mathcal{L}[\phi_t]] - \mathbb{E}\big[(\nabla\phi_t)^T \Sigma_x C^T \tilde{\Sigma}_u^{-1} C f_t\big] - \tfrac{1}{2}\, \mathrm{Tr}\big(\Sigma_x \mathbb{E}[H_\phi]\, \Sigma_x C^T \tilde{\Sigma}_u^{-1} C\big)\Big]\, dt - \tfrac{1}{2}\Big[\mathrm{cov}\big(\phi_t, \|\tilde{\Sigma}_u^{-1/2} C f_t\|^2\big) + \mathrm{Tr}\big(\tilde{\Sigma}_u^{-1} C\, \mathrm{cov}(\phi_t, J_f)\, \Sigma_x C^T\big)\Big]\, dt + \big(\mathrm{cov}(\phi_t, f_t) + \Sigma_x \mathbb{E}[\nabla\phi_t]\big)^T C^T \tilde{\Sigma}_u^{-1} \circ dU_t. \tag{97} \]

This equation can be simplified further by noting that $I - \Sigma_x C^T \tilde{\Sigma}_u^{-1} C = C^{-1} \Sigma_u \tilde{\Sigma}_u^{-1} C$. Further, we can substitute

\[ \mathbb{E}[\tilde{\mathcal{L}}[\phi_t]] = \mathbb{E}[\mathcal{L}[\phi_t]] - \mathbb{E}\big[(\nabla\phi_t)^T \Sigma_x C^T \tilde{\Sigma}_u^{-1} C f_t\big] - \tfrac{1}{2}\, \mathrm{Tr}\big(\Sigma_x \mathbb{E}[H_\phi]\, \Sigma_x C^T \tilde{\Sigma}_u^{-1} C\big) \tag{98} \]

in the first line, which yields Eq. (88). Equation (90) follows from (88) as the one-dimensional special case. □

C. Fisher metric and scalar product

Proposition 1.

The scalar product defined in Eq. (13),

\[ \langle Z_1, Z_2 \rangle_\theta := \int_\Omega dx\, \frac{Z_1(x)\, Z_2(x)}{p_\theta(x)}, \qquad Z_1, Z_2 \in T_\theta S, \tag{99} \]

is the scalar product on $T_\theta S$ that is associated with the Fisher metric [26],

\[ g_{ij} = \mathbb{E}_{p_\theta}\Big[\frac{\partial \log p_\theta(x)}{\partial \theta_i} \frac{\partial \log p_\theta(x)}{\partial \theta_j}\Big]. \tag{100} \]
Proof.

Consider $Z_1$ and $Z_2$ to correspond to two of the basis vectors of the tangent space $T_\theta S$, i.e., $Z_1 = \frac{\partial p_\theta(x)}{\partial \theta_i}$ and $Z_2 = \frac{\partial p_\theta(x)}{\partial \theta_j}$. Then

\begin{align} \Big\langle \frac{\partial p_\theta}{\partial \theta_i}, \frac{\partial p_\theta}{\partial \theta_j} \Big\rangle_\theta &= \int_\Omega dx\, \frac{1}{p_\theta(x)} \frac{\partial p_\theta(x)}{\partial \theta_i} \frac{\partial p_\theta(x)}{\partial \theta_j} \tag{101}\\ &= \int_\Omega dx\, \frac{\partial \log p_\theta(x)}{\partial \theta_i} \frac{\partial \log p_\theta(x)}{\partial \theta_j}\, p_\theta(x) \tag{102}\\ &= \mathbb{E}\Big[\frac{\partial \log p_\theta(x)}{\partial \theta_i} \frac{\partial \log p_\theta(x)}{\partial \theta_j}\Big] = g_{ij}. \tag{103} \end{align}

This concludes the proof. □
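As a concrete example relevant to the circular setting (our addition, using standard properties of the von Mises distribution $p_{\mu,\kappa}(x) = \exp(\kappa \cos(x - \mu))/(2\pi I_0(\kappa))$), the Fisher metric in the coordinates $\theta = (\mu, \kappa)$ evaluates to

\[ g_{\mu\mu} = \kappa\, \frac{I_1(\kappa)}{I_0(\kappa)}, \qquad g_{\kappa\kappa} = 1 - \frac{I_1(\kappa)}{\kappa\, I_0(\kappa)} - \left(\frac{I_1(\kappa)}{I_0(\kappa)}\right)^2, \qquad g_{\mu\kappa} = 0, \]

where the off-diagonal term vanishes by the symmetry of the distribution around $\mu$.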

D. Information scaling

Proposition 2.

The Fisher information $I(X_t)$ about the hidden state variable $X_t$ that is conveyed by Gaussian-type discrete-time observations,

\[ \tilde{Z}_t \sim \mathcal{N}\big(\tilde{Z}_t;\, g(X_t),\, \sigma_z^2\, \Delta t^{-1}\big), \tag{104} \]

grows linearly with the time step $\Delta t$.

Proof.

The information content about the state $X_t$ that is conveyed by the observation $\tilde{Z}_t$ is given by the Fisher information

\[ I(X_t) = \mathbb{E}_{\tilde{Z}_t}\Big[\big(\partial_x \log \mathcal{N}(\tilde{Z}_t;\, g(x),\, \sigma_z^2\, \Delta t^{-1})\big)^2 \Big|_{x = X_t}\Big] = \frac{g'(x)^2}{\sigma_z^4}\, (\Delta t)^2\, \mathbb{E}_{\tilde{Z}_t}\big[(\tilde{Z}_t - g(x))^2\big] = \frac{g'(x)^2}{\sigma_z^2}\, \Delta t \;\propto\; \Delta t. \]

□
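The linear scaling is easy to verify numerically. Below is a short Monte Carlo check (our addition; the observation function $g = \sin$, the noise scale $\sigma_z = 0.5$, and the evaluation point are arbitrary illustrative choices): the empirical variance of the score should grow proportionally with $\Delta t$.

```python
import numpy as np

rng = np.random.default_rng(0)
g, dg = np.sin, np.cos          # illustrative observation function and its derivative
sigma_z, x_true = 0.5, 1.0      # illustrative noise scale and true state

for dt in [0.01, 0.02, 0.04]:
    var = sigma_z**2 / dt                                  # observation variance ~ 1/dt
    z = rng.normal(g(x_true), np.sqrt(var), size=200_000)  # draw observations
    score = (z - g(x_true)) * dg(x_true) / var             # d/dx log-likelihood at x_true
    # Empirical Fisher information; should be close to dg(x_true)**2 * dt / sigma_z**2
    print(f"dt = {dt:.2f}: I = {score.var():.5f}")
```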

E. Details on numerical simulations

Our numerical simulations in Figures 1 and 3, corresponding to Examples 3 and 5, were based on artificial data generated from the true model equations. In particular, the “true” state φt is a single trajectory from Eq. (34), and observations are drawn at each time point from Eq. (35) and, in the case of Example 5, additionally from Eq. (65). To simulate trajectories and observations, we use the Euler–Maruyama approximation [30]. In this approximation, the time-discretized generative model with fixed time step size Δt in Examples 3 and 5 reads:

\begin{align} \varphi_t &\sim \mathcal{N}\big(\varphi_{t-\Delta t},\, \kappa_\varphi^{-1} \Delta t\big) \mod 2\pi, \tag{105}\\ \Delta U_t &\sim \mathcal{N}\big(\varphi_t - \varphi_{t-\Delta t},\, \kappa_u^{-1} \Delta t\big), \tag{106}\\ Z_t &\sim \mathcal{VM}\big(\varphi_t,\, \xi^{-1}(\kappa_z \Delta t)\big). \tag{107} \end{align}

The same time-discretization scheme was used to numerically integrate the SDEs (44), (45), (72) and (73) for the von Mises parameters $\mu_t$ and $\kappa_t$. Unless stated otherwise, we used $\Delta t = 0.01$ in all our numerical simulations, and give times in units of $\kappa_\varphi$.
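For concreteness, a minimal sketch of this forward simulation is given below (our addition). The helper `xi_inv` stands in for the inverse of the function ξ defined in the main text, which we do not reproduce here; it is replaced by the identity purely so that the sketch runs, and the reliability parameters are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, T = 0.01, 10.0
kappa_phi, kappa_u, kappa_z = 1.0, 1.0, 1.0  # illustrative reliabilities
xi_inv = lambda x: x  # placeholder for the paper's xi^{-1}; identity for illustration only

n = int(T / dt)
phi = np.zeros(n)  # hidden heading, Eq. (105)
dU = np.zeros(n)   # state increment observations, Eq. (106)
Z = np.zeros(n)    # direct angular observations, Eq. (107)

for t in range(1, n):
    dphi = rng.normal(0.0, np.sqrt(dt / kappa_phi))        # heading increment
    phi[t] = (phi[t - 1] + dphi) % (2 * np.pi)             # wrapped heading
    dU[t] = dphi + rng.normal(0.0, np.sqrt(dt / kappa_u))  # noisy increment observation
    Z[t] = rng.vonmises(phi[t], xi_inv(kappa_z * dt))      # von Mises observation
```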

F. Benchmarks for numerical simulations

For our numerical simulations in Figures 1 and 3, corresponding to Examples 3 and 5, we used the following filtering algorithms to compare the circKF against.

1). Particle Filter:

As a benchmark, we used a Sequential Importance Sampling/Resampling particle filter (SIS-PF, [32]), modified to account for state increment observations $\Delta U_t$.

The $N$ particles in the SIS-PF were propagated according to

\[ \pi\big(\varphi_t^{(j)} \mid \varphi_{t-\Delta t}^{(j)}, \Delta U_t\big) = \mathcal{N}\Big(\varphi_{t-\Delta t}^{(j)} + \frac{\kappa_u}{\kappa_u + \kappa_\varphi}\, \Delta U_t,\; \frac{\Delta t}{\kappa_\varphi + \kappa_u}\Big) \mod 2\pi, \tag{108} \]

and each particle j was weighted at each time step according to

\[ w_t^{(j)} = w_{t-\Delta t}^{(j)}\, \mathcal{VM}\big(Z_t;\, \varphi_t^{(j)},\, \xi^{-1}(\kappa_z \Delta t)\big), \tag{109} \]

yielding an SIS scheme for this model that is asymptotically exact in the limit $N \to \infty$. The mean $\mu_t$ and precision $r_t$ of the filtering distribution were determined at each time step by a weighted average on the circle, i.e., the first circular moment:

\[ r_t \exp(i \mu_t) = \sum_{j=1}^N w_t^{(j)} \exp\big(i \varphi_t^{(j)}\big). \tag{110} \]

For a von Mises distribution, the radius $r$ of the first circular moment and the precision parameter $\kappa$ are related via $r = I_1(\kappa)/I_0(\kappa)$, which is why we use $r$ rather than $\kappa$ in our plots.
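Where $\kappa$ is needed from $r$, this relation can be inverted numerically; a simple root-finding sketch (our addition):

```python
from scipy.special import i0e, i1e
from scipy.optimize import brentq

def kappa_from_r(r):
    """Invert r = I1(kappa) / I0(kappa) by root finding (requires 0 < r < 1)."""
    # Exponentially scaled Bessel functions avoid overflow for large kappa.
    return brentq(lambda k: i1e(k) / i0e(k) - r, 1e-8, 1e4)
```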

In our simulations, we used $N = 10^3$ particles if direct angular observations $Z_t$ were present, and $N = 10^4$ if only state increment observations were present. We re-sampled the particles whenever the effective number of particles, $N_{\mathrm{eff}} = \big(\sum_j (w_t^{(j)})^2\big)^{-1}$, fell below $N/2$.
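A compact sketch of one SIS-PF step is shown below (our addition; function and variable names are ours, and `kappa_z_eff` denotes the effective concentration $\xi^{-1}(\kappa_z \Delta t)$, assumed to be computed elsewhere).

```python
import numpy as np
from scipy.special import i0e

def vm_pdf(z, mu, kappa):
    """von Mises density (overflow-safe via the scaled Bessel function)."""
    return np.exp(kappa * (np.cos(z - mu) - 1.0)) / (2 * np.pi * i0e(kappa))

def sis_pf_step(phi, w, dU_t, Z_t, dt, kappa_phi, kappa_u, kappa_z_eff, rng):
    N = len(phi)
    # Propagate with the increment-informed proposal, Eq. (108)
    mean = phi + kappa_u / (kappa_u + kappa_phi) * dU_t
    phi = (mean + rng.normal(0.0, np.sqrt(dt / (kappa_phi + kappa_u)), N)) % (2 * np.pi)
    # Re-weight with the direct angular observation, Eq. (109)
    w = w * vm_pdf(Z_t, phi, kappa_z_eff)
    w = w / w.sum()
    # Re-sample whenever the effective number of particles drops below N/2
    if 1.0 / np.sum(w**2) < N / 2:
        idx = rng.choice(N, size=N, p=w)
        phi, w = phi[idx], np.full(N, 1.0 / N)
    # First circular moment, Eq. (110): mean mu_t and precision r_t
    m = np.sum(w * np.exp(1j * phi))
    return phi, w, np.angle(m) % (2 * np.pi), np.abs(m)
```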

2). Gaussian approximation:

The reference filter, which we refer to as the "Gauss filter", is a heuristic method that assumes the posterior mean $\mu_t$ and variance $\sigma_t^2$ to evolve according to a generalized Kalman–Bucy filter (Eqs. (31) and (32)). Such a filter is often referred to as an "assumed density filter" (ADF) in the literature, which under certain conditions, such as for the circular filtering problem we consider here, becomes fully equivalent to a Gaussian projection filter (see [18, Section 7] for an in-depth discussion). To make the resulting distribution circular, this Gaussian is subsequently approximated by a von Mises distribution via $\kappa_t = \sigma_t^{-2}$, resulting in the following update equations for the model in Example 3:

\begin{align} d\mu_t &= \frac{\kappa_u}{\kappa_\varphi + \kappa_u}\, dU_t, \tag{111}\\ d\kappa_t &= d\big(\sigma_t^{-2}\big) = -\frac{\kappa_t^2}{\kappa_\varphi + \kappa_u}\, dt. \tag{112} \end{align}

For the model used in Example 5, this is combined with the direct angular observations in the same way as for the circKF. Note that in the absence of direct angular observations $Z_t$, the mean dynamics are the same as for the circKF, while $\kappa_t$ deviates (as shown in Figure 1). When direct angular observations are present, this deviation in turn affects the update in the mean dynamics, which leads to worse numerical performance than the circKF.
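For reference, the increments-only Gauss filter amounts to the following few lines (our addition; names and initial values are illustrative):

```python
import numpy as np

def gauss_filter(dU, dt, kappa_phi, kappa_u, mu0=0.0, kappa0=10.0):
    """Heuristic Gaussian benchmark, Eqs. (111)-(112), increment observations only."""
    mu, kappa = mu0, kappa0
    trace = []
    for du in dU:
        mu = (mu + kappa_u / (kappa_phi + kappa_u) * du) % (2 * np.pi)  # Eq. (111)
        kappa = kappa - kappa**2 / (kappa_phi + kappa_u) * dt           # Eq. (112)
        trace.append((mu, kappa))
    return trace
```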

G. Code availability

Jupyter notebooks to generate Figs. 1–3, the underlying simulation data, as well as the Python scripts used to generate these data, have been deposited at Zenodo and are publicly available at https://doi.org/10.5281/zenodo.5820406.

Footnotes

1

As outlined earlier, for a Gaussian setting this would be the update part of the Kushner–Stratonovich equation. To the best of our knowledge, no such equation exists for von Mises-valued observations.

References

  • [1] Heinze S, Narendra A, and Cheung A, "Principles of insect path integration," Current Biology, vol. 28, no. 17, pp. R1043–R1058, Sep. 2018. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6462409/
  • [2] Kreiser R, Cartiglia M, Martel JNP, Conradt J, and Sandamirskaya Y, "A Neuromorphic Approach to Path Integration: A Head-Direction Spiking Neural Network with Vision-driven Reset," in 2018 IEEE International Symposium on Circuits and Systems (ISCAS), May 2018, pp. 1–5.
  • [3] Taube JS, Muller RU, and Ranck JB, "Head-direction cells recorded from the postsubiculum in freely moving rats. I. Description and quantitative analysis," The Journal of Neuroscience, vol. 10, no. 2, pp. 420–435, 1990. http://www.ncbi.nlm.nih.gov/pubmed/2303851
  • [4] Seelig JD and Jayaraman V, "Neural dynamics for landmark orientation and angular path integration," Nature, vol. 521, no. 7551, pp. 186–191, 2015.
  • [5] Skaggs WE, Knierim JJ, Kudrimoti HS, and McNaughton BL, "A Model of the Neural Basis of the Rat's Sense of Direction," Advances in Neural Information Processing Systems, 1995.
  • [6] Turner-Evans D, Wegener S, Rouault H, Franconville R, Wolff T, Seelig JD, Druckmann S, and Jayaraman V, "Angular velocity integration in a fly heading circuit," eLife, vol. 6, p. e23496, May 2017. https://elifesciences.org/articles/23496
  • [7] Xie X, Hahnloser RH, and Seung HS, "Double-ring network model of the head-direction system," Physical Review E, vol. 66, no. 4, 2002.
  • [8] Ma WJ, Beck JM, Latham PE, and Pouget A, "Bayesian inference with probabilistic population codes," Nature Neuroscience, vol. 9, no. 11, pp. 1432–1438, Nov. 2006. http://www.ncbi.nlm.nih.gov/pubmed/17057707
  • [9] Fiser J, Berkes P, Orbán G, and Lengyel M, "Statistically optimal perception and learning: from behavior to neural representations," Trends in Cognitive Sciences, vol. 14, no. 3, pp. 119–130, Mar. 2010. https://linkinghub.elsevier.com/retrieve/pii/S1364661310000045
  • [10] Kalman RE, "A New Approach to Linear Filtering and Prediction Problems," Transactions of the ASME, Journal of Basic Engineering, vol. 82, Series D, pp. 35–45, 1960. http://fluidsengineering.asmedigitalcollection.asme.org/article.aspx?articleid=1430402
  • [11] Kalman RE and Bucy RS, "New Results in Linear Filtering and Prediction Theory," Journal of Basic Engineering, vol. 83, no. 1, pp. 95–108, Mar. 1961. https://asmedigitalcollection.asme.org/fluidsengineering/article/83/1/95/426820/New-Results-in-Linear-Filtering-and-Prediction
  • [12] Kurz G, Gilitschenski I, and Hanebeck UD, "Recursive Bayesian filtering in circular state spaces," IEEE Aerospace and Electronic Systems Magazine, vol. 31, no. 3, pp. 70–87, Mar. 2016. https://ieeexplore.ieee.org/document/7475421
  • [13] Azmani M, Reboul S, Choquel JB, and Benjelloun M, "A recursive fusion filter for angular data," in 2009 IEEE International Conference on Robotics and Biomimetics (ROBIO), 2009, pp. 882–887.
  • [14] Traa J and Smaragdis P, "A Wrapped Kalman Filter for Azimuthal Speaker Tracking," IEEE Signal Processing Letters, vol. 20, no. 12, pp. 1257–1260, 2013.
  • [15] Kurz G, Gilitschenski I, and Hanebeck UD, "Recursive nonlinear filtering for angular data based on circular distributions," in 2013 American Control Conference, 2013, pp. 5439–5445.
  • [16] Tronarp F, Hostettler R, and Särkkä S, "Continuous-Discrete von Mises–Fisher Filtering on S² for Reference Vector Tracking," in 2018 21st International Conference on Information Fusion (FUSION), 2018, pp. 1345–1352.
  • [17] Hanzon B and Hut R, "New Results on the Projection Filter," Research Memorandum, vol. 105, Mar. 1991. http://www.sciencedirect.com/science/article/pii/S0377221796004043
  • [18] Brigo D, Hanzon B, and Le Gland F, "Approximate nonlinear filtering by projection on exponential manifolds of densities," Bernoulli, vol. 5, no. 3, pp. 495–534, 1999. http://projecteuclid.org/euclid.bj/1172617201
  • [19] Tronarp F and Särkkä S, "Continuous-Discrete Filtering and Smoothing on Submanifolds of Euclidean Space," arXiv:2004.09335 [math, stat], Apr. 2020. http://arxiv.org/abs/2004.09335
  • [20] Nüsken N, Reich S, and Rozdeba PJ, "State and Parameter Estimation from Observed Signal Increments," Entropy, vol. 21, no. 5, p. 505, May 2019. https://www.mdpi.com/1099-4300/21/5/505
  • [21] Stratonovich RL, "Conditional Markov Processes," Theory of Probability & Its Applications, vol. 5, no. 2, pp. 156–178, Jan. 1960. http://epubs.siam.org/doi/10.1137/1105015
  • [22] Kushner HJ, "On the Differential Equations Satisfied by Conditional Probability Densities of Markov Processes, with Applications," Journal of the Society for Industrial and Applied Mathematics, Series A: Control, vol. 2, no. 1, pp. 106–119, Jan. 1964. http://epubs.siam.org/doi/10.1137/0302009
  • [23] Bain A and Crisan D, Fundamentals of Stochastic Filtering, ser. Stochastic Modelling and Applied Probability, no. 60. New York: Springer, 2009.
  • [24] Brigo D, Hanzon B, and Le Gland F, "A differential geometric approach to nonlinear filtering: the projection filter," IEEE Transactions on Automatic Control, vol. 43, no. 2, pp. 247–252, 1998. http://ieeexplore.ieee.org/document/661075/
  • [25] van Handel R and Mabuchi H, "Quantum projection filter for a highly nonlinear model in cavity QED," Journal of Optics B: Quantum and Semiclassical Optics, vol. 7, no. 10, pp. S226–S236, Oct. 2005. https://iopscience.iop.org/article/10.1088/1464-4266/7/10/005
  • [26] Amari S, Differential-Geometrical Methods in Statistics, 2nd ed., ser. Lecture Notes in Statistics, no. 28. Berlin; New York: Springer-Verlag, 1990.
  • [27] Wong E and Zakai M, "On the relation between ordinary and stochastic differential equations," International Journal of Engineering Science, vol. 3, no. 2, pp. 213–229, Jul. 1965. https://linkinghub.elsevier.com/retrieve/pii/0020722565900455
  • [28] Gatto R and Jammalamadaka SR, "The generalized von Mises distribution," Statistical Methodology, vol. 4, no. 3, pp. 341–353, Jul. 2007. https://escholarship.org/uc/item/8m91b0p7
  • [29] Gatto R, "Some computational aspects of the generalized von Mises distribution," Statistics and Computing, vol. 18, no. 3, pp. 321–331, Sep. 2008.
  • [30] Kloeden PE and Platen E, Numerical Solution of Stochastic Differential Equations, corr. 3rd printing, ser. Applications of Mathematics, no. 23. Berlin: Springer, 2010.
  • [31] Kutschireiter A, Basnak MA, Wilson RI, and Drugowitsch J, "A Bayesian perspective on the ring attractor for heading-direction tracking in the Drosophila central complex," bioRxiv, Dec. 2021. https://www.biorxiv.org/content/10.1101/2021.12.17.473253v1
  • [32] Doucet A, Godsill S, and Andrieu C, "On sequential Monte Carlo sampling methods for Bayesian filtering," Statistics and Computing, vol. 10, pp. 197–208, 2000.
