Abstract
The unscented transform uses a weighted set of samples called sigma points to propagate the means and covariances of nonlinear transformations of random variables. However, unscented transforms developed using either the Gaussian assumption or a minimum set of sigma points typically fall short when the random variable is not Gaussian distributed and the nonlinearities are substantial. In this paper, we develop the generalized unscented transform (GenUT), which uses 2n+1 sigma points to accurately capture up to the diagonal components of the skewness and kurtosis tensors of most probability distributions. Constraints can be analytically enforced on the sigma points while guaranteeing at least second-order accuracy. The GenUT uses the same number of sigma points as the original unscented transform while also being applicable to non-Gaussian distributions, including the assimilation of observations in the modeling of infectious diseases such as coronavirus (SARS-CoV-2) causing COVID-19.
Index Terms: Unscented transform, Probability distributions, Estimation, Kalman filtering, Infectious disease
I. Introduction
The Kalman filter provides the basis for most of the popular state estimation techniques used for linear and nonlinear dynamic systems. The linear Kalman filter works by propagating the mean and covariance of the state of a dynamic system [1], [2]. Originally developed under the Gaussian assumption for measurement and process noise, the Kalman filter is the optimal estimator when this assumption is satisfied. Under non-Gaussian noise, the Kalman filter is the optimal linear estimator but its performance can sometimes deteriorate [1], [3].
For many dynamic systems in practice, linearity is a reasonable assumption. For others, system nonlinearities cause methods based on linear models to perform poorly. Most nonlinear systems can behave approximately linearly over small operation ranges. The extended Kalman filter (EKF) is one of the most widely used Kalman filters for nonlinear dynamic systems. The EKF employs a linear approximation of the nonlinear system around a nominal state trajectory [1], [2], [4]. However, for highly nonlinear systems, linear approximations can introduce errors that can lead to divergence of the state estimate.
To address the drawbacks of the EKF, several well-known state estimators such as the ensemble Kalman filter [5]–[8], the unscented Kalman filter (UKF) [9], [10], and the particle filter [1], [11] have been developed. Although the particle filter can give better performance than the UKF, this comes at the cost of a higher computational effort. In some applications, the improved performance might not be worth the additional computational costs [1].
The UKF is a nonlinear filter that uses the unscented transformation to approximate the mean and covariance of a Gaussian random variable [9], [12]. The unscented transform uses the intuition that with a fixed number of parameters it should be easier to approximate a Gaussian distribution than it is to approximate an arbitrary nonlinear function or transformation [9]. It produces sets of vectors called sigma points that capture the moments of the standard Gaussian distribution. The UKF uses the generated sigma points to obtain estimates of the states and the state estimation error covariance. The UKF has been used to generate distributions which improve the performance of a particle filter [13], [14]. It has also been employed to improve the performance of the ensemble Kalman filter (EnKF) [15]. Although several types of sigma points exist in the literature [16], [17], most of those that were not developed under the Gaussian assumption do not attempt to match the skewness or kurtosis of a random variable, and are therefore only second-order accurate.
The need to effectively monitor, predict, and control the spread of infectious diseases has led to the application of numerous state estimation techniques. The EKF [18], [19] and the particle filter [20] have been used to estimate the parameters of the measles virus transmission dynamics from real data. The ensemble adjustment Kalman filter (EAKF) has been employed in the forecasting of influenza [21] and dengue fever [22]. Several infectious diseases such as Ebola [23], HIV [24], and neonatal sepsis [25] have seen implementation of different Kalman filters. More recently, the outbreak of the novel coronavirus (SARS-CoV-2) causing COVID-19 has led to concerted efforts to properly understand its transmission and offer policy guidelines that can mitigate its spread. Recent efforts have employed the iterated EAKF to assimilate daily observations in the modeling of COVID-19 [26]. Distributions such as Poisson, negative-binomial, and binomial are typically used for modeling infectious diseases from count data. Additionally, the number of patients arriving at a hospital or a testing center can be modeled by a Poisson distribution whose rate is proportional to the infected population. The use of standard Kalman filters in infectious disease estimation and prediction under the Poisson assumption can be justified by the fact that a Poisson distribution with a large rate can be approximated by a Gaussian distribution of the same mean and variance. However, the approximation breaks down when the rates are small [27].
The usage of Kalman filters to assimilate data generated by the transformation of random variables from different probability distributions revealed a fundamental mismatch in the application of the filters – the accuracy of the filter is reduced if the Gaussian assumption is not satisfied and the nonlinearities are high. This led to the development of unscented transforms that can account for some higher-order moment information such as the skewness and kurtosis [28]–[32]. The unscented transforms can be grouped into two categories: the ones that employ 2n + 1 sigma points [28]–[30] and the ones that use more than 2n + 1 sigma points [31], [32].
First, we consider those that use 2n+1 sigma points. In [28], an unscented transform was developed to match the average marginal skewness and kurtosis. The method however did not match the true skewness and true kurtosis for each element of the random vector. In [29], a randomized unscented transform was used in the development of a filter for non-Gaussian systems. Although the method uses a stochastic integration rule to solve state and measurement statistics, the sigma points are generated under the Gaussian assumption. In [30], an unscented transform was developed to capture the skewness of a random vector. However, the method assumes a closed skew normal distribution in its development. All preceding methods that use 2n + 1 sigma points either apply only to special distributions or can capture at most the average skewness and kurtosis.
Now we consider those that use more than 2n + 1 sigma points. In [31], an unscented transform was developed to match the first four moments of Gaussian random variables. In [32], a higher order unscented transform was developed to match the skewness and kurtosis tensors with high accuracy. The method uses an approximate CANDECOMP/PARAFAC (CP) tensor decomposition to generate its sigma points. However, depending on the dimension of the problem and the error tolerance level in approximating the skewness and kurtosis tensors, this method can require significant computational costs. This is because the sequence of vectors and constants used in the approximate CP method can significantly increase when the error tolerance level is made small. All preceding methods that use more than 2n + 1 sigma points either apply only to special distributions or have significantly higher complexity and computational cost.
For an n-dimensional random vector, a set of 2n + 1 sigma points generally employs 2n² + 3n + 1 free parameters (2n + 1 weights and 2n² + n constants that define the coordinates of the sigma points). Trying to match the mean, covariance, skewness, and kurtosis imposes n, O(n²), O(n³), and O(n⁴) constraints respectively. In principle, it is impossible to match all these moments using only 2n + 1 sigma points. The zero skewness nature of the Gaussian distribution made it possible to use 2n + 1 sigma points to accurately match up to the skewness in [9]. The presence of the O(n³) skewness and O(n⁴) kurtosis constraints is what prompted researchers to look beyond 2n + 1 sigma points. However, we note that matching the mean and covariance constraints of any random vector using 2n + 1 sigma points still leaves n² + 2n + 1 free parameters. These residual parameters have been underutilized in capturing as much information as possible about the components of the skewness and kurtosis tensors when the random variable is not Gaussian. One instance where the residual parameters were leveraged was in the capturing of the average marginal skewness and kurtosis, which only represents a total of 2 constraints [28].
In this paper, we develop the generalized unscented transform (GenUT), which is able to adapt to the unique statistics of most probability distributions. We use the intuition that employing sigma points more suitable to the inherent distributions of a random vector can lead to a more accurate propagation of means and covariances. Our method uses 2n + 1 sigma points that not only accurately match the mean and covariance matrix, but also take advantage of the additional free parameters to accurately match the diagonal components of the skewness tensor and kurtosis tensor of most random vectors. The constraints we employ comprise n for the mean, those for the covariance, n for the diagonal components of the skewness tensor, and n for the diagonal components of the kurtosis tensor. Their total falls within the 2n² + 3n + 1 free parameters available. While more parameters remain, the diagonal components of the skewness and kurtosis tensors are the most significant. In comparison to [28]–[31], our method gives a general way to accurately match the diagonal components of the skewness and kurtosis tensors of most random vectors. In comparison to [32], our method uses fewer sigma points, which is crucial for larger system dimensions. In comparison to the standard unscented transform, we acquire the most significant higher moment information of most probability distributions with the same number of sigma points.
In Section II, we discuss the problems that arise when the Gaussian assumption is employed in the unscented transform. In Section III, we develop the GenUT sigma points that can capture certain properties of most probability distributions, such as its mean, covariance, skewness, and kurtosis. In Section IV, we show that our sigma points are accurate in approximating the mean, covariance, and diagonal components of the skewness and kurtosis tensors. In Section V, we address constraints and show that imposing constraints can at least maintain second-order accuracy. In Section VI, we evaluate the accuracy of the GenUT sigma points in propagating means and covariances of nonlinear transformations of arbitrarily distributed random vectors and we give several examples that demonstrate its effectiveness when compared against other unscented transforms. We discuss the conclusions in Section VII.
II. Limitations of the Unscented Transform
We analyze the performance of unscented transforms that were motivated by Gaussian statistics [9], [12]. We will show how linearization approximations, via Taylor series expansion of a nonlinear transformation of a random vector x evaluated about its mean x̄, introduce errors in the propagation of means and covariances. We will see that errors can be introduced in the propagation of means and covariances beyond the second order when these transforms are used to approximate a nonlinear function λ(x) of a possibly non-Gaussian distributed random vector x ∈ ℝⁿ.
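Definition 1. Let x ∈ ℝⁿ be a random vector. We define the mean x̄ ∈ ℝⁿ, covariance P ∈ ℝⁿˣⁿ, skewness tensor S ∈ ℝⁿˣⁿˣⁿ, and kurtosis tensor K ∈ ℝⁿˣⁿˣⁿˣⁿ as

$$\bar{x} = E[x] \tag{1}$$

$$P = E\left[(x - \bar{x})(x - \bar{x})^T\right] \tag{2}$$

$$S_{ijk} = E\left[(x - \bar{x})_i (x - \bar{x})_j (x - \bar{x})_k\right] \tag{3}$$

$$K_{ijkl} = E\left[(x - \bar{x})_i (x - \bar{x})_j (x - \bar{x})_k (x - \bar{x})_l\right] \tag{4}$$

for i, j, k, l ∈ {1, ⋯, n}.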
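The sample mean and sample covariance of the nonlinear transformation given by

$$y = \lambda(x) \tag{5}$$

can be calculated as follows [9].
- Pass the sigma points χ[i], with weights w_i, through the known nonlinear function to get the transformed sigma points

$$\mathcal{Y}_{[i]} = \lambda\left(\chi_{[i]}\right), \quad i = 0, \cdots, 2n \tag{6}$$

- Evaluate the sample mean of the transformed sigma points

$$\hat{y} = \sum_{i=0}^{2n} w_i \mathcal{Y}_{[i]} \tag{7}$$

- Evaluate the sample covariance of the transformed sigma points

$$\hat{P}_y = \sum_{i=0}^{2n} w_i \left(\mathcal{Y}_{[i]} - \hat{y}\right)\left(\mathcal{Y}_{[i]} - \hat{y}\right)^T \tag{8}$$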
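The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `propagate` and the symmetric three-point set used in the usage lines are hypothetical choices, and a single weight vector is assumed for both the mean and the covariance.

```python
import numpy as np

def propagate(points, weights, f):
    """Weighted sample mean and covariance of f applied to sigma points.

    points  : array of shape (m, n), one sigma point per row
    weights : array of shape (m,), assumed to sum to 1
    f       : callable mapping an n-vector to a p-vector (or scalar)
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Step (6): pass every sigma point through the nonlinear function
    transformed = np.array([np.atleast_1d(f(p)) for p in points])
    # Step (7): weighted sample mean of the transformed points
    mean = weights @ transformed
    # Step (8): weighted sample covariance about that mean
    dev = transformed - mean
    cov = (weights[:, None] * dev).T @ dev
    return mean, cov

# Usage with a hypothetical symmetric 3-point set (zero mean, unit variance)
pts = np.array([[0.0], [-np.sqrt(3.0)], [np.sqrt(3.0)]])
w = np.array([2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0])
mean, cov = propagate(pts, w, lambda x: x**2)
```

Because this particular point set reproduces the second and fourth moments of a standard Gaussian, the result for y = x² agrees with the Gaussian values E[y] = 1 and Var[y] = 2.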
A. Accuracy in Approximating the True Mean
Applying a Taylor series expansion of λ(x) about the mean x̄ of x, we show in Appendix A-A that the true mean of y = λ(x) is given as
(9) |
The analytical expression for the approximated mean from [9] is given as
(10) |
Comparing the above equation with the true mean of (9), we notice the following problems with the sigma points developed using the Gaussian assumption:
- The odd-powered moments in the approximation of the true mean are always zero due to the symmetry of the sigma points. This introduces significant approximation errors in situations where the odd-powered moments of the distribution of x are non-zero and the transformation y = λ(x) is highly nonlinear.
- The fourth-order term fails to capture a part of the true kurtosis, even when the optimal value of κ = 3 − n is selected, because of the Gaussian assumption.
We also note that errors in approximating the mean beyond the second order occur not only for sets of 2n + 1 sigma points existing in the literature, but also for sets of n + 1 sigma points [10], [33] – this is because they do not account for the skewness and kurtosis of x when it is not Gaussian distributed.
B. Accuracy in Approximating the True Covariance Matrix
The true covariance matrix, which was evaluated in Appendix A-B, is given as
(11) |
where we have used the notation xxᵀ = x[⋯]ᵀ. The analytical expression for the approximated covariance matrix from [9] is given as
(12) |
Comparing the above equation with the true covariance matrix of (11), we notice similar issues that were pointed out in approximating the mean – the approximation is only accurate up to the second order when x is not Gaussian distributed. All the odd-powered moments are zero because of the symmetric nature of the sigma points, while the fourth-powered moment is also inaccurate because of the Gaussian nature of the sigma points. As with the mean approximation, errors in the covariance matrix approximation are introduced beyond the second order not only for sets of 2n+1 sigma points existing in the literature, but also for sets of n + 1 sigma points.
III. Generalized Unscented Transform
For a random vector x ∈ ℝⁿ, we develop sigma points that can accurately capture the mean, covariance matrix, and the diagonal components of both the skewness tensor and the kurtosis tensor. This is done by selecting sigma point distributions that have the flexibility to either be symmetric when x is symmetrically distributed or be asymmetric when x is asymmetrically distributed.
Assumption 1. The random vector x follows a probability distribution with finite moments.
We reduce the problem of approximating x to the problem of approximating a user-specified arbitrarily distributed random vector z with zero mean and unit variance, whose higher-order moments are functions of the higher-order moments of x. We write

$$x = \bar{x} + \sqrt{P}\, z \tag{13}$$

where √P is the matrix square root of P, such that √P √Pᵀ = P.
Definition 2. Let x be a vector, P be a square matrix, and k be some positive integer. We define the element-wise product (Hadamard product) as ⊙, such that

$$x^{\odot k} = \underbrace{x \odot \cdots \odot x}_{k \text{ times}}, \qquad P^{\odot k} = \underbrace{P \odot \cdots \odot P}_{k \text{ times}}.$$

We also define the element-wise division (Hadamard division) as ⊘.
A. One-Dimensional Distribution
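We develop sigma points that match the first three moments of z in a single dimension, and then constrain those points to match the fourth moment of z. For a one-dimensional distribution, we will show how to select sigma points such that the first four moments satisfy

$$E[z] = 0, \quad E[z^2] = 1, \quad E[z^3] = \breve{S}, \quad E[z^4] = \breve{K},$$

where S̆ = P^(−3/2) S and K̆ = P^(−2) K are the third and fourth moments of the standardized variable z.
To capture the first three moments in a single dimension, three points are used: the first point lies at the origin with a weight of w0; the second point lies at a distance −u from the origin with a weight of w1; the third point lies at a distance v from the origin with a weight of w2. Therefore, in one dimension, we use the following 3 sigma points

$$\chi_{[0]} = 0, \quad \chi_{[1]} = -u, \quad \chi_{[2]} = v,$$

where w0, w1, and w2 are the weights for the respective sigma points. A visual representation of our sigma points in one dimension is shown in Fig. 1. Obeying the moments of z and the fact that the sum of all weights should equal 1, we write

$$w_0 + w_1 + w_2 = 1 \tag{14}$$

$$-w_1 u + w_2 v = 0 \tag{15}$$

$$w_1 u^2 + w_2 v^2 = 1 \tag{16}$$

$$-w_1 u^3 + w_2 v^3 = \breve{S} \tag{17}$$

From (15), we see that w1 = w2 v/u. Using this to eliminate w1 from (16) and (17) gives

$$w_2 v (u + v) = 1 \tag{18}$$

$$w_2 v \left(v^2 - u^2\right) = \breve{S} \tag{19}$$

We designate u as the free parameter while assuming that u > 0. Using the fact that v² − u² = (u + v)(v − u), substituting (18) into (19) gives

$$v = u + \breve{S} \tag{20}$$

From (14) and (18), we see that the weights are given as

$$w_2 = \frac{1}{v(u + v)}, \quad w_1 = w_2 \frac{v}{u}, \quad w_0 = 1 - w_1 - w_2 \tag{21}$$

We note that the free parameter u can be selected to match the fourth moment of z. We now attempt to satisfy the fourth moment constraint given by

$$w_1 u^4 + w_2 v^4 = \breve{K} \tag{22}$$

Eliminating w1 using w1 = w2 v/u gives

$$w_2 v \left(u^3 + v^3\right) = \breve{K} \tag{23}$$

Using the relationships w2 v(u + v) = 1, u³ + v³ = (u + v)(u² + v² − uv), and v = u + S̆, the above equation reduces to

$$u^2 + \breve{S} u + \breve{S}^2 - \breve{K} = 0.$$

The positive solution to the above quadratic equation is

$$u = \frac{1}{2}\left(-\breve{S} + \sqrt{4\breve{K} - 3\breve{S}^2}\right) \tag{24}$$

where v is given in (20). The equations for w1, w2, and w0 remain unchanged.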
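As a quick sanity check of the one-dimensional construction, consider the Gaussian case. The worked calculation below assumes that the standardized third and fourth moments of z are written S̆ and K̆, that u is the positive root u = ½(−S̆ + √(4K̆ − 3S̆²)), and that v = u + S̆, with the weights obtained from the moment constraints; for a Gaussian, S̆ = 0 and K̆ = 3.

```latex
\begin{aligned}
\breve{S} = 0,\ \breve{K} = 3 \;\Rightarrow\;
u &= \tfrac{1}{2}\left(-\breve{S} + \sqrt{4\breve{K} - 3\breve{S}^2}\right) = \tfrac{1}{2}\sqrt{12} = \sqrt{3}, \\
v &= u + \breve{S} = \sqrt{3}, \\
w_2 &= \frac{1}{v(u+v)} = \frac{1}{6}, \qquad
w_1 = w_2\,\frac{v}{u} = \frac{1}{6}, \qquad
w_0 = 1 - w_1 - w_2 = \frac{2}{3}.
\end{aligned}
```

Under these assumptions the points sit at x̄ ± √3·√P with weights 1/6 and a central weight of 2/3, which is the symmetric three-point set of the standard unscented transform for n = 1 with n + κ = 3.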
Fig. 1.
Samples chosen for a one-dimensional distribution for the GenUT. The locations and weights of the sigma points are determined by the moments of the probability distribution.
Remark 1. We note that the sigma points described above, which accurately capture the kurtosis when constrained, were designed for when the state has a dimension of 1. This implies that z, P, S, and K are scalars.
In the next section, we extend this to multiple dimensions.
B. Multi-Dimensional Distribution
For an n-dimensional vector z, we develop a set of sigma points that accurately matches its mean and covariance matrix, while accurately matching the diagonal components of the skewness tensor. Furthermore, by constraining the sigma points, we show that we can accurately match the diagonal components of the kurtosis tensor. We note that for an independent random vector, accurately matching the diagonal components of the skewness tensor implies an accurate matching of the entire skewness tensor.
Definition 3. We define the vectors Ŝ ∈ ℝⁿ and K̂ ∈ ℝⁿ which contain the diagonal components of the skewness tensor and kurtosis tensor respectively, such that

$$\hat{S} = \left[S_{111}, S_{222}, \cdots, S_{nnn}\right]^T, \qquad \hat{K} = \left[K_{1111}, K_{2222}, \cdots, K_{nnnn}\right]^T.$$

For a multi-dimensional distribution, we will show how to select the 2n + 1 sigma points such that the first four moments satisfy

$$E[z] = \mathbf{0}, \quad E\left[zz^T\right] = I, \quad E\left[z^{\odot 3}\right] = \breve{S} = \left[\sqrt{P}^{\odot 3}\right]^{-1}\hat{S}, \quad E\left[z^{\odot 4}\right] = \breve{K} = \left[\sqrt{P}^{\odot 4}\right]^{-1}\hat{K},$$

where I ∈ ℝⁿˣⁿ is the identity matrix.
Remark 2. Due to the positive definiteness of the covariance matrix P, it is always invertible.
A visual representation of our sigma points for a two-dimensional distribution is shown in Fig. 2. Our first point lies at (0, 0) with a weight of w0. Our second point lies on the coordinate axes a distance −u1 from the origin with a weight of w1. Our third point lies on the coordinate axes a distance −u2 from the origin with a weight of w2. Our fourth point lies on the coordinate axes a distance v1 from the origin with a weight of w3. Our fifth point lies on the coordinate axes a distance v2 from the origin with a weight of w4. Therefore, our unscented transform uses the following 2n + 1 sigma points

$$\chi_{[0]} = \mathbf{0}, \quad \chi_{[i]} = -u_i I_{[i]}, \quad \chi_{[i+n]} = v_i I_{[i]}, \quad i = 1, \cdots, n,$$

where I[i] is the ith column of the identity matrix and 0 ∈ ℝⁿ is a vector of zeros. We note that u = [u1, u2, ⋯, un]ᵀ and v = [v1, v2, ⋯, vn]ᵀ.
Fig. 2.
Samples chosen for a two-dimensional distribution for the GenUT. The locations and weights of the sigma points are determined by the moments of the probability distribution.
Definition 4. We partition the weight vector w = [w0, w1, ⋯, w2n]ᵀ by defining w′ = [w1, w2, ⋯, wn]ᵀ and w″ = [w1+n, w2+n, ⋯, w2n]ᵀ such that w = [w0, w′ᵀ, w″ᵀ]ᵀ.
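Obeying the moments of z, we write

$$w_0 + \mathbf{1}^T w' + \mathbf{1}^T w'' = 1 \tag{25}$$

$$-w' \odot u + w'' \odot v = \mathbf{0} \tag{26}$$

$$w' \odot u^{\odot 2} + w'' \odot v^{\odot 2} = \mathbf{1} \tag{27}$$

$$-w' \odot u^{\odot 3} + w'' \odot v^{\odot 3} = \breve{S} \tag{28}$$

where 1 ∈ ℝⁿ is a vector of ones and S̆ is the vector of third moments of the components of z. From (26), we see that w′ = w″ ⊙ v ⊘ u. Rewriting (27) and (28) gives

$$w'' \odot v \odot (u + v) = \mathbf{1} \tag{29}$$

$$w'' \odot v \odot \left(v^{\odot 2} - u^{\odot 2}\right) = \breve{S} \tag{30}$$

Selecting u > 0 as the free parameters, we get

$$v = u + \breve{S} \tag{31}$$

Therefore, from (25) and (29), we see that

$$w'' = \mathbf{1} \oslash v \oslash (u + v), \quad w' = w'' \odot v \oslash u, \quad w_0 = 1 - \mathbf{1}^T\left(w' + w''\right) \tag{32}$$

To match the diagonal components of the kurtosis tensor, we need to satisfy

$$w' \odot u^{\odot 4} + w'' \odot v^{\odot 4} = \breve{K} \tag{33}$$

where K̆ is the vector of fourth moments of the components of z. Solving the above equation results in constrained values for u, such that

$$u = \frac{1}{2}\left(-\breve{S} + \sqrt{4\breve{K} - 3\breve{S}^{\odot 2}}\right) \tag{34}$$

where the square root is taken element-wise.
It can be shown from (13) that the algorithm for selecting the 2n + 1 sigma points for any random vector x is given in Algorithm 1.
We recall from (31) that the constraint u > 0 exists. Applying this constraint on (34), we see that

$$\breve{K} - \breve{S}^{\odot 2} > \mathbf{0} \tag{35}$$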
The inequality in (35), at least for the one-dimensional case, agrees with the finding by Pearson in [34] that for probability distributions, the standardized kurtosis always exceeds the square of the standardized skewness. If the inequality in (35) were violated, then (34) becomes infeasible, and the free parameter u > 0 in (31) must instead be selected such that v > 0. Although this eliminates the accuracy in matching the diagonal components of the kurtosis tensor, the sigma points are still able to accurately match the diagonal components of the skewness tensor.
There might be concerns that v in (31) might be negative whenever the standardized skewness term S̆ in (31) is negative. If (35) is satisfied, then selecting u using (34) leads to v > 0. Alternatively, arbitrarily selecting u such that u > −S̆ ensures that v > 0.
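The selection rules in (31), (32), and (34) can be collected into a short NumPy sketch. It is an illustration under stated assumptions rather than a transcription of Algorithm 1: the function name `genut_sigma_points` is hypothetical, the matrix square root is taken to be the Cholesky factor, and the standardized moments are assumed to be obtained by solving with the element-wise third and fourth powers of √P.

```python
import numpy as np

def genut_sigma_points(mean, P, S_diag, K_diag):
    """Sketch of 2n+1 sigma points and weights matching diagonal skewness/kurtosis.

    mean   : (n,) mean vector
    P      : (n, n) covariance matrix
    S_diag : (n,) diagonal components of the skewness tensor
    K_diag : (n,) diagonal components of the kurtosis tensor
    """
    mean = np.asarray(mean, dtype=float)
    sqrtP = np.linalg.cholesky(np.asarray(P, dtype=float))  # sqrtP @ sqrtP.T == P
    # Assumed standardization: element-wise powers of sqrtP map z-moments to x-moments
    S_std = np.linalg.solve(sqrtP**3, np.asarray(S_diag, dtype=float))
    K_std = np.linalg.solve(sqrtP**4, np.asarray(K_diag, dtype=float))
    disc = 4.0 * K_std - 3.0 * S_std**2
    if np.any(disc < 0.0):
        raise ValueError("kurtosis too small for the given skewness")
    u = 0.5 * (-S_std + np.sqrt(disc))        # (34)
    if np.any(u <= 0.0):
        raise ValueError("inequality (35) is violated")
    v = u + S_std                             # (31)
    w_upper = 1.0 / (v * (u + v))             # w'' in (32)
    w_lower = w_upper * v / u                 # w'  in (32)
    w0 = 1.0 - w_lower.sum() - w_upper.sum()
    # Row i of (sqrtP * u).T is u_i times the i-th column of sqrtP
    points = np.vstack([mean, mean - (sqrtP * u).T, mean + (sqrtP * v).T])
    weights = np.concatenate([[w0], w_lower, w_upper])
    return points, weights

# Usage with hypothetical inputs: pick standardized moments, then map them to x
L = np.linalg.cholesky(np.array([[2.0, 0.5], [0.5, 1.0]]))
S_diag = (L**3) @ np.array([0.5, -0.3])
K_diag = (L**4) @ np.array([3.0, 2.5])
mean = np.array([1.0, -2.0])
points, weights = genut_sigma_points(mean, L @ L.T, S_diag, K_diag)
dev = points - mean
sample_cov = (weights[:, None] * dev).T @ dev   # compare with P
sample_skew_diag = weights @ dev**3             # compare with S_diag
sample_kurt_diag = weights @ dev**4             # compare with K_diag
```

The usage lines build a correlated two-dimensional case and recompute the sample statistics from the returned points, so the mean, covariance, and diagonal skewness and kurtosis can be compared with the inputs.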
Algorithm 1 can be used to create sigma points that can match up to the kurtosis if (35) is satisfied. For example, suppose we want to prescribe some arbitrary mean, variance, skewness, and kurtosis for a random variable x that is not from any known probability distribution. Randomly selecting the mean x̄, variance P = 0.2, and skewness S = −0.5, we can use Algorithm 1 to match them exactly. However, we cannot randomly select a kurtosis K and expect to match it. The selection of the kurtosis K must satisfy (35), so for this example, we require K > S²/P = 1.25. Prescribing a kurtosis value of K = 1.3 satisfies (35). Now using Algorithm 1, we see that w0 = 0.2, w1 = 0.0286, w2 = 0.7714, u = 5.8055, and v = 0.2153. The sample mean, sample covariance, sample skewness, and sample kurtosis exactly match their true prescribed values. We show how to calculate the sample statistics in Section IV.
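The numbers in this example can be reproduced with a few lines of plain Python. The sketch assumes the scalar relations u = ½(−S̆ + √(4K̆ − 3S̆²)), v = u + S̆, w2 = 1/(v(u + v)), and w1 = w2 v/u, with S̆ = S/P^(3/2) and K̆ = K/P². The function name `genut_1d` is hypothetical, and the mean value 1.0 is an arbitrary placeholder because the weights, u, and v do not depend on it.

```python
import math

def genut_1d(mean, P, S, K):
    """Three sigma points and weights for a scalar with central moments P, S, K."""
    S_std = S / P**1.5           # standardized skewness
    K_std = K / P**2             # standardized kurtosis
    if K_std <= S_std**2:
        raise ValueError("kurtosis must exceed squared skewness (standardized)")
    u = 0.5 * (-S_std + math.sqrt(4.0 * K_std - 3.0 * S_std**2))
    v = u + S_std
    w2 = 1.0 / (v * (u + v))     # weight of the point at +v
    w1 = w2 * v / u              # weight of the point at -u
    w0 = 1.0 - w1 - w2           # weight of the point at the mean
    sd = math.sqrt(P)
    points = [mean, mean - u * sd, mean + v * sd]
    return points, [w0, w1, w2], u, v

# mean = 1.0 is a hypothetical placeholder value
points, weights, u, v = genut_1d(1.0, 0.2, -0.5, 1.3)

# Sample central moments recomputed from the sigma points
sample = [
    sum(w * (x - 1.0) ** k for w, x in zip(weights, points))
    for k in (2, 3, 4)
]
```

The list `sample` holds the second, third, and fourth sample central moments, which can be compared against the prescribed P = 0.2, S = −0.5, and K = 1.3.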
C. Moments of a Probability Distribution
We use the moment generating function (MGF) M(t) to evaluate the mean and higher-order central moments of a probability distribution. For any random variable x [35], its MGF and n-th moment are given by

$$M(t) = E\left[e^{tx}\right], \qquad E\left[x^n\right] = \left.\frac{d^n M(t)}{dt^n}\right|_{t=0} \tag{36}$$
We also use the gamma notation
(37) |
The first four moments of 10 different probability distributions can be found in Table I.
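As a worked instance of the MGF approach, the Poisson entries of Table I can be recovered as follows. The derivation goes through the cumulants c_r (the derivatives of ln M(t) at t = 0), which is a convenience chosen here and not necessarily the route taken by the authors.

```latex
\begin{aligned}
M(t) &= E\left[e^{tx}\right] = \exp\left(\lambda\left(e^{t} - 1\right)\right), \\
\ln M(t) &= \lambda\left(e^{t} - 1\right)
  \;\Rightarrow\; c_1 = c_2 = c_3 = c_4 = \lambda, \\
\bar{x} &= c_1 = \lambda, \qquad
P = c_2 = \lambda, \qquad
S = c_3 = \lambda, \qquad
K = c_4 + 3c_2^2 = 3\lambda^2 + \lambda.
\end{aligned}
```

The last line uses the standard identities that the second and third central moments equal the second and third cumulants, while the fourth central moment is c₄ + 3c₂².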
TABLE I.
Probability Distributions
Random Variable | Probability density function | Mean | Variance (P) | Skewness (S) | Kurtosis (K) |
---|---|---|---|---|---|
Gaussian 𝒩(μ, σ²) | x ∈ (−∞, ∞) | μ | σ² | 0 | 3σ⁴ |
Exponential E(λ) | λe^(−λx), x ≥ 0, λ > 0 | | | | |
Gamma G(a, b) | x ≥ 0, a > 0, b > 0 | ab | ab² | 2ab³ | 3ab⁴(a + 2) |
Weibull W(a, b) | x ≥ 0, a > 0, b > 0 | aΓ1b | | | |
Rayleigh R(σ) | x ≥ 0 | | | | |
Beta BE(a, b) | x ∈ (0, 1), a > 0, b > 0, ζk = a + b + k | | | | |
Binomial B(n, p) | p ∈ [0, 1], k = 0, 1, 2, …, n | np | np(1 − p) | np(1 − p)(1 − 2p) | np(1 − p)(1 + p(1 − p)(3n − 6)) |
Poisson P(λ) | λ > 0, k = 0, 1, 2, … | λ | λ | λ | 3λ² + λ |
Geometric GE(p) | p(1 − p)^k, p ∈ (0, 1], k = 0, 1, 2, … | | | | |
Negative Binomial NB(r, p) | k = 0, 1, 2, … | | | | |
IV. Accuracy of Sigma Point Sample Statistics
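We demonstrate the accuracy of our sigma points in approximating any random vector x ∈ ℝⁿ.
Theorem 1. Let x ∈ ℝⁿ be any random vector with mean x̄, covariance matrix P, skewness tensor S, and kurtosis tensor K. The following statements are true for the 2n + 1 sigma points χ[i] and weights w_i defined as shown in Algorithm 1.
- The sample mean, $\sum_{i=0}^{2n} w_i \chi_{[i]}$, is equal to x̄.
- The sample covariance matrix, $\sum_{i=0}^{2n} w_i (\chi_{[i]} - \bar{x})(\chi_{[i]} - \bar{x})^T$, is equal to P.
- The sample skewness tensor, $\sum_{i=0}^{2n} w_i (\chi_{[i]} - \bar{x})_j (\chi_{[i]} - \bar{x})_k (\chi_{[i]} - \bar{x})_l$, is equal to S_jkl if j = k = l.
- The sample kurtosis tensor, $\sum_{i=0}^{2n} w_i (\chi_{[i]} - \bar{x})_j (\chi_{[i]} - \bar{x})_k (\chi_{[i]} - \bar{x})_l (\chi_{[i]} - \bar{x})_m$, is equal to K_jklm if j = k = l = m whenever (35) is satisfied.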
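Proof. For our proof, we introduce diagonal matrices U and V, such that U = diag(u) and V = diag(v), and we write the sigma points for x as χ[0] = x̄, χ[i] = x̄ − u_i √P I[i], and χ[i+n] = x̄ + v_i √P I[i] for i ∈ {1, ⋯, n}, where I[i] is the ith column of the identity matrix. In matrix form, we evaluate the sample mean as

$$\sum_{i=0}^{2n} w_i \chi_{[i]} = \bar{x}\left(w_0 + \mathbf{1}^T w' + \mathbf{1}^T w''\right) + \sqrt{P}\left(V w'' - U w'\right) = \bar{x} \tag{38}$$

because w0 + 1ᵀw′ + 1ᵀw″ = 1 and w″ ⊙ v = w′ ⊙ u. We see that the sample mean equals the actual mean. Evaluating the sample covariance matrix, we get

$$\sum_{i=0}^{2n} w_i \left(\chi_{[i]} - \bar{x}\right)\left(\chi_{[i]} - \bar{x}\right)^T = \sqrt{P}\left(\mathrm{diag}(w')U^2 + \mathrm{diag}(w'')V^2\right)\sqrt{P}^T = \sqrt{P}\sqrt{P}^T = P \tag{39}$$

because w′ ⊙ u⊙2 + w″ ⊙ v⊙2 = 1 is the diagonal of diag(w′)U² + diag(w″)V². We see that the sample covariance matrix equals the actual covariance matrix. Defining Ŝ_χ as a vector containing the diagonal components of the sample skewness tensor such that

$$\left(\hat{S}_{\chi}\right)_j = \sum_{i=0}^{2n} w_i \left(\chi_{[i]} - \bar{x}\right)_j^3,$$

we can evaluate the diagonal components of the sample skewness tensor as

$$\hat{S}_{\chi} = \sum_{i=1}^{n} \left[ w_i \left(-u_i \sqrt{P} I_{[i]}\right)^{\odot 3} + w_{i+n} \left(v_i \sqrt{P} I_{[i]}\right)^{\odot 3} \right] \tag{40}$$

$$= \sqrt{P}^{\odot 3}\left(w'' \odot v^{\odot 3} - w' \odot u^{\odot 3}\right) \tag{41}$$

$$= \sqrt{P}^{\odot 3}\, \breve{S} = \hat{S} \tag{42}$$

where Ŝ is the vector of diagonal components of the true skewness tensor and S̆ is its standardized counterpart. We see that our sigma points accurately match the diagonal components of the skewness tensor. Finally, defining K̂_χ as a vector containing the diagonal components of the sample kurtosis tensor such that

$$\left(\hat{K}_{\chi}\right)_j = \sum_{i=0}^{2n} w_i \left(\chi_{[i]} - \bar{x}\right)_j^4,$$

we can evaluate the diagonal components of the sample kurtosis tensor as

$$\hat{K}_{\chi} = \sqrt{P}^{\odot 4}\left(w' \odot u^{\odot 4} + w'' \odot v^{\odot 4}\right) \tag{43}$$

$$= \sqrt{P}^{\odot 4}\, \breve{K} = \hat{K} \tag{44}$$

where K̂ is the vector of diagonal components of the true kurtosis tensor and K̆ is its standardized counterpart. We see that our sigma points accurately match the diagonal components of the kurtosis tensor. □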
Theorem 1 shows that our sigma points in Algorithm 1 can accurately approximate the mean and covariance of any random vector, as well as the diagonal components of the skewness and kurtosis tensors – this makes it applicable to a wide variety of applications.
V. Constrained Sigma Points
Noting that several physical systems require some constraints on their states or parameters, we show how our sigma points can be constrained while at least maintaining second-order accuracy.
We require the sigma points to be constrained such that

$$a \le \chi_{[i]} \le b, \quad i = 0, \cdots, 2n,$$

where a ∈ ℝⁿ and b ∈ ℝⁿ are the lower bounds and upper bounds respectively.
Assumption 2. The mean x̄ is within the bounds, such that

$$a < \bar{x} < b.$$
We note that our sigma points of Algorithm 1 can violate some state constraints despite being able to accurately capture the mean and covariance of a random vector, as well as the diagonal components of its skewness and kurtosis tensors. This might make them inapplicable in situations or models that only permit constrained values. For example, in applications that assume a Poisson distribution for the states, such as count data, the states are non-negative by definition. When the sigma points of Algorithm 1 are applied, the non-negativity constraint on an independent random vector can be violated. We demonstrate this using the following example.
Example 1. We generate sigma points for an independent Poisson random vector x such that
where x̄ is the mean, P is the covariance matrix, and Ŝ and K̂ are vectors containing the diagonal components of the skewness tensor and kurtosis tensor respectively. Using Algorithm 1, we see that w0 = 0.3333, w1 = 0.2049, w2 = 0.2129, w3 = 0.1284, w4 = 0.1204, u1 = 1.3713, u2 = 1.3028, v1 = 2.1878, and v2 = 2.3028. The 2n + 1 sigma points in matrix form are
The sample statistics are
We see from Example 1 that despite the accuracy of the sample statistics, the sigma points χ[1] and χ[2] each have a negative component, which violates the non-negativity of Poisson draws.
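The weights and offsets reported in Example 1 can be checked numerically. The sketch below assumes independent Poisson components with rates 1.5 and 1.0; these rates are an assumption, chosen because they are consistent with the reported values of u, v, and the weights. For a Poisson rate λ, the standardized skewness is λ^(−1/2) and the standardized kurtosis is 3 + 1/λ. The function name `poisson_genut_axis` is hypothetical.

```python
import math

def poisson_genut_axis(lam):
    """Per-axis sigma point quantities for one independent Poisson(lam) component."""
    S_std = lam ** -0.5              # lam / lam**1.5
    K_std = 3.0 + 1.0 / lam          # (3*lam**2 + lam) / lam**2
    u = 0.5 * (-S_std + math.sqrt(4.0 * K_std - 3.0 * S_std**2))
    v = u + S_std
    w_upper = 1.0 / (v * (u + v))    # weight of the point at mean + v*sqrt(lam)
    w_lower = w_upper * v / u        # weight of the point at mean - u*sqrt(lam)
    lower_point = lam - u * math.sqrt(lam)
    return u, v, w_lower, w_upper, lower_point

rates = [1.5, 1.0]                   # assumed rates, not stated in the text above
axes = [poisson_genut_axis(lam) for lam in rates]
w0 = 1.0 - sum(a[2] + a[3] for a in axes)
lower_points = [a[4] for a in axes]  # coordinate of each lower sigma point on its axis
```

With these assumed rates, both entries of `lower_points` come out negative, which is the constraint violation the example describes.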
Corollary 1. If the lower bound is violated after implementing Algorithm 1, then enforcing the lower-bound constraint leads to accuracy in capturing only the mean, covariance matrix, and the diagonal components of the skewness tensor.
Proof. Lower bounding x will require redefining the variable u such that (34) is no longer satisfied. Theorem 1 establishes that violating (34) leads to an inaccurate approximation of the diagonal components of the kurtosis tensor.
Corollary 2. If the upper bound is violated after implementing Algorithm 1, then enforcing either the upper bound alone or both the lower and upper bounds leads to accuracy in capturing only the mean and covariance matrix.
Proof. Both cases require the redefinition of the variable v. This means that (31) will no longer be satisfied. Theorem 1 establishes that violating (31) leads to an inaccurate approximation of the diagonal components of the skewness and kurtosis tensors.
To enforce constraints on the sigma points, we introduce a slack parameter θ ∈ (0, 1), which is a user-selected constant. Using θ, we now redefine the free parameters ui and vi for i ∈ {1, ⋯, n} as
where |.| denotes the absolute value, and the sigma points get closer to their constraints as θ → 1. We note that the equations for w′ and w″ are unchanged.
We note that enforcing constraints on the sigma points results in a loss of accuracy in capturing the diagonal components of at least the kurtosis tensor. The constrained sigma point algorithm is given in Algorithm 2. We now show a benefit of Algorithm 2 in the following example.
Example 2. Using Algorithm 2 to generate positively constrained sigma points for the Poisson random vector, we select θ = 0.9. We see that w0 = −0.0576, w1 = 0.3003, w2 = 0.3968, w3 = 0.1725, w4 = 0.188, u1 = 1.1023, u2 = 0.9, v1 = 1.9188, and v2 = 1.9. The 2n + 1 positive sigma points in matrix form are
while the corresponding sample statistics are
We see from Example 2 that using Algorithm 2 ensures that the sigma points are always positive while ensuring accuracy in approximating the true mean and covariance of a random vector, as well as capturing the diagonal components of the skewness tensor. However, the ability to exactly capture the diagonal components of the kurtosis tensor is lost. A graphical representation of Examples 1 and 2 is shown in Fig. 3 where we plot the sigma points and the covariance.
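As a rough numerical companion to Example 2, the sketch below makes several assumptions that should be read as such: a diagonal covariance, a lower bound of zero, independent Poisson rates of 1.5 and 1.0 (chosen because they are consistent with the reported numbers), and a lower-bound rule in which u_i is shrunk to θ|x̄_i − a_i|/√P_ii whenever the lower sigma point would cross the bound, after which v_i and the weights are recomputed. This is one reading of the constrained rule that happens to reproduce the reported values; it is not claimed to be Algorithm 2 itself, and the function name `constrain_axis` is hypothetical.

```python
import math

def constrain_axis(mean_i, var_i, S_std_i, u_i, lower_i, theta):
    """Shrink u_i if the lower sigma point crosses lower_i, then redo v_i and weights."""
    sd_i = math.sqrt(var_i)
    if mean_i - u_i * sd_i < lower_i:
        # Assumed rule: place the point a fraction theta of the way to the bound
        u_i = theta * abs(mean_i - lower_i) / sd_i
    v_i = u_i + S_std_i                    # keeps the skewness match
    w_upper = 1.0 / (v_i * (u_i + v_i))    # weight of the +v_i point
    w_lower = w_upper * v_i / u_i          # weight of the -u_i point
    return u_i, v_i, w_lower, w_upper

rates = [1.5, 1.0]                    # assumed Poisson rates (mean = variance)
u_unconstrained = [1.3713, 1.3028]    # values reported in Example 1
theta = 0.9

axes = [
    constrain_axis(lam, lam, lam ** -0.5, u, 0.0, theta)
    for lam, u in zip(rates, u_unconstrained)
]
w0 = 1.0 - sum(w_lo + w_hi for _, _, w_lo, w_hi in axes)
lowest_points = [lam - u * math.sqrt(lam) for lam, (u, _, _, _) in zip(rates, axes)]
```

Under these assumptions the lowest sigma point on each axis stays positive, while the central weight w0 becomes slightly negative, matching the sign reported in Example 2.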
Fig. 3.
(a) Locations of sigma points for the unconstrained (Algorithm 1), truncated, and constrained (Algorithm 2) sigma points. (b) Mean and covariance of the unconstrained (Algorithm 1), truncated, and constrained (Algorithm 2) sigma points.
VI. Propagation of Means and Covariances of Nonlinear Transformations
We analyze the performance of our new sigma points when they undergo nonlinear transformations. We will show how linearization approximations, via Taylor series expansion of a nonlinear transformation of a random vector x evaluated about its mean x̄, introduce errors in the propagation of means and covariances. In Appendix A, we evaluated the true mean and true covariance of a random vector, as well as the approximated mean and approximated covariance. We see that although errors are introduced beyond the second order when approximating a nonlinear transformation of a random vector, these errors are minimized because of our ability to match the diagonal components of the skewness and kurtosis tensors. We also see that errors are introduced only beyond the third order when the random vector is independent.
We will see that errors can be introduced in the propagation of means and covariances beyond the second order when sigma points developed under the Gaussian assumption [9], [12], [36] are used to approximate the nonlinear function λ(x) when x is an independent random vector. We note that the nonlinear transformation is given by
$$y = \lambda(x) \tag{45}$$

where x ∈ ℝⁿ. Given x̄, P, Ŝ, and K̂, we evaluate the sample mean and covariance of the nonlinear transformation of (45) using Algorithm 1.
For our comparison, we use the scaled unscented transform of [36], which is denoted as UT for the remainder of this paper, and the higher order sigma point unscented transform (HOSPUT) of [28]. The scaling of the UT was selected to match a Gaussian distribution. We do not compare against the sigma points in [29]–[32] because they either use a Gaussian assumption, a closed skew normal distribution, or more than 2n+1 sigma points. The sample mean and sample covariance can be evaluated using (6)–(8).
A. Case Study 1 – Transformation of Random Variables
Defining x as a random variable that can follow any of the probability distributions given in Table I, we evaluate the sample mean and covariance of two nonlinear transformations: a quadratic function of the random variable, y = 3x + 2x², and a trigonometric function of the random variable, y = sin(x). We also use 10⁵ Monte Carlo draws from the different probability distributions. The true mean and covariance of the quadratic function can be easily evaluated using the raw moments of x up to its fourth order. The true mean and covariance of the trigonometric function can be evaluated using their characteristic functions. A comparison between the accuracy of the GenUT, UT, 10⁵ Monte Carlo draws, and HOSPUT in approximating the true mean and true covariance of the nonlinear transformations for the different probability distributions is shown in Tables II–V.
TABLE II.
Percentage error in propagating the mean of y = 3x + 2x^2
x | GenUT | UT | MC | HOSPUT |
---|---|---|---|---|
𝒩(1, 4) | 0 | 0 | 0.015 | 0 |
E(2) | 0 | 0 | 0.069 | 0 |
G(1, 2) | 0 | 0 | 0.452 | 0 |
W(1, 2) | 0 | 0 | 0.005 | 0 |
R(1) | 0 | 0 | 0.097 | 0 |
BE(3, 4) | 0 | 0 | 0.063 | 0 |
B(3, 0.3) | 0 | 0 | 0.457 | 0 |
P(2) | 0 | 0 | 0.270 | 0 |
GE(0.5) | 0 | 0 | 1.251 | 0 |
NB(4, 0.67) | 0 | 0 | 0.668 | 0 |
TABLE V.
Percentage error in propagating the covariance of y = sin(x)
x | GenUT | UT | MC | HOSPUT |
---|---|---|---|---|
𝒩(1.57, 0.1) | 5.026 | 5.026 | 0.444 | 5.026 |
E(2) | 23.499 | 72.557 | 0.213 | 23.499 |
G(0.5, 0.5) | 20.749 | 61.391 | 0.372 | 20.749 |
W(1, 2) | 4.862 | 31.760 | 0.043 | 4.862 |
R(1) | 12.158 | 50.678 | 0.531 | 12.158 |
BE(3,4) | 0.031 | 0.940 | 0.225 | 0.031 |
B(3, 0.3) | 11.033 | 24.806 | 0.060 | 11.033 |
P(0.1) | 6.646 | 45.895 | 0.461 | 6.646 |
GE(0.7) | 12.074 | 87.637 | 0.070 | 12.074 |
NB(0.4, 0.67) | 39.068 | 135.783 | 0.366 | 39.068 |
For the quadratic function, both the GenUT and the HOSPUT reproduced the true mean and true covariance exactly for all the probability distributions. The UT reproduced the true mean for every distribution but reproduced the true covariance only when the distribution was Gaussian. This is because the GenUT and HOSPUT are accurate up to the fourth-order moments when the random variable x has a dimension of 1. Although the 10^5 Monte Carlo draws gave relatively good approximations, they were not as accurate as the GenUT.
For the trigonometric function, we see that the GenUT, HOSPUT, and UT were unable to give exact approximations of the true mean and true covariance in most cases because the Taylor series expansion of λ(x) has terms beyond the fourth order. The GenUT and HOSPUT were more accurate than the UT for all the non-Gaussian probability distributions because they are both accurate up to the fourth order while the UT is accurate up to the second order. The 10^5 Monte Carlo draws sometimes gave better accuracy than the GenUT because of the random nature of the draws. A box plot of the accuracy of the GenUT, UT, and several Monte Carlo draws of different sizes is shown in Fig. 4 for the trigonometric function. We do not include the HOSPUT because it gives the same performance as the GenUT when a single random variable is transformed. We see that a significant number of Monte Carlo draws is needed to achieve the accuracy of the GenUT when approximating the mean. For the variance, a sufficiently large number of Monte Carlo draws gives better accuracy than the GenUT.
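The behavior described above for a single random variable can be sketched with a three-point rule that matches the mean, variance, standardized skewness, and standardized kurtosis of x. The construction below is derived directly from those four moment conditions for illustration; it is not a reproduction of Algorithm 1, and the function names are hypothetical. The example again assumes that E(2) denotes an exponential distribution with rate 2 (mean 0.5, variance 0.25, skewness 2, kurtosis 9). Setting the skewness to 0 and the kurtosis to 3 gives a Gaussian-matched rule for comparison.

```python
import math


def three_point_rule(mean, var, skew, kurt):
    """Three weighted points matching mean, variance, skewness, and kurtosis.

    skew and kurt are standardized moments; requires kurt >= skew**2 + 1.
    """
    sigma = math.sqrt(var)
    u = 0.5 * (-skew + math.sqrt(4.0 * kurt - 3.0 * skew ** 2))
    v = u + skew
    w2 = 1.0 / (v * (u + v))   # weight of the point above the mean
    w1 = w2 * v / u            # weight of the point below the mean
    w0 = 1.0 - w1 - w2         # weight of the point at the mean
    return [mean, mean - u * sigma, mean + v * sigma], [w0, w1, w2]


def propagate(func, points, weights):
    """Weighted sample mean and variance of func over the points."""
    ys = [func(x) for x in points]
    mean_y = sum(w * y for w, y in zip(weights, ys))
    var_y = sum(w * (y - mean_y) ** 2 for w, y in zip(weights, ys))
    return mean_y, var_y


def pct_error(true, approx):
    return 100.0 * abs(true - approx) / abs(true)


def quadratic(x):
    return 3.0 * x + 2.0 * x * x


# Assumed example: exponential with rate 2.
matched = three_point_rule(0.5, 0.25, 2.0, 9.0)    # uses the true skewness/kurtosis
gaussian = three_point_rule(0.5, 0.25, 0.0, 3.0)   # Gaussian-matched rule

# Closed-form references for this example: Var[3x + 2x^2] = 13.25, E[sin x] = 0.4.
print(pct_error(13.25, propagate(quadratic, *matched)[1]))   # approximately 0
print(pct_error(13.25, propagate(quadratic, *gaussian)[1]))  # approximately 49.057
print(pct_error(0.4, propagate(math.sin, *matched)[0]))      # approximately 0.219
print(pct_error(0.4, propagate(math.sin, *gaussian)[0]))     # approximately 5.788
```

Under these assumptions the printed errors agree with the E(2) rows of Tables III and IV, which is consistent with the fourth-order accuracy argument for n = 1.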
Fig. 4.
(a) Moments of y = sin(x) when x is a Poisson random variable. (b) Moments of y = sin(x) when x is a Weibull random variable.
B. Case Study 2 – Transformation of a Random Vector
We examine the performance of the GenUT, HOSPUT, and UT in approximating the true mean and covariance of a nonlinear transformation of different random variables such that
(46) |
We calculate the true mean and true covariance of y using 10^7 Monte Carlo draws. The percentage error in approximating each element of the mean is
The percentage error in approximating each element of the covariance matrix is
We see that for the nonlinear transformation, the GenUT gave the lowest percentage error when approximating the elements of the mean and covariance matrix. The UT gave the worst performance because it was unable to account for the non-Gaussian distributed nature of the random variable x. The HOSPUT performed worse than the GenUT because, when the problem dimension exceeds 1, it is only able to match the average values of the diagonal elements of the skewness and kurtosis tensors.
C. Case Study 3 – Infectious Disease Models
We consider an SIR (susceptible-infectious-recovered) infectious disease model given by the difference equation [37]
(47) |
where β is the infection rate, γ is the recovery rate, and N = S_k + I_k + R_k. Using the conservation principle S + I + R = N, we reduce the model of (47) to
We note that by defining x = Poisson[I_k R_k]^T, we can rewrite the above equation as
(48) |
where x_i is the ith element of the vector x.
We examine the performance of the GenUT, HOSPUT, and UT in approximating the true mean and covariance of (48). We use the parameters I_k = 10, R_k = 2, β = 1.5, γ = 0.3, and N = 100. The percentage error in approximating each element of the mean is
The percentage error in approximating each element of the covariance matrix is
We see that the GenUT gave the smallest error in approximating the true covariance matrix. The GenUT cannot exactly match the true covariance matrix because it captures only the diagonal components of the skewness and kurtosis tensors.
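To illustrate why a first-order treatment of the mean is biased for a model of this kind, the sketch below compares the map evaluated at the mean of x with the exact mean under Poisson inputs. It assumes the reduced update has the standard discrete-time SIR form, I_{k+1} = I_k + β(N − I_k − R_k)I_k/N − γI_k and R_{k+1} = R_k + γI_k, and that the two elements of x are independent Poisson variables with means I_k and R_k. Both are assumptions made for illustration, and the function names are hypothetical.

```python
def sir_reduced(i, r, beta=1.5, gamma=0.3, n=100.0):
    """One step of the assumed reduced SIR map, with S = N - I - R eliminated."""
    s = n - i - r
    i_next = i + beta * s * i / n - gamma * i
    r_next = r + gamma * i
    return i_next, r_next


def poisson_mean_of_step(i_mean, r_mean, beta=1.5, gamma=0.3, n=100.0):
    """Exact mean of sir_reduced when I and R are independent Poisson variables."""
    e_i2 = i_mean + i_mean ** 2   # Poisson: E[I^2] = variance + mean^2
    e_ir = i_mean * r_mean        # independence: E[I R] = E[I] E[R]
    i_next = i_mean + beta * (n * i_mean - e_i2 - e_ir) / n - gamma * i_mean
    r_next = r_mean + gamma * i_mean
    return i_next, r_next


# Parameters from the case study: I_k = 10, R_k = 2, beta = 1.5, gamma = 0.3, N = 100.
first_order = sir_reduced(10.0, 2.0)     # map evaluated at the mean of x
exact = poisson_mean_of_step(10.0, 2.0)  # exact mean under the Poisson assumption
print(first_order)  # approximately (20.2, 5.0)
print(exact)        # approximately (20.05, 5.0)
```

Because this assumed map is quadratic in x, the gap in the first element comes entirely from the variance of I_k, which any sigma-point set matching the covariance recovers. Differences between the transforms would then arise in the propagated covariance, which depends on third- and fourth-order moments.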
VII. Conclusion
In this paper we have developed the generalized unscented transform (GenUT) that is capable of adapting to the unique statistics of an arbitrarily distributed random variable. We showed that due to its ability to match the diagonal elements of the skewness and kurtosis tensors of most random vectors using 2n+1 sigma points, the GenUT is preferable to and more accurate than unscented transforms that were either developed using the Gaussian assumption or were developed without any probability distribution in mind.
In terms of ease of implementation, we demonstrated that the GenUT uses the same number of sigma points, 2n + 1, as the unscented transform originally developed in [12]. When compared against unscented transforms that employ more than 2n + 1 sigma points, the GenUT has a lower computational cost because its number of sigma points scales linearly with the problem dimension.
In terms of performance, the GenUT and unscented transforms that use 2n + 1 sigma points developed under the Gaussian assumption give the same performance when the random variable is Gaussian distributed. However, when the random variable or random vector is not Gaussian distributed, the GenUT gives better accuracy in the propagation of means and covariances. We also showed that the GenUT formulation makes it easy to analytically enforce constraints on the sigma points while still guaranteeing at least second-order accuracy, which makes it appealing in models that permit only constrained values for random variables or parameters.
For uncertainty quantification, estimation, or prediction applications, the GenUT gives the highest accuracy attainable with 2n+1 sigma points when compared to existing unscented transforms. This accuracy will have more significant consequences if the nonlinearities are strong and the problem dimension is large. The GenUT can be applied to any filter that uses linear or nonlinear transformations of random variables. The MATLAB® source code used to generate the results in this paper is available at [38].
TABLE III.
Percentage error in propagating the covariance of y = 3x + 2x^2
x | GenUT | UT | MC | HOSPUT |
---|---|---|---|---|
𝒩(1, 4) | 0 | 0 | 0.029 | 0 |
E(2) | 0 | 49.057 | 0.249 | 0 |
G(1, 2) | 0 | 64 | 1.889 | 0 |
W(1, 2) | 0 | 15.003 | 0.310 | 0 |
R(1) | 0 | 16.815 | 0.381 | 0 |
BE(3, 4) | 0 | 2.307 | 0.613 | 0 |
B(3, 0.3) | 0 | 16.380 | 0.359 | 0 |
P(2) | 0 | 25.946 | 1.061 | 0 |
GE(0.5) | 0 | 67.662 | 1.036 | 0 |
NB(4, 0.67) | 0 | 43.224 | 2.356 | 0 |
TABLE IV.
Percentage error in propagating the mean of y = sin(x)
x | GenUT | UT | MC | HOSPUT |
---|---|---|---|---|
𝒩(1.57, 0.1) | 0.001 | 0.001 | 0.012 | 0.001 |
E(2) | 0.219 | 5.788 | 0.110 | 0.219 |
G(0.5, 0.5) | 0.312 | 6.964 | 0.050 | 0.312 |
W(1, 2) | 0.017 | 0.831 | 0.029 | 0.017 |
R(1) | 0.049 | 0.912 | 0.007 | 0.049 |
BE(3, 4) | 0 | 0.038 | 0.037 | 0 |
B(3, 0.3) | 0.158 | 4.814 | 0.046 | 0.158 |
P(0.1) | 0.275 | 18.305 | 0.531 | 0.275 |
GE(0.7) | 2.416 | 32.906 | 0.138 | 2.416 |
NB(0.4, 0.67) | 0.176 | 44.172 | 0.383 | 0.176 |
Acknowledgments
This work was supported by NIH Director’s Transformative Award No. 1R01AI145057 and by National Science Foundation grants DMS-1723175, DMS-1854204, and DMS-2006808.
Appendix A. True Mean and Covariance of Nonlinear Transformations
We derive analytical expressions for the true mean and covariance when we take the Taylor series expansion of the nonlinear function y = λ(x) where x is a random vector.
A. True Mean of the Nonlinear Transformation
Applying Taylor series expansion around , where , we write the true mean of y as
(49) |
where is the total differential of λ(x) when perturbed around a nominal value by . We note that
(50) |
Using (50), we can evaluate the true mean of (49) as
(51) |
where , , and .
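For intuition, the scalar case (n = 1) of this expansion can be written out explicitly. The expression below is the standard Taylor-series identity rather than a reproduction of (51), and the symbols μ, σ², S, and K (the mean, variance, and third and fourth central moments of x) are notation assumed here for illustration.

```latex
\begin{align}
\mathrm{E}[\lambda(x)]
  &= \lambda(\mu)
   + \frac{1}{2}\,\lambda''(\mu)\,\sigma^{2}
   + \frac{1}{6}\,\lambda'''(\mu)\,S
   + \frac{1}{24}\,\lambda^{(4)}(\mu)\,K
   + \cdots ,
\\
\sigma^{2} &= \mathrm{E}\big[(x-\mu)^{2}\big], \quad
S = \mathrm{E}\big[(x-\mu)^{3}\big], \quad
K = \mathrm{E}\big[(x-\mu)^{4}\big].
\end{align}
```

The first-order term vanishes because E[x − μ] = 0. In this scalar setting, a sigma-point set that reproduces σ², S, and K reproduces every term shown, so the first discrepancy in the mean appears at the fifth order.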
B. True Covariance of the Nonlinear Transformation
The true covariance of y is given as
(52) |
Evaluating the expression , we write
(53) |
Substituting (53) into (52) gives
(54) |
We note that we can write the first term in the above equation as
(55) |
Using (50) and (55), we can rewrite the true covariance matrix of (54) as
(56) |
where we have used the notation xx^T = x[⋯]^T.
Appendix B. Approximation of Means and Covariances using the Generalized Unscented Transform
We analytically show the accuracy in capturing the true mean and true covariance of y = λ(x) when using our 2n+1 sigma points. We also show that our sigma point transformations give improved accuracy by capturing the diagonal components of the skewness and kurtosis tensors. We define while recalling that C = PP^T. We note that
(57) |
A. Approximation of the Mean
The approximated mean is given as
Using (57), we can evaluate the above equation as
(58) |
where , and .
In Section IV, we already showed that we can accurately capture the diagonal components of the skewness and kurtosis tensors because whenever j = k = l and whenever j = k = l = m. Therefore, by comparing (58) with the true mean of (51), we can see that our sigma points improve the accuracy of propagating the mean of a nonlinear transformation.
B. Approximation of the Covariance
The approximated covariance can be evaluated using the expression
(59) |
From
(60) |
Substituting (60) into (59) and multiplying out gives
(61) |
For the first term in (61),
(62) |
Using (57) and (62), we can rewrite the approximated covariance matrix of (61) as
(63) |
Comparing (63) with the true covariance of (56), we can see that our sigma points improve the accuracy of propagating the covariance of a nonlinear transformation because we are able to accurately capture the diagonal components of the skewness and kurtosis tensors.
Footnotes
Bold fonts are used to represent vectors, matrices, and tensors.
The notation P[i] represents the ith column of the matrix P, Pij represents the ith entry in the jth column of the matrix P, and xi represents the ith entry of the vector x.
Contributor Information
Donald Ebeigbe, Center for Neural Engineering, Department of Engineering Science and Mechanics, Pennsylvania State University, University Park, PA, USA.
Tyrus Berry, Department of Mathematical Sciences, George Mason University, Fairfax, VA, USA.
Michael M. Norton, Center for Neural Engineering, Department of Engineering Science and Mechanics, Pennsylvania State University, University Park, PA, USA.
Andrew J. Whalen, Center for Neural Engineering, Department of Engineering Science and Mechanics, Pennsylvania State University, University Park, PA, USA; Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
Dan Simon, Department of Electrical Engineering and Computer Science, Cleveland State University, Cleveland, OH, USA.
Timothy Sauer, Department of Mathematical Sciences, George Mason University, Fairfax, VA, USA.
Steven J. Schiff, Center for Neural Engineering and Center for Infectious Disease Dynamics, Departments of Engineering Science and Mechanics, Neurosurgery, and Physics, Pennsylvania State University, University Park, PA, USA.
References
- [1] Simon D., Optimal state estimation: Kalman, H infinity, and nonlinear approaches. John Wiley & Sons, 2006.
- [2] Kandepu R., Imsland L., and Foss B. A., “Constrained state estimation using the unscented Kalman filter,” in 16th Mediterranean Conference on Control and Automation, 2008, pp. 1453–1458.
- [3] Izanloo R., Fakoorian S. A., Yazdi H. S., and Simon D., “Kalman filtering based on the maximum correntropy criterion in the presence of non-Gaussian noise,” in 50th Annual Conference on Information Science and Systems (CISS), 2016, pp. 530–535.
- [4] Gustafsson F. and Hendeby G., “Some relations between extended and unscented Kalman filters,” IEEE Transactions on Signal Processing, vol. 60, no. 2, pp. 545–555, 2011.
- [5] Evensen G., “Sequential data assimilation with a nonlinear quasigeostrophic model using Monte Carlo methods to forecast error statistics,” Journal of Geophysical Research: Oceans, vol. 99, no. C5, pp. 10143–10162, 1994.
- [6] Houtekamer P. L. and Mitchell H. L., “Data assimilation using an ensemble Kalman filter technique,” Monthly Weather Review, vol. 126, no. 3, pp. 796–811, 1998.
- [7] Anderson J. L., “An ensemble adjustment Kalman filter for data assimilation,” Monthly Weather Review, vol. 129, no. 12, pp. 2884–2903, 2001.
- [8] Berry T. and Sauer T., “Adaptive ensemble Kalman filtering of non-linear systems,” Tellus A: Dynamic Meteorology and Oceanography, vol. 65, no. 1, p. 20331, 2013.
- [9] Julier S. J. and Uhlmann J. K., “A general method for approximating nonlinear transformations of probability distributions,” Robotics Research Group, University of Oxford, Tech. Rep., 1996.
- [10] Julier S. J. and Uhlmann J. K., “Reduced sigma point filters for the propagation of means and covariances through nonlinear transformations,” in Proceedings of the American Control Conference, 2002, pp. 887–892.
- [11] Kitagawa G., “Monte Carlo filter and smoother for non-Gaussian nonlinear state space models,” Journal of Computational and Graphical Statistics, vol. 5, no. 1, pp. 1–25, 1996.
- [12] Julier S. J. and Uhlmann J. K., “Consistent debiased method for converting between polar and Cartesian coordinate systems,” in Acquisition, Tracking, and Pointing XI, vol. 3086, 1997, pp. 110–121.
- [13] Rui Y. and Chen Y., “Better proposal distributions: Object tracking using unscented particle filter,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2001, pp. 786–793.
- [14] Van Der Merwe R., Doucet A., De Freitas N., and Wan E. A., “The unscented particle filter,” in Advances in Neural Information Processing Systems, 2001, pp. 584–590.
- [15] Luo X. and Moroz I. M., “Ensemble Kalman filter with the unscented transform,” Physica D: Nonlinear Phenomena, vol. 238, no. 5, pp. 549–562, 2009.
- [16] Simon D., “Kalman filtering with state constraints: a survey of linear and nonlinear algorithms,” IET Control Theory & Applications, vol. 4, no. 8, pp. 1303–1318, 2010.
- [17] Cheng Y. and Liu Z., “Optimized selection of sigma points in the unscented Kalman filter,” in International Conference on Electrical and Control Engineering. IEEE, 2011, pp. 3073–3075.
- [18] Simons E., Ferrari M., Fricks J., Wannemuehler K., Anand A., Burton A., and Strebel P., “Assessment of the 2010 global measles mortality reduction goal: results from a model of surveillance data,” The Lancet, vol. 379, no. 9832, pp. 2173–2178, 2012.
- [19] Chen S., Fricks J., and Ferrari M. J., “Tracking measles infection through non-linear state space models,” Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 61, no. 1, pp. 117–134, 2012.
- [20] Bretó C., He D., Ionides E. L., King A. A. et al., “Time series analysis via mechanistic models,” The Annals of Applied Statistics, vol. 3, no. 1, pp. 319–348, 2009.
- [21] Shaman J. and Karspeck A., “Forecasting seasonal outbreaks of influenza,” Proceedings of the National Academy of Sciences, vol. 109, no. 50, pp. 20425–20430, 2012.
- [22] Yamana T. K., Kandula S., and Shaman J., “Superensemble forecasts of dengue outbreaks,” Journal of The Royal Society Interface, vol. 13, no. 123, p. 20160410, 2016.
- [23] Ndanguza D., Mbalawata I. S., Haario H., and Tchuenche J. M., “Analysis of bias in an Ebola epidemic model by extended Kalman filter approach,” Mathematics and Computers in Simulation, vol. 142, pp. 113–129, 2017.
- [24] Cazelles B. and Chau N. P., “Using the Kalman filter and dynamic models to assess the changing HIV/AIDS epidemic,” Mathematical Biosciences, vol. 140, no. 2, pp. 131–154, 1997.
- [25] Ebeigbe D., Berry T., Schiff S. J., and Sauer T., “A Poisson Kalman filter to control the dynamics of neonatal sepsis and postinfectious hydrocephalus,” Physical Review Research, vol. 2, no. 4, 2020.
- [26] Li R., Pei S., Chen B., Song Y., Zhang T., Yang W., and Shaman J., “Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2),” Science, vol. 368, no. 6490, pp. 489–493, 2020.
- [27] Curtis L. J., “Simple formula for the distortions in a Gaussian representation of a Poisson distribution,” American Journal of Physics, vol. 43, no. 12, pp. 1101–1103, 1975.
- [28] Ponomareva K., Date P., and Wang Z., “A new unscented Kalman filter with higher order moment-matching,” Proceedings of Mathematical Theory of Networks and Systems (MTNS 2010), Budapest, 2010.
- [29] Straka O., Duník J., Šimandl M., and Blasch E., “Randomized unscented transform in state estimation of non-Gaussian systems: Algorithms and performance,” in 2012 15th International Conference on Information Fusion. IEEE, 2012, pp. 2004–2011.
- [30] Rezaie J. and Eidsvik J., “A skewed unscented Kalman filter,” International Journal of Control, vol. 89, no. 12, pp. 2572–2583, 2016.
- [31] Hou J., Zhou W., Zhang W.-A., Zhang C., Chen C., and Shan C., “High-order unscented transformation based on the Bayesian learning for nonlinear systems with non-Gaussian noises,” in 2019 15th International Conference on Computational Intelligence and Security (CIS). IEEE, 2019, pp. 26–30.
- [32] Easley D. C. and Berry T., “A higher order unscented transform,” SIAM/ASA Journal on Uncertainty Quantification, vol. 9, no. 3, pp. 1094–1131, 2021.
- [33] Menegaz H. M., Ishihara J. Y., Borges G. A., and Vargas A. N., “A systematization of the unscented Kalman filter theory,” IEEE Transactions on Automatic Control, vol. 60, no. 10, pp. 2583–2598, 2015.
- [34] Pearson K., “Mathematical contributions to the theory of evolution.—XIX. Second supplement to a memoir on skew variation,” Philosophical Transactions of the Royal Society of London, Series A, vol. 216, no. 538–548, pp. 429–457, 1916.
- [35] Papoulis A. and Pillai S. U., Probability, random variables, and stochastic processes, 4th ed. Tata McGraw-Hill Education, 2002.
- [36] Julier S. J. and Uhlmann J. K., “Unscented filtering and nonlinear estimation,” Proceedings of the IEEE, vol. 92, no. 3, pp. 401–422, 2004.
- [37] Keeling M. J. and Rohani P., Modeling infectious diseases in humans and animals. Princeton University Press, 2011.
- [38] “Generalized unscented transform MATLAB® source code,” https://github.com/Schiff-Lab/Generalized-Unscented-Transform, Accessed: 11-11-2021.