Abstract
The unscented Kalman filter (UKF) is finding increased application in biological fields. While realizing a complex UKF system in a low-power embedded platform offers many potential benefits including wearability, it also poses significant design challenges. Here we present a method for optimizing a UKF system for realization in an embedded platform. The method seeks to minimize both computation time and error in UKF state reconstruction and forecasting. As a case study, we applied the method to a model for the rat sleep-wake regulatory system in which 432 variants of the UKF over six different variables are considered. The optimization method is divided into three stages that assess computation time, state forecast error, and state reconstruction error. We apply a cost function to variants that pass all three stages to identify a variant that computes 27 times faster than the reference variant and maintains required levels of state estimation and forecasting accuracy. We draw the following insights: 1) process noise provides leeway for simplifying the model and its integration in ways that speed computation time while maintaining state forecasting accuracy, 2) the assimilation of observed data during the UKF correction step provides leeway for simplifying the UKF structure in ways that speed computation time while maintaining state reconstruction accuracy, and 3) the optimization process can be accelerated by decoupling variables that directly impact the underlying model from variables that impact the UKF structure.
Keywords: Unscented Kalman filter, Embedded system, Optimization, State estimation, Forecasting, Microcontroller, FPGA, Sleep-wake regulatory system
1. Introduction
Nonlinear Kalman filtering presents exciting possibilities for research in dynamical systems as applied to biology. Among nonlinear Kalman filters, the unscented Kalman filter (UKF) has gained widespread use due to its excellent performance and its deterministic, gradient-free implementation. Through the UKF, physiological states that are difficult or costly to measure can be estimated by the assimilation of more easily acquired or less costly observables Bahari, Tulyaganova, Billard, Alloway and Gluckman (2017). Model states and parameters can be tracked to gain insight into physiological phenomena. And future state forecasts can be made to guide intervention.
Two early studies that demonstrated the utility of the UKF in biological systems employed computer simulations. Voss et al. showed that a single state of model-generated data with added noise could be used to reconstruct unmeasured states in a FitzHugh-Nagumo neuron model Voss, Timmer and Kurths (2004). In Schiff and Sauer (2008), UKF state and parameter estimates were used in closed-loop control of potential waves in a computational model of cerebral cortex.
More recent studies have used the UKF to assimilate in vivo or in vitro data in physiological models. In Saatci and Akan (2009), patient data were used to estimate states and parameters in models of the human respiratory system. In Ullah and Schiff (2010), in vitro neuron membrane potential recordings were assimilated in a model of a small network of Hodgkin-Huxley-type neurons. Other studies have applied the UKF to estimate parameters in the Cad system in E. coli Meskin, Nounou, Nounou and Datta (2013), track anesthetic brain states using patient EEG Kuhlmann, Freestone, Manton, Heyse, Vereecke, Lipping, Struys and Liley (2016), and forecast blood glucose levels for diabetic patients using individually-tailored models Albers, Levine, Gluckman, Ginsberg, Hripcsak and Mamykina (2017). Bahari et al. employed a UKF to predict sleep-state transitions using in vivo neural recordings from chronically implanted animals Bahari, Kimbugwe, Alloway and Gluckman (2021).
Deploying a UKF or related nonlinear Kalman filter (KF) in an embedded system provides numerous advantages over traditional computational platforms in the context of biological systems. The use of embedded hardware dedicated to the specific application allows for tight control over task scheduling, minimizes feedback latency in closed-loop applications, and avoids interruptions in communication that can occur with more complex operating systems. Embedded systems can often be converted into a wearable form. Wearability aids in the constant monitoring of biological signals, increases freedom of movement for the subject or animal, and typically simplifies use and maintenance, which may increase the likelihood of long-term user adoption. For biomedical devices used in translational animal experiments, wearability may ease the path from animal trials to human trials. The embedded deployment of UKFs is also relevant in certain non-biological contexts, such as navigation systems in small unmanned aerial vehicles (UAVs).
However, realization of a linear or nonlinear KF in an embedded system can prove challenging. Embedded processors have limited computational speed compared to their non-embedded counterparts and typically have only one core. In wireless embedded systems, power constraints related to battery size or wireless power transfer may further limit computation speed. In a closed-loop application, the required rate of feedback delivered to the system sets a real-time processing deadline. In the interval between subsequent instances of feedback, the CPU must perform any communication tasks, integrate the model equations forward in time, and evaluate the UKF equations, which include matrix operations. The feedback signal may depend upon the predicted system state at some forecast horizon time beyond the next scheduled feedback. This forecasting extends the time required to integrate the model equations. In short, the limitations of the embedded platform and computational demands of the UKF can combine to render the meeting of the real-time requirement impossible for even a relatively small state-space dimension.
Several efforts have been made to deploy linear and unscented KFs in embedded systems. In Soh and Wu (2012), a UKF for nanosatellite attitude determination was implemented on a field-programmable gate array (FPGA). The same authors also designed an FPGA-microcontroller (MCU) system that included an FPGA block that performed the generic UKF computations and a flexible software implementation of the model-specific computations Soh and Wu (2014, 2017). In Fico, Arribas, Soaje, Prats, Utrera, Vazquez and Casquet (2015), the authors implemented a UKF for a navigation system on an MCU and minimized execution time by utilizing processor-specific matrix libraries, replacing calls to functions with in-line code, changing constant divisions to multiplications by the constants’ inverses, and testing different MCUs. A systematic approach for reducing computation time in embedded systems utilizing the KF, UKF, and the extended KF was outlined in Valade, Acco, Grabolosa and Fourniols (2017). In Zhu, Jiang, Chen, Hu and Wang (2011), a KF was implemented on an FPGA for decoding rat paw movement from neural spike data. While each of these works presents a sufficient system for real-time usage and a subset of them reduce computation time by exploring the design space, none of them provide a comprehensive consideration of potential optimization parameters for reducing a UKF system for embedded deployment or a method for actual optimization through minimization of cost functions.
Here, we present a method for optimizing the design of a UKF system for deployment on a given MCU or soft processor in FPGA fabric. The method is generalizable to any model. The optimization searches over a large set of variants in both the formulation of the underlying model and the structure of the UKF itself. Furthermore, we apply a multi-objective cost function that accounts for the competing objectives of minimizing computation time and power consumption and minimizing error in UKF estimation and forecasting. To demonstrate the optimization method, we present a case study using data generated from a model of the sleep-wake regulatory network Diniz Behn and Booth (2010). We leave iteration of the method across computing platforms to the reader.
In addition to offering a generalized approach to fitting a UKF on an embedded platform, we highlight three main insights from our work: 1) The magnitude of the model process noise can be leveraged to justify using simpler, less accurate integration schemes and model approximations. 2) The constraining influence of the UKF data assimilation step, or measurement correction, may permit the use of a computationally simpler UKF structure without significantly sacrificing accuracy in state estimation. 3) And the UKF optimization can be accelerated by thoughtfully ordering the optimization stages and separating the swept variables into two classes—model variables and UKF structure variables. By eliminating unsuitable combinations of model variables through forecast integration tests, the overall optimization space can be reduced before more time-intensive UKF state reconstruction tests are performed.
In the “Materials and methods” section, we review the UKF formalism to establish notation and draw attention to the aspects of the UKF that undergo variation in the optimization. We also introduce the Diniz Behn and Booth (DBB) model of the rat sleep-wake regulatory network, which we use for our case study. The target MCU platform and a proxy computation platform used to compute the bulk of the optimization data are also described. We open the “Results” section with a description of the variable space over which the optimization is run. We then detail the optimization method stage by stage and weave in discussion of the case study results in order to illustrate the method. In the “Conclusion” we summarize the method, highlight our insights, and propose future work.
2. Materials and methods
Our case study scenario for the UKF optimization method is a real-time in vivo sleep-wake regulation experiment in rats, in which feedback to the animal is determined by 10-second forecasts of future state. Our computational target is a system-on-chip/FPGA (SoC-FPGA) combining an MCU and FPGA fabric. We aim to deploy the UKF on the MCU only, and we reserve the FPGA fabric for signal processing of multiple input channels. Here, we review relevant computational details of the UKF, the DBB model of sleep-wake regulation that provides the kernel for the UKF, and details of the target and proxy computing platforms employed in the optimization.
2.1. UKF formalism
Sigma point filters are a family of adaptations of the linear Kalman filter for nonlinear systems. In sigma point filters, the distribution of state is represented through a set of deterministically chosen “sigma points.” Each sigma point is propagated forward in time through the nonlinear dynamics to the next scheduled measurement of data. The forward-iterated sigma points are then used to form a linearization of the dynamics through which the measured data are assimilated to the estimated distribution. We use the term “unscented Kalman filter” to refer to any of a number of sigma point filters that employ some version of the unscented transform for the selection of sigma points Julier and Uhlmann (1997); Julier, Uhlmann and Durrant-Whyte (2000); van der Merwe and Wan (2001); Julier and Uhlmann (2002); Julier (2003). We depict the UKF graphically in Fig 1A. The formalism presented here follows that in Simon (2006) and Sedigh-Sarvestani, Schiff and Gluckman (2012).
Figure 1:

UKF block diagram and Diniz Behn and Booth (DBB) model output. (A) The unscented Kalman filter consists of a repeating cycle of two phases: 1) prediction of state distribution, in which representative sigma points Xi are generated and propagated through the nonlinear model equations, and 2) correction, in which observed data y are assimilated to the estimated distribution. The sigma points can be integrated beyond the next scheduled observation to produce a forecast of state distribution. (B) Example DBB model states, including mean population firing rates for the wake-active LC, NREM-active VLPO, and REM-active and wake-REM-active subpopulations of the LDT/PPT. Normalized homeostatic sleep pressure h, normalized noisy input δ, and derived hypnogram are shown in the bottom two panels.
The discrete-time equation
$$x_{k+1} = f(x_k, u_k, t_k) + v_k \tag{1}$$
models the dynamics of an n-dimensional system, where uk is system input, tk is the discretized time at step k, and vk is the process noise. The measurement of an m-dimensional variable y is described by the linear or nonlinear observation function
$$y_k = g(x_k) + w_k \tag{2}$$
where wk is the observation noise. The noise terms vk and wk are uncorrelated, zero-mean Gaussian vectors with covariance matrices Qk (n×n) and Rk (m×m), respectively. The process noise accounts for aspects of the true system that the model f(·) fails to capture, and the observation noise accounts for uncertainty in the measurement.
Like other Kalman filters, the UKF consists of alternating steps of prediction and correction to produce an estimate of state distribution with mean $\hat{x}_k$ and covariance $\hat{P}_{xx,k}$. At the start of every prediction-correction cycle, a set of sigma points Xi, i = 0, …, N, is chosen to represent the estimated state distribution by satisfying the constraints
$$\sum_{i=0}^{N} W_i X_{i,k} = \hat{x}_k \tag{3}$$
and
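$$\sum_{i=0}^{N} W_i \left(X_{i,k} - \hat{x}_k\right)\left(X_{i,k} - \hat{x}_k\right)^T = \hat{P}_{xx,k} \tag{4}$$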
where Wi are scalar weights. In UKFs, the computation of the sigma points Xi,k requires a square root of the matrix $\hat{P}_{xx,k}$. In the prediction step, the sigma points are propagated through the nonlinear dynamics in Eq (1). The predicted mean and predicted covariance are found from the forward-iterated sigma points:
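$$\tilde{x}_{k+1} = \sum_{i=0}^{N} W_i \tilde{X}_{i,k+1} \tag{5}$$
$$\tilde{P}_{xx,k+1} = \sum_{i=0}^{N} W_i \left(\tilde{X}_{i,k+1} - \tilde{x}_{k+1}\right)\left(\tilde{X}_{i,k+1} - \tilde{x}_{k+1}\right)^T + Q_k \tag{6}$$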
where $\tilde{X}_{i,k+1} = f(X_{i,k}, u_k, t_k)$, and the tilde denotes predicted data that have not yet undergone the correction step.
Next, the forward-iterated sigma points are transformed into the observation space using Eq (2): $\tilde{Y}_{i,k+1} = g(\tilde{X}_{i,k+1})$. The $\tilde{Y}_{i,k+1}$'s are used to find the predicted mean and covariance transformed into the observation space:
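$$\tilde{y}_{k+1} = \sum_{i=0}^{N} W_i \tilde{Y}_{i,k+1} \tag{7}$$
$$\tilde{P}_{yy,k+1} = \sum_{i=0}^{N} W_i \left(\tilde{Y}_{i,k+1} - \tilde{y}_{k+1}\right)\left(\tilde{Y}_{i,k+1} - \tilde{y}_{k+1}\right)^T + R_{k+1} \tag{8}$$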
In the correction step, the Kalman gain is found:
$$K_{k+1} = \tilde{P}_{xy,k+1}\,\tilde{P}_{yy,k+1}^{-1} \tag{9}$$
where the cross-covariance is defined as
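$$\tilde{P}_{xy,k+1} = \sum_{i=0}^{N} W_i \left(\tilde{X}_{i,k+1} - \tilde{x}_{k+1}\right)\left(\tilde{Y}_{i,k+1} - \tilde{y}_{k+1}\right)^T \tag{10}$$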
The observed data yk+1 are then used to find the corrected mean,
$$\hat{x}_{k+1} = \tilde{x}_{k+1} + K_{k+1}\left(y_{k+1} - \tilde{y}_{k+1}\right) \tag{11}$$
and the corrected covariance is found from the Kalman gain and the observation covariance $\tilde{P}_{yy,k+1}$ or the cross-covariance $\tilde{P}_{xy,k+1}$:
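$$\hat{P}_{xx,k+1} = \tilde{P}_{xx,k+1} - K_{k+1}\,\tilde{P}_{yy,k+1}\,K_{k+1}^T \tag{12}$$
$$\hat{P}_{xx,k+1} = \tilde{P}_{xx,k+1} - K_{k+1}\,\tilde{P}_{xy,k+1}^T \tag{13}$$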
The hat indicates terms that have undergone the correction step. The corrected mean and covariance are then used to produce the sigma points for the next cycle, $X_{i,k+1}$.
Numerical integration is used to advance the sigma points from tk to tk+1 using the dynamics in Eq (1). However, the integrator can also be used to forecast the state well beyond the next anticipated observation time by integrating the sigma points to a given forecast horizon. Therefore, although the integrator is not an explicit part of the UKF equations, its implementation impacts estimation and forecast accuracy as well as computation time. In order to avoid confusion, we refer to the computation of $\tilde{x}_{k+1}$ and $\tilde{P}_{xx,k+1}$ as prediction and to the integration of the state equations beyond the next anticipated observation as forecasting.
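As an illustration of the cycle in Eqs (3) to (12), the following Python sketch runs one prediction-correction step for a hypothetical one-state, one-observable system (n = m = 1), where the matrix square root reduces to a scalar square root. It assumes 2n = 2 equally weighted sigma points and no sigma point regeneration. The function name `ukf_cycle_scalar`, the callables `f` and `g`, and all numerical values are placeholders chosen for illustration and are not taken from the case study.

```python
import math


def ukf_cycle_scalar(x_hat, p_hat, y_obs, f, g, q, r):
    """One UKF prediction-correction cycle for a scalar state and observable.

    Assumes n = 1 and 2n = 2 sigma points with equal weights W_i = 1/2.
    """
    # Sigma points chosen to satisfy the mean and covariance constraints, Eqs (3)-(4)
    spread = math.sqrt(p_hat)  # for n = 1 the matrix square root is a scalar sqrt
    sigma = [x_hat + spread, x_hat - spread]
    w = 0.5

    # Prediction: propagate the sigma points through the dynamics, Eqs (5)-(6)
    sigma_pred = [f(s) for s in sigma]
    x_pred = sum(w * s for s in sigma_pred)
    p_xx = sum(w * (s - x_pred) ** 2 for s in sigma_pred) + q

    # Transform into observation space, Eqs (7)-(8)
    y_sig = [g(s) for s in sigma_pred]
    y_pred = sum(w * y for y in y_sig)
    p_yy = sum(w * (y - y_pred) ** 2 for y in y_sig) + r

    # Cross-covariance and Kalman gain, Eqs (9)-(10)
    p_xy = sum(w * (s - x_pred) * (y - y_pred) for s, y in zip(sigma_pred, y_sig))
    gain = p_xy / p_yy

    # Correction of mean and covariance, Eqs (11)-(12)
    x_new = x_pred + gain * (y_obs - y_pred)
    p_new = p_xx - gain * p_yy * gain
    return x_new, p_new


# Hypothetical linear dynamics and direct observation of the state
x_new, p_new = ukf_cycle_scalar(
    x_hat=1.0, p_hat=0.04, y_obs=1.0,
    f=lambda x: 0.9 * x, g=lambda x: x,
    q=0.01, r=0.02,
)
print(x_new, p_new)
```

Any nonlinear callable can be substituted for `f` or `g`; in the full n-dimensional case the scalar operations become the matrix operations of Eqs (3) to (13).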
2.2. The Diniz Behn and Booth model
As a case study, we ran our optimization procedure on a UKF system for forecasting neural activity in a behaving rat. We used Diniz Behn and Booth’s model of the sleep-wake regulatory system Diniz Behn and Booth (2010) as the model in Eq (1). In the Diniz Behn and Booth (DBB) model, whose twelve states are listed in Table 1, the relative firing rates of five neural populations Fz indicate the animal’s state as wakefulness (W), rapid-eye-movement (REM) sleep, or non-REM (NREM) sleep. The five populations are the locus coeruleus (LC), dorsal raphe (DR), ventrolateral preoptic nucleus (VLPO), a REM-promoting subpopulation from the laterodorsal tegmentum (LDT) and pedunculopontine tegmentum (PPT), and a wake/REM-promoting subpopulation from LDT and PPT. Interactions between the neural populations are mediated by the normalized concentrations of the neurotransmitters Ci emitted by each of the five populations. The two other model states are the homeostatic sleep pressure variable h and the random, wake-inducing input to the LC and DR, called δ. Example outputs for select DBB model states are shown in Fig 1B.
Table 1.
States in the Diniz Behn and Booth model
| No. | State | Abbrev. |
|---|---|---|
| Population average firing rates | ||
| 1 | Locus coeruleus (LC) (wake active) | FLC |
| 2 | Dorsal raphe (DR) (wake active) | FDR |
| 3 | Ventrolateral preoptic nucleus (VLPO) (NREM active) | FVLPO |
| 4 | REM promoting subpopulations in the laterodorsal and pedunculopontine tegmenta (R) | FR |
| 5 | Wake/REM promoting subpopulations in the laterodorsal and pedunculopontine tegmenta (WR) | FWR |
| Normalized neurotransmitter concentrations | ||
| 6 | Noradrenaline | CN |
| 7 | Serotonin | CS |
| 8 | γ-aminobutyric acid (GABA) | CG |
| 9 | Acetylcholine from the R subpopulation | CAR |
| 10 | Acetylcholine from the WR subpopulation | CAWR |
| h and δ | ||
| 11 | Homeostatic sleep drive | h |
| 12 | Noisy input to the LC and DR | δ |
The five average firing rates are governed by the equations
$$\frac{dF_z}{dt} = \frac{F_{z,\infty}(c_z) - F_z}{\tau_z} \tag{14}$$
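$$F_{z,\infty}(c_z) = F_{z,\max}\,\frac{1}{2}\left[1 + \tanh\!\left(\frac{c_z - \beta_z}{\alpha_z}\right)\right] \tag{15}$$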
$$c_z = \sum_i g_{i,z}\,C_i \tag{16}$$
where the subscript z ∈ {LC, DR, VLPO, R, WR} denotes one of the five neural populations; Fz,∞ is the steady-state firing rate for population z; cz is the summed excitatory and inhibitory neurotransmitter input to population z; and τz governs the rate of decay from Fz to Fz,∞. The tanh function in Eq (15) serves as a smooth step function between near-zero and maximum firing rates, and the parameters βz and αz govern the midpoint and steepness of the step, respectively. The constants gi,z define the degree of excitatory or inhibitory effect of neurotransmitter i on population z. The gi,z parameters should not be confused with the observation function g(xk) in Eq (2).
Each model population emits a specific neurotransmitter. LC emits noradrenaline, DR emits serotonin, VLPO emits GABA, and the subpopulations from LDT/PPT both emit acetylcholine. The equations for the Ci’s have roughly the same form as the Fz’s, but take as input only the firing rate of a single population:
| (17) |
| (18) |
The result of Eq (18) is multiplied by a noise factor drawn from a Poisson process in order to model the variability of neurotransmitter output in relation to the population firing rate.
The homeostatic sleep drive h increases while the model animal is awake and decreases while the animal is asleep. Wake/sleep state is determined by summing the firing rates of the wake-active populations FLC and FDR and comparing the sum with the wake threshold parameter :
| (19) |
The variable h enters the formula for $F_{VLPO,\infty}$ through the parameter βVLPO(h) = −kh.
Finally, noisy input to LC and DR is modeled as a train of randomly occurring impulses δin with varying amplitude and a decay rate governed by the time constant τδ:
| (20) |
We point out two qualities of the DBB model that impact computation in real-time embedded systems and are also common to other biological models. One, the tanh function appears multiple times in the model equations. Sigmoid functions such as tanh are commonly used in neural mass models as transfer functions from population input to population output, as well as in other biological models in which a state transitions smoothly from a minimum “off” condition to a maximum “on” condition. Such sigmoid functions can account for a large portion of the operations in an embedded model integrator, and substituting an approximation for the sigmoid can yield significant reduction in computation time. Two, random biological noise relaxes the required level of fidelity of the numerical integrator to the ideal model equations. In application, the noisy input to the system δ(t) is practically unknown. Therefore, when the model equations are integrated, the input is set to zero, and the model noise Q is increased accordingly to account for the uncertainty of input. Although a larger Q further spreads the estimated distribution of state, it permits the use of a computationally simpler integrator. As long as the deviation of a simplified integrator from a “best available” integrator is small with respect to the noise Q, the simplified integrator can be safely substituted for one with higher fidelity to the ideal model without significantly impacting the forecast accuracy.
2.3. Computation platforms
The amount of time required to process all of the optimization data for the tested variants on the target MCU is prohibitive, due to the large quantity of test data and the limited computing speed of the MCU. For this reason, we used a desktop PC as a proxy platform for computing the larger volumes of optimization data, and we reserved the target MCU platform for only measuring the time required to compute one cycle of each of the tested UKF variants. The target platform for the case study was a 32-bit Arm Cortex-M3 processor contained in a SmartFusion2 SoC-FPGA (M2S010, Microsemi, Aliso Viejo, CA, USA) running at 100 MHz. The Cortex-M3 does not include a floating point unit. MATLAB (The MathWorks, Natick, MA, USA) running on a desktop computer with a 64-bit Intel Core i7-6700 CPU (3.4 GHz) served as the proxy computation platform. All UKF variants under test were programmed in MATLAB, and these proxy variants were used to compute the data for the stages of the optimization in which forecast divergence and state reconstruction accuracy were tested.
3. Results
We present here a method for the design of an embedded UKF system as a true multi-objective optimization process—in the sense of minimization of a cost function—that balances sufficiently fast computation time and low power on the one hand and sufficient fidelity of state forecasting and estimation on the other hand. Here, fidelity refers to how closely a computed test output matches a target benchmark output, which may or may not closely approximate the true state of the system. The goal of the optimization is to find an alternative embedded computation scheme that has high fidelity relative to the more expensive benchmark scheme. The error between the benchmark and the true system state is taken as a reference point for assessing the fidelity of the alternative.
The optimization is divided into three test stages, which are mapped in Fig 2: 1) a computation time test over all variants, 2) a forecast test over remaining model variants only, and 3) a state reconstruction test over all remaining variants. Each of the stages produces a scalar result for each tested variant, which is compared to a threshold value. Combinations of variables producing a result below the threshold pass to the next stage, while those exceeding the threshold are excluded from further testing. Variants that pass the third stage are assessed with a cost function that combines the scalar results of all three stages. The benchmark variant is a version of the UKF that has been developed on a desktop PC to yield excellent state reconstruction fidelity with respect to the true system, but which has been developed without being constrained by the computational limitations of the target MCU. The performance of every other computational variant is compared to this benchmark.
Figure 2:

The optimization comprises three main test stages: computation time test, forecast test, and state reconstruction test. Variants passing all three stages are assessed with a cost function. Only the computation time test is computed on the target MCU.
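The staged structure can be sketched in a few lines of Python. This is a simplified illustration only: the variant names, scores, thresholds, and weights are invented, the cost function is assumed here to be a weighted sum of threshold-normalized stage results (the form of the actual cost function is not specified in this section), and the sketch glosses over the fact that the forecast stage is run on model variants only.

```python
STAGE_KEYS = ("time_ms", "forecast_err", "recon_err")


def run_stages(variants, thresholds):
    """Apply the three threshold tests in order, dropping failures at each stage."""
    survivors = list(variants)
    for key in STAGE_KEYS:
        # A variant passes a stage if its scalar result does not exceed the threshold
        survivors = [v for v in survivors if v[key] <= thresholds[key]]
    return survivors


def cost(variant, thresholds, weights):
    """Hypothetical cost: weighted sum of stage results normalized by threshold."""
    return sum(weights[k] * variant[k] / thresholds[k] for k in STAGE_KEYS)


# Invented example data for four variants
variants = [
    {"name": "A", "time_ms": 1800.0, "forecast_err": 0.01, "recon_err": 0.02},
    {"name": "B", "time_ms": 400.0, "forecast_err": 0.30, "recon_err": 0.05},
    {"name": "C", "time_ms": 300.0, "forecast_err": 0.05, "recon_err": 0.06},
    {"name": "D", "time_ms": 700.0, "forecast_err": 0.04, "recon_err": 0.04},
]
thresholds = {"time_ms": 1000.0, "forecast_err": 0.1, "recon_err": 0.1}
weights = {"time_ms": 1.0, "forecast_err": 1.0, "recon_err": 1.0}

survivors = run_stages(variants, thresholds)
best = min(survivors, key=lambda v: cost(v, thresholds, weights))
print([v["name"] for v in survivors], best["name"])
```

Ordering the cheapest test first means that expensive later tests are only run on variants that are still viable.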
3.1. Optimization variable space
We divide UKF design variables into three broad classes (fixed variables, model variables, and UKF structure variables) and identify the specific values we allowed them to take in our case study. We use the term “variable” loosely to refer to both quantifiable and procedural options in UKF system design. We separate variables that impact the integration of the model equations into the “model variables” class, which can be optimized independently of the variables that govern the generation of the sigma points. This allows us to remove groups of variants from later optimization tests, thus accelerating the process. The classes of variables and the values we allowed them to take in our optimization are summarized in Table 2, with the benchmark variable values in boldface type.
Table 2.
UKF system variable classes and values used in case study
| Abbrev. | Variable | Values |
|---|---|---|
| Fixed variables | ||
| m | *Number of observables* | 2 |
| tobs | *Observation time interval* | 2 s |
| tfh | *Forecast horizon time* | 10 s |
| Model variables | ||
| F | Function approximation | **tanh**, LP7, LP3 |
| Mint | Integration method | **RK4**, tpz., Eul. |
| tint | Integration time step | **0.25**, 0.5, 1, 2 s |
| p | Numerical precision | **double (d)**, single (s) |
| UKF structure variables | ||
| Nsig | Number of sigma points | **2n (24)**, n + 1 (13) |
| Msr | Square root method | **Schur**, Chol., SR-UKF |
| - | *Sigma point regeneration* | Off |
| - | *Matrix inversion* | 2 × 2 closed form |
| - | *Sigma point weights* | Wi = 1/Nsig, for all i |
| - | *Approach to constraints on states* | Enforced on |
Variables with names in italics were held to a single value in the optimization. Variables in normal type were varied. Benchmark variable values are in boldface type.
3.1.1. Fixed variables
Fixed variables are those that are largely predetermined by higher-level system considerations and over which the system designer has little or no control. They therefore remain fixed during the optimization. A given system may impose constraints on different variables compared to another system, so the set of fixed variables listed here should be taken only as an example.
The number of observables, m, is the dimension of the vector yk in Eq (2). In many cases, which or how many observables to use is determined by the system requirements. In other cases, the cost-benefit tradeoff for including an observable should be considered. In general, more observables means more information, which theoretically leads to more accurate state estimation. However, due to the matrix operations involved, the computation time of the UKF grows super-linearly with m. A further consideration, which we will not discuss here, is the cost to the subject of additional measurements; for example, additional electrodes placed in the brain result in additional damage Bahari et al. (2017). The observability of the estimated states relative to each observable should also be taken into account; the accuracy of UKF estimation of one or more states may depend more or less heavily upon a particular observable or combination of observables Sedigh-Sarvestani et al. (2012). When two or more observables are highly correlated, at least one could be considered for exclusion if the computation time requirements are tight. For our case study, we used two observables (m = 2), FDR and FR, since the dynamics are poorly reconstructed with only one measured firing rate Graybill, Gluckman and Kiani (2019), but simultaneously accessing additional DBB model populations in a live animal without damaging critical brain structures is difficult Billard, Bahari, Kimbugwe, Alloway and Bruce (2018).
The observation time interval, tobs, is the time between successive observations in real time. Although the UKF allows for the value of tobs to vary, we assume it to be constant. On the one hand, tobs must be small with respect to the rate of divergence of the sigma points from each other. More specifically, within the space defined by the forward iterates of the sigma points, the flow of the dynamics must be sufficiently linearizable. If the dynamics bifurcate between observations such that the mean of the sigma points is no longer close to the expressed dynamics, the state estimate can fail catastrophically. On the other hand, an entire cycle of the UKF-forecasting scheme must compute between observations; therefore, tobs must be chosen large enough so that at least the fastest variants can be computed on the target platform. In our case study, we assimilated model-generated observations at the interval tobs = 2 s.
The forecast horizon time, tfh, is how far into the future beyond the next observation the state forecast is made in computed model time. In other words, for every cycle of the UKF-forecasting scheme, the sigma points will be integrated forward by tobs for the prediction step and an additional tfh for the forecast. We set tfh = 10 s because our application involves anticipating changes in state-of-vigilance with a lead time of 5 to 10 s.
3.1.2. Model variables
Model variables are those that impact the fidelity with which model integration approximates the dynamics of the idealized model in Eq (1), which in turn tracks the dynamics of the real system with some finite error. Function approximations can replace elementary functions inside the model. For example, the tanh function, which appears frequently in neural mass models and can take a relatively long time to compute in MCUs, may be replaced with a linear-piecewise approximation. Taylor series may also be used to approximate functions. In our case study, in addition to the default tanh function from the MCU software library, we considered two linear-piecewise (LP) replacements for the tanh function: a seven-piece approximation of tanh (LP7) and a family of ten three-piece functions (LP3), in which the slope of each three-piece function was individually tuned for the model state in which it was used. The tanh function was used for the benchmark variant.
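To make the idea of a linear-piecewise replacement concrete, the Python sketch below defines a three-piece and a seven-piece approximation of tanh and measures their worst-case deviation from the library function. The breakpoints, the default slope, and the function names are assumptions made for illustration; the tuned slopes and breakpoints used in the case study are not given in this section.

```python
import math


def tanh_lp3(x, slope=1.0):
    """Three-piece approximation: one linear segment clamped to [-1, 1]."""
    return max(-1.0, min(1.0, slope * x))


# Hypothetical breakpoints: six knots give seven linear pieces
LP7_KNOTS = [-3.0, -1.5, -0.5, 0.5, 1.5, 3.0]
LP7_VALUES = [-1.0, math.tanh(-1.5), math.tanh(-0.5),
              math.tanh(0.5), math.tanh(1.5), 1.0]


def tanh_lp7(x):
    """Seven-piece approximation: saturated tails and five interpolated segments."""
    if x <= LP7_KNOTS[0]:
        return -1.0
    if x >= LP7_KNOTS[-1]:
        return 1.0
    for i in range(len(LP7_KNOTS) - 1):
        x0, x1 = LP7_KNOTS[i], LP7_KNOTS[i + 1]
        if x <= x1:
            y0, y1 = LP7_VALUES[i], LP7_VALUES[i + 1]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return 1.0  # not reached; kept as a safe fallback


def max_abs_error(approx, lo=-4.0, hi=4.0, samples=801):
    """Largest |approx(x) - tanh(x)| on an evenly spaced grid."""
    step = (hi - lo) / (samples - 1)
    return max(abs(approx(lo + i * step) - math.tanh(lo + i * step))
               for i in range(samples))


print(max_abs_error(tanh_lp3), max_abs_error(tanh_lp7))
```

On an MCU the knot values and segment slopes would typically be stored as precomputed constants, so that each evaluation costs a few comparisons, one multiplication, and one addition.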
Various integration methods, Mint, and integration time steps, tint, can be tested to find a balance between fidelity to the model and computational cost. A fourth-order Runge-Kutta (RK4) method, for example, will offer higher fidelity to the model ODEs than the simpler trapezoidal or Euler methods but will take longer to compute. (See pp. 29-33 in Simon (2006).) Similarly, a smaller tint will provide higher fidelity than a larger tint but with increased computation time. We used the RK4, trapezoidal (tpz.), and Euler (Eul.) integration methods, each for tint = 0.25, 0.5, 1, and 2 s. The RK4 integrator and tint = 0.25 s were used for the benchmark variant.
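The tradeoff between integration method and fidelity can be illustrated on a single relaxation equation of the same form as the firing-rate dynamics, dF/dt = (F∞ − F)/τ, for which the exact solution is known. In the Python sketch below, the time constant τ = 10 s and the constant target F∞ = 1 are arbitrary assumptions, and the trapezoidal method is taken to be the explicit (Heun) variant, which may differ from the variant used in the case study.

```python
import math


def relax(f_val, f_inf=1.0, tau=10.0):
    """Right-hand side dF/dt = (F_inf - F) / tau with assumed constants."""
    return (f_inf - f_val) / tau


def euler_step(f, x, dt):
    return x + dt * f(x)


def trapezoid_step(f, x, dt):
    """Explicit trapezoidal (Heun) step: average of the slopes at both ends."""
    k1 = f(x)
    k2 = f(x + dt * k1)
    return x + 0.5 * dt * (k1 + k2)


def rk4_step(f, x, dt):
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def integrate(step, f, x0, dt, t_end):
    """Advance x0 to t_end with a fixed step size dt."""
    x = x0
    for _ in range(round(t_end / dt)):
        x = step(f, x, dt)
    return x


# Integrate over a 10 s horizon with the coarsest tested step, t_int = 2 s
exact = 1.0 - math.exp(-10.0 / 10.0)
steppers = {"euler": euler_step, "trapezoid": trapezoid_step, "rk4": rk4_step}
errors = {name: abs(integrate(step, relax, 0.0, 2.0, 10.0) - exact)
          for name, step in steppers.items()}
print(errors)
```

Whether the larger error of a cheaper method is acceptable depends on how it compares with the spread implied by the process noise Q, which is the comparison the forecast test is designed to make.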
The numerical precision used—i.e. single- or double-precision floating point or various fixed-point word lengths—can greatly impact computation time on a processor that lacks a floating-point unit (FPU) or has a word size smaller than the chosen data type. MCUs without an FPU can typically perform floating point computations through software routines, but at an increased time cost. We tested both single-precision (32-bit) and double-precision (64-bit) floating point types. Double-precision was used for the benchmark variant.
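The accuracy side of the precision choice can be gauged with a small experiment. The Python sketch below emulates single precision by rounding each intermediate result to the nearest 32-bit float and compares the outcome with a double-precision run of the same Euler integration. The emulation approach, the relaxation equation, and its constants are assumptions for illustration; the sketch says nothing about computation time, which must be measured on the target.

```python
import struct


def to_single(x):
    """Round a Python float (double) to the nearest IEEE 754 single value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def euler_relax(n_steps, dt, tau, cast=float):
    """Euler integration of dx/dt = (1 - x) / tau, rounding via `cast` each step."""
    x = cast(0.0)
    for _ in range(n_steps):
        increment = cast(dt * cast((1.0 - x) / tau))
        x = cast(x + increment)
    return x


# 48 steps of 0.25 s cover a 12 s window (assumed tau = 10 s)
double_result = euler_relax(48, 0.25, 10.0)
single_result = euler_relax(48, 0.25, 10.0, cast=to_single)
gap = abs(double_result - single_result)
print(double_result, single_result, gap)
```

If the gap is small compared with the process noise, single precision is a candidate for the faster variants.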
3.1.3. UKF structure variables
We grouped remaining variables that impact any aspect of the UKF, aside from the model, under UKF structure variables. We focused specifically on the number of sigma points, Nsig, and the matrix square root method, Msr, applied to $\hat{P}_{xx}$.
The standard formulation of the UKF Julier and Uhlmann (1997) uses a symmetrical set of 2n + 1 sigma points, including the mean $\hat{x}$, where n is the dimension of x. The mean can be excluded from the set to reduce the number of sigma points to 2n. The spherical-simplex (SS) unscented transform uses a reduced set of n + 2 or n + 1 sigma points, again depending on whether the mean is included Julier (2003). In our case study, we considered both the standard set of 2n sigma points and the SS set of n + 1 sigma points. We used the standard set for the benchmark.
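Because every sigma point is integrated over both the observation interval and the forecast horizon, the number of sigma points multiplies the integration cost set by the model variables. The Python sketch below gives a rough count of model right-hand-side evaluations per cycle. The stage counts assume an explicit two-stage trapezoidal method, and the estimate ignores the UKF matrix operations and the cost of each individual evaluation, so it is a back-of-the-envelope guide rather than a timing model.

```python
# Right-hand-side evaluations per integration step for each method
# (two stages for the trapezoidal method is an assumption)
STAGES = {"RK4": 4, "trapezoidal": 2, "Euler": 1}


def rhs_evaluations_per_cycle(n_sigma, method, t_int, t_obs=2.0, t_fh=10.0):
    """Count model evaluations for one prediction-plus-forecast cycle."""
    steps = round((t_obs + t_fh) / t_int)
    return n_sigma * steps * STAGES[method]


benchmark = rhs_evaluations_per_cycle(n_sigma=24, method="RK4", t_int=0.25)
cheapest = rhs_evaluations_per_cycle(n_sigma=13, method="Euler", t_int=2.0)
print(benchmark, cheapest)
```

For the values in Table 2 this gives 4608 evaluations for the benchmark combination and 78 for the cheapest combination.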
The formation of sigma points requires a matrix square root S of the positive-semidefinite covariance matrix $\hat{P}_{xx}$, such that $SS^T = \hat{P}_{xx}$. A variety of methods for computing S, which in general is not unique, have been applied to the UKF. A common choice for S is the lower-triangular factor L produced by Cholesky decomposition (Chol.), which has a straightforward closed-form solution. The square root based on the Schur decomposition yields the principal square root, which is symmetric for a positive-semidefinite input matrix and produces sigma points that are generally more evenly distributed in state space than those produced by the Cholesky factor. However, the Schur square root is computationally more intensive than the Cholesky square root. More importantly, the number of operations and therefore the time required to compute the Schur square root can vary widely from iteration to iteration within the same code (by up to 10% in our experience), depending on the condition of the input matrix. This can pose serious problems in a real-time system, and the range of computation times should be checked against the time safety margin to ensure computation deadlines are consistently met. Alternate matrix square root methods for UKFs, not discussed here, are considered in Rhudy, Gu, Gross and Napolitano (2011) and Straka, Duník, Šimandl and Havlík (2013), including iterative methods and methods based on singular-value decomposition.
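A minimal Python sketch of the Cholesky route is given below: it computes the lower-triangular factor with the closed-form recurrences and forms the standard set of 2n sigma points with equal weights 1/(2n). The function names are illustrative, the input matrix is assumed to be strictly positive definite (a semidefinite matrix can produce a zero pivot), and the example mean and covariance are invented.

```python
import math


def cholesky_lower(p):
    """Return lower-triangular L with L L^T = p for a positive-definite matrix p."""
    n = len(p)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(p[i][i] - s)
            else:
                low[i][j] = (p[i][j] - s) / low[j][j]
    return low


def sigma_points_2n(x_hat, p):
    """Form 2n sigma points x_hat +/- sqrt(n) * (column j of L), equal weights."""
    n = len(x_hat)
    low = cholesky_lower(p)
    scale = math.sqrt(n)
    points = []
    for j in range(n):
        column = [scale * low[i][j] for i in range(n)]
        points.append([x + c for x, c in zip(x_hat, column)])
        points.append([x - c for x, c in zip(x_hat, column)])
    return points


# Invented two-state example
x_hat = [1.0, -1.0]
p_xx = [[4.0, 2.0], [2.0, 3.0]]
points = sigma_points_2n(x_hat, p_xx)

# Check the constraints of Eqs (3) and (4) with W_i = 1/(2n)
w = 1.0 / len(points)
mean = [sum(w * pt[i] for pt in points) for i in range(2)]
cov = [[sum(w * (pt[i] - mean[i]) * (pt[j] - mean[j]) for pt in points)
        for j in range(2)] for i in range(2)]
print(mean, cov)
```

The operation count of this factorization is fixed for a given n, which is the property that makes it attractive for meeting real-time deadlines.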
The square-root UKF (SR-UKF) van der Merwe and Wan (2001) is theoretically equivalent to employing the Cholesky square root, but it avoids repeatedly computing the Cholesky decomposition of every cycle by tracking the Cholesky factor L instead. and , the Cholesky square roots of and , respectively, are obtained through QR decomposition followed by a rank-one Cholesky update, if is included as a sigma point. is used to form the sigma points, and and its transpose are used in back-substitutions to solve for the Kalman gain K, which obviates taking the inverse of . The corrected Cholesky factor is found through repeated rank-one Cholesky updates of . The SR-UKF may provide reduced computation time and better numerical stability compared to the traditional Cholesky method, although the output values of may differ slightly due to the alternative computational approach. We tested the Schur and Cholesky square root methods along with a slightly modified version of the SR-UKF, in which we inverted directly rather than using repeated back-substitution to compute K, since the dimension of was only 2 in our system. We used the Schur square root for the benchmark case.
A number of other UKF structure variables that we held fixed in our optimization deserve mention:
The sigma points may be regenerated after the prediction step and before transformation into the observation space Simon (2006). The impact of this step on state estimation accuracy may depend on the degree of nonlinearity in the observation function Eq (2). We bypassed sigma point regeneration after the prediction step.
Different strategies for handling the matrix inversion in Eq (9) will have consequences for both computation time and numerical stability, particularly when m is large. For a linear observation function, the UKF correction step equations may be replaced with the classic KF correction step equations. This allows for the use of measurement decorrelation and sequential scalar updating (first published in Kaminski, Bryson and Schmidt (1971) and reviewed in Grewal and Andrews (2015) and Simon (2006)), which replaces the matrix inversion with a series of scalar divisions. Sequential updating can greatly decrease computation time for large-dimension measurements, as in Kulikova, Lima and Kulikov (2022). We employed the standard UKF correction step equations, and because the dimension of our observed state is small (m = 2), we used an explicit solution for a 2 × 2 matrix inversion.
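The explicit 2 × 2 inversion mentioned above can be sketched as follows. The names `Pxy` (state-observation cross-covariance) and `Pyy` (observation covariance) are assumed labels used only for this illustration; the code shows the generic adjugate formula, not the authors' implementation.

```python
def inv2x2(a):
    """Closed-form inverse of a 2 x 2 matrix via its adjugate and determinant."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    if det == 0.0:
        raise ZeroDivisionError("matrix is singular")
    return [[a22 / det, -a12 / det],
            [-a21 / det, a11 / det]]

def kalman_gain(Pxy, Pyy):
    """K = Pxy * inv(Pyy) for an n x 2 cross-covariance and a 2 x 2 covariance."""
    Pinv = inv2x2(Pyy)
    return [[row[0] * Pinv[0][j] + row[1] * Pinv[1][j] for j in range(2)]
            for row in Pxy]

# Hypothetical values, for illustration only.
Pyy = [[2.0, 1.0], [1.0, 3.0]]
Pxy = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]
K = kalman_gain(Pxy, Pyy)
```

For m = 2 this costs one division-ready determinant and a handful of multiplications, which is why a general-purpose solver is unnecessary at this dimension.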
The sigma point weights Wi can be adjusted not only to include or exclude the mean from among the sigma points but also to incorporate prior knowledge of the higher-order moments of the state distribution Julier and Uhlmann (1997); van der Merwe and Wan (2001). Methods for tuning the weights are described in Sakai and Kuroda (2010); Duník, Šimandl and Straka (2012); Straka, Duník and Šimandl (2014); Scardua and da Cruz (2017); Turner and Rasmussen (2012). It should be noted that the use of negative sigma point weights may lead either to the loss of positive definiteness of the covariance matrix, which could cause the matrix square root operation to fail, or to a Cholesky factor that fails a rank-one update. New square-root UKF methods have recently been proposed in Kulikov and Kulikova (2021, 2022) that are robust to negative sigma point weights for any sigma point index. We used all positive sigma point weights equal to the reciprocal of the total number of sigma points.
The noise covariance matrices Q and R may also undergo tuning, as addressed in Scardua and da Cruz (2017); Sakai and Kuroda (2010); Duník, Straka, Kost and Havlík (2017). We held Q and R constant and inflated the element in Q corresponding to the uncertainty of the unknown input δ.
Finally, it is useful to check if generated sigma points fall within the allowed state space of the model dynamics and, presumably, the real system. For example, real neuronal systems do not have negative firing rates. Methods have been developed for dealing with constraints on state values in both linear and nonlinear Kalman filters Vachhani, Narasimhan and Rengaswamy (2006); Simon (2006); Narasimhan and Rengaswamy (2009); Kolås, Foss and Schei (2009); Simon (2010); Teixeira, Tôrres, Aguirre and Bernstein (2010). Although the states in the DBB model are physically meaningful only for non-negative values, the state estimation accuracy of our system was not found to be significantly impacted by small excursions of the sigma points into negative territory. From the model equation side, this can be understood because steady-state firing rates in Eq (15) are formulated to be positive definite despite potential negative neurotransmitter concentrations, and the steady-state neurotransmitter concentrations in Eq (18) produce positive outputs from positive firing rate inputs. Therefore the only accommodation we made for the non-negative state constraints was to enforce them on .
3.2. Computation time test
In the computation time test, we identify whether each variant of the UKF prediction-correction-forecast scheme can be computed in real time. For each of the 432 combinations of model and UKF structure variables represented in the rows of Table 2, we measured the computation time tcomp for one UKF cycle on the target MCU. In Fig 3 we depict the sequential computational tasks comprising tcomp, beginning with the moment the observed data yk become available: the observed data are used to correct the predicted mean and covariance from the previous cycle; the new set of sigma points is generated; the sigma points are propagated forward by tobs + tfh in model time; and the predicted mean and covariance are computed for the next cycle. Note that the computation time test assumes that the specific target processor and values of tobs and tfh are fixed.
Figure 3:

UKF system computational tasks in time relative to tobs. One cycle of the UKF must compute in the time between successive observations, tobs, while also allowing for overhead tasks, such as feedback to the biological system and communication with a base station. The computation time tcomp refers to the time required to compute the UKF matrix operations and generate and integrate the sigma points. Block widths are not to scale.
For each variant, tcomp was obtained by computing two cycles of the UKF prediction-correction-forecast scheme on the target MCU and capturing the number of clock periods required to compute the second UKF cycle. Because tcomp is relatively invariant to the location of the state estimate in state space, the timing test only requires computation of two cycles of the UKF, rather than hours of data in model time. This enables collection of tcomp data directly from the MCU, which eliminates the need for clock-cycle-accurate MCU emulation software for the PC. The total time required to obtain tcomp for all 432 variants was 17 min. The C code used to obtain the values of tcomp on the Cortex-M3 was translated from MATLAB using MATLAB Coder.
The second and subsequent cycles of the UKF compute more slowly than the first cycle. This difference is due in part to the increased time spent in the square root operation on the non-diagonal covariance matrix for cycles k > 0, compared to the faster square root performed on the diagonal initial-condition covariance matrix in the first cycle. It should be noted that some variability in tcomp is expected for matrix functions containing operations that are conditional on the qualities of the input matrix, like the Schur square root and QR decomposition. For our case study, one value of tcomp is sufficient, since the variation in time to compute these functions is very small relative to our time safety margin. For a different system, further probing of the range of computation times may be necessary.
In the optimization, measured values of tcomp for each variant are compared to a maximum threshold value θt < tobs. Variants with tcomp > θt fail the computation time test. The threshold θt is constrained from above not only by tobs but also by the time required for overhead tasks (toh) and a safety margin (tmarg). Thus, θt ≤ tobs – toh – tmarg. Cases may arise in which feedback to the subject resulting from the forecast must be initiated before the end of the tobs interval. In such cases, it may be necessary to also measure the computation time from the moment of observation to the moment when the forecast is known (from yk to the forecast in Fig 3). For this study, we chose θt = 1.5 s, which allows 0.5 s for toh and tmarg, combined, and also satisfies our feedback latency requirement.
It should be emphasized that tcomp ≤ θt is merely a necessary condition. In general, a smaller computation time is preferable for at least three reasons: smaller computation time implies 1) lower power and 2) lower feedback latency, and 3) a larger margin between tcomp and θt allows for greater flexibility in modifying the system design at a later time without violating timing constraints.
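The timing criterion can be expressed in a few lines. In the sketch below, tobs = 2 s is inferred from the stated θt = 1.5 s plus the 0.5 s allowance; the even split of that allowance between overhead and margin is a hypothetical choice made only so the example runs, and the function names are illustrative.

```python
def timing_threshold(t_obs, t_oh, t_marg):
    """Upper bound on the threshold: theta_t <= t_obs - t_oh - t_marg."""
    return t_obs - t_oh - t_marg

def passes_timing(t_comp, theta_t):
    """A variant passes the computation time test when t_comp <= theta_t."""
    return t_comp <= theta_t

T_OBS = 2.0    # s, inferred from theta_t = 1.5 s plus the 0.5 s allowance
T_OH = 0.25    # s, hypothetical share of the 0.5 s allowance
T_MARG = 0.25  # s, hypothetical share of the 0.5 s allowance

theta_t = timing_threshold(T_OBS, T_OH, T_MARG)

# Computation times quoted in the text (benchmark, optimal, fastest variant).
results = {t: passes_timing(t, theta_t) for t in (11.94, 0.439, 0.105)}
```

Passing this check is only the necessary condition described above; it says nothing about how much headroom a variant leaves for later design changes.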
The impact of the model and UKF variables on tcomp will be more or less significant depending on where the variant sits in the optimization variable space. The computation time may be approximated as
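| $t_{\mathrm{comp}} \approx t_{\mathrm{prec}}\left[N_{\mathrm{sig}}\left(\dfrac{t_{\mathrm{obs}}+t_{\mathrm{fh}}}{t_{\mathrm{int}}}\,N_{\mathrm{iter}}\,N_{\mathrm{model}} + N_{\mathrm{other,sig}}\right) + N_{\mathrm{sr}} + N_{\mathrm{other}}\right]$ | (21) |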
where tprec is the average time for an operation of a given precision, Nsig is the number of sigma points used in a given UKF, Niter is the number of model equation iterations used in a given integration method, Nmodel is the number of operations in the model equations, Nother,sig is the number of other operations repeated for each sigma point, Nsr is the number of operations in the matrix square root, and Nother is the number of operations not captured elsewhere. For example, a change in tcomp due to the matrix square root variant will only manifest when the Nsig term is on the same order as Nsr or smaller.
The high-level trends implied by Eq (21) are reflected in Fig 4A, where we show computation time test results for the benchmark UKF system and all variants that differ from the benchmark by only one variable. The change from double to single precision reduces tcomp by a factor of about two. The dominance of the Nmodel term in this region of optimization variable space is evidenced by 1) the halving of tcomp for each doubling of tint; 2) the proportionality of tcomp to Niter, where Niter,RK4 = 4, Niter,tpz = 2, and Niter,Eul = 1; and 3) the proportionality of tcomp to Nsig, where Nsig = 24 for the benchmark and Nsig = 13 for the SS variant. The significant time cost of evaluating the tanh function in the model equations is shown by the roughly 60% reduction in tcomp when the piecewise-linear approximations LP7 and LP3 are used.
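The scaling trends described here can be explored with a simple operation-count model. The sketch below assumes one plausible arrangement of the terms defined for Eq (21), with the number of integration steps taken as (tobs + tfh)/tint; that arrangement and every operation count used are hypothetical placeholders, not measured values from the Cortex-M3.

```python
def approx_tcomp(t_prec, n_sig, n_steps, n_iter, n_model,
                 n_other_sig, n_sr, n_other):
    """Operation-count estimate of one UKF cycle (assumed form, see lead-in)."""
    per_sigma_point = n_steps * n_iter * n_model + n_other_sig
    return t_prec * (n_sig * per_sigma_point + n_sr + n_other)

# Hypothetical operation counts, chosen only to illustrate the trends.
common = dict(t_prec=1e-6, n_model=1000, n_other_sig=100, n_other=2000)

# Model-dominated region: 24 sigma points, 12 s horizon at t_int = 0.25 s.
rk4 = approx_tcomp(n_sig=24, n_steps=48, n_iter=4, n_sr=5000, **common)
eul = approx_tcomp(n_sig=24, n_steps=48, n_iter=1, n_sr=5000, **common)
ratio_rk4_to_euler = rk4 / eul

def sqrt_sensitivity(n_sig, n_steps, n_iter):
    """Relative change in the estimate when the square-root cost is halved."""
    slow = approx_tcomp(n_sig=n_sig, n_steps=n_steps, n_iter=n_iter,
                        n_sr=5000, **common)
    fast = approx_tcomp(n_sig=n_sig, n_steps=n_steps, n_iter=n_iter,
                        n_sr=2500, **common)
    return (slow - fast) / slow

sens_large = sqrt_sensitivity(n_sig=24, n_steps=48, n_iter=4)  # benchmark-like
sens_small = sqrt_sensitivity(n_sig=13, n_steps=6, n_iter=1)   # heavily reduced
```

Under these placeholder counts the RK4-to-Euler ratio stays close to 4 while the sigma point term dominates, and the square root method only becomes visible in the estimate once that term has been reduced.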
Figure 4:

Computation time test results and platform validation. (A) Values of tcomp for the benchmark variant (tcomp,bm, dotted line) and all variants that are one variable step away from the benchmark. Labels indicate the differing variables. (B) Histogram of values for platform validation. (C) tcomp is shown for all variants. (Left) tcomp color coded. The benchmark variant (*) and other variants (o) for which tcomp is plotted in (A) are marked. (Right) Same data binary thresholded for tcomp ≤ θt. Abbreviations: seven-piece linear (LP7) and three-piece linear (LP3) approximations of tanh; 4th-order Runge-Kutta (RK4), trapezoidal (tpz), and Euler (Eul) integration methods; Schur (Sch) and Cholesky (Cho) square root methods; square-root UKF (SRU); standard (24) and spherical-simplex (13) sigma point methods.
Although none of the variants in Fig 4A compute in less than θt = 1.5 s, many variants (about three quarters of those tested) passed under the threshold, with the fastest variant computing in 0.105 s, compared to 11.94 s for the benchmark variant. The values of tcomp for all variants are represented on the left side of Fig 4C, with the passing variants having tcomp < θt shaded in the grid on the right. For variants in which the Nsig term has been substantially decreased, the impact of the faster square root methods on tcomp becomes apparent. For example, compare the similarity of tcomp between the Schur and Cholesky methods in the top row of Fig 4C (left) to the noticeable difference between the Schur and Cholesky methods in the bottom row.
3.3. Forecast test
In the forecast test we identify whether the computed forecast trajectories—uncorrected by observed data—for a given combination of model variables follow benchmark system trajectories with sufficient fidelity.
The quantitative criterion used in the forecast test is rooted in the probabilistic description of system state adopted in KFs. The distribution of state is centered at the mean state estimate, with a “width” set by the state covariance. Predictions and forecasts are made by evolving this distribution through the model equations. As the distribution evolves forward in time, it gradually expands due to the process noise Q, which encapsulates the unmodeled dynamics. Our objective is to identify alternate methods of integrating the model equations (i.e., using simpler integrators, larger time steps, lower precision, and model approximations) that maintain sufficient fidelity to the benchmark integration scheme relative to Q. We define fidelity through a measure of how far trajectories integrated with these variant methods diverge from trajectories computed with the benchmark integrator over the anticipated forecast time tfh. The variant and benchmark trajectories are computed with zero process noise; thus the divergence between them is due to the computational simplifications made in the variants. To assess this variant-to-benchmark forecast divergence error as it evolves over time, we compare it with the process noise Q, which we obtain by computing the same divergence error metric between benchmark-integrated trajectories without noise and “true” noisy data.
The data for both the forecast and state reconstruction tests were computed in MATLAB running on a desktop PC, rather than on the target MCU, due to the large volume of data. Because the two platforms use different programming languages, compilers, processors, and instruction sets, an inter-platform error is expected between the output from the Cortex-M3 for the jth UKF variant and the output from MATLAB running on the PC for the same variant. In order for the proxy PC/MATLAB data for the jth UKF variant to be valid in assessing its performance on the Cortex-M3 relative to other variants, the inter-platform error should be small compared to the inter-variant errors, which are the differences between the output of the jth and kth UKF variants, k ≠ j, computed on the Cortex-M3.
We expect that the inter-variant error for some pairs of variants will be close to zero. However, the goal of the optimization stages is to identify and remove outlying variants that fail in significant ways. Therefore the inter-platform error for the jth variant need only be smaller than the median of its inter-variant error values. Furthermore, if methods j and k have near-zero inter-variant error on the target platform, their numerical performance is equivalent for our purposes and they need not be distinguished from each other.
To validate the PC/MATLAB as a substitute computational platform for the Cortex-M3, we computed two cycles of the UKF—the first with a diagonal initial-condition matrix, and the subsequent iteration with a non-diagonal matrix—for every tested variant both on the Cortex-M3 (target) and on the PC with MATLAB (proxy). We retained the state estimation outputs from the second cycle and computed the inter-platform error
| (22) |
and inter-variant error
| (23) |
for all j and all k ≠ j, where the compared quantities are the ith entries of the output vector for the jth variant computed on the target platform and proxy platform, respectively.
For variant j, we define the empirical cumulative distribution function (ECDF) of its inter-variant error over all k ≠ j, which is bounded on the interval [0, 1]. We require that a variant’s inter-platform error be less than the median of its inter-variant error, that is, that the ECDF evaluated at the inter-platform error be less than 0.5. This condition was met for all of the tested variants. In fact, the ECDF value at the inter-platform error was less than 0.025 for all variants, which indicates that MATLAB running on the PC performs sufficiently well as a proxy for UKF computations on the Cortex-M3, in both double and single precision, for this case study. A histogram of these ECDF values for all j is shown in Fig 4B.
For the forecast test, a large set of synthetic “true” noisy data (5000 min) was generated in MATLAB using the benchmark model variables: the tanh function, an RK4 integrator with tint = 0.25 s, and double precision. This true data included the random input δ and the neurotransmitter noise. In this simulated context, δ and the neurotransmitter noise comprise the process noise. A set of 10 000 initial conditions was randomly selected from the true data, and each starting point was integrated forward for tobs + tfh = 12 s without process noise by the benchmark integrator and the integrator corresponding to each combination of model variables that passed the computation time test.
For each of the 10 000 test windows, the normalized forecast error for each variant v was found between the end point of the 12-s-long variant-integrated trajectory and the end point of the 12-s-long benchmark-integrated trajectory:
| (24) |
where σi is the standard deviation for model state i, and ℰ is the set of model states critical to the application (here, ℰ = {DR, VLPO, R}). This yielded a distribution of 10 000 values of the forecast error Ef for each tested model variant. The normalized forecast error between the noiseless benchmark-integrated trajectories and the true noisy data 12 s after each initial condition is similarly defined:
| (25) |
The distribution of this benchmark-to-truth forecast error over the 10 000 test windows was taken as the process noise Q, against which the variants were assessed.
Each combination of the model variables in Table 2 corresponds to a set of six variants of the UKF structure variables: two options for the number of sigma points (Nsig) multiplied by three options for the square root method (Msr). The forecast test was run for the combinations of model variables (F, Mint, tint, and p) that had at least one out of the six corresponding UKF structure variants pass the computation time test.
In order to assess the fidelity of a tested variant to the benchmark, we compare the ECDF of the variant forecast error, defined as Pf,v(Ef), for each variant under test to the ECDF of the benchmark forecast error, defined as Pf,bm(Ef); in other words, we compare the “width” of the spread due to the simplified variant to the “width” of the spread due to Q, both at the forecast time horizon. In the context of the UKF, the state distribution iterated forward in time produces variances that are ideally accommodated by the UKF correction step. However, prediction-correction schemes perform poorly with large-outlier predictions that place the estimated state, for example, across trajectory bifurcations. Therefore in order to focus the forecast test metric on these problematic large-outlier cases, we compare the ECDFs at the 95th percentile using a right-side tail comparison metric Tf:
| (26) |
where the two percentile terms are the 95th percentile values of the variant and benchmark forecast errors, respectively. We set the maximum threshold of the tail metric Tf for the forecast test, θf, to 1. Model variants with Tf ≤ θf are considered to have forecasting fidelity to the benchmark integrator comparable to or better than the fidelity of the noiseless benchmark integrator to the true noisy system. Variants with Tf > θf fail the forecast test. Note that the failure of a single model variant eliminates up to six UKF structure variants from the following test.
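The ingredients of this comparison, an ECDF and its 95th-percentile value, can be computed as in the sketch below. It deliberately does not reproduce the tail metric of Eq (26); it only shows one common (nearest-rank) way to obtain the percentile values being compared. The function names and the synthetic error samples are assumptions made for illustration.

```python
import bisect

def ecdf(samples):
    """Return P(x): the fraction of samples less than or equal to x."""
    xs = sorted(samples)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n

def percentile_value(samples, pct):
    """Nearest-rank percentile: smallest sample whose ECDF value is >= pct/100."""
    xs = sorted(samples)
    rank = max(1, (pct * len(xs) + 99) // 100)  # integer ceiling of pct*n/100
    return xs[rank - 1]

# Synthetic stand-ins for the two error distributions (not data from the study).
benchmark_errors = [float(i) for i in range(1, 101)]
variant_errors = [0.5 * e for e in benchmark_errors]

P_bm = ecdf(benchmark_errors)
e95_bm = percentile_value(benchmark_errors, 95)
e95_v = percentile_value(variant_errors, 95)
variant_tail_within_benchmark = e95_v <= e95_bm
```

Comparing at a high percentile rather than at the median keeps the focus on the rare large excursions that the correction step is least able to absorb.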
Sixty of the seventy-two model variants passed the timing test and were assessed in the forecast test. The ECDFs Pf,v for selected model variants are shown in Fig 5A along with Pf,bm. Box charts of the same Ef data are shown in Fig 5B along with the corresponding values of Tf. In Fig 5C, values of Tf for all variants are represented in the left half, and variants that passed under the threshold θf = 1 are shaded in blue in the right half. The red box highlights the variable space corresponding to the data plotted in Figs 5A and 5B.
Figure 5:

Forecast test results. (A) The benchmark variant ECDF Pf,bm is plotted with Pf,v for the model variants that use single precision, tint = 1 s, and trapezoidal or Euler integration. The neighborhood where the benchmark and four of the other ECDFs enter the 95th percentile, marked by the horizontal dotted line, is enlarged in the inset. (B) Box charts of Ef and values of Tf for the same variants. (C) Tf is shown for all variants that passed the timing test. (Left) Tf color coded. (Right) Binary thresholded for Tf ≤ θf. The red box highlights the model variants for which data are presented in (A) and (B).
Replacing the tanh function by the LP3 caused the forecast to fail with a magnitude of error generally much larger than the magnitude of the process noise. Integration method and integration time step played smaller roles in decreasing fidelity of the variants to the benchmark. Use of single precision in place of double precision had a negligible impact on the fidelity.
The variable space in Figs 4C and 5C is laid out such that tcomp generally decreases from top to bottom and from left to right. The computation time test (Fig 4C) eliminated more time-intensive variants from the upper and left regions of the double and single portions of the variable space. The forecast test (Fig 5C) eliminated less time intensive variants from the lower and right regions of the double and single portions of the variable space.
Some unexpected results highlight a point of caution. In Fig 5C (left) the apparent improvement in results for the Euler integration of the LP3 variants at tint = 2 s over the RK4 and trapezoidal methods is misleading. Analysis revealed oscillations due to instability for those variants at certain locations in state space. The oscillations pushed the state forecast away from and then back toward the initial-condition region of state space every two integration time steps. Thus the even number of integration time steps over a short total integration time resulted in the error unexpectedly decreasing compared to stable integrations involving the RK4 and trapezoidal methods, under those conditions. This example underscores the importance of data visualization across the entire variable space in order to identify potentially misleading results.
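The period-two pattern described above can be reproduced with a toy problem. The sketch below applies forward Euler to the scalar linear decay dx/dt = -rate·x with a step size at the stability limit; it is not the DBB model, and the rate and step values are hypothetical, but it shows how an oscillating integration can land back at its starting value after every even number of steps.

```python
def euler_decay(x0, rate, h, n_steps):
    """Forward-Euler trajectory of dx/dt = -rate * x with fixed step h."""
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x + h * (-rate * x))  # x_{k+1} = (1 - rate*h) * x_k
    return xs

# With rate*h = 2 the update factor is exactly -1: the state flips sign
# on every step and returns to its initial value every two steps.
traj = euler_decay(x0=1.0, rate=1.0, h=2.0, n_steps=6)

# Distance from the initial condition after each step.
offset_from_start = [abs(x - traj[0]) for x in traj]
```

An error metric sampled only at the end of an even number of steps would see zero offset from the starting point here, even though every intermediate odd step is far from it, which is why inspecting full trajectories matters.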
The forecast test eliminated sets of six variants sharing the same model variables. In the state reconstruction test, the impact of the UKF structure variables on state reconstruction is examined.
3.4. State reconstruction test
In the state reconstruction test, we identify if the mean state estimate produced by a given UKF variant tracks the benchmark UKF with sufficient fidelity. Here “sufficient fidelity” is defined by the distribution of error between the benchmark UKF state estimate and the true state, against which we assess the error between the output of each variant and the benchmark UKF.
We generated 1000 one-hour-long segments of “true” noisy data using the benchmark model integrator and including the random input δ and the neurotransmitter noise, as with the true data for the forecast test. Synthetic noisy observation data were generated for the observed variables (F DR and F R) by adding Gaussian noise to the true data. Negative observation values were corrected to zero. We then seeded the benchmark UKF and each surviving UKF variant with the true initial conditions and used them to reconstruct state estimates using the noisy observation data for the 1000 segments. These tests were computed in MATLAB.
In a manner similar to the forecast test, the error between each of the tested variants and the benchmark UKF was assessed relative to the error between the benchmark UKF and the true data. For each tested variant, we computed the normalized mean-squared error between the state estimates reconstructed by the variant UKF and the benchmark UKF,
| (27) |
for each of the 1000 test windows, with the mean taken over the hour-long window. This yielded a distribution of error for each variant. We used an analogous formula to compute the normalized mean-squared error between the benchmark UKF data and the true data.
Whereas in the forecast test the process noise Q provides the leeway for simplifying the model integration, the constraining influence of the UKF correction step provides the leeway for simplifying the UKF structure in the state reconstruction test. Once every tobs, the estimated distribution of state is pulled toward the observed data using the Kalman gain in Eq (11). As long as the predicted cluster of sigma points approximately encompasses the true mean, the correction step will compensate for the simplifications in both the UKF structure and the model integration. However, if the forward iterated sigma points diverge from the true state in one or more dimensions, the impact of the correction step may be insufficient to prevent the entire sigma point set from drifting into catastrophic error. Therefore we again focus the test metric on these problematic large-outlier cases by comparing the reconstruction error ECDFs at the 95th percentile:
| (28) |
Variants with Tr > θr = 1 fail the state reconstruction test.
We define the ECDF of the variant reconstruction error as Pr,v, which is plotted in Fig 6A for selected variants against the ECDF of the benchmark reconstruction error, which we define as Pr,bm. The corresponding box charts and Tr values are shown in Fig 6B. Values of Tr for all variants are represented on the left side of Fig 6C, while passing variants are shaded in blue on the right side. The red box in Fig 6C (right) highlights the variable space corresponding to the data plotted in Figs 6A and 6B.
Figure 6:

State reconstruction test results. (A) The benchmark variant ECDF Pr,bm is plotted with Pr,v for the model variants that use single precision, tint = 1 s, Euler integration, and the LP7 approximation. The horizontal dotted line marks the 95th percentile. Note that the ECDFs for the SR-UKF variants overlap those of the Cholesky square root variants. (B) Box charts of the reconstruction error and values of Tr for the same variants. (C) Tr is shown for all variants that passed the forecast test. (Left) Tr color coded. (Right) Binary thresholded for Tr ≤ θr. The red box highlights the variants for which data are presented in (A) and (B).
In Fig 6 the similarity in performance between the Cholesky (Cho) and SR-UKF (SRU) square root methods is apparent. In fact the SR-UKF ECDFs entirely overlap those of the Cholesky square root method. It should not be assumed, however, that this will be the case for all systems. Changes in precision, state dimension, and the underlying model could each potentially draw out differences in the numerical approaches between these two methods.
For all square root methods, the standard set of 24 sigma points (Std) tends to outperform the reduced set of 13 sigma points (SS). However, within families of six variants having the same model variables, when 24 sigma points are used the Cholesky square root and SR-UKF methods tend to outperform the Schur square root method, and when 13 sigma points are used, the opposite is true.
We hypothesize that these trends are due to the coupling of nonlinearities in the model with the slight differences in sigma point arrangement produced by the different square root and sigma point generation methods. For a given distribution with a specified mean and covariance, all sigma point generation methods and matrix square root methods produce sigma points with the same mean and covariance. Furthermore, for all methods, sigma points lie on the same isoprobability contour. However, the “coverage” of state space differs among methods. For example, because the Cholesky square root is lower triangular, it systematically produces a set of sigma points with a larger span (relative to the variance) in states with lower dimension index than in states with higher dimension index, whereas the symmetric Schur square root tends to produce sigma points that are similarly distributed among all states. Also, regardless of the matrix square root method, the standard method produces sigma points that are arranged in pairs opposite from each other in state space, with the mean in between. In contrast, the spherical-simplex method produces a sigma point cloud lacking this form of symmetry.
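The difference in coverage between the two square roots can be seen in a small 2-state example. The sketch below compares the lower-triangular Cholesky factor with the symmetric principal square root, using the closed form for a 2 × 2 positive-definite matrix, and reports the largest sigma point offset in each state relative to that state's standard deviation. The covariance values, the function names, and the choice of standard deviation as the normalizer are assumptions for illustration.

```python
import math

def chol2(P):
    """Lower-triangular Cholesky factor of a 2 x 2 positive-definite matrix."""
    l11 = math.sqrt(P[0][0])
    l21 = P[1][0] / l11
    l22 = math.sqrt(P[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def principal_sqrt2(P):
    """Symmetric square root: (P + sqrt(det) I) / sqrt(trace + 2 sqrt(det))."""
    s = math.sqrt(P[0][0] * P[1][1] - P[0][1] * P[1][0])
    t = math.sqrt(P[0][0] + P[1][1] + 2.0 * s)
    return [[(P[0][0] + s) / t, P[0][1] / t],
            [P[1][0] / t, (P[1][1] + s) / t]]

def relative_span(S, P):
    """Largest offset sqrt(n)*|S[i][j]| in state i, divided by std dev of state i."""
    n = len(P)
    scale = math.sqrt(n)  # scaling for the 2n-point, equal-weight set
    return [max(abs(scale * S[i][j]) for j in range(n)) / math.sqrt(P[i][i])
            for i in range(n)]

P = [[4.0, 2.0], [2.0, 3.0]]  # hypothetical covariance
R = principal_sqrt2(P)
span_chol = relative_span(chol2(P), P)
span_sym = relative_span(R, P)
```

In this example the Cholesky factor gives the first state a noticeably larger relative span than the second, while the symmetric root spreads the span almost evenly, consistent with the description above.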
The trapezoidal integration method for tint = 2 s fails in the state reconstruction tests. In the DBB model, trapezoidal integration with a large tint can lead to an unusual situation in which F R increases to its typical “on” firing rate while the acetylcholine emitted by the REM-active sub-population (C AR) remains low. This appears to be due to negative feedback that occurs under certain conditions in the C AR equation from the first to the second evaluation of the equation in the trapezoidal method. This unexpected result highlights the importance of sufficiently sampling the model state space to draw out such anomalies.
The insensitivity to double versus single precision for this model and selected UKF variable space is evident in the similarity between the top and bottom halves of Fig 6C. Precision may become relevant, however, for stiffer model equations or for a higher dimension of the matrix that undergoes inversion.
3.5. Cost function
A multi-objective cost function combining the results of the optimization stages was applied to the variants that passed all three tests. The cost function is a sum of three weighted terms, one each for computation time, forecast error, and state reconstruction error:
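| $J = a_1\left(\dfrac{t_{\mathrm{comp}}}{\theta_t}\right)^{2} + a_2\left(\dfrac{T_f}{\theta_f}\right)^{2} + a_3\left(\dfrac{T_r}{\theta_r}\right)^{2}$ | (29) |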
where a1, a2, and a3 are adjustable weights (set to a1 = a2 = a3 = 1 for this case study). Since Eq (29) is a weighted sum of individual cost functions to be minimized, the variant that minimizes Eq (29) is guaranteed to be Pareto-optimal. Squared terms are used to flatten the cost function near the minimum.
In most MCU-based applications, power is roughly proportional to tcomp. However, if FPGA fabric is used to realize part of the system design and power data are available for the tested variants, a fourth term may be added to Eq (29) that incorporates computation power Pcomp and a corresponding threshold θP. Power data can be used to eliminate variants in a separate test similar to the timing test. Pcomp was not separately considered in our MCU-based case study.
The ten variants that yielded the lowest values of the cost function Eq (29) are listed in Table 3, and cost function values for all UKF system variants that passed the state reconstruction test are indicated in Fig 7. The optimal variant with the lowest cost function value (p = s, tint = 1 s, Mint = Euler, F = tanh, Msr = Schur, Nsig = 13) computes in 0.439 s, which is 27 times faster than the benchmark variant (see Fig 4A).
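As a consistency check, the cost function can be evaluated from the tcomp, Tf, and Tr values reported in Table 3. The sketch below assumes that each term is the squared ratio of a measured value to its threshold (θt = 1.5 s, θf = θr = 1) with unit weights; this assumed form reproduces the tabulated cost values to within rounding. The function and variable names are illustrative.

```python
def cost(t_comp, T_f, T_r, theta_t=1.5, theta_f=1.0, theta_r=1.0,
         a=(1.0, 1.0, 1.0)):
    """Weighted sum of squared, threshold-normalized terms (assumed form)."""
    return (a[0] * (t_comp / theta_t) ** 2
            + a[1] * (T_f / theta_f) ** 2
            + a[2] * (T_r / theta_r) ** 2)

# (reported cost, t_comp in s, T_f, T_r) for the ten rows of Table 3.
table3 = [
    (0.150, 0.439, 0.200, 0.157),
    (0.186, 0.455, 0.200, 0.232),
    (0.187, 0.458, 0.200, 0.232),
    (0.197, 0.637, 0.033, 0.126),
    (0.206, 0.612, 0.200, 0.012),
    (0.215, 0.649, 0.100, 0.134),
    (0.229, 0.647, 0.041, 0.203),
    (0.238, 0.281, 0.200, 0.404),
    (0.239, 0.283, 0.200, 0.404),
    (0.273, 0.490, 0.033, 0.407),
]

recomputed = [cost(t, f, r) for (_, t, f, r) in table3]
max_abs_diff = max(abs(j - row[0]) for j, row in zip(recomputed, table3))

# Speed-up of the optimal variant over the 11.94 s benchmark.
speedup = 11.94 / table3[0][1]
```

Because every term is normalized by its own threshold, a variant sitting exactly at all three thresholds would score 3 with unit weights, and changing the weights shifts the balance between speed and fidelity without altering the pass/fail tests.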
Table 3.
Variants with the lowest cost function values
| Rank | CF | tcomp (s) | Tf | Tr | p | tint (s) | Mint | F | Msr | Nsig |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.150 | 0.439 | 0.200 | 0.157 | s | 1.0 | Eul. | tanh | Schur | 13 |
| 2 | 0.186 | 0.455 | 0.200 | 0.232 | s | 1.0 | Eul. | tanh | SR-UKF | 24 |
| 3 | 0.187 | 0.458 | 0.200 | 0.232 | s | 1.0 | Eul. | tanh | Chol. | 24 |
| 4 | 0.197 | 0.637 | 0.033 | 0.126 | s | 1.0 | tpz. | tanh | Schur | 13 |
| 5 | 0.206 | 0.612 | 0.200 | 0.012 | s | 1.0 | Eul. | tanh | Schur | 24 |
| 6 | 0.215 | 0.649 | 0.100 | 0.134 | s | 0.5 | Eul. | tanh | Schur | 13 |
| 7 | 0.229 | 0.647 | 0.041 | 0.203 | s | 2.0 | RK4 | tanh | Schur | 13 |
| 8 | 0.238 | 0.281 | 0.200 | 0.404 | s | 1.0 | Eul. | tanh | SR-UKF | 13 |
| 9 | 0.239 | 0.283 | 0.200 | 0.404 | s | 1.0 | Eul. | tanh | Chol. | 13 |
| 10 | 0.273 | 0.490 | 0.033 | 0.407 | s | 1.0 | tpz. | tanh | SR-UKF | 13 |
Figure 7:

Cost function values. (Left) Values of the cost function J are represented in color for variants that passed all three optimization tests. Variants that failed any of the three tests appear as white. J10 denotes the tenth-lowest cost function value. (Right) The ten variants with the lowest cost function values are shaded in blue. Reconstructed data for the optimal variant (⋆) and a variant that failed the reconstruction test (×) are shown in Fig 8.
Because the cost function contains multiple competing terms that balance computation speed with fidelity to the benchmark UKF, a variety of variable combinations yielded cost function values less than 1. The ten best variants in Table 3 span all three integration methods, all three matrix square root methods, both sigma point generation methods, and three of the four values of tint. This indicates that a balance between speed and fidelity can be obtained through multiple different approaches.
Six of the ten best variants share the same model variables (p = s, tint = 1 s, Mint = Euler, F = tanh) and thus also share the same value of Tf. Four of the ten are adjacent pairs of Cholesky and SR-UKF variants. In both pairs, the SR-UKF computes in slightly less time than the corresponding Cholesky variant.
Absent from Table 3 are variants using double precision or the LP7 function. The lowest cost function value for a variant using double precision was 0.284, which ranks 14th among all variants. All of the surviving tanh variants had lower cost function values than the LP7 variant with the lowest cost function value (J = 1.150).
In Fig 8, we compare UKF state reconstruction for the optimal UKF variant and a selected variant that passed the forecast test but failed the state reconstruction test. Data are shown from the window that yielded the 95th percentile value for the failing variant. The failing variant diverges catastrophically from the true data at the end of the segment and only loosely tracks the downstrokes of F R, whereas the optimal variant maintains acceptable fidelity over the entire hour and closely tracks F R.
Figure 8:

True data and UKF reconstructions of the three states used for the error metrics are shown for the optimal variant and a selected variant that failed the state reconstruction test. The data are excerpted from the state reconstruction test window corresponding to the 95th percentile value for the failing variant, which uses double precision, tint = 2 s, trapezoidal integration, the tanh function, the SR-UKF, and spherical-simplex (13) sigma points.
3.6. Optimization computation time
The total computation time for the optimization was 139 h. The values of tcomp were generated from tests run on the target MCU, which took 17 min. The proxy PC was used to generate the forecast and reconstruction test data, which took 8 min and 138 h to generate, respectively. We summarize these times along with the number of variants assessed in each test in Table 4.
Table 4.
Optimization stage computation times
| Test | Variants tested | Time | Platform | Variants eliminated |
|---|---|---|---|---|
| 1. Computation time | 432 | 17 min | MCU | 105 |
| 2. Forecast | 327 | 8 min | PC | 152 |
| 3. State reconstruction | 175 | 138 h | PC | 20 |
The order chosen for the optimization stages minimizes the time required to compute the results for multiple reasons. The results from the computation time test can be obtained quickly by integrating one or two cycles of the UKF prediction-correction-forecast scheme, making this data relatively inexpensive to acquire over the full variable space at the outset. Placing the computation time test first also disqualifies the most time-intensive variants from later stages. Furthermore, the forecast test decouples the model variables from the UKF structure variables, thereby allowing a family of variants—six in our case—to be assessed together as a unit. And for this case study, the state reconstruction test was by far the most time intensive. Thus, placing it after the other two tests minimized the number of variants that passed through it.
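The effect of the stage ordering can be sketched from the figures in Table 4. The snippet below is a minimal illustration, not the authors' MATLAB code: it checks that each stage hands its survivors to the next and estimates how long the state reconstruction test would have taken on all 432 variants. That estimate assumes a uniform average cost per variant, which the paper does not report, and the function names are hypothetical.

```python
# Stage data transcribed from Table 4:
# (name, variants tested, variants eliminated, hours spent).
STAGES = [
    ("computation time", 432, 105, 17 / 60),
    ("forecast", 327, 152, 8 / 60),
    ("state reconstruction", 175, 20, 138.0),
]


def survivors_per_stage(stages):
    """Return how many variants pass each stage and check the funnel is consistent."""
    passed = []
    for i, (name, tested, eliminated, _hours) in enumerate(stages):
        remaining = tested - eliminated
        if i + 1 < len(stages):
            # Each stage must hand exactly its survivors to the next stage.
            assert remaining == stages[i + 1][1], name
        passed.append(remaining)
    return passed


def naive_reconstruction_hours(stages):
    """Illustrative estimate of the final-stage cost if no variant were filtered out.

    ASSUMPTION: every variant costs the same average time in the state
    reconstruction test; per-variant times are not given in the paper.
    """
    _name, tested, _eliminated, hours = stages[-1]
    hours_per_variant = hours / tested
    return hours_per_variant * stages[0][1]


passed = survivors_per_stage(STAGES)              # [327, 175, 155]
total_hours = sum(stage[3] for stage in STAGES)   # sum of the rounded stage times
naive_hours = naive_reconstruction_hours(STAGES)  # final stage run on all 432 variants

print(passed)
print(round(total_hours, 1), round(naive_hours, 1))
```

The rounded stage times sum to roughly 138.4 h, in line with the reported 139 h total given rounding. Under the uniform-cost assumption, running the state reconstruction test on all 432 variants would take on the order of 340 h, which shows why the two inexpensive tests are placed first.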
The forecast test involves integration of the model equations without correction from the UKF, thus allowing weaknesses in model variable combinations to become apparent. In contrast, the state reconstruction test includes the UKF correction step, which may mask weaknesses in variables associated with model integration. Therefore it is reasonable to assume that inadequate combinations of model variables are more likely to fail during the forecast test with uncorrected integration than during the state reconstruction test. However, combinations of model variables may potentially couple with combinations of UKF variables to highlight weaknesses in both. The assumption underlying sigma-point KFs is that the distribution of state is well defined by a multivariate Gaussian distribution. The UKF structure variables systematically determine sets of sigma points that represent the state distribution more or less effectively. Meanwhile, the model variables—different integrators and function approximations, for example—alter the shape of the forward integrated distribution with respect to the “ideal” state distribution. It is not guaranteed that these two effects—shortcomings in the arrangement of sigma points and simplification of the integrated dynamics—will not systematically align to expose weaknesses of both. This highlights the importance of testing all surviving combinations of model variables with all of the UKF structure variants in the state reconstruction test.
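The interplay described above between sigma-point placement and the propagated dynamics can be illustrated with a one-dimensional unscented transform. This is a textbook symmetric sigma-point set, not the 13- or 24-point sets used in the case study. The scaling value kappa = 2 and the function names are assumptions chosen for illustration.

```python
import math


def scalar_sigma_points(mean, var, kappa=2.0):
    """Symmetric sigma points and weights for a one-dimensional Gaussian (n = 1).

    ASSUMPTION: kappa = 2 is an illustrative scaling choice, not the paper's.
    """
    n = 1
    spread = math.sqrt((n + kappa) * var)  # scalar analogue of the matrix square root
    points = [mean, mean + spread, mean - spread]
    w_center = kappa / (n + kappa)
    w_outer = 1.0 / (2.0 * (n + kappa))
    return points, [w_center, w_outer, w_outer]


def unscented_transform(f, mean, var, kappa=2.0):
    """Propagate a scalar Gaussian through f; return the weighted mean and variance."""
    points, weights = scalar_sigma_points(mean, var, kappa)
    ys = [f(x) for x in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


# A linear map is captured exactly: the mean becomes 2*0.5 + 1 and the variance 4*0.04.
lin_mean, lin_var = unscented_transform(lambda x: 2.0 * x + 1.0, 0.5, 0.04)

# A nonlinear map (tanh) bends the distribution, so the propagated mean
# no longer equals tanh(mean).
tanh_mean, tanh_var = unscented_transform(math.tanh, 0.5, 0.04)

print(lin_mean, lin_var)
print(tanh_mean, math.tanh(0.5))
```

For the linear map the transform reproduces the exact mean and variance. For tanh the result depends on where the sigma points fall on the curve, so a change to the sigma-point set and a change to the propagated function can compound each other.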
MATLAB code for the optimization method is available on Penn State’s ScholarSphere at https://doi.org/doi:10.26207/bzgp-ph78.
4. Conclusion
We have presented a method for designing an embedded UKF system for a computationally constrained target device. The method seeks an optimum balance between fast computation time on the one hand and fidelity of state forecasting and reconstruction relative to a benchmark UKF on the other hand. We have described an optimization space of UKF system variables and a series of tests to assess UKF system performance in computation time, forecasting, and state reconstruction. We applied the optimization method to an embedded UKF case study that tracks the rat sleep-wake regulatory system. The original benchmark UKF in the case study computed a two-second observation cycle in 11.94 s on the target platform. After applying the series of tests, we ranked passing variants using a multi-objective cost function and presented sample reconstructions comparing the optimal UKF to the benchmark UKF. The optimal UKF reduces the computation time on the target platform by a factor of 27 compared to the benchmark UKF, and it maintains a high level of fidelity to the benchmark reconstruction relative to the benchmark error.
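As a quick arithmetic check of these figures, using the 0.439 s computation time reported for the optimal variant in Table 3 and the two-second observation cycle:

```latex
\frac{t_{\mathrm{comp}}^{\mathrm{benchmark}}}{t_{\mathrm{comp}}^{\mathrm{optimal}}}
  = \frac{11.94\ \mathrm{s}}{0.439\ \mathrm{s}} \approx 27.2,
\qquad
\frac{0.439\ \mathrm{s}}{2\ \mathrm{s}} \approx 0.22
  \;<\; 1 \;<\;
\frac{11.94\ \mathrm{s}}{2\ \mathrm{s}} \approx 5.97
```

The optimal variant therefore finishes one cycle in about a fifth of the observation interval, whereas the benchmark needs roughly six observation intervals per cycle on the target platform.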
We again emphasize the two aspects of the UKF that allow the computations to be simplified while maintaining relative accuracy: 1) the magnitude of the process noise, which provides leeway for reducing the complexity of the forecast integration, and 2) the influence of the Kalman correction step, which constrains the state estimate error in spite of simplifications made to the UKF structure. We also highlight that the optimization process was accelerated by evaluating the model variables in isolation from the UKF structure variables during the forecast test and by ordering the tests to eliminate as many variants as possible before the time-intensive state reconstruction test.
Embedded systems are ad hoc by nature, and the presented framework should be modified or expanded as needed, for example, by adding fixed-point types to the precision variable. However, the underlying themes developed in our method should remain applicable across a broad range of applications.
Future work includes deploying an embedded UKF system to predict sleep-state transitions for in vivo epilepsy experiments in rat models. The signal processing of raw local field potential data must be translated from desktop PC software algorithms to FPGA hardware, and mechanisms to resume UKF state reconstruction after seizure events will be investigated.
The UKF is a powerful tool with many possible applications in biological research and biomedical devices that are only beginning to be explored. In this paper, we have demonstrated that a model that at first appears to be too complex to be realized in a low-power system can be scaled down in a way that maintains the quality of state forecasting and reconstruction. It is hoped that the results presented here will aid others in realizing UKF systems in highly portable embedded platforms.
Funding:
P.P.G. was supported by the Penn State College of Engineering Leighton Riess Graduate Fellowship in Biodevice Research and the Penn State Department of Electrical Engineering. B.J.G.’s effort was partially supported through NIH R01EB019804. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Footnotes
CRediT authorship contribution statement
Philip P. Graybill: Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing. Bruce J. Gluckman: Conceptualization, Funding acquisition, Methodology, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing. Mehdi Kiani: Funding acquisition, Resources, Supervision, Writing – review & editing.
Before this transformation, the sigma points may be regenerated using and . This optional step has the advantage of placing the sigma points on the isoprobability defined by before transformation. In this paper, we have omitted this step because of the simple linear nature of our observation function.
References
- Albers DJ, Levine M, Gluckman B, Ginsberg H, Hripcsak G, Mamykina L, 2017. Personalized glucose forecasting for type 2 diabetes using data assimilation. PLoS Computational Biology 13. doi: 10.1371/journal.pcbi.1005232. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bahari F, Kimbugwe J, Alloway KD, Gluckman BJ, 2021. Model-based analysis and forecast of sleep-wake regulatory dynamics: Tools and applications to data. Chaos 31. doi: 10.1063/5.0024024. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bahari F, Tulyaganova C, Billard M, Alloway K, Gluckman BJ, 2017. The neural basis for sleep regulation - Data assimilation from animal to model. Conference Record - Asilomar Conference on Signals, Systems and Computers, 1061–1065 doi: 10.1109/ACSSC.2016.7869532. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Billard MW, Bahari F, Kimbugwe J, Alloway KD, Bruce J, 2018. The systemDrive : a Multisite , Multiregion Microdrive with Independent Drive Axis Angling for Chronic Multimodal Systems Neuroscience Recordings in Freely Behaving Animals. eNeuro 5, 1–19. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Diniz Behn CG, Booth V, 2010. Simulating Microinjection Experiments in a Novel Model of the Rat Sleep-Wake Regulatory Network. Journal of Neurophysiology 103, 1937–1953. URL: http://jn.physiology.org/cgi/doi/10.1152/jn.00795.2009, doi: 10.1152/jn.00795.2009. [DOI] [PubMed] [Google Scholar]
- Duník J, šimandl M, Straka O, 2012. Unscented Kalman Filter: Aspects and Adaptive Setting of Scaling Parameter. IEEE Transactions on Automatic Control 57, 2411–2416. [Google Scholar]
- Duník J, Straka O, Kost O, Havlík J, 2017. Noise covariance matrices in state-space models: A survey and comparison of estimation methods—Part I. International Journal of Adaptive Control and Signal Processing 31, 1505–1543. doi: 10.1002/acs.2783. [DOI] [Google Scholar]
- Fico VM, Arribas CP, Soaje AR, Prats MAM, Utrera SR, Vazquez ALR, Casquet LMP, 2015. Implementing the unscented Kalman filter on an embedded system: A lesson learnt, in: Proceedings of the IEEE International Conference on Industrial Technology, IEEE, Seville, Spain. pp. 2010–2014. doi: 10.1109/ICIT.2015.7125391. [DOI] [Google Scholar]
- Graybill P, Gluckman BJ, Kiani M, 2019. Toward a Wearable Data Assimilation Platform, in: IEEE Biomedical Circuits and Systems Conference, Nara, Japan. [Google Scholar]
- Grewal MS, Andrews AP, 2015. Kalman Filtering: Theory and Practice with MATLAB. 4th ed., Wiley, Hoboken, NJ, USA. [Google Scholar]
- Julier S, Uhlmann J, Durrant-Whyte HF, 2000. A New Method for the Nonlinear Transformation of Means and Covariances in Filters and Estimators. IEEE Transactions on Automatic Control 45, 477–482. [Google Scholar]
- Julier SJ, 2003. The Spherical Simplex Unscented Transformation, in: Proceedings of the American Control Conference, IEEE, Denver, Colorado, USA. pp. 2430–2434. doi: 10.1109/acc.2003.1243439. [DOI] [Google Scholar]
- Julier SJ, Uhlmann JK, 1997. New extension of the Kalman filter to nonlinear systems, in: Signal Processing, Sensor Fusion, and Target Recognition VI, SPIE, Orlando, FL, USA. pp. 182–193. doi: 10.1117/12.280797. [DOI] [Google Scholar]
- Julier SJ, Uhlmann JK, 2002. Reduced sigma point filters for the propagation of means and covariances through nonlinear transformations. Proceedings of the American Control Conference 2, 887–892. doi: 10.1109/ACC.2002.1023128. [DOI] [Google Scholar]
- Kaminski PG, Bryson AE, Schmidt SF, 1971. Discrete Square Root Filtering: A Survey of Current Techniques. doi: 10.1109/TAC.1971.1099816. [DOI] [Google Scholar]
- Kolås S, Foss BA, Schei TS, 2009. Constrained nonlinear state estimation based on the UKF approach. Computers and Chemical Engineering 33, 1386–1401. doi: 10.1016/j.compchemeng.2009.01.012. [DOI] [Google Scholar]
- Kuhlmann L, Freestone DR, Manton JH, Heyse B, Vereecke HE, Lipping T, Struys MM, Liley DT, 2016. Neural mass model-based tracking of anesthetic brain states. NeuroImage 133, 438–456. URL: 10.1016/j.neuroimage.2016.03.039, doi: 10.1016/j.neuroimage.2016.03.039. [DOI] [PubMed] [Google Scholar]
- Kulikov GY, Kulikova MV, 2021. Itô-Taylor-based square-root unscented Kalman filtering methods for state estimation in nonlinear continuous-discrete stochastic systems. European Journal of Control 58, 101–113. URL: https://www.sciencedirect.com/science/article/pii/S0947358020301448, doi: 10.1016/j.ejcon.2020.07.003. [DOI] [Google Scholar]
- Kulikov GY, Kulikova MV, 2022. Hyperbolic-SVD-Based Square-Root Unscented Kalman Filters in Continuous-Discrete Target Tracking Scenarios. IEEE Transactions on Automatic Control 67, 366–373. doi: 10.1109/TAC.2021.3056338. [DOI] [Google Scholar]
- Kulikova MV, Lima PM, Kulikov GY, 2022. Sequential method for fast neural population activity reconstruction in the cortex from incomplete noisy measurements. Computers in Biology and Medicine 141. doi: 10.1016/j.compbiomed.2021.105103. [DOI] [PubMed] [Google Scholar]
- van der Merwe R, Wan EA, 2001. The square-root unscented Kalman filter for state and parameter-estimation, in: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, Salt Lake City, Utah, USA. pp. 3461–3464. doi: 10.1109/icassp.2001.940586. [DOI] [Google Scholar]
- Meskin N, Nounou H, Nounou M, Datta A, 2013. Parameter estimation of biological phenomena: An unscented Kalman filter approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics 10, 537–543. doi: 10.1109/TCBB.2013.19. [DOI] [PubMed] [Google Scholar]
- Narasimhan S, Rengaswamy R, 2009. Reply to Comments on “Robust and reliable estimation via unscented recursive nonlinear dynamic data reconciliation” (URNDDR). Journal of Process Control 19, 719–721. doi: 10.1016/j.jprocont.2008.08.002. [DOI] [Google Scholar]
- Rhudy M, Gu Y, Gross J, Napolitano MR, 2011. Evaluation of matrix square root operations for UKF within a UAV GPS/INS sensor fusion application. International Journal of Navigation and Observation 2011. doi: 10.1155/2011/416828. [DOI] [Google Scholar]
- Saatci E, Akan A, 2009. Dual Unscented Kalman Filter and Its Applications to Respiratory System Modelling, in: Moreno VM, Pigazo A (Eds.), Kalman Filter: Recent Advances and Applications. IntechOpen, Vienna, AUS. chapter 9. doi: 10.5772/6807. [DOI] [Google Scholar]
- Sakai A, Kuroda Y, 2010. Discriminatively Trained Unscented Kalman Filter for Mobile Robot Localization. Journal of Advanced Research in Mechanical Engineering 1, 153–161. URL: http://www.hypersciences.org/JARME/Iss.3-2010/JARME-5-3-2010.pdf. [Google Scholar]
- Scardua LA, da Cruz JJ, 2017. Complete offline tuning of the unscented Kalman filter. Automatica 80, 54–61. URL: 10.1016/j.automatica.2017.01.008, doi: 10.1016/j.automatica.2017.01.008. [DOI] [Google Scholar]
- Schiff SJ, Sauer T, 2008. Kalman filter control of a model of spatiotemporal cortical dynamics. Journal of Neural Engineering 5, 1–8. doi: 10.1088/1741-2560/5/1/001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sedigh-Sarvestani M, Schiff SJ, Gluckman BJ, 2012. Reconstructing Mammalian Sleep Dynamics with Data Assimilation. PLoS Computational Biology 8. doi: 10.1371/journal.pcbi.1002788. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Simon D, 2006. Optimal State Estimation: Kalman, H-Infinity, and Nonlinear Approaches. John Wiley & Sons, Inc., Hoboken, NJ, USA. [Google Scholar]
- Simon DJ, 2010. Kalman Filtering With State Constraints: A Survey of Linear and Nonlinear Algorithms. IET Control Theory & Applications 4, 1303–1318. URL: https://engagedscholarship.csuohio.edu/enece_facpub. [Google Scholar]
- Soh J, Wu X, 2012. A FPGA-based approach to attitude determination for nanosatellites, in: Proceedings of the 7th IEEE Conference on Industrial Electronics and Applications (ICIEA), IEEE. pp. 1700–1704. doi: 10.1109/ICIEA.2012.6360999. [DOI] [Google Scholar]
- Soh J, Wu X, 2014. A modular FPGA-based implementation of the unscented Kalman filter, in: NASA/ESA Conference on Adaptive Hardware and Systems (AHS), pp. 127–134. [Google Scholar]
- Soh J, Wu X, 2017. An FPGA-Based Unscented Kalman Filter for System-On-Chip Applications. IEEE Transactions on Circuits and Systems II: Express Briefs 64, 447–451. doi: 10.1109/TCSII.2016.2565730. [DOI] [Google Scholar]
- Straka O, Duník J, šimandl M, 2014. Unscented Kalman filter with advanced adaptation of scaling parameter. Automatica 50, 2657–2664. URL: 10.1016/j.automatica.2014.08.030, doi: 10.1016/j.automatica.2014.08.030. [DOI] [Google Scholar]
- Straka O, Duník J, šimandl M, Havlík J, 2013. Aspects and Comparison of Matrix Decompositions in Unscented Kalman Filter, in: American Control Conference, IEEE, Washington, DC, USA. pp. 3075–3080. [Google Scholar]
- Teixeira BO, Tôrres LA, Aguirre LA, Bernstein DS, 2010. On unscented Kalman filtering with state interval constraints. Journal of Process Control 20, 45–57. URL: 10.1016/j.jprocont.2009.10.007, doi: 10.1016/j.jprocont.2009.10.007. [DOI] [Google Scholar]
- Turner R, Rasmussen CE, 2012. Model based learning of sigma points in unscented Kalman filtering. Neurocomputing 80, 47–53. URL: 10.1016/j.neucom.2011.07.029, doi: 10.1016/j.neucom.2011.07.029. [DOI] [Google Scholar]
- Ullah G, Schiff SJ, 2010. Assimilating Seizure Dynamics. PLoS Computational Biology 6. doi: 10.1371/journal.pcbi.1000776. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vachhani P, Narasimhan S, Rengaswamy R, 2006. Robust and reliable estimation via Unscented Recursive Nonlinear Dynamic Data Reconciliation. Journal of Process Control 16, 1075–1086. doi: 10.1016/j.jprocont.2006.07.002. [DOI] [Google Scholar]
- Valade A, Acco P, Grabolosa P, Fourniols JY, 2017. A study about Kalman filters applied to embedded sensors. Sensors (Switzerland) 17, 1–18. doi: 10.3390/s17122810. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Voss HU, Timmer J, Kurths J, 2004. Nonlinear dynamical system identification from uncertain and indirect measurements. International Journal of Bifurcation and Chaos in Applied Sciences and Engineering 14, 1905–1933. doi: 10.1142/S0218127404010345. [DOI] [Google Scholar]
- Zhu X, Jiang R, Chen Y, Hu S, Wang D, 2011. FPGA implementation of Kalman filter for neural ensemble decoding of rat’s motor cortex. Neurocomputing 74, 2906–2913. doi: 10.1016/j.neucom.2011.03.044. [DOI] [Google Scholar]
