Robust spectrotemporal decomposition by iteratively reweighted least squares

Demba Ba; Behtash Babadi; Patrick L Purdon; Emery N Brown

doi:10.1073/pnas.1320637111

. 2014 Dec 2;111(50):E5336–E5345. doi: 10.1073/pnas.1320637111

Robust spectrotemporal decomposition by iteratively reweighted least squares

Demba Ba ^a,^b,¹, Behtash Babadi ^c, Patrick L Purdon ^b,^d, Emery N Brown ^a,^b,^d,^e

PMCID: PMC4273341 PMID: 25468968

Significance

Classical spectral estimation techniques use sliding windows to enforce temporal smoothness of the spectral estimates of signals with time-varying spectrotemporal representations. This widely applied approach is not well-suited to signals that have low-dimensional, highly structured time–frequency representations. We develop a new Bayesian spectral decomposition framework—spectrotemporal pursuit—to compute spectral estimates that are smooth in time and sparse in frequency. We use a statistical interpretation of sparse recovery to derive efficient algorithms for computing spectrotemporal pursuit spectral estimates. We apply spectrotemporal pursuit to achieve a more precise delineation of the oscillatory structure of human electroencephalogram and neural spiking data under propofol general anesthesia. Spectrotemporal pursuit offers a principled alternative to existing methods for decomposing a signal into a small number of oscillatory components.

Keywords: structured sparsity, spectral decomposition, dynamics, recursive estimation, neural signal processing

Abstract

Classical nonparametric spectral analysis uses sliding windows to capture the dynamic nature of most real-world time series. This universally accepted approach fails to exploit the temporal continuity in the data and is not well-suited for signals with highly structured time–frequency representations. For a time series whose time-varying mean is the superposition of a small number of oscillatory components, we formulate nonparametric batch spectral analysis as a Bayesian estimation problem. We introduce prior distributions on the time–frequency plane that yield maximum a posteriori (MAP) spectral estimates that are continuous in time yet sparse in frequency. Our spectral decomposition procedure, termed spectrotemporal pursuit, can be efficiently computed using an iteratively reweighted least-squares algorithm and scales well with typical data lengths. We show that spectrotemporal pursuit works by applying to the time series a set of data-derived filters. Using a link between Gaussian mixture models, $ℓ_{1}$ minimization, and the expectation–maximization algorithm, we prove that spectrotemporal pursuit converges to the global MAP estimate. We illustrate our technique on simulated and real human EEG data as well as on human neural spiking activity recorded during loss of consciousness induced by the anesthetic propofol. For the EEG data, our technique yields significantly denoised spectral estimates that have significantly higher time and frequency resolution than multitaper spectral estimates. For the neural spiking data, we obtain a new spectral representation of neuronal firing rates. Spectrotemporal pursuit offers a robust spectral decomposition framework that is a principled alternative to existing methods for decomposing time series into a small number of smooth oscillatory components.

Across nearly all fields of science and engineering, dynamic behavior in time-series data, due to evolving temporal and/or spatial features, is a ubiquitous phenomenon. Common examples include speech (1), image, and video (2) signals; neural spike trains (3) and EEG (4) measurements; seismic and oceanographic recordings (5); and radar emissions (6). Because the temporal and spatial dynamics in these time series are often complex, nonparametric spectral techniques, rather than parametric, model-based approaches (7), are the methods most widely applied in the analysis of these data. Nonparametric spectral techniques based on Fourier methods (8, 9), wavelets (10, 11), and data-dependent approaches, such as the empirical mode decomposition (EMD) (12, 13), use sliding windows to take account of the dynamic behavior. Although analysis with sliding windows is universally accepted, this approach has several drawbacks.

First, the spectral estimates computed in a given window do not use the estimates computed in adjacent windows, hence the resulting spectral representations do not fully capture the degree of smoothness inherent in the underlying signal. Second, the uncertainty principle (14) imposes stringent limits on the spectral resolution achievable by Fourier-based methods within a window (8, 9). Because the spectral resolution is inversely proportional to the window length, sliding window-based spectral analyses are problematic when the signal dynamics occur at a shorter time-scale than the window length. Third, in many analyses, such as EEG studies (15), speech processing (1), and applications of EMD (13), a common objective is to compute time–frequency representations that are smooth (continuous) in time and sparse in frequency. Current spectral estimation procedures are not specifically tailored to achieve smoothness in time and sparsity in frequency. Finally, batch time-series analyses are also common in many applications (5, 12, 13, 15). Although the batch analyses can use all of the data in the recorded time series to estimate the time–frequency representation at each time point, spectral estimation limited to local windows remains the solution of choice because the computational demands of batch analyses scale poorly with the length of the time series. Using all of the data in batch spectral analyses would enhance both time and frequency resolution.

For a time series whose time-varying mean is the superposition of a small number of smooth oscillatory components, we formulate nonparametric batch spectral analysis as a Bayesian estimation problem. We assume a Gaussian or a point-process observation model for the time series and introduce prior distributions on the time–frequency plane that yield maximum a posteriori (MAP) spectral estimates that are smooth (continuous) in time yet sparse in frequency. Our choice of prior distributions is motivated by EMD (13), and its variants (11, 16), which decompose signals into a small number of oscillatory components. We term our procedure “spectrotemporal pursuit.” To compute the spectrotemporal pursuit spectral estimate, we develop highly efficient, recursive, iteratively reweighted least-squares (IRLS) algorithms in which each iteration is implemented by using fixed interval smoothing algorithms (17, 18). We prove that the spectrotemporal pursuit spectral estimates converge to the global MAP estimates by using an important link between Gaussian mixture models, $ℓ_{1}$ minimization, and the expectation–maximization algorithm (19, 20). We show that computation of the spectrotemporal pursuit spectral estimates is equivalent to applying to the time series a bank of data-dependent filters, one for each oscillatory component.

We illustrate spectrotemporal pursuit in simulation studies as well as in analyses of human EEG and neural spiking data (15, 21) recorded during unconsciousness induced by the anesthetic propofol. The spectrotemporal pursuit analysis of the EEG data yields significantly denoised spectral estimates that have higher time and frequency resolution than multitaper spectral estimates. Our analysis of the neural spiking data yields a new spectral description of neural firing rates that can further our understanding of the relationship of spiking activity to local field potentials.

Toy Example

We begin with a toy example to highlight the deficiencies of classical techniques in analyzing time series that exhibit dynamic behavior and the limits they impose on spectral resolution (8, 9). We simulated noisy observations from the linear combination of two amplitude-modulated signals as

\begin{array}{l} y_{t} = 10 \cos^{8} (2 π f_{0} t) \sin (2 π f_{1} t) \\ + 10 \exp (4 \frac{t - T}{T}) \cos (2 π f_{2} t) + v_{t}, for 0 \leq t \leq T, \end{array}

[1]

where $f_{0} = 0.04$ Hz, $f_{1} = 10$ Hz, $f_{2} = 11$ Hz, $T = 600 s$ , and ${(v_{t})}_{t = 1}^{T}$ is independent, identically distributed, zero-mean Gaussian noise with variance set to achieve a signal-to-noise ratio (SNR) of 5 dB. The simulated data (Fig. 1) consist of a 10-Hz oscillation whose amplitude is modulated by a slow 0.04-Hz oscillation, and an exponentially growing 11-Hz oscillation. The former is motivated by the fact that low-frequency (<1 Hz) phase modulates alpha (8–12 Hz) amplitude during profound unconsciousness, and during the transition into and out of unconsciousness, under propofol-induced general anesthesia (15, 21). We incorporated the latter to demonstrate the desire, in certain applications, to resolve closely spaced amplitude-modulated signals.

Fig. 1. — Linear combination of two amplitude-modulated oscillations at 10- and 11-Hz frequencies, respectively, observed in independent, identically distributed, additive, zero-mean Gaussian noise with variance set to achieve an SNR of 5 dB. (A) The full simulated data according to Eq. 1. (B) Zoomed-in view from $t = 295 s$ to $t = 305 s$ .

This toy example poses serious challenges for classical spectral estimation algorithms due to the strong amplitude modulation and dynamic behaviors, observation noise, and identifiability issues (the decomposition is not unique) (11).

In the next section, we develop an analysis paradigm to separate signals such as the 10-and 11-Hz oscillations and recover the time-varying modulating signals. The key to our spectral decomposition framework is the concept of a structured time–frequency representation.

Theory: Robust Spectral Decomposition

The State-Space Model.

Consider a discrete-time signal $y_{t}, t = 1,2, \dots, T$ obtained by sampling of an underlying, noise-corrupted, continuous-time signal at a rate $f_{s}$ (above the Nyquist rate). Given an arbitrary interval of length W, let $y_{n} ≔ {(y_{(n - 1) W + 1}, y_{(n - 1) W + 2}, \dots, y_{n W})}^{'}$ for $n = 1,2, \dots, N$ with $N ≔ \frac{T}{W}$ . Without loss of generality, we assume that T is an integer multiple of W and consider the following spectrotemporal representation of $y_{n}$ as

y_{n} = {\tilde{F}}_{n} {\tilde{x}}_{n} + v_{n},

[2]

where ${({\tilde{F}}_{n})}_{l, k} ≔ \exp (j 2 π ((n - 1) W + l) \frac{(k - 1)}{K})$ for $l = 1,2, \dots, W$ and $k = 1,2, \dots, K$ , ${\tilde{x}}_{n} ≔ {({\tilde{x}}_{n, 1}, {\tilde{x}}_{n, 2}, \dots, {\tilde{x}}_{n, K})}^{'} \in ℂ^{K}$ , with K being a positive integer, and $v_{n}$ is independent, identically distributed, additive zero-mean Gaussian noise. Equivalently, we can define the linear observation model of Eq. 2 over a real vector space as follows:

y_{n} = F_{n} x_{n} + v_{n},

[3]

where ${(F_{n})}_{l, k} : = \cos (2 π ((n - 1) W + l) \frac{(k - 1)}{K})$ and ${(F_{n})}_{l, k + K / 2} : = \sin (2 π ((n - 1) W + l) \frac{k - 1 + K / 2}{K})$ for $l = 1,2, \dots, W$ and $k = 1,2, \dots, \frac{K}{2}$ , and $x_{n} ≔ {(x_{n, 1}, x_{n, 2}, \dots, x_{n, K})}^{'} \in ℝ^{K}$ . We may rewrite Eq. 3 conveniently in vector form as $y = F x + v$ , where F is a $T \times N K$ block-diagonal matrix with $F_{n}$ on the diagonal blocks:

F ≔ (\begin{matrix} F_{1} \\ F_{2} \\ ⋱ \\ F_{N} \end{matrix}),

[4]

$y = {(y_{1}, y_{2}, \dots, y_{T})}^{'} \in ℝ^{T}$ , $x = {({x'}_{1}, {x'}_{2}, \dots, {x'}_{N})}^{'} \in ℝ^{N K}$ , and $v = {({v'}_{1}, {v'}_{2}, \dots, {v'}_{N})}^{'} \in ℝ^{T}$ . We view x as a time–frequency representation of the time-varying mean of the signal y.

As we show in A Spectrotemporal Pursuit Analysis of Neural Spiking Activity, we can generalize this linear Gaussian forward model to nonlinear spectrotemporal parameterizations of the joint distribution of non-Gaussian data.

Our objective is to compute an estimate $\hat{x}$ of x given the data y. The component-wise magnitude-squared of $\hat{x} \in ℝ^{N K}$ then gives an estimate of the magnitude spectrum of y. Classical spectral estimation techniques use sliding windows with overlap to implicitly enforce temporal smoothness of the oscillatory components; i.e., adjacent spectral estimates are intended to be close in value. However, rather than being stated explicitly in the form of a model (deterministic or stochastic) for the evolution of the spectral estimates, the temporal smoothness is implicit in the degree of nonoverlap of the respective windows. Moreover, these techniques do not consider sparsity in the frequency domain. In contrast, we take a direct approach that treats ${(x_{n})}_{n = 1}^{N}$ as the realization of a sequence of random variables and uses a prior distribution to impose explicitly a stochastic continuity constraint on its elements across time. We impose a model on the components ${(x_{n, k})}_{k = 1}^{K}$ for each $n = 1,2, \dots, N$ to enforce sparsity in the frequency domain. Starting with an initial condition $x_{0} = {(0, \dots, 0)}^{'} \in ℝ^{K}$ , we can express the stochastic continuity constraint in the form of the first-order difference equation

x_{n} = x_{n - 1} + w_{n},

[5]

where $w = {(w_{1}', w_{2}', \dots, w_{N}')}^{'} \in ℝ^{N K}$ is a random vector. To impose stochastic continuity, we assume a joint prior probability density function over $w_{1}, w_{2}, \dots, w_{N}$ , which in turn imposes a joint probability density function on ${(x_{n})}_{n = 1}^{N}$ . Motivated by the empirical mode decomposition (13) and its variants (11, 16), we choose prior densities that enforce sparsity in the frequency domain and smoothness in time. In logarithmic form, the priors we propose are

\log p_{1} (w_{1}, w_{2}, \dots, w_{N}) = - α_{1} \sum_{k = 1}^{K} {(\sum_{n = 1}^{N} w_{n, k}^{2} + ϵ^{2})}^{\frac{1}{2}} + c_{1},

[6]

and

\log p_{2} (w_{1}, w_{2}, \dots, w_{N}) = - α_{2} \sum_{k = 1}^{K} {(\sum_{n = 1}^{N} {(w_{n, k}^{2} + ϵ^{2})}^{\frac{1}{2}})}^{\frac{1}{2}} + c_{2},

[7]

where $α_{1} > 0$ and $α_{2} > 0$ are constants, $e^{c_{1}}$ and $e^{c_{2}}$ are normalization constants, and $ϵ > 0$ is a small constant. These priors belong to the Gaussian scale mixture (GSM) family of densities (19, 20). This family of densities has robustness properties in the statistical sense (22). Both $p_{1} (\cdot)$ and $p_{2} (\cdot)$ enforce inverse solutions that are spatially sparse and temporally smooth. Indeed, one can interpret Eqs. 6 and 7 as forms of dynamic group-sparse regularization (23). Unlike $p_{1} (\cdot)$ , $p_{2} (\cdot)$ can capture abrupt changes in the dynamics of the inverse solution. Under each of these priors, the discrete-time stochastic process ${(x_{n})}_{n = 1}^{N}$ is a non-Gaussian random process whose increments are statistically dependent.

The Inverse Solution: Spectrotemporal Pursuit.

We formulate the problem of computing a robust spectral decomposition as one of Bayesian estimation in which the posterior density of x given y fully characterizes the space of inverse solutions. The forward model of Eq. 3 specifies the likelihood of the data, that is to say the conditional density of y given x. We assume that the observation noise $v_{n}$ are samples from independent, identically distributed, zero-mean Gaussian random vectors with covariance $σ^{2} I$ . To simplify the notation, we let

f_{i} (x_{1}, x_{2}, \dots, x_{N}) ≔ \log p_{i} (x_{1} - x_{0}, x_{2} - x_{1}, \dots, x_{N} - x_{N - 1})

[8]

for $i = 1$ and 2 in what follows. We compute robust spectral estimates by solving the MAP estimation problem

max_{x_{1}, \dots, x_{N}} - \sum_{n = 1}^{N} \frac{1}{2 σ^{2}} {‖ y_{n} - F_{n} x_{n} ‖}_{2}^{2} + f (x_{1}, x_{2}, \dots, x_{N}) .

[9]

We call the MAP estimation problem of Eq. 9 the spectrotemporal pursuit problem, and its solution the spectrotemporal pursuit estimate. To solve this optimization problem, we can absorb the constant σ in $f (\cdot)$ (specifically in $α_{1}$ in the case of Eq. 6 and $α_{2}$ in the case of Eq. 7). Therefore, henceforth, we assume that $σ = 1$ . In SI Text, we give a continuous-time variational interpretation of spectrotemporal pursuit.

Eq. 9, with $f (\cdot) = f_{1} (\cdot)$ , is a strictly concave optimization problem, which, in principle, can be solved using standard techniques. However, our experience has shown that these techniques do not scale well with N because of the batch nature of the problem. We use the Bayesian formulation of spectrotemporal pursuit—in particular, the relationship between sparsity-promoting priors and the expectation–maximization (EM) algorithm, to develop highly efficient IRLS algorithms that exploit the temporal structure of Eq. 9. Gradient-based algorithms (24, 25) are a popular alternative to IRLS for finding the maximizer of Eq. 9 with $ϵ = 0$ . One of the advantages of our Bayesian formulation is the ability to characterize the uncertainty of the estimate in the form of confidence bounds. We discuss this in further detail in SI Text, Advantages of IRLS over Gradient-Based Methods.

Link of Spectrotemporal Pursuit to Basis Pursuit Denoising, IRLS Algorithms, and Sparse Recovery.

Spectrotemporal pursuit builds on results in basis pursuit denoising (BPDN) and IRLS algorithms developed to compute sparse decompositions in noise-free systems. Chen et al. (26) demonstrated using the BPDN algorithm that static signals can be decomposed into sparse representations using finite dictionaries obtained by discretizing in the frequency domain. We termed our procedure spectrotemporal pursuit to emphasize the link with BPDN. Daubechies et al. (27) showed under mild conditions that in the absence of noise, IRLS algorithms can recover sparse signals. It is straightforward to see that the IRLS algorithm of Daubechies et al. (27) solves a BPDN-type optimization problem. We recently extended the work of Daubechies et al. (27) to solve the problem of sparse recovery in the presence of noise (20), and broadened the family of IRLS algorithms. The IRLS algorithms we derive in the next section belong to the broader class of IRLS algorithms we introduced (20). The key insight from those results, which we apply here, is that this broad class of IRLS algorithms can be used in the context of Bayesian estimation of state-space models (28) to compute highly structured spatiotemporal decompositions efficiently. In SI Text, we elaborate on these relationships.

An IRLS Algorithm for Spectrotemporal Pursuit.

We show that we can obtain the solution $\hat{x}$ to Eq. 9 as the limit of a sequence ${({\hat{x}}^{(ℓ)})}_{ℓ = 0}^{\infty}$ whose $ℓ^{th}$ element, $ℓ = 1, \dots, \infty$ , is the solution to a Gaussian MAP estimation problem (constrained least-squares program) of the form

max_{x_{1}, \dots, x_{N}} - \sum_{n = 1}^{N} \frac{1}{2 σ^{2}} {‖ y_{n} - F_{n} x_{n} ‖}_{2}^{2} - \sum_{k = 1}^{K} \sum_{n = 1}^{N} \frac{{(x_{n, k} - x_{n - 1, k})}^{2}}{2 {(Q_{n}^{(ℓ)})}_{k, k}} .

[10]

For each ℓ, $Q_{n}^{(ℓ)}$ is a $K \times K$ diagonal matrix that depends on ${\hat{x}}_{n - 1}^{(ℓ - 1)}, {\hat{x}}_{n}^{(ℓ - 1)}$ and $f (\cdot)$ . For instance, for $f (\cdot) = f_{1} (\cdot)$ , $Q_{n}^{(ℓ)} = Q^{(ℓ)}$ does not depend on n and we have

{(Q^{(ℓ)})}_{k, k} = \frac{{(\sum_{n = 1}^{N} {({\hat{x}}_{n, k}^{(ℓ - 1)} - {\hat{x}}_{n - 1, k}^{(ℓ - 1)})}^{2} + ϵ^{2})}^{\frac{1}{2}}}{α_{1}} .

[11]

Eq. 10 is a quadratic program with strictly concave objective and block-tridiagonal Hessian. The fixed interval smoother (17) exploits this structure to give an efficient solution to this program via forward–backward substitution. This iterative solution to the spectrotemporal pursuit problem, which we refer to as the spectrotemporal pursuit algorithm, can be implemented using the following steps:

Input: Observations y, initial guess ${\hat{x}}^{(0)}$ of solution, state-noise covariances $Q_{n}^{(0)}$ , $n = 1, \dots, N$ , initial conditions $x_{0 | 0}$ , $Σ_{0 | 0}$ , tolerance $t o l \in (0,0.01)$ , and maximum number of iteration $L_{max} \in ℕ^{+}$ .

0.
Initialize iteration number ℓ to 1.
1.
Filter at time $n = 1,2, \dots, N$ :
$\begin{array}{l} x_{n | n - 1} = x_{n - 1 | n - 1} \\ Σ_{n | n - 1} = Σ_{n - 1 | n - 1} + Q_{n}^{(ℓ)} \\ K_{n} = Σ_{n | n - 1} F_{n}^{H} {(F_{n} Σ_{n | n - 1} F_{n}^{H} + σ^{2} I)}^{- 1} \\ x_{n | n} = x_{n | n - 1} + K_{n} (y_{n} - F_{n} x_{n | n - 1}) \\ Σ_{n | n} = Σ_{n | n - 1} - K_{n} F_{n} Σ_{n | n - 1} \end{array}$
2.
Smoother at time $n = N - 1, N - 2, \dots, 1$ :
$\begin{array}{l} B_{n} = Σ_{n | n} Σ_{n + 1 | n}^{- 1} \\ x_{n | N} = x_{n | n} + B_{n} (x_{n + 1 | N} - x_{n + 1 | n}) \\ Σ_{n | N} = Σ_{n | n} + B_{n} (Σ_{n + 1 | N} - Σ_{n + 1 | n}) B_{n}^{H} \end{array}$
3.
Let ${\hat{x}}_{n}^{(ℓ)} = x_{n | N}, n = 1, \dots, N$ and ${\hat{x}}^{(ℓ)} = {({\hat{x}}_{1}^{{(ℓ)}^{'}}, \dots, {\hat{x}}_{N}^{{(ℓ)}^{'}})}^{'}$ .
4.
Stop if $\frac{{‖ {\hat{x}}^{(ℓ)} - {\hat{x}}^{(ℓ - 1)} ‖}_{2}}{{‖ {\hat{x}}^{(ℓ - 1)} ‖}_{2}} < t o l$ or $ℓ = L_{max}$ , else
5.
Let $ℓ = ℓ + 1$ , and update the state covariance $Q_{n}^{(ℓ)}$ . For $f (\cdot) = f_{1} (\cdot)$ , $Q_{n}^{(ℓ)} = Q^{(ℓ)}$ is given by
${(Q^{(ℓ)})}_{k, k} = \frac{{(\sum_{n = 1}^{N} {({\hat{x}}_{n, k}^{(ℓ - 1)} - {\hat{x}}_{n - 1, k}^{(ℓ - 1)})}^{2} + ϵ^{2})}^{\frac{1}{2}}}{α_{1}}$
6.
Go back to 1.

Output: ${\hat{x}}^{(L)}$ , where $L \leq L_{max}$ is the number of the last iteration of the algorithm.

For $f (\cdot) = f_{1} (\cdot)$ , starting with ${\hat{x}}^{(0)}$ a guess of the solution, we solve Eq. 10 for $ℓ = 1,2, \dots, L$ with $Q^{(ℓ)}$ iteratively updated using Eq. 11. L is the smallest of $L_{max}$ , a prespecified maximum number of iterations, and the number of iterations when the convergence criterion of step 4 of the spectrotemporal pursuit algorithm is first satisfied. Consistent with previous reports (20), we have found that a small number of IRLS iterations (between 5 and 10) are sufficient in practice. Due to the dynamic nature of our state-space model, the MAP estimation problem Eq. 10 admits an iterative solution given by the fixed interval smoother. As noted previously, $Q^{(ℓ)}$ is independent of n for the penalization function $f_{1} (\cdot)$ .

Solving the IRLS problem in Eq. 10 with ${(Q^{(ℓ)})}_{k, k}$ in Eq. 11 is an EM algorithm (19, 20) for solving Eq. 9 with $f (\cdot) = f_{1} (\cdot)$ . The EM algorithm minorizes the objective function of Eq. 9 by a local quadratic approximation to the log-prior, which results in the least-squares problem of Eq. 10. The Hessian of the quadratic approximation is a function of $Q^{(ℓ)}$ in Eq. 11. One way to derive Eq. 11 is to invoke the concavity of the function $s^{1 / 2}$ , which implies that the linear approximation around a point $\bar{s}$ satisfies $s^{1 / 2} \leq {\bar{s}}^{1 / 2} + \frac{1}{2 {\bar{s}}^{1 / 2}} (s - \bar{s})$ . Applying this result to the log-prior at ${\hat{x}}^{(ℓ - 1)}$ allows us to readily extract ${(Q^{(ℓ)})}_{k, k}$ in Eq. 11.

In general, it can be shown that for priors from the GSM family of distributions, the EM algorithm for solving the MAP estimation problem of Eq. 9 results in IRLS solutions (20). The connection to EM theory leads to a class of IRLS algorithms (20) much broader than that considered in the existing literature (27) and a much simpler convergence analysis (20). We establish convergence of the IRLS algorithm of Eq. 10 as a theorem in Appendix: Convergence of the IRLS Algorithm. The convergence does not follow from the analysis of Daubechies et al. (27). The proof requires novel ideas (20) that we adapt to the current setting.

The difficulty in solving the spectrotemporal pursuit problem of Eq. 9 arises from the choice of the prior probability density functions over $w_{1}, w_{2}, \dots, w_{N}$ (Eqs. 6 and 7), which, for each $k = 1, \dots, K$ , groups ${(w_{n, k})}_{n = 1}^{N}$ across time. At first glance, this grouping suggests a batch solution. However, recognizing the connection with the EM theory for GSMs yields an IRLS solution that can be solved efficiently using the fixed interval smoother, owing to the dynamic structure of the state-space model. A large class of optimization problems can be solved by IRLS (29). In practice, however, the update step of IRLS is challenging. One attractive feature of our formulation is its modularity: each IRLS update step is simple in the sense that it can be solved using any algorithm that can solve a strictly concave quadratic program with block-tridiagonal Hessian.

We can easily specialize the discussion above to $f (\cdot) = f_{2} (\cdot)$ . In particular, if we let

\begin{matrix} {(Q_{n}^{(ℓ)})}_{k, k} = \frac{2}{α_{2}} {({({\hat{x}}_{n, k}^{(ℓ - 1)} - {\hat{x}}_{n - 1, k}^{(ℓ - 1)})}^{2} + ϵ^{2})}^{\frac{1}{2}} \\ \times {(\sum_{n^{'} = 1}^{N} {({({\hat{x}}_{n^{'}, k}^{(ℓ - 1)} - {\hat{x}}_{n^{'} - 1, k}^{(ℓ - 1)})}^{2} + ϵ^{2})}^{\frac{1}{2}})}^{\frac{1}{2}}, \end{matrix}

[12]

Eq. 10 becomes an IRLS algorithm for solving the spectrotemporal pursuit problem with $f (\cdot) = f_{2} (\cdot)$ . In this case, we can also prove the convergence of the algorithm. However, because $f_{2} (\cdot)$ is not concave, we can only guarantee convergence to a stationary point of the objective in Eq. 9 (20). In both cases, the role of ϵ is to avoid division by zero in forming the quadratic approximation of Eq. 10. As in the case of static IRLS (20), we expect the solution to be close, in some sense, to the maximizer of Eq. 9 with $ϵ = 0$ .

We use the terminology robust spectral estimator and estimate, to refer to spectrotemporal pursuit and its solution, respectively. The former terminology reflects the fact that $f_{1} (\cdot)$ and $f_{2} (\cdot)$ correspond to GSM (19) prior distributions on $x_{1}, x_{2} - x_{1}, \dots, x_{N} - x_{N - 1}$ , which are robust in the statistical sense (19).

Analysis of Spectral Resolution.

Determining the frequency resolution of a given estimator is a central question in spectral estimation. To characterize the resolution for nonparametric spectral estimators, we must study the properties of the so-called “taper” that is applied to the data. For instance, the multitaper spectral estimator uses the discrete prolate spheroidal sequences as tapers, which are known to have very small side lobes in the frequency domain (9). From the recursive form of the fixed interval smoother, it is not evident how the robust estimator fits into the tapering framework of nonparametric spectral estimators. In what follows, we show how the spectral resolution of the robust spectral estimator can be characterized.

Before proceeding with the analysis, let us first consider the linear least-squares estimate of x. Recall the compact form of the forward model $y = F x + v$ , where F is given in Eq. 4. Let us assume for convenience that the window length W is an integer multiple of the number K of discrete frequencies, i.e., $W = r K$ for some integer r, so that $F_{n} = F_{1}$ , for all n. The linear least-squares estimator maps the data y to

\hat{x} ≔ \frac{1}{r W} F^{H} y,

[13]

from which we can compute the spectrum as the component-wise magnitude of $\hat{x}$ of the data. In other words, the rows of $(1 / r W) F^{H}$ form a bank of filters, which consists of sliding windows of the Fourier basis on the interval $[1, W]$ . The side lobes of these filters determine the spectral resolution. In Appendix: Robust Spectral Decomposition as a Filter Bank, we show that for $f (\cdot) = f_{1} (\cdot)$ , the estimate $\hat{x}$ from the robust spectral estimator is given by

lim_{ℓ \to \infty} {\hat{x}}^{(ℓ)} = G F^{H} y,

[14]

where G is a weighting matrix and is only a function of the window size W and $Q^{(\infty)} ≔ \lim_{ℓ \to \infty} Q^{(ℓ)}$ . The rows of $G F^{H}$ form a filter bank whose output is equivalent to that of the robust spectral estimator at windows $n = 1,2, \dots, N$ . As shown in the appendix, the advantages of the weighting matrix G are twofold. First, the weighting shapes the Fourier basis by an effectively exponential taper for higher side-lobe suppression. Second, the choice of $Q^{(\infty)}$ from the data determines the gain of each filter; i.e., the filters corresponding to the large (small) elements of $Q^{(\infty)}$ are likely to have a high (low) gain. Therefore, the shaping of the filters is determined by the data itself.

Eq. 14 provides an ex post prescription to analyze the resolution and leakage of the robust spectral estimate; i.e., given W and $Q^{(\infty)}$ , we can form the matrix $G F^{H}$ , the rows of which are the equivalent bank of filters corresponding to the robust spectral estimator in windows $n = 1,2, \dots, N$ .

Toy Example Revisited

We use the data simulated in the toy example to compare the spectral estimates computed using spectrotemporal pursuit to the spectrogram computed using the multitaper method (8).

Fig. 2 A and B show the spectrogram estimates obtained by the multitaper method with 2-s temporal resolution and the multitaper method with 0.5 Hz frequency resolution, respectively. Fig. 2, Right, shows the zoomed-in view of the estimates in the interval $t = 250 s$ to $t = 350 s$ . We used a time bandwidth product of three and five tapers in both cases. The multitaper estimate with 2-s temporal resolution has a theoretical frequency resolution of 3 Hz (9), and as can be seen from Fig. 2A, is unable to resolve the closely spaced signals of 10 and 11 Hz in the frequency domain. However, the temporal details are captured to a certain extent. For the multitaper estimate with 0.5 Hz frequency resolution, the temporal resolution is 12 s and, as can be observed from Fig. 2B, the estimate it produces is smeared along the time axis and does not capture the details of the 0.04-Hz amplitude modulation. However, the two tones of 10 Hz and 11 Hz are well-resolved. For both multitaper instances, a significant amount of the additive noise has leaked into the spectrogram estimates. In other words, the multitaper approach is unable to denoise the estimate. Fig. 2C shows the spectral estimate obtained using spectrotemporal pursuit. Similarly, Fig. 2C, Right, shows the zoomed-in view of the estimate in the interval $t = 250 s$ to $t = 350 s$ . Spectrotemporal pursuit gives the sparse, more compact representation that we would hope to recover given the simulated data of Eq. 1. Indeed, we are able to faithfully recover the two tones as well as their temporal modulation. In addition, the spectral estimate from Fig. 2C is significantly denoised relative to that in Fig. 2 A and B. Lastly, as Fig. 2C suggests, spectrotemporal pursuit is able to overcome the fundamental limits imposed by the classical uncertainty principle (14): the spectral estimate of Fig. 2C exhibits high resolution both in time and in frequency; to illustrate this, we examine the two filters that, when applied to the data from the toy example, reproduce the spectrotemporal pursuit spectral estimates at frequencies 10 Hz and 5 Hz, respectively. Fig. 3 shows the equivalent filters corresponding to the spectrotemporal pursuit estimator at frequencies 10 Hz and 5 Hz for $t = 300 s$ . The equivalent filter at 10 Hz corresponds to the component $10 \cos^{8} (2 π f_{0} t) \sin (2 π f_{1} t)$ and, as explained in Toy Example, resembles a 10-Hz oscillation that is exponentially decaying in a piece-wise constant fashion. The first side-lobe is ∼10.5 Hz, with a suppression of ∼25 dB. The equivalent filter at 5 Hz, however, corresponds to a frequency that is not part of the signal. Hence, the peak gain is ∼10 dB smaller than that of the 10-Hz filter. As a result, the 5-Hz component of the estimate is negligible.

Fig. 3. — Equivalent filters corresponding to the robust spectral estimate of the toy example. (A) Equivalent filter and (B) its power spectral density (PSD) at $f = 10$ Hz and $t = 300 s$ . (C) Equivalent filter and (D) its PSD at $f = 5$ Hz and $t = 300$ s.

This toy example demonstrates the potential of structured time–frequency representations used in the spectrotemporal pursuit framework to go beyond the classical time–frequency resolution limits imposed by the uncertainty principle (14), and capture the dynamics of a signal whose time-varying mean is the sum a small number of oscillatory components.

Applications of Spectrotemporal Pursuit to Neural Signal Processing

A Spectrotemporal Pursuit Analysis of EEG Recordings.

We first illustrate the application of spectrotemporal pursuit by computing the robust spectral decomposition of frontal EEG data recorded from a patient during general anesthesia for a surgical procedure at Massachusetts General Hospital. The recording and analysis of these EEG data are part of an ongoing study that has been approved by the Massachusetts General Hospital Human Research Committee. All data collected were part of routine anesthesia care. General anesthesia is typically induced with propofol and is commonly maintained by administering a propofol infusion. Because data collection interacted in no way with patient management, and because all data were deidentified, informed patient consent was not required. A case is illustrated in Fig. 4. The patient received a bolus i.v. injection of propofol at ∼3.5 min, followed by a propofol infusion at 120 mcg⋅kg⁻¹⋅min⁻¹ that was maintained until minute 27, when the case ended.

When administered to initiate general anesthesia, propofol produces profound slow (<1 Hz) and delta (1–4 Hz) oscillations (Fig. 4, minute 5) (15, 21). With maintenance of general anesthesia, using propofol we observe an alpha oscillation (8–12 Hz) in addition to the slow and delta oscillations. The presence of the alpha oscillations along with the slow and delta oscillations is a marker of unconsciousness (15, 21). Developing a precise characterization of the dynamic properties of propofol is important for helping to define the neural circuit mechanisms of this anesthetic.

We computed the spectrogram for $T = 35$ min of EEG data, sampled at a rate $f_{s} = 250 Hz$ , using the multitaper method with 2-s temporal resolution (Fig. 4A), multitaper method with 0.5 Hz frequency resolution (Fig. 4B), and the magnitude of the spectral estimates from the spectrotemporal pursuit estimator with prior $f_{2} (\cdot)$ (Fig. 4C). Fig. 4, Right, shows zoomed-in views of the spectrogram from minute 15 to minute 18. For the spectrotemporal pursuit analysis, $W = 500$ , $N = 1,050$ , and we select $α_{2}$ by splitting the data into two sequences consisting of its even and odd times, respectively, and performing a form of twofold cross-validation (30). In other words, we assume that the spectrotemporal pursuit estimate of the spectrum is constant in windows of length 2 s. For each 2-s window of data, $F_{n}$ is the $500 \times 500$ matrix, which is the Fourier basis for the discrete-time interval $[(n - 1) W + 1, n W]$ for $n = 1,2, \dots, N$ . Because the original data are acquired at twice the sampling rate of the even and odd splits, the value of $α_{2}$ used on the original data are half of that obtained by cross-validation because the variance of random walk as in Eq. 5 increases linearly with time.

By setting the choice of tapers or the analysis window, it is possible, with the multitaper method, to achieve either high frequency or high temporal resolution. In contrast, as in the toy example, spectrotemporal pursuit achieves high temporal resolution, high spatial resolution, and performs significant denoising of the spectrogram. As a consequence of the simultaneous enhanced time–frequency resolution in the spectrotemporal pursuit analysis, the slow and delta oscillations are clearly delineated during induction (minute 3.5), whereas during maintenance (minutes 5–27), the oscillations are strongly localized in the slow delta and alpha bands. Furthermore, the denoising induced by spectrotemporal pursuit creates a $\approx 30$ -dB difference between these spectral bands and the other frequencies in the spectrum.

The spectrotemporal pursuit analysis offers the possibility of achieving a more precise delineation of the time–frequency properties of the EEG under propofol general anesthesia. We use this method to analyze in detail the dynamics of propofol as well as the other commonly used anesthetics.

A Spectrotemporal Pursuit Analysis of Neural Spiking Activity.

Modulation of neuronal firing rates by brain rhythms has been well documented (21, 31). During propofol-induced general anesthesia, the transition into and out of consciousness is characterized by abrupt changes both in average neuronal firing rates as well as their modulation by a low-frequency ( $\approx 0.5$ Hz) oscillation (21). Local field potentials (LFPs) are routinely recorded simultaneously with spike-train data. Typically, the spike-field coherence, a measure of phase synchronization between spike trains and LFPs, is used to quantify the modulation of firing rates by oscillations and its time course. Despite the prevalence of modulation of neuronal firing rates by brain rhythms, there are no principled approaches that use spike-train data alone, without recourse to LFPs, to extract oscillatory dynamics. We adapt our spectrotemporal pursuit framework to point-process data and apply it to neural spiking data acquired from a human subject during the induction of propofol general anesthesia. This study was approved by the Massachusetts General Hospital Human Research Committee. In accordance with the Partners Human Research Committee, informed consent was obtained from all patients.

We assume that the spike train from each of the neurons recorded during the experiment is the realization of a point process (32) in the interval $(0, T]$ . A point process is an ordered sequence of discrete events that occur at random points in continuous time. For a neural spike train, the elements of the sequence give the times in $(0, T]$ when the membrane potential of a neuron crosses a predetermined threshold—that is, the times when the neuron emits a spike. A point process is fully characterized by its conditional intensity function (CIF) (32). In our notation, the binary time series $y_{t} \in {0,1}, t = 1, \dots, T$ , represents the discrete-time point-process data. We denote by $0 \leq λ_{t} < \infty, t = 1, \dots, T$ the sampled CIF of the point process. Letting $λ_{n} ≔ (λ_{(n - 1) W + 1}, λ_{(n - 1) W + 2}, \dots, λ_{n W})'$ , we begin with a point-process spectrotemporal forward model, which consists of a spectrotemporal parametrization of the CIF as follows:

\log λ_{n} = F_{n} x_{n},

[15]

where $\log λ_{n} ≔ {(\log λ_{(n - 1) W + 1}, \log λ_{(n - 1) W + 2}, \dots, \log λ_{n W})}^{'}$ for $n = 1,2, \dots, N$ . Eq. 15 is a time–frequency representation of the time-varying CIF of a point process: $x_{n}, n = 1,2, \dots, N$ represent the signals modulating each of the oscillations (in $F_{n}, n = 1,2, \dots, N$ ) comprising the CIF.

We estimate ${(x_{n})}_{n = 1}^{N}$ by solving the following MAP estimation problem

max_{x_{1}, \dots, x_{N}} \sum_{n = 1}^{N} {y'}_{n} \log λ_{n} - {1'}_{W} λ_{n} + f_{2} (x_{1}, x_{2}, \dots, x_{N}),

[16]

where $1_{W}$ is the vector of ones in $ℝ^{W}$ , and under a generalized linear model (Eq. 15). The objective function of Eq. 16 trades off the point process log-likelihood (3, 32) with the log-prior in Eq. 7, which enforces sparsity in frequency and smoothness in time. Eq. 16 is a point-process version of spectrotemporal pursuit (Eq. 9). We compute the solution $\hat{x}$ to Eq. 16 as the limit of a sequence ${({\hat{x}}^{(ℓ)})}_{ℓ = 0}^{\infty}$ whose $ℓ^{th}$ element, $ℓ = 1, \dots, \infty$ , is the solution to

max_{x_{1}, \dots, x_{N}} \sum_{n = 1}^{N} y_{n}' F_{n} x_{n} - 1_{W}' e^{F_{n} x_{n}} - \sum_{k = 1}^{K} \sum_{n = 1}^{N} \frac{{(x_{n, k} - x_{n - 1, k})}^{2}}{2 {(Q_{n}^{(ℓ)})}_{k, k}},

[17]

where $Q_{n}^{(ℓ)}$ is given in Eq. 12 and $e^{F_{n} x_{n}}$ is the $ℝ^{W}$ vector with entries $e^{{(F_{n} x_{n})}_{w}}, w = 1,2, \dots, W$ . Each iteration can be implemented efficiently using a point-process smoothing algorithm (18). It is not hard to prove that the conclusions of the theorem in Appendix: Convergence of the IRLS Algorithm, regarding convergence, also hold for the sequence generated using this algorithm. However, as stated previously, we can only guarantee convergence to a stationary point of the objective (20) because $f_{2} (\cdot)$ is not concave.

We used this algorithm to compute a spectral representation of the population rate function of 41 neurons recorded in a patient undergoing intracranial monitoring for surgical treatment of epilepsy (21) using a multichannel microelectrode array implanted in temporal cortex. Recordings were conducted during the administration of propofol for induction of anesthesia. In ref. 21, the authors extensively describe the experimental protocol under which these data were collected and further use the data to elucidate the neurophysiological processes that characterize the transition into loss of consciousness (LOC) during propofol-induced general anesthesia. Fig. 5 depicts the data collected during this experiment as well as the results of our analysis; Fig. 5, Right, show the respective zoomed-in views of Fig. 5, Left, from $t = 140 s$ to $t = 152 s$ . Fig. 5 A and B show, respectively, the LFP activity and a raster plot of the neural spiking activity collected during the experiment. The bolus of propofol is administered at ∼ 0 s. As reported in ref. 21, propofol-induced unconsciousness occurs within seconds of the abrupt onset of a slow ( $\approx 0.5$ Hz) oscillation in the LFP. Moreover, neuronal spiking is strongly correlated to this slow oscillation, occurring only within a limited slow oscillation-phase window and being silent otherwise.

Fig. 5. — Spectral decomposition of the neural spiking activity from a subject undergoing propofol-induced general anesthesia. (A) LFP activity, (B) raster plot of the spike trains from the 41 units that were simultaneously acquired alongside the LFP activity, (C) PSTH and robust estimate of the firing rate obtained using spectrotemporal pursuit, (D) spectral decomposition of the firing rate in the $[0.05, 1]$ -Hz band (color scale in decibels), and (E) $\approx 0.45$ -Hz component of log-firing rate (dark blue) with 95% confidence bounds (shaded region). (*Right*) Respective zoomed-in views from $t = 140 s$ to $t = 152 s$ . The transition into LOC is marked by the onset of a strong $\approx 0.45$ -Hz oscillation.

We demonstrate modulation of the neural spiking activity by the slow oscillation during the transition into LOC under propofol-induced general anesthesia by analysis of the neural spiking activity alone. Fig. 5C shows the firing rate estimates obtained using the standard peristimulus time histogram (PSTH; black) with a bin size of 125 ms and the spectrotemporal pursuit solution (red), respectively. In each 125-ms bin, the PSTH is the total number of spikes across all units divided by the product of the number of units and the bin size. Consistent with the findings of Lewis et al. (21), the firing rate of the neurons reaches a maximum at the troughs of the slow $\approx 0.5$ -Hz oscillation in the LFP. The robust firing rate estimate from spectrotemporal pursuit is much smoother. Fig. 5D shows the novel spectral decomposition of the firing rate of the cortical neurons in the range 0.05 Hz to 1 Hz during propofol-induced general anesthesia. For this analysis, we set $f_{s} = 1$ kHz, $T = 1,000$ s, $W = 125$ , and $N = 8,000$ . In other words, we assume that the spectral decomposition is constant in windows of length 125 ms. For each 125-ms window of data, $F_{n}$ is a $125 \times 50$ matrix obtained using 50 equally spaced values in the range $[0.05, 1]$ Hz. We constructed the observation vector $y_{n} \in ℝ^{125}$ in each window by summing the spikes from all 41 units in each 1-ms bin. To select $α_{2}$ , we split the 41 units randomly into two disjoint sets of 21 and 20 units, respectively, and performed twofold cross-validation on this splitting.

Our analysis reveals that the onset of LOC under propofol-induced general anesthesia in the patient is accompanied by the onset of a strong $\approx 0.45$ -Hz oscillation. Fig. 5E shows the contribution of this $\approx 0.45$ -Hz oscillation to the log of the population firing rate (Eq. 15). We are able to quantify the extent to which this oscillation modulates the firing rate of cortical neurons at a resolution of 125 ms. Before LOC (before ∼ 0 s), our analysis (Fig. 5E) shows that the slow oscillation does not contribute significantly to the firing rate of cortical neurons. However, during the recovery period, the slow oscillation increases the firing rate of cortical neurons by up to a factor of ∼ 3 above its local mean. Fig. 5E further indicates that the $\approx 0.45$ -Hz component of the firing rate estimate from spectrotemporal pursuit is 180 degrees out of phase with the LFP activity. In other words, following LOC, the troughs of the LFP activity coincide with the times at which the contribution of this oscillation to the firing rate of cortical neurons is maximum. We computed the 95% confidence bounds in Fig. 5E, Right, by using the covariance matrices ${(Σ_{n | N})}_{n = 1}^{N}$ from the last iteration of the spectrotemporal pursuit algorithm.

Our analysis demonstrates that spectrotemporal pursuit can help clarify the relationship between neural spiking and LFPs.

Discussion

Robust Spectral Decomposition for Signals with Structured Time–Frequency Representations.

Classical nonparametric spectral estimation methods use sliding windows with overlap to implicitly enforce temporal smoothness of the spectral estimate. This widely adopted approach does not adequately describe signals with highly structured time–frequency representations. We develop a robust nonparametric spectral decomposition paradigm for batch time-series analyses, termed spectrotemporal pursuit, that uses a Bayesian formulation to explicitly enforce smoothness in time and sparsity in frequency of the spectral estimate. Spectrotemporal pursuit yields spectral estimates that are significantly denoised and have significantly higher time and frequency resolution than those obtained using the multitaper method. We illustrated spectrotemporal pursuit on a simulated signal comprising two nearby frequency components with a highly modulated amplitude, as well as on human EEG and on human neural spiking activity both recorded during propofol general anesthesia.

Computation of spectrotemporal pursuit spectral estimates requires solving a high-dimensional optimization problem (Eq. 9 with substitutions from Eqs. 6–8). By using a relation between $ℓ_{1}$ minimization, GSM models, and the EM algorithm, we develop computationally efficient IRLS algorithms to compute the spectral decomposition. In the problems we study, these algorithms yield easy-to-implement Kalman (17) and point-process (18) smoothing algorithms. We also sketch a proof of convergence of the algorithms (Appendix) that uses ideas developed in ref. 20. We can readily characterize the achievable spectral resolution of our procedure because the spectrotemporal pursuit solution applies a time-varying, data-dependent filter bank to the observations (Fig. 3).

Spectrotemporal pursuit offers a principled alternative to existing methods, such as EMD, for decomposing a noisy time-series into a small number of oscillatory components (13); in spectrotemporal pursuit, this is easily handled using the regularization parameters $α_{1}$ and $α_{2}$ , respectively, in the cases of $f_{1} (\cdot)$ and $f_{2} (\cdot)$ , which we estimate from the time-series using cross-validation (30).

Alternate Approaches to Compute Spectral Representations of Signals that Exhibit Dynamic Behavior.

For deterministic signal models, notable contributions are those in refs. 33 and 34 for a signal model consisting of a small number of amplitude-modulated oscillations as in refs. 11 and 16. In the context of stochastic signals, the authors in ref. 35 propose an algorithm to estimate the nonparametric, nonstationary spectrum of a Dahlhaus locally stationary process. Lastly, dynamic models with time-varying sparsity have been proposed in several contexts (36, 37). In principle, these latter models can be applied to time-varying parametric spectral estimation using autoregressive models.

In SI Text, we discuss in greater detail the similarities and differences between these works and spectrotemporal pursuit.

Future Work.

In future reports we plan to extend our current formulation of spectrotemporal pursuit to compute smooth in time and sparse in frequency spectral estimates using filter algorithms. This development will be critical for online computation of robust spectral decompositions using our paradigm. We will also extend our formulation to dictionaries other than Fourier representations. The present formulation of spectrotemporal pursuit requires the user to specify the size W of the window over which the spectrum is constant. In our examples, these choices were informed by the time scales that are believed to be physiologically relevant (15, 21). We are currently developing dynamic versions of the matching pursuit algorithm (38), an efficient algorithm for solving sparsity-enforcing optimization problems, which will obviate the need for specifying W.

In conclusion, computing efficient spectral summaries of time series is a common challenge in many fields of science and engineering. Spectrotemporal pursuit offers a new approach to robust spectral analysis of batch time series which should be applicable to a broad range of problems.

Appendix: Convergence of the IRLS Algorithm

The following theorem states the convergence of the IRLS algorithm of Eq. 10.

Theorem. Let ${\hat{x}}^{(0)} \in ℝ^{K}$ and ${({\hat{x}}^{(ℓ)})}_{ℓ = 1}^{\infty} \in ℝ^{N K}$ be the sequence generated by the IRLS algorithm of Eq. 10 with ${(Q_{n}^{(ℓ)})}_{k, k}$ as in Eq. 11. Then, (i) ${({\hat{x}}^{(ℓ)})}_{ℓ = 0}^{\infty}$ is bounded, (ii) ∃ $\bar{x} = \underset{ℓ \to \infty}{l i m} {\hat{x}}^{(ℓ)}$ , and (iii) ${({\hat{x}}^{(ℓ)})}_{ℓ = 0}^{\infty}$ converges to the unique stationary point of the objective of Eq. 9 with $f (\cdot) = f_{1} (\cdot)$ .

The proof follows the same lines as that of theorem 3 in ref. 20. We only give the main ideas here.

Proof 1 (sketch of proof). Let $f (\cdot) = f_{1} (\cdot)$ in Eq. 9. The concavity of the $\sqrt{\cdot}$ function implies that the objective in Eq. 10 with ${(Q^{(ℓ)})}_{k, k}$ as in Eq. 11 is a lower bound of that in Eq. 9. Moreover, the difference between these two objectives attains a maximum at ${\hat{x}}^{(ℓ - 1)}$ . We can use this to show (20) that Eq. 10 generates a sequence that is nondecreasing when evaluated at the objective of Eq. 9. Along with the fact that the objective of Eq. 9 is strictly concave and coercive (39); this implies that the sequence ${({\hat{x}}^{(ℓ)})}_{ℓ = 0}^{\infty}$ lies in a compact set, and hence is bounded. Therefore, there exists a convergent subsequence with limit point $\bar{x}$ . Take any convergent subsequence, each of its elements satisfies the first-order necessary and sufficient optimality conditions for the objective of Eq. 10, which, in the limit, are equivalent to the first-order necessary and sufficient optimality condition for the unique maximizer of Eq. 9.

Appendix: Robust Spectral Decomposition as a Filter Bank

The following proposition characterizes the equivalent filter banks corresponding to the robust spectral estimator:

Proposition. Let $W = r K$ for some integer r, so that $F_{n} = F_{1}$ , for all n and let F be as defined in Eq. 4. Let $Q^{(\infty)}$ be the element-wise limit point of $Q^{(ℓ)}$ . Moreover, suppose that N is large enough so that $Σ_{N | N}$ converges to its steady-state value denoted by $Σ^{(\infty)}$ and that each iteration of fixed-interval smoother is initialized using the steady-state value of $Σ_{n | n}$ in the previous iteration. Then, the estimate from the robust spectral estimator is given by

lim_{ℓ \to \infty} {\hat{x}}^{(ℓ)} = G F^{H} y,

[18]

where

G ≔ (\begin{matrix} Λ^{0} Γ & Λ^{1} Γ & Λ^{2} Γ & \dots & Λ^{N - 1} Γ \\ Λ^{1} Γ & Λ^{0} Γ & Λ^{1} Γ & \dots & Λ^{N - 2} Γ \\ Λ^{2} Γ & Λ^{1} Γ & Λ^{0} Γ & \dots & Λ^{N - 3} Γ \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ Λ^{N - 1} Γ & Λ^{N - 2} Γ & Λ^{N - 3} Γ & \dots & Λ^{0} Γ \end{matrix}),

[19]

with

Λ ≔ Σ^{(\infty)} {(Σ^{(\infty)} + Q^{(\infty)})}^{- 1},

and

Γ ≔ (Σ^{(\infty)} + Q^{(\infty)}) [I - r W {({(Σ^{(\infty)} + Q^{(\infty)})}^{- 1} + r W I)}^{- 1}] .

Proof 2 (sketch of proof). Consider the fixed interval smoother at the $ℓ^{t h}$ iteration of the robust spectral estimator. By expanding the estimate ${\hat{x}}_{n | N}$ in terms of ${(y_{l})}_{l = 1}^{N}$ , it is not hard to show that

\begin{matrix} {\hat{x}}_{n | N} = \sum_{s = 1}^{n - 1} [\prod_{m = s}^{n - 1} (I - K_{m} F_{m})] K_{s} y_{s} \\ + K_{n} y_{n} \\ + \sum_{s = n + 1}^{N} [\prod_{m = n}^{s} B_{m}] K_{s} y_{s} \end{matrix}

[20]

with $B_{m} = Σ_{m | m} Σ_{m + 1 | m}^{- 1}$ . Also, we have $I - K_{m} F_{m} = Σ_{m | m} Σ_{m | m - 1}^{- 1}$ . In the steady state, we have $Σ_{m | m} = Σ^{(\infty)}$ and $Σ_{m | m - 1} = Σ^{(\infty)} + Q^{(\infty)}$ , for all m. Hence, the expression for ${\hat{x}}_{n | N}$ simplifies to

{\hat{x}}_{n | N} = \sum_{s = 1}^{N} Λ^{| s - n |} Γ F_{s}^{H} y_{s},

[21]

with Λ and Γ as defined in the statement of the proposition. Expressing the above expression in compact form gives the statement of the proposition.

The proposition states that the spectral estimate at window n is a tapered version of the Fourier transform of the data, where the taper at window s is given by $Λ^{| s - n |} Γ$ ; this can be viewed as an exponentially decaying taper in the matrix form, because the eigenvalues of Λ are bounded by 1. To illustrate this observation, assume that $Q^{(\infty)} = q I$ for some positive constant q. Then, it is not hard to verify that $Λ = γ I$ and $Γ = η I$ for some positive constants $0 < γ \leq 1$ and $η > 0$ . Then, the equivalent sliding taper applied to the data are given by the exponential taper $η γ^{| s |}$ .

The rows of $G F^{H}$ form a filter bank whose output is equivalent to that of the robust spectral estimator at windows $n = 1,2, \dots, N$ . As mentioned before, the advantage of the weighting factor G is twofold. First of all, the weighting shapes the Fourier basis by an effectively exponential taper for higher side-lobe suppression. Second, the choice of $Q^{(\infty)}$ from the data determines the gain of each filter; i.e., the filters corresponding to the large (small) elements of $Q^{(\infty)}$ are likely to have a high (low) gain. Therefore, the shaping of the filters is determined by the data itself. Note that given $Q^{(\infty)}$ , we can compute $Σ^{(\infty)}$ by numerically solving a Riccati equation, and form the matrix $G F^{H}$ , the rows of which are the equivalent bank of filters corresponding to the robust spectral estimator at windows $n = 1,2, \dots, N$ .

This characterization of the spectral resolution of the robust spectral estimator, as well as its interpretation as a filter bank, applies to the case of $Q^{(\infty)}$ independent of n (time); this holds for log-prior $f_{1} (\cdot)$ and its associated $Q^{(\infty)}$ , which is the element-wise limit of Eq. 11. The element-wise limit of Eq. 12, which corresponds to log-prior $f_{2} (\cdot)$ , is not independent of n. Eq. 2, however, is quite general and adopts a similar form once the appropriate substitutions are made. We restricted our attention to $Q^{(\infty)}$ independent of n to convey the key ideas.

Supplementary Material

Supplementary File

pnas.201320637SI.pdf^{(77.6KB, pdf)}

Acknowledgments

This work was partially supported by National Institutes of Health Transformative Research Award GM 104948 (to E.N.B.) and by New Innovator Awards DP2-OD006454 (to P.L.P.) and R01-EB006385 (to E.N.B. and P.L.P.).

Footnotes

The authors declare no conflict of interest.

*This Direct Submission article had a prearranged editor.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1320637111/-/DCSupplemental.

References

1.Quatieri TF. Discrete-Time Speech Signal Processing: Principles and Practice. Prentice Hall; Englewood Cliffs, NJ: 2008. [Google Scholar]
2.Lim JS. Two-Dimensional Signal and Image Processing. Prentice Hall; Englewood Cliffs, NJ: 1990. [Google Scholar]
3.Truccolo W, Eden UT, Fellows MR, Donoghue JP, Brown EN. A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. J Neurophysiol. 2005;93(2):1074–1089. doi: 10.1152/jn.00697.2004. [DOI] [PubMed] [Google Scholar]
4.Mitra P, Bokil H. Observed Brain Dynamics. Oxford Univ Press; New York: 2007. [Google Scholar]
5.Emery WJ, Thomson RE. Data Analysis Methods in Physical Oceanography. Elsevier Science; Amsterdam: 2001. [Google Scholar]
6.Haykin SS, Steinhardt AO. Adaptive Radar Detection and Estimation. Vol 11 Wiley-Interscience; New York: 1992. [Google Scholar]
7.Kitagawa G, Gersch W. Smoothness Priors Analysis of Time Series. Vol 116 Springer; Berlin: 1996. [Google Scholar]
8.Thomson DJ. Spectrum estimation and harmonic analysis. Proc IEEE. 1982;70(9):1055–1096. [Google Scholar]
9.Percival DB. Spectral Analysis for Physical Applications. Cambridge Univ Press; Cambridge, UK: 1993. [Google Scholar]
10.Daubechies I. The wavelet transform, time-frequency localization and signal analysis. IEEE Trans Inf Theory. 1990;36(5):961–1005. [Google Scholar]
11.Daubechies I, Lu J, Wu HT. Synchrosqueezed wavelet transforms: An empirical mode decomposition-like tool. Appl Comput Harmon Anal. 2011;30(2):243–261. [Google Scholar]
12.Huang NE, et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc R Soc Lond A. 1998;454(1971):903–995. [Google Scholar]
13.Wu Z, Huang NE. Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv Adapt Data Anal. 2009;1(1):1–41. [Google Scholar]
14.Cohen L. Time-Frequency Analysis. Vol 778 Prentice Hall; Englewood Cliffs, NJ: 1995. [Google Scholar]
15.Purdon PL, et al. Electroencephalogram signatures of loss and recovery of consciousness from propofol. Proc Natl Acad Sci USA. 2013;110(12):E1142–E1151. doi: 10.1073/pnas.1221180110. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Hou TY, Shi Z. Adaptive data analysis via sparse time-frequency representation. Adv Adapt Data Anal. 2011;3(1-2):1–28. [Google Scholar]
17.Rauch H, Striebel C, Tung F. Maximum likelihood estimates of linear dynamic systems. AIAA J. 2012;3(8):1445–1450. [Google Scholar]
18.Smith AC, Brown EN. Estimating a state-space model from point process observations. Neural Comput. 2003;15(5):965–991. doi: 10.1162/089976603765202622. [DOI] [PubMed] [Google Scholar]
19.Lange K, Sinsheimer J. Normal/independent distributions and their applications in robust regression. J Comput Graph Stat. 1993;2(2):175–198. [Google Scholar]
20.Ba D, Babadi B, Purdon P, Brown E. Convergence and stability of iteratively re-weighted least squares algorithms. IEEE Trans Signal Process. 2014;62(1):183–195. doi: 10.1109/TSP.2013.2287685. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Lewis LD, et al. Rapid fragmentation of neuronal networks at the onset of propofol-induced unconsciousness. Proc Natl Acad Sci USA. 2012;109(49):E3377–E3386. doi: 10.1073/pnas.1210907109. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Huber P. Robust Statistics. Wiley; New York: 1981. [Google Scholar]
23.Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. J R Stat Soc Series B Stat Methodol. 2006;68(1):49–67. [Google Scholar]
24.Beck A, Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci. 2009;2(1):183–202. [Google Scholar]
25.Combettes PL, Pesquet J-C. Proximal splitting methods in signal processing. Fixed-Point Algorithms for Inverse Problems in Science and Engineering. 2011. (Springer, New York), pp 185–212.
26.Chen SS, Donoho DL, Saunders MA. Atomic decomposition by basis pursuit. SIAM J Sci Comput. 1996;20(1):33–61. [Google Scholar]
27.Daubechies I, DeVore R, Fornasier M, Güntürk C. Iteratively reweighted least squares minimization for sparse recovery. Commun Pure Appl Math. 2009;63(1):1–38. [Google Scholar]
28.Ba D, Babadi B, Purdon P, Brown E. 2012. Exact and stable recovery of sequences of signals with sparse increments via differential $ℓ_{1}$ -minimization. Advances in Neural Information Processing Systems, NIPS 2012, eds Pereira F, et al. (Neural Information Processing Systems Foundation, La Jolla, CA), Vol 25, pp 2636–2644.
29.Bissantz N, Dümbgen L, Munk A, Stratmann B. Convergence analysis of generalized iteratively reweighted least squares algorithms on convex function spaces. SIAM J Optim. 2009;19(4):1828–1845. [Google Scholar]
30.Friedman J, Hastie T, Höfling H, Tibshirani R. Pathwise coordinate optimization. Ann Appl Stat. 2007;1(2):302–332. [Google Scholar]
31.Siapas AG, Lubenov EV, Wilson MA. Prefrontal phase locking to hippocampal theta oscillations. Neuron. 2005;46(1):141–151. doi: 10.1016/j.neuron.2005.02.028. [DOI] [PubMed] [Google Scholar]
32.Daley DJ, Vere-Jones D. An Introduction to the Theory of Point Processes. 2nd Ed. Vol 1 Springer; Berlin: 2003. [Google Scholar]
33.Flandrin P, Borgnat P. Time-frequency energy distributions meet compressed sensing. IEEE Trans Signal Proc. 2010;58(6):2974–2982. [Google Scholar]
34.Wolfe PJ, Godsill SJ, Ng WJ. Bayesian variable selection and regularization for time–frequency surface estimation. J R Stat Soc Series B Stat Methodol. 2004;66(3):575–589. [Google Scholar]
35.Rosen O, Stoffer DS, Wood S. Local spectral analysis via a Bayesian mixture of smoothing splines. J Am Stat Assoc. 2009;104(485):249–262. [Google Scholar]
36.Ahmed A, Xing EP. Recovering time-varying networks of dependencies in social and biological studies. Proc Natl Acad Sci USA. 2009;106(29):11878–11883. doi: 10.1073/pnas.0901910106. [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Kalli M, Griffin JE. Time-varying sparsity in dynamic regression models. J Econom. 2014;178(2):779–793. [Google Scholar]
38.Mallat SG, Zhang Z. Matching pursuits with time-frequency dictionaries. IEEE Trans Signal Process. 1993;41(12):3397–3415. [Google Scholar]
39.Bertsekas D. Nonlinear Programming. Athena Scientific; Nashua, NH: 1999. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary File

pnas.201320637SI.pdf^{(77.6KB, pdf)}

[r1] 1.Quatieri TF. Discrete-Time Speech Signal Processing: Principles and Practice. Prentice Hall; Englewood Cliffs, NJ: 2008. [Google Scholar]

[r2] 2.Lim JS. Two-Dimensional Signal and Image Processing. Prentice Hall; Englewood Cliffs, NJ: 1990. [Google Scholar]

[r3] 3.Truccolo W, Eden UT, Fellows MR, Donoghue JP, Brown EN. A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. J Neurophysiol. 2005;93(2):1074–1089. doi: 10.1152/jn.00697.2004. [DOI] [PubMed] [Google Scholar]

[r4] 4.Mitra P, Bokil H. Observed Brain Dynamics. Oxford Univ Press; New York: 2007. [Google Scholar]

[r5] 5.Emery WJ, Thomson RE. Data Analysis Methods in Physical Oceanography. Elsevier Science; Amsterdam: 2001. [Google Scholar]

[r6] 6.Haykin SS, Steinhardt AO. Adaptive Radar Detection and Estimation. Vol 11 Wiley-Interscience; New York: 1992. [Google Scholar]

[r7] 7.Kitagawa G, Gersch W. Smoothness Priors Analysis of Time Series. Vol 116 Springer; Berlin: 1996. [Google Scholar]

[r8] 8.Thomson DJ. Spectrum estimation and harmonic analysis. Proc IEEE. 1982;70(9):1055–1096. [Google Scholar]

[r9] 9.Percival DB. Spectral Analysis for Physical Applications. Cambridge Univ Press; Cambridge, UK: 1993. [Google Scholar]

[r10] 10.Daubechies I. The wavelet transform, time-frequency localization and signal analysis. IEEE Trans Inf Theory. 1990;36(5):961–1005. [Google Scholar]

[r11] 11.Daubechies I, Lu J, Wu HT. Synchrosqueezed wavelet transforms: An empirical mode decomposition-like tool. Appl Comput Harmon Anal. 2011;30(2):243–261. [Google Scholar]

[r12] 12.Huang NE, et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc R Soc Lond A. 1998;454(1971):903–995. [Google Scholar]

[r13] 13.Wu Z, Huang NE. Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv Adapt Data Anal. 2009;1(1):1–41. [Google Scholar]

[r14] 14.Cohen L. Time-Frequency Analysis. Vol 778 Prentice Hall; Englewood Cliffs, NJ: 1995. [Google Scholar]

[r15] 15.Purdon PL, et al. Electroencephalogram signatures of loss and recovery of consciousness from propofol. Proc Natl Acad Sci USA. 2013;110(12):E1142–E1151. doi: 10.1073/pnas.1221180110. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r16] 16.Hou TY, Shi Z. Adaptive data analysis via sparse time-frequency representation. Adv Adapt Data Anal. 2011;3(1-2):1–28. [Google Scholar]

[r17] 17.Rauch H, Striebel C, Tung F. Maximum likelihood estimates of linear dynamic systems. AIAA J. 2012;3(8):1445–1450. [Google Scholar]

[r18] 18.Smith AC, Brown EN. Estimating a state-space model from point process observations. Neural Comput. 2003;15(5):965–991. doi: 10.1162/089976603765202622. [DOI] [PubMed] [Google Scholar]

[r19] 19.Lange K, Sinsheimer J. Normal/independent distributions and their applications in robust regression. J Comput Graph Stat. 1993;2(2):175–198. [Google Scholar]

[r20] 20.Ba D, Babadi B, Purdon P, Brown E. Convergence and stability of iteratively re-weighted least squares algorithms. IEEE Trans Signal Process. 2014;62(1):183–195. doi: 10.1109/TSP.2013.2287685. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r21] 21.Lewis LD, et al. Rapid fragmentation of neuronal networks at the onset of propofol-induced unconsciousness. Proc Natl Acad Sci USA. 2012;109(49):E3377–E3386. doi: 10.1073/pnas.1210907109. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r22] 22.Huber P. Robust Statistics. Wiley; New York: 1981. [Google Scholar]

[r23] 23.Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. J R Stat Soc Series B Stat Methodol. 2006;68(1):49–67. [Google Scholar]

[r24] 24.Beck A, Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci. 2009;2(1):183–202. [Google Scholar]

[r25] 25.Combettes PL, Pesquet J-C. Proximal splitting methods in signal processing. Fixed-Point Algorithms for Inverse Problems in Science and Engineering. 2011. (Springer, New York), pp 185–212.

[r26] 26.Chen SS, Donoho DL, Saunders MA. Atomic decomposition by basis pursuit. SIAM J Sci Comput. 1996;20(1):33–61. [Google Scholar]

[r27] 27.Daubechies I, DeVore R, Fornasier M, Güntürk C. Iteratively reweighted least squares minimization for sparse recovery. Commun Pure Appl Math. 2009;63(1):1–38. [Google Scholar]

[r28] 28.Ba D, Babadi B, Purdon P, Brown E. 2012. Exact and stable recovery of sequences of signals with sparse increments via differential $ℓ_{1}$ -minimization. Advances in Neural Information Processing Systems, NIPS 2012, eds Pereira F, et al. (Neural Information Processing Systems Foundation, La Jolla, CA), Vol 25, pp 2636–2644.

[r29] 29.Bissantz N, Dümbgen L, Munk A, Stratmann B. Convergence analysis of generalized iteratively reweighted least squares algorithms on convex function spaces. SIAM J Optim. 2009;19(4):1828–1845. [Google Scholar]

[r30] 30.Friedman J, Hastie T, Höfling H, Tibshirani R. Pathwise coordinate optimization. Ann Appl Stat. 2007;1(2):302–332. [Google Scholar]

[r31] 31.Siapas AG, Lubenov EV, Wilson MA. Prefrontal phase locking to hippocampal theta oscillations. Neuron. 2005;46(1):141–151. doi: 10.1016/j.neuron.2005.02.028. [DOI] [PubMed] [Google Scholar]

[r32] 32.Daley DJ, Vere-Jones D. An Introduction to the Theory of Point Processes. 2nd Ed. Vol 1 Springer; Berlin: 2003. [Google Scholar]

[r33] 33.Flandrin P, Borgnat P. Time-frequency energy distributions meet compressed sensing. IEEE Trans Signal Proc. 2010;58(6):2974–2982. [Google Scholar]

[r34] 34.Wolfe PJ, Godsill SJ, Ng WJ. Bayesian variable selection and regularization for time–frequency surface estimation. J R Stat Soc Series B Stat Methodol. 2004;66(3):575–589. [Google Scholar]

[r35] 35.Rosen O, Stoffer DS, Wood S. Local spectral analysis via a Bayesian mixture of smoothing splines. J Am Stat Assoc. 2009;104(485):249–262. [Google Scholar]

[r36] 36.Ahmed A, Xing EP. Recovering time-varying networks of dependencies in social and biological studies. Proc Natl Acad Sci USA. 2009;106(29):11878–11883. doi: 10.1073/pnas.0901910106. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r37] 37.Kalli M, Griffin JE. Time-varying sparsity in dynamic regression models. J Econom. 2014;178(2):779–793. [Google Scholar]

[r38] 38.Mallat SG, Zhang Z. Matching pursuits with time-frequency dictionaries. IEEE Trans Signal Process. 1993;41(12):3397–3415. [Google Scholar]

[r39] 39.Bertsekas D. Nonlinear Programming. Athena Scientific; Nashua, NH: 1999. [Google Scholar]

PERMALINK

Robust spectrotemporal decomposition by iteratively reweighted least squares

Demba Ba

Behtash Babadi

Patrick L Purdon

Emery N Brown

Series information

Significance

Abstract

Toy Example

Fig. 1.

Theory: Robust Spectral Decomposition

The State-Space Model.

The Inverse Solution: Spectrotemporal Pursuit.

Link of Spectrotemporal Pursuit to Basis Pursuit Denoising, IRLS Algorithms, and Sparse Recovery.

An IRLS Algorithm for Spectrotemporal Pursuit.

Analysis of Spectral Resolution.

Toy Example Revisited

Fig. 2.

Fig. 3.

Applications of Spectrotemporal Pursuit to Neural Signal Processing

A Spectrotemporal Pursuit Analysis of EEG Recordings.

Fig. 4.

A Spectrotemporal Pursuit Analysis of Neural Spiking Activity.

Fig. 5.

Discussion

Robust Spectral Decomposition for Signals with Structured Time–Frequency Representations.

Alternate Approaches to Compute Spectral Representations of Signals that Exhibit Dynamic Behavior.

Future Work.

Appendix: Convergence of the IRLS Algorithm

Appendix: Robust Spectral Decomposition as a Filter Bank

Supplementary Material

Acknowledgments

Footnotes

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases