Abstract
A large body of recent work focuses on methods for extracting low-dimensional latent structure from multi-neuron spike train data. Most such methods employ either linear latent dynamics or linear mappings from latent space to log spike rates. Here we propose a doubly nonlinear latent variable model that can identify low-dimensional structure underlying apparently high-dimensional spike train data. We introduce the Poisson Gaussian-Process Latent Variable Model (P-GPLVM), which consists of Poisson spiking observations and two underlying Gaussian processes—one governing a temporal latent variable and another governing a set of nonlinear tuning curves. The use of nonlinear tuning curves enables discovery of low-dimensional latent structure even when spike responses exhibit high linear dimensionality (e.g., as found in hippocampal place cell codes). To learn the model from data, we introduce the decoupled Laplace approximation, a fast approximate inference method that allows us to efficiently optimize the latent path while marginalizing over tuning curves. We show that this method outperforms previous Laplace-approximation-based inference methods in both the speed of convergence and accuracy. We apply the model to spike trains recorded from hippocampal place cells and show that it compares favorably to a variety of previous methods for latent structure discovery, including variational auto-encoder (VAE) based methods that parametrize the nonlinear mapping from latent space to spike rates with a deep neural network.
1. Introduction
Recent advances in multi-electrode array recording techniques have made it possible to measure the simultaneous spiking activity of increasingly large neural populations. These datasets have highlighted the need for robust statistical methods for identifying the latent structure underlying high-dimensional spike train data, so as to provide insight into the dynamics governing large-scale activity patterns and the computations they perform [1-4].
Recent work has focused on the development of sophisticated model-based methods that seek to extract a shared, low-dimensional latent process underlying population spiking activity. These methods can be roughly categorized on the basis of two basic modeling choices: (1) the dynamics of the underlying latent variable; and (2) the mapping from latent variable to neural responses. For choice of dynamics, one popular approach assumes the latent variable is governed by a linear dynamical system [5-11], while a second assumes that it evolves according to a Gaussian process, relaxing the linearity assumption and imposing only smoothness in the evolution of the latent state [1, 12-14]. For choice of mapping function, most previous methods have assumed a fixed linear or log-linear relationship between the latent variable and the mean response level [1, 5-8, 11, 12]. These methods seek to find a linear embedding of population spiking activity, akin to PCA or factor analysis. In many cases, however, the relationship between neural activity and the quantity it encodes can be highly nonlinear. Hippocampal place cells provide an illustrative example: if each discrete location in a 2D environment has a single active place cell, population activity spans a space whose dimensionality is equal to the number of neurons; a linear latent variable model cannot find a reduced-dimensional representation of population activity, despite the fact that the underlying latent variable (“position”) is clearly two-dimensional.
Several recent studies have introduced nonlinear coupling between latent dynamics and firing rate [7, 9, 10, 15]. These models use deep neural networks to parametrize the nonlinear mapping from latent space to spike rates, but often require repeated trials or long training sets. Table 1 summarizes these different model structures for latent neural trajectory estimation (including the original Gaussian process latent variable model (GPLVM) [16], which assumes Gaussian observations and does not produce spikes).
Table 1: Model structures for latent neural trajectory estimation.

| model | latent | mapping function | output nonlinearity | observation |
|---|---|---|---|---|
| PLDS [8] | LDS | linear | exp | Poisson |
| PfLDS [9, 10] | LDS | neural net | exp | Poisson |
| LFADS [15] | RNN | neural net | exp | Poisson |
| GPFA [1] | GP | linear | identity | Gaussian |
| P-GPFA [13, 14] | GP | linear | exp | Poisson |
| GPLVM [16] | GP | GP | identity | Gaussian |
| P-GPLVM | GP | GP | exp | Poisson |
In this paper, we propose the Poisson Gaussian process latent variable model (P-GPLVM) for spike train data, which allows for nonlinearity in both the latent state dynamics and in the mapping from the latent states to the spike rates. Our model posits a low-dimensional latent variable that evolves in time according to a Gaussian process prior; this latent variable governs firing rates via a set of non-parametric tuning curves, parametrized as exponentiated samples from a second Gaussian process, from which spikes are then generated by a Poisson process (Fig. 1).
The paper is organized as follows: Section 2 introduces the P-GPLVM; Section 3 describes the decoupled Laplace approximation for performing efficient inference for the latent variable and tuning curves; Section 4 describes tuning curve estimation; Section 5 compares P-GPLVM to other models using simulated data and hippocampal place-cell recordings, demonstrating the accuracy and interpretability of P-GPLVM relative to other methods.
2. Poisson-Gaussian process latent variable model (P-GPLVM)
Suppose we have simultaneously recorded spike trains from N neurons. Let Y denote the N × T matrix of spike count data, with neurons indexed by i ∈ (1,…, N) and spikes counted in discrete time bins indexed by t ∈ (1,…, T). Our goal is to construct a generative model of the latent structure underlying these data, which will here take the form of a P-dimensional latent variable x(t) and a set of mapping functions or tuning curves {hi(x)}, i ∈ (1,…, N), which map the latent variable to the spike rate of each neuron.
Latent dynamics
Let x(t) denote a (vector-valued) latent process, where each component xj(t), j ∈ (1,…, P), evolves according to an independent Gaussian process (GP),
$$x_j(t) \sim \mathcal{GP}\!\left(0,\, k_t(t, t')\right), \qquad j = 1, \dots, P, \tag{1}$$
with covariance function kt(t, t′) ≜ cov(xj(t), xj(t′)) governing how each scalar process varies over time. Although we can select any valid covariance function for kt, here we use the exponential covariance function, a special case of the Matérn kernel, given by kt(t, t′) = r exp(−∣t − t′∣/l), which is parametrized by a marginal variance r > 0 and length-scale l > 0. Samples from this GP are continuous but not differentiable, equivalent to a Gaussian random walk with a bias toward the origin, also known as the Ornstein-Uhlenbeck process [17].
The latent state x(t) at any time t is a P-dimensional vector that we will write as xt = (x1(t),…, xP(t))⊤. The collection of such vectors over T time bins forms the P × T matrix X = [x1,…, xT]. Let xj denote the j'th row of X, which contains the states in latent dimension j across all time bins. From the definition of a GP, xj has a multivariate normal distribution,
$$\mathbf{x}_j \sim \mathcal{N}(\mathbf{0},\, K_t), \tag{2}$$
with a T × T covariance matrix Kt generated by evaluating the covariance function kt at all time bins in (1,…, T).
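For concreteness, here is a minimal sketch (our own illustration in Python/NumPy, not code from the paper) of sampling a latent trajectory from this prior: build Kt from the exponential kernel and draw each latent dimension as xj ∼ N(0, Kt) per (Eq. 2). The helper name exponential_kernel and the hyperparameter values are arbitrary choices.

```python
import numpy as np

def exponential_kernel(ts, r=1.0, l=10.0):
    """Exponential (Ornstein-Uhlenbeck) covariance: k_t(t, t') = r * exp(-|t - t'| / l)."""
    dt = np.abs(ts[:, None] - ts[None, :])
    return r * np.exp(-dt / l)

T, P = 100, 2                              # time bins, latent dimensions
ts = np.arange(T, dtype=float)
Kt = exponential_kernel(ts, r=1.0, l=10.0)

# Sample each latent dimension independently: x_j ~ N(0, K_t)  (Eq. 2)
rng = np.random.default_rng(0)
jitter = 1e-6 * np.eye(T)                  # numerical stabilizer for the Cholesky factor
L = np.linalg.cholesky(Kt + jitter)
X = (L @ rng.standard_normal((T, P))).T    # X is P x T; row j holds latent dimension j
```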
Nonlinear mapping
Let h(x) denote a nonlinear function mapping from the latent vector xt to a firing rate λt = h(xt). We will refer to h(x) as a tuning curve, although unlike traditional tuning curves, which describe firing rate as a function of some external (observable) stimulus parameter, here h(x) describes firing rate as a function of the (unobserved) latent vector x. Previous work has modeled h with a parametric nonlinear function such as a deep neural network [9, 10]. Here we develop a nonparametric approach using a Gaussian process prior over the log of h; the logarithm ensures that spike rates are non-negative.
Let fi(x) = log hi(x) denote the log tuning curve for the i’th neuron in our population, which we model with a GP,
$$f_i(\mathbf{x}) \sim \mathcal{GP}\!\left(0,\, k_x(\mathbf{x}, \mathbf{x}')\right), \tag{3}$$
where kx is a (spatial) covariance function that governs smoothness of the function over its P-dimensional input space. For simplicity, we use the common Gaussian or radial basis function (RBF) covariance function, kx(x, x′) = ρ exp(−‖x − x′‖²/(2δ²)), where x and x′ are arbitrary points in latent space, ρ is the marginal variance and δ is the length scale. The tuning curve for neuron i is then given by hi(x) = exp(fi(x)).
Let fi = (fi(x1),…, fi(xT))⊤ denote the vector with t'th element equal to fi(xt). From the definition of a GP, fi has a multivariate normal distribution given the latent vectors at all time bins, x1:T,
$$\mathbf{f}_i \mid \mathbf{x}_{1:T} \sim \mathcal{N}(\mathbf{0},\, K_x), \tag{4}$$
with a T × T covariance matrix Kx generated by evaluating the covariance function kx at all pairs of latent vectors in x1:T. Stacking fi for all N neurons, we form the N × T matrix F with fi⊤ on the i'th row; the element in the i'th row and t'th column is fi,t = fi(xt).
Poisson spiking
Lastly, we assume Poisson spiking given the latent firing rates, with spike rates in units of spikes per time bin. Let λi,t = exp(fi,t) = exp(fi(xt)) denote the spike rate of neuron i at time t. The spike count of neuron i at time t, given the log tuning curve fi and latent vector xt, is Poisson distributed as
$$y_{i,t} \mid f_{i,t} \sim \mathrm{Poiss}\!\left(\exp(f_{i,t})\right). \tag{5}$$
In summary, our model is a doubly nonlinear Gaussian process latent variable model with Poisson observations (P-GPLVM). One GP models the nonlinear evolution of the latent variable x, while a second GP generates the log tuning curve f as a nonlinear function of x, which is then mapped to a tuning curve h via a nonlinear link function, e.g., the exponential function. Fig. 1 provides a schematic of the model.
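Continuing the sketch above (reusing numpy, rng, and the sampled latent path X), the remaining generative steps of (Eqs. 3–5) might be simulated as follows; the rbf_kernel helper and all hyperparameter values are illustrative assumptions rather than learned quantities.

```python
def rbf_kernel(A, B, rho=1.0, delta=1.0):
    """RBF covariance k_x(x, x') = rho * exp(-||x - x'||^2 / (2 delta^2)).
    A is (n, P), B is (m, P); returns an (n, m) covariance matrix."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return rho * np.exp(-sq / (2 * delta**2))

N = 20                                          # number of neurons
Xt = X.T                                        # latent locations, shape (T, P)
Kx = rbf_kernel(Xt, Xt) + 1e-6 * np.eye(T)      # covariance of f_i along the latent path
Lx = np.linalg.cholesky(Kx)

# Log tuning curves evaluated along the latent path: f_i | X ~ N(0, K_x)  (Eq. 4)
F = (Lx @ rng.standard_normal((T, N))).T        # F is N x T
rates = np.exp(F)                               # spikes/bin via the exp link  (Eq. 5)
Y = rng.poisson(rates)                          # Poisson spike counts
```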
3. Inference using the decoupled Laplace approximation
For our inference procedure, we estimate the log of the tuning curve, f, as opposed to attempting to infer the tuning curve h directly. Once f is estimated, h can be obtained by exponentiating f. Given the model outlined above, the joint distribution over the observed data and all latent variables is written as,
$$p(Y, F, X \mid \theta) = p(Y \mid F)\, p(F \mid X, \rho, \delta)\, p(X \mid r, l) = \prod_{i=1}^{N} \prod_{t=1}^{T} p(y_{i,t} \mid f_{i,t}) \prod_{i=1}^{N} p(\mathbf{f}_i \mid X, \rho, \delta) \prod_{j=1}^{P} p(\mathbf{x}_j \mid r, l), \tag{6}$$
where θ = {ρ, δ, r, l} is the set of hyperparameters, which we henceforth suppress for notational simplicity. This is a Gaussian process latent variable model (GPLVM) with Poisson observations and a GP prior on the latent variable, and our goal is to estimate both F and X. A standard Bayesian treatment of the GPLVM requires the computation of the log marginal likelihood associated with the joint distribution (Eq. 6), for which both F and X must be marginalized out,
$$p(Y \mid \theta) = \int\! \int p(Y \mid F)\, p(F \mid X)\, p(X)\, \mathrm{d}F\, \mathrm{d}X = \int \left[ \int p(Y \mid F)\, p(F \mid X)\, \mathrm{d}F \right] p(X)\, \mathrm{d}X. \tag{7}$$
However, propagating the prior density p(X) through the nonlinear mapping makes this inference difficult. The nested integral in (Eq. 7) contains X in a complex nonlinear manner, making analytical integration over X infeasible. To overcome these difficulties, we can use a straightforward MAP training procedure where the latent variables F and X are selected according to
$$F_{\mathrm{MAP}},\, X_{\mathrm{MAP}} = \arg\max_{F, X}\; p(Y \mid F)\, p(F \mid X)\, p(X). \tag{8}$$
Note that point estimates of the hyperparameters θ can also be found by maximizing the same objective function. As discussed above, learning X remains a challenge due to the interplay of the latent variables, i.e., the dependency of F on X. For our MAP training procedure, fixing one latent variable while optimizing the other in a coordinate descent approach is highly inefficient, since the strong interplay between the variables often leads to bad local optima. In the variational GPLVM [18], the authors introduced a non-standard variational inference framework for approximately integrating out the latent variable X and then training the GPLVM by maximizing an analytic lower bound on the exact marginal likelihood. An advantage of the variational framework is the introduction of auxiliary variables that weaken the strong dependency between X and F. However, the variational approximation is applicable only to Gaussian observations; with Poisson observations, the integral over F remains intractable. In the following, we propose variations of the Laplace approximation for inference.
3.1. Standard Laplace approximation
We first use Laplace's method to find a Gaussian approximation q(F∣Y, X) to the true posterior p(F∣Y, X), and then perform MAP estimation for X only. We apply the Laplace approximation to each fi individually. Taking a second-order Taylor expansion of log p(fi∣yi, X) around the maximum of the posterior, we obtain a Gaussian approximation
$$q(\mathbf{f}_i \mid \mathbf{y}_i, X) = \mathcal{N}\!\left(\hat{\mathbf{f}}_i,\, A^{-1}\right), \tag{9}$$
where f̂i = arg maxfi p(fi∣yi, X) is the posterior mode and A = −∇∇ log p(fi∣yi, X), evaluated at f̂i, is the Hessian of the negative log posterior at that point. By Bayes' rule, the posterior over fi is given by p(fi∣yi, X) = p(yi∣fi)p(fi∣X)/p(yi∣X), but since p(yi∣X) is independent of fi, we need only consider the unnormalized posterior when maximizing w.r.t. fi. Its logarithm, which we denote Ψ(fi), is
$$\Psi(\mathbf{f}_i) \triangleq \log p(\mathbf{y}_i \mid \mathbf{f}_i) + \log p(\mathbf{f}_i \mid X) = \log p(\mathbf{y}_i \mid \mathbf{f}_i) - \tfrac{1}{2}\, \mathbf{f}_i^\top K_x^{-1} \mathbf{f}_i - \tfrac{1}{2} \log |K_x| - \tfrac{T}{2} \log 2\pi. \tag{10}$$
Differentiating (Eq. 10) w.r.t. fi we obtain
$$\nabla \Psi(\mathbf{f}_i) = \nabla \log p(\mathbf{y}_i \mid \mathbf{f}_i) - K_x^{-1} \mathbf{f}_i, \tag{11}$$
$$\nabla\nabla \Psi(\mathbf{f}_i) = \nabla\nabla \log p(\mathbf{y}_i \mid \mathbf{f}_i) - K_x^{-1} = -W_i - K_x^{-1}, \tag{12}$$
where Wi = −∇∇ log p(yi∣fi). The approximated log conditional likelihood on X (see Sec. 3.4.4 in [17]) can then be written as
$$\log q(\mathbf{y}_i \mid X) = \log p(\mathbf{y}_i \mid \hat{\mathbf{f}}_i) - \tfrac{1}{2}\, \hat{\mathbf{f}}_i^\top K_x^{-1} \hat{\mathbf{f}}_i - \tfrac{1}{2} \log |I + K_x W_i|. \tag{13}$$
We can then estimate X as
$$X_{\mathrm{MAP}} = \arg\max_X\; \sum_{i=1}^{N} \log q(\mathbf{y}_i \mid X) + \sum_{j=1}^{P} \log p(\mathbf{x}_j). \tag{14}$$
When using the standard Laplace approximation, the gradient of log q(yi∣X) w.r.t. X must be calculated for a given posterior mode f̂i. Note that not only is the covariance matrix Kx an explicit function of X, but f̂i and Wi are also implicit functions of X: when X changes, the optimum of the posterior changes as well. Therefore log q(yi∣X) depends implicitly on X in a way that does not admit a straightforward closed-form gradient expression, and resorting to numerical gradients yields a very inefficient implementation in practice.
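For reference, here is a minimal sketch of the inner Laplace step for a single neuron under the Poisson/exp model (so that Wi = diag(exp(f̂i))), following Algorithm 3.1 of [17]: Newton updates locate the posterior mode f̂i, and (Eq. 13) is then evaluated at that mode. The function name and numerical details are ours and are not taken from the paper.

```python
import numpy as np
from scipy.special import gammaln

def laplace_log_evidence(y, Kx, n_iter=100, tol=1e-9):
    """Find the posterior mode f_hat of p(f_i | y_i, X) for Poisson counts y with an
    exp link, and return (f_hat, approximate log q(y_i | X)) as in Eq. (13)."""
    T = len(y)
    f = np.zeros(T)
    a = np.zeros(T)
    I = np.eye(T)
    for _ in range(n_iter):
        lam = np.exp(f)                 # Poisson rate; also -d^2 log p(y|f)/df^2
        sW = np.sqrt(lam)               # sqrt of W = diag(lam)
        grad = y - lam                  # gradient of the Poisson log likelihood
        B = I + sW[:, None] * Kx * sW[None, :]
        L = np.linalg.cholesky(B)
        b = lam * f + grad
        a = b - sW * np.linalg.solve(L.T, np.linalg.solve(L, sW * (Kx @ b)))
        f_new = Kx @ a                  # Newton update; at convergence a = Kx^{-1} f_hat
        if np.max(np.abs(f_new - f)) < tol:
            f = f_new
            break
        f = f_new
    lam = np.exp(f)
    sW = np.sqrt(lam)
    L = np.linalg.cholesky(I + sW[:, None] * Kx * sW[None, :])
    log_py_f = np.sum(y * f - lam - gammaln(y + 1))       # log p(y | f_hat)
    # Eq. (13): log p(y|f_hat) - 0.5 f_hat' Kx^{-1} f_hat - 0.5 log|I + Kx W|
    log_q = -0.5 * f @ a + log_py_f - np.sum(np.log(np.diag(L)))
    return f, log_q
```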
3.2. Third-derivative Laplace approximation
One method to derive this gradient explicitly is described in [17] (see Sec. 5.5.1). We adapt their procedure to our setting to make the implicit dependency of f̂i and Wi on X explicit. To solve (Eq. 14), we need to determine the partial derivative of our approximated log conditional likelihood (Eq. 13) w.r.t. X, given as
$$\frac{\partial \log q(\mathbf{y}_i \mid X)}{\partial X} = \left.\frac{\partial \log q(\mathbf{y}_i \mid X)}{\partial X}\right|_{\text{explicit}} + \sum_{t=1}^{T} \frac{\partial \log q(\mathbf{y}_i \mid X)}{\partial \hat{f}_{i,t}} \frac{\partial \hat{f}_{i,t}}{\partial X} \tag{15}$$
by the chain rule. When evaluating the second term, we use the fact that f̂i is the posterior maximum, so ∂Ψ(fi)/∂fi = 0 at f̂i, where Ψ(fi) is defined in (Eq. 10). Thus the implicit derivatives of the first two terms in (Eq. 13) vanish, leaving only
$$\frac{\partial \log q(\mathbf{y}_i \mid X)}{\partial \hat{f}_{i,t}} = -\tfrac{1}{2} \left[ (K_x^{-1} + W_i)^{-1} \right]_{tt} \frac{\partial^3}{\partial \hat{f}_{i,t}^3} \log p(\mathbf{y}_i \mid \hat{\mathbf{f}}_i). \tag{16}$$
To evaluate ∂f̂i/∂X, we differentiate the self-consistent equation f̂i = Kx∇log p(yi∣f̂i) (obtained by setting (Eq. 11) to zero at f̂i) to obtain
$$\frac{\partial \hat{\mathbf{f}}_i}{\partial X} = \frac{\partial K_x}{\partial X} \nabla \log p(\mathbf{y}_i \mid \hat{\mathbf{f}}_i) + K_x \frac{\partial \nabla \log p(\mathbf{y}_i \mid \hat{\mathbf{f}}_i)}{\partial \hat{\mathbf{f}}_i} \frac{\partial \hat{\mathbf{f}}_i}{\partial X} = (I + K_x W_i)^{-1} \frac{\partial K_x}{\partial X} \nabla \log p(\mathbf{y}_i \mid \hat{\mathbf{f}}_i), \tag{17}$$
where we use the chain rule and ∇∇ log p(yi∣f̂i) = −Wi from (Eq. 12). The desired implicit derivative is obtained by multiplying (Eq. 16) and (Eq. 17) to form the second term in (Eq. 15).
We can now estimate XMAP with (Eq. 14) using the explicit gradient expression in (Eq. 15). We call this method the third-derivative Laplace approximation (tLA), since it depends on the third derivative of the data likelihood term (see [17] for further details). However, tLA has a significant computational drawback: for each step along the gradient we have just derived, the posterior mode f̂i must be re-evaluated. The method may converge in few iterations in theory, but in practice the nested optimization makes each iteration very slow.
3.3. Decoupled Laplace approximation
We propose a novel method to relax the Laplace approximation, which we refer to as the decoupled Laplace approximation (dLA). Our relaxation not only decouples the strong dependency between X and F, but also avoids the nested optimization of searching for the posterior mode of F within each update of X. Like tLA, dLA treats f̂i as a function of X; however, while tLA treats f̂i as an implicit function of X, dLA constructs an explicit mapping between f̂i and X.
The standard Laplace approximation uses a Gaussian approximation for the posterior p(fi∣yi, X) ∝ p(yi∣fi)p(fi∣X), where, in this paper, p(yi∣fi) is a Poisson distribution and p(fi∣X) is a multivariate Gaussian distribution. We first perform the same second-order Taylor expansion of log p(fi∣yi, X) around the posterior maximum to find q(fi∣yi, X) as in (Eq. 9). If we now approximate the likelihood p(yi∣fi) by a Gaussian distribution N(fi∣m, S), then, given the prior p(fi∣X) = N(fi∣0, Kx) and the posterior approximation q(fi∣yi, X) = N(fi∣f̂i, A−1), the rule for products of Gaussian densities allows us to solve for m and S from the relationship N(f̂i, A−1) ∝ N(m, S) N(0, Kx):
$$S^{-1} = A - K_x^{-1}, \qquad m = S\, A\, \hat{\mathbf{f}}_i. \tag{18}$$
Here m and S represent the components of the posterior terms f̂i and A that come from the likelihood. When estimating X, we fix these likelihood terms m and S and completely relax the prior p(fi∣X). We still solve (Eq. 14) w.r.t. X, but now q(fi∣yi, X) has both its mean and covariance expressed as explicit functions of X. Alg. 1 describes iteration k of the dLA algorithm, with which we can now estimate XMAP. Step 3 indicates that the posterior maximum for the current iteration is explicitly updated as a function of X, avoiding the computationally demanding nested optimization of tLA. Intuitively, dLA works by finding a Gaussian approximation to the likelihood at f̂i such that the approximated posterior of fi, q(fi∣yi, X), is a closed-form Gaussian distribution with mean and covariance that are functions of X, ultimately allowing for the explicit calculation of q(yi∣X).
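The following sketch illustrates one way the dLA quantities might be computed for a single neuron under the Poisson/exp model; it reflects our reading of (Eq. 18) and Alg. 1 (which is not reproduced here), not the authors' implementation. With A = Kx−1 + Wi, the likelihood terms reduce to S = Wi−1 and m = S A f̂i; holding m and S fixed, the surrogate evidence as a function of X is taken here to be the Gaussian convolution log N(m; 0, S + Kx(X)).

```python
import numpy as np

def dla_likelihood_terms(f_hat, Kx):
    """Recover the fixed Gaussian likelihood factor N(m, S) from the Laplace posterior
    N(f_hat, A^{-1}) with A = Kx^{-1} + W (Eqs. 12 and 18).  For the Poisson/exp model,
    W = diag(exp(f_hat)), so S = W^{-1} and m = S A f_hat."""
    T = len(f_hat)
    W_diag = np.exp(f_hat)                              # W_i = diag(exp(f_hat))
    S = np.diag(1.0 / W_diag)                           # S^{-1} = A - Kx^{-1} = W
    A_fhat = np.linalg.solve(Kx + 1e-9 * np.eye(T), f_hat) + W_diag * f_hat  # A @ f_hat
    m = (1.0 / W_diag) * A_fhat                         # m = S A f_hat
    return m, S

def dla_surrogate_evidence(m, S, Kx_of_X):
    """With m, S held fixed and the prior N(0, Kx(X)) fully relaxed, evaluate the
    surrogate evidence log N(m; 0, S + Kx(X)), an explicit function of X
    (our interpretation of the dLA objective, not the authors' code)."""
    T = len(m)
    C = S + Kx_of_X + 1e-9 * np.eye(T)
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, m))  # C^{-1} m
    return -0.5 * m @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * T * np.log(2 * np.pi)
```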
4. Tuning curve estimation
Given the estimates X̂ and F̂ from the inference procedure, we can now calculate the tuning curve h for each neuron. Let x1:G be a grid of G latent states, where each xg is a P-dimensional location in latent space. Correspondingly, for each neuron we have the log tuning curve evaluated on the grid, figrid, with g'th element equal to fi(xg). Similar to (Eq. 4), we can write down its distribution as
$$\mathbf{f}_i^{\mathrm{grid}} \sim \mathcal{N}(\mathbf{0},\, K_{\mathrm{grid}}), \tag{19}$$
with a G × G covariance matrix Kgrid generated by evaluating the covariance function kx at all pairs of vectors in x1:G. We can therefore write the joint distribution of the estimated f̂i and the grid values figrid as
$$\begin{bmatrix} \hat{\mathbf{f}}_i \\ \mathbf{f}_i^{\mathrm{grid}} \end{bmatrix} \sim \mathcal{N}\!\left( \mathbf{0},\, \begin{bmatrix} \hat{K}_x & K_{xg} \\ K_{xg}^\top & K_{\mathrm{grid}} \end{bmatrix} \right), \tag{20}$$
where K̂x is a T × T covariance matrix with elements evaluated at all pairs of estimated latent vectors in x̂1:T, and Kxg is the T × G matrix of covariances between x̂1:T and x1:G. Thus we have the following posterior distribution over figrid:
$$\mathbf{f}_i^{\mathrm{grid}} \mid \hat{\mathbf{f}}_i, \hat{X} \sim \mathcal{N}\!\left( K_{xg}^\top \hat{K}_x^{-1} \hat{\mathbf{f}}_i,\;\; \mathrm{diag}(K_{\mathrm{grid}}) - K_{xg}^\top \hat{K}_x^{-1} K_{xg} \right), \tag{21}$$
where diag(Kgrid) denotes a diagonal matrix constructed from the diagonal of Kgrid. Setting f̂igrid equal to the posterior mean in (Eq. 21), the spike rate vector
$$\hat{\mathbf{h}}_i^{\mathrm{grid}} = \exp\!\left( \hat{\mathbf{f}}_i^{\mathrm{grid}} \right) \tag{22}$$
describes the estimated tuning curve hi evaluated on the grid x1:G.
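A minimal sketch of this grid evaluation (assuming the rbf_kernel helper from the earlier sketch; the hyperparameters rho and delta are placeholders rather than learned values):

```python
import numpy as np

def estimate_tuning_curve(f_hat, X_hat, X_grid, rho=1.0, delta=1.0, jitter=1e-6):
    """Evaluate the estimated tuning curve on a grid of latent states (Eqs. 19-22):
    condition f_grid on the inferred f_hat at the inferred latents X_hat (shape (T, P)),
    take the posterior mean, and exponentiate."""
    K_hat = rbf_kernel(X_hat, X_hat, rho, delta) + jitter * np.eye(len(X_hat))
    K_xg = rbf_kernel(X_hat, X_grid, rho, delta)          # T x G cross-covariance
    mean_grid = K_xg.T @ np.linalg.solve(K_hat, f_hat)    # posterior mean of f_grid (Eq. 21)
    return np.exp(mean_grid)                              # tuning curve on the grid (Eq. 22)
```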
5. Experiments
5.1. Simulation data
We first examine performance using two simulated datasets generated with different kinds of tuning curves, namely sinusoids and Gaussian bumps. We compare our algorithm (P-GPLVM) with PLDS, PfLDS, P-GPFA and GPLVM (see Table 1), using the tLA and dLA inference methods for P-GPLVM. We also include an additional variant of the Laplace approximation, which we call the approximated Laplace approximation (aLA), in which we use only the explicit (first) term in (Eq. 15) to optimize over X for multiple steps given a fixed f̂i. This gives a coarse estimate of the gradient w.r.t. X for a few steps in X before f̂i must be re-estimated, partially relaxing the nested optimization and speeding up learning.
For comparison between models in our simulated experiments, we compute R-squared (R2) values between the known and estimated latent processes. In all simulation studies, we generate a single trial with 20 simulated neurons and 100 time bins per experiment; each experiment is repeated 10 times and results are averaged across the 10 repeats.
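Because the latent space is identified only up to an invertible transformation, each estimate is aligned to the truth with a least-squares affine map before computing R2 (as described below for Fig. 2A). A small sketch of this metric, with a function name of our own choosing:

```python
import numpy as np

def aligned_r2(x_true, x_est):
    """R^2 between the true latent path and an estimated path after least-squares
    affine alignment.  x_true and x_est are (T, P) arrays."""
    T = x_true.shape[0]
    A = np.hstack([x_est, np.ones((T, 1))])            # affine design matrix
    W, *_ = np.linalg.lstsq(A, x_true, rcond=None)     # fit x_true ~ A @ W
    resid = x_true - A @ W
    ss_res = np.sum(resid**2)
    ss_tot = np.sum((x_true - x_true.mean(0))**2)
    return 1.0 - ss_res / ss_tot
```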
Sinusoid tuning curve:
This simulation generates a “grid cell” type response. A grid cell is a type of neuron that is activated when an animal occupies any point on a grid spanning the environment [19]. When an animal moves in a one-dimensional space (P = 1), grid cells exhibit oscillatory responses. Motivated by the response properties of grid cells, the log firing rate of each neuron i is coupled to the latent process through a sinusoid with a neuron-specific phase Φi and frequency ωi,
$$f_i(x_t) = \sin\!\left( \omega_i\, x_t + \Phi_i \right). \tag{23}$$
We sampled Φi uniformly from [0, 2π] and ωi uniformly from [1.0, 4.0].
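A short sketch of this simulation (reusing numpy, rng, and the first dimension of the sampled latent path X from the earlier generative sketch):

```python
# Simulated "grid cell"-style population with sinusoidal log tuning curves (Eq. 23).
x = X[0]                                    # a 1D latent path of length T (P = 1)
N = 20
phi = rng.uniform(0.0, 2 * np.pi, size=N)   # neuron-specific phases in [0, 2*pi]
omega = rng.uniform(1.0, 4.0, size=N)       # neuron-specific frequencies in [1, 4]
F = np.sin(omega[:, None] * x[None, :] + phi[:, None])   # log firing rates (Eq. 23)
Y = rng.poisson(np.exp(F))                  # simulated spike counts, N x T
```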
An example of the estimated latent processes versus the true latent process is presented in Fig. 2A. We used least-squares regression to learn an affine transformation from the latent space to the space of the true locations. Only P-GPLVM finds the global optimum, fitting the valley around t = 70. Fig. 2B displays the true and estimated tuning curves for neurons 4, 10, and 9 with PLDS, PfLDS, P-GPFA and P-GPLVM-dLA. For PLDS, PfLDS and P-GPFA, we replace the estimated f̂i with the observed spike counts yi in (Eq. 21) and treat the posterior mean as the tuning curve on a grid of latent representations. For P-GPLVM, the tuning curve is estimated via (Eq. 22). The R2 performance is shown in the first column of Fig. 2E.
Deterministic Gaussian bump tuning curve:
For this simulation, each neuron's tuning curve is modeled as a unimodal Gaussian bump in a 2D space, so that the log of the tuning curve, f, is a deterministic Gaussian function of x. Fig. 2C shows an example of the estimated latent processes. PLDS fits an overly smooth curve, while P-GPLVM recovers the small wiggles that other methods miss. Fig. 2D displays the 2D tuning curves for neurons 1, 4, and 12 estimated by PLDS, PfLDS, P-GPFA and P-GPLVM-dLA. The R2 performance is shown in the second column of Fig. 2E.
Overall, P-GPFA has quite unstable performance, likely due to the ARD kernel in its GP prior, which potentially encourages a bias toward smoothness even when the underlying latent process is quite non-smooth. PfLDS performs better than PLDS in the second case, but when the true latent process is highly nonlinear (the sinusoid) and the single-trial dataset is small, PfLDS loses its advantage, as stochastic optimization of its neural-network mapping struggles with so little data. GPLVM handles the nonlinearities reasonably well, but performs worse than P-GPLVM, demonstrating the importance of the Poisson observation model. Among the P-GPLVM variants, the dLA inference algorithm performs best overall in both convergence speed and R2 (Fig. 2F).
5.2. Application to rat hippocampal neuron data
Next, we apply the proposed methods to extracellular recordings from the rodent hippocampus. Neurons were recorded bilaterally from the pyramidal layers of CA3 and CA1 in two rats as they performed a spatial alternation task on a W-shaped maze [20]. We confine our analyses to simultaneously recorded putative place cells during periods of active navigation. The number of simultaneously recorded neurons ranged from 7 to 19 for rat 1 and from 24 to 38 for rat 2. Individual trials of 50 seconds were isolated from 15-minute recordings and binned at a resolution of 100 ms.
We used this hippocampal data to identify a 2D latent space using PLDS, PfLDS, P-GPFA, GPLVM and P-GPLVMs (Fig. 3), and compared these to the true 2D location of the rodent. For visualization purposes, we linearized the coordinates along the arms of the maze to obtain 1D representations. Fig. 3A & B present two segments of 1s recordings for the two animals. The P-GPLVM results are smoother and recover short time-scale variations that PLDS ignores. The average R2 performance for all methods for each rodent is shown in Fig. 3C & D where P-GPLVM-dLA consistently performs the best.
We also assessed goodness of fit by predicting held-out data. We split the time bins in each trial into training time bins (the first 90%) and held-out time bins (the last 10%). We first estimated the parameters of the mapping function or tuning curve for each model using spike trains from all neurons during the training time bins. We then fixed these parameters and inferred the latent process during the held-out time bins using spike trains from 70% of the neurons. Finally, we computed the predictive log likelihood (PLL) of the remaining 30% of neurons during the held-out time bins given the inferred latent process. We report the PLL divided by the number of observations, with the log-likelihood of the population mean firing rate model (a single constant spike rate) subtracted off (Fig. 3C & D). Both P-GPLVM-aLA and P-GPLVM-dLA perform well. GPLVM has a very negative PLL and is omitted from the figures.
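A sketch of this evaluation metric for the held-out neurons; the exact baseline and normalization conventions are our assumptions and may differ in detail from those used for Fig. 3.

```python
import numpy as np
from scipy.special import gammaln

def poisson_pll_per_bin(y_test, rate_pred):
    """Per-observation Poisson predictive log likelihood for held-out neurons, minus the
    log likelihood of a constant mean-rate baseline.  y_test holds spike counts and
    rate_pred holds positive predicted rates (spikes/bin), both with the same shape."""
    ll_model = np.sum(y_test * np.log(rate_pred) - rate_pred - gammaln(y_test + 1))
    base = np.full_like(rate_pred, y_test.mean())        # single mean-rate model
    ll_base = np.sum(y_test * np.log(base) - base - gammaln(y_test + 1))
    return (ll_model - ll_base) / y_test.size
```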
Fig. 3E & F present the tuning curves learned by P-GPLVM-dLA, where each row corresponds to a neuron. For this analysis we have the true locations xtrue, the estimated locations xP-GPLVM, a grid of G locations x1:G laid out along the shape of the maze, the spike count observations yi, and the estimated log tuning curve figrid for each neuron i. The light gray dots in the first column of Fig. 3E & F are the binned spike counts mapped from the space of xtrue to the space of x1:G; the second column contains the binned spike counts mapped from the space of xP-GPLVM to the space of x1:G. The black curves in the first column are obtained by replacing X̂ and f̂i with xtrue and yi, respectively, in the predictive posterior of (Eq. 21) and (Eq. 22). The yellow curves in the second column are the estimated tuning curves, obtained by using (Eq. 22) to compute ĥigrid for each neuron. The estimated tuning curves closely match the tuning curves computed from the true locations, revealing distinct response locations for different neurons as the rat moves.
6. Conclusion
We proposed a doubly nonlinear Gaussian process latent variable model for neural population spike trains that can identify nonlinear low-dimensional structure underlying apparently high-dimensional spike train data. We also introduced a novel decoupled Laplace approximation, a fast approximate inference method that allows us to efficiently maximize marginal likelihood for the latent path while integrating over tuning curves. We showed that this method outperforms previous Laplace-approximation-based inference methods in both the speed of convergence and accuracy. We applied the model to both simulated data and spike trains recorded from hippocampal place cells and showed that it outperforms a variety of previous methods for latent structure discovery.
Acknowledgments
This work was supported by grants from the Simons Foundation (SCGB AWD543027) and a U19 NIH-NINDS BRAIN Initiative Award (5U19NS104648).
References
- [1]. Yu BM, Cunningham JP, Santhanam G, Ryu SI, Shenoy KV, and Sahani M. Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. In Adv neur inf proc sys, pages 1881–1888, 2009.
- [2]. Paninski L, Ahmadian Y, Ferreira DG, Koyama S, Rahnama Rad K, Vidne M, Vogelstein J, and Wu W. A new look at state-space models for neural data. J comp neurosci, 29(1-2):107–126, 2010.
- [3]. Cunningham JP and Yu BM. Dimensionality reduction for large-scale neural recordings. Nature neuroscience, 17(11):1500–1509, 2014.
- [4]. Linderman SW, Johnson MJ, Wilson MA, and Chen Z. A Bayesian nonparametric approach for uncovering rat hippocampal population codes during spatial navigation. J neurosci meth, 263:36–47, 2016.
- [5]. Macke JH, Buesing L, Cunningham JP, Yu BM, Shenoy KV, and Sahani M. Empirical models of spiking in neural populations. In Adv neur inf proc sys, pages 1350–1358, 2011.
- [6]. Buesing L, Macke JH, and Sahani M. Spectral learning of linear dynamics from generalised-linear observations with application to neural population data. In Adv neur inf proc sys, pages 1682–1690, 2012.
- [7]. Archer EW, Koster U, Pillow JW, and Macke JH. Low-dimensional models of neural population activity in sensory cortical circuits. In Adv neur inf proc sys, pages 343–351, 2014.
- [8]. Macke JH, Buesing L, and Sahani M. Estimating state and parameters in state space models of spike trains. Advanced State Space Methods for Neural and Clinical Data, page 137, 2015.
- [9]. Archer E, Park IM, Buesing L, Cunningham J, and Paninski L. Black box variational inference for state space models. arXiv preprint arXiv:1511.07367, 2015.
- [10]. Gao Y, Archer EW, Paninski L, and Cunningham JP. Linear dynamical neural population models through nonlinear embeddings. In Adv neur inf proc sys, pages 163–171, 2016.
- [11]. Kao JC, Nuyujukian P, Ryu SI, Churchland MM, Cunningham JP, and Shenoy KV. Single-trial dynamics of motor cortex and their applications to brain-machine interfaces. Nature communications, 6, 2015.
- [12]. Pfau D, Pnevmatikakis EA, and Paninski L. Robust learning of low-dimensional dynamics from large neural ensembles. In Adv neur inf proc sys, pages 2391–2399, 2013.
- [13]. Nam H. Poisson extension of Gaussian process factor analysis for modeling spiking neural populations. Master's thesis, Department of Neural Computation and Behaviour, Max Planck Institute for Biological Cybernetics, Tübingen, August 2015.
- [14]. Zhao Y and Park IM. Variational latent Gaussian process for recovering single-trial dynamics from population spike trains. arXiv preprint arXiv:1604.03053, 2016.
- [15]. Sussillo D, Jozefowicz R, Abbott LF, and Pandarinath C. LFADS: latent factor analysis via dynamical systems. arXiv preprint arXiv:1608.06315, 2016.
- [16]. Lawrence ND. Gaussian process latent variable models for visualisation of high dimensional data. In Adv neur inf proc sys, pages 329–336, 2004.
- [17]. Rasmussen CE and Williams CKI. Gaussian Processes for Machine Learning. MIT Press, 2006.
- [18]. Damianou AC, Titsias MK, and Lawrence ND. Variational inference for uncertainty on the inputs of Gaussian process models. arXiv preprint arXiv:1409.2287, 2014.
- [19]. Hafting T, Fyhn M, Molden S, Moser MB, and Moser EI. Microstructure of a spatial map in the entorhinal cortex. Nature, 436(7052):801–806, 2005.
- [20]. Karlsson M, Carr M, and Frank LM. Simultaneous extracellular recordings from hippocampal areas CA1 and CA3 (or MEC and CA1) from rats performing an alternation task in two W-shaped tracks that are geometrically identical but visually distinct. CRCNS.org, doi:10.6080/K0NK3BZJ, 2005.