Computational Intelligence and Neuroscience
2013 Nov 17;2013:251905. doi: 10.1155/2013/251905

An Overview of Bayesian Methods for Neural Spike Train Analysis

Zhe Chen 1,2,*
PMCID: PMC3855941  PMID: 24348527

Abstract

Neural spike train analysis is an important task in computational neuroscience which aims to understand neural mechanisms and gain insights into neural circuits. With the advancement of multielectrode recording and imaging technologies, it has become increasingly demanding to develop statistical tools for analyzing large neuronal ensemble spike activity. Here we present a tutorial overview of Bayesian methods and their representative applications in neural spike train analysis, at both single neuron and population levels. On the theoretical side, we focus on various approximate Bayesian inference techniques as applied to latent state and parameter estimation. On the application side, the topics include spike sorting, tuning curve estimation, neural encoding and decoding, deconvolution of spike trains from calcium imaging signals, and inference of neuronal functional connectivity and synchrony. Some research challenges and opportunities for neural spike train analysis are discussed.

1. Introduction

Neuronal action potentials or spikes are the basic language that neurons use to represent and transmit information. Understanding neuronal representations of spike trains remains a fundamental task in computational neuroscience [1, 2]. With the advancement of multielectrode array and imaging technologies, neuroscientists have been able to record a large population of neurons at a fine temporal and spatial resolution [3]. To extract (“read out”) information from or inject/restore (“write in”) signals to neural circuits [4], there are emerging needs for modeling and analyzing neural spike trains recorded directly or extracted indirectly from neural signals, as well as building closed-loop brain-machine interfaces (BMIs). Many good examples and applications can be found in the volumes of the current or other special issues [5, 6].

In recent years, cutting-edge Bayesian methods have gained increasing attention in the analysis of neural data and neural spike trains. Although its theoretical principle has been well established since the inception of Bayes' rule [7], Bayesian machinery was not widely used in large-scale data analysis until very recently. Its recent adoption can be partially ascribed to two factors: first, the development of new methodologies and efficient algorithms; second, ever-increasing computing power. The major theoretic and methodological developments have been reported in the field of statistics, and numerous algorithms were developed in applied statistics and machine learning for successful real-world applications [8]. It is time to push this research frontier to neural data analysis. With this purpose in mind, this paper provides a tutorial review of the basic theory and the state-of-the-art Bayesian methods for neural spike train analysis.

The rest of the paper is organized as follows. Section 2 presents the background information about statistical inference and estimation, Bayes' theory, and statistical characterization of neural spike trains. Section 3 reviews several important Bayesian modeling and inference methods in light of different approximation techniques. Section 4 reviews a few representative applications of Bayesian methods for neural spike train analysis. Finally, Section 5 concludes the paper with discussions on a few challenging research topics in neural spike train analysis.

2. Background

2.1. Estimation and Inference: Statistic versus Dynamic

Throughout this paper, we denote by Y the observed variables, by X the hidden variables, by θ an unknown parameter vector, and by ⊤ the transpose operator for vectors and matrices. We assume that p(Y | X, θ) has a regular and well-defined form of the likelihood function. For neural spike train analysis, Y typically consists of time series of single or multiple spike trains. Given a fixed time interval (0, T], by time discretization we have Y = {Y_1, Y_2, …, Y_K} (where K = T/Δ and Δ denotes the temporal bin size). A general statistical inference problem is stated as follows: given observations Y, estimate the unknown hidden variable X with a known θ, estimate θ alone, or jointly estimate θ and X. The unknown variables θ and X can be either static or dynamic (e.g., time-varying with a Markovian structure). We will review approaches that tackle these scenarios in this paper.

There are two fundamental approaches to the inference problem: the likelihood approach and the Bayesian approach. The likelihood approach [9] computes a point estimate by maximizing the likelihood function and represents the uncertainty of the estimate via confidence intervals. The maximum likelihood estimate (m.l.e.) is asymptotically consistent, normal, and efficient, and it is invariant to reparameterization (i.e., functional invariance). However, the m.l.e. is known to suffer from overfitting, and therefore model selection is required in statistical data analysis. In contrast, the Bayesian philosophy lets the data speak for themselves and models the unknowns (parameters, latent variables, and missing data) and uncertainties (which are not necessarily random) with probabilities or probability densities. The Bayesian approach computes the full posterior of the unknowns based on the rules of probability theory and can thereby resolve the overfitting problem in a principled way [7, 8].

2.2. Bayesian Inference

The foundation of Bayesian inference is Bayes' rule, which follows from two basic rules of probability: the product rule and the sum rule. Bayes' rule provides a way to compute the conditional, joint, and marginal probabilities. Specifically, let X and Y be two continuous random variables (r.v.); the conditional probability p(X | Y) is given by

p(X | Y) = p(X, Y)/p(Y) = p(Y | X)p(X) / ∫ p(Y | X)p(X) dX.  (1)

If X = {X i} is discrete, then (1) is rewritten as

p(X_i | Y) = p(X_i, Y)/p(Y) = p(Y | X_i)p(X_i) / ∑_j p(Y | X_j)p(X_j).  (2)

In Bayesian language, p(Y | X), p(X), and p(X | Y) are referred to as the likelihood, prior, and posterior, respectively. The Bayesian machinery consists of three types of basic operations: normalization, marginalization, and expectation, all of which involve integration. The associated optimization problem consists in maximizing the posterior p(X | Y) to find the maximum a posteriori (MAP) estimate X_MAP = arg max_X p(X | Y). Notably, except for very few scenarios (e.g., Gaussianity), most integrations are computationally intractable or costly when dealing with high-dimensional problems. Therefore, for the sake of computational tractability, various types of approximations are often made at different stages of the inference procedure.
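As a toy numeric illustration of the discrete form (2), the posterior over a small set of hypotheses can be computed in a few lines. The two "stimulus classes" and their Poisson rates below are invented purely for the example:

```python
import math

import numpy as np

# Posterior over two hypothetical stimulus classes X_1, X_2 after observing
# a spike count Y = y; prior and Poisson rates are illustrative assumptions.
prior = np.array([0.5, 0.5])          # p(X_i): two equiprobable stimuli
rates = np.array([2.0, 8.0])          # assumed mean spike counts per stimulus

def posterior(y):
    # likelihood p(Y = y | X_i) under a Poisson observation model
    like = np.array([r**y * math.exp(-r) / math.factorial(y) for r in rates])
    joint = like * prior              # numerator of (2)
    return joint / joint.sum()        # denominator: sum over j (normalization)

post = posterior(6)                   # observing 6 spikes favors the high-rate class
```

Normalization here is exactly the sum rule in the denominator of (2); the posterior automatically sums to one.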

More specifically, for the state and parameter estimation problem, Bayesian inference aims to infer the joint posterior of the state and the parameter using Bayes' rule

p(X, θ | Y) ≈ p(X | Y)p(θ | Y) = p(Y | X, θ)p(X)p(θ)/p(Y) = p(Y | X, θ)p(X)p(θ) / ∬ p(Y | X, θ)p(X)p(θ) dX dθ,  (3)

where the first relation assumes a factorial form of the posterior for the state and the parameter (first-stage approximation), and p(X) and p(θ) denote the prior distributions for the state and the parameter, respectively. The denominator of (3) is a normalizing constant known as the partition function. When dealing with a prediction problem for unseen data Y*, we compute the posterior predictive distribution

p(Y* | Y) = ∬ p(Y* | Y, θ, X) p(X | Y) p(θ | Y) dX dθ  (4)

or its expected mean Ŷ* = 𝔼_{p(Y*|Y)}[Y*] = ∭ Y* p(Y* | Y, θ, X) p(X | Y) p(θ | Y) dX dθ dY*.

Sometimes, instead of maximizing the posterior, Bayesian inference attempts to maximize the marginal likelihood (also known as “evidence”) p(Y) as follows:

p(Y) = ∬ p(Y | X, θ) p(X) p(θ) dX dθ.  (5)

The second-stage approximation in approximate Bayesian inference deals with the integration in computing (3), (4), or (5), which will be reviewed in Section 3.

Note. Maximum likelihood inference can be viewed as a special case of Bayesian inference in which θ is represented by a Dirac delta function centered at the point estimate θ̂_m.l.e.; namely, p(θ) = δ(θ − θ̂_m.l.e.). Conversely, Bayesian inference can still be embedded into likelihood inference to estimate p(X) given a point estimate of θ. The distribution p(X) can either have an analytic form (with finite natural parameters) or be represented by Monte Carlo samples; the latter approach may be viewed as a specific case of Monte Carlo expectation-maximization (EM) methods.

2.3. Characterization of Neural Spike Trains

Neural spike trains can be modeled as a simple (temporal) point process [10]. For a single neural spike train observed in (0, T], we often discretize it with a small bin size Δ such that each bin contains no more than one spike. The conditional intensity function (CIF), denoted as λ(t | H t), is used to characterize the spiking probability of a neural point process as follows:

λ(t | H_t) = lim_{Δ→0} Pr{spike in (t, t + Δ] | H_t} / Δ,  (6)

where H_t denotes all history information available up to time t (which may include spike history, stimulus covariates, etc.). When λ(t | H_t) is history independent, the stochastic process is an inhomogeneous Poisson process. For notational simplicity, we sometimes use λ_t in place of λ(t | H_t) when no confusion occurs. When Δ is sufficiently small, the product λ(t | H_t)Δ is approximately equal to the probability of observing a spike within the interval ((t − 1)Δ, tΔ]. Assuming that the CIF λ_t is characterized by a parameter θ and an observed or latent variable X, the point process likelihood function is given as [11–13]

p(Y | X, θ) = exp{∫_0^T log λ(τ | θ, X) dy(τ) − ∫_0^T λ(τ | θ, X) dτ},  (7)

where dy(t) is an indicator function of the spike presence within the interval ((t − 1)Δ, tΔ]. In the presence of multiple spike trains from C neurons, assuming that multivariate point process observations are conditionally independent at any time t given a new parameter θ, one then has

p(Y^{1:C} | X, θ) = ∏_{c=1}^C p(Y^c | X, θ) = ∏_{c=1}^C exp{∫_0^T log λ_c(τ | θ, X) dy_c(τ) − ∫_0^T λ_c(τ | θ, X) dτ}.  (8)

Since neural spike trains are fully characterized by the CIF, the modeling goal turns to modeling the CIF, which can have a parametric or nonparametric form. Identifying the CIF and its associated parameters is essentially a neural encoding problem (Section 4.2). A convenient modeling framework is the generalized linear model (GLM) [14, 15], which can model binary (0/1) or spike count measurements. Within the exponential family, one can use the logit link function to model the binomial distribution, which has the generic form log(p_t/(1 − p_t)) = θ⊤X; one can also use the log link function to model the Poisson distribution, which has the generic form log(λ_t) = θ⊤X.
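As a sketch of the log-link case, a Poisson GLM can be fit by gradient ascent on its log-likelihood. The covariates, true weights, learning rate, and iteration count below are illustrative assumptions, not quantities from the text:

```python
import numpy as np

# Minimal Poisson GLM with log link: log(lambda_t) = theta' x_t.
# Sizes, weights, and step size are illustrative assumptions.
rng = np.random.default_rng(0)
K, d = 2000, 3                            # time bins, covariate dimension
X = rng.normal(size=(K, d))               # covariates x_t (e.g., stimulus features)
theta_true = np.array([0.5, -0.3, 0.2])
y = rng.poisson(np.exp(X @ theta_true))   # simulated spike counts per bin

theta = np.zeros(d)
for _ in range(500):
    lam = np.exp(X @ theta)               # model rate in each bin
    grad = X.T @ (y - lam) / K            # average Poisson log-likelihood gradient
    theta += 0.1 * grad                   # small-step gradient ascent
```

The gradient X⊤(y − λ) is the standard score function of a log-link Poisson GLM; in practice one would typically use Newton-type (IRLS) updates instead of plain gradient ascent.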

In addition, researchers have used the negative binomial distribution to model spike count observations to capture the overdispersion phenomenon (where the variance is greater than the mean statistic). In many cases, for the purpose of computational tractability, researchers often use a Gaussian approximation for Poisson spike counts through a variance stabilization transformation. Table 1 lists a few popular probability distributions for modeling spike count observations.
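A common variance-stabilizing choice for Poisson counts is a square-root transform; the Anscombe variant z = 2√(y + 3/8) is used below as one concrete example (the rates are illustrative):

```python
import numpy as np

# Variance stabilization sketch: the Anscombe transform makes the variance of
# transformed Poisson counts approximately 1, regardless of the rate.
rng = np.random.default_rng(1)
for lam in (5.0, 20.0, 50.0):             # illustrative firing-rate regimes
    y = rng.poisson(lam, size=200_000)
    z = 2.0 * np.sqrt(y + 3.0 / 8.0)      # Anscombe transform
    # raw variance grows with lam; transformed variance stays near 1
    print(f"lam={lam}: var(y)={y.var():.2f}, var(z)={z.var():.3f}")
```

After such a transform, Gaussian-based machinery (e.g., the Kalman filter of Section 3.6) can be applied to the transformed counts with a roughly constant noise variance.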

Table 1.

Probability distributions for modeling neuronal spike count observations.

Distribution | Mean statistic | Variance | Note for observations Y
Binomial (p) | 𝔼[Y] = p | p(1 − p) | Y ∈ {0, 1}
Poisson (λ) | 𝔼[Y] = λ | λ | Y ≥ 0, Y ∈ ℤ+
Negative binomial (r, p) | 𝔼[Y] = pr/(1 − p) | pr/(1 − p)² | Y ≥ 0, Y ∈ ℤ+ (overdispersed Poisson)
Skellam (μ₁, μ₂) | 𝔼[Y] = μ₁ − μ₂ | μ₁ + μ₂ | Y ∈ ℤ (difference between two Poissons)

Another popular statistical model for characterizing population spike trains is the maximum entropy (MaxEnt) model with a log-linear form [16, 17]. Given an ensemble of C neurons, the ensemble spike activity can be characterized by the following form:

p(X) = (1/𝒵) exp(∑_{c=1}^C θ_c x_c + ∑_{i,j=1}^C θ_{ij} x_i x_j) ≡ (1/𝒵) exp(∑_{i=1}^{C+C²} θ_i f_i(X)),  (9)

where x_i ∈ {−1, +1}, 〈·〉 denotes the sample average, 〈x_c〉 denotes the mean firing rate of the cth neuron, f_i(X) denotes a generic function of X (where the couplings θ_i have to match the measured expectation values 〈f_i(X)〉), and 𝒵 denotes the partition function. The basic MaxEnt model (9) assumes stationarity of the data and includes only first- and second-order moment statistics with no stimulus component; these assumptions can be relaxed to derive extended models.
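For a tiny ensemble the partition function in (9) can be computed by brute-force enumeration of all 2^C binary patterns, which makes the model concrete. The couplings below are invented for illustration:

```python
import itertools

import numpy as np

# Pairwise MaxEnt (Ising-like) model for C = 3 neurons, with the partition
# function computed by enumerating all 2^C patterns. Couplings are illustrative.
C = 3
theta_c = np.array([0.2, -0.1, 0.3])              # first-order terms theta_c
theta_ij = np.array([[0.0, 0.5, -0.2],
                     [0.0, 0.0, 0.1],
                     [0.0, 0.0, 0.0]])            # pairwise terms (upper triangle)

states = np.array(list(itertools.product([-1, 1], repeat=C)))   # all 8 patterns
log_w = states @ theta_c + np.einsum('ki,ij,kj->k', states, theta_ij, states)
Z = np.exp(log_w).sum()                           # partition function
probs = np.exp(log_w) / Z                         # p(X) for each pattern
mean_rates = probs @ states                       # model expectations <x_c>
```

For realistic C the 2^C sum is intractable, which is exactly why approximate inference (Sections 3.1–3.3) is needed for MaxEnt fitting.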

An important issue for characterizing neural spike trains is model selection and the associated goodness-of-fit assessment. For goodness-of-fit assessment of spike train models, the reader is referred to [11, 18]. In addition, standard statistical techniques such as cross-validation, leave-one-out, and the receiver-operating-characteristic (ROC) curve may be considered. The model selection issue can be resolved by the likelihood principle based on well-established criteria (such as the Bayesian information criterion or Akaike's information criterion) [9, 11] or resolved by the Bayesian principle. Bayesian model selection and variable selection will be reviewed in Section 3.4.

3. Bayesian Modeling and Inference Methods

The common strategy of Bayesian modeling is to start with specific prior distributions for the unknowns. The prior distributions are characterized by some hyperparameters, which can be directly optimized or modeled by the second-level hyperpriors. If the prior is conjugate to the likelihood, then the posterior has the same form as the prior [8]. Hierarchical Bayesian modeling characterizes the uncertainties of all unknowns at different levels.

In this section, we will review several exact or approximate Bayesian inference methods. The approximate Bayesian inference methods aim to compute or evaluate the integration by approximation. There are two types of approaches to accomplish this goal: deterministic approximation and stochastic approximation. The deterministic approximation can rely on Gaussian approximation, deterministic sampling (e.g., sigma-point approximation [19, 20]), or variational approximation [21–23]. The stochastic approximation uses Monte Carlo sampling to achieve a point mass representation of the probability distribution. These two approaches have been employed to approximate the likelihood or posterior function in many inference problems, such as model selection, filtering and smoothing, and joint state and parameter estimation. Detailed coverage of these topics can be found in many excellent books (e.g., [24–28]).

3.1. Variational Bayes (VB)

VB is based on the idea of variational approximation [2123] and is also referred to as ensemble learning [24]. To avoid overfitting in maximum likelihood estimation, VB aims to maximize the marginal log-likelihood or its lower bound as follows:

log p(Y) = log ∫ dθ dX p(θ) p(X, Y | θ)
= log ∫ dθ dX q(X, θ) [p(θ) p(X, Y | θ) / q(X, θ)]
≥ ∫ dθ dX q(X, θ) log [p(θ) p(X, Y | θ) / q(X, θ)]
= ⟨log p(X, Y, θ)⟩_q + ℋ(q(X, θ)) ≡ ℱ(q(X, θ)),  (10)

where p(θ) denotes the parameter prior distribution, p(X, Y | θ) defines the complete-data likelihood, and q(X, θ) is called the variational posterior distribution, which approximates the joint posterior of the unknown state and parameter p(X, θ | Y). The term ⟨·⟩_q denotes the expectation with respect to q, ℋ(q) represents the entropy of the variational posterior distribution q, and ℱ(q(X, θ)) is referred to as the free energy. The lower bound is derived from Jensen's inequality [29]. Maximizing the free energy ℱ(q(X, θ)) is equivalent to minimizing the Kullback-Leibler (KL) divergence [29] between the variational posterior and the true posterior (denoted by KL(q‖p)); since the KL divergence is nonnegative, we have ℱ(q) = log p(Y) − KL(q‖p) ≤ log p(Y). The optimization problem in (10) can be solved with the VB-EM algorithm [23] in a similar fashion to the standard EM algorithm [30].

A common (but not necessary) VB assumption is a factorial form of the posterior q(X, θ) = q(X)q(θ), although one can further impose certain structure within the parameter space. In the case of mean-field approximation, we have q(X, θ) = q(X)∏_i q(θ_i). With selected priors p(X) and p(θ), one can maximize the free energy by alternately solving two equations: ∂ℱ/∂q(X) = 0 and ∂ℱ/∂q(θ) = 0. Specifically, VB-EM inference can be viewed as a natural extension of the EM algorithm, which consists of the following two steps.

  1. VB-E step: given the available information of q(θ), maximize the free energy with respect to the function q(X) and update the posterior q(X).

  2. VB-M step: given the available information of q(X), maximize the free energy with respect to the function q(θ) and update the posterior q(θ). The posterior update will have an analytic form provided that the prior p(θ) is conjugate to the complete-data likelihood function (the conjugate-exponential family).

These two steps alternate repeatedly until the VB algorithm reaches convergence (say, when the incremental change of ℱ falls below a small threshold). Similar to the iterative EM algorithm, VB-EM inference suffers from local maxima in optimization. To alleviate this issue, one may use multiple random initializations or employ a deterministic annealing procedure [31]. The EM algorithm can be viewed as a special case of the VB algorithm in which the full posterior update of the VB-M step is replaced by the point estimate of the traditional M-step (i.e., q(θ) = δ(θ − θ_MAP)). Another counterpart of VB-EM is the maximization-expectation (ME) algorithm [32], in which the VB-E step uses the MAP point estimate q(X) = δ(X − X_MAP), while the VB-M step updates the full posterior.
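The alternation above can be made concrete with the textbook conjugate-exponential example of a univariate Gaussian with unknown mean μ and precision τ: q(μ) is Gaussian, q(τ) is Gamma, and each update uses the current expectation from the other factor. The data, priors, and iteration count are illustrative assumptions:

```python
import numpy as np

# VB-EM sketch for y_n ~ N(mu, 1/tau) with conjugate priors
# mu ~ N(mu0, 1/(lam0*tau)), tau ~ Gamma(a0, b0). Values are illustrative.
rng = np.random.default_rng(2)
y = rng.normal(5.0, 2.0, size=500)            # data: mean 5, standard deviation 2
N, ybar = len(y), y.mean()

mu0, lam0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3     # broad priors
E_tau = 1.0                                    # initial E_q[tau]
for _ in range(50):
    # "VB-E"-style update of q(mu) = N(mu_N, 1/lam_N), given E_q[tau]
    mu_N = (lam0 * mu0 + N * ybar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    # "VB-M"-style update of q(tau) = Gamma(a_N, b_N), using
    # E_q[(y_n - mu)^2] = (y_n - mu_N)^2 + 1/lam_N
    a_N = a0 + 0.5 * (N + 1)
    b_N = b0 + 0.5 * (np.sum((y - mu_N) ** 2) + N / lam_N
                      + lam0 * ((mu_N - mu0) ** 2 + 1.0 / lam_N))
    E_tau = a_N / b_N
```

The factorized posterior q(μ)q(τ) converges in a few iterations here; the same alternating structure carries over to latent-state models, with q(X) replacing q(μ).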

It is noted that when the latent variables and parameters are intrinsically coupled or statistically correlated, the mean-field approximation will not be accurate, and consequently the VB estimate will be strongly biased. To alleviate this problem, the latent-space VB (LSVB) method [33, 34] aims to maximize a tighter lower bound of the marginal log-likelihood from (10) as follows:

log p(Y) ≥ ∫ dX q(X) log [p(X, Y)/q(X)] = ∫ dX q(X) log [∫ dθ p(X, Y | θ) p(θ) / q(X)] ≡ ℱ(q(X)) ≥ max_{q(θ)} ℱ(q(X)q(θ)).  (11)

The reader is referred to [33, 34] for more details and algorithmic implementation.

Note. (i) Depending on the specific problem, the optimization bound of VB methods may not be tight, which may cause a large estimation bias or an underestimated variance [35]. A data-dependent lower bound is often tighter (such as the one used in Bayesian logistic regression [25]). (ii) It was shown in [36] that the VB method for statistical models with latent variables can be viewed as a special case of local variational approximation, where the log-sum-exp function is used to form the lower bound of the log-likelihood. (iii) VB-EM inference was originally developed for probabilistic models in the conjugate-exponential family, but it can be extended to more general models based on approximation [37].

3.2. Expectation Propagation (EP)

EP is a message-passing algorithm that allows approximate Bayesian inference for factor graphs (one type of probabilistic graphical model that shows how a function of several variables can be factored into a product of simple functions and can be used to represent a posterior distribution) [38]. For a specific r.v. X (either continuous or discrete), the probability distribution p(X) is represented as a product of factors as follows:

p(X) = ∏_a f_a(X).  (12)

The basic idea of EP is to “divide-and-conquer” by approximating the factors one by one as follows:

f_a(X) ≈ f̃_a(X)  (13)

and then use the product of the approximated terms as the final approximation as follows:

q(X) = ∏_a f̃_a(X).  (14)

As a result, EP replaces the global divergence KL(p(X)||q(X)) by the local divergence between two product chains as follows:

KL(p(X) ‖ q(X)) = KL(f_a(X) ∏_{b≠a} f_b(X) ‖ f̃_a(X) ∏_{b≠a} f̃_b(X)) ⟶ KL(f_a(X) ∏_{b≠a} f̃_b(X) ‖ f̃_a(X) ∏_{b≠a} f̃_b(X)).  (15)

To minimize (15), the EP inference procedure proceeds as follows.

Step 1. Use message-passing algorithms to pass messages f~a(X) between factors.

Step 2. Given the received message f~b(X) for factor a (for all ba), minimize the local divergence to obtain f~a(X), and send it to other factors.

Step 3. Repeat the procedure until convergence.

Note. (i) EP aims to find the closest approximation q such that KL(p‖q) is minimized, whereas VB aims to find the variational distribution that minimizes KL(q‖p) (note that the KL divergence is asymmetric, and KL(p‖q) and KL(q‖p) have different geometric interpretations [39]). (ii) Unlike global approximation techniques (e.g., moment matching of the full distribution), EP uses a local approximation strategy to minimize a series of local divergences.
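The core local operation of EP can be illustrated in one step: combine an intractable factor with the cavity distribution (the product of the other approximate factors), then project the resulting tilted distribution back to a Gaussian by moment matching, which minimizes the local KL(p‖q). The step-function factor and standard-Gaussian cavity below are illustrative choices:

```python
import numpy as np

# One EP-style local projection on a grid: f_a(x) = 1[x > 0] (intractable
# factor), cavity = N(0, 1) (product of remaining approximate factors).
x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]
cavity = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)   # prod_{b != a} f~_b(x)
f_a = (x > 0).astype(float)                           # exact factor f_a(x)

tilted = f_a * cavity
tilted /= (tilted * dx).sum()                         # normalized tilted distribution
m = (x * tilted * dx).sum()                           # matched mean
v = (((x - m) ** 2) * tilted * dx).sum()              # matched variance
# the updated site would then be f~_a(x) proportional to N(x; m, v) / cavity(x)
```

Here the tilted distribution is a half-normal, so the matched moments approach √(2/π) ≈ 0.798 and 1 − 2/π ≈ 0.363; in a full EP loop this projection is repeated over all sites until the messages converge.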

3.3. Markov Chain Monte Carlo (MCMC)

MCMC methods are a class of algorithms for drawing random samples from a probability distribution by constructing a Markov chain whose equilibrium distribution is the desired distribution [40]. The designed Markov chain is reversible and satisfies detailed balance. For example, given a transition probability P, detailed balance holds between each pair of states i and j in the state space if and only if π_i P_ij = π_j P_ji (where π_i = Pr(X_{t−1} = i) and P_ij = Pr(X_t = j | X_{t−1} = i)). The appeal of MCMC methods for Bayesian inference is that high-dimensional integrals can be numerically calculated from the samples drawn from the equilibrium distribution [41].

The most common MCMC methods are the random walk algorithms, such as the Metropolis-Hastings (MH) algorithm [42, 43] and Gibbs sampling [44]. The MH algorithm is the simplest yet the most generic MCMC method to generate samples using a random walk and then to accept them with a certain acceptance probability. For example, given a random-walk proposal distribution g(ZZ′) (which defines a conditional probability of moving state Z to Z′), the MH acceptance probability 𝒜(ZZ′) is

𝒜(Z → Z′) = min(1, [p(Z′) g(Z′ → Z)] / [p(Z) g(Z → Z′)]),  (16)

which gives a simple MCMC implementation. Gibbs sampling is another popular MCMC method that requires no parameter tuning. Given a high-dimensional joint distribution p(Z) = p(z_1, …, z_n), the Gibbs sampler draws samples from each conditional distribution p(z_i | Z_{∖i}) in turn while holding the others fixed (where Z_{∖i} denotes the n − 1 variables in Z other than z_i).
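The MH recipe in (16) fits in a dozen lines. With a symmetric random-walk proposal, g cancels in the acceptance ratio; the Gaussian target below is an illustrative choice, not a spike-train model:

```python
import numpy as np

# Random-walk Metropolis-Hastings sketch targeting N(2, 1) (illustrative).
# With a symmetric proposal, the ratio g(Z'->Z)/g(Z->Z') in (16) equals 1.
rng = np.random.default_rng(3)

def log_target(z):
    return -0.5 * (z - 2.0) ** 2          # log p(z) up to an additive constant

z, chain = 0.0, []
for _ in range(20_000):
    z_prop = z + rng.normal(scale=1.0)    # random-walk proposal
    # accept with probability min(1, p(z')/p(z)), computed in log space
    if np.log(rng.uniform()) < log_target(z_prop) - log_target(z):
        z = z_prop
    chain.append(z)                       # rejected moves repeat the current state
chain = np.array(chain[2_000:])           # discard burn-in
```

Note that only an unnormalized log-density is needed, which is exactly why MCMC sidesteps the intractable partition function.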

For high-dimensional sampling problems, the random-walk behavior of the proposal distribution may not be efficient. Imagine that there are two directions (increase or decrease in the likelihood space) for a one-dimensional search; there will be 2^n search directions in an n-dimensional space. On average, it will take about 2^n/n steps to hit the exact search direction. Notably, some sophisticated MCMC algorithms employ side information to improve the efficiency of the sampler (i.e., the "mixing" of the Markov chain). Examples of non-random-walk methods include successive overrelaxation, hybrid Monte Carlo, gradient-based Langevin MCMC, and Hessian-based MCMC [24, 45–47].

Many statistical estimation problems (e.g., change point detection, clustering, and segmentation) consist in identifying the unknown number of statistical objects (e.g., change points, clusters, and boundaries), which are categorized as the variable-dimensional statistical inference problem. For this kind of inference problem, the so-called reversible jump MCMC (RJ-MCMC) method has been developed [48], which can be viewed as a variant of MH algorithm that allows proposals to change the dimensionality of the space while satisfying the detailed balance of the Markov chain.

Note. As discussed in Section 2.2, since the fundamental operations of Bayesian statistics involve integration, MCMC methods appear naturally as the most generic techniques for Bayesian inference. On the one hand, recent decades have witnessed an exponential growth in the MCMC literature for its own theoretic and algorithmic developments. On the other hand, there has also been an increasing trend in applying MCMC methods to neural data analysis, ranging from spike sorting, tuning curve estimation, and neural decoding to functional connectivity analysis, some of which will be briefly reviewed in Section 4.

3.4. Bayesian Model Selection and Variable Selection

Statistical model comparison can be carried out by Bayesian inference. From Bayes' rule, the model posterior probability is expressed by

p(ℳ_i | D) ∝ p(D | ℳ_i) p(ℳ_i).  (17)

Under the assumption of equal model priors, maximizing the model posterior is equivalent to maximizing the model evidence (or marginal likelihood) as follows:

p(D | ℳ_i) = ∫ p(D, θ | ℳ_i) dθ = ∫ p(D | θ, ℳ_i) p(θ | ℳ_i) dθ.  (18)

The Bayes factor (BF), defined as the ratio of evidence between two models, can be computed as [49]

BF = p(D | ℳ_1)/p(D | ℳ_2) = ∫ p(D, θ_1 | ℳ_1) dθ_1 / ∫ p(D, θ_2 | ℳ_2) dθ_2 = ∫ p(θ_1 | ℳ_1) p(D | θ_1, ℳ_1) dθ_1 / ∫ p(θ_2 | ℳ_2) p(D | θ_2, ℳ_2) dθ_2.  (19)

Specifically, the BF serves as the Bayesian alternative to P values for testing hypotheses (in model selection) and for quantifying the degree to which the observed data support or conflict with a hypothesis [50]. As discussed in Section 3.1, the marginal likelihood may be intractable for a large class of probabilistic models. In practice, the BF is often computed by numerical approximation, such as the Laplace-Metropolis estimator [51] or sequential Monte Carlo methods [52]. In addition, for a large sample size, the logarithm of the BF can be roughly approximated by the Bayesian information criterion (BIC) [9], whose computation is much simpler and involves no numerical integration.
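The BIC shortcut is easy to demonstrate on a toy regression problem. Using the Gaussian identity −2 log L̂ = n log(RSS/n) + const, the BIC difference between two models approximates −2 log BF; the data-generating choices below are purely illustrative:

```python
import numpy as np

# BIC as a rough proxy for -2*log(evidence): compare polynomial models on
# data whose true trend is linear. All settings are illustrative assumptions.
rng = np.random.default_rng(4)
n = 200
x = rng.uniform(-2.0, 2.0, size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=n)

def bic(degree):
    coef = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coef, x)) ** 2)
    k = degree + 2                        # polynomial coefficients + noise variance
    return n * np.log(rss / n) + k * np.log(n)   # up to an additive constant

scores = {deg: bic(deg) for deg in (1, 3, 7)}
# overparameterized fits pay a log(n) penalty per extra coefficient
```

Lower BIC corresponds to higher approximate evidence, so under equal priors the linear model should be preferred here.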

Bayesian model selection can also be directly implemented via the so-called MCMC model composition (MC³). The basic idea of MC³ is to simulate a Markov chain {ℳ(t)} whose equilibrium distribution is p(ℳ_i | D). For each model ℳ, define a neighborhood nbd(ℳ) and a transition matrix q by setting q(ℳ → ℳ′) = 0 for all ℳ′ ∉ nbd(ℳ). Draw a new sample ℳ′ from q(ℳ → ℳ′) and accept the new sample with probability

min{1, p(ℳ′ | D)/p(ℳ | D)}.  (20)

Otherwise the chain remains unchanged. Once the Markov chain converges to the equilibrium, one can construct the model posterior based on Monte Carlo samples.

Within a fixed model class, it is often desirable to have a compact or sparse representation of the model to alleviate overfitting; namely, many coefficients of the model parameters are zero. A very useful approach for variable selection is the so-called automatic relevance determination (ARD), which encourages sparse Bayesian learning [24, 26, 53]. More specifically, ARD provides a way to infer hyperparameters in hierarchical Bayesian modeling. Given the likelihood p(Y | θ) and the parameter prior p(θ | ω) (where ω denotes the hyperparameters), one can assign a hyperprior p(ω | η) for ω such that the marginal distribution p(θ) = ∫ p(θ | ω)p(ω) dω is peaked and long-tailed (thereby favoring a sparse solution). The hyperprior p(ω) can be either identical or different for each element in θ. In the most general form, we can write

p(θ) = ∏_i p(θ_i) = ∏_i ∫ p(θ_i | ω_i) p(ω_i | η_i) dω_i.  (21)

The hyperprior parameters η = {η i} can be fixed or optimized from data. Upon completing Bayesian inference, the estimated mean and variance statistics of some coefficients θ i will be close to zero (i.e., with the least relevance) and therefore can be truncated. The ARD principle has been widely used in various statistical models, such as linear regression, GLM, and the relevance vector machine (RVM) [26].
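A concrete instance is ARD for Bayesian linear regression with MacKay-style fixed-point hyperparameter updates: each coefficient θ_i has its own prior precision α_i, and irrelevant coefficients are pruned as α_i grows large. Problem sizes, the assumed-known noise level, and the clipping constant below are illustrative:

```python
import numpy as np

# ARD sketch for linear regression: per-coefficient precisions alpha_i play
# the role of the hyperparameters omega_i. Settings are illustrative.
rng = np.random.default_rng(5)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[0], w_true[3] = 1.5, -2.0               # only two relevant covariates
y = X @ w_true + rng.normal(scale=0.1, size=n)

beta = 1.0 / 0.1**2                            # noise precision (assumed known)
alpha = np.ones(d)                             # initial prior precisions
for _ in range(100):
    Sigma = np.linalg.inv(beta * X.T @ X + np.diag(alpha))   # posterior covariance
    m = beta * Sigma @ X.T @ y                               # posterior mean
    gamma = 1.0 - alpha * np.diag(Sigma)       # "well-determinedness" of each weight
    # MacKay fixed-point update, clipped to keep alpha numerically finite
    alpha = np.minimum(gamma / (m**2 + 1e-12), 1e10)
```

After convergence, the posterior means of the irrelevant coefficients collapse toward zero (their α_i hit the clip), realizing the truncation described above; the relevance vector machine [26] applies the same updates in a kernel basis.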

3.5. Bayesian Model Averaging (BMA)

BMA is a statistical technique aiming to account for the uncertainty in the model selection process [54]. By averaging many different competing statistical models (e.g., linear or Cox regression and GLM), BMA incorporates model uncertainties into parameter inference and data prediction.

Consider an example of a GLM involving the choice of independent variables and the link function. Every possible combination of choices defines a different model, say {ℳ_0, ℳ_1, …, ℳ_K} (where ℳ_0 denotes the null model). Upon computing K Bayes factors BF_10 = p(D | ℳ_1)/p(D | ℳ_0), BF_20 = p(D | ℳ_2)/p(D | ℳ_0), …, and BF_K0 = p(D | ℳ_K)/p(D | ℳ_0), the posterior probability p(ℳ_k | D) is computed as [54]

p(ℳ_k | D) = π_k BF_k0 / ∑_{i=0}^K π_i BF_i0,  (22)

where π_k = p(ℳ_k)/p(ℳ_0) denotes the prior odds for model ℳ_k against ℳ_0. In the case of the GLM, the marginal likelihood can be approximated by the Laplace method [55].
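Equation (22) is a one-liner once the Bayes factors are available; the BF values below are made up for illustration:

```python
import numpy as np

# Posterior model probabilities from Bayes factors against the null, per (22).
# The BF values are invented for illustration (BF_00 = 1 by definition).
bf = np.array([1.0, 3.2, 12.5, 0.4])       # BF_00, BF_10, BF_20, BF_30
prior_odds = np.ones_like(bf)              # pi_k = 1 under equal model priors

post = prior_odds * bf / (prior_odds * bf).sum()
# a BMA prediction then averages each model's predictive density with weight post[k]
```

With these numbers the second model dominates but does not get all the mass, which is exactly the model uncertainty that BMA propagates into prediction.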

3.6. Bayesian Filtering: Kalman Filter, Point Process Filter, and Particle Filter

Bayesian filtering aims to infer a filtered or predictive posterior distribution of temporal data in a sequential fashion, which is often cast within the framework of state space model (SSM) [13, 56, 57]. Without loss of generality, let x t denote the state at discrete time t and let y 0:t denote the cumulative observations up to time t. The filtered posterior distribution of the state, conditional on the observations y 0:t, bears a form of recursive Bayesian estimation as follows:

p(x_t | y_{0:t}) = p(x_t) p(y_{0:t} | x_t) / p(y_{0:t})
= p(x_t) p(y_t, y_{0:t−1} | x_t) / p(y_t, y_{0:t−1})
= p(x_t) p(y_t | x_t, y_{0:t−1}) p(y_{0:t−1} | x_t) / [p(y_t | y_{0:t−1}) p(y_{0:t−1})]
= p(x_t) p(y_t | x_t, y_{0:t−1}) p(x_t | y_{0:t−1}) p(y_{0:t−1}) / [p(y_t | y_{0:t−1}) p(y_{0:t−1}) p(x_t)]
= p(y_t | x_t, y_{0:t−1}) p(x_t | y_{0:t−1}) / p(y_t | y_{0:t−1})
= p(y_t | x_t) p(x_t | y_{0:t−1}) / p(y_t | y_{0:t−1}),  (23)

where the first four steps are derived from Bayes' rule and the last equality of (23) assumes the conditional independence between the observations. The one-step state prediction, also known as the Chapman-Kolmogorov equation [58], is given by

p(x_t | y_{0:t−1}) = ∫ p(x_t | x_{t−1}) p(x_{t−1} | y_{0:t−1}) dx_{t−1},  (24)

where the probability distribution (or density) p(x t | x t−1) describes a state transition equation and the probability distribution (or density) p(y t | x t) is the observation equation. Together (23) and (24) provide the fundamental relations to conduct state space analyses. The above formulation of recursive Bayesian estimation holds for both continuous and discrete variables, for either x or y or both. When the state variable is discrete and countable (in which we use S t to replace x t), the SSM is also referred to as a hidden Markov model (HMM), with associated p(S t | S t−1) and p(y t | S t). Various approximate Bayesian methods for the HMM have been reported [23, 59, 60]. When the hidden state consists of both continuous and discrete variables, the SSM is referred to as a switching SSM, with associated p(x t | x t−1, S t) and p(y t | x t, S t) [27, 61]. In this case, the inference and prediction involve multiple integrals or summations. For example, the prediction equation (24) will be rewritten as

p(x_t | y_{0:t−1}, S_{0:t−1}) = ∑_{S_{t−1}} ∫ p(x_t | x_{t−1}, S_t) p(S_t | S_{t−1}) p(x_{t−1} | y_{0:t−1}, S_{0:t−1}) dx_{t−1}  (25)

whose exact or naive implementation can be computationally prohibitive given a large discrete state space.

When the state and observation equations are both continuous and Gaussian, the Bayesian filtering solution yields the celebrated Kalman filter [62], in which the posterior mean and posterior variance are updated recursively. In fact, based on a Gaussian approximation of nonnegative spike count observations, the Kalman filter has long been used in spike train analysis [63, 64]. However, such a naive Gaussian approximation does not respect the point process nature of neural spike trains. Brown and colleagues [65–67] have proposed a point process filter to recursively estimate the state or parameter in a dynamic fashion. Without loss of generality, assume that the CIF (6) is characterized by a parameter θ via an exponential form, namely, λ_t ≡ λ(t | θ_t) = exp(θ_t⊤X_t), and assume that the parameter follows a random-walk equation θ_t = θ_{t−1} + w_t (where w_t denotes random Gaussian noise with zero mean and variance σ²); then one can use a point process filter to estimate the time-varying parameter θ at arbitrarily fine temporal resolution (i.e., the bin size of the discrete-time index t can be made as small as desired) as follows:

θ_{t+1|t} = θ_{t|t}  (one-step mean prediction),  (26)
V_{t+1|t}(θ) = V_{t|t}(θ) + σ²  (one-step variance prediction),  (27)
θ_{t+1|t+1} = θ_{t+1|t} + V_{t+1|t}(θ) [(∂λ/∂θ)(θ_{t+1|t}) / λ(θ_{t+1|t})] [dy_{t+1} − λ(θ_{t+1|t+1})Δ] = θ_{t+1|t} + V_{t+1|t}(θ) X_{t+1} [dy_{t+1} − λ(θ_{t+1|t+1})Δ]  (posterior mode),  (28)
V_{t+1|t+1}(θ) = [(V_{t+1|t}(θ))^{−1} + X_{t+1} X_{t+1}⊤ λ(θ_{t+1|t})Δ]^{−1}  (posterior variance),  (29)

where θ_{t+1|t+1} and V_{t+1|t+1}(θ) denote the posterior mode and posterior variance for the parameter θ, respectively. Equations (26)–(29) are reminiscent of Kalman filtering. Equations (26) and (27) for the one-step mean and variance predictions are identical to those of the Kalman filter, but (28) and (29) differ because of the non-Gaussian observations and the nonlinear operation in (28). In (28), [dy_{t+1} − λ(θ_{t+1|t+1})Δ] is viewed as the innovations term, and V_{t+1|t}(θ)X_{t+1} may be interpreted as a "Kalman gain," whose magnitude determines the "step size" of the error correction. In (29), the posterior variance is derived by inverting the second derivative of the log-posterior probability density, based on a Gaussian approximation of the posterior distribution around the posterior mode [65–67]. For this simple example, we have

\[
\begin{aligned}
\log p(\theta_t \mid Y_{0:t}, H_t) &\propto -\tfrac{1}{2}\,(\theta_t - \theta_{t\mid t-1})^{\top} V_{t\mid t-1}^{-1}\,(\theta_t - \theta_{t\mid t-1}) + \big[\log\lambda_t\, dy_t - \lambda_t\Delta\big], \\
\frac{\partial \log p(\theta_t \mid Y_{0:t}, H_t)}{\partial\theta_t} &= -(\theta_t - \theta_{t\mid t-1})\, V_{t\mid t-1}^{-1} + \frac{1}{\lambda_t}\frac{\partial\lambda_t}{\partial\theta_t}\,\big[dy_t - \lambda_t\Delta\big], \\
\frac{\partial^2 \log p(\theta_t \mid Y_{0:t}, H_t)}{\partial\theta_t\,\partial\theta_t} &= -V_{t\mid t-1}^{-1} + \bigg[\bigg(\frac{1}{\lambda_t}\frac{\partial^2\lambda_t}{\partial\theta_t\,\partial\theta_t} - \frac{1}{\lambda_t^2}\Big(\frac{\partial\lambda_t}{\partial\theta_t}\Big)^2\bigg)\big[dy_t - \lambda_t\Delta\big] - \Big(\frac{\partial\lambda_t}{\partial\theta_t}\Big)^2 \frac{1}{\lambda_t}\,\Delta\bigg].
\end{aligned} \tag{30}
\]

Setting the first-order derivative ∂log p(θ_t | Y_{0:t}, H_t)/∂θ_t to zero and rearranging terms yield (28), and setting V_{t∣t}(θ) = −[∂²log p(θ_t | Y_{0:t}, H_t)/(∂θ_t ∂θ_t)]^{−1} yields (29) (upon shifting the time index from t to t + 1).

The Gaussian approximation is based on the first-order Laplace method. In theory, one can also use a second-order method to further improve the approximation accuracy [68]; in practice, however, the performance gain is relatively small in the presence of noise and model uncertainty when analyzing real experimental data sets. Although the above example considers only a univariate point process (i.e., a single neuronal spike train), it is straightforward to extend the analysis to multivariate point processes (multiple neuronal spike trains). As the number of neurons increases, the accuracy of the Gaussian approximation of the log-posterior also improves by virtue of the law of large numbers.
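
To make the recursion (26)–(29) concrete, the following minimal NumPy sketch implements the point process filter for a scalar parameter with CIF λ_t = exp(θ_t x_t). The function name and defaults are illustrative only; for simplicity, the CIF in the innovation term is evaluated at the one-step prediction, a common explicit approximation to the implicit posterior-mode equation (28).

```python
import numpy as np

def point_process_filter(dy, x, sigma2, delta, theta0=0.0, v0=1.0):
    """Point process filter (Eqs. (26)-(29)) for a scalar parameter
    theta_t with CIF lambda_t = exp(theta_t * x_t) and random-walk
    state equation theta_t = theta_{t-1} + w_t, w_t ~ N(0, sigma2).

    dy : binary spike indicators per time bin; x : covariate per bin.
    The CIF in the innovation term is evaluated at the one-step
    prediction (an explicit approximation of Eq. (28))."""
    T = len(dy)
    theta_post = np.empty(T)
    v_post = np.empty(T)
    th, v = theta0, v0
    for t in range(T):
        th_pred, v_pred = th, v + sigma2            # Eqs. (26)-(27)
        lam = np.exp(th_pred * x[t])                # predicted CIF
        # posterior mode update, innovation = dy - lambda*Delta (Eq. (28))
        th = th_pred + v_pred * x[t] * (dy[t] - lam * delta)
        # posterior variance from the Gaussian (Laplace) approximation (Eq. (29))
        v = 1.0 / (1.0 / v_pred + x[t] ** 2 * lam * delta)
        theta_post[t], v_post[t] = th, v
    return theta_post, v_post
```

Note that the posterior precision grows with every observed bin, so the uncertainty about a (nearly) static parameter shrinks as data accumulate.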

An alternative way to estimate a non-Gaussian posterior is to use a particle filter [69]; several reports have been published in the context of neural spike train analysis [70, 71]. The basic idea of particle filtering is to employ sequential Monte Carlo (importance sampling and resampling) methods: a set of independent and identically distributed (i.i.d.) samples (i.e., "particles") is drawn from a proposal distribution, propagated through the likelihood function, and weighted and reweighted after each iteration. In the end, one can use the Monte Carlo samples (or their importance weights) to represent the posterior. For example, to evaluate the expectation of a function f(x_t) with respect to the posterior p(x_t | y_{0:t}), we have

\[
\mathbb{E}[f(x_t)] = \int f(x_t)\,\frac{p(x_t \mid y_{0:t})}{q(x_t \mid y_{0:t})}\,q(x_t \mid y_{0:t})\,dx_t
= \int f(x_t)\,W(x_t)\,q(x_t \mid y_{0:t})\,dx_t
\approx \frac{\sum_{i=1}^{N_c} f\big(x_t^{(i)}\big)\,W\big(x_t^{(i)}\big)}{\sum_{i=1}^{N_c} W\big(x_t^{(i)}\big)} = \hat{f}(x_t), \tag{31}
\]

where W(x_t) = p(x_t | y_{0:t})/q(x_t | y_{0:t}) denotes the importance weight function and {x_t^{(i)}}_{i=1}^{N_c} denotes the N_c particles drawn from the proposal distribution q(x_t | y_{0:t}). When the sample size N_c is sufficiently large (depending on the dimensionality of x), the estimate f̂(x_t) is a consistent estimate of 𝔼[f(x_t)]. Based on sequential importance sampling (SIS), the importance weight of each sample can be updated recursively as follows [69]:

\[
W\big(x_t^{(i)}\big) = W\big(x_{t-1}^{(i)}\big)\,\frac{p\big(y_t \mid x_t^{(i)}\big)\,p\big(x_t^{(i)} \mid x_{t-1}^{(i)}\big)}{q\big(x_t^{(i)} \mid x_{0:t-1}^{(i)}, y_t\big)}. \tag{32}
\]

In practice, choosing a proper proposal distribution q(x_t | x_{0:t−1}, y_t) is crucial (see [69] for detailed discussions). In the neuroscience literature, Brockwell et al. [70] used the transition prior p(x_t | x_{t−1}) as the proposal distribution, which simplifies the update (32) to

\[
W\big(x_t^{(i)}\big) = W\big(x_{t-1}^{(i)}\big)\,p\big(y_t \mid x_t^{(i)}\big). \tag{33}
\]

That is, the importance weights W(x_t^{(i)}) are scaled only by the instantaneous likelihood value. Despite its simplicity, the transition prior proposal completely ignores the information in the current observation y_t. To overcome this limitation, Ergun et al. [71] used a filtered (Gaussian) posterior density derived from the point process filter as the proposal distribution and reported a significant gain in estimation performance while maintaining algorithmic simplicity (i.e., sampling from a Gaussian distribution). In addition, the VB approach can be integrated with particle filtering to obtain a variational Bayesian filtering algorithm [72].
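
The bootstrap variant of particle filtering, with the transition prior as proposal so that the weight update reduces to (33), can be sketched as follows; the random-walk state equation and the Bernoulli spike observation model with rate exp(x_t)Δ are illustrative assumptions, not prescribed by [70].

```python
import numpy as np

def bootstrap_particle_filter(dy, sigma, delta, n_particles=500, seed=0):
    """Bootstrap particle filter using the transition prior as proposal,
    so that the weights are scaled by the instantaneous likelihood
    (Eq. (33)).  Illustrative model: x_t = x_{t-1} + w_t with
    w_t ~ N(0, sigma^2); spike indicator dy_t ~ Bernoulli(exp(x_t)*delta)."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, n_particles)
    means = np.empty(len(dy))
    for t, d in enumerate(dy):
        # propagate particles through the transition prior p(x_t | x_{t-1})
        particles = particles + rng.normal(0.0, sigma, n_particles)
        p_spike = np.clip(np.exp(particles) * delta, 1e-12, 1 - 1e-12)
        w = p_spike if d else 1.0 - p_spike        # likelihood weights
        w = w / w.sum()
        means[t] = np.dot(w, particles)            # posterior mean, cf. Eq. (31)
        # multinomial resampling to combat weight degeneracy
        particles = particles[rng.choice(n_particles, n_particles, p=w)]
    return means
```

Resampling after every step is the simplest scheme; in practice one often resamples only when the effective sample size drops below a threshold.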

Note. (i) If online operation is not required, we can estimate a smoothed posterior distribution p(x_t | y_{0:T}) to obtain a more accurate estimate. The above Bayesian filters can be extended to the fixed-lag Kalman smoother, point process smoother, and particle smoother [63, 66, 69]. (ii) For neural spike train analysis, the formulation of Bayesian filtering is applicable not only to simple point processes but also to marked point processes [73] or even spatiotemporal point processes.

3.7. Bayesian Nonparametrics

The contrasting methodological pairs "frequentist versus Bayes" and "parametric versus nonparametric" are two examples of dichotomy in modern statistics [74]. The historical roots of Bayesian nonparametrics date back to the late 1960s and 1970s. Despite its theoretical development over the past few decades, successful applications of nonparametric Bayesian inference have not been widespread until recently, especially in the field of machine learning [75]. Since nonparametric Bayesian models accommodate a large number of degrees of freedom (an infinite-dimensional parameter space) and exhibit a rich class of probabilistic structures, such approaches are very powerful in terms of data representation. The fundamental building blocks are two stochastic processes: the Dirichlet process (DP) and the Gaussian process (GP). Although detailed technical reviews of these topics are far beyond the scope of this paper, we would like to point out the strengths of these methods in two aspects of statistical data analysis.

  1. Data clustering, partitioning, and segmentation: unlike finite mixture models, nonparametric Bayesian models define a prior distribution over the set of all possible partitions, in which the number of clusters or partitions may grow as the number of data samples increases, in both static and dynamic settings; examples include the infinite Gaussian mixture model, Dirichlet process mixtures, the Chinese restaurant process, and the infinite HMM [74–76]. The model selection issue is resolved implicitly in the process of infinite mixture modeling.

  2. Prediction and smoothing: unlike fixed finite-dimensional parametric models, the GP defines priors over the mean and covariance functions, where the covariance kernel function determines the smoothness and stationarity of the functions relating the data points. Since the predictive posterior is Gaussian, the prediction uncertainty can be computed analytically [28, 77].

Therefore, Bayesian nonparametrics offer great flexibility for modeling complex data structures. Unfortunately, most inference algorithms for Bayesian nonparametric models involve MCMC methods, which can be computationally prohibitive for large-scale neural data analysis. Exploiting the sparsity structure of specific neural data and designing efficient inference algorithms are thus two important directions in practical applications [78].
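
As a small illustration of how a nonparametric Bayesian prior lets the model size grow with the data, the following sketch samples a partition from the Chinese restaurant process, the predictive rule induced by the DP; the function name and its defaults are hypothetical.

```python
import numpy as np

def crp_partition(n, alpha, seed=0):
    """Sample a partition of n items from the Chinese restaurant process
    with concentration alpha: item i joins an existing cluster k with
    probability n_k / (i + alpha), or starts a new cluster with
    probability alpha / (i + alpha).  The expected number of clusters
    grows as O(alpha * log n), so the model size adapts to the data."""
    rng = np.random.default_rng(seed)
    labels = [0]
    counts = [1]          # counts[k] = number of items in cluster k
    for i in range(1, n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)        # open a new cluster
        else:
            counts[k] += 1
        labels.append(int(k))
    return labels, counts
```

In a DP mixture model, each cluster drawn this way would additionally carry its own component parameters (e.g., a Gaussian mean and covariance) sampled from a base measure.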

4. Bayesian Methods for Neural Spike Train Analysis

In this section, we review some representative applications of Bayesian methods for neural spike train analysis, with specific emphasis on real experimental data. The list of references is by no means complete, and complementary references can be found in [79, 80]. Specifically, the strengths of the Bayesian methods are highlighted in comparison with other standard methods; the potential issues arising from these methods are also discussed.

4.1. Spike Sorting and Tuning Curve Estimation

To characterize the firing properties of single neurons, it is necessary to first identify and sort the spikes from the recorded multiunit activity (MUA), that is, the ensemble of discrete spike events that pass a threshold criterion [81–83]. However, spike sorting is often a difficult and error-prone process. Traditionally, spike sorting is formulated as a clustering problem based on spike waveform features [84]. Parametric and nonparametric Bayesian inference methods have been developed for mixture modeling and inference (e.g., [25, 26]), especially for determining the model size [85, 86]. Unlike maximum likelihood estimation (which produces a hard label for each identified spike), Bayesian approaches produce a soft label (posterior probability) for each individual spike; such uncertainties may be propagated into subsequent analyses (such as tuning curve estimation and decoding). Spike sorting can also be formulated as a dynamic model inference problem, in the context of state space analysis [87] or in the presence of nonstationarity [88]. Recent studies have suggested that spike sorting should take into account not only spike waveform features but also the neuronal tuning property [89, 90], implying that these two processes should be integrated.
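
A minimal sketch of the soft-label idea: EM for a spherical Gaussian mixture over (already extracted) waveform features returns posterior responsibilities rather than hard cluster assignments. This is a toy illustration, not a production spike sorter; the deterministic initialization and spherical covariances are simplifying assumptions.

```python
import numpy as np

def gmm_soft_labels(X, n_units=2, n_iter=50):
    """EM for a spherical Gaussian mixture over spike waveform features.
    Returns the responsibility matrix r[s, k] = posterior probability
    that spike s was emitted by unit k (a soft label, in contrast to the
    hard labels of maximum likelihood clustering)."""
    n, d = X.shape
    # crude deterministic initialization: spread means along a 1-D projection
    order = np.argsort(X @ np.ones(d))
    mu = X[order[np.linspace(0, n - 1, n_units).astype(int)]].copy()
    var = np.full(n_units, X.var())
    pi = np.full(n_units, 1.0 / n_units)
    for _ in range(n_iter):
        # E-step: log responsibilities under isotropic Gaussians
        logp = np.stack([
            -0.5 * np.sum((X - mu[k]) ** 2, axis=1) / var[k]
            - 0.5 * d * np.log(2 * np.pi * var[k]) + np.log(pi[k])
            for k in range(n_units)], axis=1)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixture weights, means, and per-unit variances
        nk = r.sum(axis=0) + 1e-12
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        for k in range(n_units):
            var[k] = np.sum(r[:, k] * np.sum((X - mu[k]) ** 2, axis=1)) / (nk[k] * d) + 1e-9
    return r
```

The rows of the returned matrix are exactly the posterior label probabilities that can be carried forward into tuning curve estimation or decoding, instead of committing to a single cluster assignment.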

At the single neuron level, a Poisson neuronal firing response is completely characterized by its tuning curve or receptive field (RF). Naturally, estimating the neuronal tuning curve is the second step following spike sorting. Standard tuning curve or RF estimation methods include the spike-triggered average (STA) and spike-triggered covariance (STC); Bayesian versions of the STA and STC have been proposed [91, 92]. Binning and smoothing are two important issues in firing rate estimation, and Bayesian methods provide a principled way to estimate the peristimulus time histogram (PSTH) [93]. For estimating a time-varying firing rate profile similar to the PSTH, the Bayesian adaptive regression splines (BARS) method offers a principled solution for bin size selection and smoothing based on the RJ-MCMC method [94]; notably, BARS is computationally intensive. For similar estimation performance (validated on simulated data), a more computationally efficient approach has been developed using Bayesian filtering-based state space analysis [95]. In addition, Metropolis-type MCMC approaches have been proposed for high-dimensional tuning curve estimation [96, 97].

4.2. Neural Encoding and Decoding

The goal of neural encoding is to establish a statistical mapping (which can be either a biophysical or data-driven model) between the stimulus input and neuronal responses, and the goal of neural decoding is to extract or reconstruct information of the stimulus given the observed neural signals. For instance, the encoded and decoded variables of interest can be a rodent's position during spatial navigation, the monkey's movement kinematics in a reach-to-grasp task, or specific visual/auditory/olfactory stimuli during neuroscience experiments.

Without loss of generality, let {X̃, Ỹ} denote the observed stimuli and neuronal responses, respectively, at the encoding stage, and let θ denote the parameter of a specific encoding model ℳ; then the joint posterior distribution of the model and its parameters is written as

\[
p(\theta, \mathcal{M} \mid \tilde{X}, \tilde{Y}) \propto p(\tilde{X}, \tilde{Y} \mid \theta, \mathcal{M})\, p(\theta \mid \mathcal{M})\, p(\mathcal{M}). \tag{34}
\]

Once the model ℳ is determined, one can infer the posterior mean by θ̂ = ∫ θ p(θ | X̃, Ỹ, ℳ) dθ. Depending on the selected likelihood or prior, variations of Bayesian neural encoding methods have been developed [98–100].

Given the parameter posterior p(θ | X̃, Ỹ, ℳ) from the encoding analysis, the decoding analysis aims to infer the latent variable X given new data Y at the decoding stage (with a preselected ℳ). Within the Bayesian framework, this is equivalent to finding the MAP estimate X_MAP [101] as follows:

\[
X_{\mathrm{MAP}} = \arg\max_X p(X \mid \theta, Y, \mathcal{M})
= \arg\max_X \int p(Y \mid X, \theta, \mathcal{M})\, p(\theta \mid \tilde{X}, \tilde{Y}, \mathcal{M})\, p(X)\, d\theta
\approx \arg\max_X p(Y \mid X, \hat{\theta}, \mathcal{M})\, p(X), \tag{35}
\]

which consists of two numerical problems: maximization and integration. In the approximation in the last step of (35), we have used p(θ | X̃, Ỹ, ℳ) ≈ δ(θ − θ̂), where θ̂ denotes the estimated mean or mode statistic of p(θ | X̃, Ỹ, ℳ). The optimization problem is more conveniently written in the log domain as follows:

\[
\log p(X \mid Y, \hat{\theta}) \propto \log p(Y \mid X, \hat{\theta}) + \log p(X). \tag{36}
\]

If X follows a Markovian process, this can be solved by recursive Bayesian filtering [65, 67] (Section 3.6). When X is non-Markovian but p(X) and the likelihood are both log-concave, this can be recast as a global optimization problem [57, 102]. Imposing prior information and structure (e.g., sparsity, spatiotemporal correlation) on p(X) is important for obtaining either a meaningful solution or a significant optimization speedup [103, 104]. In contrast, when p(X) is flat or noninformative, the MAP solution will be similar to the maximum likelihood estimate.
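
For a Markovian X on a discretized state space with Poisson spike-count likelihoods, the recursive filtering solution to (36) can be sketched in a few lines; the tuning curves and transition matrix are assumed to be known (e.g., estimated in the encoding stage), and all names and parameter values are illustrative.

```python
import numpy as np

def bayes_decode(counts, tuning, prior, trans, delta):
    """Recursive Bayesian decoder on a discrete state grid.
    counts[t, c]: spike count of cell c in bin t; tuning[c, s]: firing
    rate of cell c in state s; trans[s', s] = p(s' | s).  Returns the
    MAP state index per bin from the filtered posterior p(x_t | y_0:t)."""
    post = prior.astype(float).copy()
    lam = tuning * delta                  # expected counts per state
    path = np.empty(len(counts), dtype=int)
    for t, y in enumerate(counts):
        post = trans @ post               # one-step prediction
        # Poisson log-likelihood per state (state-independent log y! omitted)
        loglik = y @ np.log(lam + 1e-12) - lam.sum(axis=0)
        post *= np.exp(loglik - loglik.max())
        post /= post.sum()
        path[t] = post.argmax()           # MAP state estimate
    return path
```

Keeping the full posterior vector `post` (instead of only its argmax) is what distinguishes the Bayesian decoder from a point estimator: credible intervals over the decoded variable come for free.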

In the literature, the majority of neural encoding or decoding models fall within two parametric families: the linear model (e.g., [63, 105]) and the GLM (e.g., [64, 106, 107]), although nonparametric encoding models have also been considered [108, 109]. Methods for Bayesian neural decoding include (i) Kalman filtering [63], (ii) point process filtering [65–67, 110, 111], (iii) particle filtering [70, 71], and (iv) MCMC methods [112]. Experimental applications span the retina, primary visual cortex, primary somatosensory cortex, the auditory periphery (auditory nerves and midbrain auditory neurons), primary auditory cortex, primary motor cortex, premotor cortex, the hippocampus, and the olfactory bulb.

It is important to point out that most spike-count or point process based decoding algorithms rely on the assumption that neural spikes have been properly sorted (some neural decoding algorithms (e.g., [113]) are based on detected MUA instead of sorted single unit activity). Recently, there have been a few efforts to develop spike-sorting-free decoding algorithms, either by estimating the cell identities as missing variables [114] or by modeling the spike identities through a proxy based on a spatiotemporal point process [115, 116]. Although this work has been carried out using likelihood inference, it is straightforward to extend it to the Bayesian framework. In the example of decoding the rat's position from recorded ensemble hippocampal spike activity [115, 116], we used a model-free (without θ) and data-driven Bayes' rule as follows:

\[
p(X \mid Y, \tilde{X}, \tilde{Y}) \propto p(Y \mid X, \tilde{X}, \tilde{Y})\, p(X), \tag{37}
\]

in which p(X) denotes the prior and the likelihood p(Y | X, X̃, Ỹ) is evaluated nonparametrically (namely, nonparametric neural decoding). By assuming that the joint, marginal, and conditional distributions (i.e., p(X, Y) and p(X̃, Ỹ); p(X) and p(X̃); p(Y | X) and p(Ỹ | X̃)) are stationary across the encoding and decoding phases, the MAP estimate of the decoding analysis is obtained by

\[
X_{\mathrm{MAP}} = \arg\max_X p(Y \mid X, \tilde{X}, \tilde{Y})\, p(X)
\approx \arg\max_X f\big(Y \mid p(X \mid \tilde{X}),\, p(X, Y \mid \tilde{X}, \tilde{Y})\big)\, p(X), \tag{38}
\]

where f is a nonlinear function of the marginal and joint pdfs appearing in its argument [115, 116]; these pdfs are constructed with a kernel density estimator (KDE). Alternatively, the nonparametric pdfs in (38) can be replaced by a parametric form [115] as follows:

\[
X_{\mathrm{MAP}} \approx \arg\max_X f\big(Y \mid p(X \mid \theta),\, p(X, Y \mid \theta)\big)\, p(X), \tag{39}
\]

where p(X | θ) = ∫ p(X, Y | θ) dY is the parametric marginal and θ is the point estimate obtained from the training samples {X̃, Ỹ}.

Note. (i) Neural encoding and decoding analyses are established upon the assumption that the neural codes are well understood, namely, how neuronal spikes represent and transmit information about the external world. Whether the code is a rate code, a timing code, a latency code, or an independent or correlated population code, the Bayesian approach provides a universal strategy to test the coding hypothesis or extract the information [117]. (ii) The sensitivity of spike trains to noise may affect the effectiveness of the encoding-decoding process. From an information-theoretic perspective, various sources of spike noise, such as misclassified spikes (false positives) and misdetected or missed spikes (false negatives), may affect differently the mutual information between the input (stimulus) and output (spikes) of the channel [118, 119]. In designing a Bayesian decoder, it is important to take the noise issue into account; a decoding strategy that is robust to the noise assumption will presumably yield the best performance [115, 116].

4.3. Deconvolution of Neural Spike Trains

Fluorescent calcium imaging tools have become increasingly popular for observing the spiking activity of large neuronal populations. However, extracting or deconvolving neural spike trains from the raw fluorescence movie or video sequences remains a challenging estimation problem. The standard ΔF/F or Wiener filtering approaches do not capture the true statistics of neural spike trains and are sensitive to the noise statistics [120].

A principled approach is to formulate the deconvolution of a filtered point process via state space analysis and Bayesian inference [121, 122] (see also [123] for another type of Bayesian deconvolution approach using MCMC). Let F_t denote the measured univariate fluorescence time series, which is modeled as a linear Gaussian function of the intracellular calcium concentration ([Ca²⁺]) as follows:

\[
F_t = \alpha\,[\mathrm{Ca}^{2+}]_t + \beta + \epsilon_t, \tag{40}
\]

where β denotes a constant baseline and ε_t ~ 𝒩(0, σ²) denotes Gaussian noise with zero mean and variance σ². The calcium concentration is modeled as a first-order autoregressive (AR) process driven by Poisson noise as follows:

\[
[\mathrm{Ca}^{2+}]_t = \gamma\,[\mathrm{Ca}^{2+}]_{t-1} + n_t, \tag{41}
\]

where n_t ~ Poisson(λΔ) and the bin size Δ is chosen to ensure that the mean firing rate is independent of the imaging frame rate.

Let θ = {α, β, γ, σ², λ}. Given the above generative biophysical model, Bayesian deconvolution seeks the MAP estimate of the spike train as follows:

\[
\hat{n} = \arg\max_{n_t \ge 0} p(n \mid F, \theta) = \arg\max_{n_t \ge 0} p(F \mid n, \theta)\, p(n \mid \theta)
= \arg\max_{n_t \ge 0} \prod_{t=1}^{T} p\big(F_t \mid [\mathrm{Ca}^{2+}]_t, \theta\big)\,\prod_{t=1}^{T} p(n_t \mid \theta). \tag{42}
\]

Within the state space framework, Vogelstein and colleagues [121] proposed a particle filtering method to infer the posterior probability of spikes at each imaging frame, given the entire fluorescence trace. However, the Monte Carlo approach is computationally expensive and may not be suitable for analyzing a large population of neurons. To meet the real-time processing requirement, they further proposed an approximate yet fast solution that replaces the Poisson distribution with an exponential distribution of the same mean (thereby relaxing the nonnegative integer constraint to nonnegative real numbers) [122]. The approximate solution is then given by the following optimization problem:

\[
\begin{aligned}
\hat{n} &= \arg\max_{n_t \ge 0} \sum_{t=1}^{T} \Big[ -\frac{1}{2\sigma^2}\big(F_t - \alpha[\mathrm{Ca}^{2+}]_t - \beta\big)^2 - n_t\lambda\Delta \Big] \\
&= \arg\max_{[\mathrm{Ca}^{2+}]_t - \gamma[\mathrm{Ca}^{2+}]_{t-1} \ge 0} \sum_{t=1}^{T} \Big[ -\frac{1}{2\sigma^2}\big(F_t - \alpha[\mathrm{Ca}^{2+}]_t - \beta\big)^2 - \big([\mathrm{Ca}^{2+}]_t - \gamma[\mathrm{Ca}^{2+}]_{t-1}\big)\lambda\Delta \Big].
\end{aligned} \tag{43}
\]

The exponential approximation makes the optimization problem concave with respect to Ca²⁺, so the global optimum can be obtained using constrained convex optimization [102]. Once the estimate of the calcium trace is obtained, the MAP spike train can be inferred by a simple linear transformation.
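
A minimal sketch of the relaxed MAP problem (43), parameterized directly by the nonnegative spike vector n (so that the calcium trace is a linear function of n) and solved with a generic bound-constrained optimizer; a dedicated interior-point solver as in [102] would be much faster, and the function name and parameter values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def deconvolve(F, alpha, beta, gamma, sigma2, lam, delta):
    """Approximate MAP spike inference for the relaxed problem (Eq. (43)):
    minimize  sum_t (F_t - alpha*c_t - beta)^2 / (2*sigma2) + lam*delta*n_t
    over n_t >= 0, where c_t = gamma*c_{t-1} + n_t (with c_0 = 0)."""
    T = len(F)

    def calcium(n):
        # run the AR(1) recursion to map spikes n to a calcium trace c
        c = np.empty(T)
        acc = 0.0
        for t in range(T):
            acc = gamma * acc + n[t]
            c[t] = acc
        return c

    def objective(n):
        c = calcium(n)
        return (np.sum((F - alpha * c - beta) ** 2) / (2.0 * sigma2)
                + lam * delta * n.sum())

    res = minimize(objective, np.zeros(T), method="L-BFGS-B",
                   bounds=[(0.0, None)] * T)
    return res.x, calcium(res.x)
```

Because the objective is convex in n and the constraint set is a box, any local solution found by the optimizer is the global optimum, mirroring the concavity argument above.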

In a parallel fashion, the parameter θ can be similarly estimated by Bayesian inference as follows:

\[
\theta_{\mathrm{MAP}} = \arg\max_{\theta} \int p(F \mid \mathrm{Ca}^{2+}, \theta)\, p(\mathrm{Ca}^{2+} \mid \theta)\, d\mathrm{Ca}^{2+}
\approx \arg\max_{\theta} p(F \mid \hat{n}, \theta)\, p(\hat{n} \mid \theta), \tag{44}
\]

where the approximation in the second step assumes that most of the mass of the integral lies around the MAP sequence n̂ (or, equivalently, the Ca²⁺ trace). Therefore, the joint estimate (n̂, θ_MAP) can be computed by iterating (43) and (44) until convergence.

Note. The output of Bayesian deconvolution is a vector of probabilities (between 0 and 1) of a spike occurring in each time frame. Selecting different thresholds on this probability vector leads to different detection errors (a tradeoff between false positives and false negatives). Nevertheless, the Bayesian solution is far superior to the standard least-squares method. It is noteworthy that a fast deconvolution method has recently been proposed based on finite rate of innovation (FRI) theory, with reported performance better than that of the approximate Bayesian solution [124].

4.4. Inference of Neuronal Functional Connectivity and Synchrony

Identifying the functional connectivity of simultaneously recorded neuronal ensembles is an important research objective in computational neuroscience, with many applications such as neural decoding [125] and understanding the collective dynamics of coordinated spiking cortical networks [126]. Compared to standard nonparametric approaches such as the cross-correlogram and the joint peristimulus time histogram (JPSTH), parametric model-based statistical approaches offer several advantages in neural data interpretation [127].

To model the spike train point process data, without loss of generality we use the following logistic regression model with a logit link function. Specifically, let c be the index of a target neuron, and let i = 1,…, C index the triggering neurons (whose spike activity is assumed to trigger the firing of the target neuron). The Bernoulli (binomial) logistic regression GLM is written as

\[
\mathrm{logit}(\pi_t) = \theta^c \cdot X_t = \theta_0^c + \sum_{j=1}^{J} \theta_j^c\, x_{j,t} = \theta_0^c + \sum_{i=1}^{C}\sum_{k=1}^{K} \theta_{i,k}^c\, x_{i,t-k}, \tag{45}
\]

where dim(θᶜ) = J + 1 = C × K + 1 for the augmented parameter vector θᶜ = {θ₀ᶜ, θ_{i,k}ᶜ} and X_t = {x₀, x_{i,t−k}}. Here, x₀ ≡ 1, and x_{i,t−k} denotes the raw spike count of neuron i in the kth time-lag history window (or a predefined smooth basis function, as in [125]). The spike count is nonnegative; therefore x_{i,t−k} ≥ 0. Alternatively, (45) can be rewritten as

\[
\pi_t = \frac{\exp(\theta^c \cdot X_t)}{1 + \exp(\theta^c \cdot X_t)}
= \frac{\exp\big(\theta_0^c + \sum_{j=1}^{J}\theta_j^c x_{j,t}\big)}{1 + \exp\big(\theta_0^c + \sum_{j=1}^{J}\theta_j^c x_{j,t}\big)}, \tag{46}
\]

which yields the probability of a spiking event at time t. Equation (46) defines a spiking probability model for neuron c based on its own spiking history and that of the other neurons in the ensemble. Here, exp(θ₀ᶜ) can be interpreted as the baseline firing probability of neuron c. Depending on the sign of the coefficient θ_{i,k}ᶜ, exp(θ_{i,k}ᶜ) can be viewed as a dimensionless "gain" factor (>1 or <1) that modulates the firing probability of neuron c due to neuron i at the kth previous time lag. Therefore, a negative value of θ_{i,k}ᶜ strengthens an inhibitory effect, while a positive value of θ_{i,k}ᶜ enhances an excitatory effect. Two neurons are said to be functionally connected if any of their pairwise connections is nonzero (or if the statistical estimate is significantly nonzero).
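
A sketch of fitting the Bernoulli logistic GLM (45)-(46) by Newton-Raphson (iteratively reweighted least squares), with a small ridge term added for numerical stability; the lag-major design-matrix layout is one illustrative choice, and this is a likelihood fit to which priors could be added for a fully Bayesian treatment.

```python
import numpy as np

def history_design(spikes, c, K):
    """Build covariates for Eq. (45): predict neuron c's spike at time t
    from the K-lag spiking history of all C neurons, plus a constant.
    Column 1 + (k-1)*C + i holds x_{i, t-k} (lag-major ordering)."""
    T, C = spikes.shape
    X = np.ones((T - K, 1 + C * K))
    for t in range(K, T):
        X[t - K, 1:] = spikes[t - K:t][::-1].ravel()   # lags k = 1..K
    y = spikes[K:, c]
    return X, y

def fit_logistic_glm(X, y, n_iter=25, ridge=1e-2):
    """Newton-Raphson (IRLS) for the Bernoulli logistic GLM (Eq. (46)),
    with a small ridge penalty for numerical stability."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ theta, -30, 30)      # clip to avoid overflow
        p = 1.0 / (1.0 + np.exp(-eta))
        W = p * (1.0 - p)                      # IRLS working weights
        H = X.T @ (X * W[:, None]) + ridge * np.eye(X.shape[1])
        theta = theta + np.linalg.solve(H, X.T @ (y - p) - ridge * theta)
    return theta
```

The sign and magnitude of each fitted coefficient then carry the excitatory/inhibitory interpretation described above; significance would be assessed from the observed Fisher information (the matrix H) or, in a Bayesian treatment, from the parameter posterior.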

For inferring the functional connectivity of neural ensembles, in addition to the standard likelihood approaches [127, 128], various forms of Bayesian inference have been developed for the MaxEnt model, the GLM, and Bayesian networks [129–132]. In a similar context, a Bayesian method has been developed based on deconvolved neuronal spike trains from calcium imaging data [133].

Bayesian methods have also proved useful in detecting higher-order correlations among neural assemblies [134, 135]. Higher-order correlations are often characterized by synchronous neuronal firing at a timescale of 5–10 ms; such findings have been reported in experimental data from the prefrontal cortex, somatosensory cortex, and visual cortex across many species. Consider a set of C neurons, each represented by two states: 1 (firing) or 0 (silent). At any time instant, the state of the C neurons is represented by the vector X = (x₁, x₂,…, x_C) (the time index is omitted for simplicity), giving 2^C possible ensemble states in total. For instance, a general joint distribution of three neurons can be expressed by a log-linear model [134]

\[
p(x_1, x_2, x_3) = \exp\big(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_{12} x_1 x_2 + \theta_{13} x_1 x_3 + \theta_{23} x_2 x_3 + \theta_{123} x_1 x_2 x_3\big), \tag{47}
\]

which is a natural extension of the MaxEnt model described in (9). A nonzero coefficient θ₁₂₃ would imply the presence of a third-order correlation among the three neurons. In experimental data, the number of synchronous events may be scarce in single trials, and the interaction coefficients may also be time-varying. State space analysis and Bayesian filtering offer a principled framework to address these issues [135]. However, the computational bottleneck is the curse of dimensionality when C is moderately large (e.g., 2^20 ≈ 10^6 states for C = 20). Given a finite sample size, it is reasonable to impose structural priors on the parameter space for the Bayesian solution.
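
For three neurons, the eight coefficients of (47) can be recovered exactly from the eight state probabilities by solving a linear system (the model is saturated); the sketch below illustrates this, although fitting from finite spike data, or with C large, requires the state space and MCMC machinery discussed above.

```python
import numpy as np
from itertools import product

def loglinear_coefficients(p):
    """Solve Eq. (47) exactly for three binary neurons: with the eight
    state probabilities p (states ordered as (x1, x2, x3) in binary
    counting order, x3 fastest), log p(x) is linear in the eight
    coefficients, so theta solves A @ theta = log(p).  A nonzero
    theta_123 indicates a genuine third-order correlation."""
    states = list(product([0, 1], repeat=3))
    A = np.array([[1, x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3]
                  for (x1, x2, x3) in states], dtype=float)
    theta = np.linalg.solve(A, np.log(np.asarray(p, dtype=float)))
    return dict(zip(["0", "1", "2", "3", "12", "13", "23", "123"], theta))
```

For independent neurons, all interaction coefficients vanish and each first-order coefficient reduces to the log-odds of that neuron's firing probability, which provides a quick sanity check of the parameterization.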

5. Discussion

We have presented an overview of Bayesian inference methods and their applications to neural spike train analysis. Although the focus of the current paper is on neural spike trains, the Bayesian principle is also applicable to other modalities of neural data (e.g., [136]). Owing to space limitations, we cover only representative methods and applications, and the references reflect our personal choices from a vast literature.

In comparison with standard methods, Bayesian methods provide a flexible framework for addressing many fundamental estimation problems at different stages of neural data analysis. Regardless of the specific Bayesian approach employed, the common goal of Bayesian solutions is to replace a single point estimate (or hard decision label) with a full posterior distribution (or soft decision label). As a tradeoff, Bayesian practitioners face an increased computational cost (especially when using MCMC), which may be prohibitive for large-scale spike train data sets. Furthermore, special attention should be paid to selecting among the different Bayesian methods, which ultimately lead to quantitatively different approximate Bayesian solutions.

Despite the significant progresses made to date, there remain many research challenges and opportunities for applying Bayesian machinery to neural spike trains, and we will mention a few of them below.

5.1. Nonstationarity

Neural spiking activity is highly nonstationary at various timescales. Sources of such nonstationarity include the animal's behavioral variability across trials and top-down attention, learning, motivation, or emotional effects across time; these effects are time-varying across behaviors. In addition, individual neuronal firing may be affected by other unobserved neural activity, such as modulatory or presynaptic inputs from nonrecorded neurons. Therefore, it may be important to consider these latent variables when analyzing neural spike trains [137]. Bayesian methods are a natural solution for modeling and inferring such latent variables; traditional mixed-effects models, for example, can be adapted into a hierarchical Bayesian model to capture various sources of randomness.

5.2. Characterization of Neuronal Dependencies

Neural responses may appear correlated or synchronous at different timescales. It is important to characterize such neuronal dependencies in order to fully understand the nature of neural codes. It is equally important to relate the neural responses to other measurements, such as behavioral responses, learning performance, or local field potentials. Commonly, correlation statistics or information-theoretic measures have been used (e.g., [138]). Other advanced statistical measures have also been proposed, such as the log-linear model [139], Granger causality [140], transfer entropy [141], and the copula model [142]. Specifically, the copula offers a universal framework for modeling statistical dependencies among continuous, discrete, or mixed-valued random variables, and it has an intrinsic link to mutual information; Bayesian methods may prove useful for selecting the copula class or copula mixtures [143]. However, because of the nonstationary nature of neural codes (Section 5.1), it remains a challenge to identify the "true" dependencies among the observed neural spike trains, and it remains important to rule out and rule in neural codes under specific conditions.

5.3. Characterization and Abstraction of Neuronal Ensemble Representation

Since individual neuronal spike activity is known to be stochastic and noisy, it is anticipated that in single-trial analysis the information extracted from neuronal populations is more robust than that from a single neuron. How to uncover the neural representation of population codes in a single-trial analysis has been an active research topic in neuroscience. This is important not only for the abstraction, interpretation, and visualization of population codes but also for discovering invariant neural representations and their links to behavior. Standard dimensionality reduction techniques (e.g., principal component analysis, multidimensional scaling, or locally linear embedding) have been widely used for such analyses; however, these methods ignore the temporal structure of neural codes. In addition, no explicit behavioral correlate may be available in certain modeling tasks. Recently, Bayesian dynamic models, such as Gaussian process factor analysis (GPFA) [144] and the VB-HMM [145–147], have been proposed to visualize population codes recorded from large neural ensembles across different experimental conditions. To learn the highly complex structure of spatiotemporal neural population codes, it may be beneficial to borrow ideas from the machine learning community and to integrate state-of-the-art unsupervised and supervised deep Bayesian learning techniques.

5.4. Translational Neuroscience Applications

Finally, in the long run, it is crucial to apply basic neuroscience knowledge derived from quantitative analyses of neural data to translational neuroscience research. Many clinical research areas may benefit from the statistical analyses reviewed here, such as the design of neural prosthetics for patients with tetraplegia [107], detection and control of epileptic seizures, optical control of neuronal firing in behaving animals, and simulation of neural firing patterns to achieve optimal electrotherapeutic effects [148]. Bridging the gap between neural data analysis and translational applications (such as treating neurological or neuropsychiatric disorders) will continue to be a prominent mission accompanying the journey of scientific discovery.

Acknowledgments

The author was supported by an Early Career Award from the Mathematical Biosciences Institute, Ohio State University. This work was also supported by the NSF-IIS CRCNS (Collaborative Research in Computational Neuroscience) Grant (no. 1307645) from the National Science Foundation.

References

  • 1.Brown EN, Kass RE, Mitra PP. Multiple neural spike train data analysis: state-of-the-art and future challenges. Nature Neuroscience. 2004;7(5):456–461. doi: 10.1038/nn1228. [DOI] [PubMed] [Google Scholar]
  • 2.Grün S, Rotter S. Analysis of Parallel Spike Trains. New York, NY, USA: Springer; 2010. [Google Scholar]
  • 3.Stevenson IH, Kording KP. How advances in neural recording affect data analysis. Nature Neuroscience. 2011;14(2):139–142. doi: 10.1038/nn.2731. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Stanley GB. Reading and writing the neural code. Nature Neuroscience. 2013;16:259–263. doi: 10.1038/nn.3330. [DOI] [PubMed] [Google Scholar]
  • 5.Chen Z, Berger TW, Cichocki A, Oweiss KG, Quian Quiroga R, Thakor NV. Signal processing for neural spike trains. Computational Intelligence and Neuroscience. 2010;2010:698751. doi: 10.1155/2010/698751.
  • 6.Macke J, Berens P, Bethge M. Statistical analysis of multi-cell recordings: linking population coding models to experimental data. Frontiers in Computational Neuroscience. 2011;5:35. doi: 10.3389/fncom.2011.00035.
  • 7.Bernardo J, Smith AFM. Bayesian Theory. New York, NY, USA: John Wiley & Sons; 1994.
  • 8.Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. 2nd edition. New York, NY, USA: Chapman & Hall/CRC; 2004.
  • 9.Pawitan Y. In All Likelihood: Statistical Modelling and Inference Using Likelihood. New York, NY, USA: Clarendon Press; 2001.
  • 10.Daley DJ, Vere-Jones D. An Introduction to the Theory of Point Processes. 2nd edition. New York, NY, USA: Springer; 2003.
  • 11.Brown EN, Barbieri R, Eden UT, Frank LM. Likelihood methods for neural data analysis. In: Feng J, editor. Computational Neuroscience: A Comprehensive Approach. New York, NY, USA: CRC Press; 2003. pp. 253–286.
  • 12.Brown EN. Theory of point processes for neural systems. In: Chow CC, Gutkin B, Hansel D, et al., editors. Methods and Models in Neurophysics. San Diego, Calif, USA: Elsevier; 2005. pp. 691–727.
  • 13.Chen Z, Barbieri R, Brown EN. State-space modeling of neural spike train and behavioral data. In: Oweiss K, editor. Statistical Signal Processing for Neuroscience and Neurotechnology. San Diego, Calif, USA: Elsevier; 2010. pp. 161–200.
  • 14.Truccolo W, Eden UT, Fellows MR, Donoghue JP, Brown EN. A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology. 2005;93(2):1074–1089. doi: 10.1152/jn.00697.2004.
  • 15.McCullagh P, Nelder JA. Generalized Linear Models. 2nd edition. New York, NY, USA: Chapman & Hall/CRC Press; 1989.
  • 16.Schneidman E, Berry MJ, II, Segev R, Bialek W. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature. 2006;440(7087):1007–1012. doi: 10.1038/nature04701.
  • 17.Nasser H, Marre O, Cessac B. Spatio-temporal spike train analysis for large scale networks using the maximum entropy principle and Monte Carlo method. Journal of Statistical Mechanics. 2013;2013:P03006.
  • 18.Brown EN, Barbieri R, Ventura V, Kass RE, Frank LM. The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation. 2002;14(2):325–346. doi: 10.1162/08997660252741149.
  • 19.Julier S, Uhlmann J, Durrant-Whyte HF. A new method for the nonlinear transformation of means and covariances in filters and estimators. IEEE Transactions on Automatic Control. 2000;45(3):477–482.
  • 20.Särkkä S. On unscented Kalman filtering for state estimation of continuous-time nonlinear systems. IEEE Transactions on Automatic Control. 2007;52(9):1631–1641.
  • 21.Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK. An introduction to variational methods for graphical models. Machine Learning. 1999;37(2):183–233.
  • 22.Attias H. A variational Bayesian framework for graphical models. In: Solla SA, Leen TK, Müller KR, editors. Advances in Neural Information Processing Systems (NIPS) 12. Boston, Mass, USA: MIT Press; 2000.
  • 23.Beal M, Ghahramani Z. Variational Bayesian learning of directed graphical models. Bayesian Analysis. 2006;1(4):793–832.
  • 24.MacKay DJ. Information Theory, Inference, and Learning Algorithms. New York, NY, USA: Cambridge University Press; 2003.
  • 25.Bishop CM. Pattern Recognition and Machine Learning. New York, NY, USA: Springer; 2006.
  • 26.Murphy KP. Machine Learning: A Probabilistic Perspective. Cambridge, Mass, USA: MIT Press; 2012.
  • 27.Barber D. Bayesian Reasoning and Machine Learning. New York, NY, USA: Cambridge University Press; 2012.
  • 28.Barber D, Cemgil AT, Chiappa S. Bayesian Time Series Models. New York, NY, USA: Cambridge University Press; 2011.
  • 29.Cover TM, Thomas JA. Elements of Information Theory. 2nd edition. New York, NY, USA: John Wiley & Sons; 2006.
  • 30.Dempster A, Laird N, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B. 1977;39:1–38.
  • 31.Katahira K, Watanabe K, Okada M. Deterministic annealing variant of variational Bayes method. Journal of Physics: Conference Series. 2008;95:012015.
  • 32.Kurihara K, Welling M. Bayesian k-means as a “maximization-expectation” algorithm. Neural Computation. 2009;21(4):1145–1172. doi: 10.1162/neco.2008.12-06-421.
  • 33.Sung J, Ghahramani Z, Bang S-Y. Latent-space variational Bayes. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2008;30(12):2236–2242. doi: 10.1109/TPAMI.2008.157.
  • 34.Sung J, Ghahramani Z, Bang S-Y. Second-order latent-space variational Bayes for approximate Bayesian inference. IEEE Signal Processing Letters. 2008;15:918–921.
  • 35.Turner RE, Sahani M. Two problems with variational expectation maximisation for time series models. In: Barber D, Cemgil AT, Chiappa S, editors. Bayesian Time Series Models. New York, NY, USA: Cambridge University Press; 2011. pp. 115–138.
  • 36.Watanabe K. An alternative view of variational Bayes and asymptotic approximations of free energy. Machine Learning. 2012;86(2):273–293.
  • 37.Honkela A, Raiko T, Kuusela M, Tornio M, Karhunen J. Approximate Riemannian conjugate gradient learning for fixed-form variational Bayes. Journal of Machine Learning Research. 2010;11:3235–3268.
  • 38.Minka TP. A family of algorithms for approximate Bayesian inference [Ph.D. thesis]. Cambridge, Mass, USA: Department of EECS, Massachusetts Institute of Technology; 2001.
  • 39.Amari S-I, Nagaoka H. Methods of Information Geometry. New York, NY, USA: Oxford University Press; 2007.
  • 40.Gilks WR, Richardson S, Spiegelhalter DJ. Markov Chain Monte Carlo in Practice. New York, NY, USA: Chapman & Hall/CRC; 1995.
  • 41.Robert CP, Casella G. Monte Carlo Statistical Methods. 2nd edition. New York, NY, USA: Springer; 2004.
  • 42.Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. The Journal of Chemical Physics. 1953;21(6):1087–1092.
  • 43.Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970;57(1):97–109.
  • 44.Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1984;6(6):721–741. doi: 10.1109/tpami.1984.4767596.
  • 45.Neal RM. Suppressing random walks in Markov chain Monte Carlo using ordered overrelaxation. Technical Report 9508. Toronto, Canada: Department of Statistics, University of Toronto; 1995.
  • 46.Marshall T, Roberts G. An adaptive approach to Langevin MCMC. Statistics and Computing. 2012;22(5):1041–1057.
  • 47.Qi Y, Minka TP. Hessian-based Markov chain Monte-Carlo algorithms. Proceedings of the 1st Cape Cod Workshop on Monte Carlo Methods; September 2002; Cape Cod, Mass, USA.
  • 48.Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995;82(4):711–732.
  • 49.Kass RE, Raftery AE. Bayes factors. Journal of the American Statistical Association. 1995;90(430):773–795.
  • 50.Lavine M, Schervish MJ. Bayes factors: what they are and what they are not. American Statistician. 1999;53(2):119–122.
  • 51.Lewis SM, Raftery AE. Estimating Bayes factors via posterior simulation with the Laplace-Metropolis estimator. Journal of the American Statistical Association. 1997;92(438):648–655.
  • 52.Toni T, Stumpf MPH. Simulation-based model selection for dynamical systems in systems and population biology. Bioinformatics. 2009;26(1):104–110. doi: 10.1093/bioinformatics/btp619.
  • 53.Neal RM. Bayesian Learning for Neural Networks. New York, NY, USA: Springer; 1996.
  • 54.Hoeting JA, Madigan D, Raftery AE, Volinsky CT. Bayesian model averaging: a tutorial. Statistical Science. 1999;14(4):382–417.
  • 55.Raftery AE. Approximate Bayes factors and accounting for model uncertainty in generalised linear models. Biometrika. 1996;83(2):251–266.
  • 56.Chen Z, Brown EN. State space model. Scholarpedia. 2013;8(3):30868.
  • 57.Paninski L, Ahmadian Y, Ferreira DG, et al. A new look at state-space models for neural data. Journal of Computational Neuroscience. 2010;29(1-2):107–126. doi: 10.1007/s10827-009-0179-x.
  • 58.Papoulis A. Probability, Random Variables, and Stochastic Processes. 4th edition. New York, NY, USA: McGraw-Hill; 2002.
  • 59.Robert CP, Rydén T, Titterington DM. Bayesian inference in hidden Markov models through the reversible jump Markov chain Monte Carlo method. Journal of the Royal Statistical Society B. 2000;62(1):57–75.
  • 60.Scott SL. Bayesian methods for hidden Markov models: recursive computing in the 21st century. Journal of the American Statistical Association. 2002;97(457):337–351.
  • 61.Ghahramani Z. Learning dynamic Bayesian networks. In: Giles CL, Gori M, editors. Adaptive Processing of Sequences and Data Structures. New York, NY, USA: Springer; 1998. pp. 168–197.
  • 62.Kalman RE. A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering. 1960;82:35–45.
  • 63.Wu W, Gao Y, Bienenstock E, Donoghue JP, Black MJ. Bayesian population decoding of motor cortical activity using a Kalman filter. Neural Computation. 2006;18(1):80–118. doi: 10.1162/089976606774841585.
  • 64.Wu W, Kulkarni JE, Hatsopoulos NG, Paninski L. Neural decoding of hand motion using a linear state-space model with hidden states. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2009;17(4):370–378. doi: 10.1109/TNSRE.2009.2023307.
  • 65.Brown EN, Frank LM, Tang D, Quirk MC, Wilson MA. A statistical paradigm for neural spike train decoding applied to position prediction from ensemble firing patterns of rat hippocampal place cells. Journal of Neuroscience. 1998;18(18):7411–7425. doi: 10.1523/JNEUROSCI.18-18-07411.1998.
  • 66.Smith AC, Brown EN. Estimating a state-space model from point process observations. Neural Computation. 2003;15(5):965–991. doi: 10.1162/089976603765202622.
  • 67.Eden UT, Frank LM, Barbieri R, Solo V, Brown EN. Dynamic analysis of neural encoding by point process adaptive filtering. Neural Computation. 2004;16(5):971–998. doi: 10.1162/089976604773135069.
  • 68.Koyama S, Castellanos Pérez-Bolde L, Rohilla Shalizi C, Kass RE. Approximate methods for state-space models. Journal of the American Statistical Association. 2010;105(489):170–180. doi: 10.1198/jasa.2009.tm08326.
  • 69.Doucet A, de Freitas N, Gordon N. Sequential Monte Carlo Methods in Practice. New York, NY, USA: Springer; 2001.
  • 70.Brockwell AE, Rojas AL, Kass RE. Recursive Bayesian decoding of motor cortical signals by particle filtering. Journal of Neurophysiology. 2004;91(4):1899–1907. doi: 10.1152/jn.00438.2003.
  • 71.Ergun A, Barbieri R, Eden UT, Wilson MA, Brown EN. Construction of point process adaptive filter algorithms for neural systems using sequential Monte Carlo methods. IEEE Transactions on Biomedical Engineering. 2007;54:419–428. doi: 10.1109/TBME.2006.888821.
  • 72.Šmídl V, Quinn A. Variational Bayesian filtering. IEEE Transactions on Signal Processing. 2008;56(10):5020–5030.
  • 73.Salimpour Y, Soltanian-Zadeh H, Salehi S, Emadi N, Abouzari M. Neuronal spike train analysis in likelihood space. PLoS ONE. 2011;6(6):e21256. doi: 10.1371/journal.pone.0021256.
  • 74.Hjort NL, Holmes C, Müller P, Walker SG. Bayesian Nonparametrics. New York, NY, USA: Cambridge University Press; 2010.
  • 75.Ghahramani Z. Bayesian nonparametrics and the probabilistic approach to modeling. Philosophical Transactions of the Royal Society of London A. 2012;371:20110553.
  • 76.Fox E, Sudderth E, Jordan M, Willsky A. Bayesian nonparametric methods for learning Markov switching processes. IEEE Signal Processing Magazine. 2010;27(6):43–54.
  • 77.Rasmussen CE, Williams CKI. Gaussian Processes for Machine Learning. Cambridge, Mass, USA: MIT Press; 2005.
  • 78.Van Gael J, Saatci Y, Teh YW, Ghahramani Z. Beam sampling for the infinite hidden Markov model. Proceedings of the 25th International Conference on Machine Learning (ICML '08); July 2008; Helsinki, Finland. pp. 1088–1095.
  • 79.Gabbiani F, Koch C. Principles of spike train analysis. In: Koch C, Segev I, editors. Methods in Neuronal Modeling: From Synapses to Networks. 2nd edition. Boston, Mass, USA: MIT Press; 1998. pp. 313–360.
  • 80.Kass RE, Ventura V, Brown EN. Statistical issues in the analysis of neuronal data. Journal of Neurophysiology. 2005;94(1):8–25. doi: 10.1152/jn.00648.2004.
  • 81.Prentice JS, Homann J, Simmons KD, Tkačik G, Balasubramanian V, Nelson PC. Fast, scalable, Bayesian spike identification for multi-electrode arrays. PLoS ONE. 2011;6(7):e19884. doi: 10.1371/journal.pone.0019884.
  • 82.Wood F, Black MJ, Vargas-Irwin C, Fellows M, Donoghue JP. On the variability of manual spike sorting. IEEE Transactions on Biomedical Engineering. 2004;51(6):912–918. doi: 10.1109/TBME.2004.826677.
  • 83.Ekanadham C, Tranchina D, Simoncelli EP. A blind deconvolution method for neural spike identification. Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS '11); December 2011. MIT Press.
  • 84.Lewicki MS. A review of methods for spike sorting: the detection and classification of neural action potentials. Network. 1998;9(4):R53–R78.
  • 85.Nguyen DP, Frank LM, Brown EN. An application of reversible-jump Markov chain Monte Carlo to spike classification of multi-unit extracellular recordings. Network. 2003;14(1):61–82. doi: 10.1088/0954-898x/14/1/304.
  • 86.Wood F, Black MJ. A nonparametric Bayesian alternative to spike sorting. Journal of Neuroscience Methods. 2008;173(1):1–12. doi: 10.1016/j.jneumeth.2008.04.030.
  • 87.Herbst JA, Gammeter S, Ferrero D, Hahnloser RHR. Spike sorting with hidden Markov models. Journal of Neuroscience Methods. 2008;174(1):126–134. doi: 10.1016/j.jneumeth.2008.06.011.
  • 88.Calabrese A, Paninski L. Kalman filter mixture model for spike sorting of non-stationary data. Journal of Neuroscience Methods. 2011;196(1):159–169. doi: 10.1016/j.jneumeth.2010.12.002.
  • 89.Ventura V. Automatic spike sorting using tuning information. Neural Computation. 2009;21(9):2466–2501. doi: 10.1162/neco.2009.12-07-669.
  • 90.Ventura V. Traditional waveform based spike sorting yields biased rate code estimates. Proceedings of the National Academy of Sciences of the United States of America. 2009;106(17):6921–6926. doi: 10.1073/pnas.0901771106.
  • 91.Park M, Pillow JW. Receptive field inference with localized priors. PLoS Computational Biology. 2011;7(10):e1002219. doi: 10.1371/journal.pcbi.1002219.
  • 92.Park IM, Pillow JW. Bayesian spike-triggered covariance analysis. In: Shawe-Taylor J, Zemel R, Bartlett P, Fereira F, Weinberger KQ, editors. Advances in Neural Information Processing Systems (NIPS) Vol. 24. Boston, Mass, USA: MIT Press; 2011. pp. 1692–1700.
  • 93.Endres D, Oram M. Feature extraction from spike trains with Bayesian binning: ‘Latency is where the signal starts’. Journal of Computational Neuroscience. 2010;29(1-2):149–169. doi: 10.1007/s10827-009-0157-3.
  • 94.Dimatteo I, Genovese CR, Kass RE. Bayesian curve-fitting with free-knot splines. Biometrika. 2001;88(4):1055–1071.
  • 95.Smith AC, Scalon JD, Wirth S, Yanike M, Suzuki WA, Brown EN. State-space algorithms for estimating spike rate functions. Computational Intelligence and Neuroscience. 2010;2010:426539. doi: 10.1155/2010/426539.
  • 96.Cronin B, Stevenson IH, Sur M, Körding KP. Hierarchical Bayesian modeling and Markov chain Monte Carlo sampling for tuning-curve analysis. Journal of Neurophysiology. 2010;103(1):591–602. doi: 10.1152/jn.00379.2009.
  • 97.Taubman H, Vaadia E, Paz R, Chechik G. A Bayesian approach for characterizing direction tuning curves in the supplementary motor area of behaving monkeys. Journal of Neurophysiology. 2013. doi: 10.1152/jn.00449.2012.
  • 98.Paninski L, Pillow J, Lewi J. Statistical models for neural encoding, decoding, and optimal stimulus design. In: Cisek P, Drew T, Kalaska J, editors. Computational Neuroscience: Theoretical Insights Into Brain Function. Elsevier; 2007.
  • 99.Gerwinn S, Macke JH, Seeger M, Bethge M. Bayesian inference for spiking neuron models with a sparsity prior. In: Platt JC, Koller D, Singer Y, Roweis S, editors. Advances in Neural Information Processing Systems (NIPS) Vol. 20. Boston, Mass, USA: MIT Press; 2008. pp. 529–536.
  • 100.Pillow JW, Scott JG. Fully Bayesian inference for neural models with negative-binomial spiking. In: Bartlett P, Pereira FCN, Burges CJC, Bottou L, Weinberger KQ, editors. Advances in Neural Information Processing Systems (NIPS) Vol. 25. Boston, Mass, USA: MIT Press; 2012. pp. 1907–1915.
  • 101.Koyama S, Eden UT, Brown EN, Kass RE. Bayesian decoding of neural spike trains. Annals of the Institute of Statistical Mathematics. 2010;62(1):37–59.
  • 102.Boyd S, Vandenberghe L. Convex Optimization. New York, NY, USA: Cambridge University Press; 2004.
  • 103.Pillow JW, Ahmadian Y, Paninski L. Model-based decoding, information estimation, and change-point detection techniques for multineuron spike trains. Neural Computation. 2011;23(1):1–45. doi: 10.1162/NECO_a_00058.
  • 104.Ramirez AD, Ahmadian Y, Schumacher J, Schneider D, Woolley SMN, Paninski L. Incorporating naturalistic correlation structure improves spectrogram reconstruction from neuronal activity in the songbird auditory midbrain. Journal of Neuroscience. 2011;31(10):3828–3842. doi: 10.1523/JNEUROSCI.3256-10.2011.
  • 105.Chen Z, Takahashi K, Hatsopoulos NG. Sparse Bayesian inference methods for decoding 3D reach and grasp kinematics and joint angles with primary motor cortical ensembles. Proceedings of the 35th Annual International Conference of the IEEE Engineering in Medicine and Biology (EMBC '13); 2013; pp. 5930–5933.
  • 106.Zhang K, Ginzburg I, McNaughton BL, Sejnowski TJ. Interpreting neuronal population activity by reconstruction: unified framework with application to hippocampal place cells. Journal of Neurophysiology. 1998;79(2):1017–1044. doi: 10.1152/jn.1998.79.2.1017.
  • 107.Truccolo W, Friehs GM, Donoghue JP, Hochberg LR. Primary motor cortex tuning to intended movement kinematics in humans with tetraplegia. Journal of Neuroscience. 2008;28(5):1163–1178. doi: 10.1523/JNEUROSCI.4415-07.2008.
  • 108.Truccolo W, Donoghue JP. Nonparametric modeling of neural point processes via stochastic gradient boosting regression. Neural Computation. 2007;19(3):672–705. doi: 10.1162/neco.2007.19.3.672.
  • 109.Coleman TP, Sarma SV. A computationally efficient method for nonparametric modeling of neural spiking activity with point processes. Neural Computation. 2010;22(8):2002–2030. doi: 10.1162/NECO_a_00001-Coleman.
  • 110.Shanechi MM, Brown EN, Williams ZM. Neural population partitioning and a concurrent brain-machine interface for sequential control motor function. Nature Neuroscience. 2012;15:1715–1722. doi: 10.1038/nn.3250.
  • 111.Shanechi MM, Wornell GW, Williams Z, Brown EN. A parallel point-process filter for estimation of goal-directed movements from neural signals. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '10); March 2010; Dallas, Tex, USA. pp. 521–524.
  • 112.Ahmadian Y, Pillow JW, Paninski L. Efficient Markov chain Monte Carlo methods for decoding neural spike trains. Neural Computation. 2011;23(1):46–96. doi: 10.1162/NECO_a_00059.
  • 113.Bansal AK, Truccolo W, Vargas-Irwin CE, Donoghue JP. Decoding 3D reach and grasp from hybrid signals in motor and premotor cortices: spikes, multiunit activity, and local field potentials. Journal of Neurophysiology. 2012;107(5):1337–1355. doi: 10.1152/jn.00781.2011.
  • 114.Ventura V. Spike train decoding without spike sorting. Neural Computation. 2008;20(4):923–963. doi: 10.1162/neco.2008.02-07-478.
  • 115.Chen Z, Kloosterman F, Layton S, Wilson MA. Transductive neural decoding of unsorted neuronal spikes of rat hippocampus. Proceedings of the 34th Annual International Conference of the IEEE Engineering in Medicine and Biology (EMBC '12); August 2012; pp. 1310–1313.
  • 116.Kloosterman F, Layton S, Chen Z, Wilson MA. Bayesian decoding of unsorted spikes in the rat hippocampus. Journal of Neurophysiology. 2013. doi: 10.1152/jn.01046.2012.
  • 117.Jacobs AL, Fridman G, Douglas RM, et al. Ruling out and ruling in neural codes. Proceedings of the National Academy of Sciences of the United States of America. 2009;106(14):5936–5941. doi: 10.1073/pnas.0900573106.
  • 118.Johnson DH. Information theory and neural information processing. IEEE Transactions on Information Theory. 2010;56(2):653–666.
  • 119.Smith C, Paninski L. Computing loss of efficiency in optimal Bayesian decoders given noisy or incomplete spike trains. Network. 2013;24(2):75–98. doi: 10.3109/0954898X.2013.789568.
  • 120.Greenberg DS, Houweling AR, Kerr JND. Population imaging of ongoing neuronal activity in the visual cortex of awake rats. Nature Neuroscience. 2008;11(7):749–751. doi: 10.1038/nn.2140.
  • 121.Vogelstein JT, Watson BO, Packer AM, Yuste R, Jedynak B, Paninski L. Spike inference from calcium imaging using sequential Monte Carlo methods. Biophysical Journal. 2009;97(2):636–655. doi: 10.1016/j.bpj.2008.08.005.
  • 122.Vogelstein JT, Packer AM, Machado TA, et al. Fast nonnegative deconvolution for spike train inference from population calcium imaging. Journal of Neurophysiology. 2010;104(6):3691–3704. doi: 10.1152/jn.01073.2009.
  • 123.Andrieu C, Barat E, Doucet A. Bayesian deconvolution of noisy filtered point processes. IEEE Transactions on Signal Processing. 2001;49(1):134–146.
  • 124.Oñativia J, Schultz SR, Dragotti PL. A finite rate of innovation algorithm for fast and accurate spike detection from two-photon calcium imaging. Journal of Neural Engineering. 2013;10(4):046017. doi: 10.1088/1741-2560/10/4/046017.
  • 125.Pillow JW, Shlens J, Paninski L, et al. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature. 2008;454(7207):995–999. doi: 10.1038/nature07140.
  • 126.Truccolo W, Hochberg LR, Donoghue JP. Collective dynamics in human and monkey sensorimotor cortex: predicting single neuron spikes. Nature Neuroscience. 2010;13(1):105–111. doi: 10.1038/nn.2455.
  • 127.Chornoboy ES, Schramm LP, Karr AF. Maximum likelihood identification of neural point process systems. Biological Cybernetics. 1988;59(4-5):265–275. doi: 10.1007/BF00332915.
  • 128.Okatan M, Wilson MA, Brown EN. Analyzing functional connectivity using a network likelihood model of ensemble neural spiking activity. Neural Computation. 2005;17(9):1927–1961. doi: 10.1162/0899766054322973.
  • 129.Rigat F, de Gunst M, van Pelt J. Bayesian modelling and analysis of spatio-temporal neuronal networks. Bayesian Analysis. 2006;1(4):733–764.
  • 130.Stevenson IH, Rebesco JM, Hatsopoulos NG, Haga Z, Miller LE, Kording KP. Bayesian inference of functional connectivity and network structure from spikes. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2009;17(3):203–213. doi: 10.1109/TNSRE.2008.2010471.
  • 131.Chen Z, Putrino DF, Ghosh S, Barbieri R, Brown EN. Statistical inference for assessing functional connectivity of neuronal ensembles with sparse spiking data. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2011;19(2):121–135. doi: 10.1109/TNSRE.2010.2086079.
  • 132.Eldawlatly S, Zhou Y, Jin R, Oweiss KG. On the use of dynamic Bayesian networks in reconstructing functional neuronal networks from spike train ensembles. Neural Computation. 2010;22(1):158–189. doi: 10.1162/neco.2009.11-08-900.
  • 133.Mishchenko Y, Vogelstein J, Paninski L. A Bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data. Annals of Applied Statistics. 2011;5:1229–1261.
  • 134.Martignon L, Deco G, Laskey K, Diamond M, Freiwald W, Vaadia E. Neural coding: higher-order temporal patterns in the neurostatistics of cell assemblies. Neural Computation. 2000;12(11):2621–2653. doi: 10.1162/089976600300014872.
  • 135.Shimazaki H, Amari S, Brown EN, Gruen S. State-space analysis of time-varying higher-order spike correlation for multiple neural spike train data. PLoS Computational Biology. 2012;8(3):e1002385. doi: 10.1371/journal.pcbi.1002385.
  • 136.Turner BM, Forstmann BU, Wagenmakers E-J, Brown SD, Sederberg PB, Steyvers M. A Bayesian framework for simultaneously modeling neural and behavioral data. NeuroImage. 2013;72:193–206. doi: 10.1016/j.neuroimage.2013.01.048.
  • 137.Pillow JW, Latham P. Neural characterization in partially observed populations of spiking neurons. In: Platt JC, Koller D, Singer Y, Roweis S, editors. Advances in Neural Information Processing Systems (NIPS) Vol. 20. Boston, Mass, USA: MIT Press; 2008. pp. 1161–1168.
  • 138.Li L, Park IM, Seth S, Sanchez JC, Príncipe JC. Functional connectivity dynamics among cortical neurons: a dependence analysis. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2012;20(1):18–30. doi: 10.1109/TNSRE.2011.2176749.
  • 139.Kass RE, Kelly RC, Loh W-L. Assessment of synchrony in multiple neural spike trains using loglinear point process models. The Annals of Applied Statistics. 2011;5(2B):1262–1292. doi: 10.1214/10-AOAS429.
  • 140.Kim S, Putrino D, Ghosh S, Brown EN. A Granger causality measure for point process models of ensemble neural spiking activity. PLoS Computational Biology. 2011;7(3):e1001110. doi: 10.1371/journal.pcbi.1001110.
  • 141.Vicente R, Wibral M, Lindner M, Pipa G. Transfer entropy: a model-free measure of effective connectivity for the neurosciences. Journal of Computational Neuroscience. 2011;30(1):45–67. doi: 10.1007/s10827-010-0262-3.
  • 142.Berkes P, Wood F, Pillow J. Characterizing neural dependencies with copula models. In: Platt JC, Koller D, Singer Y, Roweis S, editors. Advances in Neural Information Processing Systems (NIPS) Vol. 20. Boston, Mass, USA: MIT Press; 2008.
  • 143.Smith MS. Bayesian approaches to copula modelling. In: Damien P, Dellaportas P, Polson N, Stephens D, editors. Bayesian Theory and Applications. New York, NY, USA: Oxford University Press; 2013.
  • 144.Yu BM, Cunningham JP, Santhanam G, Ryu SI, Shenoy KV, Sahani M. Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. Journal of Neurophysiology. 2009;102(1):614–635. doi: 10.1152/jn.90941.2008.
  • 145.Chen Z, Kloosterman F, Brown EN, Wilson MA. Uncovering spatial topology represented by rat hippocampal population neuronal codes. Journal of Computational Neuroscience. 2012;33(2):227–255. doi: 10.1007/s10827-012-0384-x.
  • 146.Chen Z, Gomperts SN, Yamamoto J, Wilson MA. Neural representation of spatial topology in the rodent hippocampus. Neural Computation. 2014;26(1):1–39. doi: 10.1162/NECO_a_00538.
  • 147.Chen Z, Wilson MA. A variational nonparametric Bayesian approach for inferring rat hippocampal population codes. Proceedings of the 35th Annual International Conference of the IEEE Engineering in Medicine and Biology (EMBC '13); 2013; pp. 7092–7095.
  • 148.Famm K, Litt B, Tracey KJ, Boyden ES, Slaoui M. A jump-start for electroceuticals. Nature. 2013;496:159–161. doi: 10.1038/496159a.
