
Discrete- and Continuous-Time Probabilistic Models and Algorithms for Inferring Neuronal UP and DOWN States

Zhe Chen, Sujith Vijayan, Riccardo Barbieri, Matthew A. Wilson, Emery N. Brown

Abstract

UP and DOWN states, the periodic fluctuations between increased and decreased spiking activity of a neuronal population, are a fundamental feature of cortical circuits. Understanding UP-DOWN state dynamics is important for understanding how these circuits represent and transmit information in the brain. To date, limited work has been done on characterizing the stochastic properties of UP-DOWN state dynamics. We present a set of Markov and semi-Markov discrete- and continuous-time probability models for estimating UP and DOWN states from multiunit neural spiking activity. We model multiunit neural spiking activity as a stochastic point process, modulated by the hidden (UP and DOWN) states and the ensemble spiking history. We estimate jointly the hidden states and the model parameters by maximum likelihood using an expectation-maximization (EM) algorithm and a Monte Carlo EM algorithm that uses reversible-jump Markov chain Monte Carlo sampling in the E-step. We apply our models and algorithms in the analysis of both simulated multiunit spiking activity and actual multiunit spiking activity recorded from primary somatosensory cortex in a behaving rat during slow-wave sleep. Our approach provides a statistical characterization of UP-DOWN state dynamics that can serve as a basis for verifying and refining mechanistic descriptions of this process.

1 Introduction

1.1 Neuronal State and Recurrent Networks

The state of the neural system reflects the phase of an active recurrent network, which organizes the internal states of individual neurons into synchrony through recurrent synaptic activity with balanced excitation and inhibition.1 Neuronal state dynamics can be externally or internally driven. Externally driven dynamics result from either sensory-driven adaptation or the encoding of a sensory percept; internally driven dynamics result from changes in internal factors, such as shifts of attention. Different levels of neuronal state also bring their own state-transition dynamics. Generally, state transitions are network controlled and can be triggered by the activation of single cells, reflected in changes in their intracellular membrane conductance.

From a computational modeling point of view, two types of questions arise. First, how do neurons generate, maintain, and transition between different states? Second, given the neuronal (intracellular or extracellular) recordings, how can the neuronal states be estimated? Computational solutions to the first question emphasize the underlying neuronal physiology or neural mechanism; we call these mechanistic models. Solutions to the second question emphasize the representation or interpretation of the data; we call these statistical models. In this article, we take the second approach to model a specific phenomenon regarding the neuronal state.

1.2 Neuronal UP and DOWN States

The notion of neuronal UP and DOWN states refers to the observation that neurons have two distinct subthreshold membrane potentials that are relevant for action potential (i.e., spike) generation. A neuron is said to be depolarized (or excited) if its intracellular membrane potential is above the resting membrane potential threshold (around −70 to −80 mV) and is said to be hyperpolarized (or inhibited) if its membrane potential is below the threshold. When a sufficient level of excitation is reached, a spike is likely to occur. Essentially, membrane potential fluctuations define two states of the neocortex. The DOWN state defines a quiescent period during which little or no activity occurs, whereas the UP state corresponds to an active cortical state with depolarized membrane potentials and action potential firing driven by synaptic input. It is generally believed that the spontaneous UP and DOWN states are generated by a balance of excitatory and inhibitory neurons in recurrent networks (Haider, Duque, Hasenstaub, & McCormick, 2006). In recent years, many neurophysiological studies of neuronal UP and DOWN states have been reported, ranging from intracellular to extracellular recordings (e.g., Sanchez-Vives & McCormick, 2000; Haider, Duque, Hasenstaub, Yu, & McCormick, 2007). The UP and DOWN states that are characterized by the cortical slow oscillation in intracellular membrane potentials are also reflected in extracellular recordings, such as the local field potential (LFP) or electroencephalogram (EEG), single-unit activity, or multiunit activity (MUA). In the literature, the UP and DOWN states have been characterized by examining extracellular LFP recordings (Sirota, Csicsvari, Buhl, & Buzsáki, 2003; Battaglia, Sutherland, & McNaughton, 2004; Wolansky, Clement, Peters, Palczak, & Dickson, 2006) in either the somatosensory cortex of anesthetized or awake animals (e.g., Haslinger, Ulbert, Moore, Brown, & Devor, 2006; Luczak, Barthó, Marguet, Buzsáki, & Harris, 2007) or the visual cortex of nonanesthetized animals during sleep (e.g., Ji & Wilson, 2007). Recently, attention has also turned to multiunit spike trains in an attempt to relate spike firing activity to EEG recordings (Ji & Wilson, 2007).

In order to examine the relationship between sleep and memory, simultaneous recordings are often conducted in the neocortex and hippocampus of rats, with the goal of studying the cortico-hippocampal circuit and the functional connectivity of these two regions while the animals perform different tasks. It has been reported (e.g., Volgushev, Chauvette, Mukovski, & Timofeev, 2006) that during slow-wave sleep (SWS), which is characterized by 0.5 to 2.0 Hz slow oscillations (Buzsáki, 2006), neocortical neurons undergo near-synchronous transitions, every second or so, between UP and DOWN states. The alternating switching between the two states appears to be a network phenomenon that originates in the neocortex (Ji & Wilson, 2007; Vijayan, 2007).

The work reported here was driven by the experimental data accumulated in our lab (Ji & Wilson, 2007; Vijayan, 2007). The growing interest in UP and DOWN states in the neuroscience literature motivated us to develop probabilistic models for UP-DOWN modulated MUA. Specifically, the UP-DOWN states are modeled as a latent two-state Markovian (or semi-Markovian) process (Battaglia et al., 2004), and the modeling goal is to estimate the state transition probability (or the probability density of the UP and DOWN state durations) together with a likelihood model that takes into account both a global hidden state variable and the individual history dependence of firing. In comparison with standard, deterministic threshold-based methods, our proposed stochastic models provide a means of representing the uncertainty of the state estimates given limited experimental recordings.

1.3 Markov and Semi-Markov Processes and Hidden Markov Models

A stochastic process is said to be Markovian if it satisfies the Markov property: knowledge of the previous history of states is irrelevant for the current and future states. A Markov chain is a discrete-time Markov process. The Markov process and Markov chain are both "memoryless." A Markov process or Markov chain contains either continuous-valued or finite discrete-valued states. A discrete-state Markov process contains a finite alphabet set (or finite state space), with each element representing a distinct discrete state. A change of state is called a transition, and the probability of changing from one state to another is called the transition probability. For the Markov chain, the current state has only finite-order dependence on the previous states. Typically the first-order Markov property is assumed; in this case, the probability of S_{k+1} being in a particular state at time k + 1, given knowledge of the states up to time k, depends only on the state S_k at time k, namely, Pr(S_{k+1} | S_0, S_1, …, S_k) = Pr(S_{k+1} | S_k). A semi-Markov process (the Markov renewal process) extends the continuous-time Markov process to the condition that the interoccurrence times are not exponentially distributed.

When the state space is not directly observable, a Markov process is called hidden or latent. The so-called hidden Markov process is essentially a probabilistic function of the stochastic process (for a review, see Ephraim & Merhav, 2002). In the discrete-time context, the hidden Markov model (HMM) is a probabilistic model that characterizes the hidden Markov chain. The HMM is a generative model in that its full model {π, P, B} (where π denotes the initial state probability, P denotes the transition probability, and B denotes the emission probability) completely characterizes the underlying probabilistic structure of the Markov chain. Several conditions are generally assumed in the standard HMM: (1) the transition and emission probabilities are stationary or quasi-stationary; (2) the observations, either continuous or discrete valued, are assumed to be independent and identically distributed (i.i.d.); and (3) the model generally assumes a first- or finite-order Markov property. In the literature, there are several methods to tackle the inference problem in the HMM. One (and perhaps the most popular) approach is rooted in maximum likelihood estimation. A particular solution is given by the expectation-maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977), which attempts to solve the missing data problem in the statistics literature. This turns out to be equivalent to the Baum-Welch algorithm (Baum, Petrie, Soules, & Weiss, 1970; Baum, 1972). The Baum-Welch algorithm consists of a forward-backward procedure (E-step) and reestimation (M-step), and it iteratively increases the likelihood of the incomplete data until a local maximum or a stationary point of the likelihood function is reached. Another inference method is rooted in Bayesian statistics. Bayesian inference for the HMM defines prior probabilities for the unknown parameters (including the number of states) and attempts to estimate their posterior distributions. Since the posterior distribution is usually analytically intractable, a numerical approximation method is used. Markov chain Monte Carlo (MCMC) algorithms simulate a Markov chain whose equilibrium distribution approaches the posterior distribution. The Metropolis-Hastings algorithm is a general MCMC procedure to simulate a Markov or semi-Markov chain. When the state space is transdimensional (a problem that often arises from model selection in statistical data analysis), reversible-jump MCMC (RJMCMC) methods (Green, 1995; Robert, Rydén, & Titterington, 2000) have also been developed. Owing to the development of efficient inference algorithms, the HMM and its variants have been widely used in speech recognition, communications, bioinformatics, and many other applications (e.g., Rabiner, 1989; Durbin, Eddy, Krogh, & Mitchison, 1998).

1.4 Point Process and Cox Process

A point process is a continuous-time stochastic process with observations being either 0 or 1. Spike trains recorded from either single or multiple neurons are point processes. We give a brief mathematical background for point processes in a later section and refer the reader to Brown (2005) and Brown, Barbieri, Eden, and Frank (2003) for complete and rigorous mathematical details of point processes in the context of computational neuroscience. An important feature of spike trains is that the point-process observations are not independently distributed; in other words, the current observation (either 0 or 1) is influenced by the previous spiking activity. This type of history dependence requires special attention in the probabilistic modeling of point processes.

A Cox process is a doubly stochastic process that generalizes the Poisson process (Cox & Isham, 1980; Daley & Vere-Jones, 2002). Specifically, the time-dependent conditional intensity function (CIF), often denoted λ_t, is a stochastic process in its own right.2 A representative example of the Cox process is the Markov-modulated Poisson process, which has a state-dependent Poisson rate parameter.

1.5 Overview of Relevant Literature

Hidden Markov processes have a rich history of applications in biology. Tremendous effort has been devoted to modeling ion channels as discrete- or continuous-time Markov chains; several inference algorithms were developed for these models (Chung, Krishnamurthy, & Moore, 1991; Fredkin & Rice, 1992; Ball, Cai, Kadane, & O'Hagan, 1999). However, the observations used in ion-channel modeling are continuous, and the likelihood is often modeled by a gaussian or gaussian mixture distribution.

For discrete observations, Albert (1991) proposed a two-state Markov mixture model of a counting Poisson process and provided a maximum likelihood estimate (MLE) for the parameters. A more efficient forward-backward algorithm was later proposed by Le, Leroux, and Puterman (1992) with the same problem setup. In these two models, the two-state Markov transition probability is assumed to be stationary; although Albert also pointed out the possibility of modeling nonstationarity, no exact algorithm was given. In addition, efficient EM algorithms have been developed for discrete- or continuous-time Markov-modulated point processes (Deng & Mark, 1993; Rydén, 1996; Roberts, Ephraim, & Dieguez, 2006), but applying them to neural spike trains is not straightforward.

In the context of modeling neural spike trains, many authors (e.g., Radons, Becker, Dülfer, & Krüger, 1994; Abeles et al., 1995; Gat, Tishby, & Abeles, 1997; Jones, Fontanini, Sadacca, & Katz, 2007; Achtman et al., 2007; Kemere et al., 2008) have used HMMs to analyze and classify the patterns of neural spike trains, but their models are restricted to discrete time, and the Markov chain is homogeneous (i.e., the transition probability is stationary). In these studies, the hidden states are discrete, and spike counts were used as the discrete observations for the likelihood models. Smith and Brown (2003) extended the standard linear state-space model (SSM) with continuous state and observations to an SSM with a continuous-state Markov-modulated point process, and an EM algorithm was developed for the hidden state estimation problem. Later the theory was extended to SSMs with mixed continuous, binary, and point-process observations (Coleman & Brown, 2006; Prerau et al., 2008; Eden & Brown, 2008), but the latent process was still limited to a continuous-valued state. In a similar context, Danóczy and Hahnloser (2006) proposed a two-state HMM for detecting the "singing-like" and "awake-like" states of sleeping songbirds from neural spike trains; their model assumes a continuous-time Markov chain (with the assumption that the exact timing of state transitions is known), the sojourn time follows an exponential distribution, and the CIF of the point process was assumed to be discrete. All of these restrictive assumptions limit the applicability of such computational models to real-world spike trains. Recently, more modeling efforts have been dedicated to estimating the hidden state and parameters of an HMM (or its variants) in stimulus-response neuronal models (Jones et al., 2007; Escola & Paninski, 2008). Xi and Kass (2008) also recently used an RJMCMC method to characterize the bursty and nonbursty states of goldfish retinal neurons.

In modeling hidden semi-Markov processes or semi-Markov chains, in which the sojourn time is no longer exponentially distributed, Guédon (2003) developed an EM algorithm for a hidden semi-Markov chain with finite discrete-state sojourn time, but the computational complexity of this EM algorithm is much greater than that of the conventional HMM.3

1.6 Contribution and Outline

In this article, with the goal of estimating the UP or DOWN state of a neuronal population, we propose discrete-state Markov and semi-Markov probabilistic models for neural spike trains, which are modeled as doubly stochastic point processes. Specifically, we propose discrete-time and continuous-time SSMs and develop the associated inference algorithms for tackling the joint (state and parameter) estimation problem.

Our contributions have three significant distinctions from the published literature: (1) the point-process observations are not i.i.d.; specifically, the rate parameters or CIFs of the spike trains are modulated by a latent discrete-state variable and past spiking history; (2) in the continuous-time probabilistic models, the state transition is not necessarily Markovian; in other words, the hidden state is semi-Markovian in the sense that the sojourn time is no longer exponentially distributed; and (3) maximum likelihood inference algorithms are derived for the discrete-time and continuous-time probabilistic models for estimating the neuronal UP and DOWN states, and the proposed Monte Carlo EM (MCEM) algorithm is rooted in an RJMCMC sampling method and is well suited to various probabilistic models of the sojourn time.

The rest of the article is organized as follows. In section 2, we present the discrete-time HMM and the EM-based inference algorithm. In section 3, we develop the continuous-time probabilistic Markov and semi-Markov chains and their associated inference algorithms. In section 4, we demonstrate and validate our proposed models and inference algorithms with both simulated and real-world spike train data. We present some discussions in section 5, followed by the conclusion in section 6.

1.7 Notation

In neural spike analysis, we examine spike trains from either single-unit or multiunit activity. Because the recordings are digitized, we assume that the time interval [0, T] of continuous-time neural spike train observations is properly discretized with a small time resolution Δ, so the time indices are discrete integers k ∈ [1, K], with the kth bin ((k − 1)Δ, kΔ] and KΔ = T. Let N_{t_k}^c ≡ N_k^c denote the counting process of spike train c at time t_k, and let dN_{t_k}^c ≡ dN_k^c denote the indicator variable for the 0/1 observation: dN_k^c = 1 if there is a spike and 0 otherwise. Other notation is straightforward, and we define it in the proper places. Most of the notation used in this article is summarized in Table 1.

Table 1.

Summary of Notation.

c index of spike trains c = 1, 2, …, C
m index of simulated Markov chains m = 1, 2, …, M
t continuous-time index t ∈ [0, T]
ti spike timing of the ith spike in continuous time
Δ smallest time bin size
k discrete-time index k = 1, 2, …, K, KΔ = T
yk number of counts observed from discrete-time Markov chain, yk ∈ {0, ℕ}
Sk discrete-time first-order Markov state, Sk ∈ {1, …, L}
S0 initial Markov state at time 0
S0:T, S1:k history of the Markov state from time 0 to T (or 1 to k)
n number of state jumps within the latent process S0:T
l index of state jumps l = 1, 2, …, n
{S(t); 0 ≤ tT} realization of hidden Markov process
𝒮 = (n, τ, χ) triplet that contains all information of the continuous-time Markov chain {S(t)}
τ = (τ0, …, τn) (n + 1)-length vector of the sojourn times of {S(t)}
χ = (χ0, …, χn) (n + 1)-length vector of visited states in the sojourn times of {S(t)}
𝒮(0) initial state of MCMC sampler
ν_l ν_0 = 0, ν_l = Σ_{r=0}^{l−1} τ_r (l = 1, 2, …, n + 1)
𝒴_{0:T}, 𝒴_{1:K} history of point-process observations from time 0 to T (or 1 to K)
N(t), Nk counting process in continuous and discrete time, N(t), Nk ∈ {0, ℕ}
dN(t), dNk indicator of point-process observations, 0 or 1
Pij transition probability from state i to j for a discrete-time Markov chain, Σj Pij = 1
qij transition rate from state i to j for a continuous-time Markov chain, Σj qij = 0
r_i = −q_ii total transition rate of state i for a continuous-time Markov chain, r_i = Σ_{j≠i} q_ij
πi initial prior probability Pr(S0 = i)
ak(i) forward message of state i at time k
bk(i) backward message of state i at time k
γ_k(i) marginal conditional probability Pr(S_k = i | 𝒴_{0:T})
ξ_k(i, j) joint conditional probability Pr(S_{k−1} = i, S_k = j | 𝒴_{0:T})
ℒ log likelihood of the complete data
R(𝒮 → 𝒮′) proposal transition density from state 𝒮 to 𝒮′
ℛ = ℛ_1 ℛ_2 ℛ_3 prior ratio × likelihood ratio × proposal probability ratio
𝒜 acceptance probability, 𝒜 = min(1, ℛ)
𝒥 Jacobian
λk conditional intensity function of the point process at time k
θ parameter vector that contains all unknown parameters
p(x) probability density function
F(x) cumulative distribution function, F(x) = ∫_{−∞}^{x} p(z) dz
Φ(x) gaussian cumulative distribution function
erf(x) error function
𝕀(·) indicator function
𝒰(a, b) uniform distribution on (a, b)

2 Discrete-Time Markov Modulated Probabilistic State-Space Model

To infer the neuronal UP and DOWN states, in this section we develop a simple, discrete-time Markov-modulated state-space model that can be viewed as a variant of the standard HMM applied to spike train analysis. The underlying probabilistic structure is Markovian and homogeneous, and the inference algorithm is efficient in identifying the statistics of the hidden state process. Building on that, in the next section we develop a continuous-time probabilistic model in order to overcome some of the limitations imposed by this discrete-time probabilistic model.

2.1 Hidden Markov Model

Let us consider a discrete-time homogeneous Markov chain. By discrete time, we mean that time is evenly discretized into fixed-length intervals with time indices k = 1, …, K. The neuronal UP or DOWN state, which is characterized by a latent discrete-time first-order Markov chain, is unobserved (and therefore hidden), and the observed spike trains or spike counts recorded from the MUA are functionally determined by the hidden state. The standard HMM is characterized by three elements: the transition probability, the emission probability,4 and the initial state probability (Rabiner, 1989). As a first approximation, we assume that the underlying latent process follows a two-state HMM with stationary transition and emission probabilities.

  • The initial probability of state is denoted by a vector π = {πi}, where πi = Pr(S0 = i) (i = 0, 1). Without loss of generality, we assume that the amplitude of the hidden state is predefined, and the discrete variable Sk ∈ {0, 1} indicates either a DOWN (0) or UP (1) state.

  • The transition probability matrix is written as
    P = \begin{pmatrix} P_{00} & P_{01} \\ P_{10} & P_{11} \end{pmatrix}, \quad (2.1)
    with P01 = 1 − P00 and P10 = 1 − P11 corresponding to the transition probabilities from state 0 to state 1 and from state 1 to state 0, respectively.
  • Given the discrete hidden state Sk, the observed numbers of total spikes across all tetrodes (i.e., MUA), y1, y2, …, yK (yk ∈ ℕ), follow probability distributions that depend on the time-varying rate λk,
    \Pr(Y_k = y_k \mid S_k = i) = \frac{e^{-\lambda_k} \lambda_k^{y_k}}{y_k!}, \quad (2.2)
    where the parameter λk is determined by
    \lambda_k = \exp\left(\mu + \alpha S_k + \beta (N_{k-1} - N_{k-J})\right), \quad (2.3)
    where exp(μ) denotes the baseline firing rate and S_k denotes the hidden discrete-state variable at time k. The term N_{k−1} − N_{k−J} represents the total number of spikes observed during the history period (k − J, k − 1], which accounts for the history dependence of neuronal firing. The choice of the length of history dependence is often determined empirically from preliminary data analysis, such as the histogram of the interspike interval (ISI). Equations 2.2 and 2.3 can be understood in terms of a generalized linear model (GLM) (e.g., McCullagh & Nelder, 1989; Truccolo, Eden, Fellows, Donoghue, & Brown, 2005), where the link function is the log function and the distribution is Poisson. Note that when β = 0 (i.e., history independence is assumed), we obtain an inhomogeneous Poisson process, and λ_k reduces to a Poisson rate parameter.

Taking the logarithm of both sides of equation 2.3, we can rewrite it as

\log \lambda_k = \mu + \alpha S_k + \beta \tilde{n}_k, \quad (2.4)

where \tilde{n}_k = N_{k-1} - N_{k-J}. More generally, we can split the time period (k − J, k − 1] into several windows (say, with equal duration δ), and equation 2.4 can be rewritten as

\log \lambda_k = \mu + \alpha S_k + \sum_j \beta_j \tilde{n}_{k,j} = \mu + \alpha S_k + \boldsymbol{\beta}^T \tilde{\mathbf{n}}_k, \quad (2.5)

where β = {β_j} and ñ_k = {ñ_{k,j}} are two vectors of matching dimensionality, and ñ_{k,j} = N_{k−jδ} − N_{k−(j+1)δ} denotes the observed number of multiunit spike counts within the time interval (k − (j + 1)δ, k − jδ]. If we further assume that the observations y_k at different time indices k are mutually independent, the observed data likelihood is given by

p(y_{1:K} \mid S_{1:K}, \theta) = \prod_{k=1}^{K} \frac{\exp(-\lambda_k)\, \lambda_k^{y_k}}{y_k!}. \quad (2.6)

Note that λk ≡ λ(Sk) is functionally dependent on the latent process Sk, although we have omitted it from the notation for brevity. In statistics, the hidden variables {Sk} are treated as the missing data, {yk} as the observed (incomplete) data, and their combination {Sk, yk} as the complete data. Let θ denote all of the unknown parameters; then the complete data likelihood is given by

p(S1:K,y1:Kθ)=p(y1:KS1:K,θ)p(S1:Kθ). (2.7)

The complete data log likelihood, denoted ℒ, is then derived as (ignoring the constant)

\mathcal{L} = \log p(S_{0:K}, y_{1:K} \mid \theta) = \sum_{k=1}^{K} \left( y_k \log \lambda_k - \lambda_k \right) + \sum_{i=0}^{1} \mathbb{I}(S_0 = i) \log \pi_i + \sum_{k=2}^{K} \sum_{i=0}^{1} \sum_{j=0}^{1} \xi_k(i, j) \log P_{ij}, \quad (2.8)

where ξ_k(i, j) = Pr(S_{k−1} = i, S_k = j).
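To make the observation model concrete, the following minimal Python sketch (assuming NumPy; the array names counts, S, and the window length J are illustrative, not from the original) evaluates the rate model of equation 2.4 and the complete data log likelihood of equation 2.8 up to an additive constant:

    import numpy as np

    def rate(mu, alpha, beta, S, counts, J):
        """lambda_k = exp(mu + alpha*S_k + beta*n_k), eq. 2.4;
        n_k = total spike count in the history window (k-J, k-1]."""
        K = len(counts)
        lam = np.zeros(K)
        for k in range(K):
            n_hist = counts[max(0, k - J):k].sum()   # N_{k-1} - N_{k-J}
            lam[k] = np.exp(mu + alpha * S[k] + beta * n_hist)
        return lam

    def complete_data_loglik(counts, S, lam, pi, P, xi):
        """Eq. 2.8, dropping the constant log(y_k!) terms.
        xi: (K, 2, 2) array of transition statistics xi_k(i, j)."""
        ll = np.sum(counts * np.log(lam) - lam)       # Poisson term
        ll += np.log(pi[S[0]])                        # initial state term
        ll += sum(xi[k, i, j] * np.log(P[i, j])       # transition term
                  for k in range(1, len(S))
                  for i in range(2) for j in range(2))
        return ll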

2.2 Forward-Backward and Viterbi Algorithms

The inference and learning procedure for the standard HMM is given by an efficient estimation procedure known as the EM algorithm, also known as the Baum-Welch algorithm (Baum et al., 1970; Baum, 1972). Rooted in maximum likelihood estimation, the EM algorithm iteratively and monotonically maximizes (or increases) the log-likelihood function given the incomplete data (Dempster et al., 1977). In the E-step, a forward-backward procedure is used to recursively estimate the hidden state posterior probability. In the M-step, based on the missing state statistics (estimated in the E-step), the reestimation procedure and the Newton-Raphson algorithm are used to estimate the unknown parameters θ = (π, P, μ, α, β). In each full iteration, the EM algorithm iteratively maximizes the so-called Q-function,

Q(\theta_{\text{new}} \mid \theta_{\text{old}}) = E\left[ \log p(S_{1:K}, y_{1:K} \mid \theta) \mid \theta_{\text{old}} \right] = E\left[ \sum_{k=1}^{K} \left( y_k \log \hat{\lambda}_k - \hat{\lambda}_k \right) + \sum_{i=0}^{1} \mathbb{I}(S_0 = i) \log \hat{\pi}_i + \sum_{k=2}^{K} \sum_{i=0}^{1} \sum_{j=0}^{1} \xi_k(i, j) \log \hat{P}_{ij} \,\Big|\, \theta_{\text{old}} \right]; \quad (2.9)

the new θ_new is obtained by maximizing the Q-function conditioned on the old parameters θ_old, and the iterative optimization procedure continues until the algorithm converges to a local maximum or a stationary point. To keep the article self-contained, we present a brief derivation of the EM algorithm (Rabiner, 1989) for the two-state HMM estimation problem.

2.2.1 E-Step: Forward-Backward Algorithm

In the E-step, the major task of the forward-backward procedure is to compute the conditional state probabilities for the two states:

\Pr(S_k = 1 \mid y_{1:K}, \theta) = \frac{\Pr(y_{1:K}, S_k = 1 \mid \theta)}{\Pr(y_{1:K} \mid \theta)} = \frac{\Pr(y_{1:K}, S_k = 1 \mid \theta)}{\sum_{l=0}^{1} \Pr(y_{1:K}, S_k = l \mid \theta)} \quad (2.10)
\Pr(S_k = 0 \mid y_{1:K}, \theta) = 1 - \Pr(S_k = 1 \mid y_{1:K}, \theta), \quad (2.11)

as well as the conditional state joint probability:

\Pr(S_{k-1} = i, S_k = j \mid y_{1:K}, \theta) = \frac{\Pr(y_{1:K}, S_{k-1} = i, S_k = j \mid \theta)}{\Pr(y_{1:K} \mid \theta)} = \frac{\Pr(y_{1:K}, S_{k-1} = i, S_k = j \mid \theta)}{\sum_{l=0}^{1} \sum_{m=0}^{1} \Pr(y_{1:K}, S_{k-1} = l, S_k = m \mid \theta)}. \quad (2.12)

To simplify the notation in the derivation below, we leave the conditioning on θ implicit.

To estimate equations 2.10 and 2.11, we first factorize the joint probability as

\Pr(y_{1:K}, S_k = l) = \Pr(y_{1:k}, S_k = l) \Pr(y_{k+1:K} \mid y_{1:k}, S_k = l) = \Pr(y_{1:k}, S_k = l) \Pr(y_{k+1:K} \mid S_k = l) \equiv a_k(l)\, b_k(l) \quad \text{for } l = 0, 1, \quad (2.13)

where

a_k(l) = \Pr(y_{1:k}, S_k = l) \quad \text{for } k = 2, \ldots, K
a_1(l) = \Pr(y_1, S_1 = l) = \pi_l \Pr(y_1 \mid S_1 = l)
b_k(l) = \Pr(y_{k+1:K} \mid S_k = l) \quad \text{for } k = 1, \ldots, K - 1
b_K(l) = 1,

and the forward and backward messages ak(l) and bk(l) can be computed recursively along the discrete-time index k (Rabiner, 1989):

a_k(l) = \sum_i a_{k-1}(i) P_{il} \Pr(y_k \mid S_k = l) = \sum_i a_{k-1}(i) P_{il} \frac{\exp(-\lambda_k)\,\lambda_k^{y_k}}{y_k!}
b_k(l) = \sum_i b_{k+1}(i) P_{li} \Pr(y_{k+1} \mid S_{k+1} = i) = \sum_i b_{k+1}(i) P_{li} \frac{\exp(-\lambda_{k+1})\,\lambda_{k+1}^{y_{k+1}}}{y_{k+1}!},

where Pil denotes the transition probability from state i to l.

Given {ak, bk}, we can estimate equation 2.12 by

\Pr(S_{k-1} = i, S_k = j \mid y_{1:K}, \theta) = \frac{a_{k-1}(i)\, P_{ij}\, \Pr(y_k \mid S_k = j)\, b_k(j)}{p(y_{1:K})}. \quad (2.14)

Furthermore, we can compute the observed likelihood (of the incomplete data) by

p(y_{1:K}) = \sum_{l=0}^{1} \Pr(y_{1:K}, S_K = l) = \sum_{l=0}^{1} a_K(l). \quad (2.15)

Given equations 2.13 and 2.15, the state posterior conditional probability is given by Bayes' rule,

\Pr(S_k = i \mid y_{1:K}) = \frac{\Pr(S_k = i, y_{1:K})}{p(y_{1:K})} = \frac{a_k(i)\, b_k(i)}{\sum_{l=0}^{1} a_K(l)}. \quad (2.16)

In terms of computational overhead, for the above two-state HMM the forward-backward procedure requires linear computational complexity 𝒪(4K) and memory storage 𝒪(2K).
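As an illustration, a minimal scaled forward-backward pass for the two-state chain might look as follows (a Python sketch assuming NumPy; obs_lik[k, l] = Pr(y_k | S_k = l) is precomputed from equation 2.2, and per-step rescaling replaces the raw messages to avoid numerical underflow, a standard variant of the recursions above):

    import numpy as np

    def forward_backward(obs_lik, P, pi):
        """Scaled forward-backward messages for a two-state HMM.
        obs_lik: (K, 2) array with obs_lik[k, l] = Pr(y_k | S_k = l)."""
        K = obs_lik.shape[0]
        a = np.zeros((K, 2)); b = np.ones((K, 2)); scale = np.zeros(K)
        a[0] = pi * obs_lik[0]                        # a_1(l)
        scale[0] = a[0].sum(); a[0] /= scale[0]
        for k in range(1, K):                         # forward recursion
            a[k] = (a[k - 1] @ P) * obs_lik[k]
            scale[k] = a[k].sum(); a[k] /= scale[k]
        for k in range(K - 2, -1, -1):                # backward recursion
            b[k] = (P @ (obs_lik[k + 1] * b[k + 1])) / scale[k + 1]
        gamma = a * b                                 # Pr(S_k = i | y_{1:K})
        gamma /= gamma.sum(axis=1, keepdims=True)
        loglik = np.log(scale).sum()                  # log p(y_{1:K}), eq. 2.15
        return a, b, gamma, loglik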

2.2.2 M-Step: Reestimation and Newton-Raphson Algorithm

In the M-step, we update the unknown parameters (based on their previous estimates) by setting the partial derivatives of the Q-function to zero, ∂Q(θ)/∂θ = 0, from which we may derive either closed-form or iterative solutions.

Let ξ_k(i, j) = Pr(S_{k−1} = i, S_k = j | y_{1:K}, θ) and γ_k(i) = Pr(S_k = i | y_{1:K}, θ) denote, respectively, the conditional joint and marginal state probabilities (which are the sufficient statistics for the Q-function, equation 2.9). From the E-step, we may obtain

\gamma_k(i) = \frac{a_k(i)\, b_k(i)}{\sum_{l=0}^{1} a_k(l)\, b_k(l)} = \sum_j \xi_k(j, i) = \sum_j \xi_{k+1}(i, j). \quad (2.17)

The transition probabilities are given by Baum's reestimation procedure:

\hat{P}_{ij} = \frac{\sum_{k=2}^{K} \xi_k(i, j)}{\sum_{k=2}^{K} \sum_j \xi_k(i, j)} = \frac{\sum_{k=2}^{K} \xi_k(i, j)}{\sum_{k=2}^{K} \gamma_{k-1}(i)}. \quad (2.18)

Specifically the transition probabilities P01 and P10 are estimated by closed-form expressions,

\hat{P}_{01} = \frac{\sum_{k=2}^{K} \xi_k(0, 1)}{\sum_{k=2}^{K} \sum_{j=0}^{1} \xi_k(0, j)} = \frac{\sum_{k=2}^{K} \xi_k(0, 1)}{\sum_{k=2}^{K} \gamma_{k-1}(0)}, \quad (2.19)
\hat{P}_{10} = \frac{\sum_{k=2}^{K} \xi_k(1, 0)}{\sum_{k=2}^{K} \sum_{j=0}^{1} \xi_k(1, j)} = \frac{\sum_{k=2}^{K} \xi_k(1, 0)}{\sum_{k=2}^{K} \gamma_{k-1}(1)}. \quad (2.20)

Next, we need to estimate the other unknown parameters (μ, α, β) that appear in the likelihood model. Since there is no closed-form solution for μ, α, and β, we may use iterative optimization methods, such as the Newton-Raphson algorithm or the iterative weighted least squares (IWLS) algorithm (e.g., Pawitan, 2001), to optimize these parameters in the M-step.

Let \hat{S}_k = \sum_{i=0}^{1} i\, \gamma_k(i) = \gamma_k(1) denote the posterior mean of the hidden state at time k; setting the derivatives of the Q-function with respect to the parameters α, μ, and β (and similarly for the vector β) to zero, we obtain

\sum_{k=J}^{K} \hat{S}_k y_k = \sum_{k=J}^{K} \hat{S}_k \exp(\mu + \alpha \hat{S}_k + \beta \tilde{n}_k), \quad (2.21)
\sum_{k=J}^{K} \tilde{n}_k y_k = \sum_{k=J}^{K} \tilde{n}_k \exp(\mu + \alpha \hat{S}_k + \beta \tilde{n}_k), \quad (2.22)
\sum_{k=J}^{K} y_k = \sum_{k=J}^{K} \exp(\mu + \alpha \hat{S}_k + \beta \tilde{n}_k), \quad (2.23)

respectively. Typically, a fixed number of iterations is preset for the Newton-Raphson inner loop within the M-step.

Finally, the convergence of the EM algorithm is monitored through the incremental changes of the log likelihood as well as the parameters. If the absolute value of the change is smaller than 10^{-5}, the EM algorithm is terminated.
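A corresponding M-step sketch (Python with NumPy and SciPy; the helper arrays are hypothetical, and SciPy's general-purpose minimizer stands in for the Newton-Raphson inner loop described above) updates the transition probabilities by equations 2.19 and 2.20 and the GLM parameters by maximizing the expected Poisson log likelihood:

    import numpy as np
    from scipy.optimize import minimize

    def m_step(counts, gamma, xi, n_hist):
        """counts: (K,) spike counts; gamma: (K, 2) posteriors gamma_k(i);
        xi: (K, 2, 2) posteriors xi_k(i, j); n_hist[k]: history count."""
        P = xi[1:].sum(axis=0)                 # Baum reestimation, eqs. 2.19-2.20
        P /= P.sum(axis=1, keepdims=True)
        S_hat = gamma[:, 1]                    # posterior mean E[S_k]
        def neg_loglik(theta):                 # Poisson GLM of eq. 2.4
            mu, alpha, beta = theta
            lam = np.exp(mu + alpha * S_hat + beta * n_hist)
            return -(counts * np.log(lam) - lam).sum()
        res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
        mu, alpha, beta = res.x
        return P, mu, alpha, beta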

2.2.3 Viterbi Algorithm

Upon estimating the parameters θ = (π, P, μ, α, β), we can run the Viterbi algorithm (Viterbi, 1967; Forney, 1973) to decode the most likely state sequence. The Viterbi algorithm is a dynamic programming method (Bellman, 1957) that uses the "Viterbi path" to discover the single most likely explanation for the observations. Specifically, the maximum a posteriori (MAP) state estimate Ŝ_k at time k is

\hat{S}_k^{\text{MAP}} = \arg\max_{i \in \{0,1\}} \gamma_k(i), \quad 1 \le k \le K. \quad (2.24)

The forward Viterbi algorithm has an overall time complexity of 𝒪(4K) and space complexity of 𝒪(2K).
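For completeness, a log-domain Viterbi sketch for the two-state chain (Python/NumPy; illustrative only, with obs_lik as in the forward-backward sketch above) is:

    import numpy as np

    def viterbi(obs_lik, P, pi):
        """Most likely state path for a two-state HMM (log domain)."""
        K = obs_lik.shape[0]
        logd = np.log(pi) + np.log(obs_lik[0])
        back = np.zeros((K, 2), dtype=int)
        for k in range(1, K):
            cand = logd[:, None] + np.log(P)      # cand[i, j]: i -> j
            back[k] = cand.argmax(axis=0)
            logd = cand.max(axis=0) + np.log(obs_lik[k])
        path = np.zeros(K, dtype=int)
        path[-1] = logd.argmax()
        for k in range(K - 1, 0, -1):             # backtrace
            path[k - 1] = back[k, path[k]]
        return path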

3 Continuous-Time Markovian and Semi-Markovian State-Space Models

The discrete-time probabilistic model discussed in the previous section imposes strong assumptions on the transitions between the UP and DOWN states. First, it is stationary in the sense that the transition probability is time invariant; second, the transition is strictly Markovian. In this section, we relax these assumptions and develop continuous-time, data-driven (either Markovian or semi-Markovian) state-space models, which are more appropriate and realistic in characterizing the nature and statistics of the state transitions. In addition, the inference algorithm for the continuous-time models uses the estimation output of the discrete-time model (developed in section 2) as its initialization, which also helps to accelerate convergence.

3.1 Continuous-Time Probabilistic Model

One important distinction between discrete-time and continuous-time Markov chains is that the former allows state changes to occur only at regularly spaced intervals, whereas the latter is aperiodic, in that the time between state changes is exponentially distributed. Therefore, the notion of a "single-step transition probability" is no longer valid in continuous time, since the "step" does not exist. In fact, the transition probability in continuous time is characterized by either the transition rate or the probability density function (pdf) of the sojourn time between two state change events. Let us assume that the pdf of the sojourn time for state j is characterized by a parameter vector θ_j. Hence, the transition probabilities out of state 0 (DOWN) and state 1 (UP) are characterized by the corresponding pdfs p(θ_0, z) and p(θ_1, z), respectively, where z is the random variable for the holding time. For example, given the current state (say, state j) and the current time t, the probability of escaping or changing the current state (to the other state) is computed from the cumulative distribution function (cdf):

F(t) \equiv \Pr[0 \le z \le t] = \int_0^t p(\theta_j, z)\, dz, \quad (3.1)

and the probability of remaining in the present state will be computed from

\Pr[z > t] = 1 - F(t) = \int_t^{\infty} p(\theta_j, z)\, dz, \quad (3.2)

which is known as the survival function in reliability and survival analysis. As seen, the transition probability matrix in continuous time now depends on the elapsed time (starting from the state onset) as well as the present state status. In general, we write the transition probability matrix as a parameterized form P(θ), where θ = (θ0, θ1) characterizes the associated sojourn time pdf parameters for the DOWN (0) and UP (1) states. As we will see, choosing different probability density models for the sojourn time results in different formulations of the continuous-time Markov or semi-Markov chain.

In modeling the neural spike train point processes, the CIF characterizes the instantaneous firing probability of a discrete event (i.e., spike). Specifically, the product between the CIF λ(t) and the time interval Δ tells approximately the probability of observing a spike within the interval [t, t + Δ) (e.g., Brown et al., 2003):

\lambda(t) = \lim_{\Delta \to 0} \frac{\Pr\{N(t + \Delta) - N(t) = 1 \mid \mathcal{Y}_{0:t}\}}{\Delta}.

For each spike train, we model the CIF in a parametric form,5

\lambda^c(t) \equiv \lambda^c(t \mid \mathcal{Y}_{0:t}) = \exp\left( \mu_c + \alpha_c S_t + \gamma_c \int_0^t e^{-\beta_c \tau}\, dN^c(t - \tau) \right), \quad (3.3)

where exp(μ_c) denotes the baseline firing rate of the cth spike train and β_c denotes an exponential decay parameter that accounts for the history dependence of firing from time 0 up to time t. The nonnegative term ∫_0^t e^{−β_c τ} dN^c(t − τ) defines a convolution between an exponentially decaying window and the firing history of spike train c up to time t. Because of digitized recording devices, all continuous-time signals are sampled and recorded in digital format at a very high sampling rate (32 kHz in our setup). Therefore, we still deal with a "discretized" version of a continuous-time signal. In this case, we sometimes use S_t and S_k interchangeably if no confusion arises. However, as we see below, their technical treatments are rather different. In the context of continuous-time observations (Δ = 1 ms), every time interval contains at most one spike from each spike train. For computational ease, we approximate the integral in equation 3.3 with a finite discrete sum over the firing history as follows:

\lambda_k^c \equiv \lambda^c(k \mid \mathcal{Y}_{1:k}) = \exp\left( \mu_c + \alpha_c S_k + \boldsymbol{\beta}_c^T \tilde{\mathbf{n}}_k \right), \quad (3.4)

where ñk is a vector containing the number of spike counts within the past intervals, and the length of the vector defines a finite number of windows of spiking history. By assuming that the spike trains are mutually independent, the observed data likelihood is given by (Brillinger, 1988; Brown et al., 2003)

p(dN_{1:K}^{1:C} \mid S_{1:K}, \theta) = \prod_{c=1}^{C} \prod_{k=1}^{K} \exp\left( dN_k^c \log[\lambda_k^c \Delta] - \lambda_k^c \Delta \right). \quad (3.5)

The complete statistics of the continuous-time latent process may be characterized by a triplet 𝒮 = (n, τ, χ), where τ = (τ_0, τ_1, …, τ_n) is a vector that contains the durations of the sojourn times of 𝒮 and χ = (χ_0, χ_1, …, χ_n) represents the states visited in these sojourns. Let ν_0 = 0, ν_j = Σ_{i=0}^{j−1} τ_i (j = 1, 2, …, n + 1), and ν_{n+1} = T. Alternatively, the complete data likelihood, equation 3.5, can be rewritten in another form,

p(dN_{1:K}^{1:C} \mid \mathcal{S}, \theta) = \prod_{l=1}^{n} \Pr(dN_l^{1:C} \mid \chi_l, \theta), \quad (3.6)

where dN_l^{1:C} denotes all of the spike train observations during the sojourn time [ν_{l−1}, ν_l] of the continuous-time (semi-)Markov process {S(t)}.

If we model each spike train as an inhomogeneous Poisson process with time-varying CIF λ^c(t), the expected number of spikes observed within the duration [ν_{l−1}, ν_l] of the cth spike train is given by the integrated intensity (also referred to as the cumulative hazard function):

\lambda_l^c = \int_{\nu_{l-1}}^{\nu_l} \lambda^c(t)\, dt. \quad (3.7)

Correspondingly, the observed data likelihood function, equation 3.5, is given by (Daley & Vere-Jones, 2002)

p(dN_{1:K}^{1:C} \mid \mathcal{S}, \theta) = \prod_{c=1}^{C} \prod_{l=1}^{n} \left( \exp(-\lambda_l^c) \prod_{i=1}^{y_l^c} \lambda^c(t_i) \right) = \prod_{c=1}^{C} \prod_{l=1}^{n} \left( \exp(-\lambda_l^c) \frac{(\lambda_l^c)^{y_l^c}}{y_l^c!} \right),

where y_l^c denotes the spike count of the cth spike train during the interval (ν_{l−1}, ν_l], and t_i denotes the continuous-time index of the ith spike of a specific spike train during the interval (ν_{l−1}, ν_l].

Ultimately, we can write the complete-data log likelihood in a compact form:6

\mathcal{L} = \log p(S_{0:T}, y_{1:n} \mid \theta) = \sum_{l=1}^{n} \sum_{j=0}^{1} \left( \xi_l(j, j) \log[1 - F(\theta_j, \tau_l)] + \sum_{i \ne j} \xi_l(i, j) \log[F(\theta_j, \tau_l)] \right) + \sum_{c=1}^{C} \sum_{l=1}^{n} \left( \int_{\nu_{l-1}}^{\nu_l} \log \lambda^c(t)\, dN^c(t) - \lambda_l^c \right), \quad (3.8)

where θj denotes the parameter(s) of the probability model of the sojourn time associated with state j.

3.2 Continuous-Time Markov Chain

In a continuous-time Markov chain (i.e., a Markov process), transitions from one state to another can occur at any instant of time. Due to the Markov property, the time that the system spends in any given state is memoryless, and the distribution of the remaining survival time depends on the state (but not on the time already spent in the state); in other words, the sojourn time is exponentially distributed,7 characterized by a single rate parameter. The rate parameter, also known as the continuous-time state transition rate, defines the probability per time unit that the system makes a transition from one state to the other during an infinitesimal time interval:

q_{ij} = \lim_{\Delta t \to 0} \frac{\Pr(S_{t+\Delta t} = j \mid S_t = i)}{\Delta t}, \quad i \ne j. \quad (3.9)

The total transition rate of state i satisfies the rate balance condition:

r_i \equiv -q_{ii} = \sum_{j \ne i} q_{ij}. \quad (3.10)

The holding time of the sojourn in state i follows an exponential distribution proportional to exp(−r_i τ); equivalently, the transition times out of state i are generated by a homogeneous Poisson process with rate parameter r_i. For a two-state Markov chain, the transition rate matrix may be written as

Q = \begin{pmatrix} q_{00} & q_{01} \\ q_{10} & q_{11} \end{pmatrix} = \begin{pmatrix} -r_0 & r_0 \\ r_1 & -r_1 \end{pmatrix}. \quad (3.11)

For an exponential random variable z, the cdf is computed as Pr[z ≤ t] = ∫_0^t r e^{−rz} dz = 1 − e^{−rt}, where r e^{−rz} is the pdf of the exponential distribution with rate parameter r. The reciprocal of the rate parameter, 1/r, is sometimes called the survival parameter in the sense that the exponential random variable z that survives the duration of time has mean E[z] = 1/r. In light of equations 3.1 and 3.2, at a given time t, the probability of remaining within the current state sojourn is Pr[z > t] = 1 − Pr[z ≤ t] = ∫_t^∞ r e^{−rz} dz = e^{−rt}. Let r_0 and r_1 denote the transition rates for states 0 and 1, respectively, and let τ = t − ν denote the elapsed time from the last state transition up to the current time instant t; then the parameterized transition probability P = {P_ij} is characterized by

P_{ij}(\tau \mid r_i) = \begin{cases} \exp(-r_i \tau), & i = j \\ 1 - \exp(-r_i \tau), & i \ne j \end{cases} \quad (i, j \in \{0, 1\}). \quad (3.12)

Now, the transition probability, instead of being constant, is a probabilistic function of the time after the Markov process makes a transition to or from a given state (the holding time from the last transition or the survival time to the next transition).
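The exponential sojourn structure makes a two-state continuous-time Markov chain easy to simulate; the Python sketch below (NumPy; the rate arguments r0 and r1 are illustrative) draws alternating exponential sojourn times and can generate realizations of the latent process {S(t)}:

    import numpy as np

    def simulate_ctmc(r0, r1, T, s0=0, seed=0):
        """Alternating exponential sojourns for states 0 (DOWN) and 1 (UP)."""
        rng = np.random.default_rng(seed)
        t, s = 0.0, s0
        jumps, states = [0.0], [s0]
        while t < T:
            rate = r0 if s == 0 else r1
            t += rng.exponential(1.0 / rate)   # sojourn ~ Exp(rate), eq. 3.12
            s = 1 - s                          # two-state chain: flip state
            jumps.append(min(t, T)); states.append(s)
        return np.array(jumps), np.array(states)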

3.2.1 Imposing Physiological Constraints

Due to biophysical or physiological constraints, the sojourn time of a specific state might be subject to a certain range constraint, reflected in the pdf. Without loss of generality, let p(τ) denote the standard pdf of a random variable τ, and let p̃(τ) denote the "censored" version of p(τ),8 which is defined by

\tilde{p}(\tau) = \frac{1}{c} p(\tau)\, \mathbb{I}_{[a,b]}(\tau) = \begin{cases} \frac{1}{c} p(\tau), & \tau \in [a, b] \\ 0, & \text{otherwise}, \end{cases}

where 𝕀(·) is an indicator function and a > 0 and b ∈ (a, ∞) are, respectively, the lower and upper bounds of the constrained random variable τ (which is always positive, being the duration of a sojourn). The scalar c is a normalizing constant determined by c = F(b) − F(a), where F(·) denotes the cdf corresponding to the standard pdf p(τ) and F(∞) = 1. Likewise, the censored version of the cdf is computed as F̃(τ) = ∫_{−∞}^τ p̃(z) dz = (1/c) ∫_a^τ p(z) dz.

Suppose the sojourn time τ of a continuous-time Markov chain follows a censored version of the exponential distribution; then we can write its censored pdf as

\tilde{p}(\tau) = \begin{cases} \frac{r \exp(-r\tau)}{\exp(-ra)}, & \tau \ge a > 0 \\ 0, & \text{otherwise}, \end{cases} \quad (3.13)

where the normalizing constant is given by c = F(∞) − F(a) = 1 − [1 − exp(−ra)] = exp(−ra).
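The censoring operation is straightforward to implement numerically; a minimal sketch (Python with SciPy; the bounds a and b play the role of the physiological constraints discussed above, and the helper name is illustrative) for a generic censored pdf and cdf is:

    from scipy import stats

    def censored(dist, a, b):
        """pdf/cdf of dist truncated to [a, b]; reduces to eq. 3.13 when
        dist = stats.expon(scale=1/r) and b = infinity."""
        c = dist.cdf(b) - dist.cdf(a)               # normalizing constant
        pdf = lambda x: dist.pdf(x) / c if a <= x <= b else 0.0
        cdf = lambda x: (dist.cdf(min(max(x, a), b)) - dist.cdf(a)) / c
        return pdf, cdf

    # Censored exponential with rate r = 2 and lower bound a = 0.05 s
    pdf, cdf = censored(stats.expon(scale=1 / 2.0), a=0.05, b=float("inf"))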

3.3 Continuous-Time Semi-Markov Chain

In contrast to the Markov process, the semi-Markov process is a continuous-time stochastic process {S_t} that draws the sojourn times {τ_l} spent in specific discrete states from a nonexponential distribution. In other words, the sojourn time is no longer characterized by an exponential pdf. Table 2 lists a few examples of continuous-time probability models for the sojourn duration of the interevent interval (Tuckwell, 1989). In general, a nonexponential censored pdf with a lower bound gives the flexibility to model the "refractory period" of the UP or DOWN state.

Table 2.

Summary of Continuous-Time Probability Models for the Transition Probability Density Function p(τ) (Where τ Is a Nonnegative or Positive Random Variable That Denotes the Holding Time), Cumulative Distribution Function F(τ), and Survival Function 1 − F(τ).

Distribution: pdf p(τ); cdf F(τ); survival 1 − F(τ); domain
Exponential: p(τ) = r exp(−rτ); F(τ) = 1 − exp(−rτ); 1 − F(τ) = exp(−rτ); τ ∈ [0, ∞), r > 0
Weibull: p(τ) = αr(rτ)^{α−1} exp[−(rτ)^α]; F(τ) = 1 − exp[−(rτ)^α]; 1 − F(τ) = exp[−(rτ)^α]; τ ∈ [0, ∞), r > 0, α > 0
Gamma: p(τ) = τ^{s−1} exp(−τ/θ) / (Γ(s) θ^s); F(τ) = γ(s, τ/θ)/Γ(s); 1 − F(τ) = 1 − γ(s, τ/θ)/Γ(s); τ ∈ [0, ∞), s > 0, θ > 0
Log normal: p(τ) = (1/(τ√(2πσ²))) exp(−(ln τ − μ)²/(2σ²)); F(τ) = 1/2 + (1/2) erf[(ln τ − μ)/(√2 σ)]; 1 − F(τ) = 1/2 − (1/2) erf[(ln τ − μ)/(√2 σ)]; τ ∈ (0, ∞), μ > 0, σ > 0
Inverse gaussian: p(τ) = √(s/(2πτ³)) exp(−s(τ − μ)²/(2μ²τ)); F(τ) = Φ(√(s/τ)(τ/μ − 1)) + exp(2s/μ) Φ(−√(s/τ)(τ/μ + 1)); τ ∈ (0, ∞), μ > 0, s > 0

Note: erf(z) = (2/√π) ∫_0^z exp(−t²) dt denotes the error function, Φ(z; μ, σ) = ∫_{−∞}^z (1/√(2πσ²)) exp(−(t − μ)²/(2σ²)) dt denotes the gaussian cumulative distribution function, and the two functions are related by Φ(z) = (1/2)[1 + erf(z/√2)].

3.3.1 Two-Parameter Exponential Family of Distributions for the UP and DOWN State

To characterize the nonexponential survival time behavior of semi-Markov processes, here we restrict our attention to three probability distributions that belong to the two-parameter exponential family of continuous probability distributions. We define the censored versions of three pdfs as follows:

  • The censored gamma distribution p̃(τ; s, κ):
    \tilde{p}(\tau; s, \kappa) = \frac{1}{c} \mathbb{I}_{[a,b]}(\tau)\, p(\tau; s, \kappa) = \frac{1}{c} \mathbb{I}_{[a,b]}(\tau) \frac{\tau^{s-1} \exp(-\tau/\kappa)}{\Gamma(s)\, \kappa^s},
    where κ and s represent the scale and shape parameters, respectively. If s is an integer, then the gamma distribution represents the sum of s exponentially distributed random variables, each with mean κ. Similarly, c is a normalizing constant for the censored pdf p̃(τ; s, κ): c = F(b) − F(a), where F(·) is the cdf of the standard gamma distribution.
  • The censored log-normal distribution p̃(τ; μ, σ):
    \tilde{p}(\tau; \mu, \sigma) = \frac{1}{c} \mathbb{I}_{[a,b]}(\tau)\, p(\tau; \mu, \sigma) = \frac{1}{c} \mathbb{I}_{[a,b]}(\tau) \frac{1}{\tau \sigma \sqrt{2\pi}} \exp\left( -\frac{[\ln(\tau) - \mu]^2}{2\sigma^2} \right),
    where the mean, median, and variance of the log-normal distribution are exp(μ + σ²/2), exp(μ), and exp(2μ + σ²)[exp(σ²) − 1], respectively.
  • The censored inverse gaussian distribution p̃(τ; μ, s):
    \tilde{p}(\tau; \mu, s) = \frac{1}{c} \mathbb{I}_{[a,b]}(\tau)\, p(\tau; \mu, s) = \frac{1}{c} \mathbb{I}_{[a,b]}(\tau) \sqrt{\frac{s}{2\pi\tau^3}} \exp\left( -\frac{s(\tau - \mu)^2}{2\mu^2 \tau} \right),
    where μ and s represent the mean and shape parameters, respectively.

The choice of the last two probability distributions is mainly motivated by the empirical data analysis published earlier (Ji & Wilson, 2007). Typically, for a specific data set, a smoothed histogram analysis is conducted to visualize the shape of the distribution, and the Kolmogorov-Smirnov (KS) test can be used to empirically evaluate the fit of specific probability distributions. Most likely, none of the candidate parametric distributions will fit perfectly (i.e., within the 95% confidence interval) for real-world data; we then choose the one with the best fit.9 In the simulation data shown in Figure 1, we used the following constraints for the UP and DOWN states: [0.1, 3] (unit: second) for the UP state and [0.05, 1] (unit: second) for the DOWN state. The lower and upper bounds of these constraints were chosen in light of the results reported by Ji and Wilson (2007). Note that the shapes of the log-normal and inverse gaussian pdfs and cdfs are very similar, except that the inverse gaussian distribution is slightly sharper when the value of the random variable is small (Takagi, Kumagai, Matsunaga, & Kusaka, 1997). In addition, the tail behavior of these two distributions differs; however, provided we consider only their censored versions (with a finite duration range), the tail behavior is not a major concern here.
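To draw UP and DOWN sojourn times from these censored distributions (for example, when simulating a semi-Markov chain), simple rejection against the bounds suffices; the Python sketch below (SciPy; the log-normal parameter values are illustrative, and the bounds follow the constraints just quoted) is one way to do it:

    import numpy as np
    from scipy import stats

    def sample_censored(dist, a, b, rng):
        """Draw one sojourn time from dist restricted to [a, b] by rejection."""
        while True:
            tau = dist.rvs(random_state=rng)
            if a <= tau <= b:
                return tau

    rng = np.random.default_rng(0)
    # Censored log-normal UP sojourn on [0.1, 3] s; s = sigma, scale = exp(mu)
    up_dist = stats.lognorm(s=0.8, scale=np.exp(-0.3))
    tau_up = sample_censored(up_dist, a=0.1, b=3.0, rng=rng)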

Figure 1.


(a) The duration histograms of the UP (mean 0.96, median 0.67, support [0.1, 3], unit: second) and DOWN (mean 0.17, median 0.13, support [0.05, 1], unit: second) states of spiking data recorded from four behaving rats' visual cortex during SWS. Note that the statistics used in b are identical to those in Ji and Wilson (2007, Figure 2b). The y-axis in both plots shows the count statistics of all cortical UP and DOWN durations. (b) Censored versions of the log-normal and inverse gaussian pdfs for the UP (left panel) and DOWN (right panel) states. (c) Censored versions of the log-normal and inverse gaussian survival functions (1 − cdf) for the UP (left panel) and DOWN (right panel) states. For comparison, the dashed lines in b and c show the holding time probability for an exponential distribution. Note that in the UP state, the holding time probability of both two-parameter distributions decays more slowly than that of the exponential distribution, whereas in the DOWN state, the holding time probability of the exponential distribution decays more slowly than the others.

3.4 EM Algorithm

In the continuous-time model, we treat the individual spike trains separately and aim to estimate their own parameters. Let θ = (θ_up, θ_down, {μ_c}_{c=1}^C, {α_c}_{c=1}^C, {β_c}_{c=1}^C) denote the unknown parameters of interest, where θ_up and θ_down represent the parameters associated with the parametric pdfs of the UP and DOWN states, respectively; the remaining parameters are associated with the CIF models of the respective spike trains. For the unknown continuous-time latent process {S(t); 0 ≤ t ≤ T} (where S(t) ∈ {0, 1}), let n be the number of jumps between the two distinct discrete states. Let 𝒮 = (n, τ, χ) be the triplet of the Markov or semi-Markov process, where τ = (τ_0, τ_1, …, τ_n) is a vector that contains the durations of the sojourn times of 𝒮 and χ = (χ_0, χ_1, …, χ_n) represents the states visited in these sojourns.

Let 𝒴 denote all the spike timing information of the observed spike trains. As with the discrete-time HMM, we aim to maximize the Q-function, defined as follows:

Q(\theta) = \sum_{\mathcal{S}} p(\mathcal{S} \mid \mathcal{Y}, \theta) \log p(\mathcal{Y}, \mathcal{S} \mid \theta). \quad (3.14)

The inference can be tackled in a similar fashion by the EM algorithm.

First, let us consider a simpler task where the transition time of the latent process is known and the number of state jumps is given. In other words, the parameters n and τ are both available, so the estimation goal becomes less demanding. We need to estimate only χ and θ.

Since the number of state transitions, n, is known, p(𝒮, 𝒴 | θ) simplifies to

p(\mathcal{S}, \mathcal{Y} \mid \theta) = \prod_{l=1}^{n} P_{\chi_{l-1}, \chi_l}(\tau_l)\, \Pr(\mathcal{Y}_l \mid \chi_l, \theta). \quad (3.15)

Let ξ_l(i, j) = Pr(χ_{l−1} = i, χ_l = j) and γ_l(i) = Pr(χ_l = i). In the case of the continuous-time Markov chain, the complete data log likelihood is given by

\mathcal{L}(\mathcal{S}, \theta) = \log p(S_{0:T}, \mathcal{Y} \mid \theta) = \sum_{l=1}^{n} \sum_{j=0}^{1} \left( -\xi_l(j, j)\, r_j \tau_l + \sum_{i \ne j} \xi_l(j, i) \log[1 - \exp(-r_j \tau_l)] \right) + \sum_{c=1}^{C} \sum_{l=1}^{n} \left( \int_{\nu_{l-1}}^{\nu_l} \log \lambda^c(t)\, dN^c(t) - \lambda_l^c \right), \quad (3.16)

where θ = (r_0, r_1, μ_1, α_1, β_1, …, μ_C, α_C, β_C) denotes the augmented parameter vector that contains all of the unknown parameters. In the case of the continuous-time semi-Markov chain, where the sojourn time is modeled by a nonexponential probability distribution, we can write the log-likelihood function as

\mathcal{L} = \log p(S_{0:T}, \mathcal{Y} \mid \theta) = \sum_{l=1}^{n} \left( \sum_{j=0}^{1} \xi_l(j, j) \log[1 - F(\tau_l; \theta_j)] + \sum_{i \ne j} \xi_l(j, i) \log[F(\tau_l; \theta_j)] \right) + \sum_{c=1}^{C} \sum_{l=1}^{n} \left( \int_{\nu_{l-1}}^{\nu_l} \log \lambda^c(t)\, dN^c(t) - \lambda_l^c \right), \quad (3.17)

where F (·) denotes the cdf of the nonexponential probability distribution.

Conditioned on the parameter θ, the posterior probabilities ξl and γl (for l = 1, …, n) can be similarly estimated by the forward-backward algorithm as in the E-step for the HMM, whereas in the M-step, the new parameter θnew is obtained by

\theta_{\text{new}} = \arg\max_{\theta} \sum_{\mathcal{S}} \Pr\{\mathcal{S} \mid \mathcal{Y}, \theta_{\text{old}}\} \log \Pr\{\mathcal{S}, \mathcal{Y} \mid \theta\}. \quad (3.18)

More specifically, for the parameters associated with the transition probability density model, we might, for example, assume that the UP and DOWN state durations are both log-normally distributed with parameters θ_j = {μ_j, σ_j} (j = 0, 1), which can be estimated by

\{\mu_j, \sigma_j\} = \arg\max_{\mu, \sigma} \left\{ \sum_{l=1}^{n} \xi_l(j, j) \log[1 - F(\tau_l; \mu, \sigma)] + \sum_{l=1}^{n} \sum_{i \ne j} \xi_l(j, i) \log[F(\tau_l; \mu, \sigma)] \right\}. \quad (3.19)

In light of Table 2, setting the derivatives with respect to μ_j and σ_j to zero yields

0 = \sum_{l=1}^{n} \frac{1}{\sigma} \exp\left( -\frac{(\ln \tau_l - \mu)^2}{2\sigma^2} \right) \left( \frac{\xi_l(j, j)}{1 - F(\tau_l; \mu, \sigma)} - \sum_{i \ne j} \frac{\xi_l(j, i)}{F(\tau_l; \mu, \sigma)} \right),
0 = \sum_{l=1}^{n} \frac{\ln \tau_l - \mu}{\sigma^2} \exp\left( -\frac{(\ln \tau_l - \mu)^2}{2\sigma^2} \right) \left( \frac{\xi_l(j, j)}{1 - F(\tau_l; \mu, \sigma)} - \sum_{i \ne j} \frac{\xi_l(j, i)}{F(\tau_l; \mu, \sigma)} \right),

where we have used d erf(z)/dz = (2/√π) exp(−z²) in light of Table 2. The above two equations can be solved iteratively with the Newton-Raphson algorithm.
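In practice, one can also sidestep hand-coded Newton-Raphson updates by numerically maximizing equation 3.19 directly; a Python sketch (SciPy; xi_jj and xi_ji stand for the posterior weights from the E-step, and all names are illustrative) is:

    import numpy as np
    from scipy import stats
    from scipy.optimize import minimize

    def fit_lognormal_sojourn(tau, xi_jj, xi_ji):
        """Maximize eq. 3.19 for one state's log-normal parameters.
        tau: sojourn durations; xi_jj, xi_ji: posterior weight arrays."""
        def neg_obj(theta):
            mu, log_sigma = theta
            F = stats.lognorm.cdf(tau, s=np.exp(log_sigma), scale=np.exp(mu))
            F = np.clip(F, 1e-12, 1 - 1e-12)          # numerical safety
            return -(xi_jj * np.log1p(-F) + xi_ji * np.log(F)).sum()
        res = minimize(neg_obj, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
        return res.x[0], np.exp(res.x[1])             # (mu_j, sigma_j)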

Once the state estimate Ŝ(t) is available from the E-step,10 the parameters associated with the likelihood model can also be estimated by the Newton-Raphson or the IWLS algorithm,

\{\mu_c, \alpha_c, \beta_c\} = \arg\max_{\mu, \alpha, \beta} \sum_{l=1}^{n} \left( \int_{\nu_{l-1}}^{\nu_l} \log \lambda^c(t)\, dN^c(t) - \lambda_l^c \right), \quad (3.20)

where λ^c(t) and λ_l^c are defined by equations 3.3 (or 3.4) and 3.7, respectively.

Notably, the EM algorithm described above has a few obvious drawbacks. Basically, it assumes that the information about when and how many state transitions occur during the latent process is given; once the number of state jumps, n, is determined, the algorithm does not allow n to change, so it is incapable of online model selection. Furthermore, the EM algorithm is very likely to suffer from the local maximum problem, especially if the initial conditions of the parameters are far from the true estimates. In the following, we present an alternative inference method that overcomes these two drawbacks; the method can be regarded as a generalization of the EM algorithm in which the E-step state estimation is replaced by a Monte Carlo sampling procedure. This method is often called the Monte Carlo EM (MCEM) algorithm (Chan & Ledolter, 1995; McLachlan & Krishnan, 1996; Tanner, 1996; Stjernqvist, Rydén, Sköld, & Staaf, 2007). The essence of MCEM is the theory of Markov chain Monte Carlo (MCMC), which is detailed below.

3.5 Monte Carlo EM Algorithm

The basic idea of an MCMC sampler is to draw a large number of samples randomly from the posterior distribution and then obtain a sample-based numerical approximation of that distribution. Unlike other Monte Carlo samplers (such as importance sampling and rejection sampling), the MCMC method is well suited to sampling from complex and high-dimensional probability distributions. Instead of drawing independent samples from the posterior distribution directly, MCMC constructs a Markov chain whose equilibrium distribution eventually approaches the posterior distribution. Markov chain theory states that given an arbitrary initial value, the chain will ultimately converge to equilibrium provided it is run sufficiently long. In practice, determining the convergence as well as the "burn-in time" of MCMC samplers requires diagnostic tools (see Gilks, Richardson, & Spiegelhalter, 1995, for general discussions of MCMC methods). Depending on the specific problem, the MCMC method is typically computationally intensive, and convergence can be very slow. Nevertheless, here we focus on methodology development, so the computational demand is not the main concern. Specifically, constructing problem-specific MCMC proposal distributions (densities) is the key to obtaining an efficient MCMC sampler with fast convergence and good mixing (Brooks, Giudici, & Roberts, 2003). For the variable-dimension RJMCMC method, we present detailed mathematical treatments in appendix A.

The Monte Carlo EM (MCEM) algorithm works just like an ordinary EM algorithm, except that in the E-step, the expectation operations (i.e., the computation of sufficient statistics) are replaced by Monte Carlo simulations of the missing data. Specifically, M realizations of the latent process S(t) (0 ≤ t ≤ T) are simulated, and in this case the Q-function can be written as

Q(\mathcal{S}, \theta) \approx \hat{Q}(\mathcal{S}, \theta) = \frac{1}{M} \sum_{m=1}^{M} Q(\mathcal{S}^{(m)}, \theta), \quad (3.21)

where 𝒮^{(m)} denotes the mth simulated latent process for the unknown state (missing data).

The M-step of MCEM is the same as the conventional M-step in the EM algorithm. Specifically, the parameters of the CIF appearing in the likelihood model are estimated using equation 3.20. However, the estimation of the parameters for the UP or DOWN state sojourn depends on the type of probability distribution. Here we distinguish three different possible scenarios.

First, when the latent process is a continuous-time Markov chain and the sojourn time durations for the UP and DOWN states are both exponentially distributed, θ_up and θ_down correspond to the rate parameters r_1 and r_0, respectively. The Q-function for a single Monte Carlo realization of 𝒮 can be written as

Q(\mathcal{S}, \theta) = \sum_{i=0}^{1} \sum_{j \ne i} \left( \hat{n}_{ij} \log q_{ij} - r_i T_i \right) + \sum_{c=1}^{C} \sum_{l=1}^{n} \left( \int_{\nu_{l-1}}^{\nu_l} \log \lambda^c(t)\, dN^c(t) - \lambda_l^c \right), \quad (3.22)

where n̂_ij denotes the number of jumps from state i to state j during [0, T], and T_i = ∫_0^T 𝕀(S(t) = i) dt denotes the total time (the sojourn length) that {S(t)} spends in state i during [0, T]. Let n_i = Σ_{l=0}^n 𝕀(χ_l = i) denote the number of sojourns that occur while {S(t)} is in state i; then it is known that the transition rate matrix can be estimated by q̂_ij = n̂_ij/T_i and r̂_i = −q̂_ii = n_i/T_i, where n̂_ij, n_i, and T_i are the sufficient statistics (Rydén, 1996). In this case, r̂_i corresponds to the MLE. With M Monte Carlo realizations, the rate parameter for state 0, say, is estimated by

\hat{r}_0 = \frac{1}{M} \sum_{m=1}^{M} \frac{\sum_{l=0}^{n} \mathbb{I}(\chi_l^{(m)} = 0)}{\int_0^T \mathbb{I}(S^{(m)}(t) = 0)\, dt}.

In the second scenario, when the latent process is a continuous-time semi-Markov chain where the sojourn time durations for both UP and DOWN states are nonexponentially distributed, the Q-function can be written as

Q(\mathcal{S}, \theta) = \frac{1}{M} \sum_{m=1}^{M} \sum_{l=0}^{n-1} \sum_{j=0}^{1} \sum_{i \ne j} \xi_l(j, i) \log[1 - F(\tau_l^{(m)}; \theta_j)] + \sum_{c=1}^{C} \sum_{l=1}^{n} \left( \int_{\nu_{l-1}}^{\nu_l} \log \lambda^c(\bar{S}(t), \theta)\, dN^c(t) - \lambda_l^c \right), \quad (3.23)

where S̄(t) = (1/M) Σ_{m=1}^M Ŝ^{(m)}(t) is the Monte Carlo mean of the M simulated latent processes. Similarly, the parameters of the nonexponential probability distributions can be estimated by their MLEs based on the Monte Carlo realizations. For instance, in the case of the inverse gaussian distribution, the mean parameter is given by μ_j = (1/(n_j M)) Σ_{m=1}^M Σ_{l=0}^n τ_l^{(m)} 𝕀(χ_l^{(m)} = j), and the shape parameter by s_j = [ (1/(n_j M)) Σ_{m=1}^M Σ_{l=0}^n ((τ_l^{(m)})^{−1} − μ_j^{−1}) 𝕀(χ_l^{(m)} = j) ]^{−1}. In the case of the log-normal distribution, the mean parameter is given by μ_j = (1/(n_j M)) Σ_{m=1}^M Σ_{l=0}^n 𝕀(χ_l^{(m)} = j) ln τ_l^{(m)}, and the SD parameter by σ_j² = (1/(n_j M)) Σ_{m=1}^M Σ_{l=0}^n 𝕀(χ_l^{(m)} = j) (ln τ_l^{(m)} − μ_j)².
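These closed-form Monte Carlo MLEs translate directly into code; a sketch (Python/NumPy; taus and chis hold the sojourn durations and visited states pooled over the M realizations, names illustrative) for the log-normal case is:

    import numpy as np

    def lognormal_mle(taus, chis, j):
        """MLE of (mu_j, sigma_j) from pooled Monte Carlo sojourns.
        taus, chis: 1-D arrays over all sojourns of all M realizations."""
        t = taus[chis == j]
        mu = np.log(t).mean()                            # mean parameter
        sigma = np.sqrt(((np.log(t) - mu) ** 2).mean())  # SD parameter
        return mu, sigma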

If, in the third scenario, one of the state sojourn time durations is exponentially distributed and the other is nonexponential, then the resulting latent process is still a semi-Markov chain, and the estimation procedure remains similar to that in the above two cases.

3.5.1 Initialization of the MCMC Sampler

It is important to choose sensible initial values for the simulated (semi-)Markov chain, since a poor choice of the initial 𝒮^{(0)} can lead to a sampler that takes a very long time to converge or results in poor mixing of the (semi-)Markov chain. In our experiments, we typically use a discrete-time HMM (with a 10 ms bin size) to estimate the hidden state sequence and then interpolate it to obtain an initial estimate of the continuous-time state process (with a 1 ms bin size), from which we obtain the initial values of {n, τ, χ}.

3.5.2 Algorithmic Procedure

In summary, the MCEM algorithm is run as follows (a schematic sketch of the loop follows the list):

  • Initialize the MCMC sampler state 𝒮^{(0)} = (n, τ, χ).

  • Iterate the MCEM algorithm's E and M steps until the log likelihood reaches a local maximum or saddle point.

    1. Monte Carlo E-step: Given an initial state $\mathcal{S}^{(0)}$, run the RJMCMC sampling procedure to draw M Monte Carlo samples $\{\mathcal{S}^{(m)}\}_{m=1}^{M}$, from which the necessary Monte Carlo sufficient statistics are computed.

    2. M-step: estimate the parameters {θup, θdown} with their MLE.

    3. M-step: optimize the parameters $\{\mu_c, \alpha_c, \beta_c\}_{c=1}^{C}$ according to equation 3.20.

  • Upon convergence, reconstruct the hidden state Ŝ(t) in the continuous-time domain.

  • Compute λc(t) for each spike train c, and conduct goodness-of-fit tests (see below) for all spike trains being modeled.
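The following Python skeleton summarizes this procedure; the RJMCMC sampler and the two M-step optimizers are passed in as callables, and all names are illustrative rather than the authors' implementation.

```python
import numpy as np

def mcem(spikes, S0, theta0, e_step, m_step_sojourn, m_step_cif,
         loglik, max_iter=100, M=1000, tol=1e-4):
    """Schematic MCEM loop mirroring the steps above (hypothetical API)."""
    S, theta, prev_ll = S0, theta0, -np.inf
    for _ in range(max_iter):
        samples = e_step(spikes, S, theta, M)       # Monte Carlo E-step (RJMCMC)
        theta = m_step_sojourn(samples, theta)      # MLE of sojourn parameters
        theta = m_step_cif(spikes, samples, theta)  # CIF update (equation 3.20)
        ll = loglik(spikes, samples, theta)
        if abs(ll - prev_ll) < tol:                 # log likelihood has plateaued
            break
        prev_ll, S = ll, samples[-1]                # warm-start the next E-step
    return samples, theta
```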

3.5.3 Reconstruction of Hidden State

There are two ways to reconstruct the hidden state of the latent process. The first is to apply the Viterbi algorithm once the MCEM inference is completed (when n and τ have been determined). In the second, and simpler, approach, we can determine the state by the following rule (Ball et al., 1999). For m = 1, 2, …, M, let

$\hat{S}^{(m)}(t) = \begin{cases} 1, & \text{if } \chi_t^{(m)} \in \text{UP}, \\ 0, & \text{if } \chi_t^{(m)} \in \text{DOWN}, \end{cases}$   (3.24)

and let $\bar{S}(t) = \frac{1}{M}\sum_{m=1}^{M}\hat{S}^{(m)}(t)$; then the point estimate of the hidden state is

$\hat{S}(t) = \begin{cases} 1 & \text{if } \bar{S}(t) \ge 0.5, \\ 0 & \text{if } \bar{S}(t) < 0.5. \end{cases}$   (3.25)

Furthermore, the marginal prior probability of the hidden state is given by

$\hat{\pi}_i = \frac{1}{M}\sum_{m=1}^{M} I\left(S^{(m)}(0) = i\right).$   (3.26)
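In code, this reconstruction rule amounts to a majority vote over the sampled paths; a minimal numpy sketch, assuming the M paths are stored as a binary integer array on a common time grid:

```python
import numpy as np

def reconstruct_state(S_paths):
    """S_paths: (M, T) binary integer array of sampled state paths (1 = UP)."""
    S_bar = S_paths.mean(axis=0)            # Monte Carlo mean state
    S_hat = (S_bar >= 0.5).astype(int)      # point estimate (equation 3.25)
    pi_hat = np.bincount(S_paths[:, 0], minlength=2) / len(S_paths)
    return S_hat, S_bar, pi_hat             # pi_hat: marginal prior (3.26)
```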

3.5.4 Goodness-of-Fit Tests

Upon estimating the CIF model λc(t) for each spike train (see equation 3.3), the goodness of fit of the estimated model is tested in light of the time-rescaling theorem (Brown, Barbieri, Ventura, Kass, & Frank, 2002). Specifically, given a point process specified by J discrete events 0 < u1 < ⋯ < uJ < T, define the random variables $z_j = \int_{u_{j-1}}^{u_j} \lambda(\tau)\,d\tau$ for j = 1, 2, …, J − 1. Then the $z_j$'s are independent and exponentially distributed with unit mean. Introducing the transformation $v_j = 1 - \exp(-z_j)$, the $v_j$'s are independent and uniformly distributed on [0, 1]. Let $g_j = \Phi^{-1}(v_j)$ (where Φ(·) denotes the cdf of the standard gaussian distribution); then the $g_j$'s are independent standard gaussian random variables. Furthermore, the standard Kolmogorov-Smirnov (KS) test is used to compare the cdf of the $v_j$'s against that of random variables uniformly distributed on [0, 1]. The KS statistic is the maximum deviation of the empirical cdf from the uniform cdf. To compute it, the $v_j$'s are sorted from the smallest to the largest value, and the values of the uniform cdf, defined as $\frac{j - 0.5}{J}$, are plotted against the ordered $v_j$'s; the points should lie on the 45 degree line. In a Cartesian plot of the empirical cdf as the y-coordinate versus the uniform cdf as the x-coordinate, the 95% confidence interval lines are $y = x \pm 1.36(J-1)^{-1/2}$. The KS distance, defined as the maximum distance between the KS plot and the 45 degree line, measures the lack of fit between the model and the data.

Furthermore, we measure the independence of the time-rescaled time series by computing the autocorrelation function of the $g_j$'s: $\mathrm{ACF}(m) = \frac{1}{J-m}\sum_{j=1}^{J-m} g_j g_{j+m}$. Since the $g_j$'s are normally distributed, if they are independent, they are also uncorrelated; hence, ACF(m) should be small for all values of m, and the associated 95% confidence interval is $0 \pm 1.96(J-1)^{-1/2}$.
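Both diagnostics are straightforward to compute. The sketch below, assuming the CIF has already been evaluated on a 1 ms grid, returns the KS distance and the rescaled-interval autocorrelations with their 95% bounds; the helper name is ours.

```python
import numpy as np
from scipy.stats import norm

def rescaling_gof(spike_times, lam, dt=0.001, max_lag=50):
    """Time-rescaling goodness of fit. lam: CIF sampled every dt seconds;
    spike_times: event times in seconds (within the grid)."""
    Lam = np.concatenate([[0.0], np.cumsum(lam) * dt])    # integrated CIF
    idx = np.round(np.asarray(spike_times) / dt).astype(int)
    z = np.diff(Lam[idx])                    # rescaled intervals ~ Exp(1)
    v = 1.0 - np.exp(-z)                     # ~ Uniform(0, 1) if model fits
    J = len(v)
    ks_dist = np.max(np.abs(np.sort(v) - (np.arange(1, J + 1) - 0.5) / J))
    g = norm.ppf(np.clip(v, 1e-12, 1 - 1e-12))   # ~ N(0, 1), temporal order
    acf = np.array([g[:-m] @ g[m:] / (J - m) for m in range(1, max_lag + 1)])
    return ks_dist, 1.36 / np.sqrt(J - 1), acf, 1.96 / np.sqrt(J - 1)
```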

3.5.5 Implementation and Convergence

Let the triple $\mathcal{S}$ = (n, τ, χ) denote all the information of the continuous-time latent process: the n state jumps, the n + 1 durations of the corresponding sojourn times, and the discrete states visited in the sojourns.

To simulate the Markov chain, we first obtain the initial conditions for both the state and parameters $\{\mathcal{S}^{(0)}, \theta^{(0)}\}$. Next, we run the MCMC sampler (see appendix A for details) to generate a sequence of Monte Carlo samples $\{\mathcal{S}^{(k)}\}$ for k = 1, 2, …, M; these realizations can be viewed as samples drawn from the conditional posterior $p(\mathcal{S} \mid \mathbf{y}, \theta)$. At each MCEM step, the parameter vector θ is updated in the Monte Carlo M-step using the sufficient statistics obtained from $p(\mathcal{S} \mid \mathbf{y}, \theta)$. Typically, a "burn-in" period at the beginning of the simulated (semi-) Markov chain is discarded to remove the influence of the initial condition. Even after the burn-in period, the successive realizations $\{\mathcal{S}^{(k)}\}$ might still be highly correlated. This problem can be alleviated by the "thinning" or subsampling technique: only every $N_p$-th simulated sample in the chain is used. Although thinning can increase the Monte Carlo variance of the estimate (Geyer, 1992), it is widely used in practice to reduce the correlation among the samples. In our experiment, we typically chose $N_p$ = 10.
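For example (a toy illustration; the chain here is just a stand-in array):

```python
import numpy as np

chain = np.random.randn(20000)   # stand-in for the simulated samples S^(k)
burn_in, Np = 1000, 10
kept = chain[burn_in::Np]        # discard burn-in, then keep every 10th sample
```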

At the end of each Monte Carlo sampling step, the sufficient statistics for computing the acceptance probability and for updating the parameter θ (in the M-step) need to be stored, and all information (n, τ, χ) needs to be updated each time $\mathcal{S}$ changes.

In terms of convergence, Markov chain theory tells us that when M → ∞, the samples are asymptotically drawn from the desired target (equilibrium) distribution. However, choosing the proper value of M is often problem dependent, and the convergence diagnosis of the MCMC sampler remains an active research topic in the literature (e.g., Gelman & Rubin, 1992; Cowles & Carlin, 1996).

4 Experimental Results

4.1 Synthetic Data

First, we simulate four spike trains with the time-rescaling theorem (see Figure 2 for a snapshot of one realization). The latent state variable is assumed to be drawn from a two-state discrete space: S(t) ∈ {0, 1}. The simulation is conducted with a 1 ms time bin size with the following model:

Figure 2. Synthetic data. (a) The simulated UP and DOWN hidden state process. (b) The simulated time-varying traces of the conditional intensity function (CIF) λc(t) (c = 1, …, 4). (c) The four simulated spike trains. (d) The average firing rate across the four spike trains (the solid gray curve is the temporally smoothed firing rate using a 30 ms width gaussian kernel).

$\lambda^c(t) = \exp\left(\mu_c + \alpha_c S(t) + \beta_c \tilde{n}(t)\right), \quad (c = 1, \ldots, 4),$

where ñ(t) denotes the number of spikes counted across all spike trains within the 100 ms preceding the current time index t, and the parameters of the individual spike trains are set as follows:

μ1 = −3.5, μ2 = −4.0, μ3 = −3.8, μ4 = −3.8;  α1 = 7.0, α2 = 8.0, α3 = 7.6, α4 = 7.6;  β1 = 0.06, β2 = 0.05, β3 = 0.03, β4 = 0.05.

For the simulated hidden latent process, the total duration is T = 30 s, and the number of jumps varies from 35 to 45, yielding an average occurrence rate of state transitions of about 80 min−1. Furthermore, we assume that the sojourn time durations for both UP and DOWN states follow a log-normal distribution. For the UP state, the survival time length is randomly drawn from logn(−0.4005, 0.8481) (such that the median and mean of the duration are 0.67 s and 0.96 s, respectively), with a lower bound of 0.15 s; for the DOWN state, the survival time length is randomly drawn from logn(−1.9661, 0.6231) (such that the median and mean of the duration are 0.14 s and 0.17 s, respectively), with a lower bound of 0.05 s.
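The synthetic data can be generated along the following lines. This Python sketch draws bounded log-normal sojourns by rejection and approximates the point process with per-bin Bernoulli draws; for brevity it simulates a single spike train, and it takes the CIF in spikes per second, which is our reading of the parameter values above.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, T = 0.001, 30.0                        # 1 ms bins, 30 s total duration
nbins = int(T / dt)

def draw_sojourn(mu, sigma, lower):
    """Log-normal sojourn duration with a lower bound (rejection sampling)."""
    while True:
        d = rng.lognormal(mu, sigma)
        if d >= lower:
            return d

# Alternate UP (1) and DOWN (0) sojourns with the stated parameters.
S = np.zeros(nbins, dtype=int)
t, state = 0.0, int(rng.integers(2))
while t < T:
    if state == 1:
        d = draw_sojourn(-0.4005, 0.8481, 0.15)   # UP sojourn
    else:
        d = draw_sojourn(-1.9661, 0.6231, 0.05)   # DOWN sojourn
    S[int(t / dt):min(int((t + d) / dt), nbins)] = state
    t, state = t + d, 1 - state

# Spikes from the CIF (Bernoulli approximation of the point process).
mu_c, alpha_c, beta_c = -3.5, 7.0, 0.06
spikes = np.zeros(nbins, dtype=int)
for k in range(nbins):
    n_hist = spikes[max(0, k - 100):k].sum()       # spikes in the last 100 ms
    lam = np.exp(mu_c + alpha_c * S[k] + beta_c * n_hist)   # rate in Hz
    spikes[k] = rng.random() < lam * dt
```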

To test the discrete-time HMM, the spike trains are binned at 10 ms into spike counts. We employed the EM algorithm with the following initialization: π0 = π1 = 0.5, P00 = P11 = 0.9, P01 = P10 = 0.1. For the synthetic data, the EM algorithm typically converges within 200 iterations; the forward-backward algorithm computes all necessary sufficient statistics. Upon convergence, the Viterbi algorithm produces the final state sequence, yielding an average decoding error rate of 1.52% (averaged over 10 independent runs). In this case, since the CIF model is given (no model mismatch is involved), the decoding error rate is reasonably low even with the discrete-time HMM. As a comparison, we also employed the threshold-based method (Ji & Wilson, 2007; see appendix B for a brief description) to classify the UP and DOWN states using the simulated spike trains. The discrete-time HMM yields better performance than the threshold-based method (see Table 3). Figure 3 plots a snapshot of the hidden state estimation obtained from the discrete-time HMM in our experiments.
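For reference, a compact two-state Viterbi decoder of the kind used here might look as follows (a generic log-domain sketch, not the authors' code):

```python
import numpy as np

def viterbi(log_lik, log_pi, log_P):
    """log_lik: (K, 2) per-bin observation log likelihoods;
    log_pi: (2,) initial log probabilities; log_P: (2, 2) log transitions."""
    K = log_lik.shape[0]
    delta = np.zeros((K, 2))
    psi = np.zeros((K, 2), dtype=int)
    delta[0] = log_pi + log_lik[0]
    for k in range(1, K):
        trans = delta[k - 1][:, None] + log_P    # score of (from, to) pairs
        psi[k] = trans.argmax(axis=0)            # best predecessor per state
        delta[k] = trans.max(axis=0) + log_lik[k]
    path = np.zeros(K, dtype=int)
    path[-1] = delta[-1].argmax()
    for k in range(K - 2, -1, -1):               # backtrack
        path[k] = psi[k + 1, path[k + 1]]
    return path
```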

Table 3. Comparison of Experimental Results on the Simulation Data.

                     State Estimation Error Rate
Method               Mean ± SD        Best     Worst
Threshold-based      2.91 ± 0.31%     2.41%    3.48%
Discrete HMM-EM      1.52 ± 0.34%     1.07%    2.07%
Continuous MCEM      1.26 ± 0.42%     0.74%    1.95%

Note: Mean performance is averaged over 10 independent random trials.

Figure 3. A snapshot of UP and DOWN state estimation obtained from the discrete-time HMM for the simulated spike train data.

Next, we applied the MCEM algorithm to refine the latent state estimation in the continuous-time domain. With a smaller bin size, the continuous-time model naturally allows more precise segmentation of the UP and DOWN states and identification of the state-transition locations. With the initial conditions obtained from the discrete-time EM algorithm, we simulated the Markov chains for 20,000 iterations and discarded the first 1000 iterations (the "burn-in" period). For the synthetic data, the MCEM algorithm converged after 20 to 30 iterations, and we were able to improve the decoding accuracy further, reducing the average error rate to 1.26%. As seen in Table 3, the continuous-time model outperformed the discrete-time HMM in terms of estimation error. In terms of estimating the correct number of state transitions, however, the HMM obtained nearly 100% accuracy in all 10 Monte Carlo trials (except for two trials, each of which overcounted by two jumps); in this sense, the HMM estimate can be treated as a very good initial state for the continuous-time semi-Markov chain model. In addition, the continuous-time model yields a 10-times-greater information transmission rate (1 bit/ms versus 1 bit/10 ms for the discrete-time model). We also computed the probability distribution statistics of the UP and DOWN states from both estimation methods. In the discrete-time HMM, we used the sample statistics of the UP and DOWN state durations as the estimated results. These results were also used as the initial values for the continuous-time semi-Markov process, where the MCEM algorithm was run to obtain the final estimate. The results are summarized in Table 4.

Table 4. Duration Length Statistics of the UP and DOWN States from the Simulation Data.

Parameter      True       Sample Statistics (from HMM)    Estimated (from MCEM)
Mean (UP)      −0.4005    −0.4468                         −0.4212
SD (UP)        0.8481     0.6827                          0.7735
Mean (DOWN)    −1.9661    −2.1708                         −2.0256
SD (DOWN)      0.6231     0.6335                          0.6301

Note: Values are the log-domain (log-normal) mean and SD parameters of the sojourn durations.

Once the estimates of {S(t)} and {μc, αc, βc} become available, we compute λc(t) for the simulated spike trains in continuous time (with Δ = 1 ms). Furthermore, the goodness-of-fit tests are applied to the rescaled time series; the KS plots and the autocorrelation plots for the simulated spike trains are shown in Figure 4. As seen from the figure, these plots fall almost entirely within the 95% confidence bounds, indicating a satisfactory model fit.

Figure 4. The fitted KS plots (top row) and autocorrelation plots (bottom row) for the four simulated spike trains from one Monte Carlo experiment (dotted and dashed lines in the plots indicate the 95% confidence bounds).

Finally, we conducted additional simulation studies examining the sensitivity of the different methods to two factors: the modulation gain of the hidden state and the number of observed spike trains. The first factor concerns the impact of the global network activity during the UP state, that is, the αc component appearing in λc(t). Specifically, we modify the gain parameters of the individual spike trains (while keeping the remaining parameters unchanged) as follows:

α1 = 6.7, α2 = 7.8, α3 = 7.3, α4 = 7.3,

such that each λc is reduced by about 20% during the UP state. The average performance of the threshold-based method degraded from the original 2.91% to 3.62% (with a trial-and-error selected threshold), while the performance of the probabilistic models remained almost unchanged. This is partly because a correct generative model is used and the uncertainties of the hidden state are taken into account during the final estimation (see Figure 5). If αc is decreased further, the mean MUA firing rate decreases significantly, the rate difference between the UP and DOWN periods shrinks, and the ambiguity between them increases. At some point all methods will break down unless the bin size is increased accordingly (at the cost of accuracy in the classification boundary). Due to space limitations, we do not explore this issue further here.

Figure 5. Snapshot illustrations of the simulated synthetic spike trains and the estimated state posterior probability from (a) the HMM and (b) the continuous-time semi-Markov model. The shaded area denotes the posterior probability of the hidden state being in an UP state. The estimation error rates (compared with the ground truth) in these two cases are 1.9% and 1.4%, respectively.

The second factor concerns the estimation accuracy of the missing variable as a function of the number of observed variables. It is expected that as more observations are added, the discrepancy between the threshold-based method and the probabilistic models will shrink, since the uncertainty about the hidden state is smaller (the posterior of the hidden variable is sharper with more spike train observations). In our simulations, we ran extra experiments in which we either reduced or increased the number of simulated spike trains and then rechecked the results across the different setups for the different methods. The estimation error results are summarized in Figure 6. For the threshold-based method, the estimation error gradually improves as more spike trains are added. This is expected, since the threshold selection criterion (see appendix B) depends heavily on the number of spike train observations, and adding more observations helps disambiguate the boundary between the UP and DOWN states. Meanwhile, for the probabilistic models, the estimation performance either slightly improves (the discrete-time HMM) or remains roughly the same (the continuous-time model). This is partly because adding more observations also increases the number of parameters to be estimated in the continuous-time model, so the difficulty of inference increases; the HMM performance, by contrast, is likely to saturate quickly due to either the insufficiency of the model or the local maximum problem inherent in the EM algorithm. These observations imply that the probabilistic models are particularly valuable when the number of spike train observations is relatively small, and that the simple threshold-based method becomes increasingly reliable as observations are added, although its performance remains worse than that of the two probabilistic models. This is probably because it is difficult to find an optimal kernel smoothing parameter or the two threshold values (see appendix B). Moreover, the threshold-based method cannot produce the statistics of interest (e.g., posterior probability, transition probability).

Figure 6. Estimation error comparison of the different methods as the number of spike train observations is varied (statistics computed from five independent simulated trials). In all conditions, the spike trains are generated with the same parameters: μc = −3.6, αc = 7.2, and βc = 0.05.

4.2 Real-World Spike Train Data

Next, we apply our models to real-world, simultaneously recorded spike trains collected from a behaving rat (Vijayan, 2007), for which the MUA, cortical and hippocampal EEGs, and EMG were recorded simultaneously (see Figure 7 for a snapshot). We present the spike train data of a single animal (one recording day), recorded from the primary somatosensory cortex (SI) during SWS. Neurophysiological studies of neural spike trains and EEGs across different rats and different recording days, as well as a comparison between the cortex and hippocampus, will be presented elsewhere. In this study, 20 clearly identified cortical cells from eight tetrodes were recorded and sorted. All spiking activity from these eight tetrodes was used to determine the UP and DOWN states.

Figure 7. A snapshot of recordings of cortical MUA, raw cortical EEG, cortical theta wave (4–8 Hz), cortical delta wave (2–4 Hz), raw hippocampal EEG, hippocampal ripple power (>100 Hz), hippocampal theta wave, and EMG.

For this study, we selected about 15.7 minutes of recordings from a total of 11 (interrupted) SWS periods of one rat,11 where the mean ± SD length of SWS periods is 85.7 ± 35.8 s (maximum 156.1 s, minimum 30.7 s). We pulled out the multiunit spikes from eight tetrodes (no spike sorting is necessary here). For each spike train (from one tetrode), we empirically chose the following CIF model, as defined in the continuous-time domain:12

$\lambda^c(t) = \exp\left(\mu_c + \alpha_c S_t + \gamma_c \int_0^t e^{-\beta_c \tau}\, dN(t-\tau)\, d\tau\right)$
$\approx \exp\left(\mu_c + \alpha_c S_t + \gamma_c \int_0^{\bar{\tau}} e^{-\beta_c \tau}\, dN(t-\tau)\, d\tau\right)$
$\approx \exp\left(\mu_c + \alpha_c S_t + \gamma_c \boldsymbol{\beta}_c^{T} \mathbf{N}_{t-\bar{\tau}:t}\right),$

where the exponential decay parameter βc is initially set to a value such that e−βcτ ≈ 0 for τ > τ̄ = 100 ms, which leads to the second line of approximation; the third line of approximation appears when we replace the continuous convolution with a discrete vector product, in which $\boldsymbol{\beta}_c$ denotes the vector containing 100 coefficients sampled from e−βcτ at a 1 ms temporal resolution, and $\mathbf{N}_{t-\bar{\tau}:t}$ denotes the vector containing 100 binary (0 or 1) elements that indicate, respectively, the absence or presence of spikes. For the initial values, we set γc = 0.01 for all spike trains; μc and αc were hand-tuned based on a small data set.13
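As an illustration of the third-line approximation, the sketch below builds the 100-coefficient filter and evaluates the CIF; the decay value βc = 50 s−1 is ours, chosen only so that e−βcτ is small beyond 100 ms.

```python
import numpy as np

beta_c = 50.0                         # illustrative decay (1/s): e^{-5} at 100 ms
taus = np.arange(1, 101) * 0.001      # lags 1, 2, ..., 100 ms
filt = np.exp(-beta_c * taus)         # the 100-coefficient vector beta_c

def cif(mu_c, alpha_c, gamma_c, S_t, spike_hist):
    """spike_hist: the last 100 binary spike indicators, most recent last.
    Evaluates the discretized (third-line) approximation of the CIF."""
    hist = np.dot(filt, spike_hist[::-1])   # align lag 1 ms with newest bin
    return np.exp(mu_c + alpha_c * S_t + gamma_c * hist)
```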

Since θ depends largely on $\mathcal{S}$ in the MCEM algorithm, a sensible choice of the initial state $\mathcal{S}^{(0)}$ is important for the convergence of the MCMC sampler. We initialized the state with the results obtained from the discrete-time HMM (10 ms bin size) and interpolated the intermediate missing values in the continuous-time domain (1 ms bin size). The rate parameter defined for the HMM (see equation 2.5) was assumed as follows:14

$\lambda_k = \exp\left(\mu + \alpha S_k + \beta_1 n_{k-2:k-1} + \beta_2 n_{k-4:k-2} + \beta_3 n_{k-6:k-4}\right),$

where n_{k−2:k−1} denotes the number of spikes (counted across all spike trains) in the 10 ms interval [t_{k−2}, t_{k−1}], that is, 10–20 ms before time index k or tk (with a 10 ms bin size). The spiking history dependence is thus described by the spike counts in three past history windows: 10–20 ms, 20–40 ms, and 40–60 ms. The initial parameters were set as μ = −0.5, α = 1, β ≡ [β1, β2, β3]T = 0. The discrete-time HMM converged after about 100 iterations, when the log likelihood stopped increasing. After that, we ran the Viterbi algorithm to obtain an initial guess of {n, τ, χ} for the continuous-time model. It is assumed that if $S_k^{(0)} = S_{k+1}^{(0)}$, the same state spans the region [kΔ, (k + 1)Δ] (Δ = 10 ms), whereas if $S_k^{(0)} \neq S_{k+1}^{(0)}$, a single jump occurs at time (k + 0.5)Δ. Furthermore, we initialized the parameters of the individual spike trains with values obtained from a GLM fit (based on about 500 ms of empirical data analysis; see note 12). The HMM estimation results are summarized in Tables 5 and 6. Since there is no ground truth about the latent process for the real-world data, we compared the HMM's state estimate with that obtained from the threshold-based method. The HMM tends to discover more state transitions than the threshold-based method (see Table 5), some of which might be false alarms. Figure 8 presents an illustrated example.15 To determine which estimation result is correct, we might require other available information (such as the cortical or hippocampal EEG) to help determine the "true" state.16 Direct comparison of different methods is difficult for real data, since there is no single ground truth. Typically, the HMM method yields more frequent state jumps than the threshold method (see Table 5); this is because, while estimating the hidden state, the algorithm does not consider the neighboring state information and evaluates only the individual likelihood within each single time interval (of 10 ms), which tends to yield many single-state jumps with short durations. In contrast, the threshold-based method is designed to merge short silent intervals with their neighboring sojourns (see step 3 in appendix B). However, the selection of the threshold is rather ad hoc, and the classification results require many hand-tuned parameters (such as the kernel smoothing, bin size, and minimum SWS cut-off length), which does not offer a robust and consistent criterion across different data sets.
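The midpoint-interpolation step just described can be written compactly; the sketch below (a hypothetical helper) expands a 10 ms state sequence onto a 1 ms grid, placing each jump at (k + 0.5)Δ.

```python
import numpy as np

def interpolate_states(S_coarse, ratio=10):
    """Expand a 10 ms state sequence onto a 1 ms grid. When successive
    bins disagree, the jump is placed at (k + 0.5) * Delta, as assumed above."""
    fine = np.repeat(np.asarray(S_coarse), ratio)
    for k in range(len(S_coarse) - 1):
        if S_coarse[k] != S_coarse[k + 1]:
            fine[k * ratio + ratio // 2:(k + 1) * ratio] = S_coarse[k + 1]
    return fine
```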

Table 5. State Estimation Discrepancy Between the Proposed Algorithms and the Threshold-Based Method for the Real-World Spike Train Data.

Algorithm          Discrepancy Percentage    Number of Jumps, n    Bin Size
Threshold-based    —                         2986                  10 ms
Discrete HMM-EM    4.42%                     3223                  10 ms
Continuous MCEM    3.04%                     2576                  1 ms

Table 6. Estimated Statistics of the UP and DOWN State Durations (in ms) for the Real-World Spike Train Data.

                         UP State                                DOWN State
Sojourn Duration     Threshold    HMM-EM       MCEM          Threshold    HMM-EM     MCEM
Minimum              40           40           58            40           20         53
Maximum              8510         4450         5059          270          290        302
Median               510          330          446           60           60         77
Mean ± SD            739 ± 727    507 ± 493    644 ± 638     67 ± 31      74 ± 36    83 ± 37

Figure 8. Real-world MUA spike trains of eight tetrodes recorded from the primary somatosensory cortex of one rat (note that each tetrode might contain a varying number of single cells). (a) A selected 5 s segment of the MUA spike trains during SWS and its UP and DOWN state classification via the threshold-based method (segmented by the thick solid line). (b) The hidden state estimation result obtained from the discrete-time HMM (used as the initial state for the continuous-time RJMCMC sampler). (c) The hidden state estimation obtained from the MCEM algorithm. In this example, the MCEM algorithm merged several neighboring sojourns that were decoded differently by the HMM.

To choose a proper parametric model for the sojourn time durations of the UP and DOWN states, we used the state classification result from the discrete-time HMM (see Figure 9). Based on the histogram analysis, we chose the exponential pdf as the probability model for the sojourn duration of the DOWN state and the log-normal pdf for the sojourn duration of the UP state. We also computed their sample statistics (mean, SD) and used them as the initial parameters for the continuous-time probabilistic model. The lower bounds for the UP and DOWN state duration lengths were both set to 40 ms.

Figure 9. Fitting the real-world sojourn time durations for the DOWN and UP states, where the UP or DOWN state classification is obtained from the discrete-time HMM estimation result. (Left panels) Histograms. (Right panels) Fits of the sojourn durations with exponential (DOWN state) and log-normal (UP state) distributions. If the sample data fit the tested probability density, the data points approximately match the straight line in the plot.

Since the recording time of the real-world spike train data is rather long (about 60 times longer than the synthetic data), the computational overhead of the MCEM algorithm is much greater. In RJMCMC sampling, 300,000 iterations were run to simulate the Markov chain,17 and the first 3000 iterations were discarded as the burn-in period. After that, we applied the obtained parameter estimates to the complete data set. After an additional 100 MCEM iterations, the algorithm converged (when the iterative log-likelihood increase became sufficiently small), and we obtained the final parameter estimates. With these estimates, we simulated another 1000 realizations of the semi-Markov chain $\mathcal{S}$ and used them for the final hidden state reconstruction (see equations 3.24 and 3.25). The convergence plots of the semi-Markov chain and the MCEM algorithm are shown in Figure 10.

Figure 10. Convergence plot of the simulated Markov chain. (a) Trajectory of the number of state jumps (inset: the trajectory within the first 1000 iterations). (b) Trajectory of the log likelihood while running the MCEM algorithm.

Several noteworthy comments are in order:

  • As a comparison, we also used the estimated hidden state {S(t)} to fit a GLM (using the glmfit function in Matlab, Δ = 1 ms), modeling the history dependence with seven discrete windows (1–5, 5–10, 10–15, 15–20, 20–30, 30–40, 40–50 ms). Upon fitting the GLM, we obtained the estimated spiking history dependence coefficients for the individual spike trains (see Figure 11); their curves all have an approximately exponentially decaying shape. Finally, the KS plots and autocorrelation plots are shown in Figures 12 and 13, respectively. Overall, the goodness of fit is quite satisfactory.

  • In comparison with the discrete-time HMM-EM method (see Tables 5 and 6), the continuous-time MCEM method yields less frequent state jumps. As a consequence, the MCEM result contains fewer short sojourn durations, since it allows a potential merge of neighboring sojourns during the RJMCMC procedure (see move type 3 in appendix A), which considers the joint likelihoods of the neighboring sojourns. Furthermore, in comparison with the threshold-based method, the continuous-time semi-Markov model is more powerful in representing uncertainty as well as in inferring the underlying neural dynamics. Its estimated model parameters (the shapes of the transition and duration probability densities) might reveal some of the neural mechanism or physiology behind the transitory dynamics (e.g., the inhibitory period after the last transition event). In our experiments, the MCEM method obtained the lowest estimate of the number of state transitions (see Table 5), yielding a transition occurrence rate of about 82 min−1 (slightly greater than the rate reported in visual cortex; Ji & Wilson, 2007). Despite its flexibility, the MCEM method is much more computationally intensive than the HMM-EM method. The implementation of the HMM is simpler and converges faster (the EM algorithm typically converged within 100–200 steps, although the local maximum problem remains). In contrast, the MCEM method relies on simulating state sequences at every iteration and must evaluate the point-process joint likelihood (see equation 3.5) for each possible move. The calculation of every single spike's likelihood contribution is time-consuming and is the computational bottleneck. In addition, the convergence of the MCEM algorithm slows near the end: as the Markov chain gradually approaches equilibrium, many moves are rejected, and a small modification of the hidden state $\mathcal{S}$ or the parameter θ changes the joint log likelihood of the data very little. This can be seen in the flat plateau of the log-likelihood curve near the end of the convergence in Figure 10b. For the real-world spike train data, the simulation of the Markov chain needs to be very long in order to pass through all of the possible moves, especially when the number of potential state transitions is large (here, on the order of thousands). Even so, no optimal stopping criterion can be guaranteed; hence, the trade-off between computational cost and estimation accuracy remains, as in any Monte Carlo optimization problem.

  • To compare the three classification methods, we also computed the cortical EEG averages (mean ± standard error of the mean) triggered by their UP state start and end time stamps, respectively (recall note 15). The results are compared in Figure 14. Although the figures all look similar (owing to the large timescale), close examination of the plots suggests that the EEG averages from the MCEM method reflect a more accurate detection of the onset of the UP state.

  • Given the MUA spike train data analyzed for the behaving rat, the latent process St stays longer in the UP state than in the DOWN state, indicating that the population of neurons remained predominantly active during SWS.18 Whether these neuronal firing patterns contain any "memory replay" of the earlier firing patterns during RUN behavior will be the subject of future investigation.

Figure 11. (Left panel) Estimated exponentially decaying filters e−βcτ for the recorded spike trains shown in Figure 8. (Right panel) History-dependence coefficients estimated for the eight spike trains (based on a GLM fit using seven discrete windows of history spike counts: 1–5, 5–10, 10–15, 15–20, 20–30, 30–40, 40–50 ms). The estimated history-dependent firing coefficients exhibit an exponential-like decaying curve (for all eight spike trains).

Figure 12. Fitted KS plots of the real-world MUA spike trains (dotted lines along the diagonal indicate the 95% confidence bounds).

Figure 13. Autocorrelation plots for the real-world MUA spike trains (dashed lines indicate the 95% confidence bounds).

Figure 14. Cortical EEG averages (mean ± standard error of the mean, shown by trace width) triggered by the classified UP state start and end time stamps (for visualization purposes, the standard error of the mean in all plots is amplified to 10 times its original value). From top to bottom: results from the threshold-based method, the HMM method, and the MCEM method. The start of the UP state is aligned with the K-complex signal, which has a biphasic wave switching from a negative dip to a positive peak and lasts about 200 ms.

4.3 Firing Pattern Analysis Within the UP States

As observed from the analysis of the recorded multiple spike trains, the somatosensory cortical neurons undergo near-synchronous transitions between the UP and DOWN states, on timescales ranging from tens of milliseconds to a few seconds. The neuronal firing activity inside the UP state is mainly characterized by duration length and spiking rate. It would be interesting to see whether there are any firing "patterns" embedded in these UP-state periods, in either multiunit or single-unit activity (e.g., Luczak et al., 2007; Ji & Wilson, 2007).

Upon estimating the latent state process, we obtain two features: the log (natural base) of the UP-state duration and the number of spikes per second per tetrode. After collecting these two features from the experimental data shown earlier, we use clustering for feature visualization. The soft-clustering algorithm used here is a greedy clustering algorithm (Verbeek, Vlassis, & Kröse, 2003) based on fitting a gaussian mixture model; in the greedy learning algorithm, the optimal number of mixtures is automatically determined during the learning process. The algorithm was run 20 times, and the best result (with the highest log likelihood) was chosen (see Figure 15). Hence, the neuronal firing pattern can be characterized by a finite number of parameters (means and covariances), from which we can compare the neuronal firing across different animals, days, brain regions (cortex versus hippocampus), sleep stages (Pre-RUN versus Post-RUN), and so on. Since this article focuses mainly on the statistical modeling methodology, further quantitative data analysis and its link to neurophysiology will be presented and discussed elsewhere.
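As a rough stand-in for the greedy algorithm of Verbeek et al. (2003), which is not reproduced here, one can fit gaussian mixture models of increasing order and select the number of mixtures by BIC; the feature arrays below are synthetic placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
log_dur = np.log(rng.lognormal(-0.4, 0.85, 300))   # placeholder feature 1
rate = rng.normal(30.0, 8.0, 300)                  # placeholder feature 2
X = np.column_stack([log_dur, rate])

models = [GaussianMixture(n_components=k, n_init=20, random_state=0).fit(X)
          for k in range(1, 7)]
best = min(models, key=lambda m: m.bic(X))         # model-order selection
labels = best.predict(X)                           # soft version: predict_proba
```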

Figure 15. Gaussian mixture clustering of the two firing features (log of duration and number of spikes per second). Here, the optimal number of mixtures is 3; the ellipses represent the two-dimensional gaussian shapes with different covariance structures.

5 Discussion

5.1 Model Mismatch or Misspecification for the Spike Train

When investigating real-world recorded spike train data, an important task of computational modeling is to identify the functional form of the CIF (see equations 2.3 and 3.3 or 3.4). Unlike the simulated spike train data, the CIF of the real-world spike trains is not known in advance and needs to be identified before the probabilistic inference is carried out. In this article, for simplicity, we have presumed that the CIF can be approximated by a GLM (McCullagh & Nelder, 1989; Truccolo et al., 2005) that includes the hidden state and firing history as covariates. We have also assumed that the spike trains across tetrodes are mutually independent. Most likely, neuronal spiking is influenced not only by a neuron's own firing history but also by the other spike trains. Despite these simplifications, we believe the models presented here serve as a valuable first step toward representing the temporal dynamics of the observed MUA spike trains. Above all, to quote George Box, "All models are wrong, but some are useful." In addition, theoretically, given sufficient data and under some regularity conditions (Pawitan, 2001), the MLE for a GLM is consistent even when the model (e.g., the link function) is chosen incorrectly (Paninski, 2004). From a practical point of view, we have ignored the possible estimation bias here.

For the real-world spike train data, there is no ground truth available for $\mathcal{S}$. A common practice is to select a small data set in which the UP and DOWN states are first identified by the threshold-based or the HMM method and reconfirmed by human inspection (with the extra help of EEG measurements). Based on that information and the assumption that the CIF might be identified by a GLM, we can use the GLM fit for model selection. The model fit would be assessed by the deviance and validated by the KS test. If the KS plot falls inside the 95% confidence intervals, the CIF model fits the given spike train data well. Unfortunately, in practice, this is not always the case given only a limited amount of observed data and an economical size of the parameter space for the GLM, indicating a lack of fit between the model and the data.

5.2 Discrete Probability Model for the Sojourn Time

In this article, the sojourn time survival function for the UP and DOWN states is assumed and modeled as being continuous and parametric. More generally, if the histogram analysis of the data indicates that the true distribution is far from any parametric (exponential or nonexponential) probability density, we might instead employ a discretized probability model for the survival probability of the sojourn time. Specifically, let [a, b] denote the range of the sojourn time; we may split the range evenly into L bins and model the discrete probability of each piece as $P_i(x) = \Pr\{a + (i-1)(b-a)/L \le x < a + i(b-a)/L\}$ (i = 1, 2, …, L). Then the probabilistic model of the sojourn time is fully characterized by two sets of parameters, $\{P_i^{\text{up}}\}$ and $\{P_j^{\text{down}}\}$, where $\sum_{i=1}^{L} P_i^{\text{up}} = 1$ and $\sum_{j=1}^{L} P_j^{\text{down}} = 1$. In this case, the inference algorithm is only slightly different, in that the M-step of the MCEM algorithm is modified with a reestimation procedure (see equation 3.18), while the E-step remains unchanged.
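A minimal sketch of such a discretized model, estimated directly from classified sojourn durations (the helper name is ours):

```python
import numpy as np

def discrete_sojourn_pmf(durations, a, b, L=20):
    """Discretized sojourn-time model: probability P_i of each of the L
    equal-width pieces of [a, b]; out-of-range durations are clipped."""
    edges = np.linspace(a, b, L + 1)
    counts, _ = np.histogram(np.clip(durations, a, b), bins=edges)
    return counts / counts.sum()
```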

5.3 Adding Intermediate States

Although we have exclusively discussed a two-state (0 and 1) Markov model in this article, it is easy to extend the framework to a general N-state Markovian model. Indeed, it is quite possible to add an intermediate state between DOWN and UP as a transitory state. This argument arises from an observation of the real-world MUA spike trains: in many circumstances, there is no clear evidence that the MUA is either up-modulated or completely silent, nor does the EEG in any obvious fashion help to differentiate these ambiguous periods. Unfortunately, how to define a transitory state remains a nontrivial problem, and no attempt was made to explore this direction here.

5.4 Fully Bayesian Inference

In the MCEM algorithm discussed above, we consider Monte Carlo sampling only in the E-step, whereas the M-step uses a standard deterministic optimization procedure. Potentially, we could use the MCMC procedure for both state and parameter estimation $\{\mathcal{S}^{(k)}, \theta^{(k)}\}$ for k = 1, 2, …, M, from which we could obtain the full posterior distribution $p(\mathcal{S}, \theta \mid \mathbf{y})$ instead of the marginal $p(\mathcal{S} \mid \theta, \mathbf{y})$. Take, for example, the parameters associated with the sojourn time pdf: we can define a gamma prior for the exponential distribution or a conjugate prior for the inverse gaussian (Banerjee & Bhattacharyya, 1979); for the parameters associated with the CIF model, we may define a gaussian prior. In this case, the M-step would be replaced by iterative Gibbs sampling. A detailed exploration of such a fully Bayesian inference approach, however, is beyond the scope of this article.

5.5 Limitation of Our Approach

There are several obvious assumptions in our statistical modeling approach. First, mutual statistical independence is assumed across neural spike trains, without explicit modeling of the recurrent network activity.19 Second, the observed data are assumed to be stationary in the sense that the state transition and CIF parameters are estimated from a long period of spike train recordings during which those parameters are assumed to remain constant. Finally, an identical CIF model form is assumed across all neural spike trains. Nevertheless, these limitations by no means diminish the value of the models and methods proposed here, since this article can be treated as a pilot study toward the ultimate modeling goal.

5.6 Future Work

We have considered several future investigations in line with the work reported here. From a computational modeling point of view, we can extend the model by including continuous-valued observations (e.g., Srinivasan, Eden, Mitter, & Brown, 2007). For instance, LFP or EEG measurements have been simultaneously recorded from both cortical and hippocampal regions. The detection of K-complexes from the cortical EEG and of sharp wave-ripple complexes (SPW-Rs) from the hippocampal EEG would benefit the identification of UP and DOWN states (Sirota et al., 2003; Battaglia et al., 2004; Ji & Wilson, 2007). Furthermore, it is possible to build a more complex SSM by allowing both continuous- and discrete-valued hidden variables, for instance, a switching SSM in which the two latent processes interact with each other (e.g., Ghahramani & Hinton, 2000; Srinivasan et al., 2007).

From a neurophysiological point of view, we are also interested in studying the causality and latency of population neuronal firing between the cortex and hippocampus, as well as the spike patterns relevant to the rat's RUN behavior. It is well known that sleep is a key factor that may promote the transfer of memory from the hippocampus to the cortex, and during sleep, replays in these two regions occur synchronously (Mehta, 2007; Ji & Wilson, 2007). Based on the extracellular recordings (MUA and LFP), it would be interesting to investigate the UP and DOWN activities across multiple processing stages and sites in the cortico-hippocampal circuits; the UP and DOWN state transitions can then be used to quantify the functional connectivity of the neural circuits. An in-depth exploration of the LFP (e.g., K-complexes, SPW-Rs), together with single- and multiunit firing activity in both cortical and hippocampal regions, would be key to understanding memory consolidation during sleep.

6 Conclusion

We have developed both discrete- and continuous-time probabilistic models and inference algorithms for inferring population neurons' UP and DOWN states from MUA spike trains. Compared with the deterministic threshold-based method (see appendix B) used in the literature, our probabilistic paradigms offer a stochastic approach to analyzing the spike trains as well as a generative model for simulating them. Furthermore, the hidden state estimate is treated as a random variable with an associated uncertainty (encoded by its posterior probability), whereas the threshold-based method cannot represent such uncertainty.

The discrete-time HMM provides a reasonable state estimate at a rather fast computing speed. However, the model can locate the UP and DOWN state transitions only at a relatively coarse time resolution (here, a 10 ms bin size). Another drawback of the HMM is that it is prone to getting stuck in a local solution; in other words, the number of state transitions typically remains unchanged after a few EM iterations. In contrast, one advantage of the continuous-time probabilistic model is that it allows estimating the exact locations of the state jumps. With the RJMCMC sampling technique, the number of jumps as well as their locations can be modified during the inference procedure, which offers a way to escape from local solutions and to tackle the model selection problem. The main shortcoming of the RJMCMC method is its greater computational complexity and its tremendous demand for computational power. In practice, the number of steps required to reach equilibrium often demands sensible initial conditions and diagnostic monitoring during the convergence process. We found that the inference solution obtained from the discrete-time HMM yields a reasonable initial state sequence to feed into the MCMC sampler. Once the number and locations of the state jumps are determined, we can use the Monte Carlo statistics to infer the latent process. For practitioners who are more concerned with processing speed than with the accuracy of the hidden state estimate, the discrete-time HMM might offer a reasonable guess (depending on the data characteristics). Nevertheless, no claim is made here that our proposed models and algorithms always produce a correct UP or DOWN state classification result. The final justification might still rely on careful human inspection, but our estimation results certainly provide a good, high-confidence starting point for follow-up.

In analyzing simultaneously recorded spike trains, identifying an accurate CIF model is crucial to the probabilistic inference. However, there is no free-lunch recipe for obtaining the ultimate answer. In practice, it often requires some empirical data analysis (Brown, Kass, & Mitra, 2004; Kass, Ventura, & Brown, 2005), such as interspike interval histograms, firing-rate trend analysis, or fitting a GLM (Truccolo et al., 2005) or a non-Poisson model (Barbieri, Quirk, Frank, Wilson, & Brown, 2001; Brown et al., 2003).

Finally, we hope that our proposed statistical models can shed some light on developing physiologically plausible mechanistic models. A better understanding of the transition mechanism between the UP and DOWN states would also help to improve the statistical description of the data.

Acknowledgments

The research reported here was supported by NIH/NIDA grant R01-DA015644 to E.N.B. and M.A.W., an NIH Director's Pioneer Award DP1-OD003646 to E.N.B., and an NIH/NHLBI grant R01-HL084502 to Z.C. and R.B. S.V. was supported by NIH institutional NRSA grant T32 HL07901. We thank R. Haslinger, D. Ji, and D. Nguyen for helpful discussions. We also thank two anonymous reviewers for their valuable comments, which helped to improve the presentation of this article.

Appendix A: Reversible-Jump MCMC

A.1 Background

Reversible-jump MCMC (RJMCMC) is a Metropolis-Hastings-type sampling algorithm with a transdimensional proposal. The term reversible jump refers to the ability of the Markov chain to "jump" between two parameter spaces that have different dimensions. The sampler explores state spaces of variable dimensionality through various modifications via Metropolis-Hastings proposals, each of which has a respective reverse proposal. For every proposal, the acceptance probability is computed according to a certain rule. The goal of the RJMCMC algorithm is to design efficient moves that allow the simulated Markov chain to reach the desired equilibrium (posterior) distribution within a reasonable amount of time. Unlike fixed-dimensional MCMC algorithms, RJMCMC allows state transitions between spaces with different dimensions, say, $\mathcal{S} \to \mathcal{S}'$, where $\dim(\mathcal{S}) \neq \dim(\mathcal{S}')$.

In this appendix, we present a detailed elaboration of the RJMCMC algorithm in the context of simulating a continuous-time (semi-) Markov chain for this problem. In the following, we use notations and formulations similar to those of Ball et al. (1999).

A.2 Derivation

Let n denote the number of jumps between the two distinct discrete states in the latent process {S(t); 0 ≤ t ≤ T}, where S(t) ∈ {0, 1}. Let $\mathcal{S}$ = (n, τ, χ) be a triplet describing the (semi-) Markov process, where τ = (τ0, τ1, …, τn) is a vector that contains the durations of the sojourn times of $\mathcal{S}$, and χ = (χ0, χ1, …, χn) represents the states visited in these sojourns. Let ν0 = 0 and $\nu_j = \sum_{i=0}^{j-1} \tau_i$ (j = 1, 2, …, n + 1), so that νn+1 = T. Furthermore, we assume that for both UP and DOWN states, the sojourn time duration τ has a lower bound a (τ ≥ a > 0) but no upper bound; consequently, the associated pdf has to be rectified (censored) accordingly.

The following seven types of moves are considered in the Metropolis-type proposal:

  1. Move a boundary between two successive sojourns of $\mathcal{S}$. First, decide which boundary to move by sampling j uniformly from {0, 1, …, n − 1}. Let χj = i1 and χj+1 = i2; then we have two alternative sampling options:20

    • Sample u from a uniform distribution $\mathcal{U}(a_{i_1}, \tau_j + \tau_{j+1} - a_{i_2})$, where $a_{i_1}$, $a_{i_2}$ denote the lower-bound constraints on the sojourn times of states χj and χj+1, respectively. The proposal $\mathcal{S}'$ is obtained by moving the boundary between the jth and (j + 1)st sojourns from νj+1 to νj + u. In this case, $\mathcal{S}'$ = (n′, τ′, χ′), where $\tau'_j = u$ and $\tau'_{j+1} = \tau_j + \tau_{j+1} - u$.

    • Sample u from a gaussian distribution $\mathcal{N}(0, \sigma^2)$, where σ2 denotes a (user-specified) variance parameter. The proposal $\mathcal{S}'$ is obtained by moving the boundary between the jth and (j + 1)st sojourns from νj+1 to νj+1 + u. In this case, $\mathcal{S}'$ = (n′, τ′, χ′), with $\tau'_j = \tau_j + u$ and $\tau'_{j+1} = \tau_{j+1} - u$.

    For both sampling options, n′ = n, $\tau'_l = \tau_l$ (l ≠ j, j + 1), and $\chi'_l = \chi_l$ (l = 0, 1, …, n).

  2. Insert a sojourn into one of the existing sojourns of $\mathcal{S}$. First, sample l* uniformly from {0, 1, …, n}, and let i = χl*. Determine the state for the inserted sojourn by sampling j from the probability $\tilde{P}_1(j \mid i, \theta)$ (j ≠ i); in the two-state space, $\tilde{P}_1(j \mid i) = 1$ (i.e., deterministic). Sample u from a uniform distribution $\mathcal{U}(a_i, \tau_{l^*} - a_i)$ (where $a_i$ denotes the lower bound of the sojourn time for state i), and then sample v from the censored version of the exponential (or log-normal or inverse gaussian) distribution with parameters θj ≡ rj (or θj ≡ {μj, σj} for the log-normal, or θj ≡ {μj, sj} for the inverse gaussian) truncated at $\tau_{l^*} - u - a_i$, that is, from the distribution with the conditional pdf $\tilde{p}(v \mid u)$
    $\tilde{p}(v \mid u) = \frac{p_v(\theta_j; v)}{F_v(\theta_j; \tau_{l^*} - u - a_i) - F_v(\theta_j; a_j)} \times I\left(a_j \le v \le \tau_{l^*} - u - a_i\right),$   (A.1)

    where $a_j$ denotes the lower bound of the sojourn time for state j. Then a sojourn in state j (j ≠ i) of length v is inserted into the l*th sojourn of $\mathcal{S}$, and the new proposal $\mathcal{S}'$ = (n′, τ′, χ′) is given by n′ = n + 2, $(\tau'_l, \chi'_l) = (\tau_l, \chi_l)$ (l = 0, 1, …, l* − 1), $\tau'_{l^*} = u$, $\tau'_{l^*+1} = v$, $\tau'_{l^*+2} = \tau_{l^*} - u - v$, $(\tau'_l, \chi'_l) = (\tau_{l-2}, \chi_{l-2})$ (l = l* + 3, l* + 4, …, n + 2), $\chi'_{l^*} = \chi'_{l^*+2} = \chi_{l^*}$ (= i), and $\chi'_{l^*+1} = j$. Note that if $\tau_{l^*} - 2a_i < a_j$, move 2 will not be executed.

  3. Delete an intermediate sojourn of $\mathcal{S}$ whose two adjacent sojourns are in the same state (namely, merge one sojourn with its neighboring sojourns). Sample l* uniformly from {1, 2, …, n − 1}, and delete the l*th sojourn of $\mathcal{S}$. The new $\mathcal{S}'$ is given by n′ = n − 2, $(\tau'_l, \chi'_l) = (\tau_l, \chi_l)$ (l = 0, 1, …, l* − 2), $\tau'_{l^*-1} = \tau_{l^*-1} + \tau_{l^*} + \tau_{l^*+1}$, $\chi'_{l^*-1} = \chi_{l^*-1}$, and $(\tau'_l, \chi'_l) = (\tau_{l+2}, \chi_{l+2})$ (l = l*, l* + 1, …, n − 2).

  4. Split the first sojourn of $\mathcal{S}$. First, let i = χ0, and sample u from the uniform distribution $\mathcal{U}(a_j, \tau_0 - a_i)$, where $a_i$ and $a_j$ denote the lower-bound constraints on the sojourn times for states i and j, respectively (j ≠ i; in the two-state case, the choice of j is deterministic). The new proposal $\mathcal{S}'$ is given by n′ = n + 1, $\tau'_0 = u$, $\chi'_0 = j$, $\tau'_1 = \tau_0 - u$, $\chi'_1 = i$, and $(\tau'_l, \chi'_l) = (\tau_{l-1}, \chi_{l-1})$ (l = 2, 3, …, n + 1). Note that if τ0 < ai + aj, move 4 will not be executed.

  5. Delete the first sojourn of $\mathcal{S}$. This move is deterministic. The new proposal $\mathcal{S}'$ is given by n′ = n − 1, $\tau'_0 = \tau_0 + \tau_1$, $\chi'_0 = \chi_1$, and $(\tau'_l, \chi'_l) = (\tau_{l+1}, \chi_{l+1})$ (l = 1, 2, …, n − 1).

  6. Split the last sojourn of $\mathcal{S}$. First, let i = χn, and sample u from a uniform distribution $\mathcal{U}(a_i, \tau_n)$, where $a_i$ denotes the lower-bound constraint on the sojourn time for state i. Next, sample j from the distribution $\tilde{P}_3(j \mid i)$ (j ≠ i; in the two-state space, the choice of j is deterministic). The new proposal $\mathcal{S}'$ is given by n′ = n + 1, $(\tau'_l, \chi'_l) = (\tau_l, \chi_l)$ (l = 0, 1, …, n − 1), $\tau'_n = u$, $\chi'_n = \chi_n$, $\tau'_{n+1} = \tau_n - u$, and $\chi'_{n+1} = j$. Note that if τn < ai, move 6 will not be executed.

  7. Delete the last sojourn of $\mathcal{S}$. This move type is deterministic, and the new proposal $\mathcal{S}'$ is given by n′ = n − 1, $(\tau'_l, \chi'_l) = (\tau_l, \chi_l)$ (l = 0, 1, …, n − 2), $\tau'_{n-1} = \tau_{n-1} + \tau_n$, and $\chi'_{n-1} = \chi_{n-1}$.

Of the above moves, moves 2 to 7 are discrete and transdimensional, and move 1 is continuous. For the convenience of technical treatment, we classify the seven moves into three classes: A = {1}, B = {2, 4, 6}, C = {3, 5, 7}, which correspond to boundary move, insertion, and deletion, respectively. Specifically, the individual moves in class B are the respective inverses of those in class C.

The move types chosen for the $\mathcal{S}$ update are obtained by sampling independently from the distribution of qi (i = 1, 2, …, 7) as follows. First, the class of move type is determined by sampling from the distribution (qA, qB, qC), where, for example, qB represents the probability that a class B move is chosen. Second, if a class B or class C move is chosen, the specific individual move is found by sampling from a distribution $(\tilde{P}_1, \tilde{P}_2, \tilde{P}_3)$. By this setup, the probabilities qi are determined by $q_1 = q_A$, $q_2 = q_B\tilde{P}_1$, $q_4 = q_B\tilde{P}_2$, $q_6 = q_B\tilde{P}_3$, $q_3 = q_C\tilde{P}_1$, $q_5 = q_C\tilde{P}_2$, and $q_7 = q_C\tilde{P}_3$. For the problem here, we may use the following setup: $(q_A, q_B, q_C) = (\frac{1}{2}, \frac{1}{4}, \frac{1}{4})$, with $\tilde{P}_1 = \frac{n-1}{n+1}$, $\tilde{P}_2 = \frac{1}{n+1}$, and $\tilde{P}_3 = \frac{1}{n+1}$, such that the normalization constraint $\sum_{i=1}^{7} q_i = 1$ is satisfied. Here $\tilde{P}_2$ and $\tilde{P}_3$ correspond to the probabilities of selecting the first and last sojourns, respectively, and $\tilde{P}_1$ to that of the other sojourns.
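These probabilities are easy to tabulate; a small sketch:

```python
def move_probabilities(n, qA=0.5, qB=0.25, qC=0.25):
    """q_1..q_7 from the class probabilities (qA, qB, qC) and the
    within-class distribution (P1, P2, P3) = ((n-1)/(n+1), 1/(n+1), 1/(n+1))."""
    P1, P2, P3 = (n - 1) / (n + 1), 1 / (n + 1), 1 / (n + 1)
    q = {1: qA, 2: qB * P1, 4: qB * P2, 6: qB * P3,
         3: qC * P1, 5: qC * P2, 7: qC * P3}
    assert abs(sum(q.values()) - 1.0) < 1e-12   # normalization check
    return q
```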

To compute the acceptance probabilities, for i = 1, 2, …, 7, let $R^{(i)}(\mathcal{S} \to \mathcal{S}'; \theta)$ be the density of the transition kernel associated with the proposal $\mathcal{S}'$ for move type (i), and let qi be the probability of choosing move type (i). Since moves 2 to 7 are discrete (unlike move type 1), their $R^{(i)}(\mathcal{S} \to \mathcal{S}'; \theta)$ are probabilities instead of probability densities. Hence, the density of the transition kernel for a new proposal $\mathcal{S}'$ is given by

$R(\mathcal{S} \to \mathcal{S}'; \theta) = \sum_{i=1}^{7} q_i R^{(i)}(\mathcal{S} \to \mathcal{S}'; \theta).$

Among the seven move proposals, move type 1 does not change the dimension of $\mathcal{S}$, and its acceptance probability is $\mathcal{A} = \min(1, \mathcal{R})$, where

$\mathcal{R} = \frac{p(\mathcal{S}', \theta, \mathbf{y})\, R(\mathcal{S}' \to \mathcal{S}; \theta)}{p(\mathcal{S}, \theta, \mathbf{y})\, R(\mathcal{S} \to \mathcal{S}'; \theta)}.$   (A.2)

On the other hand, move types 2 to 7 change the dimension of $\mathcal{S}$, and their acceptance probabilities are given by $\mathcal{A} = \min(1, \mathcal{R})$, where

$\mathcal{R} = \frac{p(\mathcal{S}', \theta, \mathbf{y})\, R(\mathcal{S}' \to \mathcal{S}; \theta)}{p(\mathcal{S}, \theta, \mathbf{y})\, R(\mathcal{S} \to \mathcal{S}'; \theta)}\, |\mathcal{J}|,$   (A.3)

where $|\mathcal{J}|$ denotes the determinant of the Jacobian. The Jacobian measures the ratio of the volumes of the two state spaces.

For presentational convenience, we often factorize the acceptance ratio as $\mathcal{R} = \mathcal{R}_1 \mathcal{R}_2 \mathcal{R}_3$ (prior ratio × likelihood ratio × proposal probability ratio),21 where

$\mathcal{R}_1 = \frac{p(\mathcal{S}' \mid \theta)}{p(\mathcal{S} \mid \theta)}, \quad \mathcal{R}_2 = \frac{p(\mathbf{y} \mid \mathcal{S}', \theta)}{p(\mathbf{y} \mid \mathcal{S}, \theta)}, \quad \mathcal{R}_3 = \frac{R(\mathcal{S}' \to \mathcal{S}; \theta)}{R(\mathcal{S} \to \mathcal{S}'; \theta)}.$   (A.4)

In the context of the two-state continuous-time (semi-) Markov chain used in this article, we compute these three ratios for the seven moves in the following.

A.3 Move Types

A.3.1 Move Type 1

For the first sampling option, let i1 = χj and i2 = χj+1, and let $\lambda_{i_1}^c = \int_{\nu_j}^{\nu_{j+1}} \lambda^c(t; S_{\nu_j:\nu_{j+1}} = i_1)\,dt$, $\lambda_{i_2}^c = \int_{\nu_{j+1}}^{\nu_{j+2}} \lambda^c(t; S_{\nu_{j+1}:\nu_{j+2}} = i_2)\,dt$, $\lambda_{i_1}^{\prime c} = \int_{\nu_j}^{\nu_j+u} \lambda^c(t; S_{\nu_j:\nu_j+u} = i_1)\,dt$, and $\lambda_{i_2}^{\prime c} = \int_{\nu_j+u}^{\nu_{j+2}} \lambda^c(t; S_{\nu_j+u:\nu_{j+2}} = i_2)\,dt$, and let $y_{i_1}^c$, $y_{i_2}^c$, $y_{i_1}^{\prime c}$, and $y_{i_2}^{\prime c}$ denote the numbers of spikes of spike train c during the time intervals [νj, νj+1], [νj+1, νj+2], [νj, νj + u], and [νj + u, νj+2], respectively. The probability ratios for move type 1 are calculated as follows:

$\mathcal{R}_1^{(1)} = \frac{\tilde{p}(\theta_{i_1}; u)}{\tilde{p}(\theta_{i_1}; \tau_j)} \cdot \frac{\tilde{p}(\theta_{i_2}; \tau_{j+1} + \tau_j - u)}{\tilde{p}(\theta_{i_2}; \tau_{j+1})} \quad \text{(option 1)}$
$\mathcal{R}_2^{(1)} = \prod_{c=1}^{C}\left\{\exp\left(\lambda_{i_1}^c + \lambda_{i_2}^c - \lambda_{i_1}^{\prime c} - \lambda_{i_2}^{\prime c}\right) \times \frac{\prod_{k=1}^{y_{i_1}^{\prime c}} \lambda^c(t_k, S_{\nu_j:\nu_j+u}) \prod_{k=1}^{y_{i_2}^{\prime c}} \lambda^c(t_k, S_{\nu_j+u:\nu_{j+2}})}{\prod_{k=1}^{y_{i_1}^c} \lambda^c(t_k, S_{\nu_j:\nu_{j+1}}) \prod_{k=1}^{y_{i_2}^c} \lambda^c(t_k, S_{\nu_{j+1}:\nu_{j+2}})}\right\} \quad \text{(option 1)}$
$\mathcal{R}_3^{(1)} = \frac{R(\mathcal{S}' \to \mathcal{S}; \theta)}{R(\mathcal{S} \to \mathcal{S}'; \theta)} = 1,$

where $\tilde{p}(\theta; \cdot)$ is defined as the censored version of the parametric pdf of the sojourn time (for either the UP or the DOWN state). The ratios for the second sampling option are conceptually similar, and we show only $\mathcal{R}_1^{(1)}$ here:

$\mathcal{R}_1^{(1)} = \frac{\tilde{p}(\theta_{i_1}; \tau_j + u)}{\tilde{p}(\theta_{i_1}; \tau_j)} \cdot \frac{\tilde{p}(\theta_{i_2}; \tau_{j+1} - u)}{\tilde{p}(\theta_{i_2}; \tau_{j+1})} \quad \text{(option 2)}.$

A.3.2 Move Type 2

Let i1 = i and i2 = j, and let $y_{i_1}^c$, $y_{i_1}^{\prime c}$, $y_{i_2}^{\prime c}$, and $\hat{y}_{i_1}^c$ denote the numbers of spikes of spike train c during the time intervals [νl*, νl*+1], [νl*, νl* + u], [νl* + u, νl* + u + v], and [νl* + u + v, νl*+1], respectively. Let $\lambda_{i_1}^c = \int_{\nu_{l^*}}^{\nu_{l^*+1}} \lambda^c(t; S_{\nu_{l^*}:\nu_{l^*+1}} = i_1)\,dt$, $\lambda_{i_1}^{\prime c} = \int_{\nu_{l^*}}^{\nu_{l^*}+u} \lambda^c(t; S_{\nu_{l^*}:\nu_{l^*}+u} = i_1)\,dt$, $\lambda_{i_2}^{\prime c} = \int_{\nu_{l^*}+u}^{\nu_{l^*}+u+v} \lambda^c(t; S_{\nu_{l^*}+u:\nu_{l^*}+u+v} = i_2)\,dt$, and $\hat{\lambda}_{i_1}^c = \int_{\nu_{l^*}+u+v}^{\nu_{l^*+1}} \lambda^c(t; S_{\nu_{l^*}+u+v:\nu_{l^*+1}} = i_1)\,dt$. Then

$\mathcal{R}_1^{(2)} = \frac{\tilde{p}(\theta_{i_1}; u)\, \tilde{p}(\theta_{i_2}; v)\, \tilde{p}(\theta_{i_1}; \tau_{l^*} - u - v)}{\tilde{p}(\theta_{i_1}; \tau_{l^*})}$
$\mathcal{R}_2^{(2)} = \prod_{c=1}^{C}\left\{\exp\left(\lambda_{i_1}^c - \lambda_{i_1}^{\prime c} - \lambda_{i_2}^{\prime c} - \hat{\lambda}_{i_1}^c\right) \times \frac{\prod_{k=1}^{y_{i_1}^{\prime c}} \lambda^c(t_k, S_{\nu_{l^*}:\nu_{l^*}+u}) \prod_{k=1}^{y_{i_2}^{\prime c}} \lambda^c(t_k, S_{\nu_{l^*}+u:\nu_{l^*}+u+v}) \prod_{k=1}^{\hat{y}_{i_1}^c} \lambda^c(t_k, S_{\nu_{l^*}+u+v:\nu_{l^*+1}})}{\prod_{k=1}^{y_{i_1}^c} \lambda^c(t_k, S_{\nu_{l^*}:\nu_{l^*+1}})}\right\}$
$\mathcal{R}_3^{(2)} = \frac{q_3/(n+1)}{q_2/(n+1)} \cdot \frac{1}{\frac{1}{\tau_{l^*} - 2a_i}\, \tilde{p}(v \mid u)} = \frac{\tau_{l^*} - 2a_i}{\tilde{p}(v \mid u)},$

where, in computing $\mathcal{R}_3^{(2)}$, we have used q2 = q3 and $\tilde{p}(u) = \frac{1}{\tau_{l^*} - 2a_i}$, and $\tilde{p}(v \mid u)$ is given by equation A.1. Since move type 2 changes the dimension of $\mathcal{S}$, we need to compute the determinant of the associated Jacobian. Note that τ′ is obtained from τ through an invertible deterministic function τ′ ← $\mathcal{G}$(τ, u, v) = (τ0, τ1, …, τl*−1, u, v, τl* − u − v, τl*+1, τl*+2, …, τn), whose Jacobian is given by

$|\mathcal{J}| = \left|\det\begin{pmatrix} \frac{\partial u}{\partial \tau_{l^*}} & \frac{\partial u}{\partial u} & \frac{\partial u}{\partial v} \\ \frac{\partial v}{\partial \tau_{l^*}} & \frac{\partial v}{\partial u} & \frac{\partial v}{\partial v} \\ \frac{\partial(\tau_{l^*} - u - v)}{\partial \tau_{l^*}} & \frac{\partial(\tau_{l^*} - u - v)}{\partial u} & \frac{\partial(\tau_{l^*} - u - v)}{\partial v} \end{pmatrix}\right| = \left|\det\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & -1 & -1 \end{pmatrix}\right| = 1.$

A.3.3 Move Type 3

Moves 3 and 2 are inverses of each other. Let i1 = χl*−1 and i2 = χl*, and let $\lambda_{i_1}^c = \int_{\nu_{l^*-1}}^{\nu_{l^*}} \lambda^c(t; S_{\nu_{l^*-1}:\nu_{l^*}} = i_1)\,dt$, $\lambda_{i_2}^c = \int_{\nu_{l^*}}^{\nu_{l^*+1}} \lambda^c(t; S_{\nu_{l^*}:\nu_{l^*+1}} = i_2)\,dt$, $\hat{\lambda}_{i_1}^c = \int_{\nu_{l^*+1}}^{\nu_{l^*+2}} \lambda^c(t; S_{\nu_{l^*+1}:\nu_{l^*+2}} = i_1)\,dt$, and $\lambda_{i_1}^{\prime c} = \int_{\nu_{l^*-1}}^{\nu_{l^*+2}} \lambda^c(t; S_{\nu_{l^*-1}:\nu_{l^*+2}} = i_1)\,dt$, and let $y_{i_1}^c$, $y_{i_2}^c$, $\hat{y}_{i_1}^c$, and $y_{i_1}^{\prime c}$ denote the numbers of spikes of spike train c within the intervals [νl*−1, νl*], [νl*, νl*+1], [νl*+1, νl*+2], and [νl*−1, νl*+2], respectively. Note that τ′ can be obtained from an invertible deterministic function $\mathcal{G}$(τ) = (τ0, …, τl*−2, τl*−1 + τl* + τl*+1, τl*+2, …, τn), and $|\mathcal{J}| = 1$. The probability ratios are given as

$\mathcal{R}_1^{(3)} = \frac{\tilde{p}(\theta_{i_1}; \tau_{l^*-1} + \tau_{l^*} + \tau_{l^*+1})}{\tilde{p}(\theta_{i_1}; \tau_{l^*-1})\, \tilde{p}(\theta_{i_2}; \tau_{l^*})\, \tilde{p}(\theta_{i_1}; \tau_{l^*+1})}$
$\mathcal{R}_2^{(3)} = \prod_{c=1}^{C}\left\{\exp\left(\lambda_{i_1}^c + \lambda_{i_2}^c + \hat{\lambda}_{i_1}^c - \lambda_{i_1}^{\prime c}\right) \times \frac{\prod_{k=1}^{y_{i_1}^{\prime c}} \lambda^c(t_k, S_{\nu_{l^*-1}:\nu_{l^*+2}})}{\prod_{k=1}^{y_{i_1}^c} \lambda^c(t_k, S_{\nu_{l^*-1}:\nu_{l^*}}) \prod_{k=1}^{y_{i_2}^c} \lambda^c(t_k, S_{\nu_{l^*}:\nu_{l^*+1}}) \prod_{k=1}^{\hat{y}_{i_1}^c} \lambda^c(t_k, S_{\nu_{l^*+1}:\nu_{l^*+2}})}\right\}$
$\mathcal{R}_3^{(3)} = \frac{\frac{q_2}{n+1} \cdot \frac{1}{\tau_{l^*-1} + \tau_{l^*} + \tau_{l^*+1}}\, \tilde{p}_v(\tau_{l^*} \mid \tau_{l^*+1})}{\frac{q_3}{n+1}} = \frac{\tilde{p}_v(\tau_{l^*} \mid \tau_{l^*+1})}{\tau_{l^*-1} + \tau_{l^*} + \tau_{l^*+1}},$

where q2 = q3, and $\tilde{p}_v(\tau_{l^*} \mid \tau_{l^*+1})$ is determined from

$\tilde{p}_v(\tau_{l^*} \mid \tau_{l^*+1}) = \frac{p(\theta_{i_2}; \tau_{l^*})}{\int_{a_{i_2}}^{\tau_{l^*} + \tau_{l^*+1}} p(\theta_{i_2}; z)\,dz}.$

A.3.4 Move Type 4

Let i1 = j and i2 = i, and let $\lambda_{i_2}^c = \int_{0}^{\nu_1} \lambda^c(t; S_{0:\nu_1} = i_2)\,dt$, $\lambda_{i_1}^{\prime c} = \int_{0}^{u} \lambda^c(t; S_{0:u} = i_1)\,dt$, and $\lambda_{i_2}^{\prime c} = \int_{u}^{\nu_1} \lambda^c(t; S_{u:\nu_1} = i_2)\,dt$, and let $y_{i_2}^c$, $y_{i_1}^{\prime c}$, and $y_{i_2}^{\prime c}$ denote the numbers of spikes of spike train c observed within the intervals [0, ν1], [0, u], and [u, ν1], respectively. Let $\pi_{i_1}$ and $\pi_{i_2}$ denote the prior probabilities of the initial sojourn being in states i1 and i2, respectively. Then we have

$\mathcal{R}_1^{(4)} = \frac{\pi_{i_1}}{\pi_{i_2}} \cdot \frac{\tilde{p}(\theta_{i_1}; u)\, \tilde{p}(\theta_{i_2}; \tau_0 - u)}{\tilde{p}(\theta_{i_2}; \tau_0)}$
$\mathcal{R}_2^{(4)} = \prod_{c=1}^{C}\left\{\exp\left(\lambda_{i_2}^c - \lambda_{i_1}^{\prime c} - \lambda_{i_2}^{\prime c}\right) \cdot \frac{\prod_{k=1}^{y_{i_1}^{\prime c}} \lambda^c(t_k, S_{0:u}) \prod_{k=1}^{y_{i_2}^{\prime c}} \lambda^c(t_k, S_{u:\nu_1})}{\prod_{k=1}^{y_{i_2}^c} \lambda^c(t_k, S_{0:\nu_1})}\right\}$
$\mathcal{R}_3^{(4)} = \frac{q_5}{q_4 \cdot \frac{1}{\tau_0 - a_{i_2} - a_{i_1}}} = \tau_0 - a_{i_2} - a_{i_1},$

where $q_4 = q_B\tilde{P}_2$ and $q_5 = q_C\tilde{P}_2$, so q4 = q5 (since qB = qC). Note that τ′ is obtained from τ through an invertible deterministic function τ′ ← $\mathcal{G}$(τ, u) = (u, τ0 − u, τ1, τ2, …, τn). Then it follows that

$|\mathcal{J}| = \left|\det\begin{pmatrix} \frac{\partial u}{\partial \tau_0} & \frac{\partial u}{\partial u} \\ \frac{\partial(\tau_0 - u)}{\partial \tau_0} & \frac{\partial(\tau_0 - u)}{\partial u} \end{pmatrix}\right| = \left|\det\begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix}\right| = 1.$

A.3.5 Move Type 5

Moves 5 and 4 are inverses of each other. Let i1 = χ0 and i2 = χ1, and let $\lambda_{i_1}^c = \int_{0}^{\nu_1} \lambda^c(t; S_{0:\nu_1} = i_1)\,dt$, $\lambda_{i_2}^c = \int_{\nu_1}^{\nu_2} \lambda^c(t; S_{\nu_1:\nu_2} = i_2)\,dt$, and $\lambda_{i_2}^{\prime c} = \int_{0}^{\nu_2} \lambda^c(t; S_{0:\nu_2} = i_2)\,dt$, and let $y_{i_1}^c$, $y_{i_2}^c$, and $y_{i_2}^{\prime c}$ denote the observed numbers of spikes of spike train c within the intervals [0, ν1], [ν1, ν2], and [0, ν2], respectively. Then we have

$\mathcal{R}_1^{(5)} = \frac{\pi_{i_2}}{\pi_{i_1}} \cdot \frac{\tilde{p}(\theta_{i_2}; \tau_0 + \tau_1)}{\tilde{p}(\theta_{i_1}; \tau_0)\, \tilde{p}(\theta_{i_2}; \tau_1)}$
$\mathcal{R}_2^{(5)} = \prod_{c=1}^{C}\left\{\exp\left(\lambda_{i_1}^c + \lambda_{i_2}^c - \lambda_{i_2}^{\prime c}\right) \cdot \frac{\prod_{k=1}^{y_{i_2}^{\prime c}} \lambda^c(t_k, S_{0:\nu_2})}{\prod_{k=1}^{y_{i_1}^c} \lambda^c(t_k, S_{0:\nu_1}) \prod_{k=1}^{y_{i_2}^c} \lambda^c(t_k, S_{\nu_1:\nu_2})}\right\}$
$\mathcal{R}_3^{(5)} = \frac{q_4 \cdot \frac{1}{\tau_0 + \tau_1}}{q_5} = \frac{1}{\tau_0 + \tau_1},$

where q4 = q5. Similarly, τ′ ← $\mathcal{G}$(τ) = (τ0 + τ1, τ2, …, τn), and $|\mathcal{J}| = 1$.

A.3.6 Move Type 6

Let i1 = i and i2 = j, and let $\lambda_{i_1}^c = \int_{\nu_n}^{\nu_{n+1}} \lambda^c(t; S_{\nu_n:\nu_{n+1}} = i_1)\,dt$, $\lambda_{i_1}^{\prime c} = \int_{\nu_n}^{\nu_n+u} \lambda^c(t; S_{\nu_n:\nu_n+u} = i_1)\,dt$, and $\lambda_{i_2}^{\prime c} = \int_{\nu_n+u}^{\nu_{n+1}} \lambda^c(t; S_{\nu_n+u:\nu_{n+1}} = i_2)\,dt$, and let $y_{i_1}^c$, $y_{i_1}^{\prime c}$, and $y_{i_2}^{\prime c}$ denote the numbers of spikes of spike train c observed within the intervals [νn, νn+1], [νn, νn + u], and [νn + u, νn+1], respectively. Then we have

$$\mathcal{R}_1^{(6)} = \frac{p(\theta_{i_1}; u)\, p(\theta_{i_2}; \tau_n - u)}{p(\theta_{i_1}; \tau_n)}$$
$$\mathcal{R}_2^{(6)} = \prod_{c=1}^{C} \left\{ \exp\!\big(\lambda_{i_1}^c - \lambda_{i_1}'^c - \lambda_{i_2}^c\big) \times \frac{\prod_{k=1}^{y_{i_1}'^c} \lambda^c(t_k, S_{\nu_n:\nu_n+u}) \prod_{k=1}^{y_{i_2}^c} \lambda^c(t_k, S_{\nu_n+u:\nu_{n+1}})}{\prod_{k=1}^{y_{i_1}^c} \lambda^c(t_k, S_{\nu_n:\nu_{n+1}})} \right\}$$
$$\mathcal{R}_3^{(6)} = \frac{q_7}{q_6 \cdot \frac{1}{\tau_n - a_{i_1}}} = \tau_n - a_{i_1},$$

where $q_6 = q_7$. Similarly, $\tau' = g(\tau, u) = (\tau_0, \tau_1, \ldots, \tau_{n-1}, u, \tau_n - u)$, and

$$|J| = \left| \frac{\partial\,(u,\; \tau_n - u)}{\partial\,(u,\; \tau_n)} \right| = \left| \det \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} \right| = 1.$$

A.3.7 Move Type 7

Moves 7 and 6 are inverses of each other. Let $i_1 = \chi_{n-1}$, $i_2 = \chi_n$, $\lambda_{i_1}^c = \int_{\nu_{n-1}}^{\nu_n} \lambda^c(t; S_{\nu_{n-1}:\nu_n} = i_1)\,dt$, $\lambda_{i_2}^c = \int_{\nu_n}^{\nu_{n+1}} \lambda^c(t; S_{\nu_n:\nu_{n+1}} = i_2)\,dt$, $\lambda_{i_1}'^c = \int_{\nu_{n-1}}^{\nu_{n+1}} \lambda^c(t; S_{\nu_{n-1}:\nu_{n+1}} = i_1)\,dt$, and let $y_{i_1}^c$, $y_{i_2}^c$, and $y_{i_1}'^c$ denote the observed numbers of spike counts for spike train $c$ within the intervals $[\nu_{n-1}, \nu_n]$, $[\nu_n, \nu_{n+1}]$, and $[\nu_{n-1}, \nu_{n+1}]$, respectively. Then we have

$$\mathcal{R}_1^{(7)} = \frac{p(\theta_{i_1}; \tau_{n-1} + \tau_n)}{p(\theta_{i_1}; \tau_{n-1})\, p(\theta_{i_2}; \tau_n)}$$
$$\mathcal{R}_2^{(7)} = \prod_{c=1}^{C} \left\{ \exp\!\big(\lambda_{i_1}^c + \lambda_{i_2}^c - \lambda_{i_1}'^c\big) \times \frac{\prod_{k=1}^{y_{i_1}'^c} \lambda^c(t_k, S_{\nu_{n-1}:\nu_{n+1}})}{\prod_{k=1}^{y_{i_1}^c} \lambda^c(t_k, S_{\nu_{n-1}:\nu_n}) \prod_{k=1}^{y_{i_2}^c} \lambda^c(t_k, S_{\nu_n:\nu_{n+1}})} \right\}$$
$$\mathcal{R}_3^{(7)} = \frac{q_6 \cdot \frac{1}{\tau_{n-1} + \tau_n}}{q_7} = \frac{1}{\tau_{n-1} + \tau_n},$$

where $q_6 = q_7$. In addition, we have $\tau' = g(\tau) = (\tau_0, \ldots, \tau_{n-2}, \tau_{n-1} + \tau_n)$ and $|J| = 1$.

A.4 Heuristics for Efficient RJMCMC Sampling

The experimental recordings are relatively long (15 to 30 minutes, depending on the rat and recording day), and MCMC sampling for the continuous-time model (with a 1 ms bin size) is quite time-consuming, so an efficient, problem-specific sampling procedure is needed. One important issue is the initialization of the state sequence; as discussed earlier, this is obtained from the estimate produced by the discrete-time HMM. Another is the design of data-driven MCMC proposals (e.g., Tu & Zhu, 2002) that "intelligently" select moves while still satisfying the detailed balance condition. Specifically, we use a few heuristics in carrying out RJMCMC sampling:

  • Move type 1: Given a reasonably initialized state, use option 2 instead of option 1.

  • Move type 2: Apply it preferentially to long sojourns, and execute it only for sojourns whose duration is at least four times the minimum length.

  • Move type 3: Apply it preferentially to short sojourns.

  • Move type 4: Execute it only when the initial sojourn duration is at least four times the minimum length.

  • Move type 6: Execute it only when the final sojourn duration is at least four times the minimum length.

As far as the current UP and DOWN state estimation problem is concerned, move types 1 and 3 are the most important. When implementing move type 3, we employ a heuristic importance sampling trick: the probability of choosing a sojourn to merge with its neighbors is inversely proportional to the sojourn length, so the shorter the duration, the more likely it is to be selected for merging. The same trick can be used in move type 2 to determine where to split a DOWN state sojourn, with the probability of a candidate split position inversely proportional to the observed instantaneous MUA spike count. A minimal sketch of the merge-selection rule follows.
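The sketch below is our own illustration (function and variable names are not from the article). It draws a merge candidate with probability inversely proportional to its duration; note that the selection probability must also enter the proposal ratio $\mathcal{R}_3$ so that detailed balance is preserved.

    import numpy as np

    def sample_merge_candidate(durations, rng):
        """Draw a sojourn index for move type 3 with probability inversely
        proportional to its duration (shorter sojourns are favored)."""
        w = 1.0 / np.asarray(durations, dtype=float)
        p = w / w.sum()
        idx = rng.choice(len(durations), p=p)
        return idx, p[idx]  # p[idx] enters the proposal ratio

    rng = np.random.default_rng(0)
    durations = [0.85, 0.04, 0.32, 0.11]  # sojourn lengths in seconds (made up)
    print(sample_merge_candidate(durations, rng))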

A.5 Special Example

In what follows, we work out a special example in which the sojourn durations of both the UP and DOWN states are modeled by the censored exponential distribution given in equation 3.12. This example can be viewed as a special case of the result of Ball et al. (1999), in which no constraint was imposed on the pdf. Let $r_0$ and $r_1$ denote the rate parameters of the exponential distributions for the DOWN and UP states, respectively, and let $a_0 > 0$ and $a_1 > 0$ denote the corresponding lower bounds on the sojourn durations. The probability ratios $\mathcal{R}_1$ and $\mathcal{R}_3$ for the seven move types are as follows:

  • Move type 1:
    $$\mathcal{R}_1^{(1)} = \frac{c_1 r_{i_1} \exp(-r_{i_1} u)\, c_2 r_{i_2} \exp[-r_{i_2}(\tau_{j+1} + \tau_j - u)]}{c_3 r_{i_1} \exp(-r_{i_1} \tau_j)\, c_4 r_{i_2} \exp(-r_{i_2} \tau_{j+1})} = \frac{c_1 c_2 \exp[(r_{i_1} - r_{i_2})(\tau_j - u)]}{c_3 c_4} \quad (\text{option 1})$$
    $$\mathcal{R}_1^{(1)} = \frac{c_5 r_{i_1} \exp[-r_{i_1}(\tau_j + u)]\, c_6 r_{i_2} \exp[-r_{i_2}(\tau_{j+1} - u)]}{c_3 r_{i_1} \exp(-r_{i_1} \tau_j)\, c_4 r_{i_2} \exp(-r_{i_2} \tau_{j+1})} = \frac{c_5 c_6 \exp[-(r_{i_1} - r_{i_2}) u]}{c_3 c_4} \quad (\text{option 2})$$
    $$\mathcal{R}_3^{(1)} = 1 \quad (\text{options 1 and 2}),$$
    where $c_1, c_2, c_3, c_4, c_5, c_6$ are the normalizing constants (details are omitted here; see the description after equation 3.13).
  • Move type 2:
    $$\mathcal{R}_1^{(2)} = \frac{c_1 r_{i_1} \exp(-r_{i_1} u)\, c_2 r_{i_2} \exp(-r_{i_2} v)\, c_3 r_{i_1} \exp[-r_{i_1}(\tau_{l^*} - u - v)]}{c_4 r_{i_1} \exp(-r_{i_1} \tau_{l^*})} = \frac{c_1 c_2 c_3}{c_4}\, r_{i_1} r_{i_2} \exp[(r_{i_1} - r_{i_2}) v]$$
    $$\mathcal{R}_3^{(2)} = \frac{(\tau_{l^*} - 2a_{i_1})\big(\exp(-r_{i_2} a_{i_2}) - \exp[-r_{i_2}(\tau_{l^*} - u - a_{i_1})]\big)}{r_{i_2} \exp(-r_{i_2} v)},$$
    where $c_1, c_2, c_3, c_4$ are the normalizing constants.
  • Move type 3:
    $$\mathcal{R}_1^{(3)} = \frac{c_1 r_{i_1} \exp[-r_{i_1}(\tau_{l^*-1} + \tau_{l^*} + \tau_{l^*+1})]}{c_2 r_{i_1} \exp(-r_{i_1} \tau_{l^*-1})\, c_3 r_{i_2} \exp(-r_{i_2} \tau_{l^*})\, c_4 r_{i_1} \exp(-r_{i_1} \tau_{l^*+1})} = \frac{c_1 \exp[(r_{i_2} - r_{i_1}) \tau_{l^*}]}{c_2 c_3 c_4\, r_{i_1} r_{i_2}}$$
    $$\mathcal{R}_3^{(3)} = \frac{r_{i_2} \exp(-r_{i_2} \tau_{l^*})}{(\tau_{l^*-1} + \tau_{l^*} + \tau_{l^*+1})\big(\exp(-r_{i_2} a_{i_2}) - \exp[-r_{i_2}(\tau_{l^*} + \tau_{l^*+1})]\big)},$$
    where $c_1, c_2, c_3, c_4$ are the normalizing constants.
  • Move type 4:
    $$\mathcal{R}_1^{(4)} = \frac{\pi_{i_1}}{\pi_{i_2}} \cdot \frac{c_1 r_{i_1} \exp(-r_{i_1} u)\, c_2 r_{i_2} \exp[-r_{i_2}(\tau_0 - u)]}{c_3 r_{i_2} \exp(-r_{i_2} \tau_0)} = \frac{\pi_{i_1}}{\pi_{i_2}} \cdot \frac{c_1 c_2\, r_{i_1} \exp[(r_{i_2} - r_{i_1}) u]}{c_3}$$
    $$\mathcal{R}_3^{(4)} = \tau_0 - a_{i_1} - a_{i_2},$$
    where $c_1, c_2, c_3$ are the normalizing constants.
  • Move type 5:
    $$\mathcal{R}_1^{(5)} = \frac{\pi_{i_2}}{\pi_{i_1}} \cdot \frac{c_1 r_{i_2} \exp[-r_{i_2}(\tau_0 + \tau_1)]}{c_2 r_{i_1} \exp(-r_{i_1} \tau_0)\, c_3 r_{i_2} \exp(-r_{i_2} \tau_1)} = \frac{\pi_{i_2}}{\pi_{i_1}} \cdot \frac{c_1 \exp[(r_{i_1} - r_{i_2}) \tau_0]}{c_2 c_3\, r_{i_1}}$$
    $$\mathcal{R}_3^{(5)} = \frac{1}{\tau_0 + \tau_1},$$
    where $c_1, c_2, c_3$ are the normalizing constants.
  • Move type 6:
    $$\mathcal{R}_1^{(6)} = \frac{c_1 r_{i_1} \exp(-r_{i_1} u)\, c_2 r_{i_2} \exp[-r_{i_2}(\tau_n - u)]}{c_3 r_{i_1} \exp(-r_{i_1} \tau_n)} = \frac{c_1 c_2\, r_{i_2} \exp[(r_{i_1} - r_{i_2})(\tau_n - u)]}{c_3}$$
    $$\mathcal{R}_3^{(6)} = \tau_n - a_{i_1},$$
    where $c_1, c_2, c_3$ are the normalizing constants.
  • Move type 7:
    $$\mathcal{R}_1^{(7)} = \frac{c_1 r_{i_1} \exp[-r_{i_1}(\tau_{n-1} + \tau_n)]}{c_2 r_{i_1} \exp(-r_{i_1} \tau_{n-1})\, c_3 r_{i_2} \exp(-r_{i_2} \tau_n)} = \frac{c_1 \exp[(r_{i_2} - r_{i_1}) \tau_n]}{c_2 c_3\, r_{i_2}}$$
    $$\mathcal{R}_3^{(7)} = \frac{1}{\tau_{n-1} + \tau_n},$$
    where $c_1, c_2, c_3$ are the normalizing constants.
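As a numerical illustration of the move type 2 expressions above, the following Python sketch (our own code, not the authors' implementation; all parameter values are invented, and the censored-exponential normalizing constants are written out explicitly) evaluates $\mathcal{R}_1^{(2)}$ and $\mathcal{R}_3^{(2)}$.

    import numpy as np

    def censored_exp_pdf(tau, r, a):
        """Censored exponential of equation 3.12: rate-r exponential renormalized
        so that its support is tau >= a (normalizing constant exp(r*a))."""
        return r * np.exp(-r * (tau - a)) if tau >= a else 0.0

    def move2_ratios(tau_l, u, v, r1, r2, a1, a2):
        """Prior ratio R1 and proposal ratio R3 for splitting a state-i1 sojourn
        of length tau_l into (u, v, tau_l - u - v), with v in state i2."""
        R1 = (censored_exp_pdf(u, r1, a1) * censored_exp_pdf(v, r2, a2)
              * censored_exp_pdf(tau_l - u - v, r1, a1)
              / censored_exp_pdf(tau_l, r1, a1))
        # p(v | u): exponential truncated to the admissible range [a2, tau_l - u - a1].
        Z = np.exp(-r2 * a2) - np.exp(-r2 * (tau_l - u - a1))
        R3 = (tau_l - 2 * a1) * Z / (r2 * np.exp(-r2 * v))
        return R1, R3

    print(move2_ratios(tau_l=1.0, u=0.3, v=0.2, r1=2.0, r2=5.0, a1=0.05, a2=0.05))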

Appendix B: Threshold-Based Method for Classifying UP and DOWN States

The standard threshold-based method for determining UP and DOWN states from MUA spike trains (Ji & Wilson, 2007) consists of three major steps.

First, we bin the spike trains into 10 ms windows and compute the raw spike count in each bin. The raw count signal is smoothed with a gaussian window (SD of 30 ms) to obtain a smoothed count signal over time. We then find the first local minimum of the smoothed spike count histogram during SWS and take it as the count threshold. Because the spike counts have been smoothed, the count threshold may be a noninteger value.

Second, based on the count threshold, we label each 10 ms bin as active (set to 1) or silent (set to 0). Next, the durations of all silent periods are computed, and we find the first local minimum of the histogram of silent-period durations, which serves as the gap threshold.

Third, based on the gap threshold, we merge active periods separated by silent periods shorter than the gap threshold. The resulting active and silent periods are classified as UP and DOWN states, respectively. Finally, we recompute the durations of all UP and DOWN state periods and compute their respective histograms and sample statistics (min, max, median, mean, SD).

In summary, the choices of the spike count threshold and the gap threshold directly influence the UP and DOWN state classification and the resulting statistics (duration lengths and occurrence frequencies). However, the choice of these two hand-tuned parameters is rather ad hoc and depends on several factors (e.g., kernel smoothing, bin size; see Figure 16 for an illustration). In some cases no minimum can be found in the smoothed histogram, and the choice of threshold is then problematic. Note also that the procedure must be repeated for each data set before UP and DOWN state statistics can be compared across data sets.
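For concreteness, here is a compact reimplementation sketch of the three steps (our own code under the stated assumptions, not the authors' implementation; the "first local minimum" rule is realized by a simple helper that fails loudly when no minimum exists, mirroring the caveat above).

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def first_local_min(hist, centers):
        """Location of the first local minimum of a histogram; raises if none exists."""
        for k in range(1, len(hist) - 1):
            if hist[k] < hist[k - 1] and hist[k] <= hist[k + 1]:
                return centers[k]
        raise ValueError("no local minimum found; threshold choice is problematic")

    def classify_up_down(spike_times_ms, t_end_ms, bin_ms=10.0, sd_ms=30.0):
        # Step 1: 10 ms binning and 30 ms SD gaussian smoothing of the MUA counts.
        edges = np.arange(0.0, t_end_ms + bin_ms, bin_ms)
        counts, _ = np.histogram(spike_times_ms, bins=edges)
        smoothed = gaussian_filter1d(counts.astype(float), sigma=sd_ms / bin_ms)
        h, e = np.histogram(smoothed, bins=50)
        count_thr = first_local_min(h, 0.5 * (e[:-1] + e[1:]))

        # Step 2: active/silent labeling and the silent-gap duration histogram.
        active = smoothed > count_thr
        gaps, run = [], 0          # silent-period durations in ms
        for a in active:
            if not a:
                run += 1
            else:
                if run > 0:
                    gaps.append(run * bin_ms)
                run = 0
        if run > 0:
            gaps.append(run * bin_ms)
        gh, ge = np.histogram(gaps, bins=30)
        gap_thr = first_local_min(gh, 0.5 * (ge[:-1] + ge[1:]))

        # Step 3: merge active periods separated by gaps shorter than gap_thr.
        up = active.copy()
        i, n = 0, len(up)
        while i < n:
            if up[i]:
                i += 1
                continue
            j = i
            while j < n and not up[j]:
                j += 1
            if i > 0 and j < n and (j - i) * bin_ms < gap_thr:
                up[i:j] = True     # interior gap shorter than threshold: merge
            i = j
        return up, count_thr, gap_thr  # up[k] is True for UP-state bins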

Figure 16. Threshold-based method for estimating the median duration of the UP and DOWN states (data from the same rat on a different day), in which the two thresholds are chosen by grid search. The abscissa represents the gap threshold (in milliseconds), and the ordinate represents the smoothed spike count threshold. The map units are seconds.

Footnotes

1. The neuronal state can also refer to the single-cell level, at which neurons exhibit depolarizing shifts of different durations (e.g., Fujisawa, Matsuki, & Ikegaya, 2005).

2. The CIF is also known as the hazard rate function in survival analysis. The value $\lambda(t)\Delta$ measures the probability of the failure or death of an event in $[t, t + \Delta)$ given that the process has survived up to time $t$.

3. In the worst case, the complexity is $O(NT(N + T))$ in time, in contrast to $O(N^2 T)$ for the HMM, where $N$ denotes the number of discrete states and $T$ denotes the total sequence length.

4. The term emission probability arose in the HMM literature in the context of speech recognition; it refers to the probability of observing a (finite-alphabet) symbol given a hidden state.

5. Here we assume that the individual CIF $\lambda^c(t)$ can be modeled as a GLM with $\log(\cdot)$ as the link function (Truccolo et al., 2005; Paninski, Pillow, & Lewi, 2007). One can also use other candidate link functions, such as $\log(1 + \exp(\cdot))$ or a sigmoidal (bell-shaped) function, whichever better reflects the neuron's firing properties (e.g., Paninski, 2004). The choice of the functional form of the CIF does not affect the inference principle or procedure described below.
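For example (a sketch with made-up coefficients, not the model fitted in this article), a log-link CIF with a hidden-state covariate and ensemble history covariates could be evaluated as follows.

    import numpy as np

    def cif_log_link(state, history_counts, mu, alpha, beta):
        """CIF under a log link: lambda = exp(mu + alpha*state + beta.history),
        where `state` is the hidden UP/DOWN indicator (1 = UP, 0 = DOWN) and
        `history_counts` holds recent ensemble spike counts (most recent first)."""
        return np.exp(mu + alpha * state + np.dot(beta, history_counts))

    # Made-up parameters: baseline, state modulation, and two history weights.
    print(cif_log_link(1, np.array([3.0, 1.0]), mu=-2.0, alpha=1.5,
                       beta=np.array([0.1, 0.05])))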

6. In addition to the compact representation, another main reason for this reformulation is the efficiency and stability of the numerical computation of the observed data likelihood or likelihood ratio.

7. The exponential distribution with mean $1/\lambda$ is the maximum entropy distribution among all continuous distributions with nonnegative support that have mean $1/\lambda$.

8. Censoring is a term used in statistics for the condition in which the value of an observation is only partially known or falls outside the range of the measuring instrument.

9. Alternatively, one can use a discrete nonparametric probability model for the sojourn time, which is discussed in section 5.

10. Note that $\hat{S}(t)$ is not the same as $\chi_l$ ($l = 1, 2, \ldots, n$): the former is stochastic, whereas the latter is deterministic.

11. The sleep stage classification was based on recordings of electromyography (EMG) and hippocampal and cortical EEGs (ripple power, theta and delta power). SWS is characterized by low EMG, high hippocampal ripple power, low hippocampal theta (4–8 Hz) power, and high cortical delta (2–4 Hz) power. In practice, we varied the cutoff thresholds of these parameters (via grid search) to obtain a suboptimal SWS classification for each rat.

12. The model was empirically verified by model selection based on GLM fits of a small data set using the glmfit function in Matlab. The model with the lowest deviance (defined as twice the negative log likelihood) or the smallest Akaike information criterion (AIC) value is chosen.
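An analogous check in Python (a sketch using statsmodels in place of Matlab's glmfit; the data and covariates below are placeholders, not the recorded spike trains) might read:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 500
    y = rng.poisson(2.0, size=n)                        # placeholder spike counts
    X_state = sm.add_constant(rng.normal(size=(n, 1)))  # state covariate only
    X_full = sm.add_constant(rng.normal(size=(n, 3)))   # state + history covariates

    for X in (X_state, X_full):
        res = sm.GLM(y, X, family=sm.families.Poisson()).fit()
        print(res.deviance, res.aic)  # choose the model with the smallest value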

13. A sensible initial parameter value is an important factor in obtaining fast convergence of the simulated Markov chain. In practice, one can fit a GLM to a small spike train data set with preidentified hidden state values.

14. We computed the mean and variance of the spike counts over all 10 ms time bins and obtained a mean of 2.42 and a variance of 4.45. The discrepancy between the mean and variance statistics suggests that an inhomogeneous Poisson probability model is inaccurate, and this fact motivated us to include history-dependent covariates in the rate parameter.
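This dispersion check is a two-line computation (a sketch; `counts` stands in for the 10 ms binned MUA spike counts, here simulated):

    import numpy as np

    counts = np.random.poisson(2.42, size=100000)  # placeholder for binned MUA counts
    print(counts.mean(), counts.var())  # reported data: mean 2.42, variance 4.45
    # A variance well above the mean (Fano factor > 1) indicates overdispersion
    # relative to a Poisson model; the synthetic Poisson draw above gives a
    # factor near 1, unlike the recorded data.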

15. A video demo is provided online (https://neurostat.mit.edu/zhechen/UpDownDemo.avi) for readers interested in a detailed comparison of results for a selected 5 minute recording.

16. The cortical EEG averages show characteristic waveforms when triggered on the start and end times of the UP state; furthermore, ripple events (150–300 Hz) occur much more frequently during the UP state (Ji & Wilson, 2007).

17. In our Matlab (version 7.0) implementation, this amounts to roughly 15 hours of CPU time on a personal computer equipped with an Intel Core 2 Duo processor.

18. This is in contrast to anesthetized animals, in which the DOWN states occupy a larger fraction of time than the UP states.

19. Recently, complementary work has been reported on a self-organized recurrent network model of excitatory and inhibitory neurons for spontaneous UP and DOWN state transitions (Kang, Kitano, & Fukai, 2008).

20. Given a reasonably accurate initial state, the second sampling option is more efficient since its rejection rate is lower. For the current continuous-time estimation problem, the SD parameter $\sigma$ in the second sampling option is chosen to be 2 ms, twice the bin size.

21. To avoid numerical problems, it is more convenient to calculate the ratios in the logarithm domain.
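For instance (a sketch; the function and variable names are ours), the Hastings ratio can be assembled as a sum of log terms and exponentiated only implicitly at the accept/reject step:

    import numpy as np

    def mh_accept(log_R1, log_R2, log_R3, log_abs_J, rng):
        """Metropolis-Hastings acceptance with the ratio computed in the log domain."""
        log_ratio = log_R1 + log_R2 + log_R3 + log_abs_J
        return np.log(rng.uniform()) < min(0.0, log_ratio)

    rng = np.random.default_rng(1)
    print(mh_accept(-0.7, 2.3, -1.1, 0.0, rng))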

Contributor Information

Zhe Chen, Email: zhechen@mit.edu, Neuroscience Statistics Research Laboratory, Department of Anesthesia and Critical Care, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, U.S.A., and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.

Sujith Vijayan, Email: vijayan@post.harvard.edu, Program in Neuroscience, Harvard University, Cambridge, MA 02139, U.S.A., and Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.

Riccardo Barbieri, Email: barbieri@neurostat.mit.edu, Neuroscience Statistics Research Laboratory, Department of Anesthesia and Critical Care, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, U.S.A.

Matthew A. Wilson, Email: mwilson@mit.edu, Picower Institute for Learning and Memory, RIKEN-MIT Neuroscience Research Center, Department of Brain and Cognitive Sciences and Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.

Emery N. Brown, Email: enb@neurostat.mit.edu, Neuroscience Statistics Research Laboratory, Department of Anesthesia and Critical Care, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, U.S.A., Harvard-MIT Division of Health Sciences and Technology, Cambridge, MA 02139, U.S.A., and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.

References

  1. Abeles M, Bergman H, Gat I, Meilijson I, Seidemann E, Tishby N, et al. Cortical activity flips among quasi-stationary states. Proc Natl Acad Sci USA. 1995;92:8616–8620. doi: 10.1073/pnas.92.19.8616.
  2. Achtman N, Afshar A, Santhanam G, Yu BM, Ryu SI, Shenoy KV. Free-paced high-performance brain-computer interfaces. Journal of Neural Engineering. 2007;4:336–347. doi: 10.1088/1741-2560/4/3/018.
  3. Albert PS. A two-state Markov mixture model for a time series of epileptic seizure counts. Biometrics. 1991;47:1371–1381.
  4. Ball FG, Cai Y, Kadane JB, O'Hagan A. Bayesian inference for ion-channel gating mechanisms directly from single-channel recordings, using Markov chain Monte Carlo. Proceedings of the Royal Society of London, A. 1999;455:2879–2932.
  5. Bannerjee AK, Bhattacharyya GK. Bayesian results for the inverse gaussian distribution with an application. Technometrics. 1979;21(2):247–251.
  6. Barbieri R, Quirk MC, Frank LM, Wilson MA, Brown EN. Construction and analysis of non-Poisson stimulus response models of neural spike train activity. Journal of Neuroscience Methods. 2001;105:25–37. doi: 10.1016/s0165-0270(00)00344-7.
  7. Battaglia FP, Sutherland GR, McNaughton BL. Hippocampal sharp wave bursts coincide with neocortical "up-state" transitions. Learning and Memory. 2004;11:697–704. doi: 10.1101/lm.73504.
  8. Baum L. An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. Inequalities. 1972;3:1–8.
  9. Baum LE, Petrie T, Soules G, Weiss N. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics. 1970;41(1):164–171.
  10. Bellman R. Dynamic programming. Princeton, NJ: Princeton University Press; 1957.
  11. Brillinger DR. Maximum likelihood analysis of spike trains of interacting nerve cells. Biological Cybernetics. 1988;59:189–200. doi: 10.1007/BF00318010.
  12. Brooks S, Giudici P, Roberts G. Efficient construction of reversible jump MCMC proposal distributions (with discussion). Journal of the Royal Statistical Society, Series B. 2003;65(1):3–56.
  13. Brown EN. Theory of point processes for neural systems. In: Chow CC, Gutkin B, Hansel D, Meunier C, Dalibard J, editors. Methods and models in neurophysics. Amsterdam: Elsevier; 2005. pp. 691–727.
  14. Brown EN, Barbieri R, Eden UT, Frank LM. Likelihood methods for neural data analysis. In: Feng J, editor. Computational neuroscience: A comprehensive approach. London: CRC Press; 2003. pp. 253–286.
  15. Brown EN, Barbieri R, Ventura V, Kass RE, Frank LM. The time-rescaling theorem and its application to neural spike data analysis. Neural Comput. 2002;14(2):325–346. doi: 10.1162/08997660252741149.
  16. Brown EN, Kass RE, Mitra PP. Multiple neural spike train data analysis: State-of-the-art and future challenges. Nature Neuroscience. 2004;7:456–461. doi: 10.1038/nn1228.
  17. Buzsáki G. Rhythms of the brain. New York: Oxford University Press; 2006.
  18. Chan KS, Ledolter J. Monte Carlo EM estimation for time series models involving counts. Journal of the American Statistical Association. 1995;90:242–252.
  19. Chung SH, Krishnamurthy V, Moore JB. Adaptive processing techniques based on hidden Markov models for characterizing very small channel currents buried in noise and deterministic interferences. Philosophical Transactions of the Royal Society, London B. 1991:357–384. doi: 10.1098/rstb.1991.0122.
  20. Coleman TP, Brown EN. A recursive filter algorithm for state estimation from simultaneously recorded continuous-valued, point process and binary observations. Proc. Asilomar Conference on Signals, Systems, and Computers; Piscataway, NJ: IEEE Press; 2006. pp. 1949–1953.
  21. Cowles MK, Carlin BP. Markov chain Monte Carlo convergence diagnostics: A comparative review. Journal of the American Statistical Association. 1996;91:883–904.
  22. Cox DR, Isham V. Point processes. London: Chapman & Hall; 1980.
  23. Daley D, Vere-Jones D. An introduction to the theory of point processes, Vol. 1: Elementary theory and methods. New York: Springer-Verlag; 2002.
  24. Danóczy MG, Hahnloser RHR. Efficient estimation of hidden state dynamics from spike trains. In: Weiss Y, Schölkopf B, Platt J, editors. Advances in neural information processing systems, 18. Cambridge, MA: MIT Press; 2006. pp. 227–234.
  25. Dempster A, Laird N, Rubin D. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B. 1977;39(1):1–38.
  26. Deng L, Mark JW. Parameter estimation for Markov modulated Poisson processes via the EM algorithm with time discretization. Telecommunication Systems. 1993;1(1):321–338.
  27. Durbin R, Eddy S, Krogh A, Mitchison G. Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge: Cambridge University Press; 1998.
  28. Eden UT, Brown EN. Mixed observation filtering for neural data. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'08); Piscataway, NJ: IEEE Press; 2008. pp. 5201–5203.
  29. Ephraim Y, Merhav N. Hidden Markov processes. IEEE Transactions on Information Theory. 2002;48:1518–1569.
  30. Escola S, Paninski L. Hidden Markov models for the complex stimulus-response relationship of multi-state neurons. Conference abstract, Frontiers in Computational Neuroscience, Bernstein Symposium; 2008.
  31. Forney GD. The Viterbi algorithm. Proceedings of the IEEE. 1973;61(3):268–278.
  32. Fredkin DR, Rice JA. Maximum likelihood estimation and identification directly from single channel recordings. Proceedings of the Royal Society of London, B. 1992;249:125–132. doi: 10.1098/rspb.1992.0094.
  33. Fujisawa S, Matsuki N, Ikegaya Y. Single neurons can induce phase transitions of cortical recurrent networks with multiple internal states. Cerebral Cortex. 2005;16(5):639–654. doi: 10.1093/cercor/bhj010.
  34. Gat I, Tishby N, Abeles M. Hidden Markov modeling of simultaneously recorded cells in the associative cortex of behaving monkeys. Network: Computation in Neural Systems. 1997;8(3):297–322.
  35. Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Statistical Science. 1992;7:457–472.
  36. Geyer CJ. Practical Markov chain Monte Carlo. Statistical Science. 1992;7:473–511.
  37. Ghahramani Z, Hinton GE. Variational learning for switching state-space models. Neural Comput. 2000;12(4):831–864. doi: 10.1162/089976600300015619.
  38. Gilks WR, Richardson S, Spiegelhalter DJ, editors. Markov chain Monte Carlo in practice. London: Chapman & Hall/CRC; 1995.
  39. Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995;82:711–732.
  40. Guédon Y. Estimating hidden semi-Markov chains from discrete sequences. Journal of Computational and Graphical Statistics. 2003;12:604–639.
  41. Haider B, Duque A, Hasenstaub AR, McCormick DA. Neocortical network activity in vivo is generated through a dynamic balance of excitation and inhibition. Journal of Neuroscience. 2006;26:4535–4545. doi: 10.1523/JNEUROSCI.5297-05.2006.
  42. Haider B, Duque A, Hasenstaub AR, Yu Y, McCormick DA. Enhancement of visual responsiveness by spontaneous local network activity in vivo. Journal of Neurophysiology. 2007;97:4186–4202. doi: 10.1152/jn.01114.2006.
  43. Haslinger R, Ulbert I, Moore CI, Brown EN, Devor A. Analysis of LFP phase predicts sensory response of barrel cortex. Journal of Neurophysiology. 2006;96:1658–1663. doi: 10.1152/jn.01288.2005.
  44. Ji D, Wilson MA. Coordinated memory replay in the visual cortex and hippocampus during sleep. Nature Neuroscience. 2007;10(1):100–107. doi: 10.1038/nn1825.
  45. Jones LM, Fontanini A, Sadacca BF, Katz DB. Natural stimuli evoke dynamic sequences of states in sensory cortical ensembles. Proc Natl Acad Sci USA. 2007;104:18772–18777. doi: 10.1073/pnas.0705546104.
  46. Kang S, Kitano K, Fukai T. Structure of spontaneous UP and DOWN transitions self-organizing in a cortical network model. PLoS Computational Biology. 2008;4(3):e1000022. doi: 10.1371/journal.pcbi.1000022.
  47. Kass RE, Ventura V, Brown EN. Statistical issues in the analysis of neuronal data. Journal of Neurophysiology. 2005;94:8–25. doi: 10.1152/jn.00648.2004.
  48. Kemere C, Santhanam G, Yu BM, Afshar A, Ryu SI, Meng TH, et al. Detecting neural-state transitions using hidden Markov models for motor cortical prostheses. Journal of Neurophysiology. 2008;100:2441–2452. doi: 10.1152/jn.00924.2007.
  49. Le ND, Leroux BG, Puterman ML. Exact likelihood evaluation in a Markov mixture model for time series of seizure counts. Biometrics. 1992;48:317–323.
  50. Luczak A, Barthó P, Marguet SL, Buzsáki G, Harris KD. Sequential structure of neocortical spontaneous activity in vivo. Proc Natl Acad Sci USA. 2007;104:347–352. doi: 10.1073/pnas.0605643104.
  51. McCullagh P, Nelder J. Generalized linear models. London: Chapman & Hall; 1989.
  52. McLachlan GJ, Krishnan T. The EM algorithm and extensions. Hoboken, NJ: Wiley; 1996.
  53. Mehta MR. Cortico-hippocampal interaction during up-down states and memory consolidation. Nature Neuroscience. 2007;10(1):13–15. doi: 10.1038/nn0107-13.
  54. Paninski L. Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems. 2004;15:243–262.
  55. Paninski L, Pillow J, Lewi J. Statistical models for neural encoding, decoding, and optimal stimulus design. In: Cisek P, Drew T, Kalaska J, editors. Computational neuroscience: Theoretical insights into brain function. Amsterdam: Elsevier; 2007.
  56. Pawitan Y. In all likelihood: Statistical modelling and inference using likelihood. New York: Oxford University Press; 2001.
  57. Prerau MJ, Smith AC, Eden UT, Yanike M, Suzuki W, Brown EN. A mixed filter algorithm for cognitive state estimation from simultaneously recorded continuous-valued and binary measures of performance. Biological Cybernetics. 2008;99(1):1–14. doi: 10.1007/s00422-008-0227-z.
  58. Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE. 1989;77(2):257–286.
  59. Radons G, Becker JD, Dülfer B, Krüger J. Analysis, classification, and coding of multielectrode spike trains with hidden Markov models. Biological Cybernetics. 1994;71(4):359–373. doi: 10.1007/BF00239623.
  60. Robert CP, Rydén T, Titterington DM. Bayesian inference in hidden Markov models through the reversible jump Markov chain Monte Carlo method. Journal of the Royal Statistical Society, Series B. 2000;62:57–75.
  61. Roberts WJJ, Ephraim Y, Dieguez E. On Rydén's EM algorithm for estimating MMPPs. IEEE Signal Processing Letters. 2006;13(6):373–376.
  62. Rydén T. An EM algorithm for estimation in Markov-modulated Poisson processes. Computational Statistics and Data Analysis. 1996;21:431–447.
  63. Sanchez-Vives MV, McCormick DA. Cellular and network mechanisms of rhythmic recurrent activity in neocortex. Nature Neuroscience. 2000;3:1027–1034. doi: 10.1038/79848.
  64. Sirota A, Csicsvari J, Buhl D, Buzsáki G. Communication between neocortex and hippocampus during sleep in rodents. Proc Natl Acad Sci USA. 2003;100:2065–2069. doi: 10.1073/pnas.0437938100.
  65. Smith AC, Brown EN. Estimating a state-space model from point process observations. Neural Comput. 2003;15:965–991. doi: 10.1162/089976603765202622.
  66. Srinivasan L, Eden UT, Mitter SK, Brown EN. General-purpose filter design for neural prosthetic devices. Journal of Neurophysiology. 2007;98:2456–2475. doi: 10.1152/jn.01118.2006.
  67. Stjernqvist S, Rydén T, Sköld M, Staaf J. Continuous-index hidden Markov modelling of array CGH copy number data. Bioinformatics. 2007;23(8):1006–1014. doi: 10.1093/bioinformatics/btm059.
  68. Takagi K, Kumagai S, Matsunaga I, Kusaka Y. Application of inverse gaussian distribution to occupational exposure data. Annals of Occupational Hygiene. 1997;41(5):505–514.
  69. Tanner MA. Tools for statistical inference. 3rd ed. New York: Springer-Verlag; 1996.
  70. Truccolo W, Eden UT, Fellows M, Donoghue JP, Brown EN. A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology. 2005;93:1074–1089. doi: 10.1152/jn.00697.2004.
  71. Tu Z, Zhu S. Image segmentation by data-driven Markov chain Monte Carlo. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2002;24(5):657–673.
  72. Tuckwell HC. Stochastic processes in the neurosciences. Philadelphia: SIAM; 1989.
  73. Verbeek JJ, Vlassis N, Kröse B. Efficient greedy learning of gaussian mixture models. Neural Comput. 2003;15(2):469–485. doi: 10.1162/089976603762553004.
  74. Vijayan S. Characterization of activity in the primary somatosensory cortex during active behavior and sleep. Unpublished doctoral dissertation, Harvard University; 2007.
  75. Viterbi AJ. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory. 1967;13:260–269.
  76. Volgushev M, Chauvette S, Mukovski M, Timofeev I. Precise long-range synchronization of activity and silence in neocortical neurons during slow-wave sleep. Journal of Neuroscience. 2006;26:5665–5672. doi: 10.1523/JNEUROSCI.0279-06.2006.
  77. Wolansky T, Clement EA, Peters SR, Palczak MA, Dickson CT. Hippocampal slow oscillation: A novel EEG state and its coordination with ongoing neocortical activity. Journal of Neuroscience. 2006;26:6213–6229. doi: 10.1523/JNEUROSCI.5594-05.2006.
  78. Xi J, Kass RE. Detection of bursts in neuronal spike trains using hidden semi-Markov point process models. Unpublished manuscript; 2008. doi: 10.1007/s10827-009-0182-2.
