Author manuscript; available in PMC: 2019 Jul 2.
Published in final edited form as: Adv Neural Inf Process Syst. 2018 Dec;31:6690–6699.

Model-based targeted dimensionality reduction for neuronal population data

Mikio C Aoi 1, Jonathan W Pillow 2
PMCID: PMC6605062  NIHMSID: NIHMS1033329  PMID: 31274967

Abstract

Summarizing high-dimensional data using a small number of parameters is a ubiquitous first step in the analysis of neuronal population activity. Recently developed methods use "targeted" approaches that work by identifying multiple, distinct low-dimensional subspaces of activity that capture the population response to individual experimental task variables, such as the value of a presented stimulus or the behavior of the animal. These methods have gained attention because they decompose total neural activity into what are ostensibly different parts of a neuronal computation. However, existing targeted methods have been developed outside of the confines of probabilistic modeling, making some aspects of the procedures ad hoc, or limited in flexibility or interpretability. Here we propose a new model-based method for targeted dimensionality reduction based on a probabilistic generative model of the population response data. The low-dimensional structure of our model is expressed as a low-rank factorization of a linear regression model. We perform efficient inference using a combination of expectation maximization and direct maximization of the marginal likelihood. We also develop an efficient method for estimating the dimensionality of each subspace. We show that our approach outperforms alternative methods in both mean squared error of the parameter estimates and in identifying the correct dimensionality of encoding using simulated data. We also show that our method provides more accurate inference of low-dimensional subspaces of activity than a competing algorithm, demixed PCA.

1. Introduction

Neuroscience has recently seen a massive expansion in the number of neurons that can be recorded from a single animal, largely due to transformative technological advancements in electrode design and two-photon imaging. One of the effects of our increased measurement capacity is an increased interest in the properties of the activity of groups of neurons (i.e., population activity), as opposed to analyzing the activity of single neurons independently [1]. One goal of analyzing population activity is to characterize the ways in which groups of neurons coordinate to perform task-relevant computations.

Dimensionality reduction is central to the analysis of population activity [1]. Concomitant with the broader use of classical dimensionality reduction methods like PCA and ICA comes the recognition that these methods often do not take full advantage of well-characterized properties of neuronal population data, such as tensor structure or temporal correlations in spike rates, and a number of recent data analysis techniques have been developed to meet such specific challenges [2–7]. Of particular interest have been methods of dimensionality reduction for population data that distinguish between the effects of various inputs and outputs, or "task variables," such as stimulus strength, experimental context, or behavioral outcome [8–11]. We will refer to these methods collectively as "targeted" methods.

Although several targeted methods of dimensionality reduction exist, two recent methods stand out: demixed principal components analysis (dPCA) [8] and targeted dimensionality reduction (TDR) [9]. Both methods were developed for the analysis of neuronal population data in which observations of neuronal activity are inherently structured as matrices (e.g., neurons by rows, time by columns), and both attempt to identify low-dimensional subspaces that best describe the population response to an individual task variable.

The most recent version of dPCA [8] is a general method with relatively weak modeling assumptions, arbitrary dimensionality, and a fast estimation algorithm based on low-rank regression [12]. However, dPCA requires that all observed neurons display firing rates for all possible combinations of task variables, a condition that may be too strict to be applicable for complex experiments. In contrast, TDR [9] utilizes a linear regression-based approach that circumvents the need to have observed every neuron at every combination of task variables by imposing an explicit relationship between regressors and outputs. However, the TDR method is limited to a one-dimensional subspace per task variable. It is not clear that only one dimension is sufficient to describe the population activity associated with a given task variable. For example, sequential activation of neurons during decision making has been observed in rodents, where the precise ordering of activations depends on which decision the animal makes [13] and population code “morphing” has been observed in monkeys where decision encoding changes over time [14]. These types of dynamic encoding schemes are inherently high-dimensional and any method constrained to too-few dimensions will fail to fully characterize such activity. Lastly, none of the existing methods have principled approaches to identifying the dimensionality of the data, making post hoc analysis particularly difficult.

Here we propose a model-based method for targeted dimensionality reduction based on an extension of the framework proposed by [9]. Our approach overcomes a number of the drawbacks of existing methods. Using a probabilistic generative model of the data, we can infer the optimal dimensionality of the low-dimensional subspaces required to faithfully describe underlying latent structure in the data. In the following, we describe the model, which we call model-based targeted dimensionality reduction (MBTDR), its assumptions, and an efficient estimation procedure for model parameters and dimensionality. We then demonstrate the accuracy of our estimation algorithm against alternative methods of estimation.

2. Explicitly low-dimensional model of population activity

2.1. High-dimensional description

Our model begins with a description of trial-by-trial neuronal activity in terms of a linear regression with respect to the task variables. We assume that the activity $y_{i,k}(t)$ of the $i$th neuron at time $t$ on trial $k$ can be described by a linear combination of $P$ task variables $x_k^{(p)}$, $p = 1, \dots, P$ (e.g., stimulus variables, behavioral outcomes, and nonlinear combinations thereof), such that

$$y_{i,k}(t) = x_k^{(1)}\beta_{i,1}(t) + x_k^{(2)}\beta_{i,2}(t) + \cdots + x_k^{(P)}\beta_{i,P}(t) + \epsilon_{i,k}(t),$$

where the values of the $P$ task variables $x_k^{(p)}$ are known, the $\beta_{i,p}(t)$ are unknown coefficients, and $\epsilon_{i,k}(t)$ is noise. This basic model structure is identical to that of the regression model used in [9] and has been successfully employed in characterizing the activity of single neurons [15].

To represent all neurons simultaneously, we simply concatenate all $i = 1, \dots, n$ responses into a vector and write

$$\mathbf{y}_k(t) = x_k^{(1)}\boldsymbol{\beta}_1(t) + x_k^{(2)}\boldsymbol{\beta}_2(t) + \cdots + x_k^{(P)}\boldsymbol{\beta}_P(t) + \boldsymbol{\epsilon}_k(t),$$

where $\mathbf{y}_k(t) = (y_{1,k}(t), \dots, y_{n,k}(t))^\top$, $\boldsymbol{\beta}_p(t) = (\beta_{1,p}(t), \dots, \beta_{n,p}(t))^\top$, and $\boldsymbol{\epsilon}_k(t) = (\epsilon_{1,k}(t), \dots, \epsilon_{n,k}(t))^\top$.

Neuronal recordings are often performed in experiments where trial epochs are of fixed duration. We can take advantage of this structure by regarding the observation on each trial as a matrix, $Y_k = (\mathbf{y}_k(1), \dots, \mathbf{y}_k(T))$, which is a linear combination of $P$ coefficient matrices $B_p = (\boldsymbol{\beta}_p(1), \dots, \boldsymbol{\beta}_p(T))$, giving the observation model

$$Y_k = x_k^{(1)}B_1 + x_k^{(2)}B_2 + \cdots + x_k^{(P)}B_P + E_k, \tag{1}$$

where $E_k = (\boldsymbol{\epsilon}_k(1), \dots, \boldsymbol{\epsilon}_k(T))$. A schematic illustration of this basic setting is shown in Figure 1.

Figure 1:

A: Schematic illustration of the low-rank regression model. The $n \times T$ response matrix can be decomposed into two coefficient matrices $(B_1, B_2)$, each corresponding to one task variable (upper panel). Each coefficient matrix can be factorized into a small number of row and column vectors, making the population response a linear combination of a small number of common basis functions weighted differently for each neuron. B: Results of a simulation study evaluating parameter estimation accuracy for different estimation procedures. The legend indicates the method used. The abscissa indicates the number of trials used for the simulations. Error bars indicate 95% confidence intervals over 100 runs. C: Duration of computation for the methods and trial counts used in B.

In general, not all neurons are observed simultaneously. Most often they are observed sequentially or in sequential blocks. Suppose we do not observe all rows of $Y_k$ on all trials but instead observe $n_k \le n$ neurons. If we let $Y_k$ be a latent matrix of all recorded neurons on all trials, then we can describe the observed neurons on any given trial by $Z_k = H_k Y_k$, where $H_k$ is an $n_k \times n$ matrix in which each row is a one-hot vector giving the index of an observed neuron.
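To make the generative process concrete, the following sketch simulates observations from model (1) with partial observations $Z_k = H_k Y_k$ and the low-rank factorization of Section 2.2. It is a minimal illustration, not the paper's code: the dimensions, ranks, binary task variables, and random seed are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, P, N = 100, 15, 3, 500              # neurons, time bins, task variables, trials
ranks = [1, 2, 3]                          # assumed rank r_p of each B_p

# Low-rank coefficient matrices B_p = W_p S_p (Section 2.2)
W = [rng.standard_normal((n, r)) for r in ranks]
S = [rng.standard_normal((r, T)) for r in ranks]
B = [Wp @ Sp for Wp, Sp in zip(W, S)]

X = rng.choice([-1.0, 1.0], size=(N, P))   # known task variables x_k^(p)
lam = rng.exponential(scale=1.0, size=n)   # per-neuron inverse noise variances

observed = rng.random((N, n)) < 0.4        # which neurons are seen on each trial
Z = []                                     # Z_k = H_k Y_k: keep observed rows only
for k in range(N):
    Ek = rng.standard_normal((n, T)) / np.sqrt(lam)[:, None]  # E_k ~ MN(0, D^-1, I_T)
    Yk = sum(X[k, p] * B[p] for p in range(P)) + Ek           # equation (1)
    Z.append(Yk[observed[k]])
```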

2.2. Low-dimensional description of observations

With no additional constraints, our observation model (1) is extremely high-dimensional and is effectively a separate linear regression for each neuron at every time point. This would only be a sensible model if we believed that neurons were not in fact coordinating activity with one another or across time. We would like to be able to express the prior belief that there are correlations across the population, but that correlations in activity due to different values of the stimuli are not necessarily the same as those due to the behavior of the animal.

To accomplish this we can describe each characteristic response matrix $B_p$ by a low-rank factorization, i.e., $B_p = W_p S_p$, where $W_p$ and $S_p$ are $n \times r_p$ and $r_p \times T$, respectively, and $r_p = \mathrm{rank}(B_p)$. Equivalently, we can say that $r_p$ is the dimensionality of the encoding of task variable $p$. This formulation has an intuitive interpretation, illustrated schematically in Figure 1A: the characteristic response $\beta_{i,p}(t)$ of each neuron to the $p$th task variable can be expressed as a linear combination of $r_p$ weighted basis functions, $\beta_{i,p}(t) = \sum_{j=1}^{r_p} w_{i,j}^{(p)} s_j^{(p)}(t)$, where $r_p$ is the dimensionality of the encoding, $\{s_j^{(p)}(t)\}_{j=1}^{r_p}$ is a common set of time-varying basis functions, and $\{w_{i,j}^{(p)}\}_{j=1}^{r_p}$ are neuron-dependent mixing weights.

The example in Figure 1A displays a model with two task variables $(x_1, x_2)$, where the $x_1$ subspace is 1D and the $x_2$ subspace is 2D. The columns of the $W_p$'s weight each time-varying basis function differently for each neuron. Collectively, these weights define the subspace of activity that encodes task variable $x_p$. For $x_1$, the encoding is 1D because only one basis function is needed to describe the population response to $x_1$. The $x_2$ response is slightly more complex, with different responses at different times, requiring at least two basis functions.

3. Model estimation

The goal of inference is to estimate the factors of $B_p$ and the ranks $r_p$. Our proposed estimation strategy is to estimate one set of factors ($\{W_p\}$ or $\{S_p\}$) while integrating out the other. For example, if we define a prior over the mixing weights $\{W_p\}$, denoted $p(W)$, and a data likelihood $p(Z|W,S)$, then the marginal likelihood of the matrix of time-varying basis functions $S$ is obtained by

$$p(Z|S,\lambda) = \int p(Z|W,S,\lambda)\, p(W)\, dW.$$

In principle, either set of factors may be selected. In practice, however, the set of factors with the lowest dimension should be selected to keep computational costs low. In this paper we focus on the case where $T \ll n$, and we therefore estimate $\{S_p\}$ while integrating over $\{W_p\}$. The fact that either set of factors may be determined in this way means that there is a duality between rows and columns imposed by this model, similar in principle to the duality between factors and latent states in probabilistic principal components analysis [16].

In practice, inference can be considerably simplified if we let the noise distribution and the prior distribution of $W$ both be Gaussian, which permits closed-form expression of the marginal and posterior densities. We let all elements of $W$ be independent standard normal, i.e., $\mathbf{w} \sim \mathcal{N}(0, I_{\tilde{r}n})$, where $\tilde{r} = \sum_p \mathrm{rank}(B_p)$. In addition, we let the noise on all trials be given by $E_k \sim \mathcal{MN}(0, D^{-1}, I_T)$, where $\mathcal{MN}(M, A, B)$ denotes the matrix normal distribution with mean $M$, row covariance $A$, and column covariance $B$, and $D \equiv \mathrm{diag}(\lambda_1, \dots, \lambda_n)$, where $\lambda_i$ is the inverse noise variance of neuron $i$. We therefore assume that the weights are a priori independent and that the noise is independent across both neurons and time. In principle, our framework supports more structured priors and noise covariances, but we leave the exploration of more elaborate models for future work.

3.1. Marginal likelihood of timecourses S

Since our model is linear and Gaussian, the marginalized density $p(Z|S,\lambda)$ is also Gaussian and can be easily derived using standard Gaussian identities [17]. However, a naive derivation of the marginal likelihood requires the log determinant and inverse of a matrix of size $\tilde{N}T \times \tilde{N}T$, where $\tilde{N} = \sum_i N_i$ and $N_i$ is the number of observed trials for neuron $i$. Thus, if all neurons are observed on all trials, the dimensions of the marginal covariance will be $nNT \times nNT$, which can be prohibitively large for even moderately sized datasets, since the determinant and inverse in general have computational complexity $\mathcal{O}(\tilde{N}^3T^3)$. Luckily, the expression for the marginal likelihood can be dramatically simplified by exploiting the factorization of the regression parameters.

If we let $S \equiv \mathrm{blkdiag}(S_1, \dots, S_P)$ and $\lambda = (\lambda_1, \dots, \lambda_n)$, then we can derive (see Supplementary Material for details) the following expression for the marginal likelihood in terms of $S$ and $\lambda$:

$$\ell(S,\lambda) = -\frac{1}{2}\left(\tilde{N}T\log 2\pi + \sum_{i=1}^{n}\left(-N_i T \log\lambda_i + \lambda_i \mathbf{y}_i^\top \mathbf{y}_i + \log|C_i| - \lambda_i^2\, \mathrm{Trace}\!\left[R_i S^\top C_i^{-1} S\right]\right)\right), \tag{2}$$

where the matrices Ri and Ci are defined by

$$R_i = (X_i \otimes I_T)^\top \mathbf{y}_i \mathbf{y}_i^\top (X_i \otimes I_T), \qquad C_i = \lambda_i S (A_i \otimes I_T) S^\top + I_{\tilde{r}}, \tag{3}$$

respectively, where $X_i$ is the $N_i \times P$ design matrix that includes only trials on which neuron $i$ was observed, $A_i = X_i^\top X_i$, and $\mathbf{y}_i = (\mathbf{y}_{i1}^\top, \dots, \mathbf{y}_{iN_i}^\top)^\top$, with $\mathbf{y}_{ik}$ being the length-$T$ response of neuron $i$ on trial $k$.

The expression in (2) reveals two things about the structure of dependencies within the model. First, we notice that the likelihood factorizes over neurons, making evaluation of the likelihood potentially highly parallelizable. Second, the trace term is reminiscent of the quadratic term of a matrix normal model, indicating that we can intuitively think of the posterior covariance $C_i$ and the rank-1 matrix $R_i$ as the neuron-dependent contributions to the row and column covariances of $S$, respectively.

Maximum marginal likelihood (MML) estimates for S and λi can be obtained by directly maximizing (2) by gradient ascent.
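As a check on the structure of (2)–(3), the following sketch evaluates the negative marginal log-likelihood directly, using the naive Kronecker form rather than an optimized implementation. It is a minimal illustration under our reconstruction of the equations; the function and argument names are our own, not the authors'.

```python
import numpy as np
from scipy.linalg import block_diag

def neg_marginal_loglik(S_list, lam, X_by_neuron, Y_by_neuron):
    """Negative marginal log-likelihood of equation (2).

    S_list: list of r_p x T basis matrices S_p; lam: length-n inverse variances;
    X_by_neuron[i]: N_i x P design matrix of trials on which neuron i was observed;
    Y_by_neuron[i]: N_i x T matrix of neuron i's responses on those trials.
    """
    S = block_diag(*S_list)                       # r~ x PT
    r_tot, T = S.shape[0], S_list[0].shape[1]
    nll = 0.0
    for Xi, Yi, li in zip(X_by_neuron, Y_by_neuron, lam):
        Ni = Xi.shape[0]
        yi = Yi.ravel()                            # stacked response, length N_i*T
        Ai = Xi.T @ Xi
        Ci = li * S @ np.kron(Ai, np.eye(T)) @ S.T + np.eye(r_tot)  # eq. (3)
        ui = (Xi.T @ Yi).ravel()                   # (X_i kron I_T)' y_i
        Su = S @ ui
        quad = Su @ np.linalg.solve(Ci, Su)        # Trace[R_i S' C_i^{-1} S]
        nll += 0.5 * (Ni * T * np.log(2 * np.pi) - Ni * T * np.log(li)
                      + li * yi @ yi + np.linalg.slogdet(Ci)[1] - li ** 2 * quad)
    return nll
```

Gradient-based minimization of this quantity over the entries of the $S_p$'s and $\log\lambda$ implements the MML estimate; the per-neuron sum also makes clear how the computation parallelizes.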

3.2. Posterior distribution of neural weights W

Once an estimate of S and λ is obtained we can do posterior inference on W. Because our model is linear and Gaussian, the posterior density p(W|Z,S,λ) is also Gaussian and admits closed-form expressions for the posterior expectation and variance of W. Because of our low-rank model structure, the posterior of the weight matrices {Wp} factorizes over neurons and we can estimate the weights W for each neuron separately and achieve computational savings relative to joint estimation over all neurons simultaneously.

We can define an $\tilde{r} \times 1$ vector $\omega_i$ that contains all of the weights for neuron $i$. Collectively, the $\omega_i$ can be expressed as

$$\begin{pmatrix} \omega_1 \\ \vdots \\ \omega_n \end{pmatrix} = \mathrm{vec}\!\left(\begin{pmatrix} W_1 & \cdots & W_P \end{pmatrix}^\top\right).$$

This notation allows us to do efficient posterior inference over ωi, where the posterior expectation and covariance of ωi are given by

$$\mathbb{E}_{\omega_i|S,Z}[\omega_i] = \lambda_i C_i^{-1} S (X_i \otimes I_T)^\top \mathbf{y}_i, \qquad \mathrm{Cov}_{\omega_i|S,Z}[\omega_i] = C_i^{-1},$$

where Ci is defined as in (3).
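Given the closed forms above, posterior inference for one neuron's weights takes only a few lines; the sketch below reuses the notation of the previous snippet and is again an illustration under our reconstruction, not the authors' code.

```python
import numpy as np
from scipy.linalg import block_diag

def posterior_weights(S_list, lam_i, Xi, Yi):
    """Posterior mean and covariance of the stacked weight vector w_i for
    neuron i (Section 3.2), given basis matrices S_p and noise precision lam_i."""
    S = block_diag(*S_list)
    r_tot, T = S.shape[0], S_list[0].shape[1]
    Ci = lam_i * S @ np.kron(Xi.T @ Xi, np.eye(T)) @ S.T + np.eye(r_tot)  # eq. (3)
    ui = (Xi.T @ Yi).ravel()                     # (X_i kron I_T)' y_i
    mean = lam_i * np.linalg.solve(Ci, S @ ui)   # E[w_i | S, Z]
    cov = np.linalg.inv(Ci)                      # Cov[w_i | S, Z] = C_i^{-1}
    return mean, cov
```

Because the posterior factorizes over neurons, this routine can be applied independently (and in parallel) to each neuron.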

3.3. Decoding

Once estimates of the $B_p$ are obtained, we can decode new trials using the observation likelihood. This is a distinct feature of our method that is not available to dPCA or TDR: those methods estimate the encoding but must learn a separate decoder to read task variables out of the activity. Because of the probabilistic formulation of our model, we can do encoding and decoding within the same framework, allowing us to directly ask how the structure of the encoding influences the ability of downstream populations to decode the information in the recorded population. While we do not pursue decoding further in this paper, we include a description of the optimal decoder in the Supplementary Material.

4. ECME algorithm for parameter estimation

In general, maximization of the marginal likelihood (2) can be relatively slow when the number of parameters is large. We therefore derive an "expectation-conditional maximization either" (ECME) algorithm [18], in which parameters are estimated block-wise by maximizing either the conditional expectation of the complete-data log likelihood or the marginal likelihood. Our algorithm has closed-form updates for each parameter block.

Note that, for Bayesian linear regression with Gaussian likelihood and prior, an otherwise unstructured model with $M$ parameters would have an ECME update with computational complexity $\mathcal{O}(M^3)$. In contrast, due to the additional low-rank structure of our model, and despite each M-step updating $\tilde{r}T + n$ parameters, our M-step updates have computational complexity $\mathcal{O}(\tilde{r}^3)$, where typically $\tilde{r} \ll \min\{n, T\}$. This means that the actual computational cost of ECME is limited by the underlying dimensionality of the data, not by the total number of parameters per se.

As we demonstrate in Section 6.1, while our ECME algorithm provides parameter estimates that are only slightly worse in mean-squared error than maximizing the marginal likelihood directly, this small additional error has a serious impact on dimensionality estimation. We therefore use our ECME algorithm to provide fast, high-quality initialization for maximizing the marginal likelihood by gradient ascent.

5. A greedy algorithm for rank estimation

While our model can identify subspaces of any dimension up to $D_{\max} = \min\{n, T\}$, the dimensionality of each subspace must be specified a priori. Although we may use standard model selection techniques to compare the goodness of fit between models with alternative configurations, an exhaustive search would require fitting $D_{\max}^P$ possible configurations. We therefore developed a greedy algorithm for estimating the optimal dimensionality. A summary of the procedure is presented in Algorithm 1.

Recall that the dimensionality of each task-variable encoding corresponds to the rank of the corresponding $B_p$. We begin the algorithm by estimating the model parameters with rank $r_p = 1$ for all $p$ (although in principle we may start at $r_p = 0$, denoting the null model for all elements of $B_p$), giving a model with total dimensionality $\tilde{r} = \sum_{p=1}^{P} r_p$, which at the first iteration is $\tilde{r}_1 = P$. At the $j$th iteration, we estimate the parameters of $P$ models, where each model has the dimension of one of the task variables increased by 1 while keeping all other dimensionalities the same as in the previous iteration. We then have $P$ models, each with total dimensionality $\tilde{r}_{j+1} = \tilde{r}_j + 1$. We then evaluate the AIC of each of these $P$ models and keep, for the next iteration, the model that displayed the greatest decrease in AIC relative to the previous iteration. In this way we grow the total dimensionality of the model by one on each iteration. The algorithm is formally outlined in Algorithm 1.³

Algorithm 1.

Estimation of dimensionality

Let $\mathbf{r} \equiv (r_1, \dots, r_P)$, let $\mathbf{e}_p$ denote the $p$th elementary vector, and let $\mathrm{AIC}(\mathbf{r})$ be the Akaike information criterion for a model with ranks $\mathbf{r}$.
1: procedure DIMEST(r_0, Data)
2:   r ← r_0, AIC_0 ← AIC(r_0) ⊳ Initialize
3:   repeat
4:     for p = 1, …, P do ⊳ Calculate AIC for +1 rank for each task variable
5:       AIC_p ← AIC(r + e_p)
6:     end for
7:     if there is no p s.t. AIC_p < AIC_0 then break
8:     end if
9:     p* ← argmin_p AIC_p, r_{p*} ← r_{p*} + 1 ⊳ +1 rank for the variable that most decreases AIC
10:     r_0 ← r, AIC_0 ← AIC_{p*}
11:   until there is no p s.t. AIC_p < AIC_0 ⊳ Stop when AIC can no longer be decreased
12:   return r
13: end procedure
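A direct Python translation of Algorithm 1 is sketched below. The function fit_aic is a hypothetical placeholder assumed to fit MBTDR at the given per-variable ranks (e.g., via ECME followed by MML) and return the resulting AIC.

```python
def estimate_dims(fit_aic, P, r0=None):
    """Greedy rank search of Algorithm 1.

    fit_aic(ranks): placeholder that fits the model with the given list of
    per-task-variable ranks and returns its AIC (lower is better).
    """
    r = list(r0) if r0 is not None else [1] * P       # initialize: r_p = 1 for all p
    best_aic = fit_aic(r)
    while True:
        # candidate models: increment the rank of one task variable at a time
        candidates = [r[:p] + [r[p] + 1] + r[p + 1:] for p in range(P)]
        aics = [fit_aic(c) for c in candidates]
        p_star = min(range(P), key=lambda p: aics[p])
        if aics[p_star] >= best_aic:                  # AIC can no longer be decreased
            return r
        r, best_aic = candidates[p_star], aics[p_star]
```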

6. Simulation studies

6.1. Evaluation of parameter estimation with simulated data

We applied our greedy algorithm to simulated data in order to determine whether it could accurately recover the true ranks, using $n = 100$ neurons and $T = 15$ time points. For each run of our simulations we first selected a random dimensionality between 1 and 6 for each of $P = 3$ task variables (two graded variables with values drawn from $\{-2, -1, 0, 1, 2\}$ and one binary task variable with values $\{-1, 1\}$). Using these dimensionalities, the elements of $W_p$ and $S_p$ were drawn independently from a $\mathcal{N}(0,1)$ distribution. To give heterogeneous noise variances, the noise variance for each neuron was drawn from an exponential distribution with mean parameter $\sigma^2 = 50$. The resulting average SNR for any one task variable was $-0.26$ ($\pm 0.75$, $\log_{10}$ units). We then simulated observations according to our model with varying numbers of trials ($N \in \{50, 200, 500, 1000, 1500, 2000\}$). In order to simulate incomplete observations, we set the probability of observing any given neuron on any given trial to $\pi_{\mathrm{obs}} = 0.4$. While we conducted experiments with varying numbers of trials and observation probabilities, we generally found that decreased observation probabilities acted effectively as a decrease in sample size, with a concomitant decrease in estimation accuracy. The results were not particularly sensitive to the precise observation probability in this regime, and we report only the results for the settings listed above.

For each set of observations, we estimated the parameters of the model using one of the following four methods:

  1. Linear regression and SVD. The elements of $B_p$ for all $p$ were estimated by linear regression for each neuron and time point independently. Each estimate of the complete matrix $B_p$ can then be expressed by its singular value decomposition (SVD) as $B_p = U_p D_p V_p^\top$, where $D_p$ is the $n \times T$ diagonal matrix of $d = \min\{n, T\}$ singular values. We then set the smallest $d - r_p$ singular values to zero, denoting the resulting matrix of $r_p$ nonzero singular values by $D_p^{(r_p)}$. The rank-$r_p$ estimates of $W_p$ and $S_p$ are then given by $W_p^{(r_p)} = U_p (D_p^{(r_p)})^{1/2}$ and $S_p^{(r_p)} = (D_p^{(r_p)})^{1/2} V_p^\top$, with the corresponding rank-$r_p$ estimate of $B_p$ given by $B_p^{(r_p)} = W_p^{(r_p)} S_p^{(r_p)}$ (see the sketch following this list).

    The corresponding likelihood is given by

    $$\ell(\{B_p\} \mid Z, H, \hat{D}) \propto -\sum_k \mathrm{Trace}\!\left[\Big(Z_k - \sum_p x_k^{(p)} H_k B_p\Big)^{\!\top} H_k \hat{D} H_k^\top \Big(Z_k - \sum_p x_k^{(p)} H_k B_p\Big)\right]. \tag{4}$$
  2. Bilinear optimization. After initializing with the rank-$r_p$ estimates of $W_p$ and $S_p$ from the SVD method, the parameters can be further refined by bilinear regression. On each iteration, the values of the $W_p$'s are fixed, which leads to closed-form updates for conditional maximum likelihood estimates of the $S_p$'s, and vice versa. The algorithm thus alternates between estimating the $W_p$'s and $S_p$'s until convergence. The bilinear regression method uses the same likelihood as shown in (4).

  3. ECME. As described in the Supplementary Material.

  4. Maximum marginal likelihood (MML). After initializing with the ECME estimates of Wp and Sp, we estimate Sp by maximizing the marginal likelihood given by (2). No estimation of the Wp factors is required since the marginal likelihood only depends on Sp.
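As referenced in method 1, a minimal NumPy sketch of the SVD-based rank-$r_p$ truncation is given below; the function name is ours, not the authors'.

```python
import numpy as np

def truncate_rank(B_hat, r):
    """Rank-r truncation of a per-variable regression estimate B_hat (n x T),
    returning factors W (n x r) and S (r x T) with B_hat ~= W @ S (method 1)."""
    U, d, Vt = np.linalg.svd(B_hat, full_matrices=False)
    root = np.sqrt(d[:r])          # D^{1/2} for the r largest singular values
    W = U[:, :r] * root            # W = U D^{1/2}
    S = root[:, None] * Vt[:r]     # S = D^{1/2} V'
    return W, S
```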

For each setting of trial number N, we repeated this process 100 times and evaluated how well our algorithm estimated the true model parameters. The results are presented in Figure 1B,C.

We found that ECME and MML both produced mean-squared error (MSE) that was substantially smaller at all sample sizes than either the SVD or bilinear methods. While the differences in MSE between the ECME algorithm and MML were small, Figure 1C shows that the ECME algorithm was substantially faster than either MML or bilinear regression.

6.2. Evaluation of dimension estimation with simulated data

For each of the 100 runs of our simulation experiments we also evaluated how well our algorithm estimated the dimension of the characteristic responses by evaluating the difference between the true and estimated dimension of each task variable and counting the number of times that difference was observed. The results are presented in Figure 2A.

Figure 2: Simulation studies.

A: Results of the simulation study evaluating the performance of Algorithm 1 for dimensionality estimation under the different parameter estimation procedures. The legend indicates the sample size. The abscissa indicates the error in the dimensionality estimate. The ordinate gives the number of estimated subspaces that obtained the corresponding error. The dashed line indicates the model-mismatch experiment with Poisson observations and sample size 2000. B: Results of subspace estimation by our MML method compared with dPCA. The MML method outperforms dPCA at all but the highest SNR, where performance is similar.

We found that all four methods tended to underestimate the dimensionality as the number of trials decreased, but that this underestimation was less pronounced for the ECME and MML methods, for which the vast majority of estimates recovered the correct ranks even in the case of $N = 50$. Note that not only is this half as many trials as neurons, but since each neuron was observed on only about 40% of trials, this corresponds to an average of only 20 trials per neuron. Our procedure therefore recovers the true rank of the model the vast majority of the time, even with very small trial numbers relative to the size of the observations.

We were surprised that, despite the modest difference in MSE between the ECME and MML estimates, dimensionality estimation was sensitive to these differences, with ECME performing worse than MML even though the two methods in theory maximize the same objective function. Nevertheless, due to the ECME algorithm's superior speed, we propose that ECME be used as an efficient initializer for MML estimation. We found that initializing the rank estimates this way means that only a few iterations of MML are needed for rank estimation.

For neuroscience applications, observed spike counts are better described by a Poisson distribution than by a Gaussian. We therefore evaluated the robustness of our algorithm to this type of model misspecification by performing the same dimensionality estimation experiment with 2000 trials, with observations drawn from a $\mathrm{Poisson}(y(t))$ distribution at each time bin. The results are virtually indistinguishable from experiments using Gaussian observations (Fig. 2A, dashed line).

7. Comparison with dPCA

7.1. Simulation experiments

The central goal of both our method and dPCA is to recover a basis that defines a set of low-dimensional subspaces describing how the population varies with respect to each task variable (or pre-defined combination of task variables). In order to compare the quality of the subspaces identified by each method, we conducted a simple simulation study. The simulation setting was identical to that described in Section 6.1, using 100 trials per run, except that, to keep the simulations as simple as possible, we defined just two binary task variables that were drawn randomly on each trial. The experiment was repeated for 100 runs.

On each run we performed dPCA and also estimated the model parameters using MML, and then compared the percent mean-squared error between the true subspace and each estimated subspace. We defined the true subspace by the left singular vectors of the $B_p$ matrices used to simulate the data. If $U$ is the true subspace and $\hat{U}$ is the estimated subspace, then the percent mean-squared error is given by

$$\|U - \hat{U}\hat{U}^\top U\|_2^2 \,/\, \|U\|_2^2.$$
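For concreteness, this error measure can be computed as below; it is a small sketch assuming $U$ and $\hat{U}$ have orthonormal columns (e.g., left singular vectors), with a function name of our own choosing.

```python
import numpy as np

def subspace_error(U, U_hat):
    """Normalized squared residual of projecting the true basis U onto the
    column space of the estimate U_hat (the %MSE measure of Section 7.1)."""
    resid = U - U_hat @ (U_hat.T @ U)
    return np.linalg.norm(resid) ** 2 / np.linalg.norm(U) ** 2
```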

The basis for the subspace estimated by MBTDR can be obtained by first estimating each $B_p = W_p S_p$, where $S_p$ was estimated by MML and $W_p$ by its posterior mean. We then used the left singular vectors of the estimated $B_p$ to define the estimated basis. For dPCA, the analogous subspace is defined by its "encoding" subspace [8]. For both methods we assumed the correct dimensionality. We used the version of dPCA that is for non-sequential estimation and uses cross-validated regularization parameters. The results are presented in Figure 2B.

When the subspace is recoverable (i.e., the principal angle is significantly less than 90 degrees), our method is virtually always closer to the true subspace. It is notable that the principal angle is an extremely sensitive measure of error between subspaces, and that both methods provide reasonable results when checked by eye. It is also notable that any differences are observable at all, which gives us confidence that these results are quite strong.

We analyzed data from a somatosensory working memory task analyzed previously using dPCA [? 15 ?]. A monkey was presented with two vibratory stimuli, one at the beginning of the trial and another after a XX-second delay. The monkey then had to report whether the first or second stimulus had the higher frequency.

8. Concluding remarks

We have introduced a new, model-based method to identify low-dimensional subspaces of neuronal activity that describe the response of neuronal populations to variations in task variables. We have also introduced a procedure for estimating both the parameters of this model and the dimensionality of each of the corresponding subspaces of activity. We compared our method in simulations to dPCA and showed that our method better recovers the low-dimensional subspace of activity for noisy data.

There are a number of additional advantages to using a model-based method for dimensionality reduction. The first is that our modeling framework is general enough that we could include even more structure in the model, such as structured prior and noise covariances. Our modeling approach also allows us to answer otherwise elusive questions about which quantities of the data are important. For example, virtually all other targeted methods effectively use peri-stimulus time histograms (PSTHs) as the sufficient statistics for subspace estimation. One interesting revelation of our model is that the PSTHs are not sufficient statistics: the sufficient statistics of our model are $(R_i, A_i, \mathbf{y}_i^\top \mathbf{y}_i, N_i)$, and these cannot be derived directly from the PSTHs. This suggests that methods relying solely on PSTHs may fail to capture important characteristics of the data.

Supplementary Material

Aoi18_SI

Acknowledgments

This work was supported by grants from the Simons Foundation (SCGB AWD1004351 and AWD543027), the NIH (R01EY017366, R01NS104899) and a U19 NIH-NINDS BRAIN Initiative Award (NS104648-01).

Footnotes

³ Demonstration code is available for download at the first author's website at http://www.mikioaoi.com/samplecode/RDRdemo.zip

Contributor Information

Mikio C. Aoi, Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544.

Jonathan W. Pillow, Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544.

References

[1] John P. Cunningham and Byron M. Yu. Dimensionality reduction for large-scale neural recordings. Nature Neuroscience, 17(11):1500–1509, 2014.
[2] Jeffrey S. Seely, Matthew T. Kaufman, Stephen I. Ryu, Krishna V. Shenoy, John P. Cunningham, and Mark M. Churchland. Tensor analysis reveals distinct population structure that parallels the different computational roles of areas M1 and V1. PLoS Computational Biology, 12(11):e1005164, 2016.
[3] Ari S. Morcos and Christopher D. Harvey. History-dependent variability in population dynamics during evidence accumulation in cortex. Nature Neuroscience, 2016.
[4] B. M. Yu, J. P. Cunningham, G. Santhanam, S. I. Ryu, K. V. Shenoy, and M. Sahani. Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. Journal of Neurophysiology, 102(1):614, 2009.
[5] Yuan Zhao and Il Memming Park. Variational latent Gaussian process for recovering single-trial dynamics from population spike trains. arXiv preprint arXiv:1604.03053, 2016.
[6] Afsheen Afshar, Gopal Santhanam, Byron M. Yu, Stephen I. Ryu, Maneesh Sahani, and Krishna V. Shenoy. Single-trial neural correlates of arm movement preparation. Neuron, 71(3):555–564, August 2011.
[7] Mark M. Churchland, Byron M. Yu, Maneesh Sahani, and Krishna V. Shenoy. Techniques for extracting single-trial activity patterns from large-scale neural recordings. Current Opinion in Neurobiology, 17(5):609–618, October 2007.
[8] Dmitry Kobak, Wieland Brendel, Christos Constantinidis, Claudia E. Feierstein, Adam Kepecs, Zachary F. Mainen, Xue-Lian Qi, Ranulfo Romo, Naoshige Uchida, and Christian K. Machens. Demixed principal component analysis of neural population data. eLife, 5:e10989, 2016.
[9] Valerio Mante, David Sussillo, Krishna V. Shenoy, and William T. Newsome. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 503(7474):78–84, 2013.
[10] Christian K. Machens. Demixing population activity in higher cortical areas. Frontiers in Computational Neuroscience, 4(0), 2010.
[11] C. K. Machens, R. Romo, and C. D. Brody. Functional, but not anatomical, separation of "what" and "when" in prefrontal cortex. The Journal of Neuroscience, 30(1):350–360, 2010.
[12] A. J. Izenman. Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis, 5(2):248–264, 1975.
[13] Christopher D. Harvey, Philip Coen, and David W. Tank. Choice-specific sequences in parietal cortex during a virtual-navigation decision task. Nature, 484(7392):62–68, 2012.
[14] Aishwarya Parthasarathy, Roger Herikstad, Jit Hon Bong, Felipe Salvador Medina, Camilo Libedinsky, and Shih-Cheng Yen. Mixed selectivity morphs population codes in prefrontal cortex. Nature Neuroscience, 20(12):1770–1779, 2017.
[15] Carlos D. Brody, Adrián Hernández, Antonio Zainos, and Ranulfo Romo. Timing and neural encoding of somatosensory parametric working memory in macaque prefrontal cortex. Cerebral Cortex, 13(11):1196–1207, 2003.
[16] N. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. The Journal of Machine Learning Research, 6:1816, 2005.
[17] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, 2006.
[18] Chuanhai Liu and Donald B. Rubin. The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika, 81(4):633–648, 1994.
[19] Ranulfo Romo, Carlos D. Brody, Adrián Hernández, and Luis Lemus. Neuronal correlates of parametric working memory in the prefrontal cortex. Nature, 399(6735):470–473, June 1999.
[20] Ranulfo Romo, Carlos D. Brody, Adrián Hernández, and Luis Lemus. Single-neuron spike train recordings from macaque prefrontal cortex during a somatosensory working memory task. CRCNS.org, doi:10.6080/K0V40S4D, 2016.
