Journal of Neurophysiology. 2016 Dec 7;117(3):919–936. doi: 10.1152/jn.00698.2016

Revealing unobserved factors underlying cortical activity with a rectified latent variable model applied to neural population recordings

Matthew R Whiteway 1,, Daniel A Butts 1,2
PMCID: PMC5338625  PMID: 27927786


Keywords: latent variable models, dimensionality reduction, two-photon imaging, barrel cortex

Abstract

The activity of sensory cortical neurons is not only driven by external stimuli but also shaped by other sources of input to the cortex. Unlike external stimuli, these other sources of input are challenging to experimentally control, or even observe, and as a result contribute to variability of neural responses to sensory stimuli. However, such sources of input are likely not “noise” and may play an integral role in sensory cortex function. Here we introduce the rectified latent variable model (RLVM) in order to identify these sources of input using simultaneously recorded cortical neuron populations. The RLVM is novel in that it employs nonnegative (rectified) latent variables and is much less restrictive in the mathematical constraints on solutions because of the use of an autoencoder neural network to initialize model parameters. We show that the RLVM outperforms principal component analysis, factor analysis, and independent component analysis, using simulated data across a range of conditions. We then apply this model to two-photon imaging of hundreds of simultaneously recorded neurons in mouse primary somatosensory cortex during a tactile discrimination task. Across many experiments, the RLVM identifies latent variables related to both the tactile stimulation as well as nonstimulus aspects of the behavioral task, with a majority of activity explained by the latter. These results suggest that properly identifying such latent variables is necessary for a full understanding of sensory cortical function and demonstrate novel methods for leveraging large population recordings to this end.

NEW & NOTEWORTHY The rapid development of neural recording technologies presents new opportunities for understanding patterns of activity across neural populations. Here we show how a latent variable model with appropriate nonlinear form can be used to identify sources of input to a neural population and infer their time courses. Furthermore, we demonstrate how these sources are related to behavioral contexts outside of direct experimental control.


the sensory cortex not only represents information from the sensory periphery but also incorporates input from other sources throughout the brain. In fact, a large fraction of neural activity in the awake sensory cortex cannot be explained by the presented stimulus and has been related to a diversity of other factors such as stimulation of other sensory modalities (De Meo et al. 2015; Ghazanfar and Schroeder 2006), location within the environment (Haggerty and Ji 2015), and numerous aspects associated with “cortical state” (Harris and Thiele 2011; Marguet and Harris 2011; Pachitariu et al. 2015) including attention (Harris and Thiele 2011; Rabinowitz et al. 2015), reward (Shuler 2006), and state of arousal (Niell and Stryker 2010; Otazu et al. 2009). Activity in sensory cortex linked to such nonsensory inputs can result in variability in the responses of neurons to identical stimulus presentations, which has been the subject of much recent study (Amarasingham et al. 2015; Cui et al. 2016; Goris et al. 2014; Rabinowitz et al. 2015). This suggests that a full understanding of sensory cortical function will require the ability to characterize nonsensory inputs to sensory cortex and how they modulate cortical processing.

However, such nonsensory inputs are typically not under direct experimental control or directly observed, in which case their effects can only be inferred through their impact on observed neural activity. For example, shared but unobserved inputs can lead to noise correlations observable in simultaneously recorded neurons (Cohen and Kohn 2011; Doiron et al. 2016), which can serve as a means to predict one neuron's activity from that of other neurons (Pillow et al. 2008; Schneidman et al. 2006; Vidne et al. 2012). Noise correlations thus demonstrate one approach to understanding neural variability, and other recent extensions of this idea have used the summed activity of simultaneously recorded neurons (Okun et al. 2015; Schölvinck et al. 2015) and local field potentials (Cui et al. 2016; Rasch et al. 2008) to capture the effects of nonsensory inputs. Notably, these approaches all focus on the effects of shared variability on single neuron activity, and thus do not fully leverage the simultaneous recordings from multiple neurons to infer shared sources of input.

An alternative is to jointly characterize the effects of unobserved, nonsensory inputs on a population of simultaneously recorded neurons. This approach is embodied in a class of methods known as latent variable models (Cunningham and Yu 2014), which aim to explain neural activity over the population of observed neurons with a small number of factors, or “latent variables.” Latent variable models evolved from classic dimensionality reduction techniques like principal component analysis (PCA) (Ahrens et al. 2012; Kato et al. 2015) and encompass a wide range of methods such as factor analysis (FA) (Churchland et al. 2010), independent component analysis (ICA) (Freeman et al. 2014), Poisson PCA (Pfau et al. 2013), demixed PCA (dPCA) (Kobak et al. 2016), locally linear embedding (Stopfer et al. 2003), restricted Boltzmann machines (Köster et al. 2014), state-space models (Archer et al. 2014; Kulkarni and Paninski 2007; Macke et al. 2011; Paninski et al. 2010; Smith and Brown 2003), and Gaussian process factor analysis (GPFA) (Lakshmanan et al. 2015; Semedo et al. 2014; Yu et al. 2009).

Here we propose a new latent variable approach called the rectified latent variable model (RLVM). This approach leverages two innovations over previous methods. First, it constrains the latent variables to be nonnegative (rectified), which is hypothesized to be a fundamental nonlinear property of neural activity (McFarland et al. 2013) that can lead to important differences in the resulting descriptions of population activity (Lee and Seung 1999). Indeed, using simulations, we show that rectification is necessary for the RLVM to recover the true activity of nonnegative latent variables underlying population activity. The second innovation is that the RLVM avoids several statistical constraints on the latent variables that are necessary in other methods; for example, it does not require them to be uncorrelated (like PCA), independent (like ICA), or follow Gaussian distributions (like FA). To enable such unconstrained estimation of model parameters, we base solutions of the RLVM on an autoencoder (Bengio et al. 2013), which allows the RLVM to efficiently scale up to large data sets from both electrophysiological and optical recordings.

We first describe the RLVM and demonstrate its application to a synthetic data set generated to resemble typical large-scale recordings produced by two-photon experiments. This synthetic data set gives us “ground truth” with which to compare RLVM performance with a range of other latent variable approaches. We demonstrate that the RLVM outperforms these alternatives across a range of conditions because of the innovations described above. We then apply the RLVM to a large two-photon data set recorded in mouse barrel cortex during a decision making task (Peron et al. 2015). The relationship between the latent variables inferred by the RLVM and the behavioral observations related to the task revealed that a large proportion of cortical activity is related to nonvibrissal aspects of the behavioral task. Furthermore, consistent with the results on the synthetic data set, the RLVM had the ability to match or outperform the other tested latent variable approaches and also identified latent variables most correlated with individual observed aspects of the experiment. These results were consistent across many neural populations and animals sampled from this data set and thus identify consistent types of latent variables governing the diverse set of neurons recorded over many experiments. In total, this demonstrates that the RLVM is a useful tool for inferring latent variables in population recordings and how it might be used to gain significant insights into how and why sensory cortex integrates sensory processing with nonsensory variables.

METHODS

Fitting the RLVM

The goal of the RLVM is to accurately predict observed neural activity $y_t \in \mathbb{R}^N$ using a smaller set of latent variables $z_t \in \mathbb{R}_{\geq 0}^M$. Here $y_t$ and $z_t$ are vectors describing the activity at each time point $t$, and the matrices $Y = \{y_t\}_{t=1}^T$ and $Z = \{z_t\}_{t=1}^T$ are all the observed data and latent variables, respectively, across time. The RLVM then tries to predict the observed activity $y_t$ with the $z_t$ as follows:

$$\hat{y}_t = f(W z_t + b) \tag{1}$$

where $f(\cdot)$ is a parametric nonlinearity and the model parameters are the coupling matrix $W \in \mathbb{R}^{N \times M}$ and the bias vector $b \in \mathbb{R}^N$, collectively referred to as $\theta = \{W, b\}$.
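As a concrete illustration of Eq. 1, a minimal sketch of the forward model in Python/NumPy (the function and variable names here are ours, not those of the published MATLAB implementation):

```python
import numpy as np

def rlvm_predict(Z, W, b, f=lambda x: x):
    """Predict population activity from latent variables (Eq. 1).

    Z : (T, M) latent variables (nonnegative in the RLVM)
    W : (N, M) coupling matrix between neurons and latent variables
    b : (N,)   bias vector
    f : pointwise output nonlinearity (identity for two-photon data)
    """
    return f(Z @ W.T + b)  # (T, N) predicted activity

# toy usage: 1,000 time points, 5 latent variables, 100 neurons
T, M, N = 1000, 5, 100
rng = np.random.default_rng(0)
Z = np.maximum(rng.normal(size=(T, M)), 0)   # rectified latent variables
W = rng.normal(scale=0.1, size=(N, M))
b = np.zeros(N)
Y_hat = rlvm_predict(Z, W, b)
```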

Estimation of model components.

We estimate the model parameters θ and infer the latent variables Z using the maximum marginal likelihood (MML) algorithm, following Paninski et al. (2010) and Vidne et al. (2012). The MML algorithm is closely related to the expectation-maximization (EM) algorithm, as both maximize an approximation to the true log-likelihood function. The MML algorithm first infers the latent variables Z, using initial model parameters θ̂(0), and then updates the model parameters, using the newly inferred latent variables. Each of these steps corresponds to a maximum a posteriori estimate of the latent variables and model parameters, respectively, in which we maximize the sum of the data log-likelihood and a log-prior distribution (Vidne et al. 2012):

$$\hat{Z}^{(k+1)} = \underset{Z \geq 0}{\arg\max}\; \log p(Y \mid Z, \hat{\theta}^{(k)}) + \log p(Z) \tag{2}$$
$$\hat{\theta}^{(k+1)} = \underset{\theta}{\arg\max}\; \log p(Y \mid \hat{Z}^{(k+1)}, \theta) + \log p(\theta) \tag{3}$$

The algorithm continues to alternate between these two steps until a convergence criterion is met. Although Eq. 2 is a constrained optimization problem, it can be transformed into an unconstrained optimization problem as described below, and thus we solve both Eqs. 2 and 3 with an unconstrained limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method. Like the EM algorithm, the MML algorithm is only guaranteed to find a local, rather than global, optimum. Thus proper initialization (described below) can in principle be important.
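Schematically, the alternation can be outlined as follows (a sketch only: `infer_latents`, `fit_parameters`, and `penalized_log_likelihood` are hypothetical helpers standing in for the L-BFGS solutions of Eqs. 2 and 3 and the penalized objective):

```python
import numpy as np

def fit_mml(Y, theta_init, infer_latents, fit_parameters,
            penalized_log_likelihood, n_iters=50, tol=1e-6):
    """Alternate between latent-variable inference (Eq. 2) and parameter
    estimation (Eq. 3) until the penalized log-likelihood stops improving."""
    theta, prev_ll = theta_init, -np.inf
    for k in range(n_iters):
        Z = infer_latents(Y, theta)      # Eq. 2: maximize over Z >= 0
        theta = fit_parameters(Y, Z)     # Eq. 3: maximize over theta
        ll = penalized_log_likelihood(Y, Z, theta)
        if ll - prev_ll < tol:           # convergence criterion
            break
        prev_ll = ll
    return Z, theta
```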

The MML algorithm presented in Eqs. 2 and 3 is a general procedure that can be specifically implemented for different types of data by properly defining the probability distribution in the data log-likelihood term log p(Y|Z,θ), which describes the probability of the observations given the current estimates of the latent variables and the model parameters. This term refers to the form of the expected noise distribution when considering what the model predicts vs. what is observed. For example, in what follows we use a Gaussian distribution for two-photon data but could instead use a Poisson distribution for spiking data. The forms of the log-prior terms log p(Z) and log p(θ) are in general independent of the form of the data log-likelihood term. Because this work is focused on the analysis of two-photon data we discuss the implementation of the MML algorithm that is specific to modeling two-photon data, including a discussion of our treatment of the log-prior terms.

We first address the data log-likelihood terms of the form $\log p(Y \mid Z, \theta)$. For two-photon data, we model the observed fluorescence traces as a linear combination of the latent variables plus a bias term, $\hat{y}_t = W z_t + b$ [Eq. 1, with linear $f(\cdot)$]. Furthermore, we assume a Gaussian noise model so that $p(y_t \mid z_t, W, b) \sim \mathcal{N}(\hat{y}_t, \Sigma)$ and

$$\log p(y_t \mid z_t, W, b) = -\frac{N}{2}\log 2\pi - \frac{1}{2}\log\det(\Sigma) - \frac{1}{2}\big(y_t - (W z_t + b)\big)^T \Sigma^{-1} \big(y_t - (W z_t + b)\big) \tag{4}$$

for a given time point t. The Gaussian noise model captures measurement noise that corrupts the true fluorescence signal and is commonly used in models of two-photon data (Vogelstein et al. 2010). We validated this choice by measuring the distribution of residuals from the RLVM model fits to the experimental data used in Figs. 4–6, which are well described by a Gaussian distribution (data not shown). For computational convenience we do not try to fit the noise covariance matrix Σ but rather model it as a constant times the identity matrix. This constant can be incorporated into the log-prior terms and hence does not explicitly show up in the final MML equations (Eqs. 9 and 10 below). By modeling the noise covariance matrix as a multiple of the identity matrix we are making the assumption that the Gaussian noise has the same variance for each neuron (isotropic noise). Although not true in general, the advantage of this simplification is that we do not need to estimate the variance parameter, and Eqs. 2 and 3 become penalized least squares problems when using L2 regularization, which can be solved analytically. Constraining the noise covariance matrix to be diagonal (anisotropic noise) leads to solving a penalized weighted least squares problem, which must be solved iteratively.
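For concreteness, under this isotropic-noise assumption the parameter step (Eq. 10 below) is an ordinary ridge regression with a closed-form solution; the augmented-matrix notation in this sketch is ours rather than the original text's:

$$\tilde{z}_t = \begin{bmatrix} \hat{z}_t \\ 1 \end{bmatrix}, \qquad \tilde{W} = [\,W \;\; b\,], \qquad \hat{\tilde{W}} = \Big( \sum_t y_t \tilde{z}_t^T \Big) \Big( \sum_t \tilde{z}_t \tilde{z}_t^T + \Lambda \Big)^{-1},$$

where $\Lambda = \mathrm{diag}(\lambda_W, \ldots, \lambda_W, \lambda_b)$ collects the L2 penalties on $W$ and $b$.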

Fig. 4.

Latent variable methods applied to a 2-photon imaging data set recorded in mouse barrel cortex. A: the performance of each model in reproducing the observed data depends on the number of latent variables used, measured by R2 between the measured and predicted activity. The relative performance of the different methods is ordered the same as their applications to the simulated data (Fig. 2C, right). Because there was no clear saturation point of the R2 values, models with 6 latent variables (dashed black line) were used for subsequent analyses. B: the coupling weights of the RLVM between each neuron and each latent variable, with neuron number assigned to aid the visualization of neuron clusters associated with each latent variable (see methods). C: the spatial positions of neurons coupled to each latent variable. Here neurons whose coupling strength is >15% of the maximum coupling strength for the latent variable are shown and color-coded to show the magnitude of their coupling. The imaged neurons were within a single barrel (of mouse primary somatosensory cortex), and the coupling to latent variables exhibited no clear spatial pattern.

Fig. 6.

Latent variables inferred by PCA and a linear RLVM show a weaker relationship to individual trial variables. PCA (A and B) and a linear RLVM (where latent variables were not constrained to be nonnegative) (C and D) were fit to the same experimental data as in Fig. 5. A: latent variable time courses inferred by PCA over the same interval as in Fig. 5A, ordered from bottom to top by variance explained. There is a clear mixing of information relative to the RLVM latent variable time courses (Fig. 5A). Latent variable 3, for example, has positive deflections aligned with whisker touches (similar to RLVM latent variable 3) combined with negative deflections aligned with the onset of the reward period (opposite sign relative to RLVM latent variable 4). B, top: shaded boxes indicate which trial variables are related to the latent variables. Middle: coupling matrix between latent variables and each neuron (neurons are ordered the same as those in Fig. 5C). This illustrates how the first few principal components mix inputs from several sources, likely because PCA is based on explaining the greatest fraction of variance with each principal component rather than separating the underlying causes. Bottom: summed influence of each latent variable on the population activity (matching measures in Fig. 5C, bottom). C: latent variables inferred by a linear RLVM. D: same measures as those calculated in B. Both PCA and the linear RLVM latent variables mix features from the RLVM latent variables, which is apparent in their coupling matrices (B and D, middle).

We also make the assumption that data at different time points are conditionally independent, so that the full log-likelihood term can be written as

$$\log p(Y \mid Z, \theta) = \log\Big(\prod_t p(y_t \mid z_t, W, b)\Big) = \sum_t \log p(y_t \mid z_t, W, b) = -\frac{1}{2}\sum_t \big\| y_t - (W z_t + b) \big\|_2^2 + \text{const} \tag{5}$$

where $\|x\|_2^2 = \sum_i x_i^2$ is the squared L2 norm of a vector $x$. The assumption of conditional independence is common practice when dealing with data log-likelihood terms (Bishop 2006) and allows us to factorize the full conditional distribution $p(Y \mid Z, \theta)$; without this assumption the resulting data log-likelihood term would be intractable.

To further constrain the types of solutions found by the model, we choose a particular form of the log-prior term log p(Z) (Eq. 2). Many different priors are used for Z in the neuroscience literature on latent variable models, including latent dynamical systems priors (Paninski et al. 2010) and Gaussian process priors (Rabinowitz et al. 2015; Yu et al. 2009). Here we use a simple smoothing prior that penalizes the second derivative of the time course of each latent variable zi (where zi represents the entire time course of latent variable i), which can be written as

$$\log p(z_i) \propto -\| D z_i \|_2^2 \tag{6}$$

where D is the discrete Laplace operator,

$$D = \begin{bmatrix} -2 & 1 & & & 0 \\ 1 & -2 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & -2 & 1 \\ 0 & & & 1 & -2 \end{bmatrix} \tag{7}$$

The matrix-vector multiplication $D z_i$ computes a discrete version of the second derivative of latent variable $i$, such that $\| D z_i \|_2^2$ will be large when $z_i$ is highly varying (i.e., noisy) and small when $z_i$ is smooth. The full log-prior term $\log p(Z)$ is the sum of these terms for each individual latent variable.
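A minimal NumPy construction of this smoothness penalty (our own sketch; the interior rows implement the [1, -2, 1] second-difference stencil of Eq. 7):

```python
import numpy as np

def laplace_operator(T):
    """Discrete Laplace (second-difference) operator D of Eq. 7 for a
    latent variable time course with T time points."""
    D = np.zeros((T, T))
    np.fill_diagonal(D, -2.0)
    np.fill_diagonal(D[:-1, 1:], 1.0)   # superdiagonal
    np.fill_diagonal(D[1:, :-1], 1.0)   # subdiagonal
    return D

def smoothness_penalty(z):
    """||D z||^2: large for noisy time courses, small for smooth ones."""
    D = laplace_operator(len(z))
    return np.sum((D @ z) ** 2)
```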

The log-prior term log p(θ) in Eq. 3 likewise allows for the incorporation of additional constraints on model parameters. We use a standard zero-mean Gaussian prior on both the coupling matrix W and the biases b, so that

$$\log p(\theta) \propto -\alpha \| W \|_F^2 - \beta \| b \|_2^2 \tag{8}$$

where $\|W\|_F^2 = \sum_{i,j} w_{ij}^2$ is the squared Frobenius norm of the matrix $W$ and $\alpha$ and $\beta$ are constants that scale the relative weight of each term. This prior has the effect of preventing the model parameters from growing too large, which can hurt model performance (see Fig. A1).

Using the expressions in Eqs. 5, 6, and 8, the two-photon implementation of the general MML algorithm in Eqs. 2 and 3 becomes

$$\hat{Z}^{(k+1)} = \underset{Z \geq 0}{\arg\min}\; \frac{1}{2}\sum_t \big\| y_t - (W^{(k)} z_t + b^{(k)}) \big\|_2^2 + \frac{\lambda_Z}{2}\sum_i \| D z_i \|_2^2 \tag{9}$$
$$\hat{\theta}^{(k+1)} = \underset{\theta}{\arg\min}\; \frac{1}{2}\sum_t \big\| y_t - (W \hat{z}_t^{(k+1)} + b) \big\|_2^2 + \frac{\lambda_W}{2}\| W \|_F^2 + \frac{\lambda_b}{2}\| b \|_2^2 \tag{10}$$

The λ values in front of the log-prior terms are hyperparameters that are chosen by hand (see Model Fitting Details).

The nonnegativity constraint on the latent variables Z is the defining feature of the RLVM. Although it is possible to use explicitly constrained optimization techniques, we take a different approach that is more in line with the autoencoder optimization we use to obtain initial values for the MML algorithm (see below). Instead of optimizing nonnegative latent variables zi, we substitute them with unconstrained latent variables xi that are passed through a rectified linear (ReLU) function g(.):

$$z_t^i = g(x_t^i) = \begin{cases} 0 & \text{if } x_t^i \leq 0 \\ x_t^i & \text{if } x_t^i > 0 \end{cases} \tag{11}$$

The model of neural activity (Eq. 1) then becomes

$$\hat{y}_t = f\big[ W g(x_t) + b \big] \tag{12}$$

and unconstrained optimization techniques can be used in Eqs. 2 and 9 to solve for X instead of Z. Although the ReLU function is not differentiable at zero, we use the subdifferential approach common in the neural networks literature and define the derivative to be zero at zero (Hara et al. 2015).
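A small sketch of this reparameterization (notation ours), including the subdifferential convention of setting the derivative to zero at zero:

```python
import numpy as np

def relu(x):
    """Rectified linear function g(.) of Eq. 11."""
    return np.maximum(x, 0.0)

def relu_grad(x):
    """Subgradient of g(.), defined to be 0 at x = 0."""
    return (x > 0).astype(float)

# the unconstrained latents x are optimized; the rectified z = g(x) enter Eq. 12
x = np.array([-1.3, 0.0, 0.7])
z = relu(x)            # [0.0, 0.0, 0.7]
dz_dx = relu_grad(x)   # [0.0, 0.0, 1.0]
```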

Initialization of model components using an autoencoder.

In the inference of Z (Eqs. 2 and 9), there are T × M parameters to estimate (where T is the number of time steps in the experiment and M is the number of inferred latent variables), which is a very high-dimensional space to search. The prior distribution we place on the latent variables is not a strong one, and as a result this optimization step tends to get stuck in poor local minima. To avoid poor local minima, we initialize the MML optimization algorithm using initial estimates of both the model parameters and the latent variables from the solution of an autoencoder (Bengio et al. 2013; Boulard and Kamp 1989; Japkowicz et al. 2000). An autoencoder is a neural network model that attempts to reconstruct its input using a smaller number of dimensions, and its mathematical formulation is similar to the RLVM, so similar, in fact, that the model parameters in the RLVM have direct analogs in the autoencoder, as shown below. Furthermore, the optimization routine for the autoencoder is faster and better behaved than the MML algorithm, which makes it an attractive model for finding initial RLVM values.

The autoencoder takes the vector of neural activities $y_t \in \mathbb{R}^N$ and projects it down onto a lower-dimensional space $\mathbb{R}^M$ with an encoding matrix $W_1 \in \mathbb{R}^{M \times N}$. A bias term $b_1 \in \mathbb{R}^M$ is added to this projected vector, so that the resulting vector $x_t \in \mathbb{R}^M$ is given by $x_t = W_1 y_t + b_1$. $W_1$ is said to encode the original vector $y_t$ in the lower-dimensional space with the vector $x_t$, which is analogous to the unconstrained latent variables $x_t$ in the RLVM (Eq. 12). Following the RLVM, we enforce the nonnegativity constraint on $x_t$ by applying the ReLU function:

$$z_t = g(x_t) = g(W_1 y_t + b_1) \tag{13}$$

As with the unconstrained latent variables $x_t$, there is a direct correspondence between the autoencoder's rectified latent variables $z_t$ in Eq. 13 and the RLVM's rectified latent variables $z_t$ in Eq. 1. The autoencoder (again like the RLVM) then reconstructs the original activity $y_t$ by applying a decoding matrix $W_2 \in \mathbb{R}^{N \times M}$ to $z_t$ and adding a bias term $b_2 \in \mathbb{R}^N$. The result is passed through a parametric nonlinearity $f(\cdot)$ so that the reconstructed activity $\hat{y}_t \in \mathbb{R}^N$ is given by

$$\hat{y}_t = f(W_2 z_t + b_2) \tag{14}$$

which matches the RLVM model structure in Eq. 1. The weight matrices and bias terms, grouped as Θ = {W1,W2,b1,b2}, are simultaneously fit by minimizing the reconstruction error L(yt,ŷt) between the observed activity yt and the predicted activity ŷt:

$$\hat{\Theta} = \underset{\Theta}{\arg\min}\; L(y_t, \hat{y}_t) \tag{15}$$

Once this optimization problem has been solved with standard gradient descent methods, we initialize the RLVM model parameters in Eq. 2 with θ̂(0) = {W2,b2}. A notable advantage of the autoencoder is that there is no need to alternate between inferring latent variables and estimating model parameters, as in Eqs. 2 and 3: once the model parameters have been estimated with Eq. 15, the latent variables can be explicitly calculated with Eq. 13.

For modeling two-photon data (as above), the noise distribution is Gaussian and the nonlinearity f(.) in Eq. 14 is assumed to be linear. The reconstruction error L(yt,ŷt) for Gaussian noise is the mean square error (again assuming equal noise variances across neurons), so in this special case of Eq. 15 the autoencoder estimates for the weights and biases are given by

$$\hat{\Theta} = \underset{\Theta}{\arg\min}\; \frac{1}{2}\sum_t \| y_t - \hat{y}_t \|_2^2 = \underset{\Theta}{\arg\min}\; \frac{1}{2}\sum_t \| y_t - (W_2 z_t + b_2) \|_2^2 = \underset{\Theta}{\arg\min}\; \frac{1}{2}\sum_t \big\| y_t - \big(W_2\, g(W_1 y_t + b_1) + b_2\big) \big\|_2^2 \tag{16}$$

and we perform this optimization using an L-BFGS routine to obtain the weights and biases.

We also include regularization terms for the model parameters, which prevent overfitting to the training data and can improve the model's ability to generalize to new data (Bishop 2006). As we saw previously, these regularization terms can also be interpreted as log-prior distributions on the model parameters in the probabilistic setting. A more general optimization problem for the autoencoder that includes both the reconstruction error and these regularization terms is

$$\hat{\Theta} = \underset{\Theta}{\arg\min}\; L(y_t, \hat{y}_t) + \frac{\lambda_1}{2}\| W_1 \|_F^2 + \frac{\lambda_2}{2}\| W_2 \|_F^2 + \frac{\lambda_3}{2}\| b_1 \|_2^2 + \frac{\lambda_4}{2}\| b_2 \|_2^2 \tag{17}$$

Large values of λi will encourage small values in the corresponding set of parameters. Furthermore, the use of regularization on the weight matrices helps to break a degeneracy in the autoencoder: because the reconstructed activity ŷt involves the product between the weights W2 and the latent variables zt (Eq. 14), an equivalent solution is given by the product of c × W2 and (1/c) × zt for any positive constant c. Applying a regularization penalty to the weights W2 limits the range of values W2 can take and thus helps to set a scale for both the weights and the latent variables.

We also use “weight-tying” (Bengio et al. 2013), where the encoding and decoding weight matrices are constrained to be transposes of each other, i.e., W2 = (W1)T. This has the effect of nearly halving the number of model parameters that need to be estimated, which speeds up the model fitting procedure (see Fig. A2). Not enforcing this weight-tying constraint commonly results in qualitatively similar solutions (see Fig. A3), and as a result all models in this report initialized with the autoencoder employ weight-tying.
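Putting Eqs. 13–17 and the weight-tying constraint together, a condensed sketch of the autoencoder fit is given below. This is an illustrative reimplementation, not the published MATLAB code: the analytic gradient is omitted (scipy's L-BFGS-B then falls back on finite differences, which is slow but keeps the sketch short), and the regularization values are placeholders.

```python
import numpy as np
from scipy.optimize import minimize

def fit_autoencoder(Y, M, lam_w=1.0, lam_b=1.0, seed=0):
    """Weight-tied, rectified autoencoder (Eqs. 13-17, Gaussian noise).

    Y : (T, N) observed activity; M : number of latent variables.
    Returns the decoding matrix W (N, M) and biases b1 (M,), b2 (N,).
    """
    T, N = Y.shape
    rng = np.random.default_rng(seed)

    def unpack(p):
        W = p[:N * M].reshape(N, M)          # weight-tied: W1 = W.T, W2 = W
        b1, b2 = p[N * M:N * M + M], p[N * M + M:]
        return W, b1, b2

    def objective(p):
        W, b1, b2 = unpack(p)
        Z = np.maximum(Y @ W + b1, 0.0)      # encode + ReLU (Eq. 13)
        Y_hat = Z @ W.T + b2                 # decode (Eq. 14)
        err = 0.5 * np.sum((Y - Y_hat) ** 2)                     # Eq. 16
        reg = 0.5 * (lam_w * np.sum(W ** 2)                      # Eq. 17 terms
                     + lam_b * (np.sum(b1 ** 2) + np.sum(b2 ** 2)))
        return err + reg

    p0 = 0.1 * rng.normal(size=N * M + M + N)
    res = minimize(objective, p0, method="L-BFGS-B")
    return unpack(res.x)
```

Once fit, the latent variables follow directly from Eq. 13 with no separate inference step, e.g., Z = np.maximum(Y @ W + b1, 0.0).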

Model Fitting Details

Fitting the model parameters and latent variable time courses with the MML algorithm requires alternating between inferring latent variables and estimating model parameters. We monitored the log-likelihood values throughout this procedure and ensured that the algorithm stopped only after additional iterations brought no further improvement. We compared the fitting behavior of the MML using random initializations vs. autoencoder initializations (see results). For these tests, we used the same regularization values for the latent variables (λZ = 1) and for the model parameters (see below) to facilitate model comparisons.

The latent variable models we used to analyze the simulated and experimental data were the RLVM [regularization parameters set as λ1 = λ2 = 1,000/(number of latent variables), λ3 = λ4 = 100, lambdas numbered as in Eq. 17; code available for download at www.neurotheory.umd.edu/code], PCA (using MATLAB's built-in function pca), FA (using MATLAB's built-in function factoran; default settings), and ICA (using FastICA, available for download at http://research.ics.aalto.fi/ica/fastica/; default settings). Autoencoder fitting was performed with a MATLAB implementation of the L-BFGS routine by Mark Schmidt, available for download at http://www.cs.ubc.ca/~schmidtm/Software/minFunc.html. The FA results reported with the data set from Peron et al. (2015) used a PCA-based algorithm (available for download at http://www.mathworks.com/matlabcentral/fileexchange/14115-fa) rather than the maximum likelihood-based algorithm factoran, which proved prohibitively inefficient on such a large data set.

Evaluating Model Performance

Unless otherwise noted, model fitting was performed with fivefold cross-validation (data are divided into 5 equally sized blocks, with 4 used for training and 1 for testing, with 5 distinct testing blocks). Because of the different natures of the simulated and experimental data, we use different measures to assess the quality of model fits on these different types of data sets.

To assess the quality of model fits on the simulated data we employed two different measures, one to evaluate the quality of the latent variables produced by the model and another to evaluate how well neural activity could be predicted. First, we measured the ability of each model to infer the true latent variables. For all models (RLVM, PCA, FA, ICA), model parameters were fit with the training data. Then, to calculate the latent variables on the testing data, we used the activity of the neurons and the encoding matrices of each model (e.g., Eq. 13 for the RLVM) that were learned from the training data. Given that $\{z_i\}_{i=1}^M$ are the $M$ true latent variables and $\{\hat{z}_j\}_{j=1}^P$ are the $P$ latent variables inferred by a given model, the latent variable “maxcorr” measure is defined to be

$$\text{maxcorr} = \frac{1}{M}\sum_{i=1}^{M} \max_{j \in \{1, \ldots, P\}} \big| \text{corr}(z_i, \hat{z}_j) \big| \tag{18}$$

where corr is the Pearson correlation coefficient. The maxcorr simply measures the correlation between the inferred latent variable that best matches each true latent variable, averaged over the true latent variables. There is no restriction on the relationship between M and P; for example, if M < P, then the maxcorr measure will be close to 1 if each true latent variable is captured by at least one inferred latent variable.
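A direct NumPy transcription of Eq. 18 (our sketch):

```python
import numpy as np

def maxcorr(Z_true, Z_hat):
    """Eq. 18: for each true latent variable, take the best absolute Pearson
    correlation with any inferred latent variable, then average over the
    true latent variables.

    Z_true : (T, M) true latent variables
    Z_hat  : (T, P) inferred latent variables
    """
    M, P = Z_true.shape[1], Z_hat.shape[1]
    best = np.zeros(M)
    for i in range(M):
        best[i] = max(abs(np.corrcoef(Z_true[:, i], Z_hat[:, j])[0, 1])
                      for j in range(P))
    return best.mean()
```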

The second measure of model performance for simulated data assessed how well the models could predict the overall population activity. For all models (RLVM, PCA, FA, ICA), model parameters were fit with the training data for all neurons. Then, the activity predicted by the models on the testing data was calculated using the encoding and decoding matrices of each model. The R2 values reported are those obtained by comparing the true activity yti of neuron i at time t with the activity predicted by the various methods ŷti and averaging over all N neurons:

$$R^2 = \frac{1}{N}\sum_{i=1}^{N}\left[ 1 - \frac{\sum_t (y_t^i - \hat{y}_t^i)^2}{\sum_t (y_t^i - \bar{y}^i)^2} \right] \tag{19}$$

where ȳi is the average activity of neuron i.
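The population R2 of Eq. 19 can likewise be computed directly (our sketch):

```python
import numpy as np

def population_r2(Y, Y_hat):
    """Eq. 19: coefficient of determination per neuron, averaged over neurons.

    Y, Y_hat : (T, N) measured and predicted activity
    """
    ss_res = np.sum((Y - Y_hat) ** 2, axis=0)
    ss_tot = np.sum((Y - Y.mean(axis=0)) ** 2, axis=0)
    return np.mean(1.0 - ss_res / ss_tot)
```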

For experimental data we do not have access to the “true” underlying latent variables, and thus cannot calculate a measure similar to the maxcorr measure presented above. We can, however, assess how well the models predict the overall population activity as in Eq. 19, with one caveat. It is possible with experimental data that a latent variable will capture the activity of a single neuron and thus inflate the resulting R2 measure calculated above. To address this issue, we employed a procedure that is similar to the leave-one-out prediction error introduced in Yu et al. (2009). For all models (RLVM, PCA, FA, ICA), model parameters were fit with the training data for all neurons. Then, to determine how well each model was able to capture the activity of a single neuron with the testing data, we used the activity of all other neurons to calculate the activity of the latent variables (by setting the encoding weights of the left-out neuron to 0). We then performed a simple linear regression using the activity of the latent variables to predict the activity of the left-out neuron and used this prediction in the calculation of the R2 measure in Eq. 19. Note that for this leave-one-out procedure if just a single neuron is contributing to the activity of a latent variable, this procedure will result in a small R2 value for that neuron during cross-validation.
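A schematic of this leave-one-out evaluation for an RLVM-style encoder is sketched below; `W_enc` and `b_enc` denote the encoding matrix and bias learned on the training data (as in Eq. 13), and the details of the final regression step are our own reading of the procedure described above:

```python
import numpy as np

def leave_one_out_r2(Y_test, W_enc, b_enc):
    """For each neuron: infer latent variables from the *other* neurons only
    (encoding weights of the left-out neuron set to 0), then predict the
    left-out neuron by linear regression on those latent variables."""
    T, N = Y_test.shape
    r2 = np.zeros(N)
    for n in range(N):
        W_n = W_enc.copy()                           # W_enc is (M, N)
        W_n[:, n] = 0.0                              # drop neuron n's weights
        Z = np.maximum(Y_test @ W_n.T + b_enc, 0.0)  # rectified latents (Eq. 13)
        X = np.column_stack([Z, np.ones(T)])         # regressors + intercept
        coef, *_ = np.linalg.lstsq(X, Y_test[:, n], rcond=None)
        resid = Y_test[:, n] - X @ coef
        r2[n] = 1.0 - resid.var() / Y_test[:, n].var()
    return r2.mean()
```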

Simulated Data Generation

We evaluated the performance of the RLVM using simulated data sets, which were generated with five nonnegative latent variables that gave rise to the observed activity of 100 neurons. Note that these choices reflect our core hypotheses of the properties of latent variables in the cortex, and also match the assumptions underlying the RLVM model structure. Latent variables were generated by creating vectors of random Gaussian noise at 100-ms time resolution and then smoothing these signals with a Gaussian kernel. To enforce the nonnegativity constraint on the latent variables, a positive threshold value was subtracted from each latent variable, and all resulting negative values were set to zero. Correlations between different latent variables were established by multiplying the original random vectors (before smoothing) by a mixing matrix that defined the correlation structure. Although smoothing and thresholding the correlated latent variables changed the correlation structure originally induced by the mixing matrix, the new correlation structures obtained by this procedure were qualitatively similar to those seen in experimental data.

The latent variables thus obtained acted as inputs to a population of neurons (Fig. 1). To calculate the coupling weights between the latent variables and the neurons in the population, a coupling matrix was created to qualitatively match the coupling matrices found in experimental data (compare Fig. 4B and Fig. 1C). Since the experimental data used below in this article come from a two-photon imaging experiment, we chose to simulate data resembling two-photon fluorescence traces. To compute simulated fluorescence traces for each neuron, first the firing rate of the neuron was computed as the weighted sum of the latent variables, with the weights defined in the coupling matrix. The resulting firing rate was used to produce a spike train with a Poisson spike generator. The spike train was then convolved with a kernel to create the calcium signal, and finally Gaussian random noise was added to generate a simulated fluorescence signal. To ensure that our results were not dependent on a particular set of parameters used to generate the data, we also simulated data while varying the number of latent variables, the number of nonzero weights in the coupling matrix, and the signal-to-noise ratio (see Fig. 3).
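A compact sketch of this generative procedure is given below; the kernel widths, threshold, coupling density, and noise level are illustrative choices of ours, not the exact values used for the figures.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def simulate_population(T=5000, M=5, N=100, dt=0.1, seed=0):
    """Simulated fluorescence traces driven by nonnegative latent variables."""
    rng = np.random.default_rng(seed)

    # correlated, smoothed, thresholded (nonnegative) latent variables
    mix = 0.3 * rng.random((M, M)) + 0.7 * np.eye(M)   # mixing -> correlations
    raw = rng.normal(size=(T, M)) @ mix.T
    smooth = gaussian_filter1d(raw, sigma=10, axis=0)
    smooth /= smooth.std(axis=0)                       # re-normalize after smoothing
    Z = np.maximum(smooth - 0.5, 0.0)                  # subtract threshold, rectify

    # sparse nonnegative coupling matrix and firing rates
    W = rng.random((N, M)) * (rng.random((N, M)) < 0.2)
    rates = Z @ W.T                                    # (T, N) firing rates

    # Poisson spikes -> calcium kernel -> additive Gaussian noise
    spikes = rng.poisson(rates * dt)
    kernel = np.exp(-np.arange(0, 5, dt))              # ~1-s decay, 100-ms bins
    calcium = np.apply_along_axis(
        lambda s: np.convolve(s, kernel)[:T], 0, spikes)
    fluor = calcium + 0.1 * rng.normal(size=(T, N))
    return Z, W, fluor
```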

Fig. 1.

RLVM structure. A: the RLVM predicts the observed population response yt = [yt(1) yt(2) … yt(N)]T at a given time point t (dashed line, right) using a smaller number of nonnegative latent variables zt = [zt(1) zt(2) … zt(M)]T (dashed line, left). The latent variables are weighted by a matrix W such that wij is the weight between neuron i and latent variable j, and the resulting weighted inputs are summed and passed through a nonlinearity f(.). There are additional offset terms for each neuron, not pictured here. B–D: the hypothesized structure of the cortical network motivating the RLVM formulation is used to generate synthetic data, using 5 latent variables. B: factors underlying cortical activity will often be correlated with each other, and our simulation of cortical activity used the correlation matrix shown between latent variables in generating simulated activity. C: the weight matrix between latent variables and each neuron, generated to approximate the coupling matrices found with experimental data (compare to Fig. 4B). D: the measured pairwise correlation matrix between neurons, computed from simulated data. The correlations predicted by the RLVM arise solely from shared latent variable input and their correlations with each other, rather than pairwise coupling.

Fig. 3.

Performance of latent variable methods across a range of simulations. Simulated data sets are generated as described in Fig. 2, using a range of the number of true latent variables (A), the coupling matrix density (B), and the signal-to-noise ratio (SNR; C). When generating coupling matrices with different numbers of latent variables (A), each method was fit using the true number of latent variables. For the coupling matrices with different densities (B), each neuron had a nonzero weight to at least 1 latent variable and the coupling matrix density is defined as the proportion of nonzero weights beyond this 1-per-neuron baseline. The performance of each method was characterized by the latent variable maxcorr measure (see methods) (top) and the population activity R2 (bottom). Error bars represent SE over 20 randomly generated data sets. The parameter values used for the simulated data in Fig. 2 are indicated on each plot (dashed black lines).

Experimental Data

Experimental protocol and data preprocessing.

We evaluated the RLVM on data from the Svoboda lab (Peron et al. 2015), which have been made publicly available at http://dx.doi.org/10.6080/K0TB14TN. In this experiment mice performed a tactile discrimination task with a single whisker. During a given trial, the activity from neurons in layers 2/3 of barrel cortex expressing the GCaMP6s calcium indicator was recorded with two-photon imaging with three imaging planes set 15 μm apart. These imaging planes constituted a subvolume, and eight subvolumes were imaged during a given session. Furthermore, those same subvolumes were imaged across multiple experimental sessions, and the resulting images were later registered so that activity of individual regions of interest (ROIs) could be tracked across the multiple sessions. Raw fluorescence traces were extracted from each ROI and neuropil corrected. For each ROI a baseline fluorescence F0 was determined with a 3-min sliding window and used to compute ΔF/F = (FF0)/F0.

Data selection.

The publicly available data set contains the ΔF/F fluorescence traces of tens of thousands of neurons imaged over multiple sessions for eight different mice. To select subsets of this data for analysis with the latent variable models, we restricted our search to volume imaging experiments in which somatic activity was imaged in trained mice. We then looked for subsets of simultaneously imaged neurons that maximized the number of neurons times the number of trials imaged, selecting nine different sets of imaged neurons, three sets from each of three different mice. Within each set, neurons were removed from this selection if they met one or both of the following criteria: 1) >50% of the fluorescence values were missing (indicated by NaNs); 2) the fluorescence trace had a signal-to-noise ratio (SNR) < 0.1. To estimate the SNR, a smoothed version of the fluorescence trace was estimated with a Savitzky-Golay filter (using MATLAB's built-in function sgolayfilt). The noise was estimated using the residual between the original trace and the smoothed trace. The SNR was then calculated as the variance of the smoothed trace divided by the variance of the noise. After removal of individual neurons according to the above procedure, we then removed individual trials from the remaining data selection if NaN values existed in the whisker measurements or in one or more of the fluorescence traces. See Table A1 in appendix for more information about the specific subpopulations of neurons analyzed.
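A sketch of the SNR criterion (the Savitzky-Golay window length and polynomial order here are placeholders; the original analysis used MATLAB's sgolayfilt):

```python
import numpy as np
from scipy.signal import savgol_filter

def trace_snr(dff, window=31, polyorder=3):
    """SNR of a fluorescence trace: variance of the smoothed trace divided by
    the variance of the residual (original trace minus smoothed trace)."""
    smoothed = savgol_filter(dff, window_length=window, polyorder=polyorder)
    noise = dff - smoothed
    return smoothed.var() / noise.var()

# neurons with trace_snr(dff) < 0.1 were excluded from the analysis
```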

Alignment of fluorescence traces across sessions.

As described above, the data from each experiment we used consisted of imaging the population activity over several recording sessions. Although fluorescence traces for each neuron were corrected for different baseline fluorescence levels in the online data set, we found it necessary to recalculate session-specific baseline fluorescence levels in order to concatenate traces across different sessions. [Unlike the analyses in the original work (Peron et al. 2015), the models considered here were particularly sensitive to this baseline level because all fluorescence traces were analyzed jointly.] In the original work, baseline fluorescence level was calculated using the skew of the distribution of raw fluorescence values, under the assumption that more active neurons will have more highly skewed distributions. However, this monotonic relationship breaks down for very active neurons, whose distributions are not as skewed since there are very few values near the baseline level. Because we found many neurons in the data set that fell into this last category, we recalculated baseline fluorescence levels on a session-by-session basis.

Using basic simulations of how the distribution of fluorescence values of a Poisson neuron depends upon its mean firing rate and SNR, we could match this with the data from each neuron to unambiguously infer its baseline fluorescence level. Specifically, for each neuron and each session, we measured the SNR of its fluorescence (described above) and also measured the skewness of its distribution of fluorescence (using MATLAB's built-in function skewness). We simulated neural activity with the same SNR while varying the mean firing rate until the resulting distribution of values matched the measured skewness. Once the optimal mean firing rate was determined, we could then use the simulation to determine the best estimate of the baseline fluorescence level on a session-by-session basis. This procedure led to improved model estimation for all latent variable methods.

Sorting of coupling matrices.

The ordering of simultaneously recorded neurons is arbitrary, so we chose the ordering for the display of the coupling matrices W to highlight groups of neurons that share similar coupling patterns. We first sorted the rows, using the coupling weights to the first latent variable (first column) for all neurons with a weight higher than a predefined threshold value. We then sorted all remaining neurons with coupling weights to the second latent variable (second column) above the same threshold. This procedure is repeated for each column of W and produces a block diagonal structure (e.g., Fig. 4B). The last column is sorted without using the threshold so that it contains all remaining neurons.
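A sketch of this sorting scheme (the threshold value is illustrative):

```python
import numpy as np

def sort_coupling_matrix(W, thresh=0.2):
    """Order the rows (neurons) of W to expose block structure: for each latent
    variable in turn, group the not-yet-assigned neurons whose coupling exceeds
    `thresh`, sorted by that coupling; all remaining neurons are sorted by the
    last column without thresholding."""
    N, M = W.shape
    order, remaining = [], set(range(N))
    for j in range(M - 1):
        idx = sorted((i for i in remaining if W[i, j] > thresh),
                     key=lambda i: -W[i, j])
        order.extend(idx)
        remaining -= set(idx)
    order.extend(sorted(remaining, key=lambda i: -W[i, M - 1]))
    return W[order], order
```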

Quantifying influence of individual latent variables on population activity.

To determine the proportion of population activity driven by each latent variable in the experimental data sets, for each neuron we calculated the variance of the latent variable weighted by the neuron's coupling strength to that latent variable, divided by the variance of the neuron's measured activity, which was smoothed with a Savitzky-Golay filter to remove noise variance (implemented with the MATLAB function sgolayfilt). We considered a neuron to be driven by that latent variable if the proportion of total measured activity exceeded 0.10. To determine the proportion of predicted population activity, we performed a procedure similar to that above but divided by the variance of the predicted activity rather than the smoothed measured activity. [Note that, because latent variables can be correlated, these proportions will not add to 1 since we ignored cross-covariances.] These values were then averaged over all neurons to obtain a measure of the proportion of predicted activity driven by the given latent variable.
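A sketch of this calculation for the fraction-of-neurons-driven measure (the smoothing window parameters and the handling of the predicted-activity variant are our own simplifications):

```python
import numpy as np
from scipy.signal import savgol_filter

def fraction_driven(Z, W, Y, drive_thresh=0.10):
    """For each latent variable j, the fraction of neurons for which the
    variance of w_ij * z_j exceeds `drive_thresh` of the variance of the
    neuron's smoothed measured activity."""
    Y_smooth = savgol_filter(Y, window_length=31, polyorder=3, axis=0)
    var_meas = Y_smooth.var(axis=0)                   # (N,)
    M = Z.shape[1]
    frac = np.zeros(M)
    for j in range(M):
        contrib_var = (W[:, j] ** 2) * Z[:, j].var()  # variance of w_ij * z_j
        frac[j] = np.mean(contrib_var / var_meas > drive_thresh)
    return frac
```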

RESULTS

Model Formulation

The goal of latent variable modeling is to describe the activity of many simultaneously recorded neurons with a small number of latent variables. Consider a population of N neurons, with the ensemble of observed activity at time t represented by a vector yt; this observed activity could, for example, be spike counts from multielectrode recordings or fluorescence values from two-photon microscopy. The M latent variables will also have a vector of activity zt at each time point, where M is a free parameter of the model (Fig. 1). The RLVM then attempts to predict the population activity yt as a function f(.) of a linear combination of the latent variables zt:

$$\hat{y}_t = f(W z_t + b)$$

where W is a matrix of weights that describes how each neuron is coupled to each latent variable and b is a vector of bias terms that account for baseline activity. For two-photon data, it is appropriate to use a linear function for f(.), while for spiking data one can use a function that results in nonnegative predicted values to match the nonnegative spike count values (Paninski 2004).

The vector of latent variables zt will in principle represent all factors that drive neural activity, including both stimulus-locked and non-stimulus-locked signals. These factors may or may not be related to observable quantities in the experiment. For example, they could be related to “external” observables like motor output (Churchland et al. 2012) and pupil dilation (Vinck et al. 2015) or “internal” observables like the local field potential (Cui et al. 2016), population rate (Schölvinck et al. 2015), or amount of dopamine release (Schultz et al. 2015). However, while latent variables might be related to experimental observables, here we make no assumptions on such relationships in determining them.

The nonnegative assumption on these latent variables is a key distinction between the RLVM and other latent variable models. This assumption is motivated by the nonnegativity of neuronal firing rates and spike counts, which presumably underlie the sources of input being represented by the latent variables. This is not to imply that neural activity cannot represent negative variables; for example, neurons that have high spontaneous firing rates can represent negative quantities as a decrease below that baseline rate. The RLVM does in fact allow for this situation when a latent variable has a nonzero baseline value, because it can have positive and negative deviations from that baseline. Furthermore, the extent to which a latent variable takes advantage of the rectification is learned by the model and does not need to be specified a priori. Thus, by explicitly incorporating rectification, the RLVM finds solutions that are not easily generated by other approaches.

Indeed, rectified latent variables will often be zero, and thus also generate sparse responses, which serves as a second motivation for the nonnegativity of the latent variables. Sparse processes could, for example, represent episodic inputs into cortex from the environment or from other cortical areas. Many models of neural activity cannot in principle find sparse latent variables: for example, they will rarely explain large amounts of variance (PCA) and will never have a Gaussian distribution (FA). One of the advantages of the RLVM is that even though it does not require distributional assumptions, it is in fact able to capture sparse latent variables (see, for example, Fig. 5A, latent variable 3), since rectification can force many values to be zero.

Fig. 5.

Relationship of the latent variables inferred by the RLVM to experimentally observed trial variables. A: an 80-s sample of the predicted activity of 6 latent variables (extending over 8 task repetitions), demonstrating the relationship between the latent variables and the following “trial variables” observed during the experiment: the auditory cue that signals the animal to make its choice (blue vertical lines), the onset of reward delivery when the animal makes the correct choice (red vertical lines), the timing of whisker touches against the pole (bottom, green), and the timing of licks (bottom, purple). Latent variables are ordered (bottom to top) based on the magnitude of their variance. B: the trial variables at different time lags were used to predict the activity of each latent variable by linear regression, with the relative weights color-coded. The trial variable with the strongest relationship (measured by R2) is marked with an asterisk, and the corresponding R2 value is displayed. C, top: shaded boxes indicate which trial variables are capable of predicting each latent variable. Middle: fraction of measured activity of each neuron accounted for by each latent variable. The resulting matrix is related to a weighted version of the coupling matrix (Fig. 4B) and demonstrates the relative contribution of each latent variable to the observed population activity. Bottom: fraction of observed neurons driven by each latent variable (red) and relative fraction of predicted neural activity explained by each latent variable (blue).

The nonnegative assumption also addresses the “rotational degeneracy” characteristic of any model that contains a matrix-vector multiplication, as in Eq. 1. In such cases, there is no unique solution because, for any orthogonal matrix $U$, the two solutions $Wz_t$ and $(WU^T)(Uz_t)$ are equivalent since $U^TU = I$ (the identity matrix). To address this issue, different latent variable models considered in this report impose different constraints. For example, PCA assumes that the latent variables are uncorrelated and explain the greatest amount of variance, FA assumes that the latent variables are independent and normally distributed, and ICA assumes that the latent variables are as independent as possible. The RLVM employs a different constraint—nonnegativity of the latent variables—that we expect to be more faithful to the true constraints of neural processing, as described above.

A second key innovation of the RLVM is how it is fit to data sets. We initially used the MML algorithm, which is similar to the EM algorithm for fitting latent variable models (Paninski et al. 2010). However, inferring the time course of latent variables is challenging because of their high dimensionality, as they have a different value for each time point in the experiment. Fitting the model using random initializations for both the latent variables zt and model parameters {W, b} is unlikely to find the best solutions given such a high-dimensional space. As a result, we used an autoencoder framework (Bengio et al. 2013) to fit all model components simultaneously. The autoencoder optimizes both zt and {W, b} by minimizing the mean square error (or any appropriate cost function) between the true activity and the activity predicted by Eq. 14. The resulting autoencoder parameters provide both a reasonable initialization for the MML algorithm as well as a good solution to the RLVM without further fitting (detailed below).

Validation of the RLVM with Simulated Data

To understand the solutions found by the RLVM relative to other latent variable models, we generated simulated data with five latent variables that provided input to 100 neurons (Fig. 1A). These data were generated under the assumption that the input to each neuron is a weighted combination of a small number of correlated, nonnegative latent variables. The latent variable activity was filtered by each neuron's coupling and passed through a spiking nonlinearity to produce its firing rate, which was then used to randomly generate spike counts with a Poisson process. Because in this study we are considering application to two-photon imaging data, we then further processed each neuron's activity by convolving the generated spike trains with a kernel to simulate calcium dynamics and finally adding Gaussian noise. It should be noted that while this method of generating neural population activity reflects the mathematical form of the RLVM, both this simulation and the form of the RLVM are designed to describe the types of latent variables driving real neural data (as motivated above).

Evaluation of RLVM fitting methods.

We first consider the RLVM applied to these simulated data. The RLVM is fit in two stages. In the first stage, an autoencoder is fit to the data and efficiently converges to solutions for the latent variables and parameters. In the second stage, the MML algorithm is initialized with the autoencoder solutions and can then explore solutions to the latent variables that can have more general forms because of the less restrictive constraints (see methods). Here we compare the quality of the model fits resulting from this two-stage procedure (autoencoder initialization) to those resulting from a random initialization of the MML algorithm. To quantify the goodness of fit for each model type (random initialization vs. autoencoder initialization) we calculated the Pearson correlation coefficients (r) between the true and inferred latent variables. Using random initializations led to poor solutions for the latent variables (r = 0.781 ± 0.020; mean r ± SE over 20 initializations), whereas the two-stage procedure led to far more accurate solutions (r = 0.971 ± 0.001). The superior results achieved by initializing with the autoencoder solution (itself initialized randomly) are due to the high dimensionality of the problem—in the MML algorithm employed here there are relatively few constraints imposed on the latent variables (nonnegativity and some degree of smoothness, see methods), which results in many local minima. In contrast, the latent variables of the autoencoder are constrained to be a linear combination of the recorded population activity, and this constraint results in a much smaller space of model solutions.

In fact, we found that the latent variables resulting from the two-stage procedure were extremely similar to the initial values found by the autoencoder itself (r = 0.994 ± 0.000). The main difference between these solutions is that the MML optimization, which starts from the autoencoder solutions, smooths the time course of the latent variables, whereas the autoencoder latent variables are not generally smooth. As a result, except where otherwise stated, we use the autoencoder solution—forgoing the MML step of the algorithm—as a proxy for the full RLVM performance. Because the latent variables from the autoencoder do not have to be separately inferred for cross-validation data, this choice also provides a more direct comparison of the RLVM with the other latent variable models considered below.

We also tested how the performance of the RLVM depends on parameters governing both the simulated data generation and the fitting procedure to better understand the autoencoder's sensitivity to these variables. We found that the autoencoder can accurately recover the latent variables and coupling matrix even with small amounts of data (see Fig. A1A) and low SNR (see Fig. A1B). We explored the sensitivity of the autoencoder to different values of the regularization parameter on the encoding and decoding weights (λ1 and λ2, respectively, in Eq. 17), and found that the results obtained by the autoencoder are constant across several orders of magnitude (see Fig. A1C). In practice, we also found that the autoencoder solutions were robust given random initializations of the autoencoder parameters, suggesting that the model was not prone to local minima. These experiments suggest that the autoencoder is a robust fitting method for the RLVM that does not need large amounts of data or precise tuning of optimization parameters to produce accurate results.

We also tested whether the RLVM's nonnegativity constraint is essential for recovering the correct latent variables from the simulated data. Again using r as a goodness-of-fit measure for the inferred latent variables, we fit the RLVM to the simulated data (using the autoencoder) with different functions for g(.) in Eq. 13. We found that using the rectified nonlinearity (ReLU function) led to much more accurate solutions (r = 0.963 ± 0.002; mean r ± SE over 20 initializations) than using a nonrectified (linear) version of the RLVM (r = 0.573 ± 0.021). The linear version places no constraints on either the latent variables or the coupling matrix and thus cannot resolve the rotational degeneracy described above, which results in infinitely many ways for this linear model to reconstruct the observed activity [i.e., the 2 solutions $Wz_t$ and $(WU^T)(Uz_t)$ are equivalent for any orthogonal matrix $U$]. Although this implies that the linear version could in principle find the correct nonnegative latent variables, the lack of constraints makes it unlikely that the linear version will identify the true latent variables. This illustrates the importance of using the nonlinearity to enforce the nonnegativity of latent variables, in order for the RLVM to recover the latent variables generated with such a nonnegative constraint.

Comparison of RLVM with other latent variable methods.

To understand how the RLVM compares with other latent variable methods, we also fit PCA, FA, and ICA models to the simulated data (Fig. 2). We first compared the latent variables inferred by the different models (Fig. 2A), using a measure, maxcorr, that identifies the maximum correlation between each predicted latent variable and the true latent variables (see methods). The RLVM and FA outperform PCA and especially ICA and are able to largely predict the true latent variable activity once the number of inferred latent variables matches the true number of latent variables. An important feature of the RLVM and FA fits is that these two methods still infer meaningful latent variables even when the number of inferred latent variables is incorrectly specified: when the number of inferred latent variables is larger than the true number, both methods infer one latent variable that is highly correlated with each of the true latent variables and the remaining inferred latent variables just capture noise in the data. Because of this behavior, the performance of the RLVM and FA with respect to the maxcorr measure does not decline. The good performance of the RLVM was expected, given that the data were generated according to the assumptions of that model. The good performance of FA in reproducing the latent variables was somewhat surprising, given that it assumes that the latent variables are independent Gaussian variables. However, this assumption only applies in determining the initial coupling matrix, and the FA performance results from how the final coupling matrix is determined through varimax rotation (MATLAB default). The varimax rotation criterion maximizes the variance of the squared entries in each column of the coupling matrix, summed across all columns (Koch 2013). This has the effect of changing the weights in each column so that only a few weights are of large magnitude, while the rest are close to zero. Because the true coupling matrix mostly has this structure, FA is able to accurately capture that structure by varimax rotation. Once this final coupling matrix has been found, the resulting latent variables are then determined by linear regression (MATLAB default), which makes no assumptions about their distribution. PCA and ICA do not infer the correct latent variables because they make assumptions about the latent variables being uncorrelated (PCA) or independent (ICA), neither of which is true of the simulated data.

Fig. 2.

Comparisons between latent variable methods applied to simulated data. Four different latent variable methods were fit to data that were simulated with 5 latent variables (Fig. 1, B–D) and evaluated with cross-validated model performance measures. All error bars represent the SE over cross-validation folds. A, C, and D, left, demonstrate results from models with the correct number of latent variables, but performance measures (right) explore different numbers of latent variables. A, left: time course of a representative latent variable compared with the equivalent inferred latent variable from each method. Note that the FA and RLVM methods are both highly overlapping with the true latent variable. a.u., Arbitrary units. Right: latent variable maxcorr, which measures the correlation between the true and inferred latent variables (see methods), plotted against the number of latent variables specified during the fitting procedure. Model performance in each case plateaus for the true number of latent variables, indicating that even when overspecifying the number of latent variables the RLVM and FA still infer latent variables that match the true ones. B: matrices of coupling weights between neurons and latent variables, inferred by each method. For comparison, the coupling matrix used to generate the simulation is shown on left (reproducing Fig. 1C). C, left: representative simulated fluorescence trace of 1 neuron compared with the corresponding trace predicted by each method. Despite their performance in predicting the latent variables (A), here FA does poorly and PCA does well, as does the RLVM. Right: R2 values (median across neurons) between true and predicted fluorescence traces. D, left: cross-correlograms between 2 example pairs of simulated neurons compared with the corresponding cross-correlograms based on traces predicted by each method. Right: ability of each method to reproduce the pairwise cross-correlations between neurons at zero time lag, measured as the normalized inner product between the true correlation matrix and those calculated from predicted traces for each method.

A second aspect of the performance of the different latent variable models was based on how well each method captured the coupling weights between these latent variables and each neuron. In this regard, the RLVM and FA performed much better than PCA or ICA (Fig. 2B). Because the RLVM simultaneously infers the latent variables and estimates the coupling matrix, the accurate inference of the latent variables (Fig. 2A) necessarily implies an accurate estimation of the coupling matrix (assuming the overall population activity is well predicted; see next paragraph). PCA and ICA also estimate both model components simultaneously, but again the strong assumptions these methods place on the latent variables prohibit their accurate estimation of the coupling matrices. For FA the initial coupling matrix resembled that of PCA, but the final coupling matrix resulting from varimax rotation bears a much closer resemblance to the true coupling matrix. However, the varimax rotation was not able to accurately capture gradients in the magnitude of the weights (Fig. 2B; compare the red diagonal blocks in the FA coupling matrix with the true coupling matrix).

For all four models considered here, the predicted activity of an individual neuron is given by a weighted sum of the latent variables (Fig. 2A), with weights given by the proper row of the coupling matrix (Fig. 2B). To quantify the accuracies of the resulting model predictions, we used the coefficient of determination (R2; see methods) between the true and predicted activity (Fig. 2C). Interestingly, even though the RLVM and FA produced similar latent variables and coupling matrices, FA did not predict the population activity as well as the RLVM. This was mostly due to many large weights in the FA coupling matrix, which result from the varimax rotation step. Because the final latent variables from the FA algorithm are determined with linear regression by using the varimax-rotated coupling matrix as a predictor for the population activity, the population activity predictions are constrained by an improperly scaled coupling matrix, and the result is that neurons with estimated coupling weights that are too large in magnitude are not well predicted.
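
As a sketch of this evaluation (illustrative only; any offset terms and the exact R2 convention are given in methods), each neuron's predicted trace is a weighted sum of the latent variables, and fit quality is summarized by the median R2 across neurons.

```python
# Illustrative goodness-of-fit computation for a linear latent variable model.
import numpy as np

def median_r2(Y, Z, W):
    """Y: (T, N) measured activity; Z: (T, K) latents; W: (N, K) coupling matrix."""
    Y_hat = Z @ W.T                                    # weighted sum of latents per neuron
    ss_res = ((Y - Y_hat) ** 2).sum(axis=0)            # residual sum of squares per neuron
    ss_tot = ((Y - Y.mean(axis=0)) ** 2).sum(axis=0)   # total variance per neuron
    return np.median(1.0 - ss_res / ss_tot)            # median R^2 across neurons
```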

Perhaps surprisingly, PCA performed just as well as the RLVM in predicting the observed activity, even though PCA did not infer the correct latent variables or estimate the correct coupling weights. The reason for this is that the RLVM and PCA both minimize the reconstruction error in their cost function (explicitly and implicitly, respectively); however, because PCA does not constrain the latent variables to be positive, it reconstructs the population activity using both positive and negative values. This leads to differences in the latent variables (Fig. 2A, left) and coupling matrices (Fig. 2B) but can result in an equivalent prediction of activity (Fig. 2C, left). This difference between the RLVM and PCA in their descriptions of the population activity is a crucial point that we return to when evaluating the PCA solutions on real data.

Finally, we evaluated each method on its ability to account for observed correlations between neurons. Many previous approaches have focused on explaining pairwise correlations directly (Ohiorhenuan et al. 2010; Pillow et al. 2008; Schneidman et al. 2006), which requires parameters for each pair of neurons. However, as demonstrated by our example simulation, even just five latent variables can produce a complex pattern of pairwise interactions (Fig. 2D, left). Thus latent variable methods offer the ability to explain such correlations with many fewer parameters (Vidne et al. 2012). To quantify each model's ability to capture these correlations, we compared, for each neuron pair, the zero-time-lag value of the cross-correlogram (which, across all pairs, forms the correlation matrix) computed from the data and from each model's predicted traces. Agreement was measured as the normalized inner product (overlap) between the true and predicted correlation matrices (Fig. 2D, right). The results mirror the ability of each method to predict the population activity (Fig. 2C), with the RLVM and PCA capturing more of the correlation structure than FA and ICA.
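
The overlap measure can be sketched as follows (illustrative only; handling of the diagonal and the exact normalization follow methods): build the pairwise correlation matrix from measured and predicted traces and compute their normalized inner product, i.e., a cosine similarity between the two matrices.

```python
# Illustrative comparison of zero-lag correlation structure.
import numpy as np

def correlation_overlap(Y_true, Y_pred):
    """Y_true, Y_pred: (T, N) arrays of measured and predicted traces."""
    C_true = np.corrcoef(Y_true.T)                     # (N, N) pairwise correlations, data
    C_pred = np.corrcoef(Y_pred.T)                     # (N, N) pairwise correlations, model
    return (C_true * C_pred).sum() / (np.linalg.norm(C_true) * np.linalg.norm(C_pred))
```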

To demonstrate how the above results generalize to different data sets, we performed a range of simulations (e.g., Fig. 2) while varying the number of latent variables (Fig. 3A), the number of nonzero elements in the coupling matrices (Fig. 3B), and SNR (Fig. 3C). We characterize the performance of each method by its ability to recover the true latent variables (Fig. 3, top) and by its ability to reconstruct the population activity (Fig. 3, bottom). For all data set variations, the comparison between the RLVM and FA is similar to that seen in Fig. 2: they perform roughly equivalently in their ability to recover the true latent variables, but FA is not able to reconstruct the population activity as well as the RLVM (for the same reasons discussed above). Comparison between the RLVM and PCA reveals the opposite trend: the two perform equivalently in their ability to reconstruct the population activity, but PCA is not able to recover the true latent variables as well as the RLVM (also mirroring the conclusions drawn from Fig. 2).

Application of RLVM to Large-Scale Two-Photon Experiments in Primary Somatosensory Cortex

We next applied the RLVM to the experimental data set from Peron et al. (2015). We selected this data set because it involves a complex task with several "observables" related to behavior and task context, many of which are outside of direct experimental control but potentially related to cortical activity. Additionally, these data included a large number of neurons recorded over long periods of time, which is useful for the performance of any latent variable model. In this experiment, mice performed a pole localization task, in which a pole was lowered at a distal or proximal location next to the animal's snout. All whiskers but one were trimmed on that side of the snout, with the single remaining whisker corresponding to the imaged barrel in primary somatosensory cortex (S1). The animal had to signal the location of the pole after a delay period by licking one of two lick ports following the onset of a brief auditory cue. For the analyses in this work, we used a particular subset of available data sets corresponding to different imaged populations of S1 neurons (see Table A1, appendix), selected on the basis of the size of the neural population imaged, the length of time imaged, and their SNR (see methods).

For a given imaged population of neurons, we first determined how well the different latent variable methods predicted the observed population activity using different numbers of latent variables (Fig. 4A). The relative performance of the methods was similar to their performance on the simulated data (Fig. 2C, right). For the RLVM, PCA, and FA, there was at first a rapid increase in prediction performance as the number of latent variables increased, with the performance beginning to plateau between 5 and 10 latent variables. Because each additional latent variable yields some increase in performance, there is no unambiguous number of "true" latent variables underlying the data. However, it is important to note that relatively few are needed before the performance plateaus. Because there is no explicit point where this plateau occurs, for all subsequent analyses we selected the number of latent variables (6) beyond which performance increased only marginally.

Once the latent variables are determined, the coupling matrix of the RLVM demonstrates how each neuron combines these variables to produce its predicted activity (Fig. 4B). One of the advantages of using two-photon data is that it provides the spatial locations of the neurons, and we can use that information to determine whether there is any spatial structure in the coupling weights to the latent variables. To look for spatial structure among the neurons coupled to each latent variable, we plotted the spatial locations of the neurons whose coupling weight to a given latent variable had an absolute magnitude >15% of the maximum absolute weight for that latent variable (Fig. 4C). The positive and negative weights are intermingled in these plots, and no discernible spatial structure is apparent. This is expected in part because these neurons were imaged within a single barrel, and thus all belong to a single cortical column. Nevertheless, this illustrates how latent variables can in principle provide a new way to investigate the functional organization of cortex.
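
The thresholding step used for these spatial maps can be sketched as follows (function and variable names are illustrative): for a given latent variable, keep the neurons whose coupling weight exceeds 15% of the largest absolute weight to that latent variable, along with the sign of each retained weight.

```python
# Illustrative selection of neurons "coupled" to one latent variable.
import numpy as np

def coupled_neurons(W, k, frac=0.15):
    """W: (N, K) coupling matrix; k: latent variable index."""
    w = W[:, k]
    mask = np.abs(w) > frac * np.abs(w).max()
    return np.where(mask)[0], np.sign(w[mask])   # neuron indices and weight signs
```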

While with simulated data we could directly compare the latent variables inferred by each method with the ground truth, the experimental data provide no direct way to validate the latent variables that each method detected. Instead, we hypothesized that latent variables would be related to factors that can be directly observed in the experiment. In this case, four "trial variables" were measured in this data set: the timing of whisker touches against the pole, the onset of the auditory cue that signals the animal to make its choice, the onset of reward delivery when the animal makes the correct choice, and the timing of licks. We compared the time courses of latent variables discovered by the RLVM to these different elements of the experiment, demonstrating clear relationships (Fig. 5A). For example, latent variable 3 is active only during the periods in which there were whisker touches, whereas latent variables 1, 2, and 4 appear to be correlated with the choice cue and/or the following reward (note that the animal performed the task well and was thus rewarded on all trials in this case).

To quantify these relationships, we used linear regression to predict the activity of each latent variable with the four observed trial variables, using R2 as a goodness-of-fit measure. Linear regression was used in place of a simpler correlation measure because the coefficients for linear regression can include lagged time points, which allowed the regression model to capture the extended temporal response of fluorescence transients (Fig. 5B). A separate linear regression was performed for each trial variable, which did not take into account the correlations that exist among the trial variables. This approach is useful for determining how well latent variables are able to represent single trial variables, as opposed to mixing the influences from multiple trial variables (see below).
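
A sketch of this lagged regression is shown below (the lag range, treatment of the offset, and fitting details are illustrative assumptions): lagged copies of a single trial variable (e.g., a binary vector of whisker-touch times) serve as predictors so that the regression can capture the slow fluorescence transient following each event.

```python
# Illustrative lagged linear regression of a latent variable on one trial variable.
import numpy as np

def lagged_r2(latent, trial_var, n_lags=20):
    """latent: (T,) latent variable time course; trial_var: (T,) event time course."""
    T = len(latent)
    X = np.column_stack([np.roll(trial_var, lag) for lag in range(n_lags)])
    X[:n_lags, :] = 0.0                         # discard wrap-around values from np.roll
    X = np.column_stack([np.ones(T), X])        # add an offset term
    coefs, _, _, _ = np.linalg.lstsq(X, latent, rcond=None)
    pred = X @ coefs
    return 1.0 - ((latent - pred) ** 2).sum() / ((latent - latent.mean()) ** 2).sum()
```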

The resulting quantitative measures (Fig. 5B) were consistent with the example traces shown: latent variables 1, 2, and 4 are well predicted by the reward portion of the trial, latent variable 3 is well predicted by whisker touches, and latent variables 5 and 6, which do not have any discernible trial-locked patterns, are not well predicted by any of these four trial variables. With these quantitative measures, we can label each latent variable with the set of trial variables that best predict it. To do so, we required that 1) the R2 value using that trial variable was >0.10 and 2) the R2 value was greater than one-half the largest R2 value among all trial variables. If both these conditions were met, we considered the latent variable to be “driven” (although perhaps not exclusively) by that trial variable (Fig. 5C, top).

Another important question was how strongly each latent variable influenced the population response. First, we measured the proportion of neurons driven by each latent variable: for each neuron we calculated the fraction of the neuron's activity explained by each latent variable (Fig. 5C, middle; see methods) and considered a neuron to be driven by a latent variable if this fraction exceeded 0.10 (Fig. 5C, bottom, red bars). We also examined how each latent variable contributes to the overall proportion of predicted activity, to see whether there were any differences between how the latent variables influenced measured vs. predicted responses; this measure is similar to the one described above but uses the fraction of the neuron's predicted (rather than measured) activity explained by each latent variable (Fig. 5C, bottom, blue bars; see methods). Both measures gave similar results and show that latent variable 1, which was identified with the reward portion of the trial (see above), affected the largest proportion of neurons; latent variable 3, the only latent variable identified with the stimulus, affected the third largest proportion; and latent variables 5 and 6, which were not identified with any trial variables, affected the smallest proportions of neurons.
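
One simple stand-in for such a per-neuron "fraction explained" measure is sketched below; it treats each latent variable's additive contribution separately and compares its variance to the neuron's total variance. This is only an illustration under that assumption (the sum across latents need not equal the total explained variance when latents are correlated); the exact measure used here is defined in methods.

```python
# Illustrative per-neuron fraction of activity attributed to each latent variable.
import numpy as np

def fraction_explained(y, Z, w, thresh=0.10):
    """y: (T,) neuron trace; Z: (T, K) latents; w: (K,) coupling weights for this neuron."""
    contributions = Z * w                        # (T, K): each latent's additive contribution
    frac = contributions.var(axis=0) / y.var()   # share of the neuron's variance per latent
    return frac, frac > thresh                   # per-latent fractions and "driven" flags
```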

A fundamental feature of the RLVM is its ability to identify “sparsely active” variables, which is enabled by the rectification imposed on latent variable activity. For example, a source of neural activity that is episodically active (such as whisker touches in this case) should mostly have a small magnitude (and explain little variance) except during these events (e.g., latent variable 3 in Fig. 5A). Without rectification, such solutions are challenging for latent variable methods to identify. To demonstrate this, we performed the same analyses as in Fig. 5 using PCA (Fig. 6, A and B). The latent variables inferred by PCA (Fig. 6A) do in fact contain features that are correlated with the trial variables, but these features were more mixed than in the RLVM latent variables. Indeed, latent variable 3 in PCA over the same period does respond to whisker touches but also has activity timed to the choice cue.

A similar observation holds for RLVM latent variables 1 and 2, which are associated with suppressive and excitatory activity during the reward phase, respectively (Fig. 5A). While the RLVM cleanly separates these two subpopulations, they are mixed together in the first principal component of PCA (Fig. 6B, middle; neurons ∼1–100 and ∼250–300, respectively). PCA mixes these two subpopulations because such a combination into a single principal component explains the greatest amount of variance in the data, and this combination is possible because PCA is not restricted to using nonnegative latent variables. These apparent mixtures of latent variables are also reflected in the coupling matrix between latent variable and neural activity (compare Fig. 6B, middle, to Fig. 4B).

To determine whether the nonnegativity constraint on the RLVM is responsible for the differences between the PCA and RLVM solutions, we fit the RLVM on the same data without constraining the latent variables to be nonnegative. The latent variables inferred by this nonrectified version of the RLVM were qualitatively similar to PCA's latent variables (Fig. 6C), and indeed this model's latent variables exhibit the same type of mixing as the PCA latent variables. This demonstrates that the RLVM's ability to separate these subpopulations of neurons is mainly due to the rectified nonlinearity and is not just an artifact of PCA's constraint that the latent variables must be uncorrelated.

This example also illustrates that—although the RLVM and PCA are able to explain the same amount of population activity (Fig. 4A)—the underlying latent variables can differ dramatically due to rectification (similar results were seen with FA; data not shown). This same result was seen in the simulated data, with both the comparison between the RLVM and PCA (Fig. 2A, left) and the comparison between the RLVM and nonrectified version of the RLVM. This suggests that—if population activity is indeed composed of nonnegative latent variables—the structure of the RLVM makes it a more appropriate method for studying neural population activity.

To demonstrate that the above results from the RLVM (Fig. 4 and Fig. 5) are consistent across different populations of neurons and different animals, we repeated these analyses using nine different populations of neurons from the S1 data set (see Table A1 in appendix for more detailed information). The nine populations contain anywhere from 356 to 831 neurons, and the prediction performance of the RLVM for each population is plotted in Fig. 7A. It is interesting to note that all of these curves mostly plateau before reaching 10 latent variables, despite the wide range in the number of neurons in these populations, a result that may be related to the complexity of the behavioral task (Gao and Ganguli 2015).

Fig. 7.

Consistent classifications of latent variables detected across experiments. A: R2 between the measured activity and the activity predicted by the RLVM for different imaged populations of neurons. Highlighted plot corresponds to the population of neurons analyzed in Figs. 4–6 and reproduces the RLVM values in Fig. 4A. Across experiments there is a similar dependence of R2 on the number of latent variables, although the overall magnitude of R2 values depends on the number of neurons and the noise level of each experiment. B, top: amount of variability accounted for by each "type" of latent variable across all 9 populations, using 6 latent variables per population (same measures as calculated in Fig. 5C, averaged across the number of latent variables of each type). Even though all imaged populations were located in primary somatosensory cortex, across experiments most of the neural activity was related to nontactile sources. Middle: latent variables are classified by the combination of trial variables each is related to (same criteria as used in Fig. 5C). Bottom: red bars indicate the total number of latent variables in each class (out of 54 total latent variables), and blue bars indicate the total number of populations that contain at least 1 example of the latent variable class (out of 9 total populations). Latent variable types identified with whisker touches (1, 5, and 8) comprise a smaller proportion of the latent variables than types identified with the reward portion of the trial (2–4, 6, and 7). Latent variables that were not identified with any trial variables (9) were present in every population and had an influence on the population activity comparable to the other latent variables.

To repeat the analyses in Fig. 5, we used six latent variables for each population (Fig. 7B). Values were calculated as before (Fig. 5C) and averaged over latent variables from all nine populations. This meta-analysis shows that the results from Fig. 4 and Fig. 5 broadly hold across different populations in different animals: the latent variables associated with the reward portion of the trial are found in all but one population and account for the largest proportion of the predicted activity in the populations; latent variables associated with the stimulus are found in the majority of populations; and variables that are not identified with any trial variables are found in all populations. Together, these results (Figs. 4–7) demonstrate the usefulness of the RLVM as a tool for studying population responses in cortical areas and suggest that latent variable models will be crucial to arriving at a deeper understanding of cortical function.

DISCUSSION

Recordings of the activity from large numbers of cortical neurons provide opportunities to gain insight into the underlying factors driving cortical activity. Given that there are fewer variables underlying the activity than the number of neurons being recorded, latent variable approaches provide a way to infer the time course of these underlying factors and their relationship to neural activity. Here we presented the RLVM, which is unique in that it places a constraint on the latent variables that is appropriate for neural activity, namely, that underlying factors are nonnegative (rectified). The RLVM can be fit without relying on a number of statistical assumptions characteristic of past latent variable models, such as the specification of particular distributions for the latent variables. The RLVM is robust to many aspects of data acquisition and model fitting (see Fig. A1) and scales well with increasing numbers of neurons and recording length (see Fig. A2).

The results from the simulated data experiments demonstrate that the RLVM is able to recover the true latent variables (Fig. 2A) as well as each neuron's coupling weights to those latent variables (Fig. 2B). This guarantees that the method is able to predict single neuron activity well (Fig. 2C) and thus implies that the method is able to accurately capture the structure of the pairwise correlation matrix (Fig. 2D). Importantly, results from the simulated data (Fig. 2 and Fig. 3) also show how the standard variants of PCA, FA, and ICA can recover erroneous model parameters when fitting nonnegative activity generated from nonnegative latent variables. The manner in which these methods fail is important to consider when using them to analyze and interpret nonnegative data such as two-photon fluorescence traces.

Our results on experimental data demonstrate the utility of the RLVM as a tool for addressing questions about the structure of joint responses in large neural populations. Some of the latent variables inferred by the RLVM have clear relationships with measured trial variables, suggesting potentially meaningful interpretations of these variables. We also demonstrated that the rectification in the RLVM leads to important distinctions in the description of the population activity compared with a method like PCA, which has consequences for further understanding the role these latent variables play in cortical function.

Relationships to Other Latent Variable Models

Latent variable models can be classified into two broad categories: static models, which do not take temporal dynamics of the latent variables into account, and dynamic models, which do. The RLVM has elements of both, although it is more directly comparable to static models like PCA, FA, and ICA. These models are also known as linear factor models, so termed because there is a linear transformation from latent variables to predicted activity. While this need not be the case in the general RLVM framework, the formulation of the RLVM for two-photon data uses this assumption as well [since f(.) in Eq. 1 is linear]. One advantage of the RLVM over these other linear factor models is that the RLVM does not specify any statistical constraints on the latent variables, which allows it to accurately capture correlated latent variables. Furthermore, because of the nonnegativity constraint on the latent variables, the RLVM is able to identify latent variables that more closely resemble the form of expected inputs into the cortex and does not have multiple equivalent solutions that arise from orthogonal transformations like some linear factor models.

There is a close relationship between the RLVM and PCA. If the functions f(.) and g(.) in Eq. 12 of the RLVM are both linear, and the mean square error cost function is used, then the autoencoder solution of the RLVM lies in the same subspace as the PCA solution (Bourlard and Kamp 1989). The only difference is that the components of the RLVM can be correlated, whereas PCA requires them to be uncorrelated. However, using nonlinear functions for the activity of the neurons f(.) and/or underlying latent variables g(.) allows the RLVM to capture more complex structure in the data than a linear model like PCA (Japkowicz et al. 2000).
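
In schematic form (ignoring offset terms and using illustrative notation rather than the exact formulation of Eq. 12), the fully linear case amounts to

\[
\min_{\mathbf{W},\,\mathbf{V}} \; \sum_t \bigl\lVert \mathbf{y}_t - \mathbf{W}\mathbf{V}\,\mathbf{y}_t \bigr\rVert^2 ,
\]

where \(\mathbf{V}\) encodes the population activity \(\mathbf{y}_t\) into latent variables and \(\mathbf{W}\) decodes them. At the optimum, the product \(\mathbf{W}\mathbf{V}\) projects onto the subspace spanned by the leading principal components, although \(\mathbf{W}\) and \(\mathbf{V}\) individually are determined only up to an invertible transformation; replacing the linear g(.) with a rectifying function breaks this equivalence and yields the rectified model considered here.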

The RLVM structure also contains elements of dynamic latent variable models, because of its ability to impose constraints on the time course of latent variables via the log-prior term log p(Z) in Eq. 2. We used a general smoothing prior when using the full MML algorithm (the results of which are shown in Fig. 5A), which allows latent variable values at time points t − 1 and t + 1 to influence the value at time t. This is similar to the smoothing prior of GPFA (Yu et al. 2009), which allows a latent variable value at time point t to have a more complex dependence on past and future time points. However, as the name implies, GPFA is based on FA and imposes similar statistical constraints on the latent variables that we avoided with the RLVM for reasons mentioned above. Another class of dynamic latent variable models are the state-space models (Paninski et al. 2010), which constrain each latent variable at time t to be a linear combination of all latent variables at time t − 1. Such models allow the dynamics to be fit to the data directly, whereas the RLVM specifies a fixed relationship between the time points in the dynamics model. State-space models allow one to model the causal relationship between latent variables but come at the expense of making a strong assumption about the form of that relationship (namely, that latent variables are only determined by their values at the previous time step). Which type of model is most appropriate might then depend on whether dynamics are generated by the latent variables (and/or observed neurons) or by processes extrinsic to the system, such as with the trial variables we considered here. For the applications to the two-photon data set in S1, we found that the solutions for the static and dynamic versions of the RLVM were similar, in part because of the simple dynamics model we imposed. However, the nature of two-photon data does not lend itself to more restrictive dynamics models (like the state-space models) because of the slow timescales. The investigation of more complex dynamics models in the RLVM is a direction for future work.
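
As an illustration of such a smoothing term (the specific prior and smoothing strength used for the MML fits are those described in methods; the form below is only one common choice), a penalty on each latent variable's discrete second derivative couples the value at time t to its immediate neighbors:

\[
\log p(\mathbf{Z}) \;\propto\; -\lambda \sum_{k} \sum_{t} \bigl( z_{k,t-1} - 2\,z_{k,t} + z_{k,t+1} \bigr)^2 .
\]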

The analysis performed with the data set from Peron et al. (2015) demonstrates the ability of the RLVM to find latent variables that are correlated with individual task parameters. This “demixing” of task parameters is not an explicit goal of the method but rather results from the rectification of the latent variables (Fig. 5 and Fig. 6). dPCA (Kobak et al. 2016) is a dimensionality reduction technique that is explicitly formulated to find dimensions that capture variance related to individual task parameters and as such is a mix between supervised and unsupervised dimensionality reduction. The RLVM, in contrast, is a fully unsupervised method, as task parameter information is not used for model fitting. An important restriction of the dPCA method is that it requires neural activity that has been averaged over trial conditions of the same type, which prevents it from being used to address variability at the level of single trials. Another consequence is that dPCA cannot be used with continuous trial variables (because it requires averaging over similar trial types), and hence we were not able to compare the RLVM to dPCA on the somatosensory data set because of the uncontrolled nature of the stimulus presentation.

The RLVM is thus an unsupervised dimensionality reduction technique and has several advantages as a general method for interpreting population recordings relative to other latent variable approaches, as described throughout this article. However, there are particular situations in which other approaches may yield important insights over the RLVM. If one only wants to estimate the dimensionality of the data or visualize it in two or three dimensions, PCA may be a more appropriate choice. As described in the above paragraph, if one wants to visualize low-dimensional representations of activity that correspond to specific, discrete trial conditions, a targeted dimensionality reduction technique such as dPCA is most appropriate. Such a supervised approach is in contrast to the unsupervised approach we demonstrate here with the S1 data sets, although it is possible to incorporate such conditional dependencies into the RLVM framework, as discussed below.

Model Extensions

The RLVM is a flexible model framework because there are many possible extensions that can be incorporated, depending on the desired application area. For example, the RLVM can be fit to spiking data by using a negative log-likelihood cost function that assumes Poisson noise and specifying f(.) in Eq. 12 (the output nonlinearity) to be a rectifying function (see Fig. A4). This version of the RLVM then becomes comparable to a different set of dimensionality reduction approaches specific to spiking data, such as Poisson PCA (Pfau et al. 2013), Poisson FA (Santhanam et al. 2009), and the Poisson linear dynamical system (Macke et al. 2011), which also use a rectifying nonlinearity on the model output. However, like the other methods considered in this article, these methods do not place a rectifying nonlinearity on the latent variables, which would still be a defining feature of the RLVM.
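
For reference, a Poisson negative log-likelihood of the standard form could serve as such a cost function (notation is illustrative; constant terms independent of the model parameters are dropped):

\[
-\log p(\mathbf{Y} \mid \mathbf{Z}) \;=\; \sum_{n,t} \bigl[ \hat{y}_{n,t} - y_{n,t} \log \hat{y}_{n,t} \bigr] + \text{const}, \qquad \hat{y}_{n,t} = f\!\bigl(\mathbf{w}_n^{\top} \mathbf{z}_t + b_n\bigr),
\]

where the rectifying output nonlinearity f(.) guarantees nonnegative predicted rates \(\hat{y}_{n,t}\).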

One limitation of the RLVM is that it is only able to model additive interactions between the latent variables. Although there is evidence to support the existence of additive interactions in cortex (Arieli et al. 1996), and although they are commonly used for modeling (Cui et al. 2016; Okun et al. 2015; Schölvinck et al. 2015), there has been recent interest in modeling multiplicative interactions (Goris et al. 2014; Lin et al. 2015; Rabinowitz et al. 2015). It is possible to extend the RLVM to model nonadditive interactions by adding more hidden layers to the model. This approach effectively allows a neural network to transform the latent variables into the observed data in a nonlinear manner and is the basis of “stacked” autoencoders (Bengio et al. 2013), which we leave as a direction for future work.

For some analyses, it may not be desirable to have the effects of the stimulus represented in the activity of the latent variables. In this case it is possible to incorporate a stimulus model into the RLVM, such that the activity of each neuron is driven by the weighted sum of the latent variables plus terms that capture the stimulus response. This model formulation would then allow for the investigation of the relationship between stimulus processing and ongoing cortical activity. In a similar manner, the RLVM can also incorporate additional trial information, so that the inferred latent variables only capture variation in the population response that is independent of this information. Such a model is then more closely related to the dPCA approach discussed above.

The recent development of new recording technologies like high-density multielectrode arrays and two-photon microscopy is leading to increasingly large and rich neural data sets. We have shown here that the RLVM can be used effectively to analyze two-photon data sets and that it is also possible to apply this model to spiking data. The RLVM is thus a simple and extendable model that can be used to analyze both types of large population recordings, and in doing so can help uncover neural mechanisms that may not be obvious when studying the responses of single neurons.

GRANTS

This work was supported in part by National Institute for Deafness and Communicative Disorders Training Grant DC-00046 (M. R. Whiteway) and National Science Foundation IIS-1350990 (D. A. Butts).

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the author(s).

ENDNOTE

At the request of the authors, readers are herein alerted to the fact that additional materials related to this manuscript may be found at the institutional website of one of the authors, which at the time of publication they indicate is: www.neurotheory.umd.edu/code. These materials are not a part of this manuscript, and have not undergone peer review by the American Physiological Society (APS). APS and the journal editors take no responsibility for these materials, for the website address, or for any links to or from it.

AUTHOR CONTRIBUTIONS

M.R.W. analyzed data; M.R.W. and D.A.B. interpreted results of experiments; M.R.W. prepared figures; M.R.W. drafted manuscript; M.R.W. and D.A.B. edited and revised manuscript; M.R.W. and D.A.B. approved final version of manuscript.

ACKNOWLEDGMENTS

The authors thank the Svoboda lab and the CRCNS database for providing the mouse barrel cortex data. We also thank J. McFarland for helpful discussions.

APPENDIX

The flexibility and accuracy of the RLVM in predicting population activity stems from the use of an autoencoder neural network to initialize its parameters and latent variables. Here we demonstrate the robustness of the autoencoder stage using simulated data generated over a range of parameters and model hyperparameters.
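
For concreteness, the sketch below shows a minimal rectified autoencoder of the kind described above, written in Python/NumPy rather than the MATLAB implementation used in this study; the initialization scale, learning rate, and iteration count are illustrative assumptions (and the data should be standardized before fitting), and the encoding and decoding weights are tied as in the weight-tied analyses below.

```python
# Minimal, illustrative rectified autoencoder with tied weights.
import numpy as np

def fit_rectified_autoencoder(Y, n_latents, lr=1e-3, n_iters=2000, seed=0):
    """Y: (T, N) activity matrix. Returns coupling matrix W (N, K) and latents Z (T, K)."""
    rng = np.random.default_rng(seed)
    T, N = Y.shape
    W = 0.01 * rng.normal(size=(N, n_latents))   # tied encoding/decoding weights
    b = np.zeros(n_latents)                      # encoding offsets
    for _ in range(n_iters):
        Z = np.maximum(Y @ W + b, 0.0)           # ReLU latent variables (encoding)
        Y_hat = Z @ W.T                          # linear decoding
        E = Y_hat - Y                            # reconstruction error
        relu_mask = (Z > 0).astype(float)
        dZ = (E @ W) * relu_mask                 # backpropagate through the ReLU
        dW = (E.T @ Z + Y.T @ dZ) / T            # tied-weight gradient (decode + encode paths)
        db = dZ.mean(axis=0)
        W -= lr * dW                             # gradient descent on mean square error
        b -= lr * db
    Z = np.maximum(Y @ W + b, 0.0)
    return W, Z
```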

First, we tested robustness to the amount and quality of data. We found that the autoencoder can accurately reconstruct the population activity even for experiments of relatively short duration (Fig. A1A) and with low SNR in the fluorescence trace of each neuron (Fig. A1B). These results did not depend on the precise choice of regularization parameters on the encoding and decoding weights (λ1 and λ2, respectively, in Eq. 17), which influence the final magnitudes of these weights: the results obtained by the autoencoder were stable across several orders of magnitude of these parameters (Fig. A1C), suggesting that precise tuning of the optimization parameters is not needed to produce accurate results.

Fig. A1.

Sensitivity analysis of the autoencoder using simulated data. Data sets are generated as in Fig. 2 using varying numbers of latent variables. A–C: an autoencoder is fit to each data set using the correct number of latent variables. Plotted points represent the mean R2 value between the true and predicted population activity averaged over 20 such data sets; error bars are omitted for ease of interpretation. Plots show the result of varying the amount of data used for fitting (using 10 Hz sampling rate) (A); the signal-to-noise ratio of the data used for fitting (using 30 min of simulated data) (B); or the regularization parameter on the encoding and decoding weight matrices, which were constrained to be equal through weight-tying (again using 30 min of simulated data) (C).

Because experimental data sets are becoming ever larger, we also tested how the fitting time of the autoencoder scales with the amount of data. We found that the amount of time required to fit the RLVM parameters scaled linearly with both experiment length (Fig. A2, A and B) and number of neurons (Fig. A2, C and D), demonstrating that the autoencoder will not be constrained to small amounts of data.

Fig. A2.

Linear scaling properties of the autoencoder. A and B: data are generated as in Fig. 2 with 100 neurons and varying recording lengths (with a 10 Hz sampling rate). Autoencoders are fit with and without weight-tying (A and B, respectively). The fitting time scales roughly linearly with the experiment time. C and D: data are generated as in Fig. 2 with a 30-min experiment time and varying the number of neurons. Autoencoders are fit with and without weight-tying (C and D, respectively). The fitting time scales roughly linearly with the number of neurons. Comparison of A and C (weight-tying) with B and D (no weight-tying) shows that while weight-tying approximately halves the number of estimated parameters, it leads to >2-fold speedup in fitting time with a small number of latent variables. As the number of latent variables increases this speedup advantage from weight-tying is lost. Plotted values are mean fitting times ± SE over 20 data sets. These results were obtained on a desktop machine running Ubuntu 14.04 LTS with 16 Intel Xeon E5-2670 processors and 126 GB of RAM; the MATLAB implementation of the autoencoder has not been optimized for this particular architecture.

We also analyzed whether weight-tying (Bengio et al. 2013), in which the encoding and decoding matrices are constrained to be transposes of each other, was beneficial. Weight-tying has the desirable property that it can nearly halve the fitting time for models with a small number of latent variables (Fig. A2), but it potentially overconstrains the possible solutions. In practice, we found that weight-tying resulted in solutions qualitatively similar to those of the untied model (Fig. A3), allowing for faster model fitting without affecting the qualitative observations made about the encoding and decoding matrices.

Fig. A3.

Effect of weight-tying using simulated data. We compared the effects of weight-tying the autoencoder on the resulting weight matrix by fitting models with and without weight-tying to the simulated data (Fig. 2). A: weights learned by the autoencoder when encoding and decoding matrices are constrained to be the same. B: encoding (left) and decoding (right) weights learned by the autoencoder without the weight-tying constraint, demonstrating a pattern very similar to the weight-tied solution in A.

Finally, we demonstrate that the RLVM can be adapted to fit spiking data by using a negative log-likelihood cost function that assumes Poisson noise and specifying the output nonlinearity to be a rectifying function (Fig. A4). Thus the RLVM can be applied to a much wider variety of data sets.

Fig. A4.

Using the RLVM for spiking data. Left: coupling matrix used to generate synthetic data, as described in methods (reproducing Fig. 1C). Center: the estimated coupling matrix when the autoencoder variant of the RLVM is fit to the simulated 2-photon data with a Gaussian noise loss function (mean square error). Right: estimated coupling matrix when the autoencoder variant of the RLVM is fit to the simulated spiking data using a Poisson noise loss function (negative log-likelihood). The simulated data contained spikes binned at 100-ms resolution. The good agreement of both estimated coupling matrices with the true coupling matrix demonstrates that the RLVM can recover the same model parameters when fit using 2 different types of data. This result suggests that the RLVM will perform similarly on multielectrode data, without the need for data smoothing or averaging across trials (which are common preprocessing steps used with spiking data when attempting to use latent variable models not suited for discrete count data, such as PCA).

The experimental data used in this paper are from Peron et al. (2015), and detailed information about the subsets of data used for the analysis in Figs. 4–7 can be found in Table A1.

Table A1.

Experimental selection

Animal ID    Cell ID Range    Original ROI Count    Original Trial Count    Analyzed ROI Count    Analyzed Trial Count
an229716     21000–23464      1,395                 157                     831                   124
an229716     33000–35351      1,051                 163                     466                   142
an229716     18000–20377      1,099                 155                     760                   108
an229717     45000–47679      1,695                 173                     406                   146
an229717     09000–11474      1,364                 156                     805                   106
an229717     39000–41437      1,313                 136                     466                   89
an229719     09000–11423      1,358                 162                     685                   98
an229719     15000–17478      1,332                 158                     517                   113
an229719     18000–20489      1,250                 162                     356                   126

All experimental data used in Figs. 4–7 are from Peron et al. (2015) and are publicly available at http://dx.doi.org/10.6080/K0TB14TN. Data analysis was performed on subsets of the data that contained a large number of neurons simultaneously imaged over many trials (see methods). The Analyzed Data column shows the amount of data retained from the Original Data column after removal of neurons that had >50% missing values in their fluorescence traces or had an estimated SNR < 0.1, as well as removal of trials that had missing values for any of the remaining fluorescence traces. The row in boldface corresponds to the data set used for the analyses shown in Figs. 4–6.

REFERENCES

  1. Ahrens MB, Li JM, Orger MB, Robson DN, Schier AF, Engert F, Portugues R. Brain-wide neuronal dynamics during motor adaptation in zebrafish. Nature 485: 471–477, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Amarasingham A, Geman S, Harrison MT. Ambiguity and nonidentifiability in the statistical analysis of neural codes. Proc Natl Acad Sci USA 112: 6455–6460, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Archer EW, Köster U, Pillow JW, Macke JH. Low-dimensional models of neural population activity in sensory cortical circuits. In: Advances in Neural Information Processing Systems. La Jolla, CA: Neural Information Processing Systems Foundation, vol. 27, 2014. [Google Scholar]
  4. Arieli A, Sterkin A, Grinvald A, Aertsen A. Dynamics of ongoing activity: explanation of the large variability in evoked cortical responses. Science 273: 1868–1871, 1996. [DOI] [PubMed] [Google Scholar]
  5. Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35: 1798–1828, 2013. [DOI] [PubMed] [Google Scholar]
  6. Bishop CM. Pattern Recognition. New York: Springer, 2006. [Google Scholar]
  7. Bourlard H, Kamp Y. Autoassociative memory by multilayer perceptron and singular values decomposition. Biol Cybern 59: 291–294, 1989. [DOI] [PubMed] [Google Scholar]
  8. Churchland MM, Cunningham JP, Kaufman MT, Foster JD, Nuyujukian P, Ryu SI, Shenoy KV. Neural population dynamics during reaching. Nature 487: 51–56, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Churchland MM, Yu BM, Cunningham JP, Sugrue LP, Cohen MR, Corrado GS, Newsome WT, Clark AM, Hosseini P, Scott BB, Bradley DC, Smith MA, Kohn A, Movshon JA, Armstrong KM, Moore T, Chang SW, Snyder LH, Lisberger SG, Priebe NJ, Finn IM, Ferster D, Ryu SI, Santhanam G, Sahani M, Shenoy KV. Stimulus onset quenches neural variability: a widespread cortical phenomenon. Nat Neurosci 13: 369–378, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Cohen MR, Kohn A. Measuring and interpreting neuronal correlations. Nat Neurosci 14: 811–819, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Cui Y, Liu LD, McFarland JM, Pack CC, Butts DA. Inferring cortical variability from local field potentials. J Neurosci 36: 4121–4135, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Cunningham JP, Yu BM. Dimensionality reduction for large-scale neural recordings. Nat Neurosci 17: 1500–1509, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. De Meo R, Murray MM, Clarke S, Matusz PJ. Top-down control and early multisensory processes: chicken vs. egg. Front Integr Neurosci 9: 17, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Doiron B, Litwin-Kumar A, Rosenbaum R, Ocker GK, Josić K. The mechanics of state-dependent neural correlations. Nat Neurosci 19: 383–393, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Freeman J, Vladimirov N, Kawashima T, Mu Y, Sofroniew NJ, Bennett DV, Rosen J, Yang CT, Looger LL, Ahrens MB. Mapping brain activity at scale with cluster computing. Nat Methods 11: 941–950, 2014. [DOI] [PubMed] [Google Scholar]
  16. Gao P, Ganguli S. On simplicity and complexity in the brave new world of large-scale neuroscience. Curr Opin Neurobiol 32: 148–155, 2015. [DOI] [PubMed] [Google Scholar]
  17. Ghazanfar A, Schroeder C. Is neocortex essentially multisensory? Trends Cogn Sci 10: 278–285, 2006. [DOI] [PubMed] [Google Scholar]
  18. Goris RL, Movshon JA, Simoncelli EP. Partitioning neuronal variability. Nat Neurosci 17: 858–865, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Haggerty DC, Ji D. Activities of visual cortical and hippocampal neurons co-fluctuate in freely moving rats during spatial behavior. Elife 4: e08902, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Hara K, Saito D, Shouno H. Analysis of function of rectified linear unit used in deep learning. In: 2015 International Joint Conference on Neural Networks. Bandera, TX: International Neural Network Society 2015.
  21. Harris KD, Thiele A. Cortical state and attention. Nat Rev Neurosci 12: 509–523, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Japkowicz N, Hanson SJ, Gluck MA. Nonlinear autoassociation is not equivalent to PCA. Neural Comput 12: 531–545, 2000. [DOI] [PubMed] [Google Scholar]
  23. Kato S, Kaplan HS, Schrödel T, Skora S, Lindsay TH, Yemini E, Lockery S, Zimmer M. Global brain dynamics embed the motor command sequence of Caenorhabditis elegans. Cell 163: 656–669, 2015. [DOI] [PubMed] [Google Scholar]
  24. Kobak D, Brendel W, Constantinidis C, Feierstein CE, Kepecs A, Mainen ZF, Qi XL, Romo R, Uchida N, Machens CK, van Rossum MC. Demixed principal component analysis of neural population data. Elife 5: e10989, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Koch I. Analysis of Multivariate and High-Dimensional Data. Cambridge, UK: Cambridge Univ. Press, 2013. [Google Scholar]
  26. Köster U, Sohl-Dickstein J, Gray CM, Olshausen BA. Modeling higher-order correlations within cortical microcolumns. PLoS Comput Biol 10: e1003684, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Kulkarni JE, Paninski L. Common-input models for multiple neural spike-train data. Network 18: 375–407, 2007. [DOI] [PubMed] [Google Scholar]
  28. Lakshmanan KC, Sadtler PT, Tyler-Kabara EC, Batista AP, Yu BM. Extracting low-dimensional latent structure from time series in the presence of delays. Neural Comput 27: 1825–1856, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Lee DD, Seung HS. Learning the parts of objects by non-negative matrix factorization. Nature 401: 788–791, 1999. [DOI] [PubMed] [Google Scholar]
  30. Lin IC, Okun M, Carandini M, Harris KD. The nature of shared cortical variability. Neuron 87: 644–656, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Macke JH, Buesing L, Cunningham JP, Yu BM, Shenoy KV, Sahani M. Empirical models of spiking in neural populations. In: Advances in Neural Information Processing Systems. La Jolla, CA: Neural Information Processing Systems Foundation, vol. 24, 2011. [Google Scholar]
  32. Marguet SL, Harris KD. State-dependent representation of amplitude-modulated noise stimuli in rat auditory cortex. J Neurosci 31: 6414–6420, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. McFarland JM, Cui Y, Butts DA. Inferring nonlinear neuronal computation based on physiologically plausible inputs. PLoS Comput Biol 9: e1003143, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Niell CM, Stryker MP. Modulation of visual responses by behavioral state in mouse visual cortex. Neuron 65: 472–479, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Ohiorhenuan IE, Mechler F, Purpura KP, Schmid AM, Hu Q, Victor JD. Sparse coding and high-order correlations in fine-scale cortical networks. Nature 466: 617–621, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Okun M, Steinmetz NA, Cossell L, Iacaruso MF, Ko H, Barthó P, Moore T, Hofer SB, Mrsic-Flogel TD, Carandini M, Harris KD. Diverse coupling of neurons to populations in sensory cortex. Nature 521: 511–515, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Otazu GH, Tai LH, Yang Y, Zador AM. Engaging in an auditory task suppresses responses in auditory cortex. Nat Neurosci 12: 646–654, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Pachitariu M, Lyamzin DR, Sahani M, Lesica NA. State-dependent population coding in primary auditory cortex. J Neurosci 35: 2058–2073, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Paninski L. Maximum likelihood estimation of cascade point-process neural encoding models. Network 15: 243–262, 2004. [PubMed] [Google Scholar]
  40. Paninski L, Ahmadian Y, Ferreira DG, Koyama S, Rahnama Rad K, Vidne M, Vogelstein J, Wu W. A new look at state-space models for neural data. J Comput Neurosci 29: 107–126, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Peron SP, Freeman J, Iyer V, Guo C, Svoboda K. A cellular resolution map of barrel cortex activity during tactile behavior. Neuron 86: 783–799, 2015. [DOI] [PubMed] [Google Scholar]
  42. Pfau D, Pnevmatikakis EA, Paninski L. Robust learning of low-dimensional dynamics from large neural ensembles. In: Advances in Neural Information Processing Systems. La Jolla, CA: Neural Information Processing Systems Foundation, vol. 26, 2013. [Google Scholar]
  43. Pillow JW, Shlens J, Paninski L, Sher A, Litke AM, Chichilnisky EJ, Simoncelli EP. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature 454: 995–999, 2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Rabinowitz NC, Goris RL, Cohen M, Simoncelli EP. Attention stabilizes the shared gain of V4 populations. Elife 4: e08998, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Rasch MJ, Gretton A, Murayama Y, Maass W, Logothetis NK. Inferring spike trains from local field potentials. J Neurophysiol 99: 1461–1476, 2008. [DOI] [PubMed] [Google Scholar]
  46. Santhanam G, Yu BM, Gilja V, Ryu SI, Afshar A, Sahani M, Shenoy KV. Factor-analysis methods for higher-performance neural prostheses. J Neurophysiol 102: 1315–1330, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Schneidman E, Berry MJ, Segev R, Bialek W. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature 440: 1007–1012, 2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Schölvinck ML, Saleem AB, Benucci A, Harris KD, Carandini M. Cortical state determines global variability and correlations in visual cortex. J Neurosci 35: 170–178, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Schultz W, Carelli RM, Wightman RM. Phasic dopamine signals: from subjective reward value to formal economic utility. Curr Opin Behav Sci 5: 147–154, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Semedo J, Zandvakili A, Kohn A, Machens CK, Yu BM. Extracting latent structure from multiple interacting neural populations. Advances in Neural Information Processing Systems. La Jolla, CA: Neural Information Processing Systems Foundation, vol. 27, 2014. [Google Scholar]
  51. Shuler MG. Reward timing in the primary visual cortex. Science 311: 1606–1609, 2006. [DOI] [PubMed] [Google Scholar]
  52. Smith AC, Brown EN. Estimating a state-space model from point process observations. Neural Comput 15: 965–991, 2003. [DOI] [PubMed] [Google Scholar]
  53. Stopfer M, Jayaraman V, Laurent G. Intensity versus identity coding in an olfactory system. Neuron 39: 991–1004, 2003. [DOI] [PubMed] [Google Scholar]
  54. Vidne M, Ahmadian Y, Shlens J, Pillow JW, Kulkarni J, Litke AM, Chichilnisky EJ, Simoncelli E, Paninski L. Modeling the impact of common noise inputs on the network activity of retinal ganglion cells. J Comput Neurosci 33: 97–121, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Vinck M, Batista-Brito R, Knoblich U, Cardin JA. Arousal and locomotion make distinct contributions to cortical activity patterns and visual encoding. Neuron 86: 740–754, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Vogelstein JT, Packer AM, Machado TA, Sippy T, Babadi B, Yuste R, Paninski L. Fast nonnegative deconvolution for spike train inference from population calcium imaging. J Neurophysiol 104: 3691–3704, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Yu BM, Cunningham JP, Santhanam G, Ryu SI, Shenoy KV, Sahani M. Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. J Neurophysiol 102: 614–635, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
