PLOS Computational Biology. 2020 May 18;16(5):e1007614. doi: 10.1371/journal.pcbi.1007614

Stimulus-choice (mis)alignment in primate area MT

Yuan Zhao 1, Jacob L Yates 2, Aaron J Levi 3, Alexander C Huk 3, Il Memming Park 1,*
Editor: Daniele Marinazzo4
PMCID: PMC7259805  PMID: 32421716

Abstract

For stimuli near perceptual threshold, the trial-by-trial activity of single neurons in many sensory areas is correlated with the animal’s perceptual report. This phenomenon has often been attributed to feedforward readout of the neural activity by the downstream decision-making circuits. The interpretation of choice-correlated activity is quite ambiguous, but its meaning can be better understood in the light of population-wide correlations among sensory neurons. Using a statistical nonlinear dimensionality reduction technique on single-trial ensemble recordings from the middle temporal (MT) area during perceptual decision-making, we extracted low-dimensional latent factors that captured the population-wide fluctuations. We dissected the particular contributions of sensory-driven versus choice-correlated activity in the low-dimensional population code. We found that the latent factors strongly encoded the direction of the stimulus in a single dimension with a temporal signature similar to that of single MT neurons. If the downstream circuit were optimally utilizing this information, choice-correlated signals should be aligned with this stimulus-encoding dimension. Surprisingly, we found that a large component of the choice information resides in the subspace orthogonal to the stimulus representation, which is inconsistent with the optimal readout view. This misaligned choice information allows the feedforward sensory information to coexist with the decision-making process. The time course of these signals suggests that this misaligned contribution is likely feedback from the downstream areas. We hypothesize that this non-corrupting choice-correlated feedback might be related to learning or reinforcing sensory-motor relations in the sensory population.

Author summary

In sensorimotor decision-making, the internal representation of sensory stimuli is utilized to generate behavior appropriate for the context. Therefore, the correlation between variability in sensory neurons and perceptual decisions is sometimes explained by a causal, feedforward role of sensory noise in behavior. However, this correlation could also originate via feedback from decision-making mechanisms downstream of the sensory representation. This cannot be resolved by analyzing single-unit responses, but requires a population-level analysis. Area MT contains both sensory and choice information and is known to be the key sensory area for visual motion perception. Thus the decision-making process may be corrupting the sensory representation. However, we find that the sensory stimuli and choice variables are separate at the population level, contradicting the previous interpretations based on single-unit recordings. This new insight suggests how neural systems can maintain a mixed representation while allowing learning and adaptation.

Introduction

Sensory cortical neurons exhibit substantial variability to repeated presentations of the same stimulus [1, 2]. This variability depends on the specifics of the sensory stimulus and task being performed [3–7], and is often correlated with the trial-by-trial perceptual report of the animal [8–11]. This trial-by-trial correlation between neural responses and perceptual reports, often quantified as choice probability (CP), has long been of interest for its potential to reveal the mechanisms by which downstream areas read out the responses of the relevant population of sensory neurons [12–14]. However, this interpretation is complicated by the presence of interneuronal correlations [15] and top-down feedback [9, 16], and it also depends on assumptions about the readout mechanisms of downstream brain areas [12, 14, 16, 17].

Several models of perceptual decision-making have been proposed to explain the empirical relationships between stimuli, neural responses, and behavioral choices [12, 14, 16]. Existing proposals come in two basic flavors: those that posit an optimal readout that is limited by shared neural variability [14, 18, 19] and those that assert that choice-related feedback modifies the signals in sensory areas [16, 20]. Several recent experimental results support the feedback hypothesis [7, 9, 20]. Although feedback can be interpreted in terms of probabilistic inference [16], the resulting pattern of variability in sensory areas will reduce the information about the stimulus [16, 19, 21] and impair performance on the task [20]. Why would the brain feed back a choice or decision that corrupts the sensory information and makes it do worse on the task? Here, we propose an alternative hypothesis: the feedback can be non-corrupting, effectively multiplexing choice signals in a sensory population without diminishing information about the stimulus.

To visualize the space of hypotheses and how they can be distinguished, it is helpful to summarize the joint activity of a population of neurons with respect to the stimulus-driven activity. Fig 1 demonstrates this alignment conceptually and the effect of each type of choice model in this space. Specifically, for a population of only two neurons, the joint activity of the population can be represented as point clouds in a 2-dimensional space where each axis represents an individual neuron’s activity (Fig 1A). For a 1-dimensional stimulus (as is typically used in discrimination paradigms), different values of the stimulus (red and black) drive activity that falls along a 1-dimensional “stimulus axis”. For simplicity, in this toy example, we assume there is no structure in the co-variability that can be exploited for decoding. In this case, increased variability along the stimulus axis (the so-called information-limiting noise) will change the amount of information about the stimulus, while, importantly, variability orthogonal to the stimulus axis will not [19, 22–25]. We call this variability direction the “non-stimulus axis” (Fig 1A). In larger populations, the stimulus may reside in a subspace of higher dimension. However, we can use statistical classification methods to determine the stimulus and non-stimulus subspaces in general.

Fig 1. Hypotheses on the sources of choice correlations in sensory area.

(A) Joint activity of the population. The point cloud represents neuronal activities colored by stimulus direction. The neural space can be divided into stimulus and non-stimulus axes. (B) Noise correlation is any elongation of the joint activity point cloud for repeats of the same stimulus. (C) Optimal readout. The optimal decision boundary is a criterion line orthogonal to the stimulus axis. All CP is due to readout and there is no CP in the non-stimulus axis. (D) Suboptimal readout. The decision boundary is not orthogonal to the stimulus axis. CP exists in both axes. (E) Corrupting feedback. The choice is fed back and pushes variability along the stimulus axis. This increases CP along the stimulus axis without affecting the non-stimulus axis, and causes more variability along the stimulus axis. (F) Non-corrupting feedback. Feedback pushes choice information into the non-stimulus axis and increases CP in the non-stimulus axis without adding CP in the stimulus axis.

By realigning the population activity to the “stimulus axis”, the effect of noise correlations and feedback can be visualized clearly. Noise correlation (or trial-to-trial co-variability) is the joint activity distribution for repeats of the same stimulus (Fig 1B). The downstream decision-making process may rely solely on the MT activity along the stimulus axis to generate behavior, or it may also utilize the signal from the non-stimulus axis. Meanwhile, the downstream behavior-relevant neural activity that builds up over time can be fed back to MT on either axis. The combination gives rise to the following four hypotheses (corresponding to Fig 1C, 1D, 1E and 1F):

  1. The classic feedforward hypothesis: the decision process optimally reads out the MT population, and noise correlation in MT induces CP greater than chance (i.e., 0.5) along the stimulus axis. In this model, all CP in MT is due to readout and there is no CP in the non-stimulus axis (Fig 1C).

  2. The suboptimal readout hypothesis: an alternative feedforward mechanism reads out MT population information such that the choice inherits the variability in the non-stimulus axis, giving rise to above-chance CP in the non-stimulus axis (Fig 1D).

  3. The corrupting feedback hypothesis [16, 20, 21]: the choice is fed back along the stimulus axis. If the feedback is positively signed, this increases the measured CP and causes more variability along the stimulus axis without affecting the non-stimulus axis, and may bias the performance on the task for weak stimuli (Fig 1E).

  4. The non-corrupting feedback hypothesis: feedback could avoid interfering with the stimulus representation by pushing choice information only in the non-stimulus axis (Fig 1F). This increases CP in the non-stimulus axis without adding CP in the stimulus axis and does not influence the optimal stimulus readout.
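The four hypotheses can be made concrete with a toy simulation. The sketch below is illustrative only and is not the analysis used in this study: it assumes a zero-signal stimulus, unit-variance Gaussian activity along each axis, a noisy threshold readout, and a hypothetical feedback gain. The function names `choice_probability` and `simulate` are invented for this example.

```python
import numpy as np

def choice_probability(x, choice):
    """Area under the ROC curve separating x on choice==True vs choice==False trials."""
    x = np.asarray(x, dtype=float)
    choice = np.asarray(choice).astype(bool)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)  # ties are negligible for continuous data
    n1, n0 = choice.sum(), (~choice).sum()
    u = ranks[choice].sum() - n1 * (n1 + 1) / 2  # Mann-Whitney U statistic
    return u / (n1 * n0)

def simulate(hypothesis, n_trials=20000, gain=0.5, seed=0):
    """Return (CP along stimulus axis, CP along non-stimulus axis) for one hypothesis."""
    rng = np.random.default_rng(seed)
    stim = rng.standard_normal(n_trials)       # activity along the stimulus axis
    nonstim = rng.standard_normal(n_trials)    # activity along the non-stimulus axis
    decision_noise = rng.standard_normal(n_trials)
    if hypothesis == "suboptimal_readout":
        evidence = stim + 0.5 * nonstim + decision_noise  # tilted decision boundary
    else:
        evidence = stim + decision_noise  # boundary orthogonal to the stimulus axis
    choice = evidence > 0
    signed = np.where(choice, 1.0, -1.0)
    if hypothesis == "corrupting_feedback":
        stim = stim + gain * signed        # feedback along the stimulus axis
    elif hypothesis == "noncorrupting_feedback":
        nonstim = nonstim + gain * signed  # feedback confined to the non-stimulus axis
    return choice_probability(stim, choice), choice_probability(nonstim, choice)

for h in ["optimal_readout", "suboptimal_readout",
          "corrupting_feedback", "noncorrupting_feedback"]:
    cp_stim, cp_nonstim = simulate(h)
    print(f"{h:24s} CP_stim={cp_stim:.3f} CP_nonstim={cp_nonstim:.3f}")
```

In this toy setting, above-chance CP on the non-stimulus axis appears only under suboptimal readout and non-corrupting feedback, which is why the two cannot be separated by alignment alone.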

Testing these hypotheses requires an analysis of the joint statistics of populations of sensory neurons while subjects perform a discrimination task. Here, we apply recent developments in statistical dimensionality reduction of single-trial population recordings [26] to examine how information about the stimulus and choice is encoded jointly in small populations of simultaneously recorded MT neurons during perceptual reports about integrated motion direction [27]. The effects of stimulus, choice, and trial-to-trial variability present in the population activity are decomposed into shared low-dimensional latent factors and noise that is private to each neuron. Unsurprisingly, low-dimensional shared signals capture a majority of the variability in these data, as seen previously in other areas [6, 26, 28, 29]. By aligning the latent signals to the stimulus and task variables, we were able to investigate how stimulus and choice are encoded by neurons collectively.

We found that the task variable (visual motion) was primarily captured by a single latent factor, indicating that the high-dimensional visual stimulus was represented in a low-dimensional, task-relevant manner across the MT population. Additionally, we found that the choice-correlated variability in the population was mainly captured by the latent subspace orthogonal to the task dimension and appeared slowly during the stimulus presentation. These results suggest that the choice signal is fed back to sensory cortex in the null space of the stimulus, multiplexing choice signals in sensory areas without corrupting information about the stimulus. This feedback signal could be critical for adapting sensory representations while learning new tasks or in non-stationary environments [30, 31].

Materials and methods

Electrophysiology, task, and behavioral data

Data were recorded from three adult rhesus macaque monkeys (two males, P & L, and one female, N) performing a perceptual decision-making task over multiple sessions (P: 9, L: 13, N: 10) as reported in [27, 32] (with additional sessions added). Spike trains from area MT were obtained via linear electrode arrays. All procedures were performed in accordance with US National Institutes of Health guidelines and were approved by The University of Texas at Austin Institutional Animal Care and Use Committee. Briefly, each trial began with fixation at the center of the monitor. The visual stimuli were then presented, and the monkey had to hold fixation during the presentation (Fig 2). In each trial, 7 consecutive motion pulses were presented, each lasting 150 ms. Each motion pulse consisted of a hexagonal grid of drifting or flickering Gabor patches. The strength (controlled by the number of drifting patches) and direction of each pulse were randomly drawn from a Gaussian distribution and rounded. The monkeys made a saccade to one of the two targets after the fixation point disappeared. Rewards were given for a correct choice, that is, a saccade to the target in the direction with the greater total sum of motion pulses, or at random with probability 0.5 on zero-sum trials. We kept the recordings from 100 ms before visual stimulus onset to 350 ms after visual stimulus offset. We analyzed sessions with at least 10 neurons in order to extract latent factors (for a total of 14 sessions). Sessions ranged in length from 245 to 1000 good trials.

Fig 2. Experimental setup: Motion discrimination task.

Trials started with fixation at the center of the monitor. Seven consecutive motion pulses were presented while the monkey held fixation. Each motion pulse consisted of drifting and flickering Gabor patches and lasted 150 ms. Random, signed motion strength was determined by changing the proportion of drifting vs. flickering Gabor patches. Monkeys reported their choice of the net direction by making a saccade to one of the two choice targets after the fixation point disappeared.

Single-trial latent dynamics of population

To understand how stimulus and perceptual choice are encoded across the population, we employed the variational latent Gaussian process (vLGP) method [26] to extract single-trial low-dimensional latent factors from population recordings in area MT. We used the recording period from 100 ms before stimulus onset to 350 ms after offset, and binned the spike counts at 1 ms resolution. Let $x_k$ denote the k-th dimension of the latent factors. We assumed that the spatial dimensions of the latent factors are independent and imposed a Gaussian process (GP) prior on the temporal correlation of each dimension,

$$x_k \sim \mathcal{N}(0, K). \tag{1}$$

To obtain smoothness, we used the squared exponential covariance function; K is the corresponding covariance matrix in discrete time. Let $y_{tn}$ denote the occurrence of a spike of the n-th neuron at time t: $y_{tn} = 1$ if there was a spike at time t and $y_{tn} = 0$ otherwise at this time resolution. Then $y_t$ is the vector of length N, the total number of neurons in a session, that concatenates all neurons at time t. The spikes $y_t$ are assumed to be a point process generated by the latent state $x_t$ at that time via a linear-nonlinear model,

$$y_t \sim \mathrm{Poisson}(\exp(A x_t + b)). \tag{2}$$
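As a rough illustration of the generative model in Eqs 1 and 2, the following sketch draws latent trajectories from a GP prior with a squared exponential covariance and then samples spike counts through the exponential link. It is not the vLGP implementation; the timescale, loading values, baseline rate, and jitter term are hypothetical choices made only so that the example runs.

```python
import numpy as np

def squared_exponential_cov(n_bins, timescale, variance=1.0, jitter=1e-4):
    """Squared exponential covariance matrix K on a regular grid of time bins."""
    t = np.arange(n_bins, dtype=float)
    diff = t[:, None] - t[None, :]
    K = variance * np.exp(-0.5 * (diff / timescale) ** 2)
    return K + jitter * np.eye(n_bins)  # jitter keeps the Cholesky factorization stable

def sample_trial(n_bins=500, n_latents=4, n_neurons=12, timescale=50.0, seed=0):
    """Draw latents x_k ~ N(0, K) and counts y_t ~ Poisson(exp(A x_t + b))."""
    rng = np.random.default_rng(seed)
    K = squared_exponential_cov(n_bins, timescale)
    L = np.linalg.cholesky(K)
    # Each column is one independent latent dimension with covariance K (Eq 1).
    x = L @ rng.standard_normal((n_bins, n_latents))
    A = 0.5 * rng.standard_normal((n_neurons, n_latents))  # loading matrix (hypothetical values)
    b = np.full(n_neurons, np.log(0.02))  # baseline of 0.02 expected spikes per 1 ms bin
    rate = np.exp(x @ A.T + b)            # (n_bins, n_neurons) expected counts per bin
    y = rng.poisson(rate)                 # Eq 2
    return x, y, rate

x, y, rate = sample_trial()
print(x.shape, y.shape, y.mean())
```

At 1 ms resolution the expected count per bin is small, so a sampled count is almost always 0 or 1, in line with the binary description of $y_{tn}$.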

To infer the latent factors ($x_t$ for each trial) and the model parameters (A and b), we used a variational inference technique, as this pair of prior and likelihood does not have a tractable posterior. We assumed a parametric variational posterior distribution of the latent factors,

$$q(x_k) = \mathcal{N}(\mu_k, \Sigma_k). \tag{3}$$

We analyze the means {$\mu_k$} as the latent factors in this study. Details of the inference are described in S1 Text. The dimensionality of the latent factors was determined to be 4 by leave-one-neuron-out cross-validation on the session with the largest population (3). All the sessions with more than 10 simultaneously recorded units were included in this study.

Pulse-triggered average

To measure the relationship between the time-varying pulse strength and the inferred latent factors, we measured the contribution of pulses to the latent factors. The pulse-triggered average (PTA) measures the change in latent factors resulting from an additional pulse of unit strength at a particular time. To calculate the PTA, we used the pulse stimulus and latent response at 1 ms resolution. For each session, let $s_i$ denote the value of the i-th motion stimulus, and let $x_{tk}$ denote the k-th dimension of the latent factors at time t. All trials were concatenated such that the latent factors X form a matrix of size T × 4, where T is the total time. For the i-th pulse, $s_i$ is the number of Gabors pulsing, with $s_i > 0$ for pulses in one direction and $s_i < 0$ for pulses in the other direction. To calculate the temporal lags of the PTA, we built design matrices, $D = [D_1, D_2, \ldots, D_7]$. For the i-th pulse, the design matrix $D_i$ is a T × 28 matrix that consists of 4 cosine basis functions in columns 4(i − 1) + 1, 4(i − 1) + 2, …, 4(i − 1) + 4 and 0 elsewhere. These basis functions start at 0 ms, 50 ms, 100 ms and 150 ms after pulse onset, last 100 ms each, and span the rows of $D_i$. The magnitude of the bases is equal to the corresponding pulse value $s_i$. We calculated a separate $D_i$ for each of the seven pulses, concatenated them to obtain a design matrix for all seven pulses, and estimated the weights with ℓ2-regularization,

$$X = DW + E, \qquad W = \operatorname*{arg\,min}_{W} \|X - DW\|_2^2 + \gamma \|W\|_2^2 \tag{4}$$

where W is the weight matrix to estimate and E is the Gaussian noise matrix. The regularization hyperparameter γ was chosen by generalized cross-validation (GCV) [33]. The PTA was calculated from the design matrices of unit-strength pulses and the estimated weights W. We smoothed the PTA with a temporal Gaussian kernel (40 ms kernel width).
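The regression behind the PTA can be sketched as follows. This is a minimal example under several assumptions that the text does not fix: the cosine bases are taken to be raised-cosine bumps, pulses are indexed from 0 in the code, stimulus onset is placed 100 bins into each trial, and γ is set by hand rather than by GCV. The helper names are hypothetical.

```python
import numpy as np

PULSE_MS, N_PULSES, N_BASIS, BASIS_MS = 150, 7, 4, 100
BASIS_STARTS = (0, 50, 100, 150)  # ms after pulse onset, as described in the text

def cosine_bump(length):
    """One raised-cosine bump spanning `length` 1 ms bins (assumed basis shape)."""
    t = np.arange(length)
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * (t + 0.5) / length))

def trial_design(pulses, n_bins, stim_onset=100):
    """(n_bins, 28) design matrix for one trial; columns 4i..4i+3 belong to pulse i."""
    D = np.zeros((n_bins, N_PULSES * N_BASIS))
    bump = cosine_bump(BASIS_MS)
    for i, s in enumerate(pulses):
        onset = stim_onset + i * PULSE_MS
        for j, start in enumerate(BASIS_STARTS):
            lo = onset + start
            hi = min(lo + BASIS_MS, n_bins)
            D[lo:hi, N_BASIS * i + j] = s * bump[: hi - lo]  # basis scaled by pulse value
    return D

def ridge(D, X, gamma):
    """Closed-form minimizer of ||X - D W||^2 + gamma ||W||^2 (Eq 4)."""
    return np.linalg.solve(D.T @ D + gamma * np.eye(D.shape[1]), D.T @ X)

# Synthetic check: simulate latents from known weights and recover them.
rng = np.random.default_rng(0)
n_trials, n_bins = 40, 1500
W_true = rng.standard_normal((N_PULSES * N_BASIS, 4))
D = np.vstack([trial_design(rng.integers(-5, 6, N_PULSES), n_bins)
               for _ in range(n_trials)])
X = D @ W_true + 0.1 * rng.standard_normal((D.shape[0], 4))
W_hat = ridge(D, X, gamma=1e-3)
pta = trial_design(np.ones(N_PULSES), n_bins) @ W_hat  # response to unit-strength pulses
print(np.abs(W_hat - W_true).max(), pta.shape)
```

The last step mirrors the description above: the PTA is the prediction of the fitted weights for a design matrix built from unit-strength pulses.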

Latent trajectories that differ only by a rotation form an equivalence class whose members have the same explanatory power in the vLGP model. We seek a particular rotation for each session that concentrates the encoded task signal in the first few dimensions. By singular value decomposition, $W = USV^\top$, we rotate the factors x to Ux.
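A small sketch of such a rotation is given below. It assumes that W is stored with one row per regressor and one column per latent dimension, in which case the orthogonal matrix acting on the latent dimensions is the matrix of right singular vectors; the notation in the text may follow a transposed convention. The function name is hypothetical.

```python
import numpy as np

def rotate_to_stimulus_axis(W, X):
    """Rotate latents so that stimulus-driven power is ordered across dimensions.

    W: (n_regressors, n_latents) PTA weights; X: (n_time, n_latents) latent factors.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    R = Vt.T                  # (n_latents, n_latents) orthogonal matrix
    return X @ R, W @ R, S    # rotated latents, rotated weights, singular values

rng = np.random.default_rng(1)
W = rng.standard_normal((28, 1)) @ rng.standard_normal((1, 4))  # rank-1 stimulus drive
W += 0.01 * rng.standard_normal((28, 4))                        # small off-axis component
X = rng.standard_normal((1000, 4))
X_rot, W_rot, S = rotate_to_stimulus_axis(W, X)
power = (W_rot ** 2).sum(axis=0) / (W_rot ** 2).sum()  # fraction of PTA power per dimension
print(np.round(power, 4))
```

Because the rotation is orthogonal, applying the same matrix to the loading matrix (A becomes A R) leaves the predicted firing rates unchanged.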

Choice decoder

Since some recording sessions had fewer than the ideal number of frozen trials (trials with identical visual motion) for the calculation of choice probability, we instead analyzed “weak” trials, on which the monkeys’ correct rate was below a threshold (65%). Starting from the trials with zero pulse coherence, we gradually increased the magnitude of coherence (absolute value) until the correct rate reached the threshold. One session containing fewer than 100 weak trials was excluded from this analysis.

We removed the stimulus directions that are encoded in the latent factors and raw population activity of weak trials by regressing out the pulses and analyzed the residuals. The latent factors and population activity were re-binned at 100 ms resolution where the value of each bin is the sum of latent state xt or spike counts yt over the bin for t = 1, 2, …, T. For each t, we assumed a linear model to predict its value

$$x_t = \sum_{i=1}^{7} w_{ti} s_i + e, \tag{5}$$

where $s_i$ denotes the strength of the i-th pulse, $w_{ti}$ is the weight vector corresponding to the bin and pulse, and e is homogeneous Gaussian noise across all bins. We estimated the weight vector by least squares with ℓ2-regularization to prevent over-fitting,

$$w_{ti} = \operatorname*{arg\,min}_{w_{ti}} \left\| x_t - \sum_{i=1}^{7} w_{ti} s_i \right\|_2^2 + \gamma \|w_{ti}\|_2^2. \tag{6}$$

Again, the hyperparameter of regularization was chosen by GCV. For the raw population activity, we did the same regression, replacing xt with the spike count yt. We then analyzed the contribution of behavioral choice on the residuals

$$r_t = x_t - \sum_{i=1}^{7} w_{ti} s_i. \tag{7}$$

For the whole trial we used the sum residual of the windows r = ∑t rt. The range of t depends on the period of interest.

We trained logistic models, which we refer to as choice decoders, to predict the subject’s choice on each trial using either latent factors or population responses. The weights $\beta$ and bias $\beta_0$ were estimated by maximum likelihood with ℓ2-regularization,

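$$\beta, \beta_0 = \operatorname*{arg\,max}_{\beta, \beta_0} \log L(\mathrm{choice} \mid r; \beta, \beta_0) - \gamma \|\beta, \beta_0\|_2^2. \tag{8}$$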

Due to small sample sizes, the hyperparameter of regularization was chosen via 3-fold stratified (balanced classes in test set) cross-validation for every session individually.

Choice mapping

The conventional choice probability applies only to univariate variables, whereas both the latent factors and the population activity are multivariate. We projected these multivariate variables onto a one-dimensional subspace that has the same direction as the choice through the choice decoders,

$$c = \frac{1}{1 + e^{-\beta^\top r - \beta_0}} \tag{9}$$

We refer to the transform as the choice mapping. The quantity c is a normalized value within [0, 1] that maps the residual onto the choice direction [34], and enables aggregation across sessions as well.

In order to prevent potential inflation of choice probability due to high dimensionality (3D), we not only regularized the choice decoder when estimating the weights but also used the choice-mapped values of the test set (pooled samples held out by cross-validation). This approach guarantees that overfitting of the choice decoder will not result in overestimating the choice probability. The synthetic example (S1 Fig) also verified that adding choice-irrelevant dimensions does not inflate the choice probability.

We pooled these mapped values and aggregated them across all sessions. By plugging different dimensions of the latent factors or population activity in as r in the mapping, we obtained the choice-mapped values of the stimulus dimension, the non-stimulus dimensions of the latent factors, and the whole population. With these mapped values, we calculated the choice probability of the corresponding dimensions.
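The decoder, the held-out choice mapping, and the resulting choice probability can be sketched together. The code below is an illustration on synthetic residuals, not the analysis code of this study: it fits the penalized logistic model of Eq 8 by Newton's method, uses a fixed γ instead of selecting it by cross-validation, and computes CP as the area under the ROC curve. All function names are hypothetical.

```python
import numpy as np

def fit_logistic(R, choice, gamma=1.0, n_iter=50):
    """Maximize log L(choice | r) - gamma * ||(beta, beta0)||^2 by Newton's method (Eq 8)."""
    Z = np.column_stack([R, np.ones(len(R))])  # last column carries the bias beta0
    w = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ w))
        grad = Z.T @ (choice - p) - 2.0 * gamma * w
        hess = -(Z * (p * (1.0 - p))[:, None]).T @ Z - 2.0 * gamma * np.eye(len(w))
        step = np.linalg.solve(hess, grad)
        w = w - step
        if np.abs(step).max() < 1e-10:
            break
    return w[:-1], w[-1]

def choice_mapping(R, beta, beta0):
    """Eq 9: map multivariate residuals onto a scalar in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-(R @ beta) - beta0))

def stratified_folds(choice, n_folds=3, seed=0):
    """Assign trials to folds so each fold has a similar balance of the two choices."""
    rng = np.random.default_rng(seed)
    folds = np.empty(len(choice), dtype=int)
    for label in (0, 1):
        idx = rng.permutation(np.flatnonzero(choice == label))
        folds[idx] = np.arange(len(idx)) % n_folds
    return folds

def held_out_choice_map(R, choice, gamma=1.0, n_folds=3):
    """Choice-mapped value of every trial, computed by a decoder that never saw it."""
    folds = stratified_folds(choice, n_folds)
    c = np.empty(len(choice))
    for k in range(n_folds):
        test = folds == k
        beta, beta0 = fit_logistic(R[~test], choice[~test], gamma)
        c[test] = choice_mapping(R[test], beta, beta0)
    return c

def choice_probability(c, choice):
    """ROC area for separating c on choice==1 trials from choice==0 trials."""
    ranks = np.empty(len(c))
    ranks[np.argsort(c, kind="mergesort")] = np.arange(1, len(c) + 1)
    n1 = int(choice.sum())
    n0 = len(choice) - n1
    return (ranks[choice == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

# Synthetic residuals: the choice depends on dimension 0 only.
rng = np.random.default_rng(0)
R = rng.standard_normal((600, 3))
choice = (0.8 * R[:, 0] + rng.standard_normal(600) > 0).astype(float)
c = held_out_choice_map(R, choice)
print(round(choice_probability(c, choice), 3))
```

Computing CP only from held-out mapped values is the design choice that keeps an overfitted decoder from inflating the estimate.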

To investigate the effect of different dimensions on the choice, we did sequential likelihood ratio tests through adding the choice-mapped value of stimulus-dimension, non-stimulus-dimensions and the population one by one to a logistic model that predicts the choice,

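$$\mathrm{LR}_1 = \frac{L(\mathrm{choice} \mid c_{\text{stimulus}})}{L(\mathrm{choice} \mid c_{\text{stimulus}}, c_{\text{non-stimulus}})}, \qquad \mathrm{LR}_2 = \frac{L(\mathrm{choice} \mid c_{\text{stimulus}}, c_{\text{non-stimulus}})}{L(\mathrm{choice} \mid c_{\text{stimulus}}, c_{\text{non-stimulus}}, c_{\text{population}})} \tag{10}$$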

where the values of $c_{\text{stimulus}}$, $c_{\text{non-stimulus}}$ and $c_{\text{population}}$ were calculated by Eq 9 with the response r (Eq 7) and the weights {$\beta_0$, $\beta$} of the corresponding axes estimated via Eq 8.
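A sequential test of this kind can be sketched as below. The example assumes, without confirmation from the text, that each ratio in Eq 10 is converted to the statistic −2 log LR and compared against a chi-square distribution with one degree of freedom (one added regressor per step). The data are synthetic and the function names are hypothetical.

```python
import math
import numpy as np

def max_log_likelihood(features, choice, n_iter=100):
    """Maximized Bernoulli log-likelihood of a logistic model with an intercept."""
    Z = np.column_stack([np.ones(len(choice))] + list(features))
    w = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ w))
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z + 1e-9 * np.eye(len(w))
        step = np.linalg.solve(hess, Z.T @ (choice - p))  # Newton step
        w += step
        if np.abs(step).max() < 1e-10:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-Z @ w)), 1e-12, 1 - 1e-12)
    return float(np.sum(choice * np.log(p) + (1 - choice) * np.log(1 - p)))

def likelihood_ratio_test(ll_reduced, ll_full):
    """Statistic -2 log(LR) for one added regressor and its chi-square (1 df) p-value."""
    stat = max(2.0 * (ll_full - ll_reduced), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # survival function of chi-square, 1 df
    return stat, p_value

# Synthetic choice-mapped values; c_pop carries no extra choice information here.
rng = np.random.default_rng(0)
n = 2000
c_stim, c_nonstim, c_pop = rng.uniform(size=(3, n))
logit = 1.0 * (c_stim - 0.5) + 3.0 * (c_nonstim - 0.5)
choice = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

ll1 = max_log_likelihood([c_stim], choice)
ll2 = max_log_likelihood([c_stim, c_nonstim], choice)
ll3 = max_log_likelihood([c_stim, c_nonstim, c_pop], choice)
print(likelihood_ratio_test(ll1, ll2), likelihood_ratio_test(ll2, ll3))
```

Since the models are nested, the maximized log-likelihood can only increase as regressors are added, so each likelihood ratio in Eq 10 lies between 0 and 1.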

To investigate the time course of choice probabilities, we used choice decoders to perform choice-mapping on the whole data with a 100 ms non-overlapping moving window. The choice decoders were fitted to early (200–500 ms), middle (600–900 ms) and late (1000–1300 ms) periods of non-stimulus latent factors, and regularized with cross-validation mentioned above. The choice probabilities of all time bins were then calculated on the choice-mapping using the three decoders individually.

Results

Low-dimensional shared variability structure

Three monkeys performed a motion-pulse direction discrimination task, reporting their choice with an eye movement to one of two targets [32]. The visual stimulus was presented as a sequence of 7 temporally coherent motion pulses of varying strength. An ensemble of MT neurons was simultaneously recorded using multi-electrode arrays. Given the recording, we statistically infer low-dimensional latent factors that explain the shared component of the high-dimensional variations in the observed spiking activity. Conventional analysis methods such as factor analysis or principal component analysis assume either observation models inappropriate for spikes (e.g. Gaussian) or linear dynamics that lack the expressive power to describe any non-trivial computation. To overcome these disadvantages, we imposed a general (nonlinear) Gaussian process prior on the latent factors and assumed a point-process observation model to account for spikes. The generative model was fit using the variational latent Gaussian process (vLGP) method to recover nonlinear smooth latent factors from population recordings [26]. Fig 3A shows the scheme of the model and an example trial. The population firing rates are driven by the latent factors through a linear-nonlinear cascade. The loading matrix linearly maps the low-dimensional latent space to the high-dimensional observation space; its rows correspond to the neurons and its columns to the latent factor dimensions. The extracted latent factors captured the shared variability of the population activity, while the individual variability of each neuron was explained by stochastic generation of spike trains. The dimensionality of the latent factors was chosen to be 4 by a leave-one-neuron-out cross-validation scheme on the session with the largest population (N = 21 neurons). To aggregate analysis across sessions, we fixed this dimensionality of the latent factors.

Fig 3. Probabilistic description of a single trial using variational latent Gaussian process method and resulting noise correlation.

(A; top) Simultaneously recorded spike trains of the MT units in an example trial aligned to stimulus onset ($y_t$ in Eq 2). (A; bottom) Corresponding 4-dimensional factors. The rank-4 product of the loading matrix (matrix A in Eq 2) and the latent factors is exponentiated to produce the population rate. The loading matrix is rotated to maximize stimulus encoding (see Fig 4), so that the first column has the strongest stimulus response. The inferred latent factors ($x_t$ in Eq 2) are colored to match the corresponding columns of the loading matrix. (B) The pairwise noise correlation matrices (neuron by neuron) for the sessions with frozen trials (trials with identical stimulus). The lower triangles are the correlations calculated from the raw data, and the upper triangles are the correlations from the reconstruction by the inferred 4-dimensional latent factors. Time bin size 100 ms.

To validate the model, we evaluated the pairwise noise correlations between neurons on randomly interleaved frozen trials where the stimulus was held constant (Fig 3B). With the inferred latent factors and loading matrix, we can generate spike trains from the model. We calculated the noise correlation matrices from the data and from the reconstructed spikes. To quantify how well the model captures the noise correlation, we calculated $R^2 = 1 - \|C_{\text{data}} - C_{\text{model}}\|_F^2 / \|C_{\text{data}}\|_F^2$, where $C_{\text{data}}$ and $C_{\text{model}}$ are the correlation matrices corresponding to the data and the model. The resulting values for the sessions with frozen trials were 0.952 on average (s.e.m. 0.007). The results show that the extracted latent factors capture the co-variability of the population well with only 4 dimensions. To compare with linear models, we performed PCA on the raw spike trains with 100 ms bins. S4 Fig shows the cumulative explanatory power of the PCs. The $R^2$ of the first 4 PCs is −0.877 on average (s.e.m. 0.303). The negative predictive $R^2$ values show that 4 PCs are not adequate to capture the shared variability of neural activity.
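The goodness-of-fit measure for noise correlations can be written in a few lines. The sketch below uses synthetic spike counts with one shared fluctuation and assumes that the Frobenius norms are taken over the full correlation matrices, diagonal included, which the text does not state. The function names are hypothetical.

```python
import numpy as np

def noise_correlation(counts):
    """Pairwise correlation matrix from a (n_trials, n_neurons) array of spike counts."""
    return np.corrcoef(counts, rowvar=False)

def correlation_r2(C_data, C_model):
    """R^2 = 1 - ||C_data - C_model||_F^2 / ||C_data||_F^2."""
    num = np.linalg.norm(C_data - C_model, "fro") ** 2
    den = np.linalg.norm(C_data, "fro") ** 2
    return 1.0 - num / den

rng = np.random.default_rng(0)
shared = rng.standard_normal((200, 1))                 # one shared fluctuation per trial
rates = np.exp(1.0 + 0.5 * shared + np.zeros((200, 8)))
data = rng.poisson(rates)    # stands in for recorded counts on frozen trials
model = rng.poisson(rates)   # stands in for counts regenerated from a fitted model
print(correlation_r2(noise_correlation(data), noise_correlation(model)))
```

With this definition, a model that predicts an all-zero correlation matrix scores 0, and a model whose errors exceed the norm of the data correlations scores below 0, which is how a negative value such as the one reported for PCA can arise.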

Stimulus-encoding is concentrated in one shared dimension

In previous work, MT neurons showed strong transient responses on average to motion pulses [27]. We ask whether the individual MT responses to the visual stimulus are aligned at the population level. To describe the temporal dependence of the latent factors on the motion pulses, we calculated the pulse-triggered average (PTA) for each of the seven pulses [27]. The PTAs are the regression coefficients that predict the change in latent states. Specifically, each PTA corresponding to one of the 7 motion pulses represents the modulation of latent factors by a unit of visual motion (a single Gabor patch drifting in one direction during a pulse), assuming a linear scaling with motion strength (see Materials and methods).

The latent factors are subject to arbitrary rotation [26], which results in models with equivalent explanatory power. Hence, we rotated the latent factors for each session so that the effects of motion pulses are concentrated in decreasing order across dimensions (Fig 4A). For all subjects, the pulses are faithfully represented as transiently modulated latent factors, and most of the motion information is encoded in the mean value of the first factor. We refer to this factor as the stimulus axis.

Fig 4. Visual motion pulse information encoded in one dimension.

(A) Pulse-triggered averages of three example sessions, one from each monkey. The factors are rotated such that most of the stimulus power is in the first factor. The traces visualize the weights of pulses on the latent factors that were estimated from the respective sessions. The color gradient indicates the seven pulses of visual motion stimuli. Each pulse lasts 150 ms. (B) The power of each factor that explains the variation contributed by the stimuli to the factors. Each marker indicates one session, the shape indicates the animal, and the color indicates the respective example sessions in (A).

We pooled the alignment of the stimulus-explaining latent factors across all sessions. The first dimension explains most (> 90%) of the PTA in the latent factors for all but one session (Fig 4B). This concentration of stimulus information in one dimension is consistent with the canonical view of MT as primarily a sensory area. Since the sensory stimulus is 1-dimensional (directional motion with different strength), this suggests that the encoding of MT units is temporally uniform (without multiple time scales of adaptation or lag) and linear (no nonlinear superposition). Note that this is not a trivial result, since the motion information can be encoded in a curved 1-dimensional manifold that spans multiple dimensions in the neural space [35].

Sensory and choice population codes are misaligned in MT

Next, we investigate how the downstream choice signal is aligned with respect to the stimulus axis. There are several ways in which choice correlations can manifest in the MT population activity (Fig 1). To optimally perform the task, the choice should rely only on the stimulus and ignore the off-axis “noise” [17]. Hence, for a purely feedforward system, only the noise in the stimulus dimension should influence the choice, resulting in choice correlation that reflects the optimal strategy (Fig 1C). Otherwise, suboptimal “readout” can show choice correlation through stimulus-irrelevant variability (Fig 1D). On the other hand, feedback paths can mix the downstream choice process signals back into the MT representation: if the feedback is aligned with the stimulus axis, it will corrupt the encoding of the sensory signal (Fig 1E), while misaligned feedback stays orthogonal to the subspace of the continuous stream of stimulus-modulated population activity and leaves that encoding intact (Fig 1F).

To investigate the effect of different axes on the choice, we calculated the choice probability of the recorded neural population after mapping the multivariate activity to choice through choice-mapping (Fig 5; see Materials and methods). The pooled choice probabilities estimated using the choice-mapped stimulus axis, non-stimulus axes (the 3-dimensional subspace orthogonal to the stimulus axis), and all 4 dimensions of the MT latent factors are 0.546 (s.e.m. across sessions: Monkey N 0.013, Monkey P 0.007 and Monkey L 0.012), 0.591 (s.e.m. across sessions: Monkey N 0.009, Monkey P 0.020 and Monkey L 0.027), and 0.621 (s.e.m. across sessions: Monkey N 0.013, Monkey P 0.012, Monkey L 0.011) respectively (Fig 6). The estimated population spike count choice probability is 0.627. To verify that the pooling across sessions (Fig 5, stage 4) does not weaken the choice information, we compared the pooled model with models of individual sessions as a baseline. The likelihood ratio of the full models of individual sessions to the full model of pooled sessions is between 0 and 1 because the log-likelihoods are always negative and the pooled model is at most as good as the individual session models. Among the 10 sessions, the likelihood ratio ranges between 0.92 and 0.99, and the average is 0.98. These values indicate that pooling over sessions keeps most of the choice information. We confirmed that the higher dimensionality of the non-stimulus subspace does not inflate the choice probability (see Materials and methods, and S1 Fig). We further verified that the CP on the stimulus axis defined by the logistic regression weights from the raw spike counts to the stimulus is much smaller (stim.: 0.53, nonstim.: 0.61), consistent with the nonlinear latent factor results.

Fig 5. Data analysis pipeline and nested model comparison.

(1) Extract latent factors. (2) Align latent factors to stimulus & null dimensions. (3) Map dimensions of latent factors into real-valued scalars. (4) Pool the choice-mapping over all sessions and perform nested log-likelihood tests.

Fig 6. Choice probabilities of latent factors for each monkey.


Contours correspond to the 50%, 90%, and 99% quantiles of the choice-mapped stimulus and non-stimulus trial distribution. The IN choice distribution (red-shaded contours) is biased upward, indicating the existence of choice information in the non-stimulus axes. The pooled choice probabilities estimated using the choice-mapped stimulus-axis, the non-stimulus-axes (the 3-dimensional subspace orthogonal to the stimulus-axis), and all 4 dimensions of the latent factors are 0.546, 0.591, and 0.621, respectively. The estimated population spike count choice probability is 0.627. For nested statistical tests of the corresponding regression models, see main text and Fig 5.

To determine the major contribution among regressors on choice, we performed nested likelihood ratio tests by adding the choice-mapped values of the stimulus-axis, the non-stimulus-axes, and the population, one by one (Fig 5). The choice is significantly correlated with the latent non-stimulus subspace (p < 2.2 × 10^−16), which indicates that the choice axis is not perfectly aligned with the stimulus axis, contrary to what the optimal readout or corrupting feedback models suggest. Therefore, our analysis supports the representation of choice information in the non-stimulus latent subspace. This misalignment of stimulus axis and choice axis can occur through either non-optimal readout (Fig 1D) or non-corrupting feedback (Fig 1F). The misalignment between choice and stimulus in MT provides evidence for a feedback source of choice information in sensory neurons. The presence of CP orthogonal to the stimulus axis suggests that choice information is not just a result of noise on the sensory response, but rather arises from another process altogether.
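The logic of one step of such a nested test can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: choice is modeled by logistic regression, the data are simulated, the fit uses plain gradient ascent, and the added regressor is a single scalar so that the test statistic is compared with a chi-square distribution with one degree of freedom. All names are hypothetical.

```python
import math
import random


def fit_logistic(rows, choices, init=None, steps=500, rate=1.0):
    """Fit logistic-regression weights by gradient ascent on the mean log-likelihood."""
    n_features = len(rows[0])
    w = list(init) if init is not None else [0.0] * n_features
    for _ in range(steps):
        grad = [0.0] * n_features
        for x, y in zip(rows, choices):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(n_features):
                grad[j] += (y - p) * x[j]
        w = [wi + rate * g / len(rows) for wi, g in zip(w, grad)]
    return w


def log_likelihood(w, rows, choices):
    """Bernoulli log-likelihood of binary choices under logistic weights w."""
    total = 0.0
    for x, y in zip(rows, choices):
        z = sum(wi * xi for wi, xi in zip(w, x))
        # log p = -log(1 + exp(-z)) and log(1 - p) = -log(1 + exp(z))
        total += -math.log1p(math.exp(-z)) if y == 1 else -math.log1p(math.exp(z))
    return total


def chi2_sf_df1(stat):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


# Simulated trials (assumption): choice depends weakly on the stimulus-axis
# value and more strongly on the non-stimulus value.
rng = random.Random(0)
stim = [rng.gauss(0.0, 1.0) for _ in range(200)]
nonstim = [rng.gauss(0.0, 1.0) for _ in range(200)]
choices = [
    1 if rng.random() < 1.0 / (1.0 + math.exp(-(0.3 * s + 0.8 * v))) else 0
    for s, v in zip(stim, nonstim)
]

reduced_rows = [[1.0, s] for s in stim]                       # intercept + stimulus axis
full_rows = [[1.0, s, v] for s, v in zip(stim, nonstim)]      # + non-stimulus regressor

w_reduced = fit_logistic(reduced_rows, choices)
# Warm-start the full model at the reduced solution so it cannot fit worse.
w_full = fit_logistic(full_rows, choices, init=w_reduced + [0.0])

stat = 2.0 * (log_likelihood(w_full, full_rows, choices)
              - log_likelihood(w_reduced, reduced_rows, choices))
p_value = chi2_sf_df1(stat)
```

A small p-value here says only that the added regressor improves the prediction of choice beyond the regressors already in the model, which is the sense in which the non-stimulus subspace is reported as significantly correlated with choice.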

Time course of choice probability indicates feedback of decision-making process to MT

The misalignment between choice and stimulus in MT suggests a feedback source for choice-correlated activity, but it could still be explained by suboptimal readout. Debates based on models and arguments in the literature have yet to resolve the issue of feedforward versus feedback choice correlations in area MT [16, 20, 36–38]. To disambiguate the two, we investigated the temporal profile of choice probability. Behavioral analysis showed that the sensory information immediately after stimulus onset has a strong influence on the choice [27]. Under a feedforward readout, one would therefore expect to see choice information early in the population activity. If the choice information is only present late in the trial, then we can conclude that feedback from the downstream decision-making process is contributing to the misaligned choice information we observed in the previous section.

To investigate the temporal profile of choice correlation in the non-stimulus axes, we calculated the time course of CP. We fit 3 linear choice decoders to the latent non-stimulus axes during the early (200–500 ms), middle (600–900 ms) and late (1000–1300 ms) periods, and then used them to decode the whole period with a 100 ms moving window. Fig 7 shows that the CPs of the middle and late decoders start climbing late during the visual motion presentation and peak around the time the motion stimulus was terminated. This temporal profile is consistent with a choice variable that accumulates sensory evidence [12], and supports non-corrupting feedback from the decision-making process. On the other hand, the early decoder shows a constant choice probability throughout the motion presentation period (Fig 7), which could represent a per-trial choice bias. These observations suggest that the choice information resides in more than one dimension within the non-stimulus subspace.
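The windowed decoding step can be sketched as follows, assuming that a decoder's weights have already been fit on one period and are then held fixed. The sketch projects each trial's latent trajectory onto those weights, averages the projection within consecutive non-overlapping windows, and computes CP across trials in each window. The data layout, the function names, and the toy trajectories are assumptions made for illustration only.

```python
def auc(in_values, out_values):
    """Probability that an IN-choice value exceeds an other-choice value (ties count 1/2)."""
    wins = 0.0
    for a in in_values:
        for b in out_values:
            wins += 1.0 if a > b else (0.5 if a == b else 0.0)
    return wins / (len(in_values) * len(out_values))


def cp_time_course(trials, choices, weights, window):
    """CP per window for a fixed linear decoder.

    trials[k][t] is the latent vector of trial k at time bin t,
    choices[k] is 1 for an IN choice and 0 otherwise,
    window is the number of time bins averaged per CP estimate.
    """
    n_bins = len(trials[0])
    course = []
    for start in range(0, n_bins - window + 1, window):
        in_vals, out_vals = [], []
        for trajectory, choice in zip(trials, choices):
            projected = sum(
                sum(w * x for w, x in zip(weights, trajectory[t]))
                for t in range(start, start + window)
            ) / window
            (in_vals if choice == 1 else out_vals).append(projected)
        course.append(auc(in_vals, out_vals))
    return course


# Toy example (hypothetical): 4 trials, 4 time bins, 2 latent dimensions.
# The two choice groups are indistinguishable early and separate late.
trials = [
    [[0.0, 0.1], [0.1, 0.0], [1.0, 0.5], [1.2, 0.4]],  # IN choice
    [[0.1, 0.0], [0.0, 0.1], [0.2, 0.1], [0.1, 0.2]],  # other choice
    [[0.0, 0.0], [0.1, 0.1], [0.9, 0.6], [1.1, 0.5]],  # IN choice
    [[0.0, 0.1], [0.1, 0.0], [0.1, 0.0], [0.3, 0.1]],  # other choice
]
choices = [1, 0, 1, 0]
course = cp_time_course(trials, choices, weights=[1.0, 1.0], window=2)
```

In this toy case the time course rises from chance in the first window to perfect separation in the second, which mimics the late-rising profile described for the middle and late decoders.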

Fig 7. Time course of choice probability in the latent stimulus subspace, latent non-stimulus subspace and neuron population space suggests feedback from the decision-making process.


Decoders were fit to early (yellow), middle (red), and late (purple) periods (300 ms, marked by the colored bars) of non-stimulus latent factors to predict choice. We used the resulting weights of the decoders to perform choice-mapping on the whole time interval divided into 100 ms non-overlapping moving windows (aligned at the center). The colored curves correspond to the choice probability time course using the respective decoder.

Discussion

To understand how stimulus and perceptual choice information is represented across the population of MT neurons, we take advantage of recent developments in unsupervised statistical approaches to single-trial population analyses (Fig 8). We use a Bayesian inference framework, vLGP [26], to demix the visual stimulus, the reported choice, and the trial-to-trial variability signals present in the population activity. In contrast to other demixing methods [39, 40], our approach requires neither trial-averaging to obtain conditional mean firing rates nor the fixed trial structure necessary for the averaging. Analyzing the trial-averaged responses is convenient because simultaneous recording is not necessary; however, it comes with strong assumptions about the neural code: noise correlations between neurons are ignored, and only the differences in the conditional mean firing rates are assumed to carry useful information. Note that our analysis relies heavily on the marginal correlations (shared variations) present in the simultaneous recording. Moreover, only a single-trial analysis can reveal choice probability.

Fig 8. Four possible sources of neural-choice correlation.


1-dimensional stimulus drive to MT is picked up as population variability along with other noise correlations denoted x1(t), x2(t), x3(t). To optimally perform the task, the choice should rely on only the stimulus dimension, and hence noise in x1 shows up as CP in relevant units reflecting their ‘readout’ strategy (case 1). Non-optimal readout can provide CP through stimulus-irrelevant variability (case 3). Alternatively, feedback from the decision-making process to MT can provide choice-correlation in the stimulus-irrelevant subspace (case 4) without corrupting the optimal representation or the stimulus driven shared dimension (case 2) causing non-optimal behavior.

We found that the low-dimensional latent factors capture the majority of the variability present in the population recordings. By linearly aligning the latent factors to the stimulus and behavioral choice, we were able to investigate how stimulus and choice are shared across neurons. Although we assumed a linear readout [41, 42], note that the optimal nonlinear readout may change this interpretation [43]. Within the space spanned by the latent factors, we found that the sensory task variable was primarily captured by a single latent factor, indicating that the high-dimensional visual stimulus was represented in a low-dimensional, task-relevant manner across the MT population and time. This stimulus-dimension was only partially explained by the stimulus, suggesting the presence of information-limiting noise [19, 22, 41, 42] that may strongly influence the animal’s choice. However, the choice-correlated variability in the population was mainly captured in the latent subspace orthogonal to the stimulus-encoding dimension. This orthogonality suggests that either the downstream decision circuit used a suboptimal readout of the MT response (Fig 8, path 3) or the feedback from the downstream circuit was non-corrupting (Fig 8, path 4). Further analysis of the time course of choice probability revealed a slow and late rise, supporting the feedback mechanism rather than the readout [20, 37]. The non-corrupting feedback of choice formation to MT can be useful for tuning of receptive fields and learning of optimal readouts in relation to the task context.

Supporting information

S1 Text. Variational latent Gaussian processes.

(PDF)

S1 Fig. Choice mapping does not inflate choice probability.

(PDF)

S2 Fig. CP (pseudo frozen trial) of latent factors for each monkey.

(PDF)

S3 Fig. Time courses of CP (pseudo frozen trials).

(PDF)

S4 Fig. Cumulative explaining power of principal components of raw spike trains.

(PDF)

S5 Fig. Visual motion pulse information encoded in one dimension of raw spike trains.

(PDF)

S6 Fig. Time course of choice probability in the stimulus subspace of raw spike trains.

(PDF)

Acknowledgments

We thank the anonymous reviewers for their helpful comments. Memming thanks Hendrikje Nienborg for stimulating discussions.

Data Availability

All data and code can be reached at https://doi.org/10.6084/m9.figshare.11413182.v1.

Funding Statement

YZ, ACH and IMP were supported by NSF IIS-1734910. National Science Foundation, www.nsf.gov. ACH was supported by NEI EY017366. National Eye Institute, nei.nih.gov. JLY and AJL were supported by National Institutes of Health under Ruth L. Kirschstein National Research Service Awards T32DA018926 from the National Institute on Drug Abuse and T3EY021462 from the National Eye Institute. JLY was supported by T32EY007125 from the National Eye Institute. JLY is an Open Philanthropy fellow of the Life Sciences Research Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Tolhurst DJ, Movshon JA, Dean AF. The statistical reliability of signals in single neurons in cat and monkey visual cortex. Vision Research. 1983;23(8):775–785. [DOI] [PubMed] [Google Scholar]
  • 2. Goris RLT, Ziemba CM, Stine GM, Simoncelli EP, Movshon JA. Dissociation of Choice Formation and Choice-Correlated Activity in Macaque Visual Cortex. The Journal of Neuroscience. 2017;37(20):5195–5203. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Smith FV, Bird MW. The relative attraction for the domestic chick of combinations of stimuli in different sensory modalities. Animal Behaviour. 1963;11(2-3):300–305. [Google Scholar]
  • 4. Zylberberg A. Neurophysiological bases of exponential sensory decay and top-down memory retrieval: a model. Frontiers in Computational Neuroscience. 2009;3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Ponce-Alvarez A, Thiele A, Albright TD, Stoner GR, Deco G. Stimulus-dependent variability and noise correlations in cortical MT neurons. Proceedings of the National Academy of Sciences. 2013;110(32):13162–13167. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Cohen MR, Newsome WT. Context-Dependent Changes in Functional Circuitry in Visual Area MT. Neuron. 2008;60(1):162–173. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Bondy AG, Haefner RM, Cumming BG. Feedback determines the structure of correlated variability in primary visual cortex. Nature Neuroscience. 2018;21(4):598–606. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Britten KH, Newsome WT, Shadlen MN, Celebrini S, Movshon JA. A relationship between behavioral choice and the visual responses of neurons in macaque MT. Visual Neuroscience. 1996;13(01):87–100. [DOI] [PubMed] [Google Scholar]
  • 9. Nienborg H, Cumming BG. Decision-related activity in sensory neurons reflects more than a neuron’s causal effect. Nature. 2009;459(7243):89–92. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Uka T, DeAngelis GC. Linking Neural Representation to Function in Stereoscopic Depth Perception: Roles of the Middle Temporal Area in Coarse versus Fine Disparity Discrimination. Journal of Neuroscience. 2006;26(25):6791–6802. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Gu Y, Liu S, Fetsch CR, Yang Y, Fok S, Sunkara A, et al. Perceptual Learning Reduces Interneuronal Correlations in Macaque Visual Cortex. Neuron. 2011;71(4):750–761. 10.1016/j.neuron.2011.06.015 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Shadlen MN, Newsome WT. Neural Basis of a Perceptual Decision in the Parietal Cortex (Area LIP) of the Rhesus Monkey. Journal of Neurophysiology. 2001;86(4):1916–1936. [DOI] [PubMed] [Google Scholar]
  • 13. Gold JI, Shadlen MN. Representation of a perceptual decision in developing oculomotor commands. Nature. 2000;404(6776):390–394. [DOI] [PubMed] [Google Scholar]
  • 14. Pitkow X, Liu S, Angelaki DE, DeAngelis GC, Pouget A. How Can Single Sensory Neurons Predict Behavior? Neuron. 2015;87(2):411–423. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Haefner RM, Gerwinn S, Macke JH, Bethge M. Inferring decoding strategies from choice probabilities in the presence of correlated variability. Nature Neuroscience. 2013;16(2):235–242. [DOI] [PubMed] [Google Scholar]
  • 16. Haefner RM, Berkes P, Fiser J. Perceptual Decision-Making as Probabilistic Inference by Neural Sampling. Neuron. 2016;90(3):649–660. [DOI] [PubMed] [Google Scholar]
  • 17. Panzeri S, Harvey CD, Piasini E, Latham PE, Fellin T. Cracking the Neural Code for Sensory Perception by Combining Statistics, Intervention, and Behavior. Neuron. 2017;93(3):491–507. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Zohary E, Shadlen MN, Newsome WT. Correlated neuronal discharge rate and its implications for psychophysical performance. Nature. 1994;370(6485):140–143. [DOI] [PubMed] [Google Scholar]
  • 19. Moreno-Bote R, Beck J, Kanitscheider I, Pitkow X, Latham P, Pouget A. Information-limiting correlations. Nature Neuroscience. 2014;17(10):1410–1417. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Wimmer K, Compte A, Roxin A, Peixoto D, Renart A, Rocha Jdl. Sensory integration dynamics in a hierarchical network explains choice probabilities in cortical area MT. Nature Communications. 2015;6:6177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Lange RD, Haefner RM. Characterizing and interpreting the influence of internal variables on sensory activity. Current opinion in neurobiology. 2017;46:84–89. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Averbeck BB, Latham PE, Pouget A. Neural correlations, population coding and computation. Nature Reviews Neuroscience. 2006;7(5):358–366. [DOI] [PubMed] [Google Scholar]
  • 23. Nogueira R, Peltier NE, Anzai A, DeAngelis GC, Martínez-Trujillo J, Moreno-Bote R. The Effects of Population Tuning and Trial-by-Trial Variability on Information Encoding and Behavior. The Journal of Neuroscience. 2019;40(5):1066–1083. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Yates JL, Katz LN, Levi AJ, Pillow JW, Huk AC. A simple linear readout of MT supports motion direction-discrimination performance. Journal of Neurophysiology. 2020;123(2):682–694. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Kaufman MT, Churchland MM, Ryu SI, Shenoy KV. Cortical activity in the null space: permitting preparation without movement. Nat Neurosci. 2014;17(3):440–448. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Zhao Y, Park IM. Variational Latent Gaussian Process for Recovering Single-Trial Dynamics from Population Spike Trains. Neural Computation. 2017;29(5):1293–1316. [DOI] [PubMed] [Google Scholar]
  • 27. Yates JL, Park IM, Katz LN, Pillow JW, Huk AC. Functional dissection of signal and noise in MT and LIP during decision-making. Nature Neuroscience. 2017;20(9):1285–1292. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Yu BM, Cunningham JP, Santhanam G, Ryu SI, Shenoy KV, Sahani M. Gaussian-Process Factor Analysis for Low-Dimensional Single-Trial Analysis of Neural Population Activity. Journal of Neurophysiology. 2009;102(1):614–635. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Graf ABA, Kohn A, Jazayeri M, Movshon JA. Decoding the activity of neuronal populations in macaque primary visual cortex. Nature Neuroscience. 2011;14(2):239–245. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Engel TA, Chaisangmongkon W, Freedman DJ, Wang XJ. Choice-correlated activity fluctuations underlie learning of neuronal category representation. Nat Commun. 2015;6:6454. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Levi AJ, Yates JL, Huk AC, Katz LN. Strategic and Dynamic Temporal Weighting for Perceptual Decisions in Humans and Macaques. eNeuro. 2018;5(5):ENEURO.0169–18.2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Katz LN, Yates JL, Pillow JW, Huk AC. Dissociated functional significance of decision-related activity in the primate dorsal stream. Nature. 2016;535(7611):285–288. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Golub GH, Heath M, Wahba G. Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter. Technometrics. 1979;21(2):215–223. [Google Scholar]
  • 34. Lueckmann JM, Macke JH, Nienborg H. Can Serial Dependencies in Choices and Neural Activity Explain Choice Probabilities? The Journal of Neuroscience. 2018;38(14):3495–3506. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Gao P, Trautmann E, Yu BM, Santhanam G, Ryu S, Shenoy K, et al. A theory of multineuronal dimensionality, dynamics and measurement. bioRxiv. 2017. [Google Scholar]
  • 36. Shadlen M, Britten K, Newsome W, Movshon J. A computational analysis of the relationship between neuronal and behavioral responses to visual motion. The Journal of Neuroscience. 1996;16(4):1486–1510. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Cumming BG, Nienborg H. Feedforward and feedback sources of choice probability in neural population responses. Curr Opin Neurobiol. 2016;37:126–132. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Crapse TB, Basso MA. Insights into decision making using choice probability. Journal of Neurophysiology. 2015;114(6):3039–3049. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Kobak D, Brendel W, Constantinidis C, Feierstein CE, Kepecs A, Mainen ZF, et al. Demixed principal component analysis of neural population data. Elife. 2016;5 10.7554/eLife.10989 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Aoi MC, Mante V, Pillow JW. Prefrontal cortex exhibits multi-dimensional dynamic encoding during decision-making. bioRxiv. 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Kafashan M, Jaffe A, Chettih SN, Nogueira R, Arandia-Romero I, Harvey CD, et al. Scaling of information in large neural populations reveals signatures of information-limiting correlations. bioRxiv. 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Bartolo R, Saunders RC, Mitz AR, Averbeck BB. Information-Limiting Correlations in Large Neural Populations. The Journal of Neuroscience. 2020;40(8):1668–1678. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Park IM, Pillow JW. Bayesian Spike Triggered Covariance Analysis. In: Advances in Neural Information Processing Systems (NIPS); 2011. p. 1692–1700. Available from: http://papers.nips.cc/paper/4411-bayesian-spike-triggered-covariance-analysis.pdf.
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007614.r001

Decision Letter 0

Daniele Marinazzo

21 Jan 2020

Dear Dr. Park,

Thank you very much for submitting your manuscript "Stimulus-choice (mis)alignment in primate MT cortex" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Daniele Marinazzo

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors applied a nonlinear statistical method to extract latent factors from MT neurons during a decision making task. Then, they analyzed the dimensionality of the stimulus and choice encoding axes. Overall, I found the paper to be interesting and generally sound. I have some technical concerns, and I found some aspects of the figures confusing. These concerns should be readily addressable.

1. A major claim of this study relates to feedback signals and choice. However the timeline of the task is not clear to me. Please provide a schematic that shows the timeframe of when the stimulus appears, each of the 7 motion pulses, the subject's response, and when the putative feedback signal is detected. Please also indicate in what time frames the spiking data is analyzed. On pg 11, line 300 you state that you decode using time windows up to 1300ms, but I dont have any reference for what those times mean. Additionally in Figure 3, the horizontal axis is labeled with "time" but no scale is ever given.

2. Fig2. The noise correlation matricies are confusing to me. Is each matrix for a single session? There are 2 sessions for monkey N, and 2 for monkey P, and 7 for monkey L? What is the point of this panel? Is it to show that the model captures the noise correlations? If so, please quantify it instead of visual comparison. If the point is that the noise correlations differ between sessions, again, please quantify it. With respect to concern (1), the timeframe for the latent trajectory goes out to 1500 ms, how does this relate to task structure?

3. (Minor) Fig3. I believe you plotted the factors from three trials, one from each monkey. But the wording of the legend was confusing to me. You might want to change "Three example trials from each monkey are shown" to "Three example trials, one from each monkey, are shown." Again here, what is the scale of time?

4. (Minor) Fig 3. Bottom Right, the label says "stimulus oriented factors" but I think it would be more accurate to say "latent factors, ordered by stimulus orientation" or something like that.

5. (Minor) Fig 3. What does each marker from each monkey mean? Is it the average across all trials from that session? Please clarify.

5. Fig 4. The text states "the IN choice distribtuion ... is biased upwards." Really? For Monkey L that looks true, but not Monkey N. Your main point is that the CPs on the non-stim axis are > 0.5. In your computations of the CPs for each set of subspaces, please report the variability across sessions and across monkeys.

6. Fig 4. How much does your result depend on your nonlinear encoding model? For example, could you arrive at the same conclusion by using a linear method to find the stimulus encoding dimension and computing the choice probability in that dimension vs other dimensions of the data? Please comment on the use of a nonlinear method, or provide a comparison with a linear method.

7. I believe you included "weak stimulus" trials so you could have more data. When you did this you controlled the accuracy rate to be below 65%. I believe this means you did not match the number of hits and misses for these weak stimulus trials. Is it possible that this introduces a correlation between stimulus and choice? Please comment, or ideally, repeat the analysis by randomly removing hit trials until the accuracy rate is 50% (which would match the behavior on zero coherence trials).

8. Please compare your methods and results for this recent paper, which appears to develop a similar method and comes to similar conclusions. https://www.biorxiv.org/content/10.1101/808584v1.full.pdf

Reviewer #2: The authors find a low-dimensional representation of population recordings in MT while animals performed a sensory discrimination task. This low-dimensional representation defines the “stimulus-axis”. They also define different choice axis and find that those orthogonal to the stimulus-axis have a stronger correlation with animals’ choice than the one defined by the stimulus-axis. They claim that this finding provides evidence in favor of sub-optimal read-out with feedback as the origin of choice-related signals in sensory areas. This could be a useful strategy for transmitting feedback signals without harming the neural code.

The manuscript is an interesting piece of work and it represents a relevant contribution to the field. However, the clarity of the introduction, presented results, and methods, is not high and the accessibility could be improved in some sections. I have also additional concerns about specific points in the methodology and results and how they support some of the claimed conclusions, see below.

Major

1. The methods used to address the main scientific question of this study are sound and suited for it. However, a simpler and more straightforward approach to this question should be explored. The authors should compare the direction of two classifiers: one that has been trained on predicting stimulus identity and another that has been trained on predicting animal’s choice. Ideally, the choice classifier should be trained on non-evidence trials (or weak stimulus conditions where stimulus-driven activity has been regressed out). Finding similar results with this simpler analysis, would imply strong evidence in favor of the main finding of this study.

2. When describing Fig. 1 in the Introduction, the authors say (line 42): “In this space, ...”. This statement is slightly misleading because they are neglecting the effect of the covariance matrix on the optimal read-out. If the population’s covariance is not isotropic, the optimal read-out is (Gaussian approximation) proportional to the inverse covariance matrix times the “stimulus axis”. This happens throughout the text, and given that this is central to the motivation of this work, the authors should clarify why they refer to the optimal read-out as parallel to the stimulus-axis and modify the text and figures accordingly.

3. The authors should clarify why the stimulus axis is defined upon the first variable of the 4D latent model. Couldn’t it also be defined directly as the direction joining the mean of the two distributions? Or even better, the direction defined by a classifier trained on decoding stimulus identity?

4. The order the different panels of Fig. 1 appear in the text (Introduction) is a bit chaotic. It is also a bit unclear what are exactly the different hypothesis. Even though in the “Sensory and choice …” section it is mentioned again, the authors should be more explicit about the different hypothesis (i.e. i) …, ii) …, iii) …, etc.) and how they are related to Fig. 1 already in the Introduction. The panels should also ordered in order of appearance in this section.

5. The latent model validation seems to have been performed poorly. In Fig. 2B it is only visually shown how the structure of neuronal correlations is similar to those generated by the latent model. The authors should add a figure or table with the cross-validated explained variance of the model for all monkeys. Fig. 2B should also include a quantification of the similarity between the two matrices.

6. It is unclear why the number of latent variables is set to 4. The authors should specify why their criterion was based on the session with the largest number of neurons. Is it consistent across all monkeys? Is 4 the optimal number for only that session, for few of them, or for most of them?

7. Comparing the influence of the stimulus-axis (1-dimensional) and the non-stimulus space (3-dimensional) is somehow unfair because 3 dimensions will in general be more informative than 1 dimension. It would be strong evidence in favor of the hypothesis of this study to show that the CP for the stimulus axis is smaller than the CP for each non-stimulus dimension evaluated individually. The authors should address this point, and in case of a negative result, explain how this can be consistent with their hypothesis.

8. Fig. 6 is confusing. Why is the stimulus-axis CP (time profile) not shown in this figure? The fact that the purple and red curves peak later in the trial is expected because of the way the classifiers were trained. I am a bit skeptical that Fig. 6 indeed shows evidence in favor of the main finding of this study. Using these curves as well as the stimulus-axis CP time profile, the authors should be clearer on why Fig. 6 provides evidence in this regard.

Minor

- In Fig. 1 it would be helpful to also depict how information is affected for each different scenario (panel). Please add some graphical clarification in this regard.

- Please, add a paragraph where it is briefly explained how the feedback signals can affect CP. Readers might not be fully familiar with that literature and it would make the text easier to follow.

- Even though the behavioral protocol is explained in ref. 24 and 29, it would improve the clarity and readability of the manuscript to describe the experiment in more detail in the methods section. The reader should be able to get a good picture of the dataset without the need to read refs. 24 and 29. For instance, the “Pulse-triggered average” section in the methods would be clearer with more detailed description of the stimulus presented to the monkeys.

- In the Methods section some of the equations are numbered and others are not. It would improve the clarity of the manuscript to have all numbered.

- In the “Single-trial latent dynamics of population” section (Methods), the vLGP model is poorly explained. For readers that are unfamiliar with variational inference it would be useful to have an explicit expression for the loglikelihood of the model as well as the ELBO term to be maximized. It should also be stated what term in the ELBO is going to be approximated by the equation in line 112.

- There is a typo in line 118: “the” shouldn’t be there.

- Labels and figures (everything) in Fig. 2 are too small and therefore very difficult to read. They should be made larger.

- The “Pulse-triggered average” section is unclear. It should be re-written to make it more easier to follow. In line 140, Beta hasn’t been defined (is it a typo for W?).

- The term “frozen trials” is important for this study and is used throughout the manuscript. However, it is never properly defined. Please, add a sentence both in the methods and the results section defining the term.

- The last section of the Methods should be rewritten in a clearer way. Even though it is revisited in the results section, it is conceptually a fundamental part of the study and it should be more explicitly described already in the Methods. In particular, it is not well stated how c_stimulus and c_nonstimulus are calculated.

- Fig. 4 is difficult to understand. The authors should add a more detailed explanation on both the results section and the Figure caption. Specifically they should make sure they answer: what is the distribution plotted for each monkey? How does this figure relate to Fig. 1? How does each panel relate graphically to its shown CP? Also, text is difficult to read, the size should be increased.

- For point 2 of the major revisions, Averbeck et al. Nat. Rev. Neuro. (2006) and Nogueira et al., J. of Neuro (2019) could be cited given their relevance to this topic.

- In line 15, Kafashan et al. (2020) bioRxiv2020.01.10.902171; and Bartolo et al. J. of Neurosci. (2020) 2072-19 are also very relevant studies in this regard.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see http://journals.plos.org/compbiol/s/submission-guidelines#loc-materials-and-methods

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007614.r003

Decision Letter 1

Daniele Marinazzo

5 Apr 2020

Dear Dr. Park,

We are pleased to inform you that your manuscript 'Stimulus-choice (mis)alignment in primate MT cortex' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

At the same time, please make sure to address the few style recommendations suggested by the reviewers, which you can find below.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Daniele Marinazzo

Deputy Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I thank the authors for their response and updates to the manuscript. The methods and conclusion of the paper are clearer and are sound. I support publication of this manuscript.

One minor change: the abstract uses the abbreviation MT without defining it first. Although the authors do say "middle temporal area", they should change it to "middle temporal (MT) area".

Reviewer #2: The authors have thoroughly revised the manuscript and have significantly increased the clarity of their main message. It is easier to follow for a broader audience and it fits well within the existing literature. Most of my concerns have been answered with additional analysis and figures that make this study stronger and more robust.

Minor:

- The sentence at lines 413–417 is difficult to follow. I would rewrite it or split it in two.

- Fig. 6 is mentioned in the text before Fig. 5 (line 328).

- The y axis in Fig. S1 seems too short at the upper end.

- The text in Fig. S2 is difficult to read due to the small font.

- The y label is missing in Figs. S3 and S6.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007614.r004

Acceptance letter

Daniele Marinazzo

11 May 2020

PCOMPBIOL-D-19-02195R1

Stimulus-choice (mis)alignment in primate area MT

Dear Dr Park,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Laura Mallard

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Text. Variational latent Gaussian processes.

    (PDF)

    S1 Fig. Choice mapping does not inflate choice probability.

    (PDF)

    S2 Fig. CP (pseudo frozen trial) of latent factors for each monkey.

    (PDF)

    S3 Fig. Time courses of CP (pseudo frozen trials).

    (PDF)

    S4 Fig. Cumulative explaining power of principal components of raw spike trains.

    (PDF)

    S5 Fig. Visual motion pulse information encoded in one dimension of raw spike trains.

    (PDF)

    S6 Fig. Time course of choice probability in the stimulus subspace of raw spike trains.

    (PDF)

    Attachment

    Submitted filename: 20200322_PLOSCB_MT_response_to_reviewers.pdf

    Data Availability Statement

    All data and code are available at https://doi.org/10.6084/m9.figshare.11413182.v1.


    Articles from PLoS Computational Biology are provided here courtesy of PLOS
