PLOS One. 2021 Jul 30;16(7):e0251647. doi: 10.1371/journal.pone.0251647

Scalable and accurate method for neuronal ensemble detection in spiking neural networks

Rubén Herzog 1,*, Arturo Morales 2, Soraya Mora 3,4, Joaquín Araya 1,5, María-José Escobar 2, Adrian G Palacios 1, Rodrigo Cofré 6
Editor: Jonathan David Touboul
PMCID: PMC8323916  PMID: 34329314

Abstract

We propose a novel, scalable, and accurate method for detecting neuronal ensembles from a population of spiking neurons. Our approach offers a simple yet powerful tool to study ensemble activity. It relies on clustering synchronous population activity (population vectors), allows neurons to participate in different ensembles, has few parameters to tune, and is computationally efficient. To validate the performance and generality of our method, we generated synthetic data, on which our method accurately detects neuronal ensembles over a wide range of simulation parameters and outperforms current alternative methodologies. We then tested our method on spike trains of retinal ganglion cells obtained from multi-electrode array recordings under a simple ON-OFF light stimulus, finding consistent stimulus-evoked ensemble activity intermingled with spontaneously active ensembles and irregular activity. Our results suggest that early visual system activity could be organized into distinguishable functional ensembles. We provide a graphical user interface, which facilitates the use of our method by the scientific community.

Introduction

Donald Hebb predicted more than 70 years ago that ensembles would naturally arise from synaptic learning rules, where neurons that fire together would wire together [1]. However, despite the long history of this idea, only recently have simultaneous recordings and computational analyses of hundreds of cells become possible [2]. Recent advances in neuronal recording technology, combined with sophisticated methods of data analysis, have revealed significant synchronous activity between neurons at several spatial and temporal scales [3–5]. These groups of neurons with a tendency to fire together, known as neuronal ensembles (also called cell assemblies), are hypothesized to be a fundamental unit of neural processes and to form the basis of coherent collective brain dynamics [1, 6–8].

The idea that the functional units of neural processes are groups of neurons firing together (not single neurons) represented a paradigm shift in computational neuroscience [6]. Neuronal ensembles have been proposed as a fundamental building block of whole-brain dynamics, relevant to cognitive functions, in particular because ensemble activity could implement brain-wide functional integration and segregation [3].

Large-scale neuronal recording techniques, such as multi-electrode arrays (MEA) or calcium imaging, allow recording the activity of hundreds or even thousands of neurons simultaneously [5, 9–12]. These recent technological advances provide fertile ground for analysing neuronal ensembles and investigating how collective neuronal activity is generated in the brain. Recent studies using multi-neuronal recording techniques have revealed that a hallmark of population activity is the organization of neurons into ensembles, generating new insights and ideas about the neural code [9, 13–15]. In particular, the activation of specific ensembles has been shown to correlate with spontaneous and stimulus-evoked brain function [16]. The brain-wide alterations present in neurological and mental impairments disrupt neural population activity and therefore affect neuronal ensembles. Indeed, neuronal ensembles are susceptible to epileptic seizures and schizophrenia, as shown by in vivo two-photon calcium imaging data in mouse [17, 18], by medically-induced loss of consciousness in mice and human subjects [19], and by a mouse model of autism [20].

However, identifying and extracting features of ensembles from high-dimensional spiking data is challenging. Neuronal ensembles differ in size and activity rate. Some neurons may not participate in ensemble activity, while others may participate in many ensembles, and not all neurons within an ensemble fire when the ensemble is active. Ensembles can be temporally extended, overlap, or display a hierarchical or modular organization, making it difficult to distinguish between them [2].

In fact, a wide variety of techniques and ideas have been used to detect and interpret neuronal ensembles (see [13, 21, 22] for reviews). Previous works have applied methodologies such as principal component analysis [23–26], correlations between neurons [26–28], correlations between population vectors [16], statistical evaluation of patterns [29–32], and non-negative matrix factorization [25, 33]. However, most of these methods are computationally expensive and require tuning several parameters, which hinders their application by the scientific community.

These divergences in the definition of neural ensembles have been addressed by Carrillo and Yuste [34], who quantitatively define neural ensembles as population vectors whose dimension is the size of the population. Here, we follow their definition and consider as a population vector the binary response of the population in a given time bin. We add the specification that a neural ensemble should have a core: a group of neurons whose temporal activity is significantly correlated with the temporal activity of the ensemble. Conversely, non-core cells are neurons that may or may not participate in the ensemble, but whose correlation with the activation times of the ensemble is not statistically significant. The core of an ensemble is thus the group of neurons that best represents the activity of the ensemble, so its presence indicates a functional coupling between ensembles and neurons. Note that, in general, the temporal activity of the ensemble cannot be determined by the activity of its core-cells alone; rather, it is a functional whole built up from core-cells and the recruitment of non-core cells.

Accordingly, we develop a method based on dimensionality reduction, clustering, and non-parametric statistical tests to detect neural ensembles in simultaneous neuronal recordings. We use Principal Component Analysis to project population vectors onto a low-dimensional space where they can be reliably clustered. Once clustered, a non-parametric statistical test is performed to detect the core-cells, fulfilling our definition of ensembles as multi-dimensional population vectors with a core.

Moreover, we tested and validated our method using biological and simulated data, showing its accuracy and broad applicability to different scenarios recreated by synthetic data with diverse characteristics. Our tests on simulated data show remarkable detection performance for the ensemble number, the ensemble activation times, and the core-cells over a wide range of parameters.

Besides, we exemplify our method’s functionality on spiking neuronal data recorded using MEA from a mouse in vitro retinal patch, from which the spiking activity of hundreds of retinal ganglion cells (RGCs) is obtained during a simple ON-OFF stimulus, i.e. consecutive changes between light and darkness. In brief, the vertebrate retina is a part of the central nervous system composed of thousands of neurons of several types [35–37], organized in a stratified way with nuclear and plexiform layers [38]. This neural network has the capability to process several features of the visual scene [39], and its output is conveyed to the brain through the optic nerve, a neural tract composed mainly of the axons of the RGCs. The physiological mechanisms involved in many of these processes are starting to become clear with the development of new experimental and computational methods [37, 38, 40]. One of the most remarkable, and simplest, examples of retinal processing is the ON-OFF response of RGCs, a stereotypical increase or decrease in firing rate in response to changes in light intensity. In this case, the connectivity between RGCs and bipolar cells (BCs) in the Inner Plexiform Layer (IPL) plays a major role, determining the tendency of an RGC to preferentially fire when the light increases, decreases, or both [38]; this property is often called polarity [41], and represents the broadest functional classification of RGCs into ON, OFF, and ON-OFF cell types.

We hypothesize that retinal ensembles may also exhibit this property as a whole. Our analysis suggests the existence of diverse ON and OFF retinal ensembles with a specific stimulus preference as functional units, which may imply that stimulus tuning preference is a property of the ensembles as a whole, and not a simple inheritance from their corresponding core-cells.

To facilitate our method’s use by the community, we provide a graphical user interface (GUI) and the code implementing our algorithm, aiming to provide a fast, scalable, and accurate solution to the problem of detecting neuronal ensembles in multi-unit recordings.

Materials and methods

Ethics statement

Regarding retinal data, animal manipulation and breeding and corresponding experiments were approved by the bioethics committee of the Universidad de Valparaiso, in accordance with the bioethics regulation of the National Agency for Research (ANID, Ex-CONICYT) and international protocols.

Ensemble detection

We consider binary spike trains arranged in a matrix S of dimension N × T, where N corresponds to the number of neurons and T to the number of time bins. The entries of the matrix, denoted by Sn,t, are equal to one if the n-th neuron is active in the t-th time bin, and zero otherwise. At each time bin t, there is a binary population vector of active and silent neurons (S1,t, …, SN,t).
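As a concrete illustration, the spike-train matrix and its population vectors can be represented directly as a NumPy array (a minimal sketch with our own variable names, using random spikes, not the authors' data):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 1000                              # neurons, time bins

# Binary spike matrix S: S[n, t] = 1 if neuron n fires in time bin t
S = (rng.random((N, T)) < 0.05).astype(np.int8)

# The population vector at bin t is simply column t of S
t = 10
pop_vec = S[:, t]                            # shape (N,)
n_active = int(pop_vec.sum())                # number of active neurons in bin t
```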

Feature extraction using PCA

To reduce the dimensionality of the population vectors, we used Principal Component Analysis (PCA). PCA extracts a set of orthogonal directions capturing the most significant variability observed in the population vectors. We discarded population vectors with fewer than three spikes (a free parameter in the GUI) and projected the selected population vectors onto the first six principal components (PCs). The optimal number of PCs may vary depending on the data, so it is also a free parameter in the GUI. Its selection is informed by the cumulative explained variance of those PCs and by the cut-off of the eigenvalue spectrum of the PC decomposition; this plot is available in the GUI.

Finally, we computed the Euclidean distance between the population vectors projected on the PC space to obtain a distance matrix. The distance matrix characterizes the dissimilarity between population vectors and its analysis is essential in the clustering step.
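The two steps above (vector selection, PCA projection, and the distance matrix) can be sketched as follows. This is an illustrative NumPy/SciPy re-implementation on toy random data, not the published code; the minimum-spike count and the number of PCs are the free parameters discussed in the text:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)
S = (rng.random((100, 500)) < 0.1).astype(float)   # toy binary spike matrix (N x T)

# Keep only population vectors with at least 3 spikes (free parameter)
X = S[:, S.sum(axis=0) >= 3].T                     # rows = selected population vectors

# PCA via SVD of the mean-centred data
Xc = X - X.mean(axis=0)
U, sv, Vt = np.linalg.svd(Xc, full_matrices=False)
n_pcs = 6                                          # free parameter (see text)
proj = Xc @ Vt[:n_pcs].T                           # projection onto the first 6 PCs
explained = (sv ** 2) / (sv ** 2).sum()            # explained-variance ratio per PC

# Euclidean distance matrix between the projected population vectors
D = squareform(pdist(proj, metric="euclidean"))
```

Inspecting the cumulative sum of `explained` is one way to choose `n_pcs`, mirroring the eigenvalue-spectrum plot provided in the GUI.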

Centroid detection

We used a hard clustering algorithm based on a deterministic analysis of the distance matrix. Two parameters characterized each data point (i.e., a population vector) in the clustering procedure: i) ρ, the density, and ii) δ, the minimum distance to a point with higher density. Conceptually, the density of a point represents how close (in terms of the distance matrix) its closest neighbours are. The density of each vector i was given by ρi = 1/⟨di⟩, where ⟨di⟩ was the average distance from point i to its closest neighbours. Typically, we considered the 2% closest neighbours; this choice can be tuned (in the GUI) with a parameter denoted by dc. Population vectors with relatively high values of ρ and δ were candidate centroids. The rationale is that points with relatively high ρ and δ (with respect to the rest of the points) have the highest number of points in their neighbourhood and are far from other points with high density. The procedure to find the cluster centroids follows [42], which is a modified version of the method developed in [43]. To automatically detect the cluster centroids, we fit a power-law to the δ vs. ρ curve, using the 99.9% upper confidence boundary as a threshold (another free parameter in the GUI), and considered as centroids all points falling outside this boundary. The remaining points were assigned to their closest centroids, building up the clusters. Since each population vector is associated with a time bin, the clustering step yields the cluster activation times.
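The (ρ, δ) computation can be sketched as follows — a simplified re-implementation of the density-peaks idea of [42, 43] on toy 2D data, not the authors' code; here candidate centroids are ranked by the product ρ·δ rather than the power-law fit described above:

```python
import numpy as np

def density_peaks(D, frac_neighbours=0.02):
    """Compute (rho, delta) for each point given a distance matrix D."""
    n = D.shape[0]
    k = max(1, int(frac_neighbours * n))        # number of closest neighbours (dc-like)
    # rho_i = 1 / (average distance to the k closest neighbours, excluding itself)
    rho = 1.0 / np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)
    # delta_i = minimum distance to any point with strictly higher density
    delta = np.empty(n)
    order = np.argsort(-rho)                    # densest point first
    delta[order[0]] = D[order[0]].max()         # convention for the global density peak
    for r in range(1, n):
        i = order[r]
        delta[i] = D[i, order[:r]].min()
    return rho, delta

# Toy example: two well-separated blobs of points in 2D
rng = np.random.default_rng(2)
pts = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
D = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
rho, delta = density_peaks(D, frac_neighbours=0.05)

# Candidate centroids: points with jointly high rho and delta
centroids = np.argsort(-(rho * delta))[:2]
```

With two blobs, the two points with the highest ρ·δ fall one in each blob, matching the intuition that centroids are dense points far from any denser point.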

Core-cell detection

Core-cells are the best representatives of a neural ensemble, and their presence, measured by a statistical test, suggests coupling between the single-neuron scale and the ensemble scale, i.e. multi-scale statistical dependence. Accordingly, once the candidate clusters and centroids were identified, the Pearson correlation coefficient between the activation times of neuron n and cluster e, denoted by corr(n, e), was computed for each pair (n, e). This correlation is 1 if neuron n and cluster e are always active in the same time bins, and -1 if they are never active in the same time bins. Intermediate values represent combinations of neurons and clusters with a tendency to be active either in the same or in different time bins.

To distinguish between core and non-core cells, we set a threshold obtained from a null hypothesis built from shuffled versions of the (n, e) pair. For each cluster e ∈ E, the number of time bins in which it was active was kept fixed, but the temporal activation was randomly shuffled, obtaining a new temporal sequence of activation that we denote er.

We repeated this procedure 5000 times for each cluster, obtaining a distribution of corr(n, er) for neuron n and cluster e, which represents the null-hypothesis distribution of the correlation between a neuron and a cluster arising by chance. A significance threshold θe was then defined as the 99.9-th percentile of this distribution (a free parameter), corresponding to p < 0.001. If corr(n, e) > θe, we considered n a core-cell of cluster e, as their correlation was above the significance level.
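This shuffling test can be sketched as follows, on toy data of our own making (the paper uses 5000 shuffles; we use 1000 here to keep the example fast):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 5000

# Toy cluster activation sequence: active in 200 random bins
e_times = np.zeros(T, dtype=int)
e_times[rng.choice(T, size=200, replace=False)] = 1

# Toy neuron: fires with the cluster, plus ~100 extra noise spikes
n_spikes = e_times.copy()
n_spikes[rng.choice(T, size=100, replace=False)] = 1

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# Null distribution: shuffle the activation times, keeping the activation count fixed
n_shuffles = 1000
null = np.array([corr(n_spikes, rng.permutation(e_times)) for _ in range(n_shuffles)])

theta_e = np.percentile(null, 99.9)          # significance threshold (p < 0.001)
is_core = corr(n_spikes, e_times) > theta_e
```

Here the toy neuron fires at every activation of the cluster, so its correlation far exceeds the null threshold and it is flagged as a core-cell.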

Within-cluster average correlation

To evaluate the inner structure of each cluster, we computed the within-cluster correlation by averaging the pairwise correlations of all core-cells of that cluster. As a comparison, we also computed the average pairwise correlation of the whole population to obtain a threshold for ensemble selection.

Ensemble selection criteria

Following [44], we used two criteria: i) minimum cluster size and ii) within-cluster average correlation. The first criterion considers as candidate neuronal ensembles only clusters with a minimum number of core-cells. This ensures that a detected cluster matches the definition of a neural ensemble as a cluster of population vectors with a core. In this article we used three as the minimum number of core-cells, but this number is a free parameter in the GUI and codes. The second criterion compares the within-cluster average correlation with the average pairwise correlation of the population: if the latter exceeds the former, we discard the cluster. This threshold is also a free parameter in the GUI and is expressed in terms of the standard deviation of the pairwise correlations of the whole population.

This way, we kept only clusters with a minimum size (in terms of core-cells) and with a relatively high (with respect to the population) within-cluster average correlation. If a cluster failed either of these two criteria, we discarded it from the analysis; otherwise, we considered the cluster an ensemble.
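The two selection criteria can be sketched as follows — an illustrative re-implementation with made-up names, not the published code; `sd_factor` plays the role of the GUI threshold expressed in units of the population standard deviation:

```python
import numpy as np

def select_ensembles(S, core_cells, min_core=3, sd_factor=1.0):
    """Keep clusters with at least `min_core` core-cells and a within-cluster
    average correlation above the population mean + sd_factor * population s.d."""
    C = np.corrcoef(S)                                  # neuron-by-neuron correlations
    off = C[~np.eye(C.shape[0], dtype=bool)]            # off-diagonal entries
    threshold = off.mean() + sd_factor * off.std()
    kept = []
    for e, cells in core_cells.items():
        if len(cells) < min_core:                       # criterion i: minimum size
            continue
        sub = C[np.ix_(cells, cells)]
        within = sub[~np.eye(len(cells), dtype=bool)].mean()
        if within > threshold:                          # criterion ii: correlation
            kept.append(e)
    return kept

# Toy example: neurons 0-4 share a common drive, the rest are independent
rng = np.random.default_rng(4)
base = (rng.random(2000) < 0.2).astype(float)
S = (rng.random((20, 2000)) < 0.2).astype(float)
for n in range(5):
    S[n] = np.clip(base + (rng.random(2000) < 0.05), 0, 1)

kept = select_ensembles(S, {1: list(range(5)), 2: [5, 6]})
```

In this toy example, cluster 2 is rejected for having fewer than three core-cells, while cluster 1 survives because its within-cluster correlation greatly exceeds the population-level threshold.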

Results

Method overview

Before a systematic evaluation of the performance of our method, we summarize its core elements using the example shown in Fig 1, and refer the reader to the Methods section for further details.

Fig 1. Methodological scheme.


(A) Synthetic spike trains of 50 neurons during 100 time bins. Four ensembles were artificially generated (see Methods for details). (B) PCA is performed on all population vectors with a minimum number of active neurons (here, three). For visualization purposes, we plot each of them as a point in the 3D space spanned by the first three principal components (PCs). The points tend to group in clusters. (C) The Euclidean distance between all points in PC coordinates is computed. The density ρ of each data point and its distance δ to the next data point with higher density are obtained. The points above the threshold (blue dashed line, see Methods for details) are considered cluster centroids. (D) All points are assigned to their closest centroid, building up the clusters of population vectors. (E) The correlation matrix between neurons and clusters, corr(n, e), is represented by the symmetric red-blue color map. (F) A threshold matrix is computed to define the significance of corr(n, e) values. (G) Core-cells are detected based on corr(n, e) values that exceed the threshold (red if core, white otherwise). (H) The inner-cluster mean correlation is compared against a threshold of non-core cell correlation, discarding any cluster below this threshold (cluster 1, in this case). This final filtering yields the set of ensembles. (I) Each population vector shown in A is coloured according to the cluster to which it belongs. Black vectors were discarded by the minimum number of spikes or the threshold rejection criteria. (J) The detected ensemble sequence is represented as an integer sequence, where the colour corresponds to the population vector colour in I. Note that the rejection criteria discarded cluster 1, so only ensembles 2–5 are shown, and black circles denote non-clustered population vectors.

First, we discard any population vector (Fig 1A) with fewer than three active neurons (a free parameter in the GUI). This way, we focus only on higher-order interactions, leaving pairwise interactions out of the analysis. Then, we perform a principal component analysis (PCA) using the selected population vectors as observations (Fig 1B) and compute the Euclidean distance between all vectors projected onto the first six principal components (see Methods for details on the selection of the number of PCs). Once the pairwise distances between population vectors are computed, the local density, ρ, and the distance to the next denser point, δ, are computed for each population vector. Based on these two measures, and a power-law fit to the δ vs. ρ curve, we automatically detect the cluster centroids (Fig 1C) and assign the rest of the vectors to their closest centroid, building up the clusters (Fig 1D). Since the core of our algorithm is the density-based clustering procedure, we call it Density-based.

To find the core-cells, we computed corr(n, e), the correlation between the spike train of neuron n and the activation times of cluster e (Fig 1E), and test its significance using a threshold associated with a null hypothesis obtained from shuffled versions of data (Fig 1F). If corr(n, e) is above the threshold, neuron n is considered a core-cell of cluster e (Fig 1G).

Finally, to obtain the ensembles, the pairwise correlations between all the core-cells of a given cluster are computed and averaged, yielding the within-cluster average correlation.

To define whether a cluster is an ensemble, we compared the within-cluster average correlation to the average pairwise correlation of the whole network (Fig 1H). If the within-cluster correlation is significantly higher (based on a threshold), we considered an ensemble to be present; otherwise, the cluster is discarded due to the lack of internal correlation.

With this procedure, we were able to split the population vectors into ensemble and non-ensemble vectors (Fig 1I), and to obtain the activation times of the different ensembles (Fig 1J) with their corresponding core-cells. We clarify that our method does not provide an analytical tool to evaluate the statistical significance of the ensemble temporal sequence; rather, it generates it as an integer sequence, which we refer to as the global sequence (Fig 1J).

Synthetic spike trains

We generated a set of ensembles E characterized by their core-cells and an activation sequence. Each ensemble was composed of a fixed number of core-cells randomly drawn from the whole population of neurons, allowing the repetition of core-cells among ensembles. For each ensemble, we generated a binary column vector ce of dimension N, with ce(n) = 1 if neuron n is a core-cell of ensemble e and 0 otherwise.

The activation sequence of each ensemble e ∈ E follows a time-homogeneous Bernoulli process with parameter pe, denoted B(pe). We generated a binary row vector ae of dimension T, with ae(t) = 1 if ensemble e is active in time bin t and 0 otherwise. We allowed at most one active ensemble per bin, so each population vector can be part of one ensemble or none. To implement this, we first drew the first ensemble’s activation times at random and then removed these times from the list of time points available for the second ensemble, proceeding this way until reaching the target proportion PE of active bins.

With the core-cells and the activation sequence of each ensemble defined, a spike train was generated as the sum of the outer products ce ae, each a matrix of dimension N × T:

SE(n, t) = Σe∈E ce(n) ae(t).

In order to preserve the probabilistic nature of spiking neurons, the firing rate of each neuron was drawn from a rectified Gaussian distribution (only positive values) with variable standard deviation (s.d.). The larger the s.d., the higher the firing rates present in the spike train. This parameter allowed us to control the density of the spike train (the total number of spikes with respect to the spike-train duration).

We randomly added/removed spikes to/from each neuron’s spike train until it matched the target firing rate P(n). When removing spikes, the population vectors located at the time bins where a given ensemble was active ended up having fewer active neurons than defined; when adding spikes, the opposite effect occurred. We used three different spike-train densities: low (s.d. = 0.05), medium (s.d. = 0.1) and high (s.d. = 0.2). These densities reproduce a wide range of spiking responses observed experimentally; however, the density in experimental data will depend on the bin size. Our retinal data corresponds to the medium case. For the low-density case, matching the target firing rates usually required the removal of spikes, which corrupted the vectors related to ensemble activity by under-representing their core-cells. In the medium case, there was a balance between adding and removing spikes, while in the high-density case most neurons required the addition of spikes to match the desired firing rate, making neurons participate in ensembles to which they did not belong by construction.

In summary, we generated a spike train from the following procedure:

  1. Generate E ensembles characterized by population vectors built from their core-cells.

  2. Fill a spike train with the activation times of the ensembles, following a time-homogeneous Bernoulli process for each ensemble, such that a proportion PE of the population vectors is assigned to ensembles. The remaining population vectors consist only of silent neurons.

  3. Draw the firing rates of each neuron from a rectified positive Gaussian distribution.

  4. Randomly add/remove spikes to/from each neuron’s spike train until it matches the target firing rate of each neuron.

This procedure yields a random spike train built from the ensemble activity.
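The four steps above can be sketched as a compact generator. This is a simplified illustration with our own parameter names, not the authors' implementation; in particular, overlapping activation times are resolved here by assigning each active bin to a single random ensemble:

```python
import numpy as np

def synth_spike_train(N=100, T=5000, n_ens=4, n_core=20, p_E=0.8, sd=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Core-cells: binary column vectors c_e stacked into an N x n_ens matrix
    C = np.zeros((N, n_ens), dtype=int)
    for e in range(n_ens):
        C[rng.choice(N, n_core, replace=False), e] = 1
    # 2. Activation times: at most one active ensemble per bin,
    #    with a proportion p_E of bins assigned to some ensemble
    active_bins = rng.choice(T, int(p_E * T), replace=False)
    A = np.zeros((n_ens, T), dtype=int)
    A[rng.integers(0, n_ens, active_bins.size), active_bins] = 1
    # S_E(n, t) = sum_e c_e(n) a_e(t)
    S = (C @ A > 0).astype(int)
    # 3. Target firing rates drawn from a rectified Gaussian (absolute value)
    target = np.abs(rng.normal(0.0, sd, N))
    # 4. Add/remove spikes per neuron until the target firing rate is matched
    for n in range(N):
        want, have = int(target[n] * T), S[n].sum()
        if have > want:
            on = np.flatnonzero(S[n])
            S[n, rng.choice(on, have - want, replace=False)] = 0
        elif have < want:
            off = np.flatnonzero(S[n] == 0)
            S[n, rng.choice(off, min(want - have, off.size), replace=False)] = 1
    return S, C, A

S, C, A = synth_spike_train()    # one example draw
```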

Matching detected ensembles with ground truth ensembles

To evaluate our method with synthetic data, we matched the ground truth (GT) ensembles with the detected ones. This was implemented by computing the correlation between the GT and detected activation sequences: for each GT ensemble, we selected the detected ensemble that maximized this correlation.
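A minimal sketch of this matching step (the function name and toy data are ours):

```python
import numpy as np

def match_to_gt(gt_seqs, det_seqs):
    """For each ground-truth activation sequence, return the index of the
    detected sequence maximizing the Pearson correlation, and that correlation."""
    matches, scores = [], []
    for g in gt_seqs:
        corrs = [np.corrcoef(g, d)[0, 1] for d in det_seqs]
        best = int(np.argmax(corrs))
        matches.append(best)
        scores.append(corrs[best])
    return matches, scores

# Toy example: the detected sequences are the GT ones in reverse order
rng = np.random.default_rng(5)
g0 = (rng.random(1000) < 0.1).astype(float)
g1 = (rng.random(1000) < 0.1).astype(float)
matches, scores = match_to_gt([g0, g1], [g1.copy(), g0.copy()])
```

Here the matcher recovers the permutation: g0 pairs with the second detected sequence and g1 with the first.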

To test our method, we designed an algorithm to generate synthetic data in which the neuronal ensembles’ activity can be parametrically controlled (see Methods for details), and assessed the detection performance of our method with respect to the known ground truth (GT).

We illustrate our results with a simple example shown in Fig 2, where different spike trains were generated using a network of N = 100 neurons and T = 5000 bins (the figure shows just 200 bins to improve visualization). We created seven ensembles, defining their temporal sequence, whose global activity comprised 80% of the sample (Fig 2E), and their core-cells, which comprised from 20 to 40 neurons per ensemble (Fig 2I). Then, we randomly added or removed spikes for each neuron to satisfy a given firing probability, P(n), which controls the spike-train density (Fig 2B–2D).

Fig 2. Robust ensemble detection on synthetic data under different density regimes.


(A) The spatial (core-cells) and temporal (activation times) properties of seven ensembles are synthetically generated in a spike train of 100 neurons. The right panel shows the firing probability of each neuron, denoted by P(n). Each ensemble is coloured differently. (B-D) Based on the ground truth (GT) spike train, three spike trains with different densities were generated: low (B), medium (C) and high (D). The firing probability of each neuron is shown at the right, as in A. Population vectors are coloured as in A. Black dots are population vectors that do not belong to any ensemble. (E) Activation times (i.e. sequence) of the GT ensembles. Colours correspond to A. (F-H) The detected ensemble sequence for the three spike trains with different densities, sorted to match the GT indexing and colours. At low density, the method detects two extra ensembles, denoted by coloured asterisks. The green vertical region shows an example of an ensemble that was correctly detected in all three spike trains. The red vertical region shows an example where the method correctly detects the GT vector at medium density, but partially fails in the other two cases. (I) GT ensembles sorted in descending order according to the number of core-cells. Each column represents an ensemble, where red indicates a core-cell. (J-L) Detected core-cells for the three cases. Despite the two extra ensembles found in J (8 and 9, asterisks), the core-cells are in good agreement with the GT.

Detection of ensembles on synthetic data

For the low-density spike trains, we found two more ensembles than expected (seven) (Fig 2F, red asterisks), while for the medium and high-density ones we found the expected number of ensembles (Fig 2G and 2H, respectively).

Then, we compared the ensemble sequence of the GT (Fig 2E) and the detected ensemble sequence in each density scenario (Fig 2F–2H), finding almost perfect agreement, with the exception of a few false positives in the low- and high-density cases. Finally, we compared the detected core-cells with the GT (Fig 2J–2L), again finding good agreement. Despite the over-detection in the low-density regime (red asterisk), the other ensembles were in good agreement with the GT core-cells.

With this example, we show that our method can detect the ensemble number, the temporal sequence, and the core-cells of neuronal ensembles in noisy spike trains with different densities. In the next section, we evaluate the detection of these three features, i.e., ensemble number, temporal ensemble sequence, and core-cells, for different sample and network sizes.

Scaling performance and comparison with an alternative method

To systematically quantify and evaluate the performance of our method (Density-based), we generated synthetic data with different network and sample sizes and compared our algorithm to a current state-of-the-art ensemble detection method published by Carrillo et al. (SVD-based) [16]. The associated implementation can be found at https://github.com/hanshuting/SVDEnsemble; we used the default parameters that accompany these codes. Due to the SVD-based method’s computational cost, we compared the scaling with sample size for both methods, while for the Density-based method we also computed the scaling of performance with network size. The parameters used in our method can be found in S1 Table; they are easily settable in the provided GUI.

First, we generated a synthetic spike train with fixed network size (N = 300), number of ensembles (E = 12), number of core-cells per ensemble (35), ensemble probability (PE = 0.8), and medium density. Then, we varied the recording length from T = 500 to T = 10^4, finding that both methods increased their computational time with the sample size, as expected, but the SVD-based method scaled exponentially, while the Density-based method is two orders of magnitude faster at T = 10^4 (Fig 3A).

Fig 3. Performance scaling with sample size and network size.


(A) Computational processing time, in log scale, of our method (purple) and of Carrillo et al. [16] (blue), as a function of sample size T (bins). Synthetic data generated using N = 300, medium density (see Methods for details) and 35 neurons per ensemble. Dots are averages, and shaded regions are ± 1 standard deviation over 100 repetitions. (B) Number of detected ensembles. The dashed red line is the real number of ensembles. (C) Correlation between the detected and true global ensemble sequence. (D) Average correlation between detected and true individual ensemble sequences, and (E) between detected and true ensemble core-cells. (F) Number of detected ensembles as a function of network size. The number of core-cells per ensemble corresponds to 35% of the network size. (G) Same as C, but as a function of network size. Due to computational cost, only our method was evaluated as a function of network size (T = 5000).

Regarding the ensemble activity, our method accurately detects the number of ensembles for samples as small as T = 1000, while the SVD-based method converges to an underestimate of the ensemble number (Fig 3B).

Once the detected ensembles were matched to their closest GT ensemble (see Methods for details), we measured the correlation between both sequences, finding that our method detects the GT with excellent performance over the whole range of sample sizes. The SVD-based method, in turn, systematically fails to detect the global sequence (Fig 3C).

Furthermore, we measured the average correlation between the detected and GT individual sequences, and between the detected and GT core-cells, finding again that our method achieved excellent performance for both features even for small sample sizes (Fig 3D and 3E, respectively).

Finally, to evaluate the performance for different network sizes, we fixed the sample size, while also keeping the other parameters fixed (number of ensembles, number of core-cells, and PE), and varied the network size from N = 50 to N = 1000, finding that our method slightly overestimated the ensemble number for small N, but yielded accurate results for larger N (Fig 3F and 3G).

We conclude that our method reliably performs on a wide range of sample and network sizes, giving a more scalable and accurate solution to the ensemble detection problem than the alternative SVD-based method.

Reliable performance over a wide range of spike-train parameters

Here, we show the performance of our method when the number of ensembles and the number of core-cells vary independently.

We generated synthetic spike trains with fixed network size (N = 300), sample size (T = 5000), ensemble probability (PE = 0.8), and medium density, while the number of ensembles and of core-cells varied, as shown in Fig 4. We found a wide range of combinations of these parameters where the method detects the number of ensembles with a small relative error (VO − VT)/VT, where VO stands for the observed value and VT for the theoretical value (Fig 4A), accurately detects the global ensemble sequence (Fig 4B), and detects the corresponding core-cells (Fig 4C). Further exploration of other parameters and their combinations is left as future work. To this end, we provide the computational codes and a GUI at https://github.com/brincolab/NeuralEnsembles. This GUI allows one to perform all the analyses presented here on multi-variate recordings of single events (e.g., spiking data, calcium events, arrival times in a sensor).

Fig 4. Detection performance over a wide range of spike-train parameters.

Fig 4

(A) The relative error of detecting the ensemble number for spike-trains with different numbers of ensembles and core-cells. Synthetic data were generated using medium density, N = 300, and T = 5000. Heat maps represent the average over 100 repetitions using the same spike-train parameters. Red/blue represents over/under-detection, respectively. (B) The correlation between the detected and the GT ensemble sequence. (C) Average correlation between the detected and the GT core-cells.

Detection of ensembles on RGC under a simple ON-OFF stimulus

We demonstrate our method’s usefulness on a single simultaneous MEA recording of a mouse retinal patch in vitro (see Methods for experimental and pre-processing details). We aim to illustrate how to interpret the basic results of our method in a particular physiological setting rather than to reveal functional properties of the collective activity of the retina. This is why we did not analyse more animals or delve into the possible implications of retinal neural ensembles for visual processing.

We analyzed the spike responses of retinal ganglion cells (RGCs) under a simple ON-OFF light stimulus, where neuronal ensembles are detected without prior information about the stimulus. Their functional role is then evaluated in terms of stimulus tuning preference. Data were binarized using a standard bin size of 20 ms [45, 46]. We note that our method works on already binarized data, so the binning process is not part of the method. As expected for RGCs, the sum of all the emitted spikes in a given time bin (population activity) is tightly locked to the stimulus, transiently increasing each time the stimulus changes (Fig 5A, top panel). After this sharp response, the population activity decays exponentially until it reaches a stable point. However, the population activity evoked by the ON stimulus differs in amplitude and shape from that evoked by the OFF stimulus.
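As a hedged illustration of this pre-processing step (the names and values below are hypothetical, and the published code is in Matlab), spike times can be binned into a binary matrix as follows:

```python
import numpy as np

def binarize_spikes(spike_times, n_neurons, duration, bin_size=0.02):
    """Bin spike times (in seconds) into a binary neurons-by-bins matrix.
    spike_times: one sequence of spike times per neuron."""
    n_bins = int(np.ceil(duration / bin_size))
    binary = np.zeros((n_neurons, n_bins), dtype=np.uint8)
    for i, times in enumerate(spike_times):
        idx = (np.asarray(times) / bin_size).astype(int)
        binary[i, idx[idx < n_bins]] = 1   # 1 if the neuron fired in the bin
    return binary

# Toy example: 3 neurons, 60 ms of recording, 20 ms bins
spikes = [[0.005, 0.030], [0.015], [0.031, 0.055]]
B = binarize_spikes(spikes, n_neurons=3, duration=0.06)
pop_activity = B.sum(axis=0)   # population activity (spikes per bin)
```

Each column of `B` is one population vector; summing over neurons gives the population activity trace shown in Fig 5A, top panel.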

Fig 5. Stimulus-evoked retinal ensembles.

Fig 5

(A) Top. Population activity (spikes per bin) of 319 RGCs over time during ON light stimulation (white background) and OFF (gray background). The black arrow marks the start of the ON stimulus. As expected for the retina, the population activity transiently increases when the stimulus switches and then decays and stabilizes. Middle. Each population vector is coloured according to the ensemble to which it belongs. Bottom. The activation of each ensemble in time is represented by a coloured dot (matching the population vector color code). The method detects consistent stimulus-locked ensemble activity with ON (ensembles 1 to 5) and OFF (ensembles 6 to 10) ensembles. Ensemble 0 contains all the population vectors that were discarded according to the rejection criteria (see Methods). (B) The spatial distribution of the estimated receptive fields for the corresponding core-cells of each ensemble. (C) The stacked trial-by-trial responses of each ensemble in time show their reliable and selective responses to specific stimulus features. (D) The PSTH for each ensemble shows the stimulus preference of the ON and OFF ensembles. The different responses can be classified as transient or sustained, showing detailed stimulus tuning. (E) The stacked trial-by-trial responses of the most correlated core-cell for each ensemble show that the temporal activity of an ensemble is poorly determined by the activity of its core-cells. (F) For each ensemble, the average PSTHs of the corresponding core-cells grouped by cell type (colors) show no clear tuning preference. (G) The coloured matrix shows the RGCs in rows and the ensembles in columns. Each column is colored in the entries corresponding to its core-cells, according to the ensemble to which they belong (color code as in A).

These different evoked responses led us to expect at least four types of ensembles: two ensembles related to transitions (one for the ON-OFF and one for the OFF-ON) and two ensembles related to the decaying-stable activity after the ON-OFF and OFF-ON transition, respectively. To test this hypothesis, we applied our ensemble detection algorithm (see Methods for details and parameters) on the spiking activity of 319 RGCs during 120 seconds of MEA recording.

We found ten ensembles comprising ∼68% of the population vectors in the analyzed recording. Their activity was highly locked to the stimulus (Fig 5A, middle and bottom panels, and Fig 5D). We found two transiently active ensembles (one for the ON-OFF and one for the OFF-ON transition), whose activity was only evoked by the stimulus transition, showing no activity either before the stimulus onset (black arrow in Fig 5A) or during the decaying or stable response. The other eight ensembles were active before stimulus presentation, but at a lower rate, and during the decaying or stable response. Notably, four of them (Ens. 7–10) are preferentially active during the OFF stimulus, and the other four (Ens. 2–5) during the ON stimulus.

These results show that the detected retinal ensembles are preferentially tuned to the features of the stimulus, showing that without any stimulus-related information, our method can obtain ensembles whose activity is functionally coupled with the stimulus.

Functional properties of RGC ensembles

We detected the ensembles and their core-cells, i.e., cells whose correlation with a given ensemble is statistically significant (see Methods for details). On average, each core-cell participated in 2.7 ± 1.3 ensembles, and only four RGCs were non-core across all ensembles. Three cells participated in six ensembles, indicating that some cells may participate in both ensemble classes.
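Statistics of this kind can be read off a binary cell-by-ensemble membership matrix; the following Python sketch uses a hypothetical toy matrix, not the recorded data:

```python
import numpy as np

# Hypothetical binary membership matrix: rows = cells, columns = ensembles,
# with M[i, j] = 1 if cell i is a core-cell of ensemble j
M = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 0],
              [1, 1, 1]], dtype=int)

ensembles_per_cell = M.sum(axis=1)        # participation count of each cell
core_cells_per_ensemble = M.sum(axis=0)   # size of each ensemble's core
non_core = np.flatnonzero(ensembles_per_cell == 0)  # cells in no ensemble
```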

The two transiently active ensembles, namely Ens. 1 (ON) and Ens. 6 (OFF), have 257 and 222 core-cells, respectively, while the rest of the ensembles have, on average, 47.2 ± 17.9 core-cells (Fig 5G). This result is consistent with the increased population activity evoked by stimulus transitions, where many cells fire, and with the decaying responses, where many cells become silent.

Using the RGC responses to the repeated ON-OFF stimulus (21 trials) and an automated RGC functional classification algorithm (see Methods for details), we obtained 44 ON, 23 OFF, 205 ON-OFF, and 47 Null (no significant preference) RGCs.

All the ensembles contained ON-OFF core-cells, but the ON ensembles were dominated by ON RGCs, while the OFF ensembles were dominated by OFF RGCs. Null cells showed significant participation in some ensembles, despite their lack of preference according to the classification algorithm. Indeed, many core-cells participated in several ensembles. For each ensemble, the spatial distribution of the estimated receptive fields (see Methods) of the corresponding core-cells is shown in Fig 5B, revealing the spatial overlap between ensembles. Thus, in this simple setup, we found that the ensembles’ classification is consistent with the dominant classes of their corresponding core-cells.

The detection of core-cells allows us to ask whether the tuning preferences of the detected ensembles are inherited from their core-cells or whether the ensembles have a specific tuning preference as a functional unit. We recall that core-cells are not perfectly correlated with the activation times of their corresponding ensembles. In other words, the activation times of an ensemble cannot be completely determined by the spike trains of its core-cells, so the tuning preferences of the ensemble as a whole could differ from the tuning preferences of single cells. To qualitatively evaluate this, for each ensemble the peri-stimulus time histogram (PSTH) of the ensemble (Fig 5C and 5D) was computed and compared to the PSTH of its corresponding core-cells (Fig 5E and 5F). Moreover, the most representative core-cell of each ensemble (highest correlation with the ensemble activity, Fig 5E) showed responses that strongly differ from the ensemble responses. Finally, for each ensemble, we averaged the PSTH of the core-cells grouped by RGC class (Fig 5F), obtaining one average PSTH per cell class, which also shows the correspondence between ensemble tuning preference and cell-type dominance. Since ON-OFF cells are present in all ensembles, the core-cells of each ensemble, as a group, have a preference for both light transitions, despite the precise stimulus tuning of the ensembles (Fig 5D).
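A PSTH over repeated trials can be computed from any binary activation sequence, whether of an ensemble or of a single cell; this is an illustrative Python sketch with hypothetical names and data:

```python
import numpy as np

def psth(event_bins, trial_starts, trial_len):
    """Peri-stimulus time histogram: average activation per bin across trials.
    event_bins: binary vector of activations (ensemble or cell) per time bin.
    trial_starts: bin index where each stimulus repetition begins."""
    trials = np.stack([event_bins[s:s + trial_len] for s in trial_starts])
    return trials.mean(axis=0)   # activation probability per peri-stimulus bin

# Two toy trials of length 4 over a short activation sequence
act = np.array([0, 1, 1, 0, 0, 1, 0, 0])
p = psth(act, trial_starts=[0, 4], trial_len=4)
```

Applied to an ensemble activation sequence and to each core-cell's spike train, this yields the curves compared in Fig 5D and 5F.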

As a proof of concept, we tested our method on retinal data. We did not go deeply into a systematic analysis of retinal ensembles, which we leave as future work. Nevertheless, our qualitative comparisons suggest that the ensemble tuning preference cannot be completely derived from the tuning preferences of their core-cells, providing preliminary evidence in favor of neuronal ensembles as whole functional units in the early visual system.

Discussion

We introduced a scalable and computationally efficient method to detect neuronal ensembles from spiking data. Using simple dimensionality reduction, clustering techniques [43], and suitable statistical tests, we were able to develop a simple, fast and accurate method. On the one hand, our example of mouse retinal ganglion cells provides evidence for an expected causal relationship between stimuli and RGC ensembles. On the other hand, the simulated data examples show that our method provides accurate results for a wide range of neural activity scenarios, outperforming existing tools for ensemble detection in terms of accuracy and computation time.

Our method detects neural ensembles based on four general properties: transient activation (spontaneous and/or stimulus-evoked), the presence of a core, the possibility that the same neurons participate in different ensembles, and within-ensemble correlation higher than the correlation across the rest of the population. Other methods for ensemble detection rely on other criteria, e.g., finding communities between spiking neurons along time [26, 47], or using event-related activity [44]. As an alternative to these methods, our analysis relies only on grouping the population vectors with no event-related information, allowing us to segment a spiking network’s temporal activity under both spontaneous and stimulus-evoked conditions.

Regarding the SVD-based method [9], we acknowledge the insights that its application has provided to the study of neural networks. However, it has limitations in terms of computational cost (even for relatively small sample sizes, Fig 3) and parameter tuning. Further, the SVD-based method extracts the temporal activation of ensembles from the spectral decomposition of the similarity matrix between population vectors, aiming to detect groups of linearly independent vectors. Instead, we use a subset of the first principal components to embed the population vectors and cluster them in that space, with no need to find linear independence between the clusters. Finally, the SVD-based method has no explicit implementation for evaluating the within-ensemble correlation, which, in our case, is a critical step to distinguish between a noisy cluster and an ensemble cluster.
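To make the contrast concrete, the embed-then-cluster idea can be sketched in Python on toy data; for brevity, this sketch substitutes a minimal k-means for the density-peaks clustering of [43], so it illustrates the general strategy rather than our implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy binary population vectors (time bins x neurons): two noisy patterns
patterns = np.array([[1] * 10 + [0] * 10, [0] * 10 + [1] * 10], dtype=np.uint8)
X = np.repeat(patterns, 100, axis=0) ^ (rng.random((200, 20)) < 0.05).astype(np.uint8)

# PCA embedding via SVD of the centered data
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:3].T            # coordinates on the first 3 principal components

# Minimal 2-means in the embedded space (stand-in for density-peaks clustering)
centers = Z[[0, 100]]
for _ in range(20):
    labels = np.argmin(((Z[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.stack([Z[labels == k].mean(0) for k in (0, 1)])
```

The clusters are found directly in the embedded space, with no requirement of linear independence between them.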

There is room for further improvement of our method. For example, non-stationarity and non-linear spike correlations may produce spurious results in ensemble detection methods that depend on PCA. Our approach can be adapted to include other measures of spike-train similarity besides linear correlation [48], or to include alternative surrogate-data methods that provide null hypotheses for correlations while preserving temporal features of the spike trains, such as their inter-spike intervals or autocorrelation [49]. However, for generality, we use the least constrained null hypothesis, i.e., destroying all the spike-train temporal structure in order to test the significance of correlation values. For simplicity, in our example of RGCs, we used the standard bin size for mammalian RGCs, but the bin size used to create the spike trains may influence the detected ensembles, and thus different bin sizes can yield different results. Note, however, that our method is designed for already binarized data, so we consider the binning procedure a pre-processing step that, as in our case, should be informed by prior knowledge about the temporal resolution of the recorded neural population [45, 46].
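This least-constrained null can be sketched as a permutation test in Python; `correlation_pvalue` and the toy data are hypothetical, illustrative constructions:

```python
import numpy as np

def correlation_pvalue(x, y, n_surr=1000, seed=0):
    """One-sided p-value for the correlation between two binary sequences,
    against a null that destroys all temporal structure by shuffling one of them."""
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(x, y)[0, 1]
    surr = np.empty(n_surr)
    for i in range(n_surr):
        surr[i] = np.corrcoef(rng.permutation(x), y)[0, 1]
    return (np.sum(surr >= observed) + 1) / (n_surr + 1)

# A cell tightly locked to an ensemble sequence should yield a small p-value
rng = np.random.default_rng(1)
ens = (rng.random(500) < 0.2).astype(int)
cell = ens.copy()
cell[rng.random(500) < 0.1] ^= 1     # 10% noise on top of the ensemble activity
p = correlation_pvalue(cell, ens)
```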

Our method does not evaluate the statistical significance of the detected ensemble sequences, which have been proposed to be physiologically meaningful [34]. In fact, the extension toward clustering temporal sequences of population vectors is straightforward, and we leave its validation and evaluation as future work. However, our contribution is to put forward a simple, fast, accurate, and scalable algorithm to detect synchronous neural ensembles that, given its simplicity and code availability, can be easily applied in multiple scenarios and extended in several ways.

Finally, we provided novel evidence in favour of the existence of retinal ensembles that are functionally coupled to the stimuli. However, our purpose was to exemplify our method on a biological spiking network rather than to explain the possible mechanisms involved in the activity of retinal ensembles or their functional implications. Indeed, we consider the study of retinal ensembles an exciting new research avenue on the way to understanding vision, sensory systems, and, more generally, the nervous system. In line with this perspective, we deliver a method that is general enough to be applied to any multi-variate binary data set. Furthermore, it is intuitive and generates results that are easy to visualize, which should favor its comprehension and general use by the scientific community. Matlab codes and a GUI implementing our new method accompany this article.

Supporting information

S1 Appendix. Code repository, retinal experiments, and data pre-processing.

Link to the Codes repository and the details of the retinal experiments with their respective pre-processing.

(PDF)

S1 Table. Method parameters with their description and default values.

(PDF)

Acknowledgments

The authors thank Fernando Rosas for insightful discussions and valuable suggestions.

Data Availability

The GUI, codes, and data required for the use and validation of our method are available on GitHub (https://github.com/brincolab/NeuralEnsembles).

Funding Statement

RH is funded by CONICYT scholarship CONICYT-PFCHA/Doctorado Nacional/2018- 21180428. (https://www.anid.cl) AM is funded by CONICYT scholarship CONICYT-PFCHA/Magíster Nacional/2020- 22200156. (https://www.anid.cl) RC is funded by Fondecyt Iniciación 2018 Proyecto 11181072. (https://www.anid.cl) MJE and AGP are funded by AFOSR Grant FA9550-19-1-0002 (https://www.afrl.af.mil/AFOSR/). AGP is funded by ICM-ANID #P09-022-F, CINV (http://www.iniciativamilenio.cl/). SM and JA received no funding for developing this work. All the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Hebb DO. The Organization of Behaviour. 1949.
  • 2. Russo E, Durstewitz D. Cell assemblies at multiple time scales with arbitrary lag distributions. eLife. 2017;6:e19428. doi: 10.7554/eLife.19428
  • 3. Varela F, Lachaux JP, Rodriguez E, Martinerie J. The brainweb: Phase synchronization and large-scale integration. Nature Reviews Neuroscience. 2001;2(4):229–239. doi: 10.1038/35067550
  • 4. Uhlhaas P, Pipa G, Lima B, Melloni L, Neuenschwander S, Nikolić D, et al. Neural synchrony in cortical networks: history, concept and current status. Frontiers in Integrative Neuroscience. 2009;3:17. doi: 10.3389/neuro.07.017.2009
  • 5. Stringer C, Pachitariu M, Steinmetz N, Reddy CB, Carandini M, Harris KD. Spontaneous behaviors drive multidimensional, brainwide activity. Science. 2019;364(6437). doi: 10.1126/science.aav7893
  • 6. Yuste R. From the neuron doctrine to neural networks. 2015.
  • 7. Ringach DL. Spontaneous and driven cortical activity: implications for computation. 2009.
  • 8. Carrillo-Reid L, Yuste R. Playing the piano with the cortex: role of neuronal ensembles and pattern completion in perception and behavior. 2020.
  • 9. Carrillo-Reid L, Lopez-Huerta VG, Garcia-Munoz M, Theiss S, Arbuthnott GW. Cell Assembly Signatures Defined by Short-Term Synaptic Plasticity in Cortical Networks. International Journal of Neural Systems. 2015;25(7):1550026. doi: 10.1142/S0129065715500264
  • 10. Buzsáki G. Large-scale recording of neuronal ensembles. 2004.
  • 11. Einevoll GT, Franke F, Hagen E, Pouzat C, Harris KD. Towards reliable spike-train recordings from thousands of neurons with multielectrodes. Current Opinion in Neurobiology. 2012;22(1):11–7. doi: 10.1016/j.conb.2011.10.001
  • 12. Watanabe K, Haga T, Tatsuno M, Euston DR, Fukai T. Unsupervised detection of cell-assembly sequences by similarity-based clustering. Frontiers in Neuroinformatics. 2019. doi: 10.3389/fninf.2019.00039
  • 13. Harris KD. Neural signatures of cell assembly organization. Nature Reviews Neuroscience. 2005;6(5):399–407. doi: 10.1038/nrn1669
  • 14. Carrillo-Reid L, Yang W, Kang Miller Je, Peterka DS, Yuste R. Imaging and Optically Manipulating Neuronal Ensembles. Annual Review of Biophysics. 2017. doi: 10.1146/annurev-biophys-070816-033647
  • 15. See JZ, Atencio CA, Sohal VS, Schreiner CE. Coordinated neuronal ensembles in primary auditory cortical columns. eLife. 2018. doi: 10.7554/eLife.35587
  • 16. Carrillo-Reid L, Miller JEK, Hamm JP, Jackson J, Yuste R. Endogenous Sequential Cortical Activity Evoked by Visual Stimuli. Journal of Neuroscience. 2015;35(23):8813–28. doi: 10.1523/JNEUROSCI.5214-14.2015
  • 17. Wenzel M, Hamm JP, Peterka DS, Yuste R. Acute Focal Seizures Start As Local Synchronizations of Neuronal Ensembles. Journal of Neuroscience. 2019. doi: 10.1523/JNEUROSCI.3176-18.2019
  • 18. Hamm JP, Peterka DS, Gogos JA, Yuste R. Altered Cortical Ensembles in Mouse Models of Schizophrenia. Neuron. 2017. doi: 10.1016/j.neuron.2017.03.019
  • 19. Wenzel M, Han S, Smith EH, Hoel E, Greger B, House PA, et al. Reduced Repertoire of Cortical Microstates and Neuronal Ensembles in Medically Induced Loss of Consciousness. Cell Systems. 2019;8(5):467–474.e4. doi: 10.1016/j.cels.2019.03.007
  • 20. Fang WQ, Yuste R. Overproduction of Neurons Is Correlated with Enhanced Cortical Ensembles and Increased Perceptual Discrimination. Cell Reports. 2017. doi: 10.1016/j.celrep.2017.09.040
  • 21. Buzsáki G. Neural Syntax: Cell Assemblies, Synapsembles, and Readers. Neuron. 2010;68(3):362–385. doi: 10.1016/j.neuron.2010.09.023
  • 22. Eichenbaum H. Barlow versus Hebb: When is it time to abandon the notion of feature detectors and adopt the cell assembly as the unit of cognition? 2018.
  • 23. Nicolelis MAL, Baccala LA, Lin RCS, Chapin JK. Sensorimotor encoding by synchronous neural ensemble activity at multiple levels of the somatosensory system. Science. 1995. doi: 10.1126/science.7761855
  • 24. Benchenane K, Peyrache A, Khamassi M, Tierney PL, Gioanni Y, Battaglia FP, et al. Coherent Theta Oscillations and Reorganization of Spike Timing in the Hippocampal-Prefrontal Network upon Learning. Neuron. 2010. doi: 10.1016/j.neuron.2010.05.013
  • 25. Peyrache A, Benchenane K, Khamassi M, Wiener SI, Battaglia FP. Principal component analysis of ensemble recordings reveals cell assemblies at high temporal resolution. Journal of Computational Neuroscience. 2010. doi: 10.1007/s10827-009-0154-6
  • 26. Lopes-dos-Santos V, Conde-Ocazionez S, Nicolelis MAL, Ribeiro ST, Tort ABL. Neuronal assembly detection and cell membership specification by principal component analysis. PLoS ONE. 2011;6(6). doi: 10.1371/journal.pone.0028885
  • 27. Singh A, Humphries MD. Finding communities in sparse networks. Scientific Reports. 2015;5:1–7. doi: 10.1038/srep08828
  • 28. Bruno AM, Frost WN, Humphries MD. Modular deconstruction reveals the dynamical and physical building blocks of a locomotion motor program. Neuron. 2015;86(1):304–318. doi: 10.1016/j.neuron.2015.03.005
  • 29. Torre E, Picado-Muiño D, Denker M, Borgelt C, Grün S. Statistical evaluation of synchronous spike patterns extracted by frequent item set mining. Frontiers in Computational Neuroscience. 2013;7:132. doi: 10.3389/fncom.2013.00132
  • 30. Torre E, Canova C, Denker M, Gerstein G, Helias M, Grün S. ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains. PLoS Computational Biology. 2016. doi: 10.1371/journal.pcbi.1004939
  • 31. Yegenoglu A, Quaglio P, Torre E, Grün S, Endres D. Exploring the usefulness of formal concept analysis for robust detection of spatio-temporal spike patterns in massively parallel spike trains. In: Lecture Notes in Computer Science; 2016.
  • 32. Quaglio P, Yegenoglu A, Torre E, Endres DM, Grün S. Detection and Evaluation of Spatio-Temporal Spike Patterns in Massively Parallel Spike Train Data with SPADE. Frontiers in Computational Neuroscience. 2017;11:1–17. doi: 10.3389/fncom.2017.00041
  • 33. Onken A, Liu JK, Karunasekara PPCR, Delis I, Gollisch T, Panzeri S. Using Matrix and Tensor Factorizations for the Single-Trial Analysis of Population Spike Trains. PLoS Computational Biology. 2016. doi: 10.1371/journal.pcbi.1005189
  • 34. Carrillo-Reid L. What Is a Neuronal Ensemble? Oxford Research Encyclopedia of Neuroscience. 2020.
  • 35. Euler T, Haverkamp S, Schubert T, Baden T. Retinal bipolar cells: elementary building blocks of vision. Nature Reviews Neuroscience. 2014;15(8):507–19. doi: 10.1038/nrn3783
  • 36. Shekhar K, Lapan SW, Whitney IE, Tran NM, Macosko EZ, Kowalczyk M, et al. Comprehensive Classification of Retinal Bipolar Neurons by Single-Cell Transcriptomics. Cell. 2016;166(5):1308–1323.e30. doi: 10.1016/j.cell.2016.07.054
  • 37. Vlasits AL, Euler T, Franke K. Function first: classifying cell types and circuits of the retina. Current Opinion in Neurobiology. 2019;56:8–15. doi: 10.1016/j.conb.2018.10.011
  • 38. Masland RH. The Neuronal Organization of the Retina. Neuron. 2012;76(2):266–280. doi: 10.1016/j.neuron.2012.10.002
  • 39. Gollisch T, Meister M. Eye Smarter than Scientists Believed: Neural Computations in Circuits of the Retina. Neuron. 2010;65(2):150–164. doi: 10.1016/j.neuron.2009.12.009
  • 40. Real E, Asari H, Gollisch T, Meister M. Neural Circuit Inference from Function to Structure. Current Biology. 2017;27(2):189–198. doi: 10.1016/j.cub.2016.11.040
  • 41. Tikidji-Hamburyan A, Reinhard K, Seitter H, Hovhannisyan A, Procyk C, Allen A, et al. Retinal output changes qualitatively with every change in ambient illuminance. Nature Neuroscience. 2014. doi: 10.1038/nn.3891
  • 42. Yger P, Spampinato GLB, Esposito E, Lefebvre B, Deny S, Gardella C, et al. A spike sorting toolbox for up to thousands of electrodes validated with ground truth recordings in vitro and in vivo. eLife. 2018;7:1–23. doi: 10.7554/eLife.34518
  • 43. Rodriguez A, Laio A. Clustering by fast search and find of density peaks. Science. 2014;344(6191):1492–1496. doi: 10.1126/science.1242072
  • 44. Montijn JS, Olcese U, Pennartz CMA. Visual Stimulus Detection Correlates with the Consistency of Temporal Sequences within Stereotyped Events of V1 Neuronal Population Activity. Journal of Neuroscience. 2016;36(33):8624–8640. doi: 10.1523/JNEUROSCI.0853-16.2016
  • 45. Greschner M, Shlens J, Bakolitsa C, Field GD, Gauthier JL, Jepson LH, et al. Correlated firing among major ganglion cell types in primate retina. Journal of Physiology. 2011;589(Pt 1):75–86. doi: 10.1113/jphysiol.2010.193888
  • 46. Schneidman E, Berry MJ, Segev R, Bialek W. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature. 2006;440(7087):1007–12. doi: 10.1038/nature04701
  • 47. Humphries MD. Dynamical networks: finding, measuring, and tracking neural population activity using network theory. 2017. doi: 10.1101/115485
  • 48. Humphries MD. Spike-train communities: Finding groups of similar spike trains. Journal of Neuroscience. 2011. doi: 10.1523/JNEUROSCI.2853-10.2011
  • 49. Louis S, Borgelt C, Grün S. Generation and Selection of Surrogate Methods for Correlation Analysis. 2010. p. 359–382.

Decision Letter 0

Jonathan David Touboul

4 Jan 2021

PONE-D-20-33389

Scalable and accurate method for neuronal ensemble detection in spiking neural networks

PLOS ONE

Dear Dr. Herzog,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

As you will see from the Referees reports, many major points have been raised, and I encourage you to address all of the major points, or explain precisely how they have been taken into account in your revised manuscript (or why they were not).

Please submit your revised manuscript by Feb 04 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Jonathan David Touboul

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please specify in the methods section how many animals were used for your study.

3.In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: N/A

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this paper, "Scalable and accurate method for neuronal ensemble detection in spiking neural networks", the authors present a new method to identify groups of similarly behaving neurons, and the temporal sequences of these activations. Using both real experimental and synthetic data, they show how a method based on dimensionality reduction combined with robust density-based clustering can identify patterns of activity shared by numerous neurons. Overall, I find the article easy to read and to follow. The methodology is clear and the results well explained. However, I have several concerns to address before I could recommend the publication of the manuscript.

Major concerns

- My first comment is on the general structure of the manuscript. I think that the experimental validation (with retinal recordings) should come as an illustration of the method, i.e. should be at the end of the manuscript. The motivation and the scientific questions could remain as they are, but currently it is a bit odd to start the paper with results on real data, before explaining that they are valid because of tests performed on synthetic data. I would switch the two parts, first developing thoroughly how robust the algorithm is, before applying it to real data.

- As far as I understood, the method seeks to find groups of neurons with similar responses. This is particularly useful in the retina, where we know that the retinal ganglion cells have different functional responses, tiling the visual space. However, I found that this information is not used to assess the quality of the ensemble detection with experimental data. Indeed, if the method succeeds at finding groups of neurons, then when applied to the retina we should recover different subclasses of cells (ON/OFF, sustained vs transient, …). See for example the classification that has been made in [Baden et al, 2016] with calcium recordings. If this is true, then one could look at the tiling of the receptive fields for the neurons in the distinct ensembles, and assess whether they properly cover the space. If yes, then this would be a hint that the algorithmic method is finding an ensemble likely to reflect a particular subtype of cells. If no, then it could also be used as a hint that these ensembles are made of more complicated sub-circuits, mixing different subtypes. This is, I guess, what has been done by the authors in Fig 1 panels B, C, D, but I think this would deserve a dedicated figure, as the current take-home message is not clear. I would like to see the receptive fields colored not as a function of ON/OFF/ON-OFF/Null, but rather as a function of their ensemble (as is currently done for the PSTHs). By doing so, one could appreciate whether the ensembles tile the space, i.e. whether the method is able to distinguish putative cell subtypes.

- Regarding the experimental procedures: how many animals were used to collect the data? When the authors talk about 319 RGCs during 120 s of MEA recordings, is this a single recording? This comment makes me realize that the tiling mentioned before might not be so obvious if data are collected across several animals. But if they are, what do the plots in Fig 1B correspond to? Can the authors explain what these receptive-field maps are, and whether the data are from a single retina? If, on the other hand, the data are from a single retina, why did the authors not reproduce the analysis on various recordings? This would consolidate the (functional) claims on the ensembles, if the number of ensembles found were reliable across animals.

- From an experimental point of view, and still in line with what has been done in [Baden et al, 2016], why did the authors restrict themselves to very low-dimensional stimuli (ON/OFF light stimulus)? If the goal is to detect ensembles, i.e. cells sharing similar responses, then it might be more interesting to display high-dimensional stimuli, to trigger more variability in the responses and thus get a closer look at these putative ensembles. The more diverse the responses, the more ensembles might appear (assuming there are approximately 30 subtypes of retinal ganglion cells in the mammalian retina, I would naively expect many more ensembles to pop up).

- What is the rationale for discarding responses with low firing rates?

- There is something I do not get with the methods: in the discussion, the authors mention that they used “the standard bin size for mammalian RGCs”. I guess this bin size is used to bin the spike trains before PCA, but this is not explicitly said in the methods, is it? Or I may have missed it. In any case, what is the value used in the paper?

- What is the rationale for allowing only one active ensemble per time bin? I know that otherwise non-linearities are likely to perturb the method, but this is a very strong assumption that needs to be discussed. While it might be true for retinal activation, assuming different stimuli trigger different subtypes/responses one at a time, this assumption does not hold in the global context pictured by the authors in the Introduction. Cortical ensembles in vivo might overlap both in time and space. Please comment on that.

- As the authors acknowledge themselves in the Discussion, different time bins might affect the results. It would be good to provide an analysis of the robustness of the method with respect to the time bin. One would not expect the results to be identical, but a quantification of the variability is important. Since there is no guarantee that the chosen time bin is experimentally optimal, at least this could give some confidence in the results. This can be explored with the synthetic data generated by the authors.

- What about time-warping? I know this might be out of the scope of the manuscript, but it might be worth comparing the proposed method with those that detect assemblies while taking into account that sequences can be time-warped [Williams et al, 2020; Mackevicius et al, 2019]. In the current paper, the authors seek a fixed sequence of activation between ensembles, and this assumption of a fixed bin size should be discussed.

- What are the equivalent of low/medium/high density cases for real data? More precisely, where are the retina data, on this scale? This should be said to enforce confidence in the results, since the low density case seems to be more problematic.

References

- Williams et al, 2020, Point process models for sequence detection in high-dimensional neural spike trains, https://arxiv.org/abs/2010.04875

- Williams et al, 2020, Discovering Precise Temporal Patterns in LargeScale Neural Recordings through Robust and Interpretable Time Warping, https://doi.org/10.1016/j.neuron.2019.10.020

- Baden et al, 2016, The functional diversity of retinal ganglion cells in the mouse, Nature, https://doi.org/10.1038/nature16468

- Mackevicius et al 2019, Unsupervised discovery of temporal sequences in high-dimensional datasets, with applications to neuroscience https://elifesciences.org/articles/38471

Reviewer #2: In this manuscript the authors introduce a new pipeline for the detection of synchronous cell assemblies in parallel spike trains. The proposed algorithm relies on density-based clustering of population activity patterns. The algorithm performance is assessed on simulated ground truth and then tested on spike trains from in vitro retinal ganglion cells during an ON-OFF light stimulation.

I found the simplicity and the scalability (a huge problem for the community) of the algorithm to be the strong points of the proposed manuscript, which I found useful for the scientific community. However, there are a number of major points to address before publication.

Major points:

1. Assessment of core-cells: Once the assembly core patterns (centroids) are established, the authors define the core-cells of each assembly by testing the correlation between single-unit activity and assembly signals. This is done by shuffling the temporal sequence of assembly activations. While I found it appropriate to use bootstrap techniques to assess the correlation significance, in the case of time series it is important to preserve the internal autocorrelation of the shuffled signal to avoid false discoveries. The most common, but not exclusive, way of doing so is by first dividing the time series into windows of size comparable to the signal's autocorrelation length, and then permuting the order of such windows [Efron & Tibshirani, Introduction to the Bootstrap, 1993, Springer]. I therefore suggest that the authors address this point and modify the algorithm accordingly. Also, the authors leave the significance parameter of this test as a free parameter. This should not be the case, as significance is not arbitrary, and this parameter should be limited to <0.05.
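For instance, a minimal sketch of such a block-wise permutation (illustrative only; the function name, seeding, and block length are my own choices, not taken from the manuscript):

```python
import numpy as np

def block_shuffle(series, block_len, seed=None):
    """Surrogate that preserves autocorrelation up to ~block_len samples:
    contiguous blocks are permuted, so each block's internal temporal
    structure stays intact."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series)
    n_blocks = len(series) // block_len
    # Drop the remainder for simplicity and reshape into blocks
    blocks = series[:n_blocks * block_len].reshape(n_blocks, block_len)
    # Permute the block order, not the individual samples
    return blocks[rng.permutation(n_blocks)].ravel()
```

Significance would then be assessed against correlations computed on such surrogates, rather than on fully shuffled signals that destroy the autocorrelation.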

2. The simulated time series used to test the validity of the method are quite unrepresentative of real biological data: a) they are stationary; b) the ratio between assembly activations and background spikes is extremely high, to the point that (in the low- and medium-density cases) almost all spikes fired by a unit take part in one or the other assembly; c) as mentioned on page 11, lines 295-296, non-assembly spike patterns are filled with only silent spikes. In these extreme conditions it is a bit difficult to assess the algorithm's qualities. I would therefore suggest testing the false discovery rate of the method by using a non-stationary background (non-assembly) activity for all units, and a much smaller ratio between assembly and non-assembly activity. In particular, I suggest formally evaluating the false discovery rate (in terms of detected core-units and assemblies) for a non-stationary background activity and: 1) a range of values of total assembly activations; 2) a range of percentages of missing assembly spikes (similar in spirit to what was done for Fig 3B).
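To illustrate what such a test could start from, here is a sketch of a non-stationary background raster (the sinusoidal rate modulation and all parameter names are purely illustrative choices of mine, not from the manuscript):

```python
import numpy as np

def nonstationary_background(n_units, n_bins, base_rate=0.05, seed=None):
    """Binary spike raster whose firing probability drifts slowly over
    time, so the background (non-assembly) activity is non-stationary."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_bins)
    # Slow sinusoidal drift of the per-bin spike probability
    rate = base_rate * (1.0 + 0.8 * np.sin(2.0 * np.pi * t / n_bins))
    return (rng.random((n_units, n_bins)) < rate).astype(int)
```

Assemblies embedded on top of such a drifting background should then be recovered without inflating the false discovery rate.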

3. I imagine that detecting assemblies composed of very few core-units might be challenging for this kind of approach. I would therefore suggest discussing this limitation (if present).

4. As the last step of the algorithm, “The inner-cluster mean correlation is compared against a threshold of non-core cell correlation, discarding any cluster below this threshold”. This final step is not clearly explained and would benefit from more detail. How is this threshold chosen?

5. The sentence “Among them, only the one based on the correlation between population patterns, and the one based on non-negative matrix factorization, fit with a definition of ensembles.” is either false or confusing. I would suggest rephrasing or removing.

6. In the discussion, the authors compare their method with other state-of-the-art techniques (page 13, lines 372-382). I found this paragraph a bit surprising, as a large variety (I would say the majority) of assembly detection techniques can detect synchronous assembly patterns with no event-related information. I would therefore suggest rephrasing and toning down this paragraph and, perhaps, focusing more on the method's scalability, its real point of strength.

Minor points:

7. Please, define mathematically the quantity “relative error”.

8. Please indicate the value of all arbitrary parameters used for each analysis (also for the SVD-based method).

9. There are relatively many parameters that the authors leave free for the user to decide. While user needs are diverse and have to adapt to different datasets, total arbitrariness in these choices might lead to extreme settings and unreasonable assembly detections. I think it would be beneficial for future users to discuss the implications behind each parameter choice (e.g. by showing the consequences of choosing a different number of PCA principal components when many assemblies are present) and to suggest a parameter set that is “safe” for users.

Reviewer #3: Comments to manuscript: “Scalable and accurate method for neuronal ensemble detection in spiking neural networks” PONE-D-20-33389

This manuscript proposes an analytical method to detect neuronal ensembles from MEA electrical recordings. The work demonstrates that the method can be used on retinal recordings exposed to ON-OFF light stimuli. The authors validated the method using synthetic data to identify different neuronal ensembles. The standardization of methodologies to identify groups of neurons with coordinated activity related to specific experimental conditions is a relevant topic in neuroscience, since simultaneous recordings of neuronal populations have become available worldwide and are growing at a steady pace. The manuscript describes a potential method to identify neuronal ensembles, but several concepts and implementations should be clarified in order to make the method useful for diverse brain structures. Otherwise, the title should be changed to highlight that the method is useful mainly for in vitro retinal recordings and specific experimental conditions.

Finally, the analysis for the sequential organization of the ensembles needs to demonstrate that sequential patterns of activity could not be identified in random population activity.

Major:

1) The definition of what is the meaning of a neuronal ensemble is not clear from the abstract nor the introduction. This is a relevant issue since the goal of the manuscript is to detect neuronal ensembles.

It is mentioned that a neuronal ensemble is a group of neurons that fire together (page 1, line 9), but the authors interchangeably call a neuronal ensemble a group of neurons firing together or different groups of neurons firing in sequences. Methodologically demonstrating groups of neurons firing together versus groups of neurons firing in sequences requires two completely different approaches.

2) Page 2, line 42. What is the meaning of core cells? Do core cells represent the same concept as a neuronal ensemble? In that case the use of “core” is unnecessary and confusing.

3) Page 3, lines 72-79. The use of the concept of core-cells is not clear. Why are “the ensembles as a whole … not a simple inheritance from their corresponding core-cells” if the authors define a neuronal ensemble as its core-cells on the previous page?

4) Page 3, line 83. It is not clear what is the meaning of scalable. Does it mean different brain areas or different orders of magnitude in the number of recorded neurons? Or different lengths and sizes of the data?

5) Page 3, lines 95-96. In the example shown in Figure 1A, it is clear that ON and OFF early responses have different properties, such as length and amplitude. In particular, the approach used in the manuscript (PCA) will enhance such differences in the low-dimensional projection of the data. However, in many other conditions and brain structures the activity of different ensembles could have the same properties, limiting the use of PCA to separate such subtle changes. The authors should show that ensembles with the same overall characteristics of population activity can be detected with PCA. Otherwise, the manuscript should clarify that this method is useful for retinal recordings.

6) Figure 1A. From the bottom of the figure it is hard to see whether many ensembles could be active at the same time. If this is the case, then the definition of a neuronal ensemble becomes critical for the claims of the manuscript, because several combinations of different groups could then potentially represent different neuronal ensembles.

6) Figure 1B. The labeling is confusing because it is mentioned before that four different ensembles are expected. What is the meaning of ON-OFF?

7) Page 5, lines 148-150. It is not clear what measurement was used to reach the conclusion that the ensemble tuning preference cannot be completely derived from their core-cells.

8) Even though the method is intended for a broad neuroscience community, it is important to mention in the results how the spikes were extracted and how the ON and OFF cells were identified.

9) Page 5, line 156. It should be indicated in the results what is the meaning of a fixed number of spikes.

10) Page 5, line 159. It is mentioned that only 3 to 6 principal components are used. The authors should indicate in the results section the criteria used to select 3, 4, 5, or 6 principal components.

11) Page 8, section Detection of ensembles on synthetic data. To test the method, the authors generated synthetic data that resemble the retinal data shown previously. As I mentioned before, the manuscript shows that the method is ad hoc for retinal recordings but not other brain areas.

12) Page 8, lines 195-206. The manuscript introduces here the detection of sequential activity patterns. But the criteria to define sequential patterns of activity and their statistical significance are not mentioned anywhere. Such criteria should be included; otherwise this analysis should be removed from the paper.

13) Pages 8-9. To evaluate and compare the performance of the method, the manuscript compared the results obtained with synthetic data using the current method and a previously published method (SVD-based). As I mentioned before, the authors showed that their method is useful for data with specific characteristics, so it is expected that its performance will exceed that of other methods. On the other hand, the compared method (SVD-based) was used for optical recordings, whereas the current method analyzed electrical recordings. The difference in the temporal resolution of the data could also bias the results. The authors should compare their method against other methods applied to electrical recordings, or discuss the limitations of applying it to data obtained with lower temporal resolution.

14) Page 10, Synthetic Spike Trains section. It is not clear why the authors used a Bernoulli process to define the activation sequence of each ensemble. A Bernoulli process defines a series of random events that are independent, whereas the sequential activity of neuronal ensembles has been shown to be time-dependent.
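The contrast raised here can be made concrete: a Bernoulli activation sequence is memoryless, whereas even a simple two-state Markov chain makes each activation depend on the previous bin. A sketch with illustrative parameters (function names and probabilities are my own choices, not from the manuscript):

```python
import numpy as np

def bernoulli_activations(n_bins, p, seed=None):
    """Memoryless sequence: each bin is active with probability p,
    independently of history."""
    rng = np.random.default_rng(seed)
    return (rng.random(n_bins) < p).astype(int)

def markov_activations(n_bins, p_stay, p_start, seed=None):
    """History-dependent alternative: the activation probability depends
    on whether the ensemble was active in the previous bin."""
    rng = np.random.default_rng(seed)
    seq = np.zeros(n_bins, dtype=int)
    for t in range(1, n_bins):
        p = p_stay if seq[t - 1] else p_start
        seq[t] = rng.random() < p
    return seq
```

The Markov version produces temporally clustered activations (high lag-1 autocorrelation), while the Bernoulli version does not, which is exactly the independence assumption being questioned.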

15) Page 11, Feature extraction using PCA section. The authors should be consistent in the terminology they use throughout the manuscript. This section mentions that spike patterns with fewer than 3 spikes were discarded. What is a spike pattern? Is this a population pattern or a single-cell pattern? Why 3 spikes?

16) Page 11. Previously the manuscript mentioned that 3 to 6 principal components (page 5, line 159) were used but now they mention that 4 to 7 principal components were used (page 11, line 309).

17) Page 11, line 313. What is “close pattern” clustering?

18) Page 12, Centroid detection section. It is not clear from the text how the clustering is performed. Is it a hard or a soft clustering?

19) Did the authors remove the points that couldn’t be clustered in the low dimensional space and then perform the clustering again? The main limitation of clustering high dimensional data is that all the points will appear in the low dimensional representation even though they represent unique events.

20) What is the closest centroid? The manuscript does not mention any distance measure for that.

21) Page 12. Core-cell detection section. This section described the procedure to detect core-cells. However, from the text it is not clear why these core-cells are important or what is the difference between an ensemble and the core-cells of an ensemble. Is it the same?

22) Page 12. Ensemble selection criteria. The authors mention two criteria for ensemble selection. But from the text it is not clear how to choose such parameters. This is important for anyone interested in applying the method to their own data.

23) Page 13, line 386. The SVD-based method from the reference cited (16) didn’t extract a temporal sequence of ensembles as mentioned in the text.

24) The authors should compare different bin sizes as they mentioned in the discussion. This is very important for the broad use of the method.

Minor:

1) Page 2, line 39. It is not clear from the text what is the meaning of “one-time bin of the spike train”.

2) Page 3, lines 66-67. Revise “…when confronted to changes light intensity”

3) Page 3, line 88. What is parallel recording? Does it mean simultaneous?

4) Page 4, lines 114-115. The conclusion of this sentence is not clear since the data only shows ON and OFF responses. What does it mean without any stimulus-related information?

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Luis Carrillo-Reid

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jul 30;16(7):e0251647. doi: 10.1371/journal.pone.0251647.r002

Author response to Decision Letter 0


4 Feb 2021

We opted to provide a PDF file with the response to the reviewers. We attached this PDF as Reviewer Response in the 'Attach Files' section.

Attachment

Submitted filename: PlosOne_Reply_to_reviewers.pdf

Decision Letter 1

Jonathan David Touboul

1 Apr 2021

PONE-D-20-33389R1

Scalable and accurate method for neuronal ensemble detection in spiking neural networks

PLOS ONE

Dear Dr. Herzog,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

As you will see, Reviewers 1 and 3 are satisfied with the work, while Reviewer 2 has repeated or made more specific some of their initial requests. Given the feedback of the referees and the nature of the additional requests, it seems that the paper will likely meet the publication criteria of PLOS ONE. However, I wanted to give you an opportunity to address this report and, where you judge it appropriate, alter your manuscript accordingly; for points you do not wish to pursue, please provide a brief rebuttal in your resubmission letter.

==============================

Please submit your revised manuscript by May 16 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Jonathan David Touboul

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this revised manuscript, most of my comments/concerns have been addressed, and I can now recommend its publication

Reviewer #2: To my major surprise, the Authors decided not to address the majority of the points raised as major concerns. Without an appropriate reply to all points raised in my original review (by which I mean the addition to the manuscript of new tests and a discussion of the algorithm's limitations), at the next round of revision I will be forced to reject the paper.

Specifically, in relation to the Author’s replies:

1) First, as visible from Fig 5A, retinal ganglion cells have a clear autocorrelation. Secondly, and most importantly, the goal of the paper is to provide the scientific community with a tool to detect cell assemblies in experimental data. It is well established that neuronal time series are non-stationary and autocorrelated; ignoring this fact leads to wrong significance assessments and false discovery rates. The fact that the synthetic data used in the paper are stationary just means that they were not suitable for testing the algorithm (and this is why in point 2 of my original review I asked the Authors to test the algorithm on more realistic time series). I am very surprised that the Authors decided to neglect this fundamental request, both because of its major importance and because it can be easily fixed (as explained in point 1 of my original review).

As a side note, a significance value above 0.05 means that the result is non-significant. Increasing the alpha value (that is, the threshold for significance) cannot solve problems of low statistical power.

2) The purpose of the synthetic data is exactly to reproduce all characteristics of the biological data which might affect the reliability of the algorithm. The algorithm is presented as a general methodology to detect cell assemblies, not one exclusively aimed at retinal ganglion cells (if it is explicitly aimed at retinal ganglion cells, this should be clearly stated, and the Authors would still have to change their synthetic data, since the data in Fig 2 do not have the same autocorrelation characteristics as those shown in Fig 5A). Of course I understand that all algorithms have some limitations, but such limitations have to be tested and addressed in the manuscript to inform future users. In this manuscript this assessment is very poor and misleading.

3) Again, the limitations of the method have to be tested and described. I might have missed something, but to my understanding it has not been studied how the assembly discovery rate varies as a function of the assembly core size. Clarifying this point will be of great relevance for future users, who might wonder why they cannot find assemblies when the number of core cells is much smaller than the whole population size (if this is true, it should indeed be tested).

4) Thanks for the clarification. Since the threshold for ensemble selection comes from the average pairwise correlation of the whole population, would this create a problem if, for example, all units of the tested sample take part in a single global assembly? Would such a whole-population assembly be detected?

6) I had understood that the presented method does not rely on any event-related information. Point 6 of my previous review referred to the Authors' sentence "Despite the usefulness of these methods in their context, our analysis is more general. It relies on grouping the population spiking patterns with no event-related information …" This sentence hints that the majority of the available methods use event-related information. This is not true, since the majority of cell assembly detection methods do not. I suggest correcting this sentence accordingly.

8) All the free parameters used should be available to the reader without forcing them to dig into the code.

Reviewer #3: The authors followed all my suggestions and answered my concerns, including their changes in the new version of the manuscript.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Luis Carrillo-Reid.

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Decision Letter 2

Jonathan David Touboul

30 Apr 2021

Scalable and accurate method for neuronal ensemble detection in spiking neural networks

PONE-D-20-33389R2

Dear Dr. Herzog,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Jonathan David Touboul

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Jonathan David Touboul

22 Jul 2021

PONE-D-20-33389R2

Scalable and accurate method for neuronal ensemble detection in spiking neural networks 

Dear Dr. Herzog:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Jonathan David Touboul

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. Code repository, retinal experiments, and data pre-processing.

    Link to the Codes repository and the details of the retinal experiments with their respective pre-processing.

    (PDF)

    S1 Table. Method parameters with their description and default values.

    (PDF)

    Attachment

    Submitted filename: PlosOne_Reply_to_reviewers.pdf

    Attachment

    Submitted filename: PlosOne_Ensembles_reviewer_response_R2.pdf

    Data Availability Statement

    The GUI, codes, and data required for the use and validation of our method are available on GitHub (https://github.com/brincolab/NeuralEnsembles).

