Abstract
Granger causality is a commonly used approach for network inference in neural systems. Recent advances in the field allow for the analysis of high-dimensional and nonlinear systems through the use of artificial neural networks, but the formulations are optimized for continuous data. In this work, we show the limitations of this formulation for discrete count data, particularly when the data are sparse. To overcome this limitation, we extend Jacobian Granger causality, a neural network-based approach to Granger causality, to other data types, namely count data and binary data, through the use of different loss functions. We examine its performance against a competing approach on simulated data, and finally apply it to real neural spiking data recorded from monkey visual cortex during the presentation of white noise and natural movie stimuli. We find that the natural movie leads to more structured activity, with a larger set of edges shared across two separate observations and more neurons inferred to have a positive self-connection, whose burst-like activity has been associated with the encoding of salient visual information such as that present in natural scenes.
Keywords: Granger causality, Time series, Machine learning
Subject terms: Biophysics, Mathematics and computing, Physics
Introduction
In the analysis of neural data, one aspect of interest lies in the connectivity between neurons. As experimental methods continue to progress, neural datasets become increasingly high-dimensional, thereby requiring approaches that can efficiently and accurately process such data while also inferring salient properties, such as whether a connection is excitatory or inhibitory. While neural connectivity was traditionally analyzed by measuring functional statistical dependencies through methods such as the cross-correlogram1, the bivariate nature of the analysis leaves room for common and indirect causes to confound the results. To sidestep this issue, multiple approaches have been applied for network inference in neural data, including dynamic Bayesian networks2–5, transfer entropy6,7, and Granger causality8–10. Other approaches for causal analysis also exist (e.g. see11), which will not be discussed here.
Bayesian networks12 infer causal graphs through conditional independencies between variables in the system, resulting in directed acyclic graphs. Dynamic Bayesian networks13 (DBNs) extend Bayesian networks by incorporating time-lagged interactions, which removes the latter's constraint that the resulting network be acyclic. However, DBNs typically model interactions at a fixed lag of 1 timestep14, which prevents the identification of interactions at larger lags. In addition, due to the need to model high-dimensional probability distributions, DBNs are usually affected by the curse of dimensionality, requiring data sizes exponential in the dimension of the data being analyzed, although score-based and greedy search-based approaches have been developed to sidestep this problem14.
Transfer entropy15 measures information flow between components to infer connections, which accounts for nonlinear interactions. While the original formulation is bivariate, multivariate extensions have been proposed to allow more reliable inference of network structures16,17. This approach seamlessly incorporates higher order lags, allowing time lag selection directly from the application of its approach. Like DBNs, however, it also typically suffers from the curse of dimensionality due to its modeling of high-dimensional probability distributions18,19.
Granger causality20 measures predictive causality, where a candidate variable is Granger causal to a target variable if its inclusion improves the prediction of the target variable in the presence of information from the other variables in the system. Under certain settings, such as for variables with Gaussian noise, it has been shown to be equivalent to transfer entropy21,22. The standard formulation of multivariate Granger causality typically involves linear vector autoregressions, which allow seamless time lag selection but are restricted to linear modeling. Recent years, however, have seen increased interest in nonlinear formulations of Granger causality through the use of neural networks23–29, which present several advantages. Firstly, a neural network-based model can represent any nonlinear function by virtue of the universal approximation theorem30. Despite this flexibility, these approaches have also been shown to work with high-dimensional data without the need for exponentially large datasets. Lastly, Granger causality in these approaches is typically inferred through the use of sparsity-enforcing penalties23,24,26–29, resulting in sparse inferred networks, which agree well with what would be expected from neural data. Due to these advantages, we focus on Granger causality in this work.
Despite the many uses of Granger causality, its original formulation has been largely restricted to continuous data31, which is not compatible with neural spike data made up of discrete counts. While smoothing methods have been applied to spike train data32 to account for this, the resulting distortion to the data reduces the accuracy of the resulting inferred network. In recent years, however, more attention has been given to Granger causality for point processes9,10 and heterogeneous data as a whole33,34. These studies generally involve a generalized linear formulation of vector autoregression involving different link and loss functions according to different assumptions on the underlying distribution behind the generating model, allowing the model to work with diverse data types including continuous data, discrete count data (i.e. non-negative integers), as well as categorical data. While there has been work in this area, it remains the case that the formulation either fixes a certain functional form (usually linear)9,10,33,34, or faces difficulty in high-dimensional cases31.
Similar to Granger causality as a whole, the aforementioned neural network-based approaches are also formulated for continuous data. In this paper, we focus on one such method, namely Jacobian Granger causality (JGC), and extend its formulation to count and binary data through the use of different loss functions, in the same way as the aforementioned generalized linear approaches, in order to infer the connectivity behind spike train data.
The inference of neuronal networks in neural spike train data necessitates the ability of the approach to handle high-dimensional data efficiently, while also working well with sparse data. While the former has been established for JGC29, we push this further in this work and also explore a feature absent in continuous data and unique to count data, namely sparsity, where a significant fraction of the data is made up of zeros. We see later that the latter is especially problematic for the mean squared error loss used in the original JGC formulation. The application of JGC (and by extension any similar neural-network based approach) therefore necessitates the use of a different loss function more suitable for such data types.
This paper is organized in the following manner. We begin by reviewing the original formulation of Jacobian Granger causality (JGC), after which we discuss loss functions beyond what was originally formulated for JGC: those specialized for specific data types, namely the Poisson loss for count data, binary cross-entropy for binary data, as well as the hurdle loss, which generalizes the linear hurdle model35,36 for highly sparse data. We then demonstrate a new graphical approach to identify under-regularized and over-regularized regimes in JGC, allowing the determination of a balanced regularization regime that permits an unambiguous identification of potential Granger causal variables. With this, we test the different loss functions on simulated branching process data, demonstrating the value of loss functions optimized for discrete data in comparison to the regular mean squared error, particularly for sparse data. Subsequently, we compare JGC with two existing algorithms developed for count data, namely adaptive Granger causality (AGC)10 and heterogeneous Granger causality (HMML)34, using simulated spike train data, evaluating also how well JGC is able to correctly identify the sign of each interaction, i.e. whether it is excitatory or inhibitory. Lastly, we apply JGC to real spike train data from monkey visual cortex and discuss its findings and implications in detail.
Methods
Jacobian Granger causality
Jacobian Granger causality (JGC) considers a general autoregressive perspective to Granger causality, with the evolution of a target variable $x_i$ written as:

$$x_i(t) = f_i\big(\mathbf{X}(t-1), \ldots, \mathbf{X}(t-\tau_{\max})\big) + \varepsilon_i(t) \qquad (1)$$

where $\mathbf{X}(t-\tau) = \big(x_1(t-\tau), \ldots, x_N(t-\tau)\big)$ collects all $N$ variables at lag $\tau$, for $\tau = 1$ up to some pre-determined maximum tested time lag $\tau_{\max}$, and $\varepsilon_i(t)$ represents zero-mean noise. We note that in the original formulation of JGC, contemporaneous variables (variables with zero lag from the target variable, i.e. $\tau = 0$) are included in $f_i$ for systems that can be modeled as trajectories evolving under differential equations. As this generally does not apply to count data or neural spike trains, we exclude them from the model in this work.
From (1), we can define Granger non-causality of some variable $x_j$ on target variable $x_i$ at a time lag $\tau$ if for all $t$ and all pairs of values $x_j(t-\tau)$, $x_j'(t-\tau)$:

$$f_i\big(x_j(t-\tau),\, \mathbf{X}_{\setminus (j,\tau)}\big) = f_i\big(x_j'(t-\tau),\, \mathbf{X}_{\setminus (j,\tau)}\big) \qquad (2)$$

with $\mathbf{X}_{\setminus (j,\tau)}$ representing all lagged variables, including $x_j$ at all time lags except $\tau$. This defines a variable $x_j$ not to be Granger causal to a target variable $x_i$ at lag $\tau$ if the evolution function of $x_i$ is invariant to $x_j(t-\tau)$ for all $t$. In this work, we refer to these Granger non-causal variables as irrelevant variables. By definition, a variable is a Granger cause if it is not irrelevant.
In practice, the definition above is typically approached by first finding a way to measure the importance of each variable at each given time lag in $f_i$, followed by some variable selection procedure to infer the Granger causal variables based on the variable importance scores. The resulting inference is at the level of individual time lags, which therefore incorporates time lag selection for each inferred Granger causal interaction. The resulting Granger causal network can then be constructed by drawing directed edges from the Granger causal variables to the corresponding target variables.
In JGC, $f_i$ in the above equations is modeled using a feedforward neural network, where the first hidden layer is wired one-to-one from the input layer, and each variable at each time lag $x_j(t-\tau)$ is fed into a separate input node in the neural network (Fig. 1). After training, the importance of each variable $x_j(t-\tau)$ is measured using the Jacobian:

$$J_{ij\tau}(t) = \frac{\partial f_i}{\partial x_j(t-\tau)} \qquad (3)$$

Due to the use of time-series data, this computation yields a time-indexed vector of derivatives for each $x_j(t-\tau)$. We can then condense it into a single summary statistic for variable importance by taking the absolute mean value $\big|\langle J_{ij\tau}(t)\rangle_t\big|$, which carries the assumption that the Granger causal structure is stationary over the entire time series.
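As an illustration of how such a signed importance score could be computed, the following sketch estimates the time-averaged Jacobian of a scalar predictor by central finite differences. In practice JGC differentiates the trained network directly via automatic differentiation; the function `jacobian_importance` and the toy model below are hypothetical stand-ins.

```python
import numpy as np

def jacobian_importance(f, X, eps=1e-5):
    """Estimate the signed importance score <J> for each input of f.

    f : callable mapping a 1-D vector of lagged inputs to a scalar
        prediction -- a stand-in for the trained JGC network.
    X : (T, d) array of lagged input vectors over time.
    Returns the time-averaged derivative <df/dx_k> for each input k;
    the importance magnitude is its absolute value.
    """
    T, d = X.shape
    J = np.zeros((T, d))
    for t in range(T):
        for k in range(d):
            x_plus, x_minus = X[t].copy(), X[t].copy()
            x_plus[k] += eps
            x_minus[k] -= eps
            # central finite difference approximates the Jacobian entry
            J[t, k] = (f(x_plus) - f(x_minus)) / (2 * eps)
    return J.mean(axis=0)  # signed score; take abs() for the importance

# toy model: only inputs 0 and 2 matter (with opposite signs)
f = lambda x: 2.0 * x[0] - 0.5 * x[2]
X = np.random.default_rng(0).normal(size=(50, 4))
scores = jacobian_importance(f, X)
```

The sign of each score then indicates whether the corresponding interaction is excitatory or inhibitory.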
Fig. 1.
The neural network architecture for JGC: an MLP with one-to-one connections in the first hidden layer followed by subsequent layers of dense connectivity. In this paper, the network is trained using the Adam algorithm with the default learning rate at 1e-3, which we found to have sufficiently good performance across all test cases.
In the original formulation, JGC is trained with the following loss function:

$$\mathcal{L} = E(x_i, \hat{x}_i) + \lambda \big\|\mathbf{W}^{(1)}\big\|_1 + \gamma \sum_{n \ge 2} \big\|\mathbf{W}^{(n)}\big\|_2^2 \qquad (4)$$

where the error function $E$ used is the mean squared error, $\mathbf{W}^{(n)}$ denotes the weights of the $n$-th layer of the neural network, while $\lambda$ and $\gamma$ are both regularization parameters. The L1 penalty weighted by $\lambda$ actively selects for salient variables in the first hidden layer by enforcing sparsity, while $\gamma$ serves to prevent overfitting in the subsequent layers. Due to the relatively minimal direct contribution of $\gamma$ to variable selection, $\gamma$ is fixed to a small value (0.01).
Granger causal variables are subsequently selected in a two-part procedure29, respectively testing for significance and consistency. Significance is tested under the assumption that the system contains a mixture of irrelevant variables and Granger causal variables, where the latter are relatively sparse compared to the former. Due to L1 regularization, the estimated signed importance scores $\langle J_{ij\tau}(t)\rangle_t$ for irrelevant variables are clustered together with small values near zero, which we assume to be Gaussian (Fig. 2). The minimum covariance determinant estimator37 is then applied to the z-scores of the mean Jacobian values to estimate the mean and variance of this Gaussian null distribution, following which we impose the criterion that a variable is considered significant if its mean Jacobian is at least 1 standard deviation away from the mean of the null distribution. This criterion is deliberately lenient to prevent the exclusion of weak causes; the subsequent test serves to minimize the false identification of causes.
Fig. 2.
The Jacobian as a function of time (a–c) and the corresponding distribution of the mean Jacobian (d–f) at different regularization regimes: (a,d) under-regularization, (b,e) balanced regularization, (c,f) over-regularization. All plots share the same target variable, and the different curves in (a–c) and data points in (d–f) correspond to different candidate causes. These plots are obtained from data in the simulated spike train section.
The subsequent consistency test seeks to find the largest stable set of variables with consistently high importance scores, which requires multiple independent runs of the full procedure (with independent initialization and training of the neural networks); in our case we find 3 independent runs to suffice. With these three replicates, the consistency test then begins with the corresponding three sets of significant variables. The three sets are iteratively trimmed by removing all variables not in the intersection of the three sets, as well as all variables whose importance scores are lower than any of the removed variables. The latter enforces the assumption that removed variables (which do not contribute consistently) are not Granger causes, and therefore all other variables with importance scores weaker than these removed variables are also not Granger causes. This is repeated until convergence, and the variables remaining in the final convergent set are taken to be the Granger causal variables. The interaction sign of each Granger causal variable can then be extracted from the sign of the corresponding estimated signed importance score $\langle J_{ij\tau}(t)\rangle_t$. The full algorithm describing this procedure is given in Algorithm 1.
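The trimming loop described above can be sketched as follows. This is a simplified, hypothetical rendering of the consistency test, where `replicates` maps each run's significant variables to their importance scores:

```python
def consistent_variables(replicates):
    """Iterative trimming step of the consistency test (a sketch).

    replicates : list of dicts, one per independent run, mapping each
                 significant variable to its importance score |<J>|.
    Returns the convergent set of consistently selected variables.
    """
    sets = [dict(r) for r in replicates]
    while True:
        common = set.intersection(*(set(s) for s in sets))
        changed = False
        for s in sets:
            removed = [v for v in s if v not in common]
            if not removed:
                continue
            cutoff = max(s[v] for v in removed)
            for v in list(s):
                # drop variables outside the intersection, plus any variable
                # weaker than a removed (inconsistent) one
                if v not in common or s[v] < cutoff:
                    del s[v]
            changed = True
        if not changed:
            return common
```

For example, a variable appearing in only two of three runs is removed, and so is any variable whose score falls below that removed variable's score.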
Algorithm 1.
Inference of consistent variables
The formulation of JGC requires a separate neural network for each target variable. This allows two benefits: the regularization parameter $\lambda$ can be individually tuned for each target variable, and the loss function can be adapted to the specific data type of each target variable, which allows the analysis of heterogeneous data. In particular, we seek to replace the mean squared error term ($E$ in (4)) with other loss functions for other data types. We note that the other procedures in JGC, including the two-part variable selection procedure (i.e. the significance and consistency tests), remain the same; only the loss function is changed according to the data type of the target variable.
Loss functions for heterogeneous data types
JGC in its original formulation uses the mean squared error loss function. In this section, we discuss other loss functions that cater to count and binary data. In addition, we discuss a different formulation for sparse count data that may potentially present an improvement over the other loss functions, and subsequently compare these formulations.
We begin by noting that the mean squared error in the original loss function (4) can be interpreted as a maximum likelihood estimation given the assumption that the conditional distribution of the output given input is Gaussian with constant variance38. This may present a problem when the true output distribution significantly deviates from this assumption, although to what extent this is a problem is unclear due to the high flexibility of neural networks. In the same manner as generalized linear models, we can construct different loss functions by starting with different fundamental assumptions on the generating distribution.
Poisson loss for count data
The most common loss function used for discrete data assumes the Poisson distribution. Other than the discreteness assumption, another significant difference from the Gaussian assumption of the mean squared error is that its variance is non-constant and scales with the mean, and that the distribution is asymmetric at small mean values. In the same manner as Poisson regression in generalized linear models, the Poisson loss for a given observation $y$ can be obtained by minimizing its negative log-likelihood, yielding (up to a constant)39:

$$E_{\text{Poisson}}(y, \hat{\mu}) = \hat{\mu} - y \log \hat{\mu} \qquad (5)$$

where $\hat{\mu}$ is the mean Poisson rate, which in this case is what we seek to model using the neural network.
In the neural network implementation, the activation function of the output neuron is set to be the exponential function to encode the non-negativity constraint in count data.
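A minimal sketch of this loss with the exponential output activation folded in; the function name and the use of the pre-activation `eta` are our own conventions:

```python
import math

def poisson_loss(y, eta):
    """Poisson negative log-likelihood (up to the constant log y!) for a
    single observation y, where eta is the network's pre-activation output
    and mu = exp(eta) is the predicted Poisson rate, as in eq. (5)."""
    mu = math.exp(eta)  # exponential output activation: mu > 0 by construction
    return mu - y * math.log(mu)  # equals mu - y * eta
```

The loss is minimized when the predicted rate equals the observed count, i.e. at `eta = log(y)` for `y > 0`.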
Binary cross-entropy for binary data
Binary data is typically modeled using the Bernoulli distribution, and minimizing its negative log-likelihood leads to the binary cross-entropy loss function, a standard loss function for binary classification tasks in neural networks as well as logistic regression. Given a binary observation $y$ and an output probability $\hat{p}$, the binary cross-entropy is given by39,40:

$$E_{\text{BCE}}(y, \hat{p}) = -\big[y \log \hat{p} + (1 - y) \log(1 - \hat{p})\big] \qquad (6)$$

To enforce the probabilistic interpretation of the output $\hat{p}$, the activation function of the output neuron in this case has to be the sigmoid function, thereby keeping the output value in the (0, 1) range. We note, as we shall see subsequently, that it is possible to encounter count data that are effectively binary (i.e. taking values no larger than 1) when the data are very sparse. If this loss function is used for count data that are approximately binary (i.e. with very low likelihood of observing counts larger than 1), the data have to be binarized prior to neural network training.
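For completeness, the corresponding loss in code form (names are illustrative):

```python
import math

def sigmoid(eta):
    """Sigmoid output activation, mapping any real eta into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def bce_loss(y, p):
    """Binary cross-entropy (eq. 6) for a single observation y in {0, 1}
    and predicted probability p = sigmoid(eta) in (0, 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))
```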
Hurdle loss for sparse count data
In the case of sparse count data, the prevalence of zeros may potentially lead to suboptimal representations. We seek to minimize this impact by introducing a separate component to the model for the zeros. In the modeling of discrete data, there are two common modeling approaches for this41, namely the zero-inflated model42 and the hurdle model35,36.
In this paper, we focus on the hurdle model, which is a mixture of a Bernoulli distribution and a truncated count distribution, the latter being truncated at zero. As before, we model the count distribution using the Poisson distribution. The Bernoulli component models the two states of zero and nonzero values, while the Poisson component models the nonzero integers. Given an observation of the target variable $y$, we define an indicator variable $z = \mathbb{1}[y > 0]$ and denote $\pi$ as the parameter of the Bernoulli component and $\mu$ as the mean rate parameter of the Poisson component. We can therefore write the likelihood function of the hurdle Poisson model as35,36,41:

$$P(y) = \begin{cases} 1 - \pi, & y = 0 \\ \pi \, \dfrac{\mu^{y} e^{-\mu}}{y!\,\big(1 - e^{-\mu}\big)}, & y > 0 \end{cases} \qquad (7)$$
Minimizing the negative log-likelihood and removing all terms independent of the parameters yields:

$$E_{\text{hurdle}}(y, \pi, \mu) = -z \log \pi - (1 - z) \log(1 - \pi) + z\big[\mu - y \log \mu + \log\big(1 - e^{-\mu}\big)\big] \qquad (8)$$
The standard hurdle Poisson model expresses $\pi$ and $\mu$ through a generalized linear model of the covariates with the appropriate link functions for the respective distributions. Here, we generalize this by letting $\pi$ and $\mu$ be arbitrary nonlinear functions of the covariates through the use of neural networks. One significant difference from the original hurdle Poisson model is that in the original case, $\pi$ and $\mu$ are modeled separately with different weights. Here we model these two quantities with weight sharing, which allows the shared weights to learn from both the Poisson and Bernoulli perspectives. In particular, the neural network weights are shared in all layers except the final one. Subsequently, for a given target variable $x_i$, the importance of each variable $x_j(t-\tau)$ is then measured via the Jacobian as in (3).
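The hurdle loss (8) can be written compactly as follows (a sketch; the argument names are our own):

```python
import math

def hurdle_loss(y, pi, mu):
    """Negative log-likelihood of the hurdle Poisson model (eq. 8),
    dropping the constant log(y!) term.

    pi : Bernoulli probability that y > 0.
    mu : rate of the zero-truncated Poisson component.
    """
    z = 1.0 if y > 0 else 0.0  # indicator variable z = 1[y > 0]
    bernoulli = -z * math.log(pi) - (1 - z) * math.log(1 - pi)
    # truncated-Poisson term only contributes for nonzero observations
    truncated_poisson = z * (mu - y * math.log(mu)
                             + math.log(1 - math.exp(-mu)))
    return bernoulli + truncated_poisson
```

For a zero observation only the Bernoulli term survives, so `hurdle_loss(0, pi, mu)` reduces to `-log(1 - pi)`.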
Selection of $\lambda$
The $\lambda$ term in the loss function (4) serves to control the sparsity of the representation, which naturally has to be tuned to optimally represent the system. While this tuning is typically done by examining predictive performance on a held-out dataset, this is a challenge for variable selection, as it is known that a lower prediction error need not correspond to a more accurate set of selected variables26,43. Also, while the variable selection performance was observed to be relatively stable with respect to $\lambda$ for the original formulation of JGC29, we see later that the same is not true when other loss functions are used. Due to this, a criterion for selecting $\lambda$ is required when these other loss functions are used. In this section, we describe a new graphical approach for identifying the different regularization regimes and introduce a heuristic for $\lambda$ selection that makes use of these characteristics.
We begin by noting that a severely over-regularized regime can be readily identified by inspecting the plot of the Jacobian as a time series. As we see in Fig. 2c,f, severe over-regularization leads to the dropping of all input variables, thereby compressing the Jacobian into constants of very small magnitude. The under-regularized regime, however, is not as obvious. Even then, we can see in Fig. 2a,d that when under-regularized, irrelevant variables are readily used for the prediction of the target variable, leading to stronger mixing between the null and non-null components of the mixture and causing the distribution of the mean Jacobian as a whole to approach a Gaussian shape. In other words, if there is no apparent separation between potential causes and irrelevant variables, then it is likely that the applied regularization is too weak. While optimizing $\lambda$ may be a difficult task, these observations can, at the very least, guide the user towards a good enough value of $\lambda$ in between the over- and under-regularized regimes, such as that seen in Fig. 2b,e. In particular, we note the clear separation between irrelevant and significant variables in Fig. 2e, which allows identification even by eye. While Fig. 2f appears to share this property, the very small magnitudes imply that all variables are insignificant.
In our experiments, we begin by running JGC over a range of $\lambda$ values. We then test against over-regularization by only accepting values of $\lambda$ whose Jacobian has at least a single occurrence of a value with magnitude $\ge 0.01$ over all time $t$. We highlight that this threshold of 0.01 was experimentally verified in all our experimental data by observing the Jacobian values using different values of $\lambda$ covering the 3 regimes. In all our tested datasets, over-regularized Jacobian values all have magnitudes $< 0.01$, while balanced and under-regularized Jacobians have at least one value with magnitude $\ge 0.01$ for some $t$. Figure 2 is representative of this observation across all our datasets. In practice, therefore, we suggest performing similar observational tests on the dataset being studied before deciding on a suitable Jacobian threshold for the over-regularized regime. We additionally present an argument for why over-regularization shrinks the Jacobian in the Supplementary Material.
Lastly, we selected from the remaining $\lambda$ values the one yielding the largest number of Granger causal variables. Where multiple $\lambda$ values tied with the same number of selected variables, we chose the largest $\lambda$.
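The full heuristic can be summarized as follows (a hypothetical sketch; the `results` structure is our own convention, and the default threshold mirrors the value verified above):

```python
def select_lambda(results, jac_threshold=0.01):
    """Select lambda by the heuristic described above.

    results : dict mapping lambda -> (jacobian_magnitudes, selected_vars),
              where jacobian_magnitudes is an iterable of observed |J(t)|
              values and selected_vars is the set of inferred Granger
              causal variables at that lambda.
    Returns the chosen lambda, or None if every lambda is over-regularized.
    """
    # reject over-regularized lambdas: no Jacobian value reaches the threshold
    admissible = {lam: sel for lam, (jac, sel) in results.items()
                  if max(jac, default=0.0) >= jac_threshold}
    if not admissible:
        return None
    # keep the lambda with the most selected variables; break ties upward
    best = max(len(sel) for sel in admissible.values())
    return max(lam for lam, sel in admissible.items() if len(sel) == best)
```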
Experimental results and discussion
In this section, we test the performance of JGC on two simulated datasets, focusing on the correct identification of the full Granger causal network, and subsequently apply JGC to one real dataset. We label the different tested loss functions by the assumed underlying distribution, with the exception of the hurdle model as it involves a mixture of distributions; the mean squared error and binary cross-entropy are therefore labeled as Gaussian and Bernoulli loss respectively. We make use of up to two separate metrics in our analyses, namely the area under the precision-recall curve (AUPRC) and the F-score. Both measures are similar in that they involve the same quantities: precision ($= 1 -$ false discovery rate) and recall ($= 1 -$ false negative rate), and both are bounded in [0, 1], where a larger score implies stronger performance. On the other hand, AUPRC focuses on the raw importance scores while the F-score deals with the final variable selection results. AUPRC measures how well the importance scores distinguish the true Granger causal variables from the irrelevant ones, and achieves a perfect score of 1 if a threshold can be found that perfectly separates the two groups. If not, it measures how good a theoretical best threshold is, based on how the balance between precision and recall is impacted. In contrast, the F-score simply looks at a specific variable selection outcome and computes the harmonic mean of the resulting precision and recall. With these two metrics, we can separately measure the performance of the Jacobian as an importance score, as well as the combined performance with our variable selection procedure.
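As a concrete example of the second metric, the F-score over true and inferred edge sets can be computed as:

```python
def f_score(true_edges, inferred_edges):
    """Harmonic mean of precision and recall over edge sets.

    true_edges     : set of ground-truth Granger causal edges.
    inferred_edges : set of edges inferred by the algorithm.
    """
    tp = len(true_edges & inferred_edges)  # true positives
    if tp == 0:
        return 0.0
    precision = tp / len(inferred_edges)   # = 1 - false discovery rate
    recall = tp / len(true_edges)          # = 1 - false negative rate
    return 2 * precision * recall / (precision + recall)
```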
We begin by simulating a simple additive branching process in both sparse and non-sparse regimes to characterize and compare the different loss functions on JGC underlying different fundamental assumptions on the data type. Subsequently, we consider a more complex simulated dataset of neural spike trains and compare the performance of JGC against two existing approaches, AGC and HMML. In both experiments, we repeated the Granger causal inference 3 times with different data and present the aggregated results. Lastly, we apply JGC to real spike train data of monkey visual cortex.
Further information on experimental hyperparameters for JGC when applied to each of these datasets can be found in Table S1. The source code for JGC and the simulated data used in this paper can be found on https://github.com/suryadi-t/Discrete-JGC. The real spike train data can be obtained by following the citations given in the beginning of its section.
Additive branching process
In a typical branching process, we consider a network of nodes with active and passive states such that an active node $i$ has some branching probability $p_{ij}$ of causing a downstream connected node $j$ to be active in the next timestep. This typically involves binary states (active and inactive); in this simulation, however, we allow the activity to be additive: each "activation" adds 1 to a node's value (from a baseline value of 0), and multiple activations can occur at a single node in any given timestep, thereby allowing any nonnegative integer state and generalizing the branching process towards count data. Mathematically, we can represent the activity $x_j(t)$ of node $j$ with parent nodes $\mathrm{pa}(j)$ as:

$$x_j(t) = \eta_j(t) + \sum_{i \in \mathrm{pa}(j)} b_{ij}(t), \qquad \eta_j(t) \sim \mathrm{Poisson}(\lambda_0), \quad b_{ij}(t) \sim \mathrm{Bernoulli}\big(p_{ij}\,\mathbb{1}[x_i(t-1) > 0]\big) \qquad (9)$$

where $\eta_j(t)$ is the spontaneous activity and $b_{ij}(t)$ indicates a successful branching event from parent $i$. To prevent runaway activity, we set it such that activations larger than 1 in a given node have the same probability of stimulating a downstream node as an activation of value 1. At the end of each timestep, the value of each node is reset to 0, and the node can be activated again in the next timestep.
Here we generated random directed networks of 50 nodes with a fixed probability of there being an edge between any two distinct nodes. At any given timestep, each node has a random chance to gain activity spontaneously, with a Poisson mean rate $\lambda_0$. To ensure stability, we set the branching probabilities such that the total branching probability into each node sums to less than 1, divided equally among its incoming connections. For this system we generated two different cases (Fig. 3): one which is non-sparse (larger $\lambda_0$) and one which is sparse (smaller $\lambda_0$). These two cases have mean activities on the order of magnitude of 1 and 0.01 respectively.
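A minimal simulation of this process, following the form of eq. (9), can be sketched as below; the parameter values in the usage example are illustrative rather than the paper's (the paper's settings are in its Table S1).

```python
import numpy as np

def simulate_branching(adj_p, lam0, T, seed=0):
    """Simulate the additive branching process.

    adj_p : (N, N) array, adj_p[i, j] = branching probability p_ij from
            node i to node j (0 where there is no edge).
    lam0  : spontaneous Poisson mean rate per node per timestep.
    T     : number of timesteps.
    Returns a (T, N) integer array of node activities.
    """
    rng = np.random.default_rng(seed)
    N = adj_p.shape[0]
    X = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        spontaneous = rng.poisson(lam0, size=N)
        # a parent active at t-1 (x_i > 0) stimulates child j with prob p_ij;
        # activations above 1 stimulate with the same probability as 1,
        # and each node's value was reset to 0 at the end of the last step
        active = (X[t - 1] > 0).astype(float)
        branched = rng.random((N, N)) < (adj_p * active[:, None])
        X[t] = spontaneous + branched.sum(axis=0)
    return X

# illustrative usage: 4 nodes, all-to-all weak coupling, moderate drive
X = simulate_branching(np.full((4, 4), 0.2), lam0=0.5, T=300, seed=1)
```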
Fig. 3.
Sample data and results for one of the experiments on the additive branching process: (a,b) sample time series of one variable, (c,e) the true adjacency matrices, (d,f) the false discovery rate (FDR) and false negative rate (FNR) computed from the resulting adjacency matrix inferred by JGC. (a,c,d) correspond to the non-sparse case while (b,e,f) correspond to the sparse case.
Figure 4 shows the AUPRC and F-score of each approach at different values of the regularization parameter $\lambda$ on both the non-sparse (Fig. 4a,b) and sparse (Fig. 4d,e) data, with numerical values given in Table S2. In the non-sparse case, we immediately observe that the Gaussian loss has greater stability with respect to $\lambda$, while the other losses are much more sensitive to it. This suggests that one should consider biasing $\lambda$ towards smaller values when using the non-Gaussian loss functions. The highest scores are comparable among the different methods, with the exception of the Bernoulli loss; this makes sense as it requires binarizing the data, which entails a significant loss of information in the non-sparse case. We also see that in the non-sparse case, the Gaussian loss performs very well despite its continuity assumption. While the hurdle model at its best does not outperform the Poisson and Bernoulli losses, we observe greater stability in its results with respect to $\lambda$.
Fig. 4.
The AUPRC (a,d) and F-score (b,e) attained by JGC using different loss functions for both the non-sparse (a–c) and sparse (d–f) realizations of the additive branching process, as a function of the regularization parameter $\lambda$. (c,f) show the F-score achieved by JGC when selecting $\lambda$ according to our proposed heuristic, with the bars given in the order: Hurdle, Poisson, Bernoulli, Gaussian. The exact numbers are given in the Supplementary Material.
The limitation of the Gaussian loss is immediately apparent in the sparse case. Here, the three non-Gaussian losses perform very similarly. In particular, Bernoulli loss performs comparably well to Poisson and hurdle losses due to the rarity of values larger than 1 in this sparse case, in which case binarizing the data (which is needed for the Bernoulli loss) does not lead to much loss of information.
Lastly, we also applied our $\lambda$ selection heuristic and plot the resulting F-scores in Fig. 4c,f (numerical values given in Table S3). Here we emphasize that $\lambda$ is independently selected for each target variable, as the optimal $\lambda$ should in general differ due to the different number of Granger causes and the different weights of each Granger causal relation. We see that the selection heuristic performs well in both the non-sparse and sparse cases, with the resulting F-scores not far from the peaks shown in Fig. 4. Here we observe no clear advantage of the hurdle loss over the other loss functions.
Simulated neural spike train
Here we test the performance of JGC with its different loss functions on simulated neural spike train data9, where the simulation contains a network of 9 self-inhibiting neurons with a mix of excitatory and inhibitory connections to other neurons (Fig. 5). We compare the performance with existing methods for count data, namely AGC10 and HMML34, at three different data lengths T of 500, 1000, and 2000 (Fig. 6). Activity is very sparse in this dataset, where each neuron is active less than 3% of the time, and the activity is binary in this case. For this dataset, we compare the result of the full pipeline, i.e. for JGC we perform the $\lambda$ selection heuristic, and compare the inferred Granger causal variables using the F-score. We see in Fig. 6a that the results for JGC correspond well to the sparse case of Fig. 4d–f, where the Gaussian loss performs much worse than the remaining three losses, which perform very similarly. We also see that these top 3 loss functions in JGC significantly outperform both AGC and HMML.
Fig. 5.

Sample data and results for one of the experiments on the simulated neural spike train data: (a) raster plot of the system, (b) the true adjacency matrix, (c) the adjacency matrix inferred by JGC.
Fig. 6.

Comparison of the different JGC loss functions, AGC, and HMML on simulated neural spike train data at different time series lengths T: (a) F-score, (b) adjusted sensitivity, (c) false discovery rate (FDR), (d) false negative rate (FNR). The adjusted sensitivity is computed as the fraction of correctly identified interaction signs among the inferred edges that exist in the ground truth. As we are unable to extract interaction signs from HMML, it is excluded from (b). The exact numbers are given in the Supplementary Material.
We further unpack this difference by inspecting the individual differences in the false discovery rate (FDR) and false negative rate (FNR) in Fig. 6c,d. We note that we use FDR instead of the false positive rate (FPR) here because, in systems like this with sparse true positives, FDR serves as a stricter and more representative measure of false positives than FPR. Compared with HMML, JGC with the hurdle, Bernoulli, and Poisson losses has similar FDR values, which are relatively high due to the sparse data and short time series lengths used; recall that the data has less than 3% active bins. JGC with these three losses has significantly lower FNR than both AGC and HMML, and it is this lower FNR that explains the significant difference in F-scores between JGC and the two approaches. While we do observe that AGC and JGC with the Gaussian loss have significantly lower FDR than the other JGC loss functions, this is not a strength, as it comes at the cost of a much higher FNR, comparable to that of HMML.
As this network contains a mixture of excitatory (positive) and inhibitory (negative) connections, another aspect of interest is whether JGC is able to correctly identify the interaction sign. To investigate this property, we consider only the inferred connections that exist in the ground truth network and compute the fraction of these connections whose sign is correctly inferred. We denote this quantity as the adjusted sensitivity. Figure 6b shows this score for only JGC and AGC, as we are unable to extract the sign of the inferred connections from HMML. As Fig. 6b indicates, JGC is generally reliable in inferring the sign when the inferred connection exists in the ground truth, around 90% of the time in this simulated data. While the Gaussian loss may appear to have higher adjusted sensitivity, we note that this is due to the much smaller number of correctly inferred interactions (as is also hinted by the high FNR). In other words, as the Gaussian loss could only infer the true connections that are easier to discover, their signs are also easier to infer correctly.
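To make these evaluation measures concrete, the following sketch computes the F-score, FDR, FNR, and adjusted sensitivity from signed adjacency matrices. The confusion counts and matrices are toy values, and the helper function is ours (not from the paper's code); the metric definitions follow the text above.

```python
def evaluation_metrics(true_adj, inferred_adj):
    """Compute F-score, FDR, FNR, and adjusted sensitivity for signed
    adjacency matrices (entries +1 excitatory, -1 inhibitory, 0 absent)."""
    tp = fp = fn = sign_correct = 0
    n = len(true_adj)
    for i in range(n):
        for j in range(n):
            t, p = true_adj[i][j], inferred_adj[i][j]
            if p != 0 and t != 0:
                tp += 1
                if p == t:           # inferred sign agrees with ground truth
                    sign_correct += 1
            elif p != 0:
                fp += 1
            elif t != 0:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    fdr = fp / (tp + fp) if tp + fp else 0.0      # false discovery rate
    fnr = fn / (tp + fn) if tp + fn else 0.0      # false negative rate
    adj_sens = sign_correct / tp if tp else 0.0   # adjusted sensitivity
    return f_score, fdr, fnr, adj_sens

# Toy 3-neuron example
true_adj = [[0, 1, -1], [0, 0, 1], [-1, 0, 0]]
inferred = [[0, 1, -1], [1, 0, 0], [1, 0, 0]]
f, fdr, fnr, adj = evaluation_metrics(true_adj, inferred)
```

Here the inferred network recovers three of the four true edges (one with the wrong sign) and adds one spurious edge, giving an F-score of 0.75 and an adjusted sensitivity of 2/3.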
Real spike train data in monkey visual cortex
Here we apply JGC with the Poisson loss on real spike train data44,45. In this dataset, neural spiking activity was measured in anesthetized macaque primary visual cortex using multi-unit electrode arrays with 400 μm spacing, with spike sorting performed after measurement to identify the distinct neurons underlying the measured activity. In this section, we make use of the evoked dataset, which involves repeated measurements on two different monkeys with the microelectrode array position intact over the different repetitions and stimulus presentations. We analyzed separately the first two repeated measurements on each monkey presented with two stimuli: a movie of white noise (30 s) and a movie of a contiguous natural scene (a monkey wading through water, 30 s). In each case, the data was binned at time bins of 5 ms, leading to 6000 time bins in each dataset. Activity is sparse in all cases, with only a small fraction of active bins under both the white noise and the natural movie stimuli. Figure 7 shows a sample raster plot and the firing rate distribution for all the datasets studied in this work.
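The binning step described above can be sketched as follows, a minimal example assuming spike times are given in seconds (the toy spike train is illustrative; the 5 ms bin width and 30 s duration are from the dataset description):

```python
import numpy as np

def bin_spikes(spike_times, duration=30.0, bin_width=0.005):
    """Bin spike times (in seconds) into spike counts per time bin.
    30 s at 5 ms bins gives 6000 bins, as in the dataset used here."""
    n_bins = int(round(duration / bin_width))
    edges = np.linspace(0.0, duration, n_bins + 1)
    counts, _ = np.histogram(spike_times, bins=edges)
    return counts

spikes = [0.001, 0.003, 0.012, 1.750, 29.999]  # toy spike train
counts = bin_spikes(spikes)
```

The resulting count vector (one entry per 5 ms bin) is the discrete time series on which the Poisson loss operates.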
Fig. 7.
Raster plot and descriptive statistics for the real spike train data: (a,b) Raster plot for the first experiment on monkey 1 with (a) white noise and (b) natural movie. The experiment is 30 seconds long, but the raster plots are only plotted to 6 seconds for better resolution. (c) Shows the descriptive statistics for each monkey with white noise and natural movie stimuli. Each group has 2 box plots, one for each experiment.
As the data contains counts larger than 1, and as we did not observe any major benefit in using the hurdle model in the earlier experiments, we analyzed this dataset using JGC with the Poisson loss. The sensitivity of the results with respect to the hyperparameter choice is given in Fig. S1.
Figure 8 displays the excitatory component of the two resulting networks inferred using JGC from the first measurement of monkey 1 presented with the white noise and natural scene respectively, with the corresponding adjacency matrices given in Fig. 9a,b. The corresponding networks and adjacency matrices for the other monkey and measurements are given in Figs. S2 and S3. We focus the presentation in this figure on the excitatory component of the network (i.e. omitting the inhibitory connections) as there is a clear contrast between the two in this aspect: the white noise case presents a relatively dense network, while the natural movie case displays a much sparser and more modular network. The same pattern is observed in the second experiment (Fig. S2a-b), suggesting an overall more structured neural activity pattern, at least in the excitatory circuit, in response to natural visual stimulation. This contrast is however absent in the second monkey for both experiments (Fig. S2c-f). This suggests the individual subject as a potential source of variance, or possibly the experimental setup itself (where the neurons probed in the two monkeys may be quite different in function), or that JGC is unable to capture this aspect of the structure in the second monkey. It is worth noting that the number of neurons measured in monkey 2 (123) is almost twice that of monkey 1 (74). Even so, we later see that a measurable difference nevertheless exists between the white noise and natural movie cases, even for monkey 2. In subsequent network analyses, we include the full network into consideration, i.e. with both excitatory and inhibitory components.
Fig. 8.
The excitatory component of the inferred functional networks for the first measurement of the primary visual cortex in monkey 1 when presented with the stimulus (a) white noise; (b) natural movie. The color of the nodes labels the inferred communities, with the isolated grey nodes indicating nodes not in any community.
Fig. 9.
Details on the inferred networks. (a,b) Inferred adjacency matrices for the first measurement of the primary visual cortex in monkey 1 when presented with the stimulus (a) white noise; (b) natural movie. Rows represent source nodes and columns represent target nodes. The gray areas represent the absence of edges. We observe a sparser network in (a) relative to (b), while (b) shows a larger number of self-excitatory connections along the diagonal. (c–j) Joint distribution of the in-degree and out-degree in each network. (c–f) monkey 1, (g–j) monkey 2; (c,d,g,h) white noise, (e,f,i,j) natural movie. In each monkey-stimulus pair, the plots are ordered from experiment 1 to 2. The colorbar measures the frequency of each bin.
We next examine the degree distributions in the inferred networks. Generally, for both the in-degree and out-degree (Figs. S4 and S5), we observe a preference for low degrees (typically less than 5), with a small probability of higher degrees. In all cases, we do not observe power law distributions (likelihood ratio tests were performed using the powerlaw package46, and in all cases the power law distribution is not preferred over the exponential distribution); hence there is no evidence of a scale-free property in these networks. While this agrees with the study suggesting that scale-free networks are empirically rare47, we nevertheless note that the lack of scale-free property here could potentially be due to finite size effects48, as the sizes of the analyzed networks are not large enough to cover several orders of magnitude in the degree distribution. We additionally examined the joint distribution of the in- and out-degrees (Fig. 9c–j), and interestingly, there appears to be a distinct preference for low in-degree with a wider spread of the out-degree, indicating that neurons in this dataset may generally have more outgoing connections than incoming ones.
Here we investigate the possibility of the small-world property49 in these networks, which is characterized by low average shortest path lengths and a high average clustering coefficient relative to random networks. For directed networks, these measures can only be computed if the network is weakly connected (i.e. there exists a path between any two neurons in the undirected version of the network), hence we applied them to the largest weakly connected component of each inferred network. For each inferred network, we compared the two measures against 500 weakly connected random networks of the same size and sparsity (Fig. S6) and found that generally, with one exception, there is no significant difference between the inferred networks and the random networks on either measure, suggesting that these inferred networks do not exhibit the small-world property.
Table 1 summarizes the connectivity of the inferred networks among distinct neurons, which excludes any inferred self-connections. It is notable that the resulting networks from the same monkey-stimulus pair have a similar number of edges across the two experiments. The relatively small number of edges also means that the sparsity is consistent across all cases, with under 7% of all possible pairwise connections being present. Another interesting point of note is that among distinct neurons, there appear to be significantly more inhibitory connections than excitatory ones.
Table 1.
Summary of information on the monkey visual cortex dataset as well as the resulting network from JGC analysis on edges between distinct neurons.
| ID | Neurons | Stimulus | Edges (% sparsity), Expt 1 | Edges (% sparsity), Expt 2 | (+, −) Expt 1 | (+, −) Expt 2 | Shared edges | % edges shared, Expt 1 | % edges shared, Expt 2 |
|---|---|---|---|---|---|---|---|---|---|
| Monkey 1 | 74 | Noise | 302 (5.59%) | 327 (6.05%) | 114, 188 | 162, 165 | 26 | 8.61% | 7.95% |
|  |  | Natural | 364 (6.74%) | 340 (6.29%) | 55, 309 | 83, 257 | 46 | 12.64% | 13.53% |
| Monkey 2 | 123 | Noise | 403 (2.69%) | 377 (2.51%) | 127, 276 | 103, 274 | 21 | 5.21% | 5.57% |
|  |  | Natural | 433 (2.89%) | 406 (2.71%) | 134, 299 | 126, 280 | 35 | 8.08% | 8.62% |
The “Neurons” column indicates the total number of observed neurons in the system, which gives the network size N. Under the “Stimulus” column, “Noise” refers to white noise, while “Natural” refers to a movie of a natural scene, both of which were presented for 30 s. Sparsity here is computed as the fraction between the number of inferred edges and the total number of possible pairwise edges between distinct neurons (for a network of size N, this number is N(N − 1)). The (+, −) column breaks down the inferred edges into excitatory and inhibitory connections respectively. This table analyzes connections between distinct neurons; the corresponding analysis on self-connections is found in Table 2.
Between the two types of presented stimuli, we expect white noise to lead to less structured neural activity than the natural movie. We explored this idea by inspecting the commonality between the different measurements of the same monkey-stimulus pair. One striking observation is that in all cases, including the natural movie stimulus, only a small percentage of edges are inferred to be shared between the measurements of experiments 1 and 2. These numbers are shown in the shared edges column of Table 1. This is particularly interesting given that the number of edges does not significantly differ between the two measurements. It suggests that while trial-to-trial variability may exist in the functional network, the density of the network remains at a similar level.
While the fraction of shared edges is consistently small across the repeated experiments in all cases, we nevertheless observed a notable difference between the two stimuli: the percentage of edges present in both measurements is significantly higher (about 50% higher) in the natural stimulus case compared to white noise in each monkey. This agrees with our earlier intuition that the natural stimulus should lead to a more structured functional network, which manifests here through a larger set of consistent connections compared to the white noise stimulus.
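The shared-edge percentages in Tables 1 and 2 follow from a simple set intersection of the inferred edge lists; a minimal sketch with toy edge sets (not the actual inferred networks):

```python
# Each inferred network is represented as a set of directed edges (i, j).
edges_expt1 = {(0, 1), (1, 2), (2, 0), (3, 1)}
edges_expt2 = {(0, 1), (2, 0), (3, 2)}

shared = edges_expt1 & edges_expt2  # edges present in both experiments
pct_shared_1 = 100 * len(shared) / len(edges_expt1)
pct_shared_2 = 100 * len(shared) / len(edges_expt2)

# Sparsity among distinct neurons for a network of size N: the number of
# inferred edges divided by the N * (N - 1) possible directed pairs.
N = 4
sparsity_1 = 100 * len(edges_expt1) / (N * (N - 1))
```

Because the two experiments generally have different edge counts, the shared percentage is reported relative to each experiment separately, as in the two rightmost columns of Table 1.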
Next, we consider the corresponding analysis for self-connections in Table 2, where a self-connection refers to a Granger causal connection from a neuron to itself. As hinted by the adjacency matrices in Fig. 9a,b, one interesting observation is that the majority of the inferred self-connections are excitatory. This suggests the possibility of bursting dynamics, where neurons fire spikes repeatedly in rapid succession over a short timeframe, which has indeed been observed in the visual cortex50,51. We investigated this in Fig. 10, where for each dataset we segregated the data into neurons with inferred positive self-connections and the remaining neurons, denoted “non-positive”. In each group we computed the fraction of consecutive activity, i.e. the ratio between the number of times a neuron is active in two consecutive bins and its total number of active bins, and indeed, the positive group on average has at least twice as many such observations (this difference is significant by Welch’s t-test with unequal variances in all cases). While the non-positive neurons also exhibit this behavior to a smaller extent, JGC does not infer a self-connection if the activity could be better and sufficiently explained by the activity of other neurons.
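The fraction of consecutive activity used above can be computed from a binarized spike train as follows (a minimal sketch; the toy array is illustrative):

```python
import numpy as np

def consecutive_activity_fraction(spikes):
    """Fraction of a neuron's active bins that are immediately followed by
    another active bin, used here as a simple proxy for burst-like firing."""
    spikes = np.asarray(spikes, dtype=bool)
    n_active = spikes.sum()
    if n_active == 0:
        return 0.0
    # Count adjacent bin pairs where the neuron is active in both bins.
    consecutive = np.count_nonzero(spikes[:-1] & spikes[1:])
    return consecutive / n_active

train = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1]  # toy binarized spike train
frac = consecutive_activity_fraction(train)
```

Here the toy train has 6 active bins and 3 consecutive-active pairs, giving a fraction of 0.5; the group averages of this quantity are what Fig. 10 compares.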
Table 2.
Summary of information on the monkey visual cortex dataset and the resulting network from JGC analysis on edges for self-connections.
| ID | Neurons | Stimulus | Edges (% sparsity), Expt 1 | Edges (% sparsity), Expt 2 | (+, −) Expt 1 | (+, −) Expt 2 | Shared edges | % edges shared, Expt 1 | % edges shared, Expt 2 |
|---|---|---|---|---|---|---|---|---|---|
| Monkey 1 | 74 | Noise | 12 (16.22%) | 20 (27.03%) | 11, 1 | 20, 0 | 3 | 25.00% | 15.00% |
|  |  | Natural | 20 (27.03%) | 21 (28.38%) | 18, 2 | 21, 0 | 10 | 50.00% | 47.62% |
| Monkey 2 | 123 | Noise | 19 (15.45%) | 20 (16.26%) | 19, 0 | 18, 2 | 8 | 42.11% | 40.00% |
|  |  | Natural | 29 (23.58%) | 25 (20.33%) | 28, 1 | 24, 1 | 15 | 51.72% | 60.00% |
The columns are as defined in Table 1. As this table considers only self-connections, sparsity here is computed as the ratio between the number of inferred self-edges and the network size N.
Fig. 10.

Comparison of the average percentage of consecutive activity between neurons with (blue) and without (orange) an inferred positive self-connection. The percentage of consecutive activity was computed for each neuron as the ratio between the number of consecutive active time bins and the total number of active time bins. The bars for each monkey-stimulus pair are plotted in two sets, one for each experiment.
While neurons in the primary visual cortex, like other brain regions, are known to spontaneously burst52, bursting is also associated with the encoding of visual information in visual stimuli, such as orientation and spatial frequency53. Similarly, a study on awake monkeys54 showed that restricting the analysis on bursting (or otherwise very fast firing) neurons was necessary to produce a clear response map to stimuli, which was otherwise obscured by the background activity of other neurons. Given that the natural movie stimulus is likely to contain more structures that lie within the receptive fields of neurons in the primary visual cortex compared to white noise, our finding of a larger number of bursting neurons in the former is consistent with these studies.
Lastly, we also note that with regards to shared edges across different experiments, we observe the same situation in Table 2 as in Table 1: the natural movie case leads to a significantly larger set of shared edges, further supporting our intuition that the natural movie leads to a more structured activity pattern in the brain.
An important aspect to note of this dataset is that while the experiment was being performed, the monkeys were maintained in a state of anesthesia, and eye movements were minimized by paralyzing the animal, both through continuous drug infusion45. In addition, the pupils were dilated and supplementary lenses were used to bring the retinal image into focus. This implies that any neural response had to occur either spontaneously or by reflex in response to stimuli. The latter has been shown to be possible55: neurons in the primary visual cortex of anesthetized macaques and spider monkeys were observed to respond to stimuli systematically according to the receptive fields of the respective neurons. In other words, visual processing in the primary visual cortex occurs even when the subject is unconscious.
The authors of45 noted that the impact of anesthesia on their study, which focused on correlations between observed neurons, should be minimal. Firstly, sufentanil was used for maintaining anesthesia, which they claimed was often used in cortical studies and had no known discrepancy with data from awake animals. Their correlation results, including those on orientation tuning, were also found to correspond well with other studies. The authors concluded that while it was not possible to rule out some influence of anesthesia, the activity should at least correspond well with awake animals at the level of correlations. Subsequent studies do suggest that opioid anesthetics, such as the sufentanil used in this dataset, impact measured activity: anesthetized responses were observed to be noisier with higher response variability56. Subsequent analyses suggest that this could be attributed to fluctuating global network states due to anesthesia56. This is independent of stimulation; it has long been known that neural activity in the presence of a stimulus constitutes a combination of ongoing spontaneous activity and evoked activity from the stimulus, where the ongoing component can be as large as the evoked component57. These findings could explain why the numbers of shared edges between trials and stimuli are small. On the other hand, the evoked activity due to stimuli has some level of determinism57, although it is nevertheless influenced by the initial state of the network prior to stimulation57. This could explain both the presence of shared edges between trials of the structured stimulus (natural movie) and the fact that the number of shared edges is small, as the initial brain state across the two trials may not be identical.
While the application of anesthesia does affect neural measurements, our study nevertheless shows that even under anesthesia, the increase in bursting activity in the primary visual cortex does occur upon stimulation; this aspect is therefore not removed by anesthesia. The full extent of the similarities and differences between the anesthetized and awake states could only be fully elucidated through a thorough study comprising datasets of both states.
Conclusion
In this work, we extended Jacobian Granger causality (JGC) to count and binary data through the use of Poisson, Bernoulli, and hurdle loss functions. We compared the different loss functions in JGC using the additive branching process as well as simulated neural spike train data, and observed in particular the limitation of the mean squared error loss on sparse count data. This limitation is overcome by the other loss functions discussed in this work, and we subsequently showed through the simulated neural spike train data that JGC outperforms competing approaches. These results also suggest the strength of the Poisson loss on sparse count data, with no significant benefit from the hurdle formulation. We therefore applied JGC with the Poisson loss to real data measuring the primary visual cortex of macaques presented with two types of stimuli (white noise and a natural movie) over two repeated measurements. We observed that the inferred networks are sparse in all cases, and that while only a small number of edges are shared between the repeated measurements, the percentage of shared edges is significantly larger with the natural stimulus than with white noise, suggesting more structured activity evoked by the natural stimulus. In addition, we observed a larger set of neurons with positive self-connections in the natural stimulus case, whose burst-like activity is known to encode more visual information. This also agrees with our expectation that natural scenes provide a more meaningful visual stimulus, with higher information content, than white noise.
Supplementary Information
Author contributions
S. and L.Y.C. designed the research; S. performed numerical studies and simulations; S. and L.Y.C. analyzed the results; S. and L.Y.C. wrote the manuscript; L.Y.C., Y.S.O. and S. reviewed the manuscript.
Data availability
The datasets generated and/or analyzed during the current study are available in the data repository with the following weblink: https://github.com/suryadi-t/Discrete-JGC/tree/main/Data. The real monkey brain data is available at https://crcns.org/data-sets/vc/pvc-11.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-025-33385-w.
References
- 1. Perkel, D. H., Gerstein, G. L. & Moore, G. P. Neuronal spike trains and stochastic point processes: II. Simultaneous spike trains. Biophys. J. 7, 419–440 (1967).
- 2. Eldawlatly, S., Zhou, Y., Jin, R. & Oweiss, K. Reconstructing functional neuronal circuits using dynamic Bayesian networks. In 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society 5531–5534 (IEEE, 2008).
- 3. Eldawlatly, S., Zhou, Y., Jin, R. & Oweiss, K. G. On the use of dynamic Bayesian networks in reconstructing functional neuronal networks from spike train ensembles. Neural Comput. 22, 158–189 (2010).
- 4. Patnaik, D., Laxman, S. & Ramakrishnan, N. Discovering excitatory relationships using dynamic Bayesian networks. Knowl. Inf. Syst. 29, 273–303 (2011).
- 5. Chen, R. Causal network inference for neural ensemble activity. Neuroinformatics 19, 515–527 (2021).
- 6. Ito, S. et al. Extending transfer entropy improves identification of effective connectivity in a spiking cortical network model. PLoS ONE 6, e27431 (2011).
- 7. Orlandi, J. G., Stetter, O., Soriano, J., Geisel, T. & Battaglia, D. Transfer entropy reconstruction and labeling of neuronal connections from simulated calcium imaging. PLoS ONE 9, e98842 (2014).
- 8. Valdés-Sosa, P. A. et al. Estimating brain functional connectivity with sparse multivariate autoregression. Philos. Trans. R. Soc. B 360, 969–981 (2005).
- 9. Kim, S., Putrino, D., Ghosh, S. & Brown, E. N. A Granger causality measure for point process models of ensemble neural spiking activity. PLoS Comput. Biol. 7, e1001110 (2011).
- 10. Sheikhattar, A. et al. Extracting neuronal functional network dynamics via adaptive Granger causality analysis. Proc. Natl. Acad. Sci. USA 115, E3869–E3878 (2018).
- 11. Camps-Valls, G. et al. Discovering causal relations and equations from data. Phys. Rep. 1044, 1–68 (2023).
- 12. Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Morgan Kaufmann, 1988).
- 13. Murphy, K. P. Dynamic Bayesian Networks: Representation, Inference and Learning (University of California, 2002).
- 14. Shiguihara, P., Lopes, A. D. A. & Mauricio, D. Dynamic Bayesian network modeling, learning, and inference: A survey. IEEE Access 9, 117639–117648 (2021).
- 15. Schreiber, T. Measuring information transfer. Phys. Rev. Lett. 85, 461 (2000).
- 16. Sun, J. & Bollt, E. M. Causation entropy identifies indirect influences, dominance of neighbors and anticipatory couplings. Physica D 267, 49–57 (2014).
- 17. Wibral, M. et al. Measuring information-transfer delays. PLoS ONE 8, e55809 (2013).
- 18. Runge, J., Heitzig, J., Petoukhov, V. & Kurths, J. Escaping the curse of dimensionality in estimating multivariate transfer entropy. Phys. Rev. Lett. 108, 258701 (2012).
- 19. Papana, A., Papana-Dagiasis, A. & Siggiridou, E. Shortcomings of transfer entropy and partial transfer entropy: Extending them to escape the curse of dimensionality. Int. J. Bifurc. Chaos 30, 2050250 (2020).
- 20. Granger, C. W. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 424–438 (1969).
- 21. Barnett, L., Barrett, A. B. & Seth, A. K. Granger causality and transfer entropy are equivalent for Gaussian variables. Phys. Rev. Lett. 103, 238701 (2009).
- 22. Hlavácková-Schindler, K. Equivalence of Granger causality and transfer entropy: A generalization. Appl. Math. Sci. 5, 3637–3648 (2011).
- 23. Tank, A., Covert, I., Foti, N., Shojaie, A. & Fox, E. B. Neural Granger causality. IEEE Trans. Pattern Anal. Mach. Intell. (2021).
- 24. Khanna, S. & Tan, V. Y. Economy statistical recurrent units for inferring nonlinear Granger causality. arXiv preprint arXiv:1911.09879 (2019).
- 25. Nauta, M., Bucur, D. & Seifert, C. Causal discovery with attention-based convolutional neural networks. Mach. Learn. Knowl. Extract. 1, 312–340 (2019).
- 26. Marcinkevičs, R. & Vogt, J. E. Interpretable models for Granger causality using self-explaining neural networks. arXiv preprint arXiv:2101.07600 (2021).
- 27. Banerjee, A., Pathak, J., Roy, R., Restrepo, J. G. & Ott, E. Using machine learning to assess short term causal dependence and infer network links. Chaos 29, 121104 (2019).
- 28. Banerjee, A., Hart, J. D., Roy, R. & Ott, E. Machine learning link inference of noisy delay-coupled networks with optoelectronic experimental tests. Phys. Rev. X 11, 031014 (2021).
- 29. Suryadi, Chew, L. Y. & Ong, Y.-S. Granger causality using Jacobian in neural networks. Chaos 33, 023126 (2023). 10.1063/5.0106666.
- 30. Hornik, K., Stinchcombe, M. & White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359–366 (1989).
- 31. Shojaie, A. & Fox, E. B. Granger causality: A review and recent advances. Annu. Rev. Stat. Appl. 9, 289–319 (2022).
- 32. Sameshima, K. & Baccalá, L. A. Using partial directed coherence to describe neuronal ensemble interactions. J. Neurosci. Methods 94, 93–103 (1999).
- 33. Behzadi, S., Hlaváčková-Schindler, K. & Plant, C. Granger causality for heterogeneous processes. In Pacific-Asia Conference on Knowledge Discovery and Data Mining 463–475 (Springer, 2019).
- 34. Hlaváčková-Schindler, K. & Plant, C. Heterogeneous graphical Granger causality by minimum message length. Entropy 22, 1400 (2020).
- 35. Mullahy, J. Specification and testing of some modified count data models. J. Econom. 33, 341–365 (1986).
- 36. Heilbron, D. C. Zero-altered and other regression models for count data with added zeros. Biom. J. 36, 531–547 (1994).
- 37. Rousseeuw, P. J. & Driessen, K. V. A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212–223 (1999).
- 38. Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
- 39. Bühlmann, P. & van de Geer, S. Generalized linear models and the lasso. In Statistics for High-Dimensional Data 45–53 (Springer, 2011).
- 40. Bishop, C. M. Neural Networks for Pattern Recognition (Oxford University Press, 1995).
- 41. Feng, C. X. A comparison of zero-inflated and hurdle models for modeling zero-inflated count data. J. Stat. Distrib. Appl. 8, 1–19 (2021).
- 42. Lambert, D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 34, 1–14 (1992).
- 43. Meinshausen, N. & Bühlmann, P. Stability selection. J. R. Stat. Soc. Ser. B 72, 417–473 (2010).
- 44. Kohn, A. & Smith, M. A. Utah array extracellular recordings of spontaneous and visually evoked activity from anesthetized macaque primary visual cortex (V1). CRCNS.org 10.6080/K0NC5Z4X (2016).
- 45. Smith, M. A. & Kohn, A. Spatial and temporal scales of neuronal correlation in primary visual cortex. J. Neurosci. 28, 12591–12603 (2008).
- 46. Alstott, J., Bullmore, E. & Plenz, D. powerlaw: A Python package for analysis of heavy-tailed distributions. PLoS ONE 9, e85777 (2014).
- 47. Broido, A. D. & Clauset, A. Scale-free networks are rare. Nat. Commun. 10, 1–10 (2019).
- 48. Serafino, M. et al. True scale-free networks hidden by finite size effects. Proc. Natl. Acad. Sci. USA 118, e2013825118 (2021).
- 49. Watts, D. J. & Strogatz, S. H. Collective dynamics of small-world networks. Nature 393, 440–442 (1998).
- 50. Yinon, U. & Auerbach, E. Bursting patterns of neurons in the cat’s visual cortex. Exp. Neurol. 44, 71–81 (1974).
- 51. Onorato, I. et al. A distinct class of bursting neurons with strong gamma synchronization and stimulus selectivity in monkey V1. Neuron 105, 180–197 (2020).
- 52. Legendy, C. & Salcman, M. Bursts and recurrences of bursts in the spike trains of spontaneously active striate cortex neurons. J. Neurophysiol. 53, 926–939 (1985).
- 53. Krahe, R. & Gabbiani, F. Burst firing in sensory systems. Nat. Rev. Neurosci. 5, 13–23 (2004).
- 54. Livingstone, M., Freeman, D. & Hubel, D. Visual responses in V1 of freely viewing monkeys. In Cold Spring Harbor Symposia on Quantitative Biology vol. 61, 27–37 (Cold Spring Harbor Laboratory Press, 1996).
- 55. Hubel, D. H. & Wiesel, T. N. Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 195, 215–243 (1968).
- 56. Ecker, A. S. et al. State dependence of noise correlations in macaque primary visual cortex. Neuron 82, 235–248 (2014).
- 57. Arieli, A., Sterkin, A., Grinvald, A. & Aertsen, A. Dynamics of ongoing activity: Explanation of the large variability in evoked cortical responses. Science 273, 1868–1871 (1996).