Abstract
Granger causality is a commonly used approach for network inference in neural systems. Recent advances in the field allow for the analysis of high-dimensional and nonlinear systems through the use of artificial neural networks, but the formulations are optimized for continuous data. In this work, we show the limitations of this formulation for discrete count data, particularly when the data are sparse. To overcome this limitation, we extend Jacobian Granger causality, a neural network-based approach to Granger causality, to other data types, namely count data and binary data, through the use of different loss functions. We examine its performance against a competing approach on simulated data, and finally apply it to real neural spiking data recorded from monkey visual cortex during the presentation of white noise and natural movie stimuli. We find that the natural movie leads to more structured activity, with a larger set of edges shared across two separate observations and more neurons inferred to have a positive self-connection, whose burst-like activity has been associated with the encoding of salient visual information such as that present in natural scenes.
Keywords: Granger causality, Time series, Machine learning
Subject terms: Biophysics, Mathematics and computing, Physics
Introduction
In the analysis of neural data, one aspect of interest lies in the connectivity between neurons. As experimental methods continue to progress, neural datasets become increasingly high-dimensional, thereby requiring approaches that can efficiently and accurately process such data while also inferring salient properties, such as whether a connection is excitatory or inhibitory. While neural connectivity was traditionally analyzed by measuring functional statistical dependencies through methods such as the cross-correlogram1, the bivariate nature of the analysis leaves room for common and indirect causes to confound the results. To sidestep this issue, multiple approaches have been applied for network inference in neural data, including dynamic Bayesian networks2–5, transfer entropy6,7, and Granger causality8–10. Other approaches for causal analysis also exist (e.g. see11), which will not be discussed here.
Bayesian networks12 infer causal graphs through conditional independencies between variables in the system, resulting in directed acyclic graphs. Dynamic Bayesian networks13 (DBNs) extend Bayesian networks by incorporating time-lagged interactions, which removes the latter's constraint that the resulting network be acyclic. However, DBNs typically model interactions at a fixed lag of 1 timestep14, which prevents the identification of interactions at larger lags. In addition, due to the need to model high-dimensional probability distributions, DBNs are usually affected by the curse of dimensionality, requiring data sizes exponential in the dimension of the data being analyzed, although score-based and greedy search-based approaches have been developed to sidestep this problem14.
Transfer entropy15 measures information flow between components to infer connections, which accounts for nonlinear interactions. While the original formulation is bivariate, multivariate extensions have been proposed to allow more reliable inference of network structures16,17. This approach seamlessly incorporates higher order lags, allowing time lag selection directly from the application of its approach. Like DBNs, however, it also typically suffers from the curse of dimensionality due to its modeling of high-dimensional probability distributions18,19.
Granger causality20 measures predictive causality, where a candidate variable is Granger causal to a target variable if its inclusion improves the prediction of the target variable in the presence of information from the other variables in the system. Under certain settings, such as for variables with Gaussian noise, it has been shown to be equivalent to transfer entropy21,22. The standard formulation of multivariate Granger causality typically involves linear vector autoregressions, which allow seamless time lag selection but are restricted to linear modeling. Recent years, however, have seen increased interest in nonlinear formulations of Granger causality through the use of neural networks23–29, which present several advantages. Firstly, a neural network-based model can represent any nonlinear function by virtue of the universal approximation theorem30. Despite this flexibility, these approaches have also been shown to work with high-dimensional data without the need for exponentially large datasets. Lastly, Granger causality in these approaches is typically inferred through the use of sparsity-enforcing penalties23,24,26–29, resulting in sparse inferred networks, which agree well with what would be expected from neural data. Due to these advantages, we focus on Granger causality in this work.
Despite the many uses of Granger causality, its original formulation has been largely restricted to continuous data31, which is not compatible with neural spike data made up of discrete counts. While smoothing methods have been applied to spike train data32 to account for this, the resulting distortion to the data reduces the accuracy of the resulting inferred network. In recent years, however, more attention has been given to Granger causality for point processes9,10 and heterogeneous data as a whole33,34. These studies generally involve a generalized linear formulation of vector autoregression involving different link and loss functions according to different assumptions on the underlying distribution behind the generating model, allowing the model to work with diverse data types including continuous data, discrete count data (i.e. non-negative integers), as well as categorical data. While there has been work in this area, it remains the case that the formulation either fixes a certain functional form (usually linear)9,10,33,34, or faces difficulty in high-dimensional cases31.
Similar to Granger causality as a whole, the aforementioned neural network-based approaches are also formulated for continuous data. In this paper, we focus on one such method, namely Jacobian Granger causality (JGC), and extend its formulation to count and binary data through the use of different loss functions, in the same way as the aforementioned generalized linear approaches, in order to infer the connectivity behind spike train data.
The inference of neuronal networks in neural spike train data necessitates the ability of the approach to handle high-dimensional data efficiently, while also working well with sparse data. While the former has been established for JGC29, we push this further in this work and also explore a feature absent in continuous data and unique to count data, namely sparsity, where a significant fraction of the data is made up of zeros. We see later that the latter is especially problematic for the mean squared error loss used in the original JGC formulation. The application of JGC (and by extension any similar neural-network based approach) therefore necessitates the use of a different loss function more suitable for such data types.
This paper is organized in the following manner. We begin by reviewing the original formulation of Jacobian Granger causality (JGC), after which we discuss loss functions beyond what was originally formulated for JGC: those specialized for specific data types, namely the Poisson loss for count data, binary cross-entropy for binary data, as well as the hurdle loss, which generalizes the linear hurdle model35,36 for highly sparse data. We then demonstrate a new graphical approach to identify under-regularized and over-regularized regimes in JGC, allowing the determination of a balanced regularization regime that permits an unambiguous identification of potential Granger causal variables. With this, we test the different loss functions on simulated branching process data, demonstrating the value of loss functions optimized for discrete data in comparison to the regular mean squared error, particularly for sparse data. Subsequently, we compare JGC with two existing algorithms developed for count data, namely adaptive Granger causality (AGC)10 and heterogeneous Granger causality (HMML)34, using simulated spike train data, evaluating also how well JGC is able to correctly identify the sign of each interaction, i.e. whether it is excitatory or inhibitory. Lastly, we apply JGC to real spike train data from monkey visual cortex and discuss its findings and implications in detail.
Methods
Jacobian Granger causality
Jacobian Granger causality (JGC) considers a general autoregressive perspective to Granger causality, with the evolution of a target variable $x_i$ written as:

$$x_i(t) = f_i\big(\mathbf{X}(t-1), \ldots, \mathbf{X}(t-\tau_{\max})\big) + \varepsilon_i(t) \qquad (1)$$

where $\mathbf{X}(t-\tau) = \big(x_1(t-\tau), \ldots, x_N(t-\tau)\big)$ collects all $N$ variables at lag $\tau$, for $\tau = 1$ up to some pre-determined maximum tested time lag $\tau_{\max}$, and $\varepsilon_i(t)$ represents zero-mean noise. We note that in the original formulation of JGC, contemporaneous variables (variables with zero lag from the target variable, i.e. $\tau = 0$) are included in $f_i$ for systems that can be modeled as trajectories evolving under differential equations. As this generally does not apply to count data or neural spike trains, we exclude them from the model in this work.
From (1), we can define Granger non-causality of some variable $x_j$ on target variable $x_i$ at a time lag $\tau$ if for all $t$ and all pairs of values $x_j(t-\tau)$, $x_j'(t-\tau)$:

$$f_i\big(x_j(t-\tau),\, \mathbf{X}_{\setminus (j,\tau)}\big) = f_i\big(x_j'(t-\tau),\, \mathbf{X}_{\setminus (j,\tau)}\big) \qquad (2)$$

with $\mathbf{X}_{\setminus (j,\tau)}$ representing all lagged variables, including $x_j$ at all time lags except $\tau$. This defines a variable $x_j$ not to be Granger causal to a target variable $x_i$ at lag $\tau$ if the evolution function of $x_i$ is invariant to $x_j(t-\tau)$ for all $t$. In this work, we refer to these Granger non-causal variables as irrelevant variables. By definition, a variable is a Granger cause if it is not irrelevant.
In practice, the definition above is typically approached by first finding a way to measure the importance of each variable at each given time lag in $f_i$, followed by some variable selection procedure to infer the Granger causal variables based on the variable importance scores. The resulting inference is at the level of individual time lags, which therefore incorporates time lag selection for each inferred Granger causal interaction. The resulting Granger causal network can then be constructed by drawing directed edges from the Granger causal variables to the corresponding target variables.
In JGC, $f_i$ in the above equations is modeled using a feedforward neural network, where the first hidden layer is wired one-to-one from the input layer, and each variable at each time lag $x_j(t-\tau)$ is fed into a separate input node in the neural network (Fig. 1). After training, the importance of each variable $x_j(t-\tau)$ is measured using the Jacobian:

$$J_{ij\tau}(t) = \frac{\partial f_i}{\partial x_j(t-\tau)} \qquad (3)$$

Due to the use of time-series data, this computation yields a time-indexed vector of derivatives for each $x_j(t-\tau)$. We can then condense it into a single summary statistic for variable importance by taking the absolute mean value $\big|\langle J_{ij\tau}(t)\rangle_t\big|$, which carries the assumption that the Granger causal structure is stationary over the entire time series.
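As an illustration of how such a signed importance score could be computed, the following sketch estimates the time-averaged Jacobian of a scalar predictor by central finite differences. In practice JGC differentiates the trained network directly via automatic differentiation; the function `jacobian_importance` and the toy model below are hypothetical stand-ins.

```python
import numpy as np

def jacobian_importance(f, X, eps=1e-5):
    """Estimate the signed importance score <J> for each input of f.

    f : callable mapping a 1-D vector of lagged inputs to a scalar
        prediction -- a stand-in for the trained JGC network.
    X : (T, d) array of lagged input vectors over time.
    Returns the time-averaged derivative <df/dx_k> for each input k;
    the importance magnitude is its absolute value.
    """
    T, d = X.shape
    J = np.zeros((T, d))
    for t in range(T):
        for k in range(d):
            x_plus, x_minus = X[t].copy(), X[t].copy()
            x_plus[k] += eps
            x_minus[k] -= eps
            # central finite difference approximates the Jacobian entry
            J[t, k] = (f(x_plus) - f(x_minus)) / (2 * eps)
    return J.mean(axis=0)  # signed score; take abs() for the importance

# toy model: only inputs 0 and 2 matter (with opposite signs)
f = lambda x: 2.0 * x[0] - 0.5 * x[2]
X = np.random.default_rng(0).normal(size=(50, 4))
scores = jacobian_importance(f, X)
```

The sign of each score then indicates whether the corresponding interaction is excitatory or inhibitory.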
Fig. 1.
The neural network architecture for JGC: an MLP with one-to-one connections in the first hidden layer followed by subsequent layers of dense connectivity. In this paper, the network is trained using the Adam algorithm with the default learning rate at 1e-3, which we found to have sufficiently good performance across all test cases.
In the original formulation, JGC is trained with the following loss function:

$$\mathcal{L} = E(x_i, \hat{x}_i) + \lambda \big\|\mathbf{W}^{(1)}\big\|_1 + \gamma \sum_{n \ge 2} \big\|\mathbf{W}^{(n)}\big\|_2^2 \qquad (4)$$

where the error function $E$ used is the mean squared error, $\mathbf{W}^{(n)}$ denotes the weights of the $n$-th layer of the neural network, while $\lambda$ and $\gamma$ are both regularization parameters. The L1 penalty weighted by $\lambda$ actively selects for salient variables in the first hidden layer by enforcing sparsity, while $\gamma$ serves to prevent overfitting in the subsequent layers. Due to the relatively minimal direct contribution of $\gamma$ to variable selection, $\gamma$ is fixed to a small value (0.01).
Granger causal variables are subsequently selected in a two-part procedure29, respectively testing for significance and consistency. Significance is tested under the assumption that the system contains a mixture of irrelevant variables and Granger causal variables, where the latter are relatively sparse compared to the former. Due to L1 regularization, the estimated signed importance scores $\langle J_{ij\tau}(t)\rangle_t$ for irrelevant variables are clustered together with small values near zero, which we assume to be Gaussian (Fig. 2). The minimum covariance determinant estimator37 is then applied to the z-scores of the mean Jacobian values to estimate the mean and variance of this Gaussian null distribution, following which we impose the criterion that a variable is considered significant if its mean Jacobian is at least 1 standard deviation away from the mean of the null distribution. This criterion is deliberately lenient to prevent the exclusion of weak causes; the subsequent test serves to minimize the false identification of causes.
Fig. 2.
The Jacobian as a function of time (a–c) and the corresponding distribution of the mean Jacobian (d–f) at different regularization regimes: (a,d) under-regularization, (b,e) balanced regularization, (c,f) over-regularization. All plots share the same target variable, and the different curves in (a–c) and data points in (d–f) correspond to different candidate causes. These plots are obtained from data in the simulated spike train section.
The subsequent consistency test seeks to find the largest stable set of variables with consistently high importance scores, which requires multiple independent runs of the full procedure (with independent initialization and training of the neural networks); in our case we find 3 independent runs to suffice. With these three replicates, the consistency test then begins with the corresponding three sets of significant variables. The three sets are iteratively trimmed by removing all variables not in the intersection of the three sets, as well as all variables whose importance scores are lower than any of the removed variables. The latter enforces the assumption that removed variables (which do not contribute consistently) are not Granger causes, and therefore all other variables with importance scores weaker than these removed variables are also not Granger causes. This is repeated until convergence, and the variables remaining in the final convergent set are taken to be the Granger causal variables. The interaction sign of each Granger causal variable can then be extracted from the sign of the corresponding estimated signed importance score $\langle J_{ij\tau}(t)\rangle_t$. The full algorithm describing this procedure is given in Algorithm 1.
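The trimming loop described above can be sketched as follows. This is a simplified, hypothetical rendering of the consistency test, where `replicates` maps each run's significant variables to their importance scores:

```python
def consistent_variables(replicates):
    """Iterative trimming step of the consistency test (a sketch).

    replicates : list of dicts, one per independent run, mapping each
                 significant variable to its importance score |<J>|.
    Returns the convergent set of consistently selected variables.
    """
    sets = [dict(r) for r in replicates]
    while True:
        common = set.intersection(*(set(s) for s in sets))
        changed = False
        for s in sets:
            removed = [v for v in s if v not in common]
            if not removed:
                continue
            cutoff = max(s[v] for v in removed)
            for v in list(s):
                # drop variables outside the intersection, plus any variable
                # weaker than a removed (inconsistent) one
                if v not in common or s[v] < cutoff:
                    del s[v]
            changed = True
        if not changed:
            return common
```

For example, a variable appearing in only two of three runs is removed, and so is any variable whose score falls below that removed variable's score.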
Algorithm 1.
Inference of consistent variables
The formulation of JGC requires a separate neural network for each target variable. This allows two benefits: the regularization parameter $\lambda$ can be individually tuned for each target variable, and the loss function can be adapted to the specific data type of each target variable, which allows the analysis of heterogeneous data. In particular, we seek to replace the mean squared error term ($E$ in (4)) with other loss functions for other data types. We note that the other procedures in JGC, including the two-part variable selection procedure (i.e. the significance and consistency tests), remain the same; only the loss function is changed according to the data type of the target variable.
Loss functions for heterogeneous data types
JGC in its original formulation uses the mean squared error loss function. In this section, we discuss other loss functions that cater to count and binary data. In addition, we discuss a different formulation for sparse count data that may potentially present an improvement over the other loss functions, and subsequently compare these formulations.
We begin by noting that the mean squared error in the original loss function (4) can be interpreted as a maximum likelihood estimation given the assumption that the conditional distribution of the output given input is Gaussian with constant variance38. This may present a problem when the true output distribution significantly deviates from this assumption, although to what extent this is a problem is unclear due to the high flexibility of neural networks. In the same manner as generalized linear models, we can construct different loss functions by starting with different fundamental assumptions on the generating distribution.
Poisson loss for count data
The most common loss function used for discrete data assumes the Poisson distribution. Other than the discreteness assumption, another significant difference from the Gaussian assumption of the mean squared error is that its variance is non-constant and scales with the mean, and that the distribution is asymmetric at small mean values. In the same manner as Poisson regression in generalized linear models, the Poisson loss for a given observation $y$ can be obtained by minimizing its negative log-likelihood, yielding (up to a constant)39:

$$E_{\text{Poisson}}(y, \hat{\mu}) = \hat{\mu} - y \log \hat{\mu} \qquad (5)$$

where $\hat{\mu}$ is the mean Poisson rate, which in this case is what we seek to model using the neural network.
In the neural network implementation, the activation function of the output neuron is set to be the exponential function to encode the non-negativity constraint in count data.
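A minimal sketch of this loss with the exponential output activation folded in; the function name and the use of the pre-activation `eta` are our own conventions:

```python
import math

def poisson_loss(y, eta):
    """Poisson negative log-likelihood (up to the constant log y!) for a
    single observation y, where eta is the network's pre-activation output
    and mu = exp(eta) is the predicted Poisson rate, as in eq. (5)."""
    mu = math.exp(eta)  # exponential output activation: mu > 0 by construction
    return mu - y * math.log(mu)  # equals mu - y * eta
```

The loss is minimized when the predicted rate equals the observed count, i.e. at `eta = log(y)` for `y > 0`.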
Binary cross-entropy for binary data
Binary data is typically modeled using the Bernoulli distribution, and minimizing its negative log-likelihood leads to the binary cross-entropy loss function, a standard loss function for binary classification tasks in neural networks as well as logistic regression. Given a binary observation $y$ and an output probability $\hat{p}$, the binary cross-entropy is given by39,40:

$$E_{\text{BCE}}(y, \hat{p}) = -\big[y \log \hat{p} + (1 - y) \log(1 - \hat{p})\big] \qquad (6)$$

To enforce the probabilistic interpretation of the output $\hat{p}$, the activation function of the output neuron in this case has to be the sigmoid function, thereby keeping the output value in the (0, 1) range. We note, as we shall see subsequently, that it is possible to encounter count data that are effectively binary (i.e. taking values no larger than 1) when the data are very sparse. If this loss function is used for count data that are approximately binary (i.e. with very low likelihood of observing counts larger than 1), the data have to be binarized prior to neural network training.
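For completeness, the corresponding loss in code form (names are illustrative):

```python
import math

def sigmoid(eta):
    """Sigmoid output activation, mapping any real eta into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def bce_loss(y, p):
    """Binary cross-entropy (eq. 6) for a single observation y in {0, 1}
    and predicted probability p = sigmoid(eta) in (0, 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))
```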
Hurdle loss for sparse count data
In the case of sparse count data, the prevalence of zeros may potentially lead to suboptimal representations. We seek to minimize this impact by introducing a separate component to the model for the zeros. In the modeling of discrete data, there are two common modeling approaches for this41, namely the zero-inflated model42 and the hurdle model35,36.
In this paper, we focus on the hurdle model, which is a mixture of a Bernoulli distribution and a truncated count distribution, the latter being truncated at zero. As before, we model the count distribution using the Poisson distribution. The Bernoulli component models the two states of zero and nonzero values, while the Poisson component models the nonzero integers. Given an observation of the target variable $y$, we define an indicator variable $z = \mathbb{1}[y > 0]$ and denote $\pi$ as the parameter of the Bernoulli component and $\mu$ as the mean rate parameter of the Poisson component. We can therefore write the likelihood function of the hurdle Poisson model as35,36,41:

$$P(y) = \begin{cases} 1 - \pi, & y = 0 \\ \pi \, \dfrac{\mu^{y} e^{-\mu}}{y!\,\big(1 - e^{-\mu}\big)}, & y > 0 \end{cases} \qquad (7)$$
Minimizing the negative log-likelihood and removing all terms independent of the parameters yields:

$$E_{\text{hurdle}}(y, \pi, \mu) = -z \log \pi - (1 - z) \log(1 - \pi) + z\big[\mu - y \log \mu + \log\big(1 - e^{-\mu}\big)\big] \qquad (8)$$
The standard hurdle Poisson model expresses $\pi$ and $\mu$ through a generalized linear model of the covariates with the appropriate link functions for the respective distributions. Here, we generalize this by letting $\pi$ and $\mu$ be arbitrary nonlinear functions of the covariates through the use of neural networks. One significant difference from the original hurdle Poisson model is that in the original case, $\pi$ and $\mu$ are modeled separately with different weights. Here we model these two quantities with weight sharing, which allows the shared weights to learn from both the Poisson and Bernoulli perspectives. In particular, the neural network weights are shared in all layers except the final one. Subsequently, for a given target variable $x_i$, the importance of each variable $x_j(t-\tau)$ is then measured via the Jacobian as in (3).
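The hurdle loss (8) can be written compactly as follows (a sketch; the argument names are our own):

```python
import math

def hurdle_loss(y, pi, mu):
    """Negative log-likelihood of the hurdle Poisson model (eq. 8),
    dropping the constant log(y!) term.

    pi : Bernoulli probability that y > 0.
    mu : rate of the zero-truncated Poisson component.
    """
    z = 1.0 if y > 0 else 0.0  # indicator variable z = 1[y > 0]
    bernoulli = -z * math.log(pi) - (1 - z) * math.log(1 - pi)
    # truncated-Poisson term only contributes for nonzero observations
    truncated_poisson = z * (mu - y * math.log(mu)
                             + math.log(1 - math.exp(-mu)))
    return bernoulli + truncated_poisson
```

For a zero observation only the Bernoulli term survives, so `hurdle_loss(0, pi, mu)` reduces to `-log(1 - pi)`.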
Selection of $\lambda$
The $\lambda$ term in the loss function (4) serves to control the sparsity of the representation, which naturally has to be tuned to optimally represent the system. While this tuning is typically done by examining predictive performance on a held-out dataset, this is a challenge for variable selection, as it is known that a lower prediction error need not correspond to a more accurate set of selected variables26,43. Also, while the variable selection performance was observed to be relatively stable with respect to $\lambda$ for the original formulation of JGC29, we see later that the same is not true when other loss functions are used. Due to this, a criterion for selecting $\lambda$ is required when these other loss functions are used. In this section, we describe a new graphical approach for identifying the different regularization regimes and introduce a heuristic for $\lambda$ selection that makes use of these characteristics.
We begin by noting that a severely over-regularized regime can be readily identified by inspecting the plot of the Jacobian as a time series. As we see in Fig. 2c,f, severe over-regularization leads to the dropping of all input variables, thereby compressing the Jacobian into constants of very small magnitude. The under-regularized regime, however, is not as obvious. Even then, we can see in Fig. 2a,d that when under-regularized, irrelevant variables are readily used for the prediction of the target variable, leading to stronger mixing between the null and non-null components of the mixture and causing the distribution of the mean Jacobian as a whole to approach a Gaussian shape. In other words, if there is no apparent separation between potential causes and irrelevant variables, then it is likely that the applied regularization is too weak. While optimizing $\lambda$ may be a difficult task, these observations can, at the very least, guide the user towards a good enough value of $\lambda$ in between the over- and under-regularized regimes, such as that seen in Fig. 2b,e. In particular, we note the clear separation between irrelevant and significant variables in Fig. 2e, which allows identification even by eye. While Fig. 2f appears to share this property, the very small magnitudes imply that all variables are insignificant.
In our experiments, we begin by running JGC over a range of $\lambda$ values. We then test against over-regularization by only accepting values of $\lambda$ whose Jacobian has at least a single occurrence of a value with magnitude $\ge 0.01$ over all time $t$. We highlight that this threshold of 0.01 was experimentally verified in all our experimental data by observing the Jacobian values using different values of $\lambda$ covering the 3 regimes. In all our tested datasets, over-regularized Jacobian values all have magnitudes $< 0.01$, while balanced and under-regularized Jacobians have at least one value with magnitude $\ge 0.01$ for some $t$. Figure 2 is representative of this observation across all our datasets. In practice, therefore, we suggest performing similar observational tests on the dataset being studied before deciding on a suitable Jacobian threshold for the over-regularized regime. We additionally present an argument for why over-regularization shrinks the Jacobian in the Supplementary Material.
Lastly, we selected from the remaining $\lambda$ values the one yielding the largest number of Granger causal variables. Where multiple $\lambda$ values tied with the same number of selected variables, we chose the largest $\lambda$.
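The full heuristic can be summarized as follows (a hypothetical sketch; the `results` structure is our own convention, and the default threshold mirrors the value verified above):

```python
def select_lambda(results, jac_threshold=0.01):
    """Select lambda by the heuristic described above.

    results : dict mapping lambda -> (jacobian_magnitudes, selected_vars),
              where jacobian_magnitudes is an iterable of observed |J(t)|
              values and selected_vars is the set of inferred Granger
              causal variables at that lambda.
    Returns the chosen lambda, or None if every lambda is over-regularized.
    """
    # reject over-regularized lambdas: no Jacobian value reaches the threshold
    admissible = {lam: sel for lam, (jac, sel) in results.items()
                  if max(jac, default=0.0) >= jac_threshold}
    if not admissible:
        return None
    # keep the lambda with the most selected variables; break ties upward
    best = max(len(sel) for sel in admissible.values())
    return max(lam for lam, sel in admissible.items() if len(sel) == best)
```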
Experimental results and discussion
In this section, we test the performance of JGC on two simulated datasets, focusing on the correct identification of the full Granger causal network, and subsequently apply JGC to one real dataset. We label the different tested loss functions by the assumed underlying distribution, with the exception of the hurdle model as it involves a mixture of distributions; the mean squared error and binary cross-entropy are therefore labeled as Gaussian and Bernoulli loss respectively. We make use of up to two separate metrics in our analyses, namely the area under the precision-recall curve (AUPRC) and the F-score. Both measures are similar in that they involve the same quantities: precision ($= 1 -$ false discovery rate) and recall ($= 1 -$ false negative rate), and both are bounded in [0, 1], where a larger score implies stronger performance. On the other hand, AUPRC focuses on the raw importance scores while the F-score deals with the final variable selection results. AUPRC measures how well the importance scores distinguish the true Granger causal variables from the irrelevant ones, and achieves a perfect score of 1 if a threshold can be found that perfectly separates the two groups. If not, it measures how good a theoretical best threshold is, based on how the balance between precision and recall is impacted. In contrast, the F-score simply looks at a specific variable selection outcome and computes the harmonic mean of the resulting precision and recall. With these two metrics, we can separately measure the performance of the Jacobian as an importance score, as well as the combined performance with our variable selection procedure.
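As a concrete example of the second metric, the F-score over true and inferred edge sets can be computed as:

```python
def f_score(true_edges, inferred_edges):
    """Harmonic mean of precision and recall over edge sets.

    true_edges     : set of ground-truth Granger causal edges.
    inferred_edges : set of edges inferred by the algorithm.
    """
    tp = len(true_edges & inferred_edges)  # true positives
    if tp == 0:
        return 0.0
    precision = tp / len(inferred_edges)   # = 1 - false discovery rate
    recall = tp / len(true_edges)          # = 1 - false negative rate
    return 2 * precision * recall / (precision + recall)
```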
We begin by simulating a simple additive branching process in both sparse and non-sparse regimes to characterize and compare the different loss functions on JGC underlying different fundamental assumptions on the data type. Subsequently, we consider a more complex simulated dataset of neural spike trains and compare the performance of JGC against two existing approaches, AGC and HMML. In both experiments, we repeated the Granger causal inference 3 times with different data and present the aggregated results. Lastly, we apply JGC to real spike train data of monkey visual cortex.
Further information on experimental hyperparameters for JGC when applied to each of these datasets can be found in Table S1. The source code for JGC and the simulated data used in this paper can be found on https://github.com/suryadi-t/Discrete-JGC. The real spike train data can be obtained by following the citations given in the beginning of its section.
Additive branching process
In a typical branching process, we consider a network of nodes with active and passive states such that an active node $i$ has some branching probability $p_{ij}$ of causing a downstream connected node $j$ to be active in the next timestep. This typically involves binary states (active and inactive); in this simulation, however, we allow the activity to be additive: each "activation" adds 1 to a node's value (from a baseline value of 0), and multiple activations can occur at a single node in any given timestep, thereby allowing any nonnegative integer state and generalizing the branching process towards count data. Mathematically, we can represent the activity $x_j(t)$ of node $j$ with parent nodes $\mathrm{pa}(j)$ as:

$$x_j(t) = \eta_j(t) + \sum_{i \in \mathrm{pa}(j)} b_{ij}(t), \qquad \eta_j(t) \sim \mathrm{Poisson}(\lambda_0), \quad b_{ij}(t) \sim \mathrm{Bernoulli}\big(p_{ij}\,\mathbb{1}[x_i(t-1) > 0]\big) \qquad (9)$$

where $\eta_j(t)$ is the spontaneous activity and $b_{ij}(t)$ indicates a successful branching event from parent $i$. To prevent runaway activity, we set it such that activations larger than 1 in a given node have the same probability of stimulating a downstream node as an activation of value 1. At the end of each timestep, the value of each node is reset to 0, and the node can be activated again in the next timestep.
Here we generated random directed networks of 50 nodes with a fixed probability of there being an edge between any two distinct nodes. At any given timestep, each node has a random chance to gain activity spontaneously, with a Poisson mean rate $\lambda_0$. To ensure stability, we set the branching probabilities such that the total branching probability into each node sums to less than 1, divided equally among its incoming connections. For this system we generated two different cases (Fig. 3): one which is non-sparse (larger $\lambda_0$) and one which is sparse (smaller $\lambda_0$). These two cases have mean activities on the order of magnitude of 1 and 0.01 respectively.
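A minimal simulation of this process, following the form of eq. (9), can be sketched as below; the parameter values in the usage example are illustrative rather than the paper's (the paper's settings are in its Table S1).

```python
import numpy as np

def simulate_branching(adj_p, lam0, T, seed=0):
    """Simulate the additive branching process.

    adj_p : (N, N) array, adj_p[i, j] = branching probability p_ij from
            node i to node j (0 where there is no edge).
    lam0  : spontaneous Poisson mean rate per node per timestep.
    T     : number of timesteps.
    Returns a (T, N) integer array of node activities.
    """
    rng = np.random.default_rng(seed)
    N = adj_p.shape[0]
    X = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        spontaneous = rng.poisson(lam0, size=N)
        # a parent active at t-1 (x_i > 0) stimulates child j with prob p_ij;
        # activations above 1 stimulate with the same probability as 1,
        # and each node's value was reset to 0 at the end of the last step
        active = (X[t - 1] > 0).astype(float)
        branched = rng.random((N, N)) < (adj_p * active[:, None])
        X[t] = spontaneous + branched.sum(axis=0)
    return X

# illustrative usage: 4 nodes, all-to-all weak coupling, moderate drive
X = simulate_branching(np.full((4, 4), 0.2), lam0=0.5, T=300, seed=1)
```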
Fig. 3.
Sample data and results for one of the experiments on the additive branching process: (a,b) sample time series of one variable, (c,e) the true adjacency matrices, (d,f) the false discovery rate (FDR) and false negative rate (FNR) computed from the resulting adjacency matrix inferred by JGC. (a,c,d) correspond to the non-sparse case while (b,e,f) correspond to the sparse case.
Figure 4 shows the AUPRC and F-score of each approach at different values of the regularization parameter $\lambda$ on both the non-sparse (Fig. 4a,b) and sparse (Fig. 4d,e) data, with numerical values given in Table S2. In the non-sparse case, we immediately observe that the Gaussian loss has greater stability with respect to $\lambda$, while the other losses are much more sensitive to it. This suggests that one should consider biasing $\lambda$ towards smaller values when using the non-Gaussian loss functions. The highest scores are comparable among the different methods, with the exception of the Bernoulli loss; this makes sense as it requires binarizing the data, which entails a significant loss of information in the non-sparse case. We also see that in the non-sparse case, the Gaussian loss performs very well despite its continuity assumption. While the hurdle model at its best does not outperform the Poisson and Bernoulli losses, we observe greater stability in its results with respect to $\lambda$.
Fig. 4.
The AUPRC (a,d) and F-score (b,e) attained by JGC using different loss functions for both the non-sparse (a–c) and sparse (d–f) realizations of the additive branching process, as a function of the regularization parameter $\lambda$. (c,f) show the F-score achieved by JGC when selecting $\lambda$ according to our proposed heuristic, with the bars given in the order: Hurdle, Poisson, Bernoulli, Gaussian. The exact numbers are given in the Supplementary Material.
The limitation of the Gaussian loss is immediately apparent in the sparse case. Here, the three non-Gaussian losses perform very similarly. In particular, Bernoulli loss performs comparably well to Poisson and hurdle losses due to the rarity of values larger than 1 in this sparse case, in which case binarizing the data (which is needed for the Bernoulli loss) does not lead to much loss of information.
Lastly, we also applied our $\lambda$ selection heuristic and plot the resulting F-scores in Fig. 4c,f (numerical values given in Table S3). Here we emphasize that $\lambda$ is independently selected for each target variable, as the optimal $\lambda$ should in general differ due to the different number of Granger causes and the different weights of each Granger causal relation. We see that the selection heuristic performs well in both the non-sparse and sparse cases, with the resulting F-scores not far from the peaks shown in Fig. 4. Here we observe no clear advantage of the hurdle loss over the other loss functions.
Simulated neural spike train
Here we test the performance of JGC with its different loss functions on simulated neural spike train data9, where the simulation contains a network of 9 self-inhibiting neurons with a mix of excitatory and inhibitory connections to other neurons (Fig. 5). We compare the performance with existing methods for count data, namely AGC10 and HMML34, at three different data lengths T of 500, 1000, and 2000 (Fig. 6). Activity is very sparse in this dataset, where each neuron is active less than 3% of the time, and the activity is binary in this case. For this dataset, we compare the result of the full pipeline, i.e. for JGC we perform the $\lambda$ selection heuristic, and compare the inferred Granger causal variables using the F-score. We see in Fig. 6a that the results for JGC correspond well to the sparse case of Fig. 4d–f, where the Gaussian loss performs much worse than the remaining three losses, which perform very similarly. We also see that these top 3 loss functions in JGC significantly outperform both AGC and HMML.
Fig. 5.

Sample data and results for one of the experiments on the simulated neural spike train data: (a) raster plot of the system, (b) the true adjacency matrix, (c) the adjacency matrix inferred by JGC.
Fig. 6.

Comparison of the different JGC loss functions, AGC, and HMML on simulated neural spike train data at different time series lengths T: (a) F-score, (b) adjusted sensitivity, (c) false discovery rate (FDR), (d) false negative rate (FNR). The adjusted sensitivity is computed as the fraction of correctly identified interaction signs among the inferred edges that exist in the ground truth. As we are unable to extract interaction signs from HMML, it is excluded from (b). The exact numbers are given in the Supplementary Material.
We further unpack this difference by inspecting the individual differences in the false discovery rate (FDR) and false negative rate (FNR) in Fig. 6c,d. We note that we use FDR instead of the false positive rate (FPR) here because, in systems like this with sparse true positives, FDR serves as a stricter and more representative measure of false positives than FPR. Compared with HMML, JGC with the hurdle, Bernoulli, and Poisson losses has similar FDR values, which are relatively high due to the sparse data and short time series lengths used; recall that the data has less than 3% active bins. JGC with these three losses has significantly lower FNR than both AGC and HMML, and it is this lower FNR that explains the significant difference in F-scores between JGC and the two approaches. While we do observe that AGC and JGC with the Gaussian loss have significantly lower FDR than the other JGC loss functions, this is not a strength, as it comes at the cost of a much higher FNR, comparable to that of HMML.
As this network contains a mixture of excitatory (positive) and inhibitory (negative) connections, another aspect of interest is whether JGC is able to correctly identify the interaction sign. To investigate this property, we consider only the inferred connections that exist in the ground truth network and compute the fraction of these connections whose sign is correctly inferred. We denote this quantity as the adjusted sensitivity. Figure 6b shows this score for only JGC and AGC, as we are unable to extract the sign of the inferred connections from HMML. As Fig. 6b indicates, JGC is generally reliable in inferring the sign when the inferred connection exists in the ground truth, around 90% of the time in this simulated data. While the Gaussian loss may appear to have higher adjusted sensitivity, we note that this is due to the much smaller number of correctly inferred interactions (as is also hinted by the high FNR). In other words, as the Gaussian loss could only infer the true connections that are easier to discover, their signs are also easier to infer correctly.
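To make these evaluation measures concrete, the following sketch computes the F-score, FDR, FNR, and adjusted sensitivity from signed adjacency matrices. The confusion counts and matrices are toy values, and the helper function is ours (not from the paper's code); the metric definitions follow the text above.

```python
def evaluation_metrics(true_adj, inferred_adj):
    """Compute F-score, FDR, FNR, and adjusted sensitivity for signed
    adjacency matrices (entries +1 excitatory, -1 inhibitory, 0 absent)."""
    tp = fp = fn = sign_correct = 0
    n = len(true_adj)
    for i in range(n):
        for j in range(n):
            t, p = true_adj[i][j], inferred_adj[i][j]
            if p != 0 and t != 0:
                tp += 1
                if p == t:           # inferred sign agrees with ground truth
                    sign_correct += 1
            elif p != 0:
                fp += 1
            elif t != 0:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    fdr = fp / (tp + fp) if tp + fp else 0.0      # false discovery rate
    fnr = fn / (tp + fn) if tp + fn else 0.0      # false negative rate
    adj_sens = sign_correct / tp if tp else 0.0   # adjusted sensitivity
    return f_score, fdr, fnr, adj_sens

# Toy 3-neuron example
true_adj = [[0, 1, -1], [0, 0, 1], [-1, 0, 0]]
inferred = [[0, 1, -1], [1, 0, 0], [1, 0, 0]]
f, fdr, fnr, adj = evaluation_metrics(true_adj, inferred)
```

Here the inferred network recovers three of the four true edges (one with the wrong sign) and adds one spurious edge, giving an F-score of 0.75 and an adjusted sensitivity of 2/3.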
Real spike train data in monkey visual cortex
Here we apply JGC with the Poisson loss on real spike train data44,45. In this dataset, neural spiking activity was measured in anesthetized macaque primary visual cortex using multi-unit electrode arrays with 400 μm spacing, with spike sorting performed after measurement to identify the distinct neurons underlying the measured activity. In this section, we make use of the evoked dataset, which involves repeated measurements on two different monkeys with the microelectrode array position intact over the different repetitions and stimulus presentations. We analyzed separately the first two repeated measurements on each monkey presented with two stimuli: a movie of white noise (30 s) and a movie of a contiguous natural scene (a monkey wading through water, 30 s). In each case, the data was binned at time bins of 5 ms, leading to 6000 time bins in each dataset. Activity is sparse in all cases, with only a small fraction of active bins under both the white noise and the natural movie stimuli. Figure 7 shows a sample raster plot and the firing rate distribution for all the datasets studied in this work.
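The binning step described above can be sketched as follows, a minimal example assuming spike times are given in seconds (the toy spike train is illustrative; the 5 ms bin width and 30 s duration are from the dataset description):

```python
import numpy as np

def bin_spikes(spike_times, duration=30.0, bin_width=0.005):
    """Bin spike times (in seconds) into spike counts per time bin.
    30 s at 5 ms bins gives 6000 bins, as in the dataset used here."""
    n_bins = int(round(duration / bin_width))
    edges = np.linspace(0.0, duration, n_bins + 1)
    counts, _ = np.histogram(spike_times, bins=edges)
    return counts

spikes = [0.001, 0.003, 0.012, 1.750, 29.999]  # toy spike train
counts = bin_spikes(spikes)
```

The resulting count vector (one entry per 5 ms bin) is the discrete time series on which the Poisson loss operates.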
Fig. 7.
Raster plot and descriptive statistics for the real spike train data: (a,b) Raster plot for the first experiment on monkey 1 with (a) white noise and (b) natural movie. The experiment is 30 seconds long, but the raster plots are only plotted to 6 seconds for better resolution. (c) Shows the descriptive statistics for each monkey with white noise and natural movie stimuli. Each group has 2 box plots, one for each experiment.
As the data contains counts larger than 1, and as we did not observe any major benefit in using the hurdle model in the earlier experiments, we analyzed this dataset using JGC with the Poisson loss. The sensitivity of the results with respect to the hyperparameter choice is given in Fig. S1.
Figure 8 displays the excitatory component of the two resulting networks inferred using JGC from the first measurement of monkey 1 presented with the white noise and natural scene respectively, with the corresponding adjacency matrices given in Fig. 9a,b. The corresponding networks and adjacency matrices for the other monkey and measurements are given in Figs. S2 and S3. We focus the presentation in this figure on the excitatory component of the network (i.e. omitting the inhibitory connections) as there is a clear contrast between the two in this aspect: the white noise case presents a relatively dense network, while the natural movie case displays a much sparser and more modular network. The same pattern is observed in the second experiment (Fig. S2a-b), suggesting an overall more structured neural activity pattern, at least in the excitatory circuit, in response to natural visual stimulation. This contrast is however absent in the second monkey for both experiments (Fig. S2c-f). This suggests the individual subject as a potential source of variance, or possibly the experimental setup itself (where the neurons probed in the two monkeys may be quite different in function), or that JGC is unable to capture this aspect of the structure in the second monkey. It is worth noting that the number of neurons measured in monkey 2 (123) is almost twice that of monkey 1 (74). Even so, we later see that a measurable difference nevertheless exists between the white noise and natural movie cases, even for monkey 2. In subsequent network analyses, we include the full network into consideration, i.e. with both excitatory and inhibitory components.
Fig. 8.
The excitatory component of the inferred functional networks for the first measurement of the primary visual cortex in monkey 1 when presented with the stimulus (a) white noise; (b) natural movie. The color of the nodes labels the inferred communities, with the isolated grey nodes indicating nodes not in any community.
Fig. 9.
Details on the inferred networks. (a,b) Inferred adjacency matrices for the first measurement of the primary visual cortex in monkey 1 when presented with the stimulus (a) white noise; (b) natural movie. Rows represent source nodes and columns represent target nodes. The gray areas represent the absence of edges. We observe a sparser network in (a) relative to (b), while (b) shows a larger number of self-excitatory connections along the diagonal. (c–j) Joint distribution of the in-degree and out-degree in each network. (c–f) monkey 1, (g–j) monkey 2; (c,d,g,h) white noise, (e,f,i,j) natural movie. In each monkey-stimulus pair, the plots are ordered from experiment 1 to 2. The colorbar measures the frequency of each bin.
We next examine the degree distributions in the inferred networks. Generally, for both the in-degree and out-degree (Figs. S4 and S5), we observe a preference for low degrees (typically less than 5), with a small probability of higher degrees. In all cases, we do not observe power law distributions (likelihood ratio tests were performed using the powerlaw package46, and in all cases the power law distribution is not preferred over the exponential distribution); hence there is no evidence of a scale-free property in these networks. While this agrees with the study suggesting that scale-free networks are empirically rare47, we nevertheless note that the lack of scale-free property here could potentially be due to finite size effects48, as the sizes of the analyzed networks are not large enough to cover several orders of magnitude in the degree distribution. We additionally examined the joint distribution of the in- and out-degrees (Fig. 9c–j), and interestingly, there appears to be a distinct preference for low in-degree with a wider spread of the out-degree, indicating that neurons in this dataset may generally have more outgoing connections than incoming ones.
Here we investigate the possibility of the small-world property49 in these networks, which is characterized by low average shortest path lengths and a high average clustering coefficient relative to random networks. For directed networks, these measures can only be computed if the network is weakly connected (i.e. there exists a path between any two neurons in the undirected version of the network), hence we applied them to the largest weakly connected component of each inferred network. For each inferred network, we compared the two measures against 500 weakly connected random networks of the same size and sparsity (Fig. S6) and found that generally, with one exception, there is no significant difference between the inferred networks and the random networks on either measure, suggesting that these inferred networks do not exhibit the small-world property.
Table 1 summarizes the connectivity of the inferred networks among distinct neurons, which excludes any inferred self-connections. It is notable that the resulting networks from the same monkey-stimulus pair have a similar number of edges across the two experiments. The relatively small number of edges also means that the sparsity is consistent across all cases, with under 7% of all possible pairwise connections being present. Another interesting point of note is that among distinct neurons, there appear to be significantly more inhibitory connections than excitatory ones.
Table 1.
Summary of information on the monkey visual cortex dataset as well as the resulting network from JGC analysis on edges between distinct neurons.
| ID | Neurons | Stimulus | Edges (% sparsity), Expt 1 | Edges (% sparsity), Expt 2 | (+, −) Expt 1 | (+, −) Expt 2 | Shared edges | % edges shared, Expt 1 | % edges shared, Expt 2 |
|---|---|---|---|---|---|---|---|---|---|
| Monkey 1 | 74 | Noise | 302 (5.59%) | 327 (6.05%) | 114, 188 | 162, 165 | 26 | 8.61% | 7.95% |
|  |  | Natural | 364 (6.74%) | 340 (6.29%) | 55, 309 | 83, 257 | 46 | 12.64% | 13.53% |
| Monkey 2 | 123 | Noise | 403 (2.69%) | 377 (2.51%) | 127, 276 | 103, 274 | 21 | 5.21% | 5.57% |
|  |  | Natural | 433 (2.89%) | 406 (2.71%) | 134, 299 | 126, 280 | 35 | 8.08% | 8.62% |
The “Neurons” column indicates the total number of observed neurons in the system, which gives the network size N. Under the “Stimulus” column, “Noise” refers to white noise, while “Natural” refers to a movie of a natural scene, both of which were presented for 30 s. Sparsity here is computed as the fraction between the number of inferred edges and the total number of possible pairwise edges between distinct neurons (for a network of size N, this number is N(N − 1)). The (+, −) column breaks down the inferred edges into excitatory and inhibitory connections respectively. This table analyzes connections between distinct neurons; the corresponding analysis on self-connections is found in Table 2.
Between the two types of presented stimuli, we expect white noise to lead to less structured neural activity than the natural movie. We explored this idea by inspecting the commonality between the different measurements of the same monkey-stimulus pair. One striking observation is that in all cases, including the natural movie stimulus, only a small percentage of edges are inferred to be shared between the measurements of experiments 1 and 2. These numbers are shown in the shared edges column of Table 1. This is particularly interesting given that the number of edges does not significantly differ between the two measurements. It suggests that while trial-to-trial variability may exist in the functional network, the density of the network remains at a similar level.
While the fraction of shared edges is consistently small across the repeated experiments in all cases, we nevertheless observed a notable difference between the two stimuli: the percentage of edges present in both measurements is significantly higher (about 50% higher) in the natural stimulus case compared to white noise in each monkey. This agrees with our earlier intuition that the natural stimulus should lead to a more structured functional network, which manifests here through a larger set of consistent connections compared to the white noise stimulus.
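The shared-edge percentages in Tables 1 and 2 follow from a simple set intersection of the inferred edge lists; a minimal sketch with toy edge sets (not the actual inferred networks):

```python
# Each inferred network is represented as a set of directed edges (i, j).
edges_expt1 = {(0, 1), (1, 2), (2, 0), (3, 1)}
edges_expt2 = {(0, 1), (2, 0), (3, 2)}

shared = edges_expt1 & edges_expt2  # edges present in both experiments
pct_shared_1 = 100 * len(shared) / len(edges_expt1)
pct_shared_2 = 100 * len(shared) / len(edges_expt2)

# Sparsity among distinct neurons for a network of size N: the number of
# inferred edges divided by the N * (N - 1) possible directed pairs.
N = 4
sparsity_1 = 100 * len(edges_expt1) / (N * (N - 1))
```

Because the two experiments generally have different edge counts, the shared percentage is reported relative to each experiment separately, as in the two rightmost columns of Table 1.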
Next, we consider the corresponding analysis for self-connections in Table 2, where a self-connection refers to a Granger causal connection from a neuron to itself. As hinted by the adjacency matrices in Fig. 9a,b, one interesting observation is that the majority of the inferred self-connections are excitatory. This suggests the possibility of bursting dynamics, where neurons fire spikes repeatedly in rapid succession over a short timeframe, which has indeed been observed in the visual cortex50,51. We investigated this in Fig. 10, where for each dataset we segregated the data into neurons with inferred positive self-connections and the remaining neurons, denoted “non-positive”. In each group we computed the fraction of consecutive activity, i.e. the ratio between the number of times a neuron is active in two consecutive bins and its total number of active bins, and indeed, the positive group on average has at least twice as many such observations (this difference is significant by Welch’s t-test with unequal variances in all cases). While the non-positive neurons also exhibit this behavior to a smaller extent, JGC does not infer a self-connection if the activity could be better and sufficiently explained by the activity of other neurons.
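The fraction of consecutive activity used above can be computed from a binarized spike train as follows (a minimal sketch; the toy array is illustrative):

```python
import numpy as np

def consecutive_activity_fraction(spikes):
    """Fraction of a neuron's active bins that are immediately followed by
    another active bin, used here as a simple proxy for burst-like firing."""
    spikes = np.asarray(spikes, dtype=bool)
    n_active = spikes.sum()
    if n_active == 0:
        return 0.0
    # Count adjacent bin pairs where the neuron is active in both bins.
    consecutive = np.count_nonzero(spikes[:-1] & spikes[1:])
    return consecutive / n_active

train = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1]  # toy binarized spike train
frac = consecutive_activity_fraction(train)
```

Here the toy train has 6 active bins and 3 consecutive-active pairs, giving a fraction of 0.5; the group averages of this quantity are what Fig. 10 compares.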
Table 2.
Summary of information on the monkey visual cortex dataset and the resulting network from JGC analysis on edges for self-connections.
| ID | Neurons | Stimulus | Edges (% sparsity), Expt 1 | Edges (% sparsity), Expt 2 | (+, −) Expt 1 | (+, −) Expt 2 | Shared edges | % edges shared, Expt 1 | % edges shared, Expt 2 |
|---|---|---|---|---|---|---|---|---|---|
| Monkey 1 | 74 | Noise | 12 (16.22%) | 20 (27.03%) | 11, 1 | 20, 0 | 3 | 25.00% | 15.00% |
|  |  | Natural | 20 (27.03%) | 21 (28.38%) | 18, 2 | 21, 0 | 10 | 50.00% | 47.62% |
| Monkey 2 | 123 | Noise | 19 (15.45%) | 20 (16.26%) | 19, 0 | 18, 2 | 8 | 42.11% | 40.00% |
|  |  | Natural | 29 (23.58%) | 25 (20.33%) | 28, 1 | 24, 1 | 15 | 51.72% | 60.00% |
The columns are as defined in Table 1. As this table considers only self-connections, sparsity here is computed as the ratio between the number of inferred self-edges and the network size N.
Fig. 10.

Comparison of the average percentage of consecutive activity between neurons with (blue) and without (orange) an inferred positive self-connection. The percentage of consecutive activity was computed for each neuron as the ratio between the number of consecutive active time bins and the total number of active time bins. The bars for each monkey-stimulus pair are plotted in two sets, one for each experiment.
While neurons in the primary visual cortex, like other brain regions, are known to spontaneously burst52, bursting is also associated with the encoding of visual information in visual stimuli, such as orientation and spatial frequency53. Similarly, a study on awake monkeys54 showed that restricting the analysis on bursting (or otherwise very fast firing) neurons was necessary to produce a clear response map to stimuli, which was otherwise obscured by the background activity of other neurons. Given that the natural movie stimulus is likely to contain more structures that lie within the receptive fields of neurons in the primary visual cortex compared to white noise, our finding of a larger number of bursting neurons in the former is consistent with these studies.
Lastly, we also note that with regards to shared edges across different experiments, we observe the same situation in Table 2 as in Table 1: the natural movie case leads to a significantly larger set of shared edges, further supporting our intuition that the natural movie leads to a more structured activity pattern in the brain.
An important aspect to note of this dataset is that while the experiment was being performed, the monkeys were maintained in a state of anesthesia, and eye movements were minimized by paralyzing the animal, both through continuous drug infusion45. In addition, the pupils were dilated and supplementary lenses were used to bring the retinal image into focus. This implies that any neural response had to occur either spontaneously or by reflex in response to stimuli. The latter has been shown to be possible55: neurons in the primary visual cortex of anesthetized macaques and spider monkeys were observed to respond to stimuli systematically according to the receptive fields of the respective neurons. In other words, visual processing in the primary visual cortex occurs even when the subject is unconscious.
The authors of45 noted that the impact of anesthesia on their study, which focused on correlations between observed neurons, should be minimal. Firstly, sufentanil was used for maintaining anesthesia, which they claimed was often used in cortical studies and had no known discrepancy with data from awake animals. Their correlation results, including those on orientation tuning, were also found to correspond well with other studies. The authors concluded that while it was not possible to rule out some influence of anesthesia, the activity should at least correspond well with awake animals at the level of correlations. Subsequent studies do suggest that opioid anesthetics, such as the sufentanil used in this dataset, impact measured activity: anesthetized responses were observed to be noisier with higher response variability56. Subsequent analyses suggest that this could be attributed to fluctuating global network states due to anesthesia56. This is independent of stimulation; it has long been known that neural activity in the presence of a stimulus constitutes a combination of ongoing spontaneous activity and evoked activity from the stimulus, where the ongoing component can be as large as the evoked component57. These findings could explain why the numbers of shared edges between trials and stimuli are small. On the other hand, the evoked activity due to stimuli has some level of determinism57, although it is nevertheless influenced by the initial state of the network prior to stimulation57. This could explain both the presence of shared edges between trials of the structured stimulus (natural movie) and the fact that the number of shared edges is small, as the initial brain state across the two trials may not be identical.
While the application of anesthesia does affect neural measurements, our study nevertheless shows that even under anesthesia, the increase in bursting activity in the primary visual cortex does occur upon stimulation; this aspect is therefore not removed by anesthesia. The full extent of the similarities and differences between the anesthetized and awake states could only be fully elucidated through a thorough study comprising datasets of both states.
Conclusion
In this work, we extended Jacobian Granger causality (JGC) to count and binary data through the use of Poisson, Bernoulli, and hurdle loss functions. We compared the different loss functions in JGC using the additive branching process as well as simulated neural spike train data, and observed in particular the limitation of the mean squared error loss on sparse count data. This limitation is overcome by the other loss functions discussed in this work, and we subsequently showed through the simulated neural spike train data that JGC outperforms competing approaches. These results also suggest the strength of the Poisson loss on sparse count data, with no significant benefit from the hurdle formulation. We therefore applied JGC with the Poisson loss to real data measuring the primary visual cortex of macaques presented with two types of stimuli (white noise and a natural movie) over two repeated measurements. We observed that the inferred networks are sparse in all cases, and that while only a small number of edges are shared between the repeated measurements, the percentage of shared edges is significantly larger with the natural stimulus than with white noise, suggesting more structured activity evoked by the natural stimulus. In addition, we observed a larger set of neurons with positive self-connections in the natural stimulus case, whose burst-like activity is known to encode more visual information. This also agrees with our expectation that natural scenes provide a more meaningful visual stimulus, with higher information content, than white noise.
Supplementary Information
Author contributions
S. and L.Y.C. designed the research; S. performed numerical studies and simulations; S. and L.Y.C. analyzed the results; S. and L.Y.C. wrote the manuscript; L.Y.C., Y.S.O. and S. reviewed the manuscript.
Data availability
The datasets generated and/or analyzed during the current study are available in the data repository with the following weblink: https://github.com/suryadi-t/Discrete-JGC/tree/main/Data. The real monkey brain data is available at https://crcns.org/data-sets/vc/pvc-11.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-025-33385-w.
References
- 1. Perkel, D. H., Gerstein, G. L. & Moore, G. P. Neuronal spike trains and stochastic point processes: II. Simultaneous spike trains. Biophys. J. 7, 419–440 (1967).
- 2. Eldawlatly, S., Zhou, Y., Jin, R. & Oweiss, K. Reconstructing functional neuronal circuits using dynamic Bayesian networks. In 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society 5531–5534 (IEEE, 2008).
- 3. Eldawlatly, S., Zhou, Y., Jin, R. & Oweiss, K. G. On the use of dynamic Bayesian networks in reconstructing functional neuronal networks from spike train ensembles. Neural Comput. 22, 158–189 (2010).
- 4. Patnaik, D., Laxman, S. & Ramakrishnan, N. Discovering excitatory relationships using dynamic Bayesian networks. Knowl. Inf. Syst. 29, 273–303 (2011).
- 5. Chen, R. Causal network inference for neural ensemble activity. Neuroinformatics 19, 515–527 (2021).
- 6. Ito, S. et al. Extending transfer entropy improves identification of effective connectivity in a spiking cortical network model. PLoS ONE 6, e27431 (2011).
- 7. Orlandi, J. G., Stetter, O., Soriano, J., Geisel, T. & Battaglia, D. Transfer entropy reconstruction and labeling of neuronal connections from simulated calcium imaging. PLoS ONE 9, e98842 (2014).
- 8. Valdés-Sosa, P. A. et al. Estimating brain functional connectivity with sparse multivariate autoregression. Philos. Trans. R. Soc. B 360, 969–981 (2005).
- 9. Kim, S., Putrino, D., Ghosh, S. & Brown, E. N. A Granger causality measure for point process models of ensemble neural spiking activity. PLoS Comput. Biol. 7, e1001110 (2011).
- 10. Sheikhattar, A. et al. Extracting neuronal functional network dynamics via adaptive Granger causality analysis. Proc. Natl. Acad. Sci. USA 115, E3869–E3878 (2018).
- 11. Camps-Valls, G. et al. Discovering causal relations and equations from data. Phys. Rep. 1044, 1–68 (2023).
- 12. Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Morgan Kaufmann, 1988).
- 13. Murphy, K. P. Dynamic Bayesian Networks: Representation, Inference and Learning (University of California, 2002).
- 14. Shiguihara, P., Lopes, A. D. A. & Mauricio, D. Dynamic Bayesian network modeling, learning, and inference: A survey. IEEE Access 9, 117639–117648 (2021).
- 15. Schreiber, T. Measuring information transfer. Phys. Rev. Lett. 85, 461 (2000).
- 16. Sun, J. & Bollt, E. M. Causation entropy identifies indirect influences, dominance of neighbors and anticipatory couplings. Physica D 267, 49–57 (2014).
- 17. Wibral, M. et al. Measuring information-transfer delays. PLoS ONE 8, e55809 (2013).
- 18. Runge, J., Heitzig, J., Petoukhov, V. & Kurths, J. Escaping the curse of dimensionality in estimating multivariate transfer entropy. Phys. Rev. Lett. 108, 258701 (2012).
- 19. Papana, A., Papana-Dagiasis, A. & Siggiridou, E. Shortcomings of transfer entropy and partial transfer entropy: Extending them to escape the curse of dimensionality. Int. J. Bifurc. Chaos 30, 2050250 (2020).
- 20. Granger, C. W. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 424–438 (1969).
- 21. Barnett, L., Barrett, A. B. & Seth, A. K. Granger causality and transfer entropy are equivalent for Gaussian variables. Phys. Rev. Lett. 103, 238701 (2009).
- 22. Hlavácková-Schindler, K. Equivalence of Granger causality and transfer entropy: A generalization. Appl. Math. Sci. 5, 3637–3648 (2011).
- 23. Tank, A., Covert, I., Foti, N., Shojaie, A. & Fox, E. B. Neural Granger causality. IEEE Trans. Pattern Anal. Mach. Intell. (2021).
- 24. Khanna, S. & Tan, V. Y. Economy statistical recurrent units for inferring nonlinear Granger causality. arXiv preprint arXiv:1911.09879 (2019).
- 25. Nauta, M., Bucur, D. & Seifert, C. Causal discovery with attention-based convolutional neural networks. Mach. Learn. Knowl. Extract. 1, 312–340 (2019).
- 26. Marcinkevičs, R. & Vogt, J. E. Interpretable models for Granger causality using self-explaining neural networks. arXiv preprint arXiv:2101.07600 (2021).
- 27. Banerjee, A., Pathak, J., Roy, R., Restrepo, J. G. & Ott, E. Using machine learning to assess short term causal dependence and infer network links. Chaos 29, 121104 (2019).
- 28. Banerjee, A., Hart, J. D., Roy, R. & Ott, E. Machine learning link inference of noisy delay-coupled networks with optoelectronic experimental tests. Phys. Rev. X 11, 031014 (2021).
- 29. Suryadi, Chew, L. Y. & Ong, Y.-S. Granger causality using Jacobian in neural networks. Chaos 33, 023126 (2023). 10.1063/5.0106666.
- 30. Hornik, K., Stinchcombe, M. & White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359–366 (1989).
- 31. Shojaie, A. & Fox, E. B. Granger causality: A review and recent advances. Annu. Rev. Stat. Appl. 9, 289–319 (2022).
- 32. Sameshima, K. & Baccalá, L. A. Using partial directed coherence to describe neuronal ensemble interactions. J. Neurosci. Methods 94, 93–103 (1999).
- 33. Behzadi, S., Hlaváčková-Schindler, K. & Plant, C. Granger causality for heterogeneous processes. In Pacific-Asia Conference on Knowledge Discovery and Data Mining 463–475 (Springer, 2019).
- 34. Hlaváčková-Schindler, K. & Plant, C. Heterogeneous graphical Granger causality by minimum message length. Entropy 22, 1400 (2020).
- 35. Mullahy, J. Specification and testing of some modified count data models. J. Econom. 33, 341–365 (1986).
- 36. Heilbron, D. C. Zero-altered and other regression models for count data with added zeros. Biom. J. 36, 531–547 (1994).
- 37. Rousseeuw, P. J. & Driessen, K. V. A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212–223 (1999).
- 38. Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
- 39. Bühlmann, P. & van de Geer, S. Generalized linear models and the lasso. In Statistics for High-Dimensional Data 45–53 (Springer, 2011).
- 40. Bishop, C. M. Neural Networks for Pattern Recognition (Oxford University Press, 1995).
- 41. Feng, C. X. A comparison of zero-inflated and hurdle models for modeling zero-inflated count data. J. Stat. Distrib. Appl. 8, 1–19 (2021).
- 42. Lambert, D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 34, 1–14 (1992).
- 43. Meinshausen, N. & Bühlmann, P. Stability selection. J. R. Stat. Soc. Ser. B 72, 417–473 (2010).
- 44. Kohn, A. & Smith, M. A. Utah array extracellular recordings of spontaneous and visually evoked activity from anesthetized macaque primary visual cortex (V1). CRCNS.org 10.6080/K0NC5Z4X (2016).
- 45. Smith, M. A. & Kohn, A. Spatial and temporal scales of neuronal correlation in primary visual cortex. J. Neurosci. 28, 12591–12603 (2008).
- 46. Alstott, J., Bullmore, E. & Plenz, D. powerlaw: A Python package for analysis of heavy-tailed distributions. PLoS ONE 9, e85777 (2014).
- 47. Broido, A. D. & Clauset, A. Scale-free networks are rare. Nat. Commun. 10, 1–10 (2019).
- 48. Serafino, M. et al. True scale-free networks hidden by finite size effects. Proc. Natl. Acad. Sci. USA 118, e2013825118 (2021).
- 49. Watts, D. J. & Strogatz, S. H. Collective dynamics of small-world networks. Nature 393, 440–442 (1998).
- 50. Yinon, U. & Auerbach, E. Bursting patterns of neurons in the cat’s visual cortex. Exp. Neurol. 44, 71–81 (1974).
- 51. Onorato, I. et al. A distinct class of bursting neurons with strong gamma synchronization and stimulus selectivity in monkey V1. Neuron 105, 180–197 (2020).
- 52. Legendy, C. & Salcman, M. Bursts and recurrences of bursts in the spike trains of spontaneously active striate cortex neurons. J. Neurophysiol. 53, 926–939 (1985).
- 53. Krahe, R. & Gabbiani, F. Burst firing in sensory systems. Nat. Rev. Neurosci. 5, 13–23 (2004).
- 54. Livingstone, M., Freeman, D. & Hubel, D. Visual responses in V1 of freely viewing monkeys. In Cold Spring Harbor Symposia on Quantitative Biology vol. 61, 27–37 (Cold Spring Harbor Laboratory Press, 1996).
- 55. Hubel, D. H. & Wiesel, T. N. Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 195, 215–243 (1968).
- 56. Ecker, A. S. et al. State dependence of noise correlations in macaque primary visual cortex. Neuron 82, 235–248 (2014).
- 57. Arieli, A., Sterkin, A., Grinvald, A. & Aertsen, A. Dynamics of ongoing activity: Explanation of the large variability in evoked cortical responses. Science 273, 1868–1871 (1996).