Author manuscript; available in PMC: 2015 Feb 10.
Published in final edited form as: Neuroimage. 2013 May 3;79:241–263. doi: 10.1016/j.neuroimage.2013.04.091

Spatio-temporal Granger causality: a new framework

Qiang Luo 1,2, Wenlian Lu 3,4,5, Wei Cheng 3, Pedro A Valdes-Sosa 6, Xiaotong Wen 7, Mingzhou Ding 7, Jianfeng Feng 2,3,4,5,*
PMCID: PMC4323191  NIHMSID: NIHMS654943  PMID: 23643924

Abstract

That physiological oscillations of various frequencies are present in fMRI signals is the rule, not the exception. Herein, we propose a novel theoretical framework, spatio-temporal Granger causality, which allows us to more reliably and precisely estimate the Granger causality from experimental datasets possessing time-varying properties caused by physiological oscillations. Within this framework, Granger causality is redefined as a global index measuring the directed information flow between two time series with time-varying properties. Both theoretical analyses and numerical examples demonstrate that Granger causality is a monotonically increasing function of the temporal resolution used in the estimation. This is consistent with the general principle of coarse graining, which causes information loss by smoothing out very fine-scale details in time and space. Our results confirm that the Granger causality at the finer spatio-temporal scales considerably outperforms the traditional approach in terms of an improved consistency between two resting-state scans of the same subject. To optimally estimate the Granger causality, the proposed theoretical framework is implemented through a combination of several approaches, such as dividing the optimal time window and estimating the parameters at the fine temporal and spatial scales. Taken together, our approach provides a novel and robust framework for estimating the Granger causality from fMRI, EEG, and other related data.

Introduction

Granger causality, a standard statistical tool for detecting the directional influence of system components, plays a key role in understanding systems behaviour in many different areas, including economics (Chen et al., 2011), climate studies (Evan et al., 2011), genetics (Zhu et al., 2010) and neuroscience (Ge et al., 2012; Ge et al., 2009; Guo et al., 2008; Luo et al., 2011). The concept of Granger causality was originally proposed by Wiener in 1956 (Wiener, 1956), and introduced into data analysis by Granger in 1969 (Granger, 1969). The idea can be briefly described as follows: If the historical information of time series A significantly improves the prediction accuracy of the future of time series B in a multivariate autoregressive (MVAR) model, then the Granger causality from time series A to B is identified. In classic Granger causality, time-invariant MVAR models are used to fit the experimental data of the observed time series.

However, a time-varying property is a common phenomenon in various systems. For example, the gene regulatory network in Saccharomyces cerevisiae was reported to evolve its topology (Luscombe et al., 2004) with respect to different stimuli or different life processes. A time-varying protein-protein interaction network for p53 was reported in (Tuncbag et al., 2009), and the authors subsequently suggested the use of a 4D view of a protein-protein interaction network, with time being the 4th dimension. In the primary visual cortex of anaesthetized macaque monkeys, ensembles of neurons dynamically reorganize their effective connectivity from moment to moment (Ohiorhenuan et al., 2010). The importance of a slow oscillation, such as the theta rhythm, in a neuronal system was analysed in (Smerieri et al., 2010). It should be pointed out that even if the time series data are observed to be weakly stationary (i.e., stationary in the second moment), the system configuration may be time-varying. A typical example of this is $X_t = a\cos(\omega t + U_t) + \xi_t$, where t is time, a and ω are constants, $U_t \sim U[-\pi, \pi]$ is uniformly distributed, and $\xi_t$ is noise. It is thus natural to consider time-varying systems and attempt to understand their impact on the estimation of Granger causality.

Analysing systems with time-varying structures has recently attracted greater interest, and many statistical methods have been proposed. An adaptive multivariate autoregressive model using short sliding time windows was proposed in (Ding et al., 2000) to deal with a non-stationary, event-related potential (ERP) time series. Inspecting the directed interdependencies of electroencephalography (EEG) data, a short time window approach to define time-dependent Granger causality was proposed in (Hesse et al., 2003). Time-varying Granger causality was also modelled using Markov-switching models in (Psaradakis et al., 2005). In these models, time-varying Granger causality was modelled using a hidden discrete Markov process with a finite state space. Wavelet-based time-varying Granger causality to establish the functional connectivity maps from fMRI data was suggested in (Sato et al., 2006). Considering the time-series data as independent and identically distributed observations, a method to infer the time-varying biological and social networks was proposed in (Ahmed and Xing, 2009), but this method did not provide the directional information of the time-varying relationship between variables. In (Havlicek et al., 2010; Sommerlade et al., 2012), the dual Kalman filter was used to establish time-varying Granger causality between non-stationary time series. These approaches extended the classic Granger causality analysis to a non-stationary case through adaptive multivariate autoregressive modelling under the assumption that the coefficients in the time-varying MVAR model can be modelled by a random walk. As a response to research dealing with the time-varying properties in the MVAR model, and the definition of Granger causality as a function with respect to time, we propose the use of a robust global index for measuring the direct information flow between time series, despite the time-varying properties. 
Granger causality is currently a popular model for this purpose, but classic Granger causality does not consider the time-varying properties of the data. Moreover, it is a widely held misconception that the longer the time series we have, the more reliable the results that are obtainable for Granger causality.

The aims of this paper are twofold:

  1. We answer the following question: What is the impact of the temporal scale in MVAR models on the resulting directional influence of Granger causality? For Gaussian variables, Granger causality is equivalent to the directed information transfer between variables. The question therefore becomes how the temporal scale in the MVAR model influences the estimation of the information flows between each variable within a system. In (Smith et al., 2011), the authors compared the performances of Granger causality analyses with different time lengths, and found that the longer the time series was, the better the performance. In their simulations, however, the underlying circuit stayed the same. In this paper, we investigate the effects of time-varying underlying circuits on a Granger causality analysis both mathematically and empirically.

  2. The second aim of this paper is to provide an efficient algorithm for estimating the global Granger causality index between two time series without any prior knowledge of the TV-MVAR model. It should be emphasised that there is a trade-off between the fineness of the change-point set and the accuracy of the estimation of the coefficients at each time window. Time windows that are too short might prevent a reliable estimation of the parameters. Time windows that are too long, on the other hand, might increase the probability of an incorrect inference of Granger causality. Based on Bayesian information criterion (BIC) and a change-point searching algorithm, we propose a method for determining the optimal size of a change-point set and the optimal change-points as a means to achieve the optimal balance between the fineness of the Granger causality and the accuracy of the model estimation. The theoretical results and algorithms were verified by estimating the average and cumulative Granger causalities on the simulated and experimental data, both of which confirmed that a finer change-point set provides a larger overall causality measurement.

To achieve the above goals, the effect of a time-varying causal structure on a Granger causality analysis was investigated mathematically, using the following notation. Consider two time series x and y over time window [0,T]. The change-point set $S_1 = \{0 = t_0 < t_1 < \cdots < t_m = T\}$ defines the time-varying property of the MVAR model as follows: at each time window $[t_{k-1}, t_k)$, the MVAR model is static, i.e., the interacting coefficients between variables are constants; in different time windows, however, these may differ. In this case, it becomes a time-varying MVAR (TV-MVAR) model. There are two alternatives for estimating the Granger causality from y to x in the TV-MVAR model with respect to the change-point set $S_1$. One is to estimate the local Granger causality at each time window $[t_{k-1}, t_k)$ and then average them, which is called the average Granger causality, $F_{y \to x}^{(a,S_1)}$. The other is to average the variances of the residual errors locally at each small time window, so that the cumulative Granger causality, $F_{y \to x}^{(c,S_1)}$, can be established by comparing the estimated variances of the residual errors of x with and without y as a predictor of the future of x. The TV-MVAR model depends on the change-point set that divides the whole duration into finer time windows, as shown in Figure 1. We therefore need to address the relationship between the causality definition and the fineness of the change-point set in the TV-MVAR model.

Figure 1.

Monotonicity of the cumulative and average Granger causalities. If we consider finer time windows with the same length, the change-point set can be derived from the window length, and thus the causality established by different change-point sets can be equivalently denoted by the corresponding window lengths mi for Si.

We proved that both cumulative and average Granger causalities are generally monotonically increasing functions with respect to the fineness of the change-point set (see Figure 1 for a summary, and Appendices A, B, and C for the theoretical proofs). That is, the finer the TV-MVAR model is (the larger the change-point set), the larger the (average and cumulative) Granger causalities that can be estimated. In particular, as shown in Theorems B1 and B4, under certain assumptions, the estimation of the coefficients in the coarser MVAR model is the (weighted) average among those of the finer model. Hence, if the “true” time-varying coefficients are nonzero but fluctuate around zero, the “averaging” estimation may reduce the estimated Granger causality to zero and give an incorrect inference of Granger causality.

Empirically, we demonstrated the robustness of the proposed spatio-temporal Granger causality analysis by computing the Pearson’s correlation coefficients between the Granger causality patterns using two scanning sessions on the same subject from the enhanced Nathan Kline Institute-Rockland Sample (see Materials and Methods). By considering the spatio-temporal details of the fMRI data for the TV-MVAR model, Granger causality has much greater consistency across two scanning sessions for the same subject. In particular, the correlation coefficient greatly increases from 0.3588 using classic Granger causality with a static MVAR model and region-wise estimation, to 0.6059 through our approach, which includes the optimal TV-MVAR model and voxel-wise estimation.

The theoretical results have also been confirmed using two experimental fMRI datasets: a resting-state dataset and a task-associated dataset. For the resting-state fMRI dataset, the classic Granger causality analysis failed to identify any significant causal connectivity to the precuneus. In comparison, at a finer-scale for the TV-MVAR model, our Granger causality approaches indicate that the precuneus serves as a hub for information transfer in the brain. Information flows between the precuneus and visual regions were revealed, which is consistent with an experimental setting in which the data were collected when the subjects’ eyes were open. For the task-associated fMRI dataset, the estimation of the average Granger causality for the attention blocks was found to be significantly larger than that estimated through classic Granger causality based on a static MVAR model for the whole time series for all twelve subjects used in the experiment.

Materials and Methods

Generation of Time Series with Time-varying Causal Structure

1) Generation of Time Series with Continuous Time-varying Causal Structure

Consider two time series and the effective interdependencies between them, as described using the TV-MVAR model with a constant noise level. The time series were generated through the following toy model:

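$$
\begin{pmatrix} x_{t+1} \\ y_{t+1} \end{pmatrix}
= \begin{pmatrix} A_{11}(t) & A_{12}(t) \\ A_{21}(t) & A_{22}(t) \end{pmatrix}
\begin{pmatrix} x_t \\ y_t \end{pmatrix}
+ \begin{pmatrix} n_{xy,t} \\ n_{yx,t} \end{pmatrix}, \tag{1}
$$

where

$$
A_{11}(t) = 0.1, \quad A_{12}(t) = 0.5\left(\frac{t}{600} - 1\right) u_1, \quad A_{21}(t) = 0.5\left(1 - \frac{t}{400}\right) u_2, \quad A_{22}(t) = 0.12.
$$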

We generated this toy model 100 times by randomly drawing the parameters u1 and u2 from a uniform distribution on the interval [0,1]. For each model, the time series observations were generated for 1200 time steps. The parameters A12 and A21 correspond to the causal influences in the Y→X and X→Y directions, respectively. A significantly nonzero causal coefficient indicates a causal influence in the corresponding direction. In this simulation, we specified a change in the causal coefficient from positive to negative.
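A minimal sketch of one run of this toy model is given below, using only the Python standard library. It reads the coefficients as A12(t) = 0.5(t/600 − 1)·u1 and A21(t) = 0.5(1 − t/400)·u2. The unit noise variance, the zero initial state, and the function name `simulate_continuous_tv_mvar` are assumptions made for illustration; the text only states that the noise level is constant.

```python
import random

def simulate_continuous_tv_mvar(n_steps=1200, seed=0, noise_sd=1.0):
    """Simulate one run of the bivariate first-order TV-MVAR toy model, Eq. (1)."""
    rng = random.Random(seed)
    u1, u2 = rng.random(), rng.random()      # u1, u2 drawn uniformly from [0, 1)
    x, y = [0.0], [0.0]                      # assumed zero initial state
    for t in range(1, n_steps):
        a11, a22 = 0.1, 0.12
        a12 = 0.5 * (t / 600 - 1) * u1       # Y -> X coefficient
        a21 = 0.5 * (1 - t / 400) * u2       # X -> Y coefficient
        x_next = a11 * x[-1] + a12 * y[-1] + rng.gauss(0.0, noise_sd)
        y_next = a21 * x[-1] + a22 * y[-1] + rng.gauss(0.0, noise_sd)
        x.append(x_next)
        y.append(y_next)
    return x, y, (u1, u2)

x, y, (u1, u2) = simulate_continuous_tv_mvar(seed=0)
```

Repeating the call with 100 different seeds would mimic the 100 randomly parameterised models described above.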

2) Generation of Time Series with Stepwise Time-varying Causal Structure

Consider a TV-MVAR model of two components with only one directional causal influence, X→Y; that is, only the corresponding coefficient A21 takes nonzero values. This model was derived from Eq. (1) with the step-wise coefficients as follows:

$$
A_{11}(t) = 0.1, \quad A_{22}(t) = 0.12, \quad A_{12}(t) = 0, \quad
A_{21}(t) = \begin{cases}
0.5u_1, & 0 < t \le t_1 \\
0, & t_1 < t \le t_2 \\
-0.5u_1, & t_2 < t \le t_3 \\
0, & t_3 < t \le T
\end{cases} \tag{2}
$$

where t1 = 215, t2 = 415, and t3 = 715. We generated two time series with 1200 time points and repeated this generation 100 times by randomly drawing the parameter u1 from a uniform distribution on the interval [0.5,1.5]. In this simulation, the causal coefficient A12 for the Y→X direction was set to zero, and thus there was no causal influence from Y to X, and the causal coefficient A21 varied across different time windows.
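The step-wise coefficient can be sketched as follows. This is an illustration under stated assumptions: the third segment is taken to carry a negative sign (−0.5·u1), which is consistent with the later discussion of coefficients with fluctuating signs, and the noise is taken to be Gaussian with unit variance. The function names are hypothetical.

```python
import random

def a21_stepwise(t, u1, t1=215, t2=415, t3=715):
    """Step-wise X -> Y coefficient of Eq. (2); the negative third segment is an assumption."""
    if 0 < t <= t1:
        return 0.5 * u1
    if t2 < t <= t3:
        return -0.5 * u1
    return 0.0                                # t1 < t <= t2 and t3 < t <= T

def simulate_stepwise(n_steps=1200, seed=0, noise_sd=1.0):
    """Simulate one run with A11 = 0.1, A22 = 0.12, A12 = 0 and step-wise A21."""
    rng = random.Random(seed)
    u1 = rng.uniform(0.5, 1.5)
    x, y = [0.0], [0.0]                       # assumed zero initial state
    for t in range(1, n_steps):
        x_next = 0.1 * x[-1] + rng.gauss(0.0, noise_sd)          # no Y -> X influence
        y_next = a21_stepwise(t, u1) * x[-1] + 0.12 * y[-1] + rng.gauss(0.0, noise_sd)
        x.append(x_next)
        y.append(y_next)
    return x, y, u1

x, y, u1 = simulate_stepwise(seed=0)
```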

3) Generation of BOLD Signal with Time-varying Effective Connection

Herein, we simulated the fMRI time series of two brain regions, X and Y, for 400 s. By introducing a time-varying causal structure, the simulation scheme for the fMRI data in (Schippers et al., 2011) was adopted. First, a neuronal interaction (local field potential, or LFP) was simulated using a bi-dimensional first-order TV-MVAR model with a time step of 10 ms:

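$$
\begin{pmatrix} x_{t+1} \\ y_{t+1} \end{pmatrix}
= A \begin{pmatrix} x_t \\ y_t \end{pmatrix}
+ \begin{pmatrix} n_{xy,t} \\ n_{yx,t} \end{pmatrix}, \tag{3}
$$

where

$$
A_{11} = 0.9, \quad A_{22} = 0.9, \quad A_{12} = 0, \quad
A_{21} = \begin{cases}
0.5, & 0 < t \le 18000 \\
-0.5, & 18000 < t \le 40000
\end{cases}
$$

The model had a causal influence from X to Y of a predetermined time-varying strength, A21, with no influence from Y to X.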

Second, both signals were convolved with the default hemodynamic response function (HRF) from the SPM5 toolbox, and Gaussian noise was added as physiological noise in the BOLD response. The HRF was specified through seven model parameters: delay of response relative to onset (in seconds), delay of undershoot relative to onset (in seconds), dispersion of response, dispersion of undershoot, ratio of response to undershoot, onset (in seconds), and length of kernel. To investigate the effect of hemodynamic response variability on the Granger causality analysis, we systematically varied the delay of response from 0 to 5 s. To mimic the neuronal delay from the cause region to the effect region, time series Y was shifted by 50 ms against X before the convolution with the HRF (Deshpande et al., 2010; Schippers et al., 2011; Smith et al., 2012).

Third, BOLD signals were generated by down-sampling the convolved time series to 2 Hz as a high sampling rate and to 1 Hz as a low sampling rate (resembling the acquisition rate, or TR, of an MR scanner), and Gaussian noise was again added as acquisition noise. After each step, the signals were normalized to zero mean and unit variance. The total amount of noise added was 20%.

Experimental fMRI Datasets

1) Multiband Imaging Test-Retest Pilot Dataset

This set of fMRI data comes from the enhanced Nathan Kline Institute-Rockland Sample. The whole dataset consists of resting-state fMRI recordings from two sessions for seventeen subjects (healthy, aged 19–57, thirteen males and four females).

The fMRI data were collected at 3 Tesla, and forty slices were acquired for 900 volumes. Multiband echo planar imaging enables the acquisition of fMRI data with unprecedented sampling rates (TR = 0.645 s) for full-brain coverage by acquiring multiple slices simultaneously. For more detailed information about this data set, please see the website at http://fcon_1000.projects.nitrc.org/indi/pro/eNKI_RS_TRT/FrontPage.html.

Data pre-processing was performed using DPARSF software (Yan and Zang, 2010). The first fifty volumes were discarded to allow for scanner stabilisation. Since multiple slices were excited simultaneously, a simple slice time correction might not work well. Given its short effective TR, such a correction is probably less important, and is therefore omitted in our data pre-processing. After the realignment for head-motion correction, the standard Montreal Neurological Institute (MNI) template provided by SPM2 was used for spatial normalization with a re-sampling voxel size of 3×3×3 mm3. After smoothing (FWHM = 8 mm), the imaging data were temporally filtered (band pass, 0.01–0.08 Hz) to remove the effects of a very low-frequency drift and high-frequency noises (e.g., respiratory and cardiac rhythms). An automated anatomical labelling (AAL) atlas (Tzourio-Mazoyer et al., 2002) was used to parcellate the brain into ninety regions of interest (ROIs). To verify the principle of voxel-level Granger causality, the brain was also divided into 1024 ROIs with around 45 voxels each according to a high-resolution brain atlas provided by (Zalesky et al., 2010).

2) Resting-State fMRI Dataset

The resting-state fMRI dataset is a subset of a large database, called the 1000 Functional Connectomes Project (Biswal et al., 2010), which is freely accessible at www.nitrc.org/projects/fcon_1000/. The dataset provided by Buckner’s group at Cambridge, USA, was used for the present study. This dataset consists of 198 healthy subjects (75 males and 123 females, aged 18–30). The fMRI data (TR = 3 s) were collected using 3 Tesla, and 47 slices were acquired for 119 volumes. Further details about this dataset can be found at the website provided above.

The first five volumes were discarded to allow for scanner stabilization. DPARSF (Yan and Zang, 2010), which is based on SPM8, was used for pre-processing the fMRI data, including slice-timing correction, motion correction, co-registration, grey/white matter segmentation, and spatial normalization into Montreal Neurological Institute (MNI) space, followed by re-sampling to 3×3×3 mm3. The waveform of each voxel was detrended and passed through a band-pass filter of 0.01 to 0.08 Hz. The data were smoothed spatially (FWHM = 8 mm). As a result, time series data with 114 time points from ninety brain regions (AAL atlas) for 198 subjects were obtained.

3) fMRI Dataset for Attention Task

The dataset of fMRI time series for an attention-task experiment was provided by the Ding Group at the University of Florida, USA (Wen et al., 2012), and consisted of twelve subjects who successfully completed the task (eight females and four males, aged 20–28). This experiment adopted a mixed blocked/event-related design. There were twelve attention blocks and twelve passive-view blocks, along with some fixation intervals. In each attention block, the subjects performed a trial-by-trial cued visual spatial-attention task. The fMRI data were collected at 3 Tesla, and 33 slices were acquired for 180 volumes for each of the six runs with TR = 2 s. The dataset was pre-processed by slice timing, motion correction, co-registration to an individual anatomical image, and normalization to the Montreal Neurological Institute (MNI) template, and then resampled to 3×3×3 mm3, using DPARSF. The hemodynamic response function (HRF) was convolved with the blocked rectangular function corresponding to the given experimental condition during the GLM analysis. For more detailed information about this dataset, please see (Wen et al., 2012).

For each attention block, there were thirty data points, lasting for 60 s. The task average response was removed from each attention block by subtracting the mean of the time series data across twelve attention blocks. The first five data points (10 s) were discarded to eliminate the transient effects. The temporal mean was removed for each attention block to meet the zero mean requirement of the Granger causality analysis. Therefore, we had 300 data points for the twelve attention blocks. Herein, the causality between the right intra-parietal sulcus (rIPS) and right temporal-parietal junction (rTPJ) was studied. Time series of nineteen and seventeen voxels were used for rIPS and rTPJ, respectively (Wen et al., 2012).
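The block-wise preparation described above (removal of the task-average response, discarding of the first five points, and temporal demeaning) can be sketched as follows. The function name and the synthetic input are hypothetical; the order of operations follows the paragraph above.

```python
import math
import random

def preprocess_attention_blocks(blocks, n_discard=5):
    """Prepare equal-length attention blocks for a Granger causality analysis."""
    n_blocks, n_points = len(blocks), len(blocks[0])
    # Task-average response: mean across blocks at each within-block time point.
    task_avg = [sum(b[i] for b in blocks) / n_blocks for i in range(n_points)]
    cleaned = []
    for b in blocks:
        resid = [v - m for v, m in zip(b, task_avg)][n_discard:]  # drop the transient
        mu = sum(resid) / len(resid)
        cleaned.append([v - mu for v in resid])                   # zero temporal mean
    return cleaned

# Synthetic stand-in for one voxel: twelve blocks of thirty points (TR = 2 s).
rng = random.Random(0)
blocks = [[math.sin(i / 5) + rng.gauss(0.0, 1.0) for i in range(30)] for _ in range(12)]
cleaned = preprocess_attention_blocks(blocks)
n_total = sum(len(b) for b in cleaned)   # 12 blocks x 25 points
```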

Granger Causality in TV-MVAR

For two time series $x_t$ and $y_t$, with t = 1,2,⋯, T, define a change-point set as an increasing integer series $1 = t_0 < t_1 < \cdots < t_{m-1} < t_m = T + 1$, denoted by $S_1$. Consider the following piece-wise constant linear system to describe the directional influence from $y_t$ to $x_t$:

$$
x_{t+1} = \bar{a}_1^{S_1}(k)\, x_t + \bar{b}_1^{S_1}(k)\, y_t + n(t), \quad t_{k-1} \le t < t_k, \quad k = 1, \ldots, m \tag{4}
$$

where $\bar{a}_1^{S_1}(k)$ and $\bar{b}_1^{S_1}(k)$ are the estimated time-varying coefficients from $S_1$. In addition, when ignoring the directed causality from $y_t$ to $x_t$, Eq. (4) becomes

$$
x_{t+1} = \tilde{a}_1^{S_1}(k)\, x_t + \tilde{n}(t), \quad t_{k-1} \le t < t_k, \quad k = 1, \ldots, m \tag{5}
$$

where $\tilde{a}_1^{S_1}(k)$ is the estimated time-varying coefficient in this model. At the kth time window, the Granger causality can be defined locally as

$$
F_{y \to x}^{(k,S_1)} = \log\left[\frac{\sum_{t=t_{k-1}}^{t_k - 1} \mathrm{var}(\tilde{n}(t))}{\sum_{t=t_{k-1}}^{t_k - 1} \mathrm{var}(n(t))}\right].
$$

The average Granger causality with respect to S1 can be estimated through the average of the Granger causalities at the time windows and weighted by the corresponding window lengths:

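$$
F_{y \to x}^{(a,S_1)} = \frac{1}{T} \sum_{k=1}^{m} F_{y \to x}^{(k,S_1)} \,(t_k - t_{k-1}). \tag{6}
$$

If the length of each time window is uniform, it becomes

$$
F_{y \to x}^{(a,S_1)} = \frac{1}{m} \sum_{k=1}^{m} F_{y \to x}^{(k,S_1)}.
$$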

An alternative way to compute Granger causality is cumulating the residual square errors across all time windows. This is called cumulative Granger causality with respect to S1, and can be estimated by

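$$
F_{Y \to X}^{(c,S_1)} = \log\left[\frac{\sum_{t=1}^{T} \mathrm{var}(\tilde{n}(t))}{\sum_{t=1}^{T} \mathrm{var}(n(t))}\right]
= \log\left[\frac{\sum_{k=1}^{m} \sum_{t=t_{k-1}}^{t_k - 1} \mathrm{var}(\tilde{n}(t))}{\sum_{k=1}^{m} \sum_{t=t_{k-1}}^{t_k - 1} \mathrm{var}(n(t))}\right]. \tag{7}
$$

In particular, if random variable $y_t$ is stochastically orthogonal to $x_t$ at each time, i.e., $E[(x_t - E x_t)(y_t - E y_t)] = 0$ for all t, the cumulative Granger causality can be estimated as

$$
F_{Y \to X}^{(c,S_1)} = \log\left[\frac{\sum_{k=1}^{m} \sum_{t=t_{k-1}}^{t_k - 1} \left[a_1(t) - \tilde{a}_1^{S_1}(k)\right]^2 \mathrm{var}(x_t) + \sum_{k=1}^{m} \sum_{t=t_{k-1}}^{t_k - 1} \left[b_1(t)\right]^2 \mathrm{var}(y_t) + \sum_{k=1}^{m} \sum_{t=t_{k-1}}^{t_k - 1} \mathrm{var}(n(t))}{\sum_{k=1}^{m} \sum_{t=t_{k-1}}^{t_k - 1} \left[a_1(t) - \bar{a}_1^{S_1}(k)\right]^2 \mathrm{var}(x_t) + \sum_{k=1}^{m} \sum_{t=t_{k-1}}^{t_k - 1} \left[b_1(t) - \bar{b}_1^{S_1}(k)\right]^2 \mathrm{var}(y_t) + \sum_{k=1}^{m} \sum_{t=t_{k-1}}^{t_k - 1} \mathrm{var}(n(t))}\right].
$$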

For details on the derivation of the Granger causality expressions, please see Appendix A. Herein, only a first-order regression model with one-dimensional variables is considered; the extension to a general high-order and high-dimensional TV-MVAR model will be discussed in a future paper.

Since $F_{y \to x}^{(k,S_1)}$ obeys an F-distribution after proper scaling in each time window, the average Granger causality defined above can be considered, under the null hypothesis, as the sum of m independent F-distributed random variables whose degrees of freedom are given by the number of free parameters and the length of each time window, namely 1 and $t_k - t_{k-1} - 3$. Therefore, the p-value for the significance of the average Granger causality can be calculated. Similarly, the cumulative Granger causality as defined above also obeys an F-distribution with degrees of freedom m and T − 2m − 1.
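The local, average, and cumulative definitions of Eqs. (4)-(7) can be sketched in a few lines of standard-library Python. This is an illustration, not the authors' Matlab package: it assumes zero-mean series, least-squares fits without an intercept, residual sums of squares as the variance estimates, and 0-based change-points. All function names are hypothetical.

```python
import math
import random

def residual_ss(x, y, start, end):
    """Residual sums of squares for predicting x[t+1] on the window start <= t < end.

    Returns (restricted, full): x[t] only, as in Eq. (5), versus x[t] and y[t], as in Eq. (4).
    """
    ts = range(start, min(end, len(x) - 1))
    sxx = sum(x[t] * x[t] for t in ts)
    syy = sum(y[t] * y[t] for t in ts)
    sxy = sum(x[t] * y[t] for t in ts)
    sxz = sum(x[t] * x[t + 1] for t in ts)
    syz = sum(y[t] * x[t + 1] for t in ts)
    a_r = sxz / sxx                              # restricted-model coefficient
    det = sxx * syy - sxy * sxy
    a_f = (sxz * syy - syz * sxy) / det          # full-model coefficients
    b_f = (syz * sxx - sxz * sxy) / det
    rss_r = sum((x[t + 1] - a_r * x[t]) ** 2 for t in ts)
    rss_f = sum((x[t + 1] - a_f * x[t] - b_f * y[t]) ** 2 for t in ts)
    return rss_r, rss_f

def local_gc(x, y, start, end):
    """Local Granger causality from y to x in one time window."""
    rss_r, rss_f = residual_ss(x, y, start, end)
    return math.log(rss_r / rss_f)

def average_gc(x, y, change_points):
    """Window-length-weighted mean of the local causalities, Eq. (6)."""
    windows = list(zip(change_points[:-1], change_points[1:]))
    total = sum(e - s for s, e in windows)
    return sum(local_gc(x, y, s, e) * (e - s) for s, e in windows) / total

def cumulative_gc(x, y, change_points):
    """Log ratio of the residual sums pooled over all windows, Eq. (7)."""
    pairs = [residual_ss(x, y, s, e)
             for s, e in zip(change_points[:-1], change_points[1:])]
    return math.log(sum(r for r, _ in pairs) / sum(f for _, f in pairs))

# Synthetic check: y drives x with a constant coefficient of 0.8.
rng = random.Random(0)
y = [rng.gauss(0.0, 1.0) for _ in range(401)]
x = [0.0]
for t in range(400):
    x.append(0.8 * y[t] + rng.gauss(0.0, 1.0))
cps = [0, 100, 200, 300, 400]
f_avg_yx = average_gc(x, y, cps)
f_cum_yx = cumulative_gc(x, y, cps)
f_avg_xy = average_gc(y, x, cps)     # reverse direction, expected to be near zero
```

Significance testing is left out of the sketch, since the F-distribution is not available in the standard library.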

Optimal Time Window Division

In practice, the true time-varying structure of the data is unknown. In particular, we do not know how many change-points there are, or the length of each time window. Therefore, an algorithm for time-window division is necessary. Equivalently, we are searching for the optimal change-point set. The optimal time-window division indicates a trade-off between the satisfactory accuracy of the model parameter estimation and the lossless causal information established by the model. Mathematically, consider the following step-wise TV-MVAR model

$$
X(t+1) = \sum_{k=1}^{m} a_1^k X(t)\, I_{[t_{k-1}, t_k)} + n(t), \tag{8}
$$

where $I_{[t_{k-1}, t_k)}$ is the characteristic function of time window $[t_{k-1}, t_k)$, n(t) is a Gaussian white noise term, $a_1^k$ represents a (constant) coefficient in the kth interval, and

$$
S(m) = \{ t_1, \ldots, t_{m-1} \mid 1 = t_0 < t_1 < \cdots < t_{m-1} < t_m = T + 1 \}
$$

is the change-point set. Given the change-points, the model can be fitted in each time window, giving $\hat{a}_1^k$, and the variance of the residual errors can be estimated for each time window, denoted by $\hat{\Sigma}_k$. Therefore, the accuracy of the model can be defined based on the weighted average of the variances of the residuals in each time window as follows:

$$
\mathrm{err}(S(m)) = \frac{1}{m} \sum_{k=1}^{m} (t_k - t_{k-1}) \det(\hat{\Sigma}_k). \tag{9}
$$

On the other hand, the information captured by this model can be measured based on the average Granger causality in all directions defined in the previous section, as noted by

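$$
\mathrm{agc}(S(m)) = \frac{1}{2m} \sum_{k=1}^{m} \left( F_{y \to x}^{(k,S(m))} + F_{x \to y}^{(k,S(m))} \right). \tag{10}
$$

To minimize the prediction error and maximize the detected causality information, the optimal window division can be derived by optimising the following cost function with the trade-off parameter λ

$$
S_{\mathrm{opt}}(m, \lambda) = \arg\min_{S(m)} \left( \mathrm{err}(S(m)) + \lambda / \mathrm{agc}(S(m)) \right). \tag{11}
$$

Given the trade-off parameter $\lambda_0$ and lower bound $l_0$ of the lengths of the divided time windows, the optimal change-points $S_{\mathrm{opt}}(m, \lambda_0)$ can be established by solving the following constrained optimization problem

$$
\min_{S(m)} \left( \mathrm{err}(S(m)) + \lambda_0 / \mathrm{agc}(S(m)) \right) \quad \text{s.t.} \quad t_k - t_{k-1} \ge l_0 \ \text{for all } k = 1, 2, \ldots, m. \tag{12}
$$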

A constrained condition is required for a reliable estimation of the model coefficients in Eq. (8) at each divided time window. This constrained optimization problem can be solved based on the optimization functions provided in Matlab. In this paper, we used the fmincon function for a nonlinear constrained optimization problem.

To determine the trade-off parameter, we search for the optimal change-point set $S_{\mathrm{opt}}(m, \lambda)$ for different λ ∈ [λ1, λ2], and then calculate the Bayesian information criterion (BIC) for this change-point set as follows:

$$
\mathrm{BIC}(m, \lambda) = -2 \sum_{k=1}^{m} \mathrm{LLF}_k + 2^2 m \log(T + 1), \tag{13}
$$

where LLFk stands for the log likelihood function established for the kth window. The first step is searching for the optimal change-point set with a series of given time windows, m ∈ [0,1,2,⋯, m0], and trade-off parameter, λ ∈ [λ1, λ2]. The second step is to compare the BIC values established by different change-point sets generated from the first step, and the one with the smallest BIC is then selected to define the optimal time window. Therefore, using the fixed upper bound of the number of time windows, denoted by m0, the algorithm for the optimal time window division can be described as follows:

      Algorithm for optimal time window division
For λ from λ1 to λ2
  For m from 1 to m0
    Establish Sopt(m, λ) by solving the constrained optimization problem (12)
    Calculate the BIC for Sopt(m, λ) by (13)
  End
End
Select the Sopt(mopt, λopt) with the smallest BIC
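The two-step search above can be sketched as follows. This is a simplified illustration rather than the published implementation: the constrained problem (12) is solved by exhaustive search over change-points on a grid of spacing l0 instead of with fmincon, the log likelihood of each window is taken as the Gaussian log likelihood of the bivariate residuals with their maximum-likelihood covariance, and indices are 0-based. The names `optimal_windows` and `_residuals` are hypothetical.

```python
import itertools
import math
import random
from functools import lru_cache

def _residuals(z, p, q=None):
    """Least-squares residuals of z on p (and optionally q), without an intercept."""
    spp = sum(v * v for v in p)
    spz = sum(a * b for a, b in zip(p, z))
    if q is None:
        return [zi - spz / spp * pi for pi, zi in zip(p, z)]
    sqq = sum(v * v for v in q)
    spq = sum(a * b for a, b in zip(p, q))
    sqz = sum(a * b for a, b in zip(q, z))
    det = spp * sqq - spq * spq
    a, b = (spz * sqq - sqz * spq) / det, (sqz * spp - spz * spq) / det
    return [zi - a * pi - b * qi for pi, qi, zi in zip(p, q, z)]

def optimal_windows(x, y, lambdas, m0, l0):
    """Return (change_points, bic) with the smallest BIC over all lambda and m."""
    end = len(x) - 1                         # last t for which x[t + 1] exists

    @lru_cache(maxsize=None)
    def stats(s, e):
        xs, ys, xn, yn = x[s:e], y[s:e], x[s + 1:e + 1], y[s + 1:e + 1]
        rx, ry = _residuals(xn, xs, ys), _residuals(yn, ys, xs)   # full models
        rx0, ry0 = _residuals(xn, xs), _residuals(yn, ys)         # restricted models
        n = e - s
        sxx = sum(r * r for r in rx) / n
        syy = sum(r * r for r in ry) / n
        sxy = sum(a * b for a, b in zip(rx, ry)) / n
        det = sxx * syy - sxy * sxy                               # det of residual covariance
        gc = (math.log(sum(r * r for r in rx0) / (n * sxx))
              + math.log(sum(r * r for r in ry0) / (n * syy)))    # F_yx + F_xy
        llf = -0.5 * n * (2 * math.log(2 * math.pi) + math.log(det) + 2)
        return det, gc, llf

    def windows(cps):
        return list(zip(cps[:-1], cps[1:]))

    def cost(cps, lam):                                           # objective of Eq. (12)
        w = windows(cps)
        err = sum((e - s) * stats(s, e)[0] for s, e in w) / len(w)
        agc = sum(stats(s, e)[1] for s, e in w) / (2 * len(w))
        return err + lam / agc

    def bic(cps):                                                 # Eq. (13)
        w = windows(cps)
        return -2 * sum(stats(s, e)[2] for s, e in w) + 4 * len(w) * math.log(len(x) + 1)

    grid = range(l0, end - l0 + 1, l0)       # candidates; every window has length >= l0
    best = None
    for lam in lambdas:
        for m in range(1, m0 + 1):
            candidates = [[0, *c, end] for c in itertools.combinations(grid, m - 1)]
            if not candidates:
                continue
            s_opt = min(candidates, key=lambda cps: cost(cps, lam))
            if best is None or bic(s_opt) < best[1]:
                best = (s_opt, bic(s_opt))
    return best

# Synthetic check: the X -> Y coefficient flips sign at t = 300.
rng = random.Random(1)
x, y = [0.0], [0.0]
for t in range(599):
    c = 0.8 if t < 300 else -0.8
    x.append(0.1 * x[t] + rng.gauss(0.0, 1.0))
    y.append(c * x[t] + 0.12 * y[t] + rng.gauss(0.0, 1.0))
cps, bic_value = optimal_windows(x, y, lambdas=[0.02, 0.5, 1.0], m0=3, l0=100)
```

The grid search trades resolution for simplicity: change-points can only fall on multiples of l0, whereas a continuous optimiser can place them anywhere subject to the window-length constraint.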

Spatio-temporal Granger Causality

Furthermore, both spatial and temporal fineness are taken into the MVAR model. The idea of a spatial finer-scale for Granger causality estimation is similar to that of the time-varying Granger causality mentioned above. Consider a dataset of fMRI BOLD signals from m voxels in ROI A, and n voxels in ROI B. For each pair of voxels in these two ROIs, the Granger causality between the voxel pair is calculated for each subject, denoted by Fij, from the ith voxel in ROI A to the jth voxel in ROI B; the global Granger causality from ROI A to ROI B, namely, voxel-level Granger causality, is then defined as follows:

$$
F_{A \to B} = \frac{1}{mn} \sum_{i \in \mathrm{ROI}_A} \sum_{j \in \mathrm{ROI}_B} F_{i \to j}.
$$

Furthermore, the temporal and spatial fine scales are combined to give the optimal estimation of Granger causality by looking into the temporal details for each pair of voxels, which is called spatio-temporal Granger causality (stGC):

$$
F_{A \to B}^{(e,S)} = \frac{1}{mn} \sum_{i \in \mathrm{ROI}_A} \sum_{j \in \mathrm{ROI}_B} F_{i \to j}^{(e,S)},
$$

with e = a or c for average and cumulative (time-varying) Granger causalities, respectively. In comparison, classic Granger causality usually estimates the causality between two ROIs by averaging the time series data among all voxels for each ROI with a static MVAR model.
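The voxel-level average can be sketched as below. For brevity the pairwise estimate is a static first-order Granger causality; passing a time-varying estimator (average or cumulative) through the hypothetical `gc` argument would give the spatio-temporal version. The synthetic ROIs and all names are assumptions for illustration.

```python
import math
import random

def pairwise_gc(src, dst):
    """Static first-order Granger causality from src to dst (least squares, no intercept)."""
    n = len(dst) - 1
    spp = sum(dst[t] ** 2 for t in range(n))
    sqq = sum(src[t] ** 2 for t in range(n))
    spq = sum(dst[t] * src[t] for t in range(n))
    spz = sum(dst[t] * dst[t + 1] for t in range(n))
    sqz = sum(src[t] * dst[t + 1] for t in range(n))
    a_r = spz / spp                                   # restricted model: dst[t] only
    det = spp * sqq - spq * spq
    a_f = (spz * sqq - sqz * spq) / det               # full model: dst[t] and src[t]
    b_f = (sqz * spp - spz * spq) / det
    rss_r = sum((dst[t + 1] - a_r * dst[t]) ** 2 for t in range(n))
    rss_f = sum((dst[t + 1] - a_f * dst[t] - b_f * src[t]) ** 2 for t in range(n))
    return math.log(rss_r / rss_f)

def voxel_level_gc(roi_a, roi_b, gc=pairwise_gc):
    """Mean of the pairwise causalities from every voxel in ROI A to every voxel in ROI B."""
    total = sum(gc(a, b) for a in roi_a for b in roi_b)
    return total / (len(roi_a) * len(roi_b))

# Synthetic ROIs: three voxels in A share a latent signal that drives two voxels in B.
rng = random.Random(0)
latent = [rng.gauss(0.0, 1.0) for _ in range(300)]
roi_a = [[s + 0.5 * rng.gauss(0.0, 1.0) for s in latent] for _ in range(3)]
roi_b = [[0.0] + [0.8 * latent[t] + rng.gauss(0.0, 1.0) for t in range(299)]
         for _ in range(2)]
f_ab = voxel_level_gc(roi_a, roi_b)
f_ba = voxel_level_gc(roi_b, roi_a)
```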

A Matlab package for the estimation of the spatio-temporal GC is available at http://www.dcs.warwick.ac.uk/~feng/causality.html.

Results

Monotonicity of Granger Causality with Respect to Change-point Set

To demonstrate the monotonicity of the proposed Granger causality measurements, the proposed algorithms were applied to a simulation dataset with a continuous time-varying causal structure. We used three different time-window lengths of 50, 200, and 400, and the corresponding change-point sets for these time windows are denoted as $S_i$, i = 1,2,3, respectively. Theorems B1 and B4 in Appendix B show that $F_{Y \to X}^{(a,S_1)} \ge F_{Y \to X}^{(a,S_2)} \ge F_{Y \to X}^{(a,S_3)}$ and $F_{Y \to X}^{(c,S_1)} \ge F_{Y \to X}^{(c,S_2)} \ge F_{Y \to X}^{(c,S_3)}$ hold if the parameters are precisely estimated, since $S_1 \supset S_2 \supset S_3$. To demonstrate this, 95% confidence intervals of $D_1^a = F_{Y \to X}^{(a,S_1)} - F_{Y \to X}^{(a,S_2)}$, $D_2^a = F_{Y \to X}^{(a,S_2)} - F_{Y \to X}^{(a,S_3)}$ and $D_3^a = F_{Y \to X}^{(a,S_1)} - F_{Y \to X}^{(a,S_3)}$ were established for the causality results in 100 runs of the simulated toy model. Similarly, $D_i^c$, i = 1,2,3, were defined, and their confidence intervals established. From Table 1, we can see that the estimated Granger causality for the same pair of time series decreases with respect to the length of the time windows. That is, the more change-points that are used in the TV-MVAR model, i.e., the finer the model is, the larger the Granger causality that can be estimated.

Table 1.

The 95% confidence intervals of the differences between the cumulative causality measurements established from time windows with different lengths.

Direction    X→Y                   Y→X
Quantile     0.025th   0.975th     0.025th   0.975th
D1a          0.0078    0.0321      0.0053    0.0341
D2a          0.0085    0.0448      0.0109    0.0453
D3a          0.0002    0.0128      0.0003    0.0229
D1c          0.0078    0.0285      0.0055    0.0333
D2c          0.0105    0.0437      0.0116    0.0480
D3c          0.0002    0.0156      0.0003    0.0264

To show the accuracies of the model estimation, we compared the differences in the variances of the model residual errors given by different algorithms, including the static MVAR model fitted to the whole time series, denoted by $\mathrm{Err}_{[1,1200]}$, and the average variances of the model residual errors for the TV-MVAR model over the time windows, $\overline{\mathrm{Err}}_{S_i}$, as $D_i = \mathrm{Err}_{[1,1200]} - \overline{\mathrm{Err}}_{S_i}$ for i = 1,2,3. As shown in Figure 2A, the TV-MVAR models with different time-window lengths all have smaller variances than the static MVAR model fitted to the whole time series for all 100 toy models, i.e., $D_i > 0$ holds for the 100 toy model runs. Among the models with different time window sets, the one with the smallest time-window length, which had $S_1$ as the change-point set, provided the most accurate estimation of the simulated time series.

Figure 2.

Results of the simulation model: (A) Residual variance comparison between the models established from the whole time series and those models fitted on different time window sets. (B) Optimally detected change-points for 100 simulations. (C) Mean of the TP and FP rates given by different methods for each HRF delay in 100 simulations. (D) The TP rate is plotted against the FP rate given by different approaches for each threshold of the p-values in 100 simulations

Significance of Granger Causality

To compare the significance of the results detected by our time-varying Granger causality approach with those detected by classic Granger causality, we applied these algorithms to the simulation dataset with a stepwise causal structure. When the p-value was lower than the threshold, a significant directional influence was detected. In this simulation setup, a causal influence existed from X to Y, but not from Y to X. The usual definitions of the true positive (TP), false positive (FP), true negative (TN) and false negative (FN) were used. In addition, the maximum number of time windows was set to m0 = 5, and the trade-off parameter ranged from λ1 = 0.02 to λ2 = 1 with a step size of 0.02. Five types of Granger causalities, namely classic Granger causality (classic GC), average Granger causality (average GC), cumulative Granger causality (cumulative GC), average Granger causality with optimally divided time windows (Opt average GC), and cumulative Granger causality with optimally divided time windows (Opt cumulative GC), were calculated based on the simulation data using a significance test.
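
As a minimal sketch of how such rates can be tallied, the snippet below assumes that TP and FN are counted on the direction carrying a real influence (X to Y) and that FP and TN are counted on the direction without one (Y to X). The function name `detection_rates` and the p-values are hypothetical.

```python
import numpy as np

def detection_rates(p_true, p_null, threshold):
    """Return TP, FN, FP and TN rates as fractions of simulation runs.

    p_true: p-values for the direction with a real influence (X -> Y).
    p_null: p-values for the direction without influence (Y -> X).
    """
    p_true = np.asarray(p_true, dtype=float)
    p_null = np.asarray(p_null, dtype=float)
    tp = float(np.mean(p_true < threshold))  # real influence detected
    fp = float(np.mean(p_null < threshold))  # spurious influence detected
    return {"TP": tp, "FN": 1.0 - tp, "FP": fp, "TN": 1.0 - fp}

# Hypothetical p-values from four runs (illustration only).
rates = detection_rates(
    p_true=[1e-15, 1e-13, 1e-3, 1e-20],
    p_null=[0.4, 0.9, 1e-14, 0.2],
    threshold=1e-12,
)
```

Under this convention TP and FN sum to one, as do FP and TN.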

As shown in Table 2, the classic GC failed to identify any causal influence between these two time series. Cumulative GC and average GC provided better results in terms of higher TP and TN rates than classic GC. Compared to the other algorithms, average GC and cumulative GC with optimal time-window division provided the best performances in terms of the TP and TN rates among all of the causalities. In particular, as shown in Theorems B1 and B4, under our assumption, in the coarser MVAR model, the estimation of the coefficients is the (weighted) average of those of the finer model. As an intuitive interpretation, if the “true” time-varying coefficients are nonzero but have fluctuating signs, for example, they equal 1 in the first time interval and −1 in the last time interval of the same length, the “averaging” estimation becomes zero owing to the neutralisation, even if the coefficient parameters are precisely estimated. Thus, in the static MVAR model, we will incorrectly infer that no Granger causality exists. A similar argument holds for a comparison of the finer and coarser MVAR models. Therefore, the coarseness of the TV-MVAR model might increase the probability of an incorrect inference of Granger causality.
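
The sign-cancellation argument can be reproduced with a small simulation. The sketch below is only an illustration under stated assumptions: it uses a lag-1 bivariate model, takes the causality as the log ratio of restricted to full residual sums of squares, and averages that quantity over two equal windows. The helper `granger_lag1` and all parameter values are hypothetical and may differ from the definitions used in the paper.

```python
import numpy as np

def granger_lag1(x, y):
    """In-sample lag-1 Granger causality x -> y: ln(RSS_restricted / RSS_full)."""
    target = y[1:]
    ones = np.ones(len(target))
    restricted = np.column_stack([ones, y[:-1]])       # own past only
    full = np.column_stack([ones, y[:-1], x[:-1]])     # own past plus past of x
    rss = []
    for design in (restricted, full):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        rss.append(np.sum((target - design @ beta) ** 2))
    return float(np.log(rss[0] / rss[1]))

rng = np.random.default_rng(0)
n = 2000
half = n // 2
x = rng.standard_normal(n)
# Causal coefficient is +1 in the first half and -1 in the second half.
coef = np.where(np.arange(n) < half, 1.0, -1.0)
y = np.zeros(n)
y[1:] = coef[1:] * x[:-1] + rng.standard_normal(n - 1)

gc_static = granger_lag1(x, y)                         # one model for all data
gc_windows = [granger_lag1(x[:half], y[:half]),
              granger_lag1(x[half:], y[half:])]        # one model per window
gc_average = float(np.mean(gc_windows))
```

In this toy setting the static estimate stays close to zero, while the windowed average is clearly positive, which mirrors the neutralisation described above.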

Table 2. Performance comparison of different methods.

The term ‘Classical GC’ is short for classic Granger causality, ‘Cumulative GC’ stands for cumulative Granger causality, ‘Average GC’ is the average Granger causality, ‘Opt Cumulative GC’ stands for the cumulative Granger causality with optimally divided time windows, and ‘Opt Average GC’ indicates the average Granger causality with the optimally determined time windows. $F_{X\to Y}$ and $F_{Y\to X}$ are the average values of the Granger causality measurements over 100 simulations, and $\overline{\mathrm{BIC}}$ is the mean value of the BIC for 100 simulations. The threshold for significance is $10^{-12}$.

Method               Window length   TP     FP    TN      FN      $F_{X\to Y}$   $F_{Y\to X}$   $\overline{\mathrm{BIC}}$
Classical GC         1200            0%     0%    100%    100%    0.0023         0.0007         6.9489
Cumulative GC        10              50%    0%    100%    50%     0.2484         0.1331         8.5869
                     50              72%    0%    100%    28%     0.1303         0.0208         7.2041
                     100             75%    0%    100%    25%     0.1177         0.0101         7.0188
                     300             70%    0%    100%    30%     0.0709         0.0032         6.9417
                     600             0%     0%    100%    100%    0.0071         0.0015         6.9819
Average GC           10              39%    0%    100%    61%     0.2620         0.1516         8.5869
                     50              85%    0%    100%    15%     0.1219         0.0212         7.2041
                     100             93%    0%    100%    7%      0.1094         0.0102         7.0188
                     300             92%    0%    100%    8%      0.0681         0.0032         6.9417
                     600             14%    0%    100%    86%     0.0074         0.0015         6.9819
Opt Cumulative GC    -               90%    3%    97%     10%     0.1320         0.0131         6.8780
Opt Average GC       -               94%    2%    98%     6%      0.1453         0.0080         6.8780

To demonstrate the performance of the optimal GC estimations using time windows with equal lengths, we compared the accuracies of the results given by the optimal GCs with different time-window lengths. We found that better performances were achieved if their time-window division was similar to the real structure of the simulation data. Since the real change-points of this simulation were 215, 415, and 715, both algorithms presented better results when the time-window lengths were 100 or 300. In comparison, the performances worsened when the time-window lengths were either longer or shorter.

To test whether the larger magnitude of Granger causality estimated by the optimal GCs increases the false positive (FP) rate, Table 2 also lists the FP rates given by different GCs with different window lengths. We found that the FP rates of both cumulative GC and average GC with different window lengths were zero when the threshold of the significance was $10^{-12}$ (for the F statistics). Therefore, the FP rates of these two algorithms did not increase with respect to the GC values, as the lengths of the time windows shortened. Since the degrees-of-freedom of the F statistics depended on the number of change-points, the GC value increased with shorter time windows. However, since the corresponding F distribution also changed with the number of change-points, the FP rates might not have increased.

To assess the rationality of the BIC-based optimal time-window dividing algorithm, the BIC values were also reported and compared among the simulations. As shown in Table 2, low BIC values were achieved when the change-point set for the average and cumulative GCs was similar to the true structure of the simulated data. This suggests that the BIC values can be used to choose change-points that achieve the best performance. We chose the change-point set optimally instead of using time windows with equal length. Table 2 also shows that, compared to algorithms with an equal time window division, algorithms with an optimal change-point search provide better TP rates, but slightly worse FP rates, namely, 3% for the opt cumulative GC, and 2% for the opt average GC, in the simulation data. Figure 2B shows that the real change-points for 100 simulations using the proposed BIC-based optimal algorithm were successfully identified for most of the simulations.

To compare the computational complexities among the different algorithms, we reported the running time of each algorithm on the simulated dataset. As listed in Table 3, because the method for optimally dividing the time windows is very time-consuming, the greater the number of time windows we used, the greater the amount of time that was required to run the algorithm. In practice, since the underlying time-varying structure of the data is unknown, we can either run the optimal time-window dividing algorithm, or try different time-window lengths and select the optimal length through a comparison of their BICs.
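
The cheaper alternative mentioned above, comparing BICs across equal-length divisions, can be sketched as follows. This is an illustration only: it assumes a lag-1 model per window and the standard Gaussian form of the BIC, n ln(RSS/n) + k ln(n), which may be scaled differently from the BIC values reported in Table 2. The function `windowed_bic` and the toy data are hypothetical.

```python
import numpy as np

def windowed_bic(x, y, window):
    """Gaussian BIC of lag-1 models for y, fitted separately in each window."""
    n_total, rss_total, n_params = 0, 0.0, 0
    for start in range(0, len(y), window):
        xs, ys = x[start:start + window], y[start:start + window]
        target = ys[1:]
        design = np.column_stack([np.ones(len(target)), ys[:-1], xs[:-1]])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        rss_total += np.sum((target - design @ beta) ** 2)
        n_total += len(target)
        n_params += design.shape[1]
    # Goodness of fit plus a penalty growing with the number of windows.
    return float(n_total * np.log(rss_total / n_total)
                 + n_params * np.log(n_total))

rng = np.random.default_rng(1)
n = 1200
x = rng.standard_normal(n)
# Toy data with a single change-point at t = 600 (coefficient flips sign).
coef = np.where(np.arange(n) < 600, 1.0, -1.0)
y = np.zeros(n)
y[1:] = coef[1:] * x[:-1] + rng.standard_normal(n - 1)

candidates = [1200, 600, 300]
bics = {w: windowed_bic(x, y, w) for w in candidates}
best_window = min(bics, key=bics.get)
```

Each candidate costs only a handful of least-squares fits, which is why this route is much faster than searching over arbitrary change-point positions.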

Table 3. Comparison of running time (in seconds) for different methods.

The classic Granger causality treats the whole time series as one time window. The average Granger causality and cumulative Granger causality were applied to the data by dividing the time series into time windows with equal lengths. The average Granger causality and cumulative Granger causality were also used after optimally dividing the time windows using the proposed algorithm. This simulation was carried out by a computer with an Intel® Core™ 2 CPU T5600 @ 1.83GHz, 1.83GHz, and 1.5G RAM.

                 One time window   Time windows with equal length                   Optimally divided time windows
Window length    1200              600       300       100       50        10       -
Running time     0.0073            0.2936    0.4697    0.9969    1.9747    9.8709   346.5863

Effect of Regional Variation in HRF on Granger Causality Analysis

The effects of the HRF delay of the response on the Granger causality analysis were simulated by setting the delay of the response relative to the onset of the HRF for brain region X as the parameter for brain region Y plus a delay ranging from 0 to 3 s. Therefore, the underlying causal influence existed from X to Y, but the HRF of cause-region X was slower than that of effect-region Y. We refer to this delay as the opposite HRF delay. The longer this delay is, the more difficult it is for a Granger causality analysis to detect the causal influence correctly. Setting the threshold of the p-value for a significant causality as $10^{-6}$, Figure 2C plots the TP and FP rates of different algorithms as the opposite HRF delay varies from 0 to 3 s. Opt average GC and opt cumulative GC performed similarly during the simulation, and therefore, only the results for opt average GC are shown. We can see that the optimal Granger causality performed well as long as the opposite HRF delay was less than 100 ms. When the opposite HRF delay was greater than 100 ms, the FP rate increased and the TP rate dropped rapidly. The TP rate increased again when the opposite HRF delay exceeded 0.4 s because an opposite HRF delay was generated by changing the shape of the HRF (Deshpande et al., 2010). We also carried out our simulations by changing the onset time of the HRF instead of changing its shape, and obtained similar results (data not shown).

To test whether the opposite delay in the HRF can be corrected for the optimal GC algorithms, we realigned the simulated BOLD signal according to the HRF delay between two regions by assuming that the regional HRF delay, especially the relative HRF delay between these two regions, can be accurately estimated. For example, when the HRF of region Y was estimated to be 3 s faster than that of region X, we realigned the time series of region X against that of region Y by discarding the first three and last three data points of the time series of regions X and Y, respectively, at a sampling rate of 1 Hz. The regional HRF delay was simulated by setting different parameters of the response relative to the onset in the canonical HRF in SPM with the default settings, and the opposite HRF delay was varied from 2 to 5 s. Figure 3A shows the results given by the GC algorithm with BOLD signal realignment, when the sampling rate of the BOLD signal was 1 Hz. Setting a threshold of $10^{-9}$ for the p-value, classical GC failed to detect any causality in this case, but the proposed GC algorithms achieved much better TP and FP rates. However, the BOLD realignment worked for those HRF delays that are integer multiples of the sampling period, but not for those delays that are not, which here are 2.5, 3.5, and 4.5 s. Therefore, we increased the sampling rate to 2 Hz, and simulated the BOLD signal again. In Figure 3B, without the BOLD realignment, the proposed GC algorithms failed to reliably estimate the causality because both the TP and FP rates are high. The BOLD realignment improved the performances of the proposed GC algorithms with a 100% TP rate and lower than 20% FP rate, as shown in Figure 3C. We can barely see the results for classic GC in Figure 3, since classic GC failed to detect any significant causality in all cases.
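
The realignment step can be sketched as below, following the description above: the leading samples of the slower region X and the trailing samples of the faster region Y are discarded. The function name `realign` and its arguments are hypothetical, and the sketch assumes the relative HRF delay is already known.

```python
import numpy as np

def realign(x, y, delay_s, sampling_rate_hz):
    """Compensate an HRF of region X that is delay_s seconds slower than that of Y.

    Drops the first k samples of x and the last k samples of y, where
    k = delay_s * sampling_rate_hz must be a whole number of samples.
    """
    shift = delay_s * sampling_rate_hz
    if abs(shift - round(shift)) > 1e-9:
        raise ValueError("delay is not an integer multiple of the sampling period")
    k = int(round(shift))
    x, y = np.asarray(x), np.asarray(y)
    if k == 0:
        return x, y
    return x[k:], y[:-k]

# A 3 s delay at 1 Hz removes three samples from each series.
x_aligned, y_aligned = realign(np.arange(10), np.arange(10), 3, 1)

# A 2.5 s delay cannot be corrected at 1 Hz, but it can at 2 Hz (k = 5).
x_fast, y_fast = realign(np.arange(10), np.arange(10), 2.5, 2)
```

The explicit error for fractional shifts reflects the limitation noted in the text for delays of 2.5, 3.5, and 4.5 s at a 1 Hz sampling rate.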

Figure 3.

Effects of the HRF delay and down sampling on the Granger causality analysis (GCA). (A) Performances of GCAs after a realignment of the BOLD signals between two regions to correct the opposite HRF delay when the sampling rate was 1 Hz. (B) Performances of GCAs when the sampling rate was 2 Hz. (C) Performances of GCAs after a realignment of the BOLD signals when the sampling rate was 2 Hz.

As demonstrated above, the proposed optimal GC algorithms may detect the right direction, the reversed direction, or both directions of the causal influence between two regions as significant. However, what if there is no causal influence between the two regions? To test whether the down sampling and HRF convolution introduce false causal connections between pairs of regions without any causal influence in neuronal activities, the proposed algorithms were applied to other simulation data by setting the causal coefficient, $A_{21}$, in model (3) to zero, the neuronal delay to 50 ms, and the opposite HRF delay to 3 s. Setting the threshold of the p-value for significant causality as $10^{-6}$, the false positive rates for opt average GC and opt cumulative GC were 0.24% and 0.12%, respectively.

Performance Comparison of Simulated fMRI Dataset

First, classic Granger causality (classic GC), optimal GC approaches (opt average GC and opt cumulative GC), and dual Kalman filter cumulative GC (Dkf cumulative GC), which is defined in Appendix D, were applied to the simulated fMRI dataset described in the Materials and Methods section for a performance comparison. All results were obtained by repeating the simulation 100 times. Neither the neuronal delay nor the negative HRF delay was included in this simulation, as none of the lag-based methods work well in this case (Smith et al., 2012); however, such a performance comparison is informative when the lag-based method is applicable (Friston et al., 2012; Wen et al., 2012). When coefficient matrix A was time-invariant, all approaches could detect the causality correctly, as expected. When the interaction coefficients were time-varying, in particular, alternating between positive and negative values in different time intervals, as defined in Eq. (3), the optimal GC approaches were much more powerful than both classic GC and dual Kalman filter cumulative GC. To obtain a more global view of the results, the threshold of the p-values was varied from 0.05 to 0.001. We calculated the TP and FP rates of these three approaches, i.e., opt average GC, opt cumulative GC, and Dkf cumulative GC, accordingly, by repeating the simulation 100 times. As shown in Figure 2D, the proposed optimal GC approaches outperformed dual Kalman filter cumulative GC.

Increased Test-retest Reliability Obtained from Multiband Resting-State Dataset

Herein, the reliability of Granger causality is measured by the correlation between the results inferred for two series of scans of the same subjects. Granger causality was estimated between all directional pairs of brain regions, and the Pearson’s correlation coefficients were then calculated between these causality measurements for the two series of scans. For each scan in the multiband test-retest pilot dataset, the Granger causality for each direction was averaged over seventeen subjects to provide the group Granger causality. The correlations of the group Granger causality between two series of scans demonstrate the reliability of Granger causality. A larger correlation indicates a higher reliability. As shown in Figure 4, the correlations in the group Granger causality between two series of scans increased monotonically with respect to the number of change-points. A significant correlation (r = 0.4751, p < 0.001) in the group Granger causality between two series of scans was observed when the Granger causality was calculated by employing nineteen time windows, while the correlation was around 0.3105 in the classical case.
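
The reliability measure described above can be sketched as follows. The array layout, with one causality matrix per subject, and the function name `retest_correlation` are assumptions made for illustration; the random values stand in for real causality estimates.

```python
import numpy as np

def retest_correlation(gc_scan1, gc_scan2):
    """Pearson correlation of group-averaged causalities between two scans.

    gc_scan1, gc_scan2: arrays of shape (subjects, regions, regions), where
    entry [s, i, j] is the causality from region i to region j in subject s.
    """
    group1 = np.mean(gc_scan1, axis=0)   # group causality, first scan
    group2 = np.mean(gc_scan2, axis=0)   # group causality, second scan
    off_diagonal = ~np.eye(group1.shape[0], dtype=bool)  # skip i -> i entries
    return float(np.corrcoef(group1[off_diagonal], group2[off_diagonal])[0, 1])

# Hypothetical data: 17 subjects, 5 regions (values are random placeholders).
rng = np.random.default_rng(3)
scan1 = rng.random((17, 5, 5))
scan2 = 2.0 * scan1 + 0.1            # a perfectly reproducible second scan
r_perfect = retest_correlation(scan1, scan2)
r_unrelated = retest_correlation(scan1, rng.random((17, 5, 5)))
```

A value near one indicates that the directed pattern is reproduced across scans.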

Figure 4.

Correlation of the mean of the group Granger causalities (GC) between two series of scans upon the same subject set, versus the number of time windows for the average Granger causality. The inset plots show the correlation between two scans of the selected number of windows, where each circle represents the GC between two ROIs parcelled by AAL atlas.

To further demonstrate the effect of the spatial fine-scale details on the Granger causal inference, we compared the correlations established by voxel-level Granger causality with those by classic Granger causality in 100 randomly selected regions from 1024 ROIs by averaging the time series in the same ROI. By calculating the voxel-level spatial Granger causality instead of the classic Granger causality, the correlation increased from 0.3588 to 0.5125. Furthermore, the combined effects of the temporal and spatial fine-scale details were demonstrated on the test-retest reliability of the Granger causality for 100 regions randomly selected from 1024 ROIs. As shown in Figure 5, by calculating the spatio-temporal Granger causalities (stGC), the correlation between the two scans was significantly improved (r = 0.6059, p < 0.001).
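
Why spatial averaging can hide causal structure may be seen in a toy example. The sketch below assumes that the voxel-level spatial causality is the mean of the pairwise causalities over all voxel pairs, and uses a lag-1 log-ratio estimate; `granger_lag1`, `spatial_gc`, `roi_gc` and the data are hypothetical.

```python
import numpy as np

def granger_lag1(x, y):
    """In-sample lag-1 Granger causality x -> y: ln(RSS_restricted / RSS_full)."""
    target = y[1:]
    ones = np.ones(len(target))
    designs = (np.column_stack([ones, y[:-1]]),
               np.column_stack([ones, y[:-1], x[:-1]]))
    rss = []
    for design in designs:
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        rss.append(np.sum((target - design @ beta) ** 2))
    return float(np.log(rss[0] / rss[1]))

def spatial_gc(x_voxels, y_voxels):
    """Mean causality over all voxel pairs; inputs have shape (voxels, time)."""
    return float(np.mean([granger_lag1(xv, yv)
                          for xv in x_voxels for yv in y_voxels]))

def roi_gc(x_voxels, y_voxels):
    """Causality between the ROI-averaged time series."""
    return granger_lag1(x_voxels.mean(axis=0), y_voxels.mean(axis=0))

rng = np.random.default_rng(2)
n = 1000
x_voxels = rng.standard_normal((2, n))
y_voxels = rng.standard_normal((2, n))     # starts as pure noise
y_voxels[0, 1:] += x_voxels[0, :-1]        # voxel 0 of Y driven positively
y_voxels[1, 1:] -= x_voxels[0, :-1]        # voxel 1 of Y driven negatively

gc_voxel_level = spatial_gc(x_voxels, y_voxels)
gc_roi_level = roi_gc(x_voxels, y_voxels)
```

Here the two opposite influences cancel exactly in the ROI average, so the ROI-level estimate is near zero while the voxel-level mean remains clearly positive.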

Figure 5.

Correlation of the Granger causality between two series of scans for the same set of subjects. The causality measurements were calculated through different methods: (A) traditional Granger causality, (B) voxel-level Granger causality, (C) average Granger causality, and (D) spatio-temporal Granger causality.

Validating the Results from the Resting-State fMRI Dataset

1) Monotonicity and significance of Granger causality

In this example, the Granger causality was estimated using time windows with different lengths. In each time series, the first eighty time points were divided into two sets of time windows, including eight time windows with ten time points per window, i.e., change-point set $S_1 = \{1, 10, 20, 30, 40, 50, 60, 70, 80\}$, and two time windows with forty time points per window, i.e., change-point set $S_2 = \{1, 40, 80\}$. The cumulative and average Granger causalities with $S_1$ and $S_2$ were estimated for all directions between all pairs of brain regions for each subject. A 95% confidence interval of the differences $D^{a}_{j\to i} = F^{a,S_1}_{j\to i} - F^{a,S_2}_{j\to i}$ and $D^{c}_{j\to i} = F^{c,S_1}_{j\to i} - F^{c,S_2}_{j\to i}$ was established for each possible direction $\{j \to i \mid i, j = 1,2,\cdots,90 \text{ and } i \ne j\}$. For all directions, the lower bounds of these differences were larger than 0, which is consistent with our theoretical results, as shown in Figure 6B.

Figure 6.

Granger causality results of the resting-state dataset. (A) Average Granger causality versus cumulative Granger causality on the resting state dataset. (B) Comparison of the average Granger causality established by different time window lengths.

The results of the average and cumulative Granger causalities were well correlated, as shown in Figure 6A. In fact, if the data are generated by the TV-MVAR model, which is perfectly static in each time window, the cumulative Granger causality is larger than the average Granger causality (see Theorem C1 in Appendix C). Under the null hypothesis of non-causality, both Granger causalities approach zero as the size of the data becomes sufficiently large. Moreover, the average Granger causality converges to zero more quickly than the cumulative Granger causality (Theorem C2 in Appendix C), i.e., the p-value of the significance of the average Granger causality may be smaller, as was also shown by the simulation results in the previous section (Table 2), in which the average Granger causality performed better than the cumulative Granger causality in terms of detecting the non-causality. Therefore, in the following, the average Granger causality is calculated.

As discussed in Appendix B (Corollary B5), some causal connectivity may be missed if the Granger causality is estimated using the static MVAR model for the whole time series, owing to the correlation of the causality measurement and the time-varying causal coefficients. We studied individually the correlations between the causality measurements, the sum of the absolute values of the estimated causal coefficients, and the absolute value of the sum of the estimated causal coefficients across all time windows defined by the change-point set (S1) in the TV-MVAR model. For 198 subjects, the absolute values of the median of this summation were plotted in Figure 7A against the median of the Granger causality for each direction. This correlation between the average Granger causality and the sum of the causal coefficients decreased for finer time windows, as compared with the classic Granger causality. In contrast, this correlation increased when the absolute value of the median of the sum of the causal coefficients was considered (Figure 7B). As shown in Eq. (A2) in Appendix A, summing the positive and negative causal coefficients in different time windows may lead to an elimination of both positive and negative causal influences. In other words, the classic Granger causality, or Granger causality with a coarser-scale, tends to give a null prediction when the sum of the causal coefficients is near zero; however, a zero sum may be given by significant non-zero coefficients with different signs in different time windows.

Figure 7.

Granger causality versus the sum of the causal coefficients across time windows. The causal coefficients were estimated for each time window defined by S1 for each subject. The medians of the causality among 198 subjects were established using different change-point sets, including S1, S2, S3, and the whole time series without a change-point (specified as the titles for subplots in the figure). Different change-point sets gave different Granger causality values, since the Granger causality value increased as the time window lengths decreased. (A) Correlation to the absolute value of the median of the sums of the causal coefficients over all time windows. (B) Correlation to the median of the sums of the absolute values of the causal coefficients over all time windows. The p-values for all correlations are below the significance threshold.

The results for some particular examples are given in Figure 8. The classic Granger causality using the whole time series data gave near-zero¹ causality measurements when the summations of the causal coefficients across all time windows were near zero for the directions ‘Precuneus_R→Hippocampus_R’ and ‘Thalamus_R→Precuneus_L’. However, both the average Granger causality and the classic Granger causality detected significant causality for the other three directions, as shown in Figure 8, since the sum of the causal coefficients across all time windows was larger than zero.

Figure 8.

Boxplot for the sum of the causal coefficients across all time windows of the change-point set, S1.

2) Granger causality mapping from the precuneus

The approaches discussed above were used to identify the Granger causality mapping from the precuneus, which is believed to be the core of many cognitive behaviours and self-consciousness, and has been called the ‘mind’s eye’ (Cavanna and Trimble, 2006), to other brain regions. The proposed average Granger causality with the optimal time-window dividing algorithm (AGC-OTWDA) was used for the resting state dataset by setting the maximum number of time-windows to three. The significance of the causal influence was detected through a statistical test (see Materials and Methods). For comparison, an analysis was also carried out for each subject using the classic Granger causality.

The classic Granger causality based on the whole time series failed to detect any significant² causal connectivity from the precuneus, while the AGC-OTWDA identified directional neural circuits centred at the precuneus, as shown in Figure 9. Since this dataset was collected when the subjects’ eyes were open, the information flows from the precuneus and visual recognition network of the brain regions, marked in green in Figure 9, were very significant.

Figure 9.

Information flows from precuneus inferred by the average Granger causality based on the optimal time window dividing algorithm. The brain regions for visual recognition are marked in green, the primary visual cortex is marked in yellow, the sensory motor areas are marked in red, and the attention areas are marked in purple. The arrows marked by dotted lines indicate potentially false predictions owing to the regional variation of the HRF. The brain regions are defined by AAL90, as in DPARSF (Yan and Zang, 2010).

To ascertain that the relative variation of the HRF is not a significant confounding factor for the results of the precuneus, the cross-correlation function between the BOLD signals of two regions in each causal connection was examined. The peaks of the cross-correlation function appeared at zero lag in more than 90% of the subjects for most of the pairs, except for those between the right precuneus (PCUN.R), the right precentral gyrus (preCG.R), and the opercular part of the right inferior frontal gyrus (IFGoperc.R), for which only 68% of the peaks were at zero lag. Therefore, the relative variation of the HRF was not a significant factor in the causality results between the precuneus and the visual recognition network.
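
A zero-lag check of this kind can be sketched as follows. The function name `peak_lag` and the synthetic signals are hypothetical; the sketch assumes demeaned signals and takes the lag at which the raw cross-correlation is largest.

```python
import numpy as np

def peak_lag(x, y):
    """Lag (in samples) at which the cross-correlation of x and y peaks.

    A positive value means that y lags behind x.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    xcorr = np.correlate(y, x, mode="full")
    lags = np.arange(-(len(x) - 1), len(y))   # lag of each xcorr entry
    return int(lags[np.argmax(xcorr)])

rng = np.random.default_rng(4)
signal = rng.standard_normal(300)
lag_same = peak_lag(signal, signal)                  # identical signals
lag_shifted = peak_lag(signal, np.roll(signal, 3))   # copy delayed by 3 samples

# Fraction of (hypothetical) subjects whose peak falls at zero lag.
subject_lags = np.array([lag_same, lag_shifted, 0, 0])
zero_lag_fraction = float(np.mean(subject_lags == 0))
```

A high fraction of zero-lag peaks for a pair of regions is the pattern the text takes as evidence against a dominant relative HRF delay.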

Validating Results on the Attention-Task fMRI Dataset

For the attention task, we detected the causality between rIPS and rTPJ. Granger causalities were estimated for all possible pairs of voxels and averaged as the spatial Granger causality. Two methods were used to calculate the Granger causality. One was to concatenate the time series data in each attention block together into a long data series, and then compute the Granger causality. The other was to calculate the Granger causality for each attention block and average them, i.e., the average Granger causality defined above. For comparison, we applied these two methods to estimate the Granger causality of two directions, rIPS→rTPJ and rTPJ→rIPS, for twelve subjects.

As shown in Figure 10, the average Granger causality is clearly larger than the classic Granger causality. For both directions, the differences between the causalities established through the two different methods were calculated for all twelve subjects, and a paired t-test was conducted to examine the difference, i.e., the average Granger causality minus the classic Granger causality. The right-tailed t-test suggested that the differences in both directions are significantly larger than 0, with p-values equal to $6.6482\times10^{-6}$ for rIPS→rTPJ, and $9.0040\times10^{-5}$ for rTPJ→rIPS. These results are consistent with our theoretical analysis, i.e., the average Granger causality analysis across many shorter time series provided by multiple trials gives larger measurements than a single long-term series observation.

Figure 10.

Comparison between the classic Granger causality (classical GC) and average Granger causality (average GC) for an attention-task dataset.

Discussion

Danger of Smoothing Out Causal Information in Long-term Recordings

When we have long-term recordings of two time series observations, how can we reliably estimate the Granger causality between the time series? A naive and intuitive approach to estimate the Granger causality is to fit a single MVAR model to all recordings. This approach is based on the widely-accepted statistical belief that the more data that are used, the closer the result will be to the true value. However, in this paper, our theoretical analysis and numerical examples demonstrate that this may not be the case in an fMRI data analysis. The reliability of the statistical inference depends not only on the amount of data, important as it is, but also on how finely the model describes the data.

In this paper, we discussed the effects of the fine-scaled details in the MVAR model on the Granger causality for detecting the directional information flows between time series data and applied the results to the fMRI data analysis. This effect was mathematically analysed, and it was concluded that both the temporal and spatial characteristics of the MVAR model affect the reliability of the Granger causality estimation. A smaller change-point set implies a coarser model, and a larger one implies a finer model. As we proved, the Granger causalities in the coarser model (with fewer change-points), including both the cumulative and average causalities, are smaller than those in the finer model (with more change-points). As demonstrated by the numerical simulations, the classic Granger causality becomes the lower bound of the average and cumulative Granger causalities (Corollaries B2 and B5), while the causality established using the real change-point set provides the upper bound. Our results demonstrate that the Granger causality depends on the model configuration, and thus ‘the devil is in the details’.

The Granger causality was proved to be equivalent to the transfer information (entropy) between Gaussian processes. It has been widely argued that the definition of the information strongly depends on the modelling configuration for the physical system (Jaynes, 1985). As argued by (Lloyd, 1989), coarse-grained modelling (such as imperfectly determined network evolution) may lead to information loss. Hence, the calculation of Granger causality, or the transfer of information, definitely suffers from the modelling configuration issue.

Trade-off between Preciseness of Estimation and Fineness of Modelling

For a given data set, if we use too many change-points for the TV-MVAR model to have a sufficient number of data points in each time window, we may obtain an inaccurate estimation of the coefficients for the model. In other words, a larger change-point set implies a finer model (possibly a larger Granger causality), but this may become an obstacle for the precise estimation of the Granger causality. Therefore, an optimal change-point set should be a trade-off between the preciseness of the statistical estimation and the fineness of the modelling. In this paper, we propose a novel algorithm for detecting the optimal change-point set based on the Bayesian information criterion (BIC). As illustrated through a numerical simulation, this algorithm can correctly identify the change-points in the model and increase the reliability and significance for a Granger causality analysis. However, compared to the models with time windows of equal length, the optimal time-window dividing algorithm was shown to be time consuming. When we focus on the global index measuring the directed information flow instead of the exact evolution course of the underlying structure, the optimal time window can be determined by either the optimal time window dividing algorithm or by comparing the BICs given by the models with equally divided time windows with different lengths.

Effects of HRF Delay and Down-sampling on the Proposed Methods

The effects of the HRF delay on the Granger causality analysis have been discussed by many researchers. In this paper, we found that, as the opposite delay increased, the TP rate dropped and the FP rate rose, which is consistent with previous results (Smith et al., 2012). Deshpande et al. (2010) convolved the HRF with the local field potentials (LFP) recorded from a macaque, and found that even if the HRF delay opposing the underlying neuronal delay is as long as 2.5 seconds, the minimum detectable neuronal delay is still on the order of a hundred milliseconds. Most recently, Schippers et al. (2011) conducted another simulation-based investigation of the same issue, and found that Granger causal inference can successfully detect over 80% of the cases in which the influence flows toward a region with a faster hemodynamic delay, if the neuronal delays are above 1 s. These results suggest that the Granger causality analysis (GCA) performs well when the HRF delay between regions is short; however, when the HRF delay is long, additional procedures must be taken to minimize the effects of the HRF delay on the results given by the GCA. In this paper, we tried to deconvolve a neuronal signal from a BOLD signal using an advanced Kalman filter (Havlicek et al., 2011), but no significant improvement was observed (data not shown). Assuming that a regional HRF delay can be estimated accurately, the performance of the GCA can be improved by realigning the BOLD signals from two regions to control the HRF delay. Note that, to make this realignment work, the sampling period of the BOLD signal must be shorter than the HRF delay between the two regions of interest. Typically, the TR of an fMRI scan is around 2 to 3 s for whole brain imaging, and by sacrificing the spatial coverage and spatial resolution, the temporal resolution can be as high as 500 ms (Arichi et al., 2012).
Fortunately, the speed of fMRI has been rapidly increasing (Feinberg and Yacoub, 2012), and sub-second whole-brain fMRI has already been made available (Feinberg et al., 2010). In fact, the most recent advance in MRI technology has enabled a temporal resolution as fine as 50 ms (Boyacioglu and Barth, 2012). Meanwhile, the accurate and robust estimation of the HRF in a BOLD signal has long been a fundamental and active issue in the area of fMRI data analysis, and many estimation methods have been proposed, including the classical paper of Friston et al. (1994) and the most recent development by Wang et al. (2011), among many others. Therefore, an accurate estimation of the regional HRF and a realignment of the BOLD signal to correct the HRF delay are some of our future aims for a GCA of fMRI data.

Comparison with Filter-based Approaches

Considering the regional variation of HRF and physiological noise, we simulated the fMRI time series. Based on this dataset, we compared the performances of the proposed optimal Granger causality (GC) approaches with the classic GC and dual Kalman filter cumulative GC, and discussed the effects of the regional variation of the HRF on the Granger causality analysis. The optimal GC approaches outperformed the other two methods for this simulation. We do not intend to imply through this example that our approaches must be better than the dual Kalman filter approach. However, the extension of our GC definition to other approaches handling time-varying dynamics is definitely an interesting and important issue that may provide new insight into GC and time-varying dynamics theories, and is one of our future research aims.

Precuneus Role as a Hub during a Resting State

For another resting-state fMRI dataset, the proposed approach succeeded in detecting a number of Granger causal interdependencies from the precuneus to other brain regions, which cannot be inferred by the classic Granger causality based on a static MVAR model for the whole BOLD time courses. In particular, a circuit centred at the precuneus and projecting to the visual network provided evidence of the pivotal role played by the precuneus in visual cognition.

Possibility of Detecting a Status Change in fMRI Data

In attention-task fMRI data, our approach with spatio-temporally finer-scale details detected the information transfer flows between the brain areas of rIPS and rTPJ. One possible application for this method is detecting the status change in the data. However, the experimental design used in this study for the attention-task was a mixed blocked/ER design. The stimuli were randomized for each subject and block (see the experimental design). The responses were not required for all stimuli, and were therefore also randomized. Both the randomized stimuli and responses might impact the dynamics of the BOLD signal, as well as the block onset and offset. This could cause the detection of unpredictable change-points within the block. Therefore, the current experimental design is not optimal for this purpose, which may be a separate issue of importance. An additional experimental design and the development of a new method may be required in the future.

Further Directions in Spatio-temporal Granger Causality Algorithm

The proposed framework, called spatio-temporal Granger causality, consists of several modules, including change-point detection, parameter estimation, and causality estimation. We emphasize that the current algorithm is not optimal, since a more sophisticated method for each module may improve the overall performance of the analysis. Our future work will aim at finding better algorithms for a more precise estimation of the global Granger causality under the current framework, using up-to-date approaches for each module and a comparison with the existing algorithms (Cribben et al., 2012; Havlicek et al., 2010; Hemmelmann et al., 2009; Hesse et al., 2003).

Conclusions

The estimation of Granger causality is heavily influenced by the model used. Our results show that a coarse-grained approach/model may average out meaningful information, since 'the devil is in the details'. The widely held belief that better statistics (Granger causality) result from a longer recording of a dataset is not always true if the whole long-term time series is incorporated into a coarse-grained MVAR model. Instead, we suggest that the optimal strategy is to divide a long-term recording into a number of time windows using a BIC-based algorithm. In this way, a reliable estimation of the Granger causality by an MVAR model at a finer scale in both time and space can be achieved.

We proposed a new framework for inferring the Granger causality between groups of time series by incorporating finer-scale details into the MVAR model. Our approach shows the power to detect information transfer between brain regions based on fMRI BOLD signals and to enhance the reliability of the estimation. This idea and approach may offer a new angle on the debate over the reliability of Granger causality for fMRI data, particularly for resting-state fMRI time courses.

Highlights.

  • Granger causality increases monotonically with temporal resolution.

  • A new framework was proposed for Granger causality at finer spatial-temporal scale.

  • Higher reliability of causality was achieved at fine spatial-temporal scale.

  • Information flows between the precuneus and the visual regions were revealed.

Acknowledgements

We acknowledge two anonymous reviewers for their insightful comments and suggestions. JF is a Royal Society Wolfson Research Merit Award holder, partially supported by the National Centre for Mathematics and Interdisciplinary Sciences (NCMIS) of the Chinese Academy of Sciences and the Key Program of the National Natural Science Foundation of China (No. 91230201). QL is partially supported by grants from the National Natural Sciences Foundation of China (Nos. 11101429, 11271121, 71171195), the Research Fund for the Doctoral Program of Higher Education of China (No. 20114307120019), and the National Basic Research Program of China (No. 2011CB707802). LWL is jointly supported by the Marie Curie International Incoming Fellowship from the European Commission (FP7-PEOPLE-2011-IIF-302421), the National Natural Sciences Foundation of China (No. 61273309), the Foundation for the Author of National Excellent Doctoral Dissertation of PR China (No. 200921), the Shanghai Rising-Star Program (No. 11QA1400400), and also by the Laboratory of Mathematics for Nonlinear Science, Fudan University.

Appendix A: Solution of the time-varying linear regression

To build up a theoretical analysis of the Granger causality, we assume that the time series are generated by the following first-order (discrete-time) time-varying multivariate autoregressive (TV-MVAR) model:

$$x_{t+1} = a_1(t)\,x_t + b_1(t)\,y_t + n(t), \qquad t = 1, 2, \ldots, T, \tag{A1}$$

where $n(t)$ is white Gaussian noise statistically independent of $x$ and $y$:

$$E[n(t)] = 0, \qquad E[n(t)\,n(t')] = \sigma_n^2(t)\,\delta_{t,t'}.$$

Here, $\delta_{t,t'}$ is the Kronecker delta. Without loss of generality, we can suppose that $x_t$ and $y_t$ are centred, i.e., all means are equal to zero, and that the variances of $x_t$ and $y_t$ both equal 1, by rescaling the coefficients $a_1(t)$ and $b_1(t)$ accordingly. Moreover, we assume that the correlation between $x_t$ and $y_t$ is stationary, i.e., $E(x_t y_t) = c$ for a constant $c \in [0,1]$. Thus, we can perform a simple linear transformation to make $x$ and $y$ orthogonal:

$$z_t = \frac{y_t - c\,x_t}{\sqrt{1 - c^2}},$$

which implies that $z_t$ has mean 0 and variance 1, and is uncorrelated with $x_t$. Thus, (A1) becomes:

$$x_{t+1} = \tilde a_1(t)\,x_t + \tilde b_1(t)\,z_t + n(t)$$

with $\tilde a_1(t) = a_1(t) + b_1(t)\,c$ and $\tilde b_1(t) = b_1(t)\sqrt{1 - c^2}$. Hence, we can discuss this problem under the assumption that $x_t$ and $y_t$ are uncorrelated without losing generality. Therefore, in the following, we assume that $x_t$ and $y_t$ are uncorrelated.
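The orthogonalising transformation can be checked numerically. The sketch below is an illustration only: the helper names `orthogonalize` and `transform_coefficients`, the seed, and the simulated data are assumptions, and the correlation $c$ is replaced by its sample estimate (assumed to satisfy $|c| < 1$).

```python
import numpy as np

def orthogonalize(x, y):
    """Standardize x and y, then return (x, z, c) with z = (y - c x)/sqrt(1 - c^2)."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    c = float(np.mean(x * y))            # sample correlation, assumed |c| < 1
    z = (y - c * x) / np.sqrt(1.0 - c**2)
    return x, z, c

def transform_coefficients(a1, b1, c):
    """Coefficients of the equivalent model in (x, z):
    a~ = a1 + b1 * c and b~ = b1 * sqrt(1 - c^2)."""
    return a1 + b1 * c, b1 * np.sqrt(1.0 - c**2)

rng = np.random.default_rng(0)           # arbitrary seed, illustrative data only
x_raw = rng.standard_normal(500)
y_raw = 0.6 * x_raw + rng.standard_normal(500)
x, z, c = orthogonalize(x_raw, y_raw)
print(np.mean(x * z), np.var(z))         # approximately 0 and 1
```

Because the sample correlation is used for $c$, the sample covariance of $x$ and $z$ vanishes and the sample variance of $z$ equals 1 up to rounding error.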

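Considering the time-varying linear regression system (A1), we estimate the Granger causality with different time-window splits. More generally, we consider (2) or (3) to replace the intrinsic system. To estimate the theoretical values of the time-varying Granger causalities by averaging or cumulating as mentioned in the main text, we first estimate the parameters $\bar a_1^{S_1}(k)$ and $\bar b_1^{S_1}(k)$ by minimizing the following residual squared errors across the whole time interval:

$$\begin{aligned}
\sum_{t=1}^{T} E\left[\left(x_{t+1} - \bar a_1^{S_1}(k)\,x_t - \bar b_1^{S_1}(k)\,y_t\right)^2\right]
&= \sum_{t=1}^{T} E\left\{\left[\left(a_1(t) - \bar a_1^{S_1}(k)\right)x_t + \left(b_1(t) - \bar b_1^{S_1}(k)\right)y_t + n(t)\right]^2\right\} \\
&= \sum_{k=1}^{m}\sum_{t=t_{k-1}}^{t_k-1} E\left\{\left[\left(a_1(t) - \bar a_1^{S_1}(k)\right)x_t + \left(b_1(t) - \bar b_1^{S_1}(k)\right)y_t + n(t)\right]^2\right\} \\
&= \sum_{k=1}^{m}\sum_{t=t_{k-1}}^{t_k-1} \left[\left(a_1(t) - \bar a_1^{S_1}(k)\right)^2 + \left(b_1(t) - \bar b_1^{S_1}(k)\right)^2 + \sigma_n^2(t)\right],
\end{aligned}$$

which is equivalent to a series of minimisation problems:

$$\min_{\bar a_1^{S_1}(k)} \sum_{t=t_{k-1}}^{t_k-1} \left(a_1(t) - \bar a_1^{S_1}(k)\right)^2, \qquad \min_{\bar b_1^{S_1}(k)} \sum_{t=t_{k-1}}^{t_k-1} \left(b_1(t) - \bar b_1^{S_1}(k)\right)^2$$

for all $k = 1, \ldots, m$. It can be seen that (the expectation of) the solution should be

$$\bar a_1^{S_1}(k) = \frac{1}{t_k - t_{k-1}} \sum_{t=t_{k-1}}^{t_k-1} a_1(t), \qquad \bar b_1^{S_1}(k) = \frac{1}{t_k - t_{k-1}} \sum_{t=t_{k-1}}^{t_k-1} b_1(t), \qquad \text{for all } k. \tag{A2}$$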
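Eq. (A2) states that the minimiser in each window is simply the window mean of the true coefficient. A minimal sketch of this step is given below; the helper names `window_means` and `window_sse` are hypothetical, and change-point sets are assumed to be given as increasing 1-based integers $t_0 = 1 < \cdots < t_m = T + 1$, as in the text.

```python
import numpy as np

def window_means(coef, change_points):
    """Per-window averages of a coefficient sequence, as in Eq. (A2).

    coef:          values coef[0..T-1] standing for a_1(1..T) or b_1(1..T).
    change_points: increasing 1-based integers t_0 = 1 < ... < t_m = T + 1.
    """
    coef = np.asarray(coef, dtype=float)
    starts, ends = change_points[:-1], change_points[1:]
    # Window k covers t = t_{k-1}, ..., t_k - 1 (1-based).
    return np.array([coef[s - 1:e - 1].mean() for s, e in zip(starts, ends)])

def window_sse(coef, change_points, values):
    """Sum over windows of squared deviations of coef from one constant per window."""
    coef = np.asarray(coef, dtype=float)
    total = 0.0
    for s, e, v in zip(change_points[:-1], change_points[1:], values):
        total += float(np.sum((coef[s - 1:e - 1] - v) ** 2))
    return total

b1 = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]   # illustrative coefficient sequence
S1 = [1, 4, 7]
means = window_means(b1, S1)
print(means, window_sse(b1, S1, means))
```

Any other choice of per-window constants gives a larger value of `window_sse`, which is the content of the minimisation problems above.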

Appendix B: Monotonicity of the Granger causalities of TV-MVAR models

Monotonicity of cumulative Granger causality

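By the estimation of the coefficients, Eq. (A2), the cumulative Granger causality with the given time-window lengths can be estimated as:

$$F_{Y\to X}^{(c,S_1)} = \log\left(\frac{U_{S_1} + \sum_{t=1}^{T} (b_1(t))^2 + \sum_{t=1}^{T} \sigma_n^2(t)}{U_{S_1} + V_{S_1} + \sum_{t=1}^{T} \sigma_n^2(t)}\right),$$

where

$$U_{S_1} = \sum_{k=1}^{m}\sum_{t=t_{k-1}}^{t_k-1} \left(a_1(t) - \bar a_1^{S_1}(k)\right)^2, \qquad V_{S_1} = \sum_{k=1}^{m}\sum_{t=t_{k-1}}^{t_k-1} \left(b_1(t) - \bar b_1^{S_1}(k)\right)^2.$$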

We have the following result.

Theorem B1

For two change-point sets $S_1$ and $S_2$, if $S_1 \subseteq S_2$, then

$$F_{Y\to X}^{(c,S_1)} \le F_{Y\to X}^{(c,S_2)}.$$
Proof

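Let $S_1$ be composed of the following integer series:

$$1 = t_0 < t_1 < \cdots < t_{m-1} < t_m = T + 1.$$

Since the increasing integer series $S_2$ contains $S_1$, we can denote $S_2$ as follows:

$$1 = (t_0 =)\, t_1^1 < t_1^2 < \cdots < t_1^{n_1} < t_1^{n_1+1} = t_2^1\,(= t_1) < \cdots < t_{m-1}^{n_{m-1}+1} = t_m^1\,(= t_{m-1}) < t_m^2 < \cdots < t_m^{n_m+1}\,(= t_m) = T + 1.$$

In other words, in each time interval defined by $S_1$, for instance, from $t_{k-1}$ to $t_k$, we denote by $t_k^1 < t_k^2 < \cdots < t_k^{n_k} < t_k^{n_k+1}$ the integers in $S_2$ that are located between $t_{k-1}$ and $t_k$. For simplicity, we let $t_k^{n_k+1} = t_{k+1}^1$. Then, the TV-MVAR model with respect to $S_2$ can be formulated as

$$x_{t+1} = \bar a_1^{S_2}(k,q)\,x_t + \bar b_1^{S_2}(k,q)\,y_t + \tilde n(t), \qquad t_k^q \le t < t_k^{q+1}, \quad q = 1, \ldots, n_k, \quad k = 1, \ldots, m. \tag{B1}$$

First, we prove that $U_{S_1} \ge U_{S_2}$ and $V_{S_1} \ge V_{S_2}$. The two arguments are essentially the same, so we only need to prove one of them, for instance, $V_{S_1} \ge V_{S_2}$.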

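In fact, we rewrite $V_{S_2}$ as follows:

$$V_{S_2} = \sum_{k=1}^{m}\sum_{q=1}^{n_k}\sum_{t=t_k^q}^{t_k^{q+1}-1} \left(b_1(t) - \bar b_1^{S_2}(\tau(t_k^q))\right)^2,$$

where $\tau(t_k^q)$ denotes the order of $t_k^q$ in the ordered integer set $S_2$.

Thus, it is sufficient to show that in each time window of $S_1$, it holds that

$$\sum_{t=t_{k-1}}^{t_k-1} \left(b_1(t) - \bar b_1^{S_1}(k)\right)^2 \ge \sum_{q=1}^{n_k}\sum_{t=t_k^q}^{t_k^{q+1}-1} \left(b_1(t) - \bar b_1^{S_2}(\tau(t_k^q))\right)^2.$$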

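We note that

$$\begin{aligned}
\sum_{t=t_{k-1}}^{t_k-1} \left(b_1(t) - \bar b_1^{S_1}(k)\right)^2 &= \sum_{t=t_{k-1}}^{t_k-1} [b_1(t)]^2 - (t_k - t_{k-1})\left[\bar b_1^{S_1}(k)\right]^2, \\
\sum_{q=1}^{n_k}\sum_{t=t_k^q}^{t_k^{q+1}-1} \left(b_1(t) - \bar b_1^{S_2}(\tau(t_k^q))\right)^2 &= \sum_{t=t_{k-1}}^{t_k-1} b_1^2(t) - \sum_{q=1}^{n_k} (t_k^{q+1} - t_k^q)\left(\bar b_1^{S_2}(\tau(t_k^q))\right)^2
\end{aligned}$$

and

$$\bar b_1^{S_1}(k) = \frac{1}{t_k - t_{k-1}} \sum_{q=1}^{n_k} (t_k^{q+1} - t_k^q)\,\bar b_1^{S_2}(\tau(t_k^q)), \qquad \sum_{q=1}^{n_k} (t_k^{q+1} - t_k^q) = t_k - t_{k-1}.$$

Hence,

$$\begin{aligned}
(t_k - t_{k-1})\left[\bar b_1^{S_1}(k)\right]^2 &= (t_k - t_{k-1})\left[\frac{1}{t_k - t_{k-1}} \sum_{q=1}^{n_k} (t_k^{q+1} - t_k^q)\,\bar b_1^{S_2}(\tau(t_k^q))\right]^2 \\
&\le (t_k - t_{k-1})\,\frac{1}{t_k - t_{k-1}} \sum_{q=1}^{n_k} (t_k^{q+1} - t_k^q)\left[\bar b_1^{S_2}(\tau(t_k^q))\right]^2 \\
&= \sum_{q=1}^{n_k} (t_k^{q+1} - t_k^q)\left[\bar b_1^{S_2}(\tau(t_k^q))\right]^2
\end{aligned} \tag{B2}$$

owing to the well-known fact that the square of a weighted arithmetic mean is no greater than the weighted mean of the squares. This implies $V_{S_1} \ge V_{S_2}$. The same argument gives $U_{S_1} \ge U_{S_2}$.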

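From $V_{S_1} \ge V_{S_2}$, we immediately have

$$\frac{\sum_{t=1}^{T} (b_1(t))^2 + \sum_{t=1}^{T} \sigma_n^2(t)}{V_{S_1} + \sum_{t=1}^{T} \sigma_n^2(t)} \le \frac{\sum_{t=1}^{T} (b_1(t))^2 + \sum_{t=1}^{T} \sigma_n^2(t)}{V_{S_2} + \sum_{t=1}^{T} \sigma_n^2(t)}.$$

Combining this with $U_{S_1} \ge U_{S_2}$, we can derive

$$\frac{U_{S_1} + \sum_{t=1}^{T} (b_1(t))^2 + \sum_{t=1}^{T} \sigma_n^2(t)}{U_{S_1} + V_{S_1} + \sum_{t=1}^{T} \sigma_n^2(t)} \le \frac{U_{S_2} + \sum_{t=1}^{T} (b_1(t))^2 + \sum_{t=1}^{T} \sigma_n^2(t)}{U_{S_2} + V_{S_2} + \sum_{t=1}^{T} \sigma_n^2(t)}.$$

This means $F_{Y\to X}^{(c,S_1)} \le F_{Y\to X}^{(c,S_2)}$. From (B2), one can see that equality holds if and only if

$$\bar b_1^{S_2}(\tau(t_k^q)) = \bar b_1^{S_1}(k) \tag{B3}$$

holds for all $q = 1, \ldots, n_k$ and all $k$.

From Theorem B1 and its proof, in particular Eq. (B3) as the necessary and sufficient condition for $F_{Y\to X}^{(c,S_1)} = F_{Y\to X}^{(c,S_2)}$, we immediately have the upper and lower bounds of the cumulative Granger causality.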

Corollary B2

Let $S_0 = \{1, T+1\}$ and let $S^*$ be the ordered time point set that consists exactly of the change-points in the TV-MVAR model. Then, for any ordered time point set $S$, we have

$$F_{Y\to X}^{(c,S_0)} \le F_{Y\to X}^{(c,S)} \le F_{Y\to X}^{(c,S^*)}.$$

This corollary shows that the static (classic) Granger causality is actually the lower bound of the cumulative Granger causality. Moreover, if the time series are exactly generated by the TV-MVAR model (A1) with the change-point set $S^*$, the cumulative Granger causality based on $S^*$ is the upper bound of all.

We should point out that Theorem B1 holds under the condition that one switching time set is contained in the other. It does not imply that the more change-points there are, the larger the cumulative Granger causality will be.
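The bounds in Corollary B2 can be illustrated numerically from the closed-form expression for the cumulative Granger causality. The sketch below is only an illustration under assumed coefficients: the function name `cumulative_gc`, the piecewise-constant $b_1(t)$, and the three nested change-point sets are hypothetical choices, not data from this paper.

```python
import math
import numpy as np

def cumulative_gc(a1, b1, sigma2, change_points):
    """Theoretical cumulative Granger causality F^(c,S) for known coefficients.

    change_points: increasing 1-based integers t_0 = 1 < ... < t_m = T + 1.
    """
    a1, b1, sigma2 = (np.asarray(v, dtype=float) for v in (a1, b1, sigma2))
    U = V = 0.0
    for s, e in zip(change_points[:-1], change_points[1:]):
        w = slice(s - 1, e - 1)                       # 1-based window [s, e - 1]
        U += np.sum((a1[w] - a1[w].mean()) ** 2)
        V += np.sum((b1[w] - b1[w].mean()) ** 2)
    noise = sigma2.sum()
    return math.log((U + np.sum(b1 ** 2) + noise) / (U + V + noise))

T = 12
a1 = np.full(T, 0.5)
b1 = np.array([1.0] * 4 + [0.0] * 4 + [3.0] * 4)      # true change-points at t = 5 and t = 9
sigma2 = np.ones(T)

S0 = [1, 13]             # static model
S_mid = [1, 5, 13]       # contains S0 and is contained in S_star
S_star = [1, 5, 9, 13]   # the true change-point set
for S in (S0, S_mid, S_star):
    print(S, cumulative_gc(a1, b1, sigma2, S))
```

Since $S_0 \subseteq S_{\text{mid}} \subseteq S^*$, the three printed values are non-decreasing, with the static model giving the smallest one.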

Conjecture 1

If $|S_2| \ge |S_1|$, then $F_{Y\to X}^{(c,S_1)} \le F_{Y\to X}^{(c,S_2)}$.

We show that this conjecture is not true by a simple counter-example. Let $T = 6$, $S_1 = \{1,4,7\}$, $S_2 = \{1,3,5,7\}$, and $S = \{1,7\}$. We suppose that the data are produced by the model (5) with a constant $a_1$ and with $b_1(t)$ periodic with period 2, i.e., $b_1(1) = b_1(3) = b_1(5)$ and $b_1(2) = b_1(4) = b_1(6)$, where the two values differ. It is clear that for $S_2$, the parameters can be estimated as

$$\bar b_1^{S_2}(1) = \bar b_1^{S_2}(2) = \bar b_1^{S_2}(3) = \frac{1}{2}\left(b_1(1) + b_1(2)\right),$$

which equals the whole average

$$\bar b_1 = \frac{1}{6}\left(b_1(1) + b_1(2) + b_1(3) + b_1(4) + b_1(5) + b_1(6)\right).$$

So, the Granger causality with $S_2$ is equal to that of the static MVAR model (whose change-point set is $S$), i.e., $F_{Y\to X}^{(c,S_2)} = F_{Y\to X}^{(c,S)}$.

Note that $F_{Y\to X}^{(c,S)} < F_{Y\to X}^{(c,S_1)}$, where the inequality is strict because $\bar b_1^{S_1}(1) = \frac{1}{3}\left(b_1(1) + b_1(2) + b_1(3)\right) \ne \bar b_1$. So, we have $F_{Y\to X}^{(c,S_2)} < F_{Y\to X}^{(c,S_1)}$ despite $|S_2| > |S_1|$. Let us consider a numerical example with $a_1(t) = 1$ and $\sigma_n^2(t) = 1$ for all $t$, and $b_1(1) = b_1(3) = b_1(5) = 1$ and $b_1(2) = b_1(4) = b_1(6) = 0$. Direct calculation gives

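$$F_{Y\to X}^{(c,S_1)} = \log\left(\frac{\sum_{t=1}^{6} (b_1(t))^2 + 6}{\sum_{k=1}^{2}\sum_{t=3(k-1)+1}^{3k} \left(b_1(t) - \bar b_1^{S_1}(k)\right)^2 + 6}\right) = \log\frac{6 + 3}{4/3 + 6} = \log\frac{27}{22}.$$

However,

$$F_{Y\to X}^{(c,S_2)} = \log\left(\frac{\sum_{t=1}^{6} (b_1(t))^2 + 6}{\sum_{k=1}^{3}\sum_{t=2(k-1)+1}^{2k} \left(b_1(t) - \bar b_1^{S_2}(k)\right)^2 + 6}\right) = \log\frac{6 + 3}{3/2 + 6} = \log\frac{6}{5} < \log\frac{27}{22} = F_{Y\to X}^{(c,S_1)}.$$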
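The counter-example can be reproduced with exact rational arithmetic. The following sketch is an illustration only; the helper name `gc_ratio` is hypothetical, and it relies on the assumption stated in the example that $a_1(t)$ is constant, so that $U_S = 0$ for every change-point set.

```python
from fractions import Fraction
import math

def gc_ratio(b1, sigma2_total, change_points):
    """Argument of the log in F^(c,S) when a_1(t) is constant (so U_S = 0)."""
    V = Fraction(0)
    for s, e in zip(change_points[:-1], change_points[1:]):
        window = b1[s - 1:e - 1]                 # 1-based window [s, e - 1]
        mean = sum(window, Fraction(0)) / len(window)
        V += sum((b - mean) ** 2 for b in window)
    num = sum(b * b for b in b1) + sigma2_total
    return num / (V + sigma2_total)

b1 = [Fraction(v) for v in (1, 0, 1, 0, 1, 0)]
noise = Fraction(6)                              # sum of sigma_n^2(t) = 1 over t = 1..6
sets = {"S": [1, 7], "S1": [1, 4, 7], "S2": [1, 3, 5, 7]}
ratios = {name: gc_ratio(b1, noise, S) for name, S in sets.items()}
for name, r in ratios.items():
    print(name, r, math.log(r))
```

The ratio for $S_2$ coincides with that of the static set $S$ and is strictly smaller than the ratio for $S_1$, although $S_2$ has more change-points.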

Monotonicity of average Granger causality

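Another approach to estimating the Granger causality of the TV-MVAR model is to estimate the Granger causality in each time window (between switching points) and to average these values according to the length of each time window. Recall that $S_1 = \{1 = t_0 < t_1 < \cdots < t_{m-1} < t_m = T+1\}$ is an increasing integer sequence that denotes the change-points, and that the TV-MVAR model is

$$x_{t+1} = \bar a_1^{S_1}(k)\,x_t + \bar b_1^{S_1}(k)\,y_t + n(t), \qquad t_{k-1} \le t < t_k, \quad k = 1, \ldots, m. \tag{B4}$$

In each time window, the Granger causality can be estimated as

$$F_{Y\to X}^{(k,S_1)} = \log\left(\frac{U_k + \sum_{t=t_{k-1}}^{t_k-1} (b_1(t))^2 + \sum_{t=t_{k-1}}^{t_k-1} \sigma_n^2(t)}{U_k + V_k + \sum_{t=t_{k-1}}^{t_k-1} \sigma_n^2(t)}\right)$$

with

$$U_k = \sum_{t=t_{k-1}}^{t_k-1} \left(a_1(t) - \bar a_1^{S_1}(k)\right)^2, \qquad V_k = \sum_{t=t_{k-1}}^{t_k-1} \left(b_1(t) - \bar b_1^{S_1}(k)\right)^2.$$

Then, we estimate the Granger causality by the TV-MVAR model (B4) as follows:

$$F_{Y\to X}^{(a,S_1)} = \frac{1}{T}\sum_{k=1}^{m} F_{Y\to X}^{(k,S_1)}\,(t_k - t_{k-1}),$$

which we name the average Granger causality: the weighted average according to the lengths of the time windows. To investigate the relationship between the average Granger causality and the fineness of the temporal resolution, we need the following lemma:

Lemma B3

For any positive integer $T$, any $T$ real constants $[c_t]_{t=1}^{T}$, any $T$ nonnegative constants $[p_t]_{t=1}^{T}$ with $\sum_{t=1}^{T} p_t = 1$, and any positive constants $[\sigma_t^2]_{t=1}^{T}$, we have the following inequality

$$\log\left(\frac{\sum_{t=1}^{T} c_t^2\,p_t + \sigma^2}{\sum_{t=1}^{T} (c_t - \bar c)^2\,p_t + \sigma^2}\right) \le \sum_{t=1}^{T} p_t \log\left(\frac{c_t^2 + \sigma_t^2}{\sigma_t^2}\right),$$

where $\bar c = \sum_{t=1}^{T} p_t c_t$ and $\sigma^2 = \sum_{t=1}^{T} p_t \sigma_t^2$.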

Proof

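Let us consider the following function with respect to $C = [c_t]_{t=1}^{T}$ and $\Sigma = [\sigma_t^2]_{t=1}^{T}$:

$$V = \log\frac{\sigma^2 + c^2}{\sigma^2 + \nu} - \sum_{t=1}^{T} p_t \log\left(\frac{\sigma_t^2 + c_t^2}{\sigma_t^2}\right)$$

with

$$c^2 = \sum_{t=1}^{T} c_t^2\,p_t, \qquad \nu = \sum_{t=1}^{T} [c_t - \bar c]^2\,p_t = c^2 - (\bar c)^2.$$

We make minor modifications to the problem according to the following three facts.

First, noting

$$\nu = \sum_{t=1}^{T} [c_t - \bar c]^2\,p_t = \sum_{t=1}^{T} [c_t]^2\,p_t - \left(\sum_{t=1}^{T} c_t\,p_t\right)^2 \ge \sum_{t=1}^{T} |c_t|^2\,p_t - \left(\sum_{t=1}^{T} |c_t|\,p_t\right)^2,$$

if we replace $c_t$ by $|c_t|$ in $V$ and denote the result by $\tilde V$, then we have $V \le \tilde V$.

Therefore, without loss of generality, it is sufficient to prove $\tilde V \le 0$ by considering the case in which all $c_t$ are nonnegative.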

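Second, considering

$$\left.\frac{\partial V}{\partial c_t}\right|_{c_t = 0} = \frac{2\,\bar c\,p_t}{\sigma^2 + \nu},$$

which is positive in the case that there is at least one positive $c_t$ with nonzero $p_t$.

So, it is sufficient to consider the case in which all $c_t$ are positive.

Third, it holds that

$$\sum_{t=1}^{T} p_t \log(\sigma_t^2) \le \log(\sigma^2)$$

owing to Jensen's inequality.

In summary, we can consider the following function instead of $V$:

$$R(y, \nu, \Sigma) = \log\frac{\sigma^2 + c^2}{\sigma^2 + \nu} - \sum_{t=1}^{T} p_t \log\left(\frac{\sigma_t^2 + y_t}{\sigma^2}\right).$$

Letting $y_t = (c_t)^2$, which is possible because all $c_t$ are nonnegative, and $y = [y_t]$, with fixed $c^2$ and $\sigma^2$, we are going to show that $R$ is nonpositive by considering the following maximization problem:

$$\max R(y, \nu, \Sigma) \quad \text{s.t.} \quad \sum_{t=1}^{T} y_t\,p_t = c^2, \quad \sum_{t=1}^{T} \sqrt{y_t}\,p_t = \sqrt{c^2 - \nu}, \quad \sum_{t=1}^{T} p_t\,\sigma_t^2 = \sigma^2, \quad \sigma_t^2 > 0, \quad y_t > 0, \quad \forall t. \tag{B5}$$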

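To solve (B5), we introduce the following auxiliary Lagrange function:

$$L(y, \nu, \Sigma, \lambda, \mu, \gamma) = R(y, \nu, \Sigma) + \lambda\left(\sum_{t=1}^{T} y_t\,p_t - c^2\right) + \mu\left(\sum_{t=1}^{T} \sqrt{y_t}\,p_t - \sqrt{c^2 - \nu}\right) + \gamma\left(\sum_{t=1}^{T} \sigma_t^2\,p_t - \sigma^2\right).$$

By the Karush-Kuhn-Tucker conditions, the necessary conditions for the maximum of (B5) include:

$$\frac{\partial L}{\partial y_t} = p_t\left[-\frac{1}{\sigma_t^2 + y_t} + \lambda + \frac{\mu}{2\sqrt{y_t}}\right] = 0, \qquad \frac{\partial L}{\partial \nu} = -\frac{1}{\sigma^2 + \nu} + \frac{\mu}{2\sqrt{c^2 - \nu}} = 0, \qquad \frac{\partial L}{\partial (\sigma_t^2)} = p_t\left[-\frac{1}{\sigma_t^2 + y_t} + \gamma\right] = 0.$$

This leads to (i) $p_t = 0$ or (ii) $\sigma_t^2 + y_t = 1/\gamma$ and

$$y_t = \left[\frac{\mu}{2(\gamma - \lambda)}\right]^2. \tag{B6}$$

In other words, for those $y_t$ with nonzero $p_t$, if we solve for $y_t$ from the above equalities as a function of $\sigma$, $\lambda$, $\mu$, $\gamma$, $c^2$ and $\nu$, which are independent of the index $t$, then we can only obtain one expression, namely (B6). It should be emphasized that we are not solving for the values of $y_t$ but for its expression in terms of the $t$-independent quantities $\sigma$, $\lambda$, $\mu$, $\gamma$, $c^2$ and $\nu$. Therefore, at the possible extreme points of $R(y, \nu, \Sigma)$, $y_t$ takes only a single value. The same holds for

$$\sigma_t^2 = \frac{1}{\gamma} - \left[\frac{\mu}{2(\gamma - \lambda)}\right]^2.$$

It can be seen that if $y_t$ and $\sigma_t^2$ each take only a single value, then $R(y, \nu, \Sigma) = 0$. So, the maximum of $R(y, \nu, \Sigma)$ is zero. Hence, the original function $V$ has its maximum equal to zero. Therefore, Lemma B3 is proved, and equality holds if and only if $y_t$ and $\sigma_t^2$ each take only a single value.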

Theorem B4

Let $S_1$ and $S_2$ be two sequences of increasing integers. If $S_1 \subseteq S_2$, then

$$F_{Y\to X}^{(a,S_1)} \le F_{Y\to X}^{(a,S_2)}.$$
Proof

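We denote the sets $S_1$ and $S_2$ by the same symbols as in the proof of Theorem B1. According to (B1), the Granger causality in the $q$-th time window of $S_2$ within the $k$-th window of $S_1$ can be written as:

$$F_{Y\to X}^{(k,q,S_2)} = \log\left(\frac{U_{k,q} + \sum_{t=t_k^q}^{t_k^{q+1}-1} (b_1(t))^2 + \sum_{t=t_k^q}^{t_k^{q+1}-1} \sigma_n^2(t)}{U_{k,q} + V_{k,q} + \sum_{t=t_k^q}^{t_k^{q+1}-1} \sigma_n^2(t)}\right),$$

where

$$U_{k,q} = \sum_{t=t_k^q}^{t_k^{q+1}-1} \left(a_1(t) - \bar a_1^{S_2}(\tau(t_k^q))\right)^2, \qquad V_{k,q} = \sum_{t=t_k^q}^{t_k^{q+1}-1} \left(b_1(t) - \bar b_1^{S_2}(\tau(t_k^q))\right)^2.$$

Then, the average Granger causality with respect to $S_2$ is

$$F_{Y\to X}^{(a,S_2)} = \frac{1}{T}\sum_{k=1}^{m}\sum_{q=1}^{n_k} F_{Y\to X}^{(k,q,S_2)}\,(t_k^{q+1} - t_k^q).$$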

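Note that

$$\sum_{q=1}^{n_k} \bar a_1^{S_2}(\tau(t_k^q))\,(t_k^{q+1} - t_k^q) = \bar a_1^{S_1}(k)\,(t_k - t_{k-1}), \qquad \sum_{q=1}^{n_k} \bar b_1^{S_2}(\tau(t_k^q))\,(t_k^{q+1} - t_k^q) = \bar b_1^{S_1}(k)\,(t_k - t_{k-1}),$$

and

$$\begin{aligned}
\sum_{t=t_k^q}^{t_k^{q+1}-1} [a_1(t)]^2 &= \sum_{t=t_k^q}^{t_k^{q+1}-1} \left[a_1(t) - \bar a_1^{S_2}(\tau(t_k^q))\right]^2 + \left[\bar a_1^{S_2}(\tau(t_k^q))\right]^2 (t_k^{q+1} - t_k^q), \\
\sum_{t=t_k^q}^{t_k^{q+1}-1} [b_1(t)]^2 &= \sum_{t=t_k^q}^{t_k^{q+1}-1} \left[b_1(t) - \bar b_1^{S_2}(\tau(t_k^q))\right]^2 + \left[\bar b_1^{S_2}(\tau(t_k^q))\right]^2 (t_k^{q+1} - t_k^q).
\end{aligned}$$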

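For comparison with $F_{Y\to X}^{(a,S_2)}$, we can rewrite each term of $F_{Y\to X}^{(a,S_1)}$, i.e., $F_{Y\to X}^{(k,S_1)}$, as follows:

$$\begin{aligned}
F_{Y\to X}^{(k,S_1)} &= \log\left(\frac{U_k + \sum_{t=t_{k-1}}^{t_k-1} (b_1(t))^2 + \sum_{t=t_{k-1}}^{t_k-1} \sigma_n^2(t)}{U_k + V_k + \sum_{t=t_{k-1}}^{t_k-1} \sigma_n^2(t)}\right) \\
&= \log\left(\frac{\sum_{q=1}^{n_k} \left[\bar a_1^{S_1}(k) - \bar a_1^{S_2}(\tau(t_k^q))\right]^2 (t_k^{q+1} - t_k^q) + \sum_{q=1}^{n_k} \left[\bar b_1^{S_2}(\tau(t_k^q))\right]^2 (t_k^{q+1} - t_k^q) + \sum_{q=1}^{n_k} \varepsilon_n^2(q)\,(t_k^{q+1} - t_k^q)}{\sum_{q=1}^{n_k} \left[\bar a_1^{S_1}(k) - \bar a_1^{S_2}(\tau(t_k^q))\right]^2 (t_k^{q+1} - t_k^q) + \sum_{q=1}^{n_k} \left[\bar b_1^{S_1}(k) - \bar b_1^{S_2}(\tau(t_k^q))\right]^2 (t_k^{q+1} - t_k^q) + \sum_{q=1}^{n_k} \varepsilon_n^2(q)\,(t_k^{q+1} - t_k^q)}\right),
\end{aligned}$$

where

$$\varepsilon_n^2(q) = \frac{1}{t_k^{q+1} - t_k^q} \sum_{t=t_k^q}^{t_k^{q+1}-1} \left\{\left[a_1(t) - \bar a_1^{S_2}(\tau(t_k^q))\right]^2 + \left[b_1(t) - \bar b_1^{S_2}(\tau(t_k^q))\right]^2 + \sigma_n^2(t)\right\}.$$

Since

$$\begin{aligned}
&\frac{\sum_{q=1}^{n_k} \left[\bar a_1^{S_1}(k) - \bar a_1^{S_2}(\tau(t_k^q))\right]^2 (t_k^{q+1} - t_k^q) + \sum_{q=1}^{n_k} \left[\bar b_1^{S_2}(\tau(t_k^q))\right]^2 (t_k^{q+1} - t_k^q) + \sum_{q=1}^{n_k} \varepsilon_n^2(q)\,(t_k^{q+1} - t_k^q)}{\sum_{q=1}^{n_k} \left[\bar a_1^{S_1}(k) - \bar a_1^{S_2}(\tau(t_k^q))\right]^2 (t_k^{q+1} - t_k^q) + \sum_{q=1}^{n_k} \left[\bar b_1^{S_1}(k) - \bar b_1^{S_2}(\tau(t_k^q))\right]^2 (t_k^{q+1} - t_k^q) + \sum_{q=1}^{n_k} \varepsilon_n^2(q)\,(t_k^{q+1} - t_k^q)} \\
&\qquad \le \frac{\sum_{q=1}^{n_k} \left[\bar b_1^{S_2}(\tau(t_k^q))\right]^2 (t_k^{q+1} - t_k^q) + \sum_{q=1}^{n_k} \varepsilon_n^2(q)\,(t_k^{q+1} - t_k^q)}{\sum_{q=1}^{n_k} \left[\bar b_1^{S_1}(k) - \bar b_1^{S_2}(\tau(t_k^q))\right]^2 (t_k^{q+1} - t_k^q) + \sum_{q=1}^{n_k} \varepsilon_n^2(q)\,(t_k^{q+1} - t_k^q)},
\end{aligned}$$

Theorem B4 can be derived by directly employing Lemma B3. In addition, equality holds if and only if $\bar b_1^{S_2}(\tau(t_k^q))$ and $\varepsilon_n^2(q)$ each take values independent of the index $q$ (but possibly depending on the index $k$).

Similar to Corollary B2, from Theorem B4 and its proof, in particular the necessary and sufficient condition for $F_{Y\to X}^{(a,S_1)} = F_{Y\to X}^{(a,S_2)}$, we immediately have the upper and lower bounds of the average Granger causality.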

Corollary B5

Let $S_0 = \{1, T+1\}$ and let $S^*$ be the ordered time point set that consists exactly of the change-points in the TV-MVAR model. Then, for any ordered time point set $S$, we have

$$F_{Y\to X}^{(a,S_0)} \le F_{Y\to X}^{(a,S)} \le F_{Y\to X}^{(a,S^*)}.$$

This corollary shows that the static (classic) Granger causality is actually the lower bound of the average Granger causality. Moreover, if the time series are exactly generated by the TV-MVAR model (A1) with the change-point set $S^*$, the average Granger causality based on $S^*$ is the upper bound of all.
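The average Granger causality and the ordering in Corollary B5 can be illustrated on a small example. The sketch below is not the estimation code of this paper: the function name `average_gc` is hypothetical, and the alternating $b_1(t)$ with unit noise variance is an assumed test case in which every time point is a change-point.

```python
import math
import numpy as np

def average_gc(a1, b1, sigma2, change_points):
    """Theoretical average Granger causality F^(a,S): per-window GC values,
    weighted by window length and divided by T.

    change_points: increasing 1-based integers t_0 = 1 < ... < t_m = T + 1.
    """
    a1, b1, sigma2 = (np.asarray(v, dtype=float) for v in (a1, b1, sigma2))
    T = len(b1)
    total = 0.0
    for s, e in zip(change_points[:-1], change_points[1:]):
        w = slice(s - 1, e - 1)                       # 1-based window [s, e - 1]
        U_k = np.sum((a1[w] - a1[w].mean()) ** 2)
        V_k = np.sum((b1[w] - b1[w].mean()) ** 2)
        noise = sigma2[w].sum()
        F_k = math.log((U_k + np.sum(b1[w] ** 2) + noise) / (U_k + V_k + noise))
        total += F_k * (e - s)
    return total / T

a1 = np.ones(6)
b1 = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
sigma2 = np.ones(6)

S0 = [1, 7]                       # static model
S1 = [1, 4, 7]                    # one intermediate change-point
S_star = [1, 2, 3, 4, 5, 6, 7]    # every time point is a change-point
for S in (S0, S1, S_star):
    print(S, average_gc(a1, b1, sigma2, S))
```

The three sets are nested, so the printed values are non-decreasing from the static set to the full change-point set.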

We should also emphasize that the following conjecture is not true.

Conjecture B2

If $|S_2| \ge |S_1|$, then $F_{Y\to X}^{(a,S_1)} \le F_{Y\to X}^{(a,S_2)}$.

That is to say, the average Granger causality is monotonic with respect to the containment relation between the change-point sets, but not monotonic with respect to the size of the change-point sets. A counter-example can easily be established in the same way as in Remark 1.

Appendix C: Comparison between cumulative and average Granger causalities

Magnitude comparison

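Actually, the two sorts of Granger causalities of the TV-MVAR model do not have a definite magnitude relation. First, we show in the following theorem that the cumulative Granger causality is greater than or equal to the average Granger causality with the same change-point set only under a condition.

Theorem C1

Let $S$ be a sequence of increasing integers. If the following quantity

$$\frac{1}{t_k - t_{k-1}}\left[\sum_{t=t_{k-1}}^{t_k-1} \left(a_1(t) - \bar a_1(k)\right)^2 + \sum_{t=t_{k-1}}^{t_k-1} \left(b_1(t) - \bar b_1(k)\right)^2 + \sum_{t=t_{k-1}}^{t_k-1} \sigma_n^2(t)\right]$$

with

$$\bar a_1(k) = \frac{1}{t_k - t_{k-1}} \sum_{t=t_{k-1}}^{t_k-1} a_1(t), \qquad \bar b_1(k) = \frac{1}{t_k - t_{k-1}} \sum_{t=t_{k-1}}^{t_k-1} b_1(t)$$

is independent of the index $k$, then

$$F_{Y\to X}^{(a,S)} \le F_{Y\to X}^{(c,S)}.$$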
Proof

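Let $S = \{1 = t_0 < t_1 < \cdots < t_{m-1} < t_m = T + 1\}$. With the same notation as above, we have

$$F_{Y\to X}^{(c,S)} = \log\left(\frac{\sum_{k=1}^{m}\sum_{t=t_{k-1}}^{t_k-1} \left(a_1(t) - \bar a_1(k)\right)^2 + \sum_{t=1}^{T} (b_1(t))^2 + \sum_{t=1}^{T} \sigma_n^2(t)}{\sum_{k=1}^{m}\sum_{t=t_{k-1}}^{t_k-1} \left(a_1(t) - \bar a_1(k)\right)^2 + \sum_{k=1}^{m}\sum_{t=t_{k-1}}^{t_k-1} \left(b_1(t) - \bar b_1(k)\right)^2 + \sum_{t=1}^{T} \sigma_n^2(t)}\right),$$

and

$$F_{Y\to X}^{(a,S)} = \frac{1}{T}\sum_{k=1}^{m} (t_k - t_{k-1}) \log\left(\frac{\sum_{t=t_{k-1}}^{t_k-1} \left(a_1(t) - \bar a_1(k)\right)^2 + \sum_{t=t_{k-1}}^{t_k-1} (b_1(t))^2 + \sum_{t=t_{k-1}}^{t_k-1} \sigma_n^2(t)}{\sum_{t=t_{k-1}}^{t_k-1} \left(a_1(t) - \bar a_1(k)\right)^2 + \sum_{t=t_{k-1}}^{t_k-1} \left(b_1(t) - \bar b_1(k)\right)^2 + \sum_{t=t_{k-1}}^{t_k-1} \sigma_n^2(t)}\right).$$

Let $\bar b_1 = \frac{1}{T}\sum_{t=1}^{T} b_1(t)$. Thus, we can rewrite them as

$$F_{Y\to X}^{(c,S)} = \log\left(\frac{\sum_{k=1}^{m}\sum_{t=t_{k-1}}^{t_k-1} \left(a_1(t) - \bar a_1(k)\right)^2 + \sum_{k=1}^{m} \left(\bar b_1(k)\right)^2 (t_k - t_{k-1}) + \sum_{k=1}^{m}\sum_{t=t_{k-1}}^{t_k-1} \left(b_1(t) - \bar b_1(k)\right)^2 + \sum_{t=1}^{T} \sigma_n^2(t)}{\sum_{k=1}^{m}\sum_{t=t_{k-1}}^{t_k-1} \left(a_1(t) - \bar a_1(k)\right)^2 + \sum_{k=1}^{m}\sum_{t=t_{k-1}}^{t_k-1} \left(b_1(t) - \bar b_1(k)\right)^2 + \sum_{t=1}^{T} \sigma_n^2(t)}\right)$$

and

$$F_{Y\to X}^{(a,S)} = \frac{1}{T}\sum_{k=1}^{m} (t_k - t_{k-1}) \times \log\left(\frac{\sum_{t=t_{k-1}}^{t_k-1} \left(a_1(t) - \bar a_1(k)\right)^2 + \sum_{t=t_{k-1}}^{t_k-1} \left(b_1(t) - \bar b_1(k)\right)^2 + \left(\bar b_1(k)\right)^2 (t_k - t_{k-1}) + \sum_{t=t_{k-1}}^{t_k-1} \sigma_n^2(t)}{\sum_{t=t_{k-1}}^{t_k-1} \left(a_1(t) - \bar a_1(k)\right)^2 + \sum_{t=t_{k-1}}^{t_k-1} \left(b_1(t) - \bar b_1(k)\right)^2 + \sum_{t=t_{k-1}}^{t_k-1} \sigma_n^2(t)}\right).$$

In addition, letting

$$\theta_k = \frac{1}{t_k - t_{k-1}}\left[\sum_{t=t_{k-1}}^{t_k-1} \left(a_1(t) - \bar a_1(k)\right)^2 + \sum_{t=t_{k-1}}^{t_k-1} \left(b_1(t) - \bar b_1(k)\right)^2 + \sum_{t=t_{k-1}}^{t_k-1} \sigma_n^2(t)\right], \qquad \alpha_k = \frac{1}{t_k - t_{k-1}} \sum_{t=t_{k-1}}^{t_k-1} \left(\bar b_1(k)\right)^2, \qquad p_k = \frac{t_k - t_{k-1}}{T},$$

by the condition of the theorem, $\theta_k$ is independent of $k$, and we denote it by $\theta$. Thus, we have

$$F_{Y\to X}^{(c,S)} = \log\left(\frac{\sum_{k=1}^{m} \alpha_k p_k + \sum_{k=1}^{m} \theta_k p_k}{\sum_{k=1}^{m} \theta_k p_k}\right) = \log\left(\frac{\sum_{k=1}^{m} \alpha_k p_k + \theta}{\theta}\right),$$

and

$$F_{Y\to X}^{(a,S)} = \sum_{k=1}^{m} p_k \log\left(\frac{\alpha_k + \theta_k}{\theta_k}\right) = \sum_{k=1}^{m} p_k \log\left(\frac{\alpha_k + \theta}{\theta}\right).$$

Thus, we can conclude $F_{Y\to X}^{(a,S)} \le F_{Y\to X}^{(c,S)}$, owing to Jensen's inequality. This completes the proof.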

From Theorem C1, if the TV-MVAR system is time-varying with known segments, which implies that the system is static in each time window, we can conclude that the average Granger causality is smaller than or equal to the cumulative Granger causality. On the other hand, if the condition in Theorem C1 is not satisfied, then it is not surprising that $F_{Y\to X}^{(a,S)} > F_{Y\to X}^{(c,S)}$ can hold. Here is a counter-example. A special case is to take all time points of the time-varying regression model (A1) as the change-point set, i.e., $S = \{1, \ldots, T+1\}$. By a proper transformation, we can still let the variances of $y$ and $x$ equal 1 for all time. Let $a_1(t) = 0$ for all $t$, while the variance of the noise may be time-varying. The defined Granger causalities become:

$$F_{Y\to X}^{S} = \log\left(\frac{\frac{1}{T}\sum_{t=1}^{T} \sigma_n^2(t) + \frac{1}{T}\sum_{t=1}^{T} [b_1(t)]^2}{\frac{1}{T}\sum_{t=1}^{T} \sigma_n^2(t)}\right), \qquad \hat F_{Y\to X}^{S} = \frac{1}{T}\sum_{t=1}^{T} \log\left(\frac{\sigma_n^2(t) + [b_1(t)]^2}{\sigma_n^2(t)}\right).$$

Pick $T = 3$, $b_1 = 1$, $b_2 = \sqrt{2}$, $b_3 = \sqrt{3}$, $\sigma_1 = \sqrt{3}$, $\sigma_2 = \sqrt{2}$, $\sigma_3 = 1$. Then,

$$F_{Y\to X} = \log 2 < \hat F_{Y\to X} = \frac{1}{3}\left[\log\left(\frac{4}{3}\right) + \log(2) + \log(4)\right].$$
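Both the counter-example and the conditional ordering of Theorem C1 can be checked in a few lines. The sketch below assumes, as in the counter-example, that every time point is its own window and $a_1(t) = 0$, so only the squared coefficients $b_1(t)^2$ and the noise variances $\sigma_n^2(t)$ enter; the function name `cumulative_and_average` is hypothetical.

```python
import math

def cumulative_and_average(b_sq, sigma_sq):
    """Cumulative and average GC when each time point is its own window
    and a_1(t) = 0, given b_1(t)^2 and sigma_n^2(t) as lists."""
    T = len(b_sq)
    mean_noise = sum(sigma_sq) / T
    cumulative = math.log((mean_noise + sum(b_sq) / T) / mean_noise)
    average = sum(math.log((s + b) / s) for s, b in zip(sigma_sq, b_sq)) / T
    return cumulative, average

# Counter-example from the text: the noise variance changes with t.
print(cumulative_and_average([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))

# Constant noise variance: theta_k is independent of k, as Theorem C1 requires.
print(cumulative_and_average([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

In the first call the average value exceeds the cumulative one; in the second call, where the condition of Theorem C1 holds, the order is reversed.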

Comparison between the asymptotic square moments under null hypothesis

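For simplicity, we suppose that the time-varying system (A1) is a switching system with equal-length time windows and that the segment points are exactly known. The general case will be treated in a future paper. Thus, (A1) becomes a series of static linear systems as follows:

$$x_{t+1} = \bar a_1(k)\,x_t + \bar b_1(k)\,y_t + n(t), \qquad t_k \le t < t_{k+1}, \quad k = 1, 2, \ldots, m. \tag{C1}$$

Here, $t_{k+1} - t_k = n/m$ for all $k$. Under the null hypothesis, namely, that the coefficients $b_1(t) = 0$ for all $t$, (C1) becomes

$$x_{t+1} = \bar a_1(k)\,x_t + \tilde n(t), \qquad t_k \le t < t_{k+1}, \quad k = 1, 2, \ldots, m. \tag{C2}$$

Their residual squared errors in each time window are

$$RSS_1(k) = \sum_{t=t_k}^{t_{k+1}-1} n^2(t) \qquad \text{and} \qquad RSS_0(k) = \sum_{t=t_k}^{t_{k+1}-1} \tilde n^2(t).$$

Then, the cumulative Granger causality (CGC) and the average Granger causality (AGC) can be formulated, respectively, as follows:

$$\begin{aligned}
F_{Y\to X}^{c} &= \log\left(\frac{\sum_{k=1}^{m} RSS_0(k)}{\sum_{k=1}^{m} RSS_1(k)}\right) = \log\left(1 + \frac{\sum_{k=1}^{m} [RSS_0(k) - RSS_1(k)]}{\sum_{k=1}^{m} RSS_1(k)}\right) \sim \frac{\sum_{k=1}^{m} [RSS_0(k) - RSS_1(k)]}{\sum_{k=1}^{m} RSS_1(k)}, \\
F_{Y\to X}^{a} &= \frac{1}{m}\sum_{k=1}^{m} \log\left(\frac{RSS_0(k)}{RSS_1(k)}\right) = \frac{1}{m}\sum_{k=1}^{m} \log\left(1 + \frac{RSS_0(k) - RSS_1(k)}{RSS_1(k)}\right) \sim \frac{1}{m}\sum_{k=1}^{m} \frac{RSS_0(k) - RSS_1(k)}{RSS_1(k)}
\end{aligned}$$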

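as $n$ goes to infinity. Therefore, the cumulative Granger causality converges in distribution as follows:

$$F_{Y\to X}^{c} \times \frac{n - 2m - 1}{m} \sim \frac{\sum_{k=1}^{m} [RSS_0(k) - RSS_1(k)]}{\sum_{k=1}^{m} RSS_1(k)} \times \frac{n - 2m - 1}{m} \sim F(m,\, n - 2m - 1)$$

and the average Granger causality converges to:

$$F_{Y\to X}^{a} \times \frac{n/m - 3}{1} \sim \frac{1}{m}\sum_{k=1}^{m} \frac{RSS_0(k) - RSS_1(k)}{RSS_1(k)} \times \frac{n/m - 3}{1}$$

with each

$$\frac{RSS_0(k) - RSS_1(k)}{RSS_1(k)} \times \frac{n/m - 3}{1} \sim F\left(1,\, \frac{n}{m} - 3\right).$$

So, their asymptotic expectations are

$$E(F_{Y\to X}^{c}) \sim \frac{m}{n - 2m - 1} \cdot \frac{n - 2m - 1}{n - 2m - 3} = \frac{1}{\frac{n}{m} - 2 - \frac{3}{m}}, \qquad E(F_{Y\to X}^{a}) \sim \frac{\frac{n}{m} - 3}{\frac{n}{m} - 5} \cdot \frac{1}{\frac{n}{m} - 3} = \frac{1}{\frac{n}{m} - 5},$$

as $n \to \infty$.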

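The dominant convergence rates are the same, namely $m/n$. Then, let us take a look at their asymptotic second moments. By the second moment of the F-distribution and simple algebra, we have

$$E\left\{\left[F_{Y\to X}^{c}\right]^2\right\} \sim \left(1 + \frac{2}{m}\right)\left(\frac{m}{n}\right)^2.$$

As for the AGC, with $f_k = \frac{RSS_0(k) - RSS_1(k)}{RSS_1(k)} \times \left(\frac{n}{m} - 3\right) \sim F\left(1, \frac{n}{m} - 3\right)$, we have

$$E\left\{\left[F_{Y\to X}^{a}\right]^2\right\} \sim \left(\frac{1}{n/m - 3}\right)^2 E\left\{\frac{1}{m}\sum_{k=1}^{m} f_k\right\}^2 \le \left(\frac{1}{n/m - 3}\right)^2 \left(\frac{1}{m}\right)\sum_{k=1}^{m} E\left\{f_k\right\}^2$$

with

$$E\left\{f_k^2\right\} \sim 3\left(\frac{\frac{n}{m} - 3}{\frac{n}{m} - 5}\right)^2,$$

which implies that

$$E\left\{\left[F_{Y\to X}^{a}\right]^2\right\} < \frac{3}{m}\left(\frac{m}{n}\right)^2$$

holds in the asymptotic sense, i.e., if $n$ is sufficiently large. Therefore, in terms of the asymptotic second moment, for $m > 1$, we have

$$E\left\{\left[F_{Y\to X}^{a}\right]^2\right\} < E\left\{\left[F_{Y\to X}^{c}\right]^2\right\}$$

asymptotically. In other words, the average Granger causality converges to zero more quickly than the cumulative Granger causality.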
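The residual-based definitions of the CGC and AGC can be sketched directly from data with ordinary least squares. The code below is only a minimal illustration under stated assumptions: a first-order model without intercept (the series are assumed centred), equal-length windows as in (C1), and hypothetical helper names `rss` and `windowed_gc`; it does not include the change-point detection or significance testing of the main text.

```python
import numpy as np

def rss(target, regressors):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(regressors, target, rcond=None)
    resid = target - regressors @ coef
    return float(resid @ resid)

def windowed_gc(x, y, n_windows):
    """Return (cumulative, average) Granger causality from y to x,
    using first-order models fitted in equal-length windows."""
    target, x_lag, y_lag = x[1:], x[:-1], y[:-1]
    rss0, rss1 = [], []
    for idx in np.array_split(np.arange(len(target)), n_windows):
        full = np.column_stack([x_lag[idx], y_lag[idx]])
        rss1.append(rss(target[idx], full))                        # model (C1)
        rss0.append(rss(target[idx], x_lag[idx].reshape(-1, 1)))   # null model (C2)
    rss0, rss1 = np.array(rss0), np.array(rss1)
    cgc = float(np.log(rss0.sum() / rss1.sum()))
    agc = float(np.mean(np.log(rss0 / rss1)))
    return cgc, agc

def simulate(b, T=2000, seed=0):
    """Simulate x_{t+1} = 0.5 x_t + b y_t + n(t) with unit-variance y and noise."""
    rng = np.random.default_rng(seed)    # arbitrary seed, illustrative data only
    y = rng.standard_normal(T)
    noise = rng.standard_normal(T)
    x = np.zeros(T)
    for t in range(T - 1):
        x[t + 1] = 0.5 * x[t] + b * y[t] + noise[t]
    return x, y

x, y = simulate(b=1.0)
print(windowed_gc(x, y, n_windows=4))
```

Under the null hypothesis (`b=0.0` in `simulate`), both statistics remain nonnegative, because the restricted model is nested in the full one, but they are close to zero for long windows.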

Theorem C2

Under the setup mentioned above, $\limsup_{n\to\infty} \dfrac{E\left\{\left[F_{Y\to X}^{a}\right]^2\right\}}{E\left\{\left[F_{Y\to X}^{c}\right]^2\right\}} < 1$ for $m > 1$.

Moreover, the larger $m$ is, the faster the average Granger causality converges relative to the cumulative one.
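Two of the F-distribution facts used above can be checked with the standard closed-form moments of $F(d_1, d_2)$, namely $E[F] = d_2/(d_2 - 2)$ for $d_2 > 2$ and $E[F^2] = d_2^2 (d_1 + 2) / \left(d_1 (d_2 - 2)(d_2 - 4)\right)$ for $d_2 > 4$. The sketch below is an illustration only; the function names and the values of $n$ and $m$ are arbitrary assumptions.

```python
def f_mean(d1, d2):
    """Mean of an F(d1, d2) variable, valid for d2 > 2."""
    return d2 / (d2 - 2.0)

def f_second_moment(d1, d2):
    """Second moment of an F(d1, d2) variable, valid for d2 > 4."""
    return d2 ** 2 * (d1 + 2.0) / (d1 * (d2 - 2.0) * (d2 - 4.0))

def cgc_second_moment(n, m):
    """Second moment of F^c when F^c * (n - 2m - 1) / m follows F(m, n - 2m - 1)."""
    d1, d2 = m, n - 2 * m - 1
    return (d1 / d2) ** 2 * f_second_moment(d1, d2)

n, m = 10 ** 6, 4
# Ratio to the asymptotic expression (1 + 2/m) (m/n)^2; it approaches 1 as n grows.
print(cgc_second_moment(n, m) / ((1 + 2 / m) * (m / n) ** 2))
# Second moment of F(1, n/m - 3); it approaches 3 as n grows.
print(f_second_moment(1, n / m - 3))
```

The first ratio supports the leading term $(1 + 2/m)(m/n)^2$ for the cumulative statistic, and the second value supports the factor 3 used for each window of the average statistic.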

Appendix D: Dual Kalman filter cumulative Granger causality (Dkf cumulative GC)

We used the dual Kalman filter as in (Havlicek et al., 2010; Sommerlade et al., 2012), which can be described by the following MVAR model:

$$\begin{bmatrix} x_{t+1} \\ y_{t+1} \end{bmatrix} = \sum_{k=1}^{p} A_k(t) \begin{bmatrix} x_{t-k} \\ y_{t-k} \end{bmatrix} + \begin{bmatrix} n_1^t \\ n_2^t \end{bmatrix}, \tag{D1}$$

where $A_k(t)$ are the time-varying coefficients and $n_{1,2}^t$ are white noises. Define

$$z_t = \begin{bmatrix} x_t \\ y_t \end{bmatrix}, \qquad w(t) = \left[(z_t)^T, \ldots, (z_{t-p+1})^T\right]^T, \qquad a(t) = \mathrm{vec}\left[A_1(t)^T, \ldots, A_p(t)^T\right],$$

and Eq. (D1) can be rewritten as

$$z_t = w(t-1)^T a(t) + \eta_t \tag{D2}$$

associated with a random walk process for the time-varying coefficients

$$a(t+1) = a(t) + \nu_t. \tag{D3}$$

By the dual Kalman filter approach, the time-varying coefficients can be estimated, and then the residuals of Eq. (D2) can be used to define a cumulative GC in the same fashion as the cumulative GC in the paper:

$$\mathrm{dkfGC}_{Y\to X} = \log\left(\frac{\sum_{t=1}^{T} \mathrm{var}(\eta_{yx}^t)}{\sum_{t=1}^{T} \mathrm{var}(\eta_x^t)}\right), \tag{D4}$$

where $T$ is the length of the time course, $\eta_{yx}^t$ is the noise term in Eq. (D2) for the x-component when the inter-dependence from the y-component is considered, and $\eta_x^t$ is the noise term in Eq. (D2) when the inter-dependence from the y-component is not considered. As in (Havlicek et al., 2010), $\mathrm{dkfGC}_{Y\to X}$ and $\mathrm{dkfGC}_{X\to Y}$ can be computed by estimating the model parameters, and the p-values of these causality statistics can be established by bootstrap. Readers are referred to (Havlicek et al., 2010) for more details about the parameter estimation procedure and the bootstrap for significance.
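To make the structure of (D2) and (D3) concrete, the sketch below tracks random-walk regression coefficients with a plain Kalman filter. It is a simplified stand-in, not the dual Kalman filter of Havlicek et al. (2010): only the coefficients are filtered, the model order is fixed to one lag, the variances `q` and `r` are assumed rather than estimated, and the function names are hypothetical. The causality in `kalman_cumulative_gc` uses summed squared one-step prediction errors and takes the ratio of the model without $y$ to the model with $y$, which is an assumption about the intended reading of (D4).

```python
import numpy as np

def kalman_tv_regression(targets, regressors, q=1e-4, r=1.0):
    """Track a(t) in target_t = w_t^T a(t) + eta_t with a(t+1) = a(t) + nu_t.

    q, r: assumed variances of nu_t (per coefficient) and eta_t.
    Returns the filtered coefficients and the one-step prediction errors.
    """
    n, d = regressors.shape
    a = np.zeros(d)
    P = np.eye(d)
    coefs = np.empty((n, d))
    errors = np.empty(n)
    for t in range(n):
        w = regressors[t]
        P = P + q * np.eye(d)                  # predict: random-walk step, cf. (D3)
        err = targets[t] - w @ a               # innovation of the observation, cf. (D2)
        gain = P @ w / (w @ P @ w + r)
        a = a + gain * err                     # update the coefficient estimate
        P = P - np.outer(gain, w) @ P          # update its covariance
        coefs[t], errors[t] = a, err
    return coefs, errors

def kalman_cumulative_gc(x, y, **kw):
    """Log ratio of summed squared prediction errors: x-only model over (x, y) model."""
    target = x[1:]
    _, e_full = kalman_tv_regression(target, np.column_stack([x[:-1], y[:-1]]), **kw)
    _, e_null = kalman_tv_regression(target, x[:-1].reshape(-1, 1), **kw)
    return float(np.log(np.sum(e_null ** 2) / np.sum(e_full ** 2)))
```

A call such as `kalman_cumulative_gc(x, y, q=1e-4, r=1.0)` on two centred series returns a single causality value; significance would still need to be assessed separately, for instance by a bootstrap as described above.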

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

1

The magnitude of the causality measurement is significantly larger than 0 if the lower bound of the 95% confidence interval of the causality in 198 subjects is greater than 0.0002 for the classical Granger causality, and 0.0726 for the average Granger causality.

2

A significant causality was identified when its p-value was less than 0.05 in at least 73% of the subjects.

References

  1. Ahmed A, Xing E. Recovering time-varying networks of dependencies in social and biological studies. Proceedings of the National Academy of Sciences. 2009;106:11878–11883. doi: 10.1073/pnas.0901910106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Arichi T, Fagiolo G, Varela M, Melendez-Calderon A, Allievi A, Merchant N, Tusor N, Counsell SJ, Burdet E, Beckmann CF, Edwards AD. Development of BOLD signal hemodynamic responses in the human brain. NeuroImage. 2012;63:663–673. doi: 10.1016/j.neuroimage.2012.06.054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Biswal B, Mennes M, Zuo X-N, et al. Toward discovery science of human brain function. Proceedings of the National Academy of Sciences. 2010;107:4734–4739. doi: 10.1073/pnas.0911855107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Boyacioglu R, Barth M. Generalized iNverse imaging (GIN): Ultrafast fMRI with physiological noise correction. Magn Reson Med Up coming. 2012 doi: 10.1002/mrm.24528. [DOI] [PubMed] [Google Scholar]
  5. Cavanna AE, Trimble MR. The precuneus: a review of its functional anatomy and behavioural correlates. Brain. 2006;129:564–583. doi: 10.1093/brain/awl004. Epub 2006 Jan 2006. [DOI] [PubMed] [Google Scholar]
  6. Chen M-p, Lee C-C, Hsu Y-C. The impact of American depositary receipts on the Japanese index: Do industry effect and size effect matter? Economic Modelling. 2011;28:526–539. [Google Scholar]
  7. Cribben I, Haraldsdottir R, Atlas LY, Wager TD, Lindquist MA. Dynamic connectivity regression: determining state-related changes in brain connectivity. Neuroimage. 2012;61:907–920. doi: 10.1016/j.neuroimage.2012.03.070. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Deshpande G, Sathian K, Hu X. Effect of hemodynamic variability on Granger causality analysis of fMRI. NeuroImage. 2010;52:884–896. doi: 10.1016/j.neuroimage.2009.11.060. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Ding M, Bressler SL, Yang W, Liang H. Short-window spectral analysis of cortical event-related potentials by adaptive multivariate autoregressive modeling: data preprocessing, model validation, and variability assessment. Biological Cybernetics. 2000;83:35–45. doi: 10.1007/s004229900137. [DOI] [PubMed] [Google Scholar]
  10. Evan K, Snigdhansu C, Auroop RG. Exploring Granger causality between global average observed time series of carbon dioxide and temperature. Theoretical and Applied Climatology. 2011;104:325–335. [Google Scholar]
  11. Feinberg DA, Moeller S, Smith SM, Auerbach E, Ramanna S, Glasser MF, Miller KL, Ugurbil K, Yacoub E. Multiplexed Echo Planar Imaging for Sub-Second Whole Brain FMRI and Fast Diffusion Imaging. PLoS ONE. 2010;5:e15710. doi: 10.1371/journal.pone.0015710. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Feinberg DA, Yacoub E. The rapid development of high speed, resolution and precision in fMRI. NeuroImage. 2012;62:720–725. doi: 10.1016/j.neuroimage.2012.01.049. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Friston K, Moran R, Seth AK. Analysing connectivity with Granger causality and dynamic causal modelling. Current Opinion in Neurobiology. 2012;23:1–7. doi: 10.1016/j.conb.2012.11.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Friston KJ, Jezzard P, Turner R. Analysis of functional MRI time-series. Human brain mapping. 1994;1:153–171. [Google Scholar]
  15. Ge T, Feng J, Grabenhorst F, Rolls E. Componential Granger causality, and its application to identifying the source and mechanisms of the top-down biased activation that controls attention to affective vs sensory processing. NeuroImage. 2012;59:1846–1858. doi: 10.1016/j.neuroimage.2011.08.047. [DOI] [PubMed] [Google Scholar]
  16. Ge T, Kendrick K, Feng J. A Novel Extended Granger Causal Model Approach Demonstrates Brain Hemispheric Differences during Face Recognition Learning. PLoS Comput Biol. 2009;5:e1000570. doi: 10.1371/journal.pcbi.1000570. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Granger CWJ. Investigating Causal Relations by Econometric Models and Cross-spectral Methods. Econometrica. 1969;37:424–438. [Google Scholar]
  18. Guo S, Wu J, Ding M, Feng J. Uncovering Interactions in the Frequency Domain. PLoS Comput Biol. 2008;4:e1000087. doi: 10.1371/journal.pcbi.1000087. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Havlicek M, Friston KJ, Jan J, Brazdil M, Calhoun VD. Dynamic modeling of neuronal responses in fMRI using cubature Kalman filtering. NeuroImage. 2011;56:2109–2128. doi: 10.1016/j.neuroimage.2011.03.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Havlicek M, Jan J, Brazdil M, Calhoun VD. Dynamic Granger causality based on Kalman filter for evaluation of functional network connectivity in fMRI data. Neuroimage. 2010;53:65–77. doi: 10.1016/j.neuroimage.2010.05.063. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Hemmelmann D, Ungureanu M, Hesse W, Wustenberg T, Reichenbach JR, Witte OW, Witte H, Leistritz L. Modelling and analysis of time-variant directed interrelations between brain regions based on BOLD-signals. Neuroimage. 2009;45:722–737. doi: 10.1016/j.neuroimage.2008.12.065. [DOI] [PubMed] [Google Scholar]
  22. Hesse W, Moller E, Arnold M, Schack B. The use of time-variant EEG Granger causality for inspecting directed interdependencies of neural assemblies. Journal of Neuroscience Methods. 2003;124:27–44. doi: 10.1016/s0165-0270(02)00366-7.
  23. Jaynes ET. Some random observations. Synthese. 1985;63:115–138.
  24. Lloyd S. Use of mutual information to decrease entropy: implications for the second law of thermodynamics. Physical Review A. 1989;39:5378–5386. doi: 10.1103/physreva.39.5378.
  25. Luo Q, Ge T, Feng J. Granger causality with signal-dependent noise. NeuroImage. 2011;57:1422–1429. doi: 10.1016/j.neuroimage.2011.05.054.
  26. Luscombe NM, Madan Babu M, Yu H, Snyder M, Teichmann SA, Gerstein M. Genomic analysis of regulatory network dynamics reveals large topological changes. Nature. 2004;431:308–312. doi: 10.1038/nature02782.
  27. Ohiorhenuan IE, Mechler F, Purpura KP, Schmid AM, Hu Q, Victor JD. Sparse coding and high-order correlations in fine-scale cortical networks. Nature. 2010;466:617–621. doi: 10.1038/nature09178.
  28. Psaradakis Z, Ravn MO, Sola M. Markov switching causality and the money–output relationship. Journal of Applied Econometrics. 2005;20:665–683.
  29. Sato JR, Junior EA, Takahashi DY, de Maria Felix M, Brammer MJ, Morettin PA. A method to produce evolving functional connectivity maps during the course of an fMRI experiment using wavelet-based time-varying Granger causality. NeuroImage. 2006;31:187–196. doi: 10.1016/j.neuroimage.2005.11.039.
  30. Schippers M, Renken R, Keysers C. The effect of intra- and inter-subject variability of hemodynamic responses on group level Granger causality analyses. NeuroImage. 2011;57:22–36. doi: 10.1016/j.neuroimage.2011.02.008.
  31. Smerieri A, Rolls ET, Feng J. Decision Time, Slow Inhibition, and Theta Rhythm. The Journal of Neuroscience. 2010;30:14173–14181. doi: 10.1523/JNEUROSCI.0945-10.2010.
  32. Smith SM, Bandettini PA, Miller KL, Behrens TE, Friston KJ, David O, Liu T, Woolrich MW, Nichols TE. The danger of systematic bias in group-level FMRI-lag-based causality estimation. NeuroImage. 2012;59:1228–1229. doi: 10.1016/j.neuroimage.2011.08.015.
  33. Smith SM, Miller KL, Salimi-Khorshidi G, Webster M, Beckmann CF, Nichols TE, Ramsey JD, Woolrich MW. Network modelling methods for FMRI. NeuroImage. 2011;54:875–891. doi: 10.1016/j.neuroimage.2010.08.063.
  34. Sommerlade L, Thiel M, Platt B, Plano A, Riedel G, Grebogi C, Timmer J, Schelter B. Inference of Granger causal time-dependent influences in noisy multivariate time series. J Neurosci Methods. 2012;203:173–185. doi: 10.1016/j.jneumeth.2011.08.042.
  35. Tuncbag N, Kar G, Gursoy A, Keskin O, Nussinov R. Towards inferring time dimensionality in protein-protein interaction networks by integrating structures: the p53 example. Mol. BioSyst. 2009;5:1770–1778. doi: 10.1039/b905661k.
  36. Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, Mazoyer B, Joliot M. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage. 2002;15:273–289. doi: 10.1006/nimg.2001.0978.
  37. Wang J, Zhu H, Fan J, Giovanello K, Lin W. Adaptively and spatially estimating the hemodynamic response functions in fMRI. Med Image Comput Comput Assist Interv. 2011;14:269–276. doi: 10.1007/978-3-642-23629-7_33.
  38. Wen X, Yao L, Liu Y, Ding M. Causal interactions in attention networks predict behavioral performance. The Journal of Neuroscience. 2012;32:1284–1292. doi: 10.1523/JNEUROSCI.2817-11.2012.
  39. Wiener N. The Theory of Prediction. In: Beckenbach E, editor. Modern mathematics for the engineer. New York: McGraw-Hill; 1956. pp. 165–190.
  40. Yan C, Zang Y. DPARSF: a MATLAB toolbox for "pipeline" data analysis of resting-state fMRI. Frontiers in Systems Neuroscience. 2010;4. doi: 10.3389/fnsys.2010.00013.
  41. Zalesky A, Fornito A, Harding IH, Cocchi L, Yucel M, Pantelis C, Bullmore ET. Whole-brain anatomical networks: does the choice of nodes matter? NeuroImage. 2010;50:970–983. doi: 10.1016/j.neuroimage.2009.12.027.
  42. Zhu J, Chen Y, Leonardson AS, Wang K, Lamb JR, Emilsson V, Schadt EE. Characterizing Dynamic Changes in the Human Blood Transcriptional Network. PLoS Comput Biol. 2010;6:e1000671. doi: 10.1371/journal.pcbi.1000671.
