ABSTRACT
Brain-effective connectivity analysis quantifies the directed influence of one neural element or region over another, and it is of great scientific interest to understand how the effective connectivity pattern is affected by variations in subject conditions. Vector autoregression (VAR) is a useful tool for this type of problem. However, there is a paucity of solutions when there is measurement error, when there are multiple subjects, and when the focus is the inference of the transition matrix. In this article, we study the problem of transition matrix inference under the high-dimensional VAR model with measurement error and multiple subjects. We propose a simultaneous testing procedure with three key components: a modified expectation-maximization (EM) algorithm, a test statistic based on the tensor regression of a bias-corrected estimator of the lagged auto-covariance given the covariates, and a properly thresholded simultaneous test. We establish the uniform consistency of the estimators from our modified EM, and show that the subsequent test achieves a consistent false discovery control, while its power approaches one asymptotically. We demonstrate the efficacy of our method through both simulations and a brain connectivity study of task-evoked functional magnetic resonance imaging.
Keywords: brain connectivity analysis, expectation-maximization, functional magnetic resonance imaging, simultaneous inference, tensor regression, vector autoregression
1. INTRODUCTION
A high-dimensional time series model provides a useful tool for a wide range of scientific applications. Our motivation is brain-effective connectivity analysis, which quantifies the directed influence of one neural element or region over another (Friston, 2011). A central question of interest is how the effective connectivity pattern is affected by variations in subject conditions, which in turn reveals useful insights into the pathology of neurological disorders as well as normal brain development (Wu et al., 2010; Zhang et al., 2015). Vector autoregression (VAR) is a commonly used model in brain connectivity analysis (Bullmore and Sporns, 2009; Chen et al., 2011). In this article, we study the high-dimensional VAR model with measurement error and multiple subjects, and the inference of the transition matrix under this model.
Specifically, suppose the observed data {(X_{i,t}, w_i)} consist of n subjects, where X_{i,t} ∈ ℝ^p is the p-dimensional time series, and w_i ∈ ℝ^d is the d-dimensional covariate vector, for the ith subject, i = 1, …, n, t = 1, …, T. Let w_i contain the constant one, so the model includes the intercept term. Let Z_{i,t} ∈ ℝ^p denote the latent signal process of the ith subject, so that X_{i,t} can be viewed as the observed copy of Z_{i,t} with an additive measurement error. Suppose Z_{i,t} admits a lag-1 autoregressive structure, and the transition matrix depends on the covariates w_i through a linear tensor regression model. That is,

X_{i,t} = Z_{i,t} + ε_{i,t},   Z_{i,t} = A_i Z_{i,t−1} + η_{i,t},   A_i = A ×₃ w_i + E_i,   (1)

where A_i ∈ ℝ^{p×p} is the sparse transition matrix for the ith subject, but the A_i's do not necessarily share the same sparsity pattern, A ∈ ℝ^{p×p×d} is the sparse population transition organized as a three-way tensor, and ×₃ is the mode-3 product between a tensor and a vector. The term ε_{i,t} is the measurement error for the observed time series, η_{i,t} is the white noise of the latent signal, and E_i ∈ ℝ^{p×p} is the random error that quantifies the remaining variation in A_i that cannot be explained by the covariates w_i. We assume the error terms ε_{i,t} and η_{i,t} are independent and identically distributed (i.i.d.), following multivariate normal distributions with mean zero and covariances σ_ε²I_p and σ_η²I_p, respectively. We also assume E_i is sparse, and is i.i.d. with marginally mean-zero nondegenerate entries and a bounded support, but we otherwise do not impose its distribution. In model (1), we focus on the lag-1 autoregressive structure and homoscedastic errors for ε_{i,t} and η_{i,t}, and we discuss potential extensions to other lag or error structures in the online Supplementary Materials. We also assume the time series X_{i,t} and Z_{i,t} are stationary, by requiring that max_i ‖A_i‖₂ < 1, where ‖ · ‖₂ is the spectral norm.
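To make the data-generating mechanism of model (1) concrete, it can be simulated as below. The dimensions, sparsity levels, and the diagonal baseline transition are illustrative choices only, not the paper's simulation design:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d, T = 5, 10, 3, 100
sigma_eps, sigma_eta = 0.8, 0.8

# Population transition tensor A (p x p x d); baseline slice is a sparse diagonal.
A = np.zeros((p, p, d))
A[np.arange(p), np.arange(p), 0] = 0.3

# Covariates w_i, with the first entry fixed at one (intercept).
W = np.column_stack([np.ones(n), rng.uniform(-1, 1, (n, d - 1))])

X = np.zeros((n, T, p))
for i in range(n):
    E_i = np.zeros((p, p))                                 # sparse subject deviation
    idx = rng.choice(p * p, size=5, replace=False)
    E_i.flat[idx] = rng.uniform(-0.1, 0.1, 5)
    A_i = np.tensordot(A, W[i], axes=([2], [0])) + E_i     # A x_3 w_i + E_i
    z = np.zeros(p)
    for t in range(T):
        z = A_i @ z + sigma_eta * rng.standard_normal(p)   # latent VAR(1) signal
        X[i, t] = z + sigma_eps * rng.standard_normal(p)   # add measurement error
```

The chosen magnitudes keep max_i ‖A_i‖₂ < 1, so the stationarity requirement above is satisfied.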
Our primary interest is the simultaneous inference, with a proper false discovery control, regarding the population transition tensor A under model (1). We consider the hypotheses,

H_{0;jkl}: A_{j,k,l} = a_{j,k,l}  versus  H_{1;jkl}: A_{j,k,l} ≠ a_{j,k,l},  for (j, k, l) ∈ 𝒮,   (2)

where [p] = {1, …, p}, 𝒮 ⊆ [p] × [p] × [d] is the index set of interest, and the a_{j,k,l}'s are given constants. For our motivating example, A encodes the directed influences among the neural elements or regions, and how they are affected by the subject covariates. For instance, consider a simple example where w_i contains only the constant one and a binary indicator of whether the ith subject is in the disorder group or the healthy control group. In this case, A_{·,·,1} encodes the baseline transition matrix, and A_{·,·,2} encodes the difference of the transition matrix between the disorder group and the baseline. Then (2) allows us to test whether any connection A_{j,k,l}, for (j, k) ∈ [p] × [p], l = 1, 2, equals a given constant a_{j,k,l}, for example, zero. It thus addresses questions such as whether and how the effective connectivity pattern is modified by the disorder status. For our inference problem, we consider the high-dimensional setting, in that the transition matrix dimension p² can far exceed the time series length T.
We propose a simultaneous testing procedure for the hypotheses in (2). Our proposal involves 3 key steps. In the first step, we propose a modified expectation-maximization (EM) algorithm, obtaining the E-step estimate using Kalman filtering and smoothing, then obtaining the M-step estimate of each subject-specific transition matrix A_i through an intermediate estimator. This intermediate estimator allows us to effectively pool information across all subjects. We also establish the uniform consistency of the resulting EM estimator, in the form of the maximal estimation error across all A_i's, i = 1, …, n. We show that the statistical error of the EM estimation decays with the number of subjects, which in turn guarantees the statistical properties of the subsequent test. In the second step, we construct our test statistic, by first obtaining a bias-corrected estimator of each A_i from the lagged auto-covariance of the noise reconstructed in the EM step, then regressing this estimator on the covariates w_i through a tensor regression. This tensor regression step again pools information across multiple subjects to build the test statistic. We show that the resulting test statistic follows a normal distribution asymptotically. In the third step, we conduct a simultaneous testing procedure by properly thresholding the test statistic. We show that our test achieves a consistent false discovery rate (FDR) control, and has the true positive rate (TPR) approaching one asymptotically.
There has been some research related to our problem. One line of relevant research involves various VAR models that study brain-effective connectivity. Notably, Gorrostieta et al. (2012) proposed a mixed-effect VAR model, where the connectivity structure was decomposed into a group-specific fixed effect shared by all subjects in the same group, and a subject-specific random effect. Gorrostieta et al. (2013) instead proposed a Bayesian hierarchical VAR to account for subject-specific variation, and induced group-specific components through an elastic net prior. Chiang et al. (2017) developed a multisubject VAR that allows for simultaneous inference on effective connectivity at both the subject and the group level. These pioneering works have provided a useful statistical framework for modeling and comparing connectivity patterns between different groups. Nevertheless, they only work with discrete groups defined by categorical covariates, and are not applicable to continuous-valued covariates. In addition, none tackles an inference problem like (2) in a frequentist fashion.
Another line of relevant research involves community-based VAR models with applications to financial and social networks. Specifically, Zhu et al. (2023) proposed a network VAR that leverages a latent group structure to model heterogeneous transition patterns among network nodes, and simultaneously estimated both model parameters and node memberships. Chen et al. (2023) modeled the community effect using the true community profiles from a stochastic block model, and incorporated the unknown cross-sectional dependence through non-community-related latent factors.
The third line of relevant research studies general VAR models, in terms of both sparse estimation of the transition matrix (Hsu et al., 2008; Song and Bickel, 2011; Negahban and Wainwright, 2011; Han et al., 2015, among others), and its inference (Krampe et al., 2018; Zheng and Raskutti, 2019). However, they all treat the time series as fully observed, and cannot handle data with measurement error. Besides, the existing inference methods cannot handle the simultaneous testing problem in (2) for individual entries, but only perform global testing of the entire transition matrix. More recently, Lyu et al. (2022) proposed a simultaneous inference method for VAR with measurement error, but focused on a single-subject setting. By contrast, we target a multiple-subject setting, and consequently, our proposal differs from Lyu et al. (2022) in several ways. First of all, although one can naively apply the EM method of Lyu et al. (2022) to estimate A_i for each subject separately, this approach does not pool information across multiple subjects effectively. We instead propose an intermediate estimator that integrates information from all subjects to estimate each A_i. We also show numerically that our method performs better than the naive solution. Second, when establishing the EM estimation consistency, we do not impose any strong distributional assumption on the error E_i, and thus do not seek the consistency of the estimator of the population transition A itself. Instead, we obtain a form of uniform consistency across the estimators of all individual A_i's, i = 1, …, n, which suffices to ensure the theoretical properties of the subsequent testing procedure. Third, our inferential target is the population transition A, instead of the individual A_i that Lyu et al. (2022) targets. As such, we employ an additional tensor regression step to pool information across subjects, which leads to a different test statistic and a different theoretical analysis from Lyu et al. (2022). Finally, when establishing the theoretical guarantees of our test, we allow the time series dimension p to grow exponentially with the time series length T, in contrast to the polynomial growth rate in Lyu et al. (2022).
The rest of the article is organized as follows: we develop the modified EM algorithm in Section 2. We construct the test in Section 3. We report the simulations in Section 4, and illustrate with a brain connectivity analysis example in Section 5. We relegate all proofs and discussions of potential extensions to online Supplementary Materials.
2. EM ESTIMATION
2.1. Modified EM algorithm
To test the population transition A, we first turn our attention to the set of parameters Θ = (A_1, …, A_n, σ_ε², σ_η²). It is difficult and restrictive to impose a distribution on the error E_i in model (1), because the A_i's, and hence the E_i's, may display different supports for different subjects. On the other hand, for the purpose of inference, it suffices to estimate the A_i's, along with σ_ε² and σ_η², based on which we can construct our test statistic. Therefore, we develop a modified EM algorithm, conditioning on the E_i's, to estimate Θ.
Let X_i = (X_{i,1}, …, X_{i,T}) denote the observed time series for subject i, i = 1, …, n, and Θ^{(h−1)} the estimate of Θ at the (h − 1)th iteration. The E-step of the classical EM algorithm computes the expected complete-data log-likelihood,

Q(Θ | Θ^{(h−1)}) = Σ_{i=1}^{n} E{ log p(X_i, Z_i; Θ) | X_i, Θ^{(h−1)} },   (3)

where Z_i = (Z_{i,1}, …, Z_{i,T}), and the conditional expectations of the form E(Z_{i,t} Z_{i,t′}^⊤ | X_i, Θ^{(h−1)}), for any t, t′ ∈ [T], are obtained via Kalman filtering and smoothing (Ghahramani and Hinton, 1996).
The usual M-step obtains a dense estimate of A_i as

Ã_i^{(h)} = { Σ_{t=2}^{T} E(Z_{i,t} Z_{i,t−1}^⊤ | X_i, Θ^{(h−1)}) } { Σ_{t=2}^{T} E(Z_{i,t−1} Z_{i,t−1}^⊤ | X_i, Θ^{(h−1)}) }^{−1}.   (4)

Lyu et al. (2022) considered a sparse estimator of A_i using a generalized Dantzig selector. We instead consider the dense estimator in (4), because we are to further update this estimator later. We note that both the estimator in Lyu et al. (2022) and the one in (4) treat each subject i separately, and do not effectively aggregate information across multiple subjects. To address this issue, we next propose a modified M-step estimate of A_i, by introducing an intermediate estimator that pools information from multiple subjects together.
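The E-step moments and the dense M-step update in (4) can be sketched as follows. For clarity, this sketch obtains the conditional moments by direct Gaussian conditioning on the stacked joint distribution, which is mathematically equivalent to Kalman smoothing but costs O((Tp)³); a practical implementation would use the filter-smoother recursions of Ghahramani and Hinton (1996). All dimensions are illustrative:

```python
import numpy as np

def smoothed_moments(x, A, sig_eps2, sig_eta2):
    """E-step for one subject by direct Gaussian conditioning.

    x : (T, p) observed series; A : current transition estimate.
    Returns the smoothed means E[Z|X] and the accumulated second moments
    S10 = sum_t E[Z_t Z_{t-1}'|X] and S00 = sum_t E[Z_{t-1} Z_{t-1}'|X].
    """
    T, p = x.shape
    # Stationary covariance of Z_t: fixed point of S = A S A' + sig_eta2 * I.
    S = sig_eta2 * np.eye(p)
    for _ in range(500):
        S = A @ S @ A.T + sig_eta2 * np.eye(p)
    # Joint covariance of (Z_1, ..., Z_T): Cov(Z_s, Z_t) = A^{s-t} S for s >= t.
    SigZ = np.zeros((T * p, T * p))
    for t in range(T):
        blk = S
        for s in range(t, T):
            SigZ[s*p:(s+1)*p, t*p:(t+1)*p] = blk
            SigZ[t*p:(t+1)*p, s*p:(s+1)*p] = blk.T
            blk = A @ blk
    SigX = SigZ + sig_eps2 * np.eye(T * p)       # X = Z + measurement error
    G = SigZ @ np.linalg.inv(SigX)               # Cov(Z, X) SigX^{-1}
    mean = (G @ x.reshape(-1)).reshape(T, p)     # E[Z | X]
    Cov = SigZ - G @ SigZ                        # Cov[Z | X]
    S10 = np.zeros((p, p)); S00 = np.zeros((p, p))
    for t in range(1, T):
        C10 = Cov[t*p:(t+1)*p, (t-1)*p:t*p]
        C00 = Cov[(t-1)*p:t*p, (t-1)*p:t*p]
        S10 += C10 + np.outer(mean[t], mean[t-1])
        S00 += C00 + np.outer(mean[t-1], mean[t-1])
    return mean, S10, S00

# Dense M-step estimator (4): A_tilde = S10 @ inv(S00).
```

When the measurement error variance is negligible, the smoothed means reduce to the observations themselves, which is a useful sanity check.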
Stack together the Ã_i^{(h)}'s, i = 1, …, n, to form a tensor Ã^{(h)} ∈ ℝ^{p×p×n}, with Ã^{(h)}[·, ·, i] = Ã_i^{(h)}, and let W = (w_1, …, w_n)^⊤ ∈ ℝ^{n×d}. We consider the following intermediate estimator Â^{(h)} as the solution to the optimization problem,

minimize ‖A‖₁  subject to  ‖(Ã^{(h)} − A ×₃ W) ×₃ W^⊤‖max ≤ τ^{(h)},   (5)

where ‖ · ‖₁ denotes the element-wise sum of absolute values, and ‖ · ‖max denotes the element-wise max norm. The optimization in (5) can be solved by linear programming in a fiber-by-fiber parallel fashion, whereas the tolerance parameter τ^{(h)} is tuned via validation. We then obtain the M-step estimate Â_i^{(h)} of A_i by combining the intermediate estimator Â^{(h)} with the dense estimator Ã_i^{(h)}, as given in (6).
The motivation to use the intermediate estimator Â^{(h)} to estimate A_i is as follows. After a sufficient number of iterations, the maximizer in the M-step would be close to the truth, in that Ã_i^{(h)} ≈ A_i, for all i = 1, …, n. Combining with model (1) yields that Ã_i^{(h)} ≈ A ×₃ w_i + E_i. Rearranging the terms yields a tensor regression of the form,

Ã^{(h)} = A ×₃ W + ℰ^{(h)},

where ℰ^{(h)} ∈ ℝ^{p×p×n} with ℰ^{(h)}[·, ·, i] = E_i + (Ã_i^{(h)} − A_i). This regression enables us to integrate information from all subjects, by first obtaining an intermediate estimator for A, then estimating each individual A_i. Taking the derivative of the squared loss ‖Ã^{(h)} − A ×₃ W‖F²/2 with respect to A yields the gradient (A ×₃ W − Ã^{(h)}) ×₃ W^⊤, where ‖ · ‖F denotes the Frobenius norm. Together with the sparsity of A, this motivates the Dantzig-type optimization formulation in (5).
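Because this constraint decouples over the mode-3 fibers A_{j,k,·} ∈ ℝ^d, each fiber can be solved as a small linear program. The sketch below assumes a Dantzig-type constraint of the form ‖W^⊤(Wa − b)‖∞ ≤ τ per fiber, which is our reading of (5); the paper's exact formulation may differ in normalization:

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_fiber(G, b, tau):
    """Solve min ||a||_1 s.t. ||G a - b||_inf <= tau, with G = W'W, b = W' b_jk."""
    d = G.shape[1]
    # Split a = a_plus - a_minus with both parts nonnegative; minimize their sum.
    c = np.ones(2 * d)
    A_ub = np.vstack([np.hstack([G, -G]), np.hstack([-G, G])])
    b_ub = np.concatenate([b + tau, tau - b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (2 * d))
    return res.x[:d] - res.x[d:]

def intermediate_estimator(A_tilde, W, tau):
    """Fiber-by-fiber solution of the Dantzig-type program (5).

    A_tilde : p x p x n stacked dense M-step estimates; W : n x d covariates.
    """
    p = A_tilde.shape[0]
    d = W.shape[1]
    G = W.T @ W
    A_hat = np.zeros((p, p, d))
    for j in range(p):
        for k in range(p):
            b = W.T @ A_tilde[j, k, :]
            A_hat[j, k] = dantzig_fiber(G, b, tau)
    return A_hat
```

Each of the p² linear programs has only 2d variables, so they parallelize trivially, matching the "fiber-by-fiber parallel fashion" described above.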
We next update the noise variance estimates as

σ_ε^{2,(h)} = (nTp)^{−1} Σ_{i=1}^{n} Σ_{t=1}^{T} E( ‖X_{i,t} − Z_{i,t}‖₂² | X_i, Θ^{(h−1)} ),   σ_η^{2,(h)} = {n(T−1)p}^{−1} Σ_{i=1}^{n} Σ_{t=2}^{T} E( ‖Z_{i,t} − Â_i^{(h)} Z_{i,t−1}‖₂² | X_i, Θ^{(h−1)} ),   (7)

where the conditional expectations are obtained via Kalman filtering and smoothing in the E-step.
We terminate the algorithm when the parameter estimates in 2 consecutive iterations are close. We summarize our estimation procedure in Algorithm 1.
2.2. EM estimation consistency
We next establish a form of uniform consistency for our estimator from Algorithm 1, which is needed to guarantee the asymptotic properties of the subsequent test. Let 𝒫_E denote the distribution of the random error E_i.
Theorem 1
Suppose the following conditions hold almost surely in 𝒫_E.
(a) The initial parameter Θ^{(0)} is in a neighborhood of the truth specified in Section 1.1 of the online Supplementary Materials.
(b) The matrix W^⊤W/n is invertible, with its minimal eigenvalue bounded away from zero, and the entries of W are uniformly bounded.
(c) nT ≥ c₁ log p, and p ≥ c₂d², for some positive constants c₁, c₂.
(d) The number of iterations h satisfies h ≥ c₃⌈log(Tnp)⌉, for some positive constant c₃, where ⌈m⌉ denotes the smallest integer no smaller than m.
(e) For any l ≤ h, the tolerance parameter τ^{(l)} is bounded by a constant multiple of the statistical error rate of the dense M-step estimator.
Then, the EM estimator Θ^{(h)} of Algorithm 1 at the hth iteration satisfies the error bound (8), which holds with high probability, for some positive constants c₄ to c₈.
We make a few remarks about this theorem and its conditions. First of all, Condition (a) requires the initial value to be reasonably close to the truth in some technical sense, and we discuss it in more detail in Section 1.1 of the online Supplementary Materials. Condition (b) is mild, because in numerous applications, including our real data example, n and p are in the tens to hundreds, while d is small, so W^⊤W is invertible; besides, the variables in w_i are usually standardized. Condition (c) specifies the divergence rate of n, p with respect to T, d. It is weaker than a similar requirement in Lyu et al. (2022); in particular, we allow T to be fixed, while Lyu et al. (2022) required T to diverge. Condition (d) ensures a sufficient number of iterations, so that the computational error is dominated by the statistical error, and thus only the latter shows up in the error bound in (8). Condition (e) specifies the tolerance parameter to ensure that the M-step updates are reasonably accurate. All these conditions are reasonable and mild.
Second, both the number of subjects n and the length of the time series T appear in the denominators of the error bounds in (8), which implies that the estimator is consistent as either n or T diverges to infinity. Moreover, the estimation of the individual transition matrices is consistent if the time series dimension p scales logarithmically with nT. Meanwhile, the estimation error for the noise variances decays with a diverging p, indicating a phenomenon of "blessing of dimensionality."
Finally, this theorem does not seek to establish the consistency of the estimator of the population transition A, because, as we explained earlier, we do not impose any distributional assumption on E_i. Nevertheless, Theorem 1 is sufficient to guarantee the statistical properties of the subsequent simultaneous testing.
3. SIMULTANEOUS INFERENCE
3.1. Test statistic
We first construct the test statistic for the population transition tensor
in the hypotheses in (2). The construction involves 2 key components: we employ the auto-covariance of some residual term to approximate the individual
, then we regress this auto-covariance on the covariates through a tensor regression for the final test statistic for
.
First, we observe that the observed time series X_{i,t} follows an autoregressive structure under model (1),

X_{i,t} = A_i X_{i,t−1} + ξ_{i,t},

where the residual term ξ_{i,t} = η_{i,t} + ε_{i,t} − A_i ε_{i,t−1}. Then, the lag-1 auto-covariance of ξ_{i,t} is

Cov(ξ_{i,t}, ξ_{i,t−1}) = −σ_ε² A_i,

which suggests that we can approximate A_i through −σ_ε^{−2} Cov(ξ_{i,t}, ξ_{i,t−1}). Since ξ_{i,t} is not observed, we consider a sample estimate, based on the estimated model parameters (Â_i, σ̂_ε², σ̂_η²), with the residuals ξ̂_{i,t} = X_{i,t} − Â_i X_{i,t−1} replacing ξ_{i,t} in the sample lag-1 auto-covariance. Moreover, recognizing that this plug-in estimator is biased due to the estimation error in Â_i, we further consider a bias-corrected version. Note that (Â_i, σ̂_ε², σ̂_η²) can be any estimators of (A_i, σ_ε², σ_η²), and we show later that the estimators from the modified EM algorithm are sufficient for our testing purpose.
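The plug-in approximation of A_i from the residual auto-covariance can be sketched as below; the paper's bias correction is omitted here, and the normalization of the sample auto-covariance is an implementation choice:

```python
import numpy as np

def plugin_transition(x, A_hat, sig_eps2):
    """Plug-in estimate of A_i from the lag-1 auto-covariance of the residuals.

    x : (T, p) observed series for one subject; A_hat : current estimate of A_i.
    Uses E(xi_t xi_{t-1}') = -sig_eps2 * A_i for xi_t = x_t - A_hat x_{t-1}.
    """
    T, _ = x.shape
    xi = x[1:] - x[:-1] @ A_hat.T          # residuals, t = 2, ..., T
    S1 = xi[1:].T @ xi[:-1] / (T - 2)      # sample lag-1 auto-covariance
    return -S1 / sig_eps2
```

As a sanity check, when the true parameters are plugged in and T is large, the output concentrates around the true A_i.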
Next, we fit a tensor regression to obtain an estimate of the transition tensor A to build our test statistic. Specifically, stack together the bias-corrected estimators, i = 1, …, n, to form a tensor in ℝ^{p×p×n}, whose ith frontal slice is the bias-corrected estimator for subject i. We regress this tensor on the covariates W to obtain a least squares type estimate Â, by multiplying it in mode 3 with (W^⊤W)^{−1}W^⊤. The next proposition shows that Â centers around A, and characterizes its dominating randomness. In what follows, Δ₂, Δη, and Δϵ denote the maximal estimation errors of the Â_i's, σ̂_η², and σ̂_ε², respectively.
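The least squares tensor regression amounts to an ordinary least squares fit of every mode-3 fiber on W; a minimal sketch:

```python
import numpy as np

def tensor_ols(B, W):
    """Regress a p x p x n tensor B on covariates W (n x d) by least squares:
    each fiber B[j, k, :] is regressed on W, giving a p x p x d estimate."""
    M = np.linalg.solve(W.T @ W, W.T)        # (W'W)^{-1} W', shape d x n
    return np.einsum('jki,li->jkl', B, M)    # mode-3 product with M
```

When B is generated exactly as A ×₃ W, the regression recovers A, which serves as a quick correctness check.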
Proposition 1
There exist positive constants c1 to c4 and sr, such that, if nT ≥ c1log p, and
, then,
We remark that, as we expect the transition matrices A_i to be sparse, we require the error matrix E_i to be sparse too. The sparsity condition implies that the row-wise maximum number of nonzero entries of E_i is bounded by O(s_r), which helps curb the approximation error in Proposition 1.
We now construct our test statistic. For the null hypothesis H_{0;jkl}: A_{j,k,l} = a_{j,k,l}, consider the standardized statistic,

T_{j,k,l} = (Â_{j,k,l} − a_{j,k,l}) / ŝ_{j,k,l},   (9)

where the denominator ŝ_{j,k,l}, whose construction involves the vector outer product ○ of the residual terms, is a standard deviation estimator of Â_{j,k,l}. The next theorem shows that this test statistic follows a standard normal distribution under the null, asymptotically and uniformly over all entries.
Theorem 2
Suppose the following conditions hold.
Suppose
, and
.
Suppose log p = o(T ∧ n), where ∧ denotes the minimum.
Suppose Δ₂ = o_p(n^{−1/4}), Δη = o_p(n^{−1/2}), and Δϵ = o_p(n^{−1/2}).
Then, the test statistic in (9) converges to the standard normal distribution, uniformly over all (j, k, l) ∈ [p] × [p] × [d], as n → ∞.
3.2. Simultaneous testing procedure
We next develop a simultaneous testing procedure for the hypotheses (2) with a proper FDR control. We summarize the procedure in Algorithm 2.
Let ℋ₀ = {(j, k, l) ∈ 𝒮 : A_{j,k,l} = a_{j,k,l}} denote the set of true null hypotheses, and ℋ₁ = 𝒮 \ ℋ₀ the set of true alternatives. Given our test statistic T_{j,k,l} in (9), we reject H_{0;jkl} if |T_{j,k,l}| ≥ u for some thresholding value u. Then the false discovery proportion (FDP), the FDR, and the TPR of our testing problem are,

FDP(u) = Σ_{(j,k,l)∈ℋ₀} 𝕀(|T_{j,k,l}| ≥ u) / { Σ_{(j,k,l)∈𝒮} 𝕀(|T_{j,k,l}| ≥ u) ∨ 1 },   FDR(u) = E{FDP(u)},   TPR(u) = E{ Σ_{(j,k,l)∈ℋ₁} 𝕀(|T_{j,k,l}| ≥ u) / (|ℋ₁| ∨ 1) },

where ∨ denotes the maximum, and 𝕀(·) is the indicator function.
Our key idea is to pick a thresholding value u that rejects as many true alternatives as possible, while controlling the false discovery at the prespecified level α. In other words, we choose û = inf{u > 0 : FDP(u) ≤ α}. Since ℋ₀ in FDP(u) is unknown, but P(|T_{j,k,l}| ≥ u) ≈ 2{1 − Φ(u)} by the asymptotic normality of T_{j,k,l} under the null, where Φ(·) is the cumulative distribution function of a standard normal distribution, we propose to estimate the number of false rejections Σ_{(j,k,l)∈ℋ₀} 𝕀(|T_{j,k,l}| ≥ u) in FDP(u) by 2{1 − Φ(u)}|𝒮|. Moreover, we restrict the search of u to the range (0, {2 log(p²d)}^{1/2}], since the solution û falls in this range with probability tending to one, as we show later in the proof of Theorem 3.
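The search for the thresholding value can be sketched as a generic Liu-type procedure; the grid resolution and the fallback at the right endpoint are implementation choices, not taken from the paper:

```python
import numpy as np
from scipy.stats import norm

def fdr_threshold(T_stats, alpha):
    """Smallest u in (0, sqrt(2 log m)] with estimated FDP at most alpha,
    where the number of false rejections is estimated by 2 m (1 - Phi(u))."""
    T_abs = np.abs(np.ravel(T_stats))
    m = T_abs.size
    for u in np.linspace(0.0, np.sqrt(2 * np.log(m)), 2000):
        n_rej = np.sum(T_abs >= u)
        fdp_hat = 2 * m * norm.sf(u) / max(n_rej, 1)   # estimated FDP(u)
        if fdp_hat <= alpha:
            return u
    return np.sqrt(2 * np.log(m))   # fallback: reject only extreme statistics

# Rejection set: indices with |T_stats| >= fdr_threshold(T_stats, alpha).
```

On a mixture of null statistics and a moderate number of strong signals, the returned threshold sits between the null bulk and the signals, so most true alternatives are rejected.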
We next establish the theoretical guarantees of our testing procedure. We introduce 2 regularity conditions.
Assumption 1
Let
, and σj, k, l denote the standard deviation of the (j, k, l)th entry of
. There exist positive constants c1 and c2, such that
Assumption 2
Let π = (j, k, l) denote an index tuple,
denote the absolute correlation between the πth and the π′th entries of
, and
denote the maximal absolute correlation between any 2 different entries in
. There exist constants
, and c2, c3 > 0, such that
Assumption 1 is mild, as the required number of strong alternatives is only of the order of the logarithm of the logarithm of the total number of hypotheses. Intuitively, if the number of alternatives is too small, then few rejections can be made for any u, and the resulting FDR is close to one regardless of the thresholding value. Liu and Shao (2014) showed that this assumption is nearly necessary, in the sense that the FDR control for large-scale simultaneous testing would fail if the number of true alternatives is fixed. Assumption 2 is also mild, as it imposes a bound on the number of strongly correlated entry pairs. The bound is weak relative to the total possible number of pairs, so that a significant portion of entries are allowed to be strongly correlated. A similar assumption was also adopted in Xia et al. (2018) and Lyu et al. (2022).
The next theorem shows that our proposed test controls FDR and FDP, asymptotically.
Theorem 3
Suppose the following conditions hold almost surely in
.
Suppose Condition (b) of Theorem 1 holds.
Suppose log(p) log(p²d) = o(T), and p ≤ n^{c₁}, for some positive constant c₁.
Suppose
for some positive constant c2.
Suppose Δ₂ = o_p({n log(p²d)}^{−1/4}), Δη = o_p({n log(p²d)}^{−1/2}), and Δϵ = o_p({n log(p²d)}^{−1/2}).
Then, for any
,
We make a few remarks. First, the FDR control requires a stronger estimation consistency, that is, Condition (e) of Theorem 3, when compared with that for the asymptotic normality, that is, Condition (c) of Theorem 2. Besides, the dimension p is allowed to grow polynomially with the subject size n. This is reasonable because, intuitively, the normality only regulates the marginal behavior, whereas the FDR control deals with all entries simultaneously. Second, the dimension p is allowed to grow exponentially with the time series length T, in contrast to the polynomial growth rate in Lyu et al. (2022). This is because we pool all subjects to construct the test statistic. Finally, the slight deflation in the limiting FDR comes from substituting the unknown set of true nulls ℋ₀ with the full index set 𝒮 in the false rejection approximation.
The next theorem shows that the power of our proposed test approaches one asymptotically.
Theorem 4
Suppose the same conditions of Theorem 3 hold, together with the additional requirement that the signals of the true alternatives are sufficiently strong. Then the TPR of the proposed test approaches one asymptotically.
The above 2 theorems establish the sufficient conditions for a general set of estimators. The next corollary shows that, if we employ the EM estimators from Algorithm 1 to construct the test statistic, then our proposed test enjoys the desired FDR and FDP control, as well as the power property.
Corollary 1
Suppose the following conditions hold almost surely in 𝒫_E.
(a) Conditions (a), (b), (d), (e) in Theorem 1, and Conditions (a), (c), (d) in Theorem 3 hold.
(b) (s_r ∨ d)⁴ log(p) = o(nT).
If we adopt the EM estimators from Algorithm 1 to construct the test statistic, then, for any significance level, the FDR and FDP are controlled asymptotically as in Theorem 3. Moreover, under the additional condition of Theorem 4, the TPR approaches one asymptotically.
4. SIMULATIONS
4.1. Simulation setup
We carry out intensive simulations to study the finite-sample performance of both EM estimation and simultaneous inference. We also compare with some alternative solutions.
We consider 3 common network structures for the baseline population transition matrix, including the banded, Erdös-Rényi, and stochastic block structures; see Figure 1 for an illustration. For the nonzero entries of the baseline matrix, we first randomly assign values −1 or 1, then multiply all entries by a common factor, so that the stationarity condition is satisfied. We next randomly pick 20 entries of the slope coefficient matrices to be nonzero, and randomly assign values −0.25 or 0.25. We set the rest of the entries as zero. We set σϵ = ση = 0.8. We generate the covariates w_i from a multivariate truncated normal on [−1, 1] with mean zero and Toeplitz covariance matrix [0.5^{|j−k|}]_{jk}. We randomly set 20 entries in E_i nonzero, and randomly sample their values from Uniform[−0.2, 0.2]. We fix (d, T) = (5, 100) to mimic the dimensions of the real data example. Meanwhile, we vary (n, p) = (80, 30), (120, 30), (120, 70) to study the effects of the subject size and the dimension of the time series. We repeat each simulation 100 times. In our implementation, we tune the tolerance parameter τ^{(h)} in (5) via validation, where we use the first 25% of the time points for testing, the last 60% for training, and discard the middle 15% to reduce the temporal dependence between the training and testing samples. We choose the value that minimizes the average prediction error on the testing samples. For the initialization of the individual EM algorithm, we initialize every transition matrix at a common fixed value, and the error variances at 1e−3. We have experimented with some other initialization approaches in Section S4.2 of Lyu et al. (2022), and the results are relatively stable. We initialize the joint EM algorithm in the same way. Alternatively, one may apply the individual EM or the lasso to get a sparse estimator to initialize A, and average the subject-specific error variance estimators across subjects to initialize σϵ² and ση². We terminate our EM algorithm when the consecutive estimates are close, and we find that our algorithm converges fast, usually within 10 iterations.
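For instance, a banded baseline transition matrix satisfying the stationarity constraint can be generated as below; the bandwidth and the target spectral norm are illustrative choices, not the paper's exact scaling:

```python
import numpy as np

def banded_transition(p, bandwidth=1, target_norm=0.5, seed=0):
    """Banded transition matrix with +/-1 entries on the band, rescaled so that
    its spectral norm equals target_norm (< 1, ensuring stationarity)."""
    rng = np.random.default_rng(seed)
    A = np.zeros((p, p))
    for j in range(p):
        lo, hi = max(0, j - bandwidth), min(p, j + bandwidth + 1)
        A[j, lo:hi] = rng.choice([-1.0, 1.0], size=hi - lo)
    A *= target_norm / np.linalg.norm(A, 2)   # enforce ||A||_2 = target_norm
    return A
```

The Erdös-Rényi and stochastic block variants differ only in how the nonzero support is drawn; the rescaling step is the same.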
FIGURE 1.
Network structures of the baseline transition matrix, where the black dots represent the nonzero entries.
4.2. Simulation results
We evaluate our method in three ways: first, the estimation accuracy for the set of individual parameters (A_1, …, A_n, σϵ², ση²), since our test statistic is built on those estimates; second, the estimation accuracy for the population transition tensor A, which is an intermediate step of our method; and finally, the selection accuracy of our simultaneous inference for A, which is the main target of this article. We also compare with 2 alternative solutions. The first alternative is to apply the lasso in the regression of X_{i,t} on X_{i,t−1} to obtain a sparse estimate of A_i for each subject i = 1, …, n, separately. We note that this method essentially ignores the measurement error ε_{i,t} in model (1), and treats the observed X_{i,t} as if it were the same as the true signal Z_{i,t}. The second alternative is to apply the individual EM method of Lyu et al. (2022) to obtain a sparse estimate of A_i for each subject i = 1, …, n, again separately. For both alternative solutions, after obtaining the individual estimator Â_i for each subject, we further regress each element of Â_i on w_i through ordinary least squares (OLS), then apply hard thresholding to obtain a sparse estimate of A. We adopt the common universal thresholding level of Donoho and Johnstone (1994), computed with the usual variance estimate from the OLS regression of the elements of Â_i on w_i.
First, we evaluate the estimation accuracy for the individual parameters. Figure 2 reports the results based on 100 data replications. Our method performs the best in all settings. Meanwhile, the individual EM method suffers when the time series dimension p increases, since it does not pool information from all subjects.
FIGURE 2.
Estimation accuracy of the individual parameters under three network structures: the banded, Erdös-Rényi, and stochastic block structures (from top to bottom). Four evaluation criteria are reported (from left to right). Three methods are compared: the proposed modified EM method (red solid line), the individual EM method (blue dashed line), and the lasso method (black dotted line). For σϵ² and ση², the lasso method is not included, because it does not produce an estimate for either of them.
Next, we evaluate the estimation accuracy for the population transition tensor A. We first note that, in our method, we can obtain an estimator of A through the optimization in (5), but it only serves as an intermediate quantity for our subsequent estimation and inference. This intermediate estimator helps pool information across different subjects, and helps produce a modified EM estimator for each individual A_i. We then establish the uniform consistency across all these modified EM estimators, and build our test statistic upon them. But we do not seek to establish the consistency of this intermediate estimator of A itself, because this way we do not have to impose any distributional assumption on the error term E_i. Figure 3 reports the results based on 100 data replications. Our method again performs the best. Moreover, the lasso method that ignores the measurement error performs poorly when estimating the intercept, while the individual EM method that does not pool information across subjects performs poorly when estimating the slopes.
FIGURE 3.
Estimation accuracy of A under three network structures: the banded, Erdös-Rényi, and stochastic block structures (from left to right). Two evaluation criteria are reported (from top to bottom): the estimation error, in Frobenius norm, of the intercept and of the slopes. Three methods are compared: the proposed modified EM method (red solid line), the individual EM method (blue dashed line), and the lasso method (black dotted line). The latter 2 methods are coupled with an additional OLS regression with hard thresholding.
Finally, we evaluate the selection accuracy for A. We note that the 2 alternative solutions described above perform sparse estimation, not hypothesis testing. They can produce a sparse estimator of A, but do not produce any P-value quantification, nor any explicit FDR control. For hypothesis testing, we also add the individual EM-based testing method of Lyu et al. (2022) for comparison. Figure 4 reports the results based on 100 data replications. Our test both controls the FDR at the nominal level of 5% and maintains a reasonably high TPR. By contrast, the individual EM-based test suffers a low TPR on the slopes in all settings. Besides, its FDR on the slopes inflates, and its TPR on the intercept drops sharply when the dimension p increases. Moreover, the 2 sparse estimation solutions suffer a nearly zero TPR on both the intercept and the slopes.
FIGURE 4.
Selection accuracy of A under three network structures: the banded, Erdös-Rényi, and stochastic block structures (from top to bottom). Two sets of evaluation criteria are reported (from left to right): the FDR and TPR on the intercept, and the FDR and TPR on the slopes. Four methods are compared: the proposed test (red solid line), the individual EM-based test (black dash-dotted line), the individual EM sparse estimation method (blue dashed line), and the lasso sparse estimation method (black dotted line).
5. APPLICATION ON BRAIN CONNECTIVITY ANALYSIS
We illustrate our proposed method via the analysis of the task-evoked functional magnetic resonance imaging (fMRI) data from the Human Connectome Project (HCP; Van Essen et al., 2013). Our analysis focuses on the language task fMRI data, where the participants answer questions about Aesop's fables (story condition) or math problems (math condition). The data include n = 927 subjects from the HCP-1200 release, after removing missing values and applying quality controls (Sripada et al., 2020). The raw fMRI time series have been preprocessed via the HCP minimally preprocessed pipeline (Glasser et al., 2013) and registered into the standard MNI 2mm space. The voxel-level data have been summarized into region-level time series of length T = 316 following the Power brain atlas (Power et al., 2011), which consists of 264 regions of interest (ROIs) in 13 functional modules. Our analysis aims to study the variations of the brain connectivity patterns among multiple subjects for p = 89 ROIs in three functional modules, auditory (AD), default mode (DM), and salience (SA), which are related to language and cognitive processes (Poldrack, 2006). We consider the effects of d = 3 covariates, including age, sex, and the language task performance score. The joint EM algorithm is initialized in the same way as in the simulation study.
Table 1 reports the selection frequency of the connectivity patterns within each functional module as well as for all pairs of functional modules. It also reports the estimated population baseline connectivity pattern and the covariate effects, stratified into positive effects and negative effects. Figure 5 reports the connections selected by the proposed FDR control method at the chosen significance level, and Figure 6 summarizes the key findings among the selected connections, both within the functional modules and across module pairs. In particular, for the population baseline connectivity with positive effects, the within-module ROI pairs are more frequently connected (45.56% for AD, 12.07% for DM, 32.72% for SA) than the between-module ROI pairs (4.70% at the maximum). Moreover, AD presents intensive positive connectivities between the left and right hemispheres, while SA shows more activity in the prefrontal cortex (PFC) and the anterior cingulate cortex (ACC) (Saur et al., 2008). For the covariate effects, the language task accuracy has relatively more connections with positive effects within SA (1.54%), but relatively more connections with negative effects between DM and AD (3.58%) and between DM and SA (2.30%). This suggests that higher accuracy promotes brain activity within SA, in particular between the PFC and the ACC (Pulvermüller, 2018), but reduces the activity between DM and AD, and between DM and SA (Zhang et al., 2019). Such findings agree well with the literature.
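The entrywise selection step can be illustrated with a generic false discovery rate control procedure on a matrix of test statistics. To be clear, this is a Benjamini-Hochberg sketch on two-sided normal p-values, not the paper's exact thresholded simultaneous test; it only conveys the shape of the selection rule applied to each entry of the transition-matrix statistics.

```python
import numpy as np
from scipy.stats import norm

def bh_select(z, alpha=0.05):
    """Benjamini-Hochberg selection on a matrix of z-statistics.

    z : (p, p) array of entrywise test statistics
    Returns a boolean (p, p) matrix of selected (rejected) entries.
    """
    # two-sided p-values under the standard normal reference
    pvals = 2 * norm.sf(np.abs(z)).ravel()
    m = pvals.size
    order = np.argsort(pvals)
    # step-up comparison: p_(k) <= alpha * k / m
    thresh = alpha * np.arange(1, m + 1) / m
    below = pvals[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    keep = np.zeros(m, dtype=bool)
    keep[order[:k]] = True
    return keep.reshape(z.shape)
```

For example, with one strongly significant entry in an otherwise null 4-by-4 statistic matrix, only that entry is selected at level 0.05.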
TABLE 1.
Selection frequency, that is, the number of selected connections over the total number of possible connections, in percentage, among the three functional modules: AD, DM, and SA.
| | baseline | | | sex | | | age | | | language | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | AD | DM | SA | AD | DM | SA | AD | DM | SA | AD | DM | SA |
| *positive* | | | | | | | | | | | | |
| AD | 45.56 | 0.93 | 3.85 | 0.00 | 0.13 | 0.00 | 0.59 | 1.06 | 0.85 | 0.00 | 0.53 | 0.43 |
| DM | 1.72 | 12.07 | 2.39 | 0.13 | 0.86 | 0.48 | 0.66 | 0.21 | 0.86 | 0.66 | 0.36 | 0.57 |
| SA | 4.70 | 3.45 | 32.72 | 0.43 | 0.29 | 0.00 | 0.85 | 0.77 | 0.00 | 0.85 | 0.38 | 1.54 |
| *negative* | | | | | | | | | | | | |
| AD | 2.37 | 0.93 | 0.00 | 1.18 | 1.06 | 1.28 | 1.78 | 0.27 | 0.43 | 0.59 | 2.12 | 0.85 |
| DM | 2.25 | 0.33 | 2.68 | 0.53 | 0.51 | 0.29 | 0.40 | 0.42 | 0.48 | 1.46 | 0.36 | 1.25 |
| SA | 0.43 | 1.53 | 0.31 | 2.56 | 0.77 | 0.93 | 0.85 | 0.57 | 0.00 | 1.28 | 1.05 | 0.00 |
The direction of effective connectivity is from the column module to the row module. The upper half of the table shows positive effects, and the lower half shows negative effects.
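Given a binary matrix of selected connections and a module label per ROI, the selection frequencies reported in Table 1 can be computed as block-wise proportions. The sketch below is illustrative: the `selection_frequency` helper is hypothetical, and for simplicity it counts diagonal self-connections inside the within-module blocks, which the paper's tally may exclude.

```python
import numpy as np

def selection_frequency(selected, modules):
    """Percentage of selected directed connections for each module pair.

    selected : (p, p) boolean matrix; selected[i, j] = True means the
               connection from ROI j to ROI i was selected
               (column module -> row module, as in Table 1)
    modules  : length-p sequence of module labels, e.g. 'AD', 'DM', 'SA'
    Returns a dict {(row_module, col_module): percentage}.
    """
    mods = sorted(set(modules))
    modules = np.asarray(modules)
    freq = {}
    for a in mods:
        for b in mods:
            # sub-block of connections from module b into module a
            block = selected[np.ix_(modules == a, modules == b)]
            freq[(a, b)] = 100.0 * block.mean()
    return freq
```

For instance, with four ROIs split evenly between AD and DM, one selected AD-to-AD edge out of four possible yields a 25% within-AD frequency.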
FIGURE 5.
Selected entries by the proposed testing method. The color scale ranges from blue (negative test statistics) to red (positive ones).
FIGURE 6.
Summary of key findings among the selected connections, within the functional modules and across module pairs. The notations + and − represent connections with positive and negative effects, respectively.
We also run two additional data analyses. The first compares the prediction accuracy of our proposed method with two alternatives: the lasso method and the individual EM method of Lyu et al. (2023). For each subject, we withhold the observations at the first 50 time points as testing data, and train the model on the remaining time points. We obtain the estimated time series for the testing data from each method, and then compute the prediction error ratio. Our method achieves the smallest error ratio among the three methods. The second analysis includes additional covariates. In the above analysis, we have included age, sex, and language task performance score, as these are commonly studied in the literature. We additionally include four more covariates: math task accuracy, processing speed, general cognitive ability, and race. We apply our method to the entire time series, obtain the fitted values, compute the R2 value for each subject, and then average the R2 values across all subjects. The average R2 with d = 3 and d = 7 covariates is 0.792 and 0.795, respectively, which suggests that the three covariates capture about the same variation as the seven covariates. As such, we keep the analysis results with the three covariates.
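The two evaluation metrics above can be sketched as follows. Since the exact formulas are not reproduced in this text, the Frobenius-norm ratio for the prediction error and the 1 − SSE/SST form for R2 are assumed, conventional choices rather than the paper's stated definitions.

```python
import numpy as np

def error_ratio(x_true, x_hat):
    """Prediction error ratio on held-out data.

    Assumed form: ||X_hat - X||_F^2 / ||X||_F^2, computed over the
    (p, T_test) block of withheld time points for one subject.
    """
    return np.sum((x_hat - x_true) ** 2) / np.sum(x_true ** 2)

def r_squared(x_true, x_fit):
    """Per-subject R^2 of the fitted series (assumed 1 - SSE/SST form)."""
    sse = np.sum((x_true - x_fit) ** 2)
    sst = np.sum((x_true - x_true.mean()) ** 2)
    return 1.0 - sse / sst
```

A perfect fit gives an error ratio of 0 and an R2 of 1; in the analysis, these quantities would be computed per subject (withholding the first 50 of the T = 316 time points for the error ratio) and then averaged across the n = 927 subjects.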
ACKNOWLEDGMENTS
We thank the two reviewers, the Associate Editor, and the Editor for their constructive comments, which helped improve the paper.
Supplementary Material
Web Appendix containing the proofs of theorems in Section 2.2 and Section 3, auxiliary lemmas, additional discussions, and the computer code are available with this paper at the Biometrics website on Oxford Academic.
Contributor Information
Xiang Lyu, Division of Biostatistics, University of California, Berkeley, CA 94720, United States.
Jian Kang, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, United States.
Lexin Li, Division of Biostatistics, University of California, Berkeley, CA 94720, United States.
FUNDING
Kang’s research was partially supported by NIH grants R01DA048993, R01MH105561, and NSF grant IIS2123777. Li’s research was partially supported by NIH grants R01AG061303, R01AG062542, and NSF grant CIF-2102227.
CONFLICT OF INTEREST
None declared.
DATA AVAILABILITY
The Human Connectome Project (HCP) Open Access data used in Section 5 are publicly available (Human Connectome Project, 2024) and can be downloaded from the ConnectomeDB (https://db.humanconnectome.org/) by registered users.
References
- Bullmore E., Sporns O. (2009). Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 10, 186–198.
- Chen E. Y., Fan J., Zhu X. (2023). Community network auto-regression for high-dimensional time series. Journal of Econometrics, 235, 1239–1256.
- Chen G., Glen D., Saad Z., Hamilton J. P., Thomason M., Gotlib I. et al. (2011). Vector autoregression, structural equation modeling, and their synthesis in neuroimaging data analysis. Computers in Biology and Medicine, 41, 1142–1155.
- Chiang S., Guindani M., Yeh H. J., Haneef Z., Stern J. M., Vannucci M. (2017). Bayesian vector autoregressive model for multi-subject effective connectivity inference using multi-modal neuroimaging data. Human Brain Mapping, 38, 1311–1332.
- Donoho D. L., Johnstone I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81, 425–455.
- Friston K. J. (2011). Functional and effective connectivity: a review. Brain Connectivity, 1, 13–36.
- Ghahramani Z., Hinton G. E. (1996). Parameter estimation for linear dynamical systems. Technical report. Toronto, Canada: University of Toronto.
- Glasser M. F., Sotiropoulos S. N., Wilson J. A., Coalson T. S., Fischl B., Andersson J. L. et al. (2013). The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage, 80, 105–124.
- Gorrostieta C., Fiecas M., Ombao H., Burke E., Cramer S. (2013). Hierarchical vector auto-regressive models and their applications to multi-subject effective connectivity. Frontiers in Computational Neuroscience, 7, 159.
- Gorrostieta C., Ombao H., Bédard P., Sanes J. N. (2012). Investigating brain connectivity using mixed effects vector autoregressive models. NeuroImage, 59, 3347–3355.
- Han F., Lu H., Liu H. (2015). A direct estimation of high dimensional stationary vector autoregressions. Journal of Machine Learning Research, 16, 3115–3150.
- Hsu N.-J., Hung H.-L., Chang Y.-M. (2008). Subset selection for vector autoregressive processes using lasso. Computational Statistics and Data Analysis, 52, 3645–3657.
- Human Connectome Project. (2024). ConnectomeDB. https://db.humanconnectome.org/.
- Krampe J., Kreiss J., Paparoditis E. (2018). Bootstrap based inference for sparse high-dimensional time series models. arXiv preprint arXiv:1806.11083.
- Liu W., Shao Q.-M. (2014). Phase transition and regularized bootstrap in large-scale t-tests with false discovery rate control. The Annals of Statistics, 42, 2003–2025.
- Lyu X., Kang J., Li L. (2023). Statistical inference for high-dimensional vector autoregression with measurement error. Statistica Sinica, 1–25.
- Negahban S., Wainwright M. J. (2011). Estimation of (near) low-rank matrices with noise and high-dimensional scaling. The Annals of Statistics, 39, 1069–1097.
- Poldrack R. A. (2006). Can cognitive processes be inferred from neuroimaging data? Trends in Cognitive Sciences, 10, 59–63.
- Power J. D., Cohen A. L., Nelson S. M., Wig G. S., Barnes K. A., Church J. A. et al. (2011). Functional network organization of the human brain. Neuron, 72, 665–678.
- Pulvermüller F. (2018). Neural reuse of action perception circuits for language, concepts and communication. Progress in Neurobiology, 160, 1–44.
- Saur D., Kreher B. W., Schnell S., Kümmerer D., Kellmeyer P., Vry M.-S. et al. (2008). Ventral and dorsal pathways for language. Proceedings of the National Academy of Sciences, 105, 18035–18040.
- Song S., Bickel P. J. (2011). Large vector auto regressions. arXiv preprint arXiv:1106.3915.
- Sripada C., Angstadt M., Rutherford S., Taxali A., Shedden K. (2020). Toward a “treadmill test” for cognition: improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41, 3186–3197.
- Van Essen D. C., Smith S. M., Barch D. M., Behrens T. E., Yacoub E., Ugurbil K. et al. (2013). The WU-Minn Human Connectome Project: an overview. NeuroImage, 80, 62–79.
- Wu T., Chan P., Hallett M. (2010). Effective connectivity of neural networks in automatic movements in Parkinson’s disease. NeuroImage, 49, 2581–2587.
- Xia Y., Cai T., Cai T. T. (2018). Multiple testing of submatrices of a precision matrix with applications to identification of between pathway interactions. Journal of the American Statistical Association, 113, 328–339.
- Zhang M., Savill N., Margulies D. S., Smallwood J., Jefferies E. (2019). Distinct individual differences in default mode network connectivity relate to off-task thought and text memory during reading. Scientific Reports, 9, 1–13.
- Zhang T., Wu J., Li F., Caffo B., Boatman-Reich D. (2015). A dynamic directional model for effective brain connectivity using electrocorticographic (ECoG) time series. Journal of the American Statistical Association, 110, 93–106.
- Zheng L., Raskutti G. (2019). Testing for high-dimensional network parameters in auto-regressive models. Electronic Journal of Statistics, 13, 4977–5043.
- Zhu X., Xu G., Fan J. (2023). Simultaneous estimation and group identification for network vector autoregressive model with heterogeneous nodes. Journal of Econometrics, 105564.