ABSTRACT
Brain-effective connectivity analysis quantifies the directed influence of one neural element or region over another, and it is of great scientific interest to understand how the effective connectivity pattern is affected by variations in subject conditions. Vector autoregression (VAR) is a useful tool for this type of problem. However, there is a paucity of solutions when there is measurement error, when there are multiple subjects, and when the focus is the inference of the transition matrix. In this article, we study the problem of transition matrix inference under the high-dimensional VAR model with measurement error and multiple subjects. We propose a simultaneous testing procedure with three key components: a modified expectation-maximization (EM) algorithm, a test statistic based on the tensor regression of a bias-corrected estimator of the lagged auto-covariance given the covariates, and a properly thresholded simultaneous test. We establish the uniform consistency of the estimators from our modified EM, and show that the subsequent test achieves a consistent false discovery control, while its power approaches one asymptotically. We demonstrate the efficacy of our method through both simulations and a brain connectivity study of task-evoked functional magnetic resonance imaging.
Keywords: brain connectivity analysis, expectation-maximization, functional magnetic resonance imaging, simultaneous inference, tensor regression, vector autoregression
1. INTRODUCTION
A high-dimensional time series model provides a useful tool for a wide range of scientific applications. Our motivation is brain-effective connectivity analysis, which quantifies the directed influence of one neural element or region over another (Friston, 2011). A central question of interest is how the effective connectivity pattern is affected by variations in subject conditions, which in turn reveals useful insights into the pathology of neurological disorders as well as normal brain development (Wu et al., 2010; Zhang et al., 2015). Vector autoregression (VAR) is a commonly used model in brain connectivity analysis (Bullmore and Sporns, 2009; Chen et al., 2011). In this article, we study the high-dimensional VAR model with measurement error and multiple subjects, and the inference of the transition matrix under this model.
Specifically, suppose the observed data {(X_{i,t}, w_i)} consist of n subjects, where X_{i,t} ∈ ℝ^p is the p-dimensional time series, and w_i ∈ ℝ^d is the d-dimensional covariate vector, for the ith subject, i = 1, …, n, t = 1, …, T. Let w_i contain the constant one, so the model includes the intercept term. Let Z_{i,t} ∈ ℝ^p denote the latent signal process of the ith subject, so that X_{i,t} can be viewed as the observed copy of Z_{i,t} with an additive measurement error. Suppose Z_{i,t} admits a lag-1 autoregressive structure, and the transition matrix depends on the covariates w_i through a linear tensor regression model. That is,

X_{i,t} = Z_{i,t} + ε_{i,t},   Z_{i,t} = A_i Z_{i,t−1} + η_{i,t},   A_i = A ×₃ w_i + E_i,   (1)

where A_i ∈ ℝ^{p×p} is the sparse transition matrix for the ith subject, but the A_i's do not necessarily share the same sparsity pattern, A ∈ ℝ^{p×p×d} is the sparse population transition organized as a three-way tensor, and ×₃ is the mode-3 product between a tensor and a vector. The term ε_{i,t} is the measurement error for the observed time series, η_{i,t} is the white noise of the latent signal, and E_i ∈ ℝ^{p×p} is the random error that quantifies the remaining variation in A_i that cannot be explained by the covariates w_i. We assume the error terms ε_{i,t} and η_{i,t} are independent and identically distributed (i.i.d.), following multivariate normal distributions with mean zero and covariances σ_ε²I_p and σ_η²I_p, respectively. We also assume E_i is sparse, and is i.i.d. with marginally mean-zero nondegenerate entries and a bounded support, but we otherwise do not impose its distribution. In model (1), we focus on the lag-1 autoregressive structure and homoscedastic errors for ε_{i,t} and η_{i,t}, and we discuss potential extensions to other lag or error structures in the online Supplementary Materials. We also assume the time series X_{i,t} and Z_{i,t} are stationary, by requiring that max_i ‖A_i‖₂ < 1, where ‖ · ‖₂ is the spectral norm.
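To make the data-generating mechanism of model (1) concrete, it can be simulated as below. The dimensions, sparsity levels, and the diagonal baseline transition are illustrative choices only, not the paper's simulation design:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d, T = 5, 10, 3, 100
sigma_eps, sigma_eta = 0.8, 0.8

# Population transition tensor A (p x p x d); baseline slice is a sparse diagonal.
A = np.zeros((p, p, d))
A[np.arange(p), np.arange(p), 0] = 0.3

# Covariates w_i, with the first entry fixed at one (intercept).
W = np.column_stack([np.ones(n), rng.uniform(-1, 1, (n, d - 1))])

X = np.zeros((n, T, p))
for i in range(n):
    E_i = np.zeros((p, p))                                 # sparse subject deviation
    idx = rng.choice(p * p, size=5, replace=False)
    E_i.flat[idx] = rng.uniform(-0.1, 0.1, 5)
    A_i = np.tensordot(A, W[i], axes=([2], [0])) + E_i     # A x_3 w_i + E_i
    z = np.zeros(p)
    for t in range(T):
        z = A_i @ z + sigma_eta * rng.standard_normal(p)   # latent VAR(1) signal
        X[i, t] = z + sigma_eps * rng.standard_normal(p)   # add measurement error
```

The chosen magnitudes keep max_i ‖A_i‖₂ < 1, so the stationarity requirement above is satisfied.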
Our primary interest is the simultaneous inference, with a proper false discovery control, regarding the population transition tensor A under model (1). We consider the hypotheses,

H_{0;jkl}: A_{j,k,l} = a_{j,k,l}  versus  H_{1;jkl}: A_{j,k,l} ≠ a_{j,k,l},  for (j, k, l) ∈ 𝒮,   (2)

where [p] = {1, …, p}, 𝒮 ⊆ [p] × [p] × [d] is the index set of interest, and the a_{j,k,l}'s are given constants. For our motivating example, A encodes the directed influences among the neural elements or regions, and how they are affected by the subject covariates. For instance, consider a simple example where w_i contains only the constant one and a binary indicator of whether the ith subject is in the disorder group or the healthy control group. In this case, A_{·,·,1} encodes the baseline transition matrix, and A_{·,·,2} encodes the difference of the transition matrix between the disorder group and the baseline. Then (2) allows us to test whether any connection A_{j,k,l}, for (j, k) ∈ [p] × [p], l = 1, 2, equals a given constant a_{j,k,l}, for example, zero. It thus addresses questions such as whether and how the effective connectivity pattern is modified by the disorder status. For our inference problem, we consider the high-dimensional setting, in that the transition matrix dimension p² can far exceed the time series length T.
We propose a simultaneous testing procedure for the hypotheses in (2). Our proposal involves 3 key steps. In the first step, we propose a modified expectation-maximization (EM) algorithm, obtaining the E-step estimate using Kalman filtering and smoothing, then obtaining the M-step estimate of each subject-specific transition matrix A_i through an intermediate estimator. This intermediate estimator allows us to effectively pool information across all subjects. We also establish the uniform consistency of the resulting EM estimator, in the form of the maximal estimation error across all A_i's, i = 1, …, n. We show that the statistical error of the EM estimation decays with the number of subjects, which in turn guarantees the statistical properties of the subsequent test. In the second step, we construct our test statistic, by first obtaining a bias-corrected estimator of each A_i from the lagged auto-covariance of the noise reconstructed in the EM step, then regressing this estimator on the covariates w_i through a tensor regression. This tensor regression step again pools information across multiple subjects to build the test statistic. We show that the resulting test statistic follows a normal distribution asymptotically. In the third step, we conduct a simultaneous testing procedure by properly thresholding the test statistic. We show that our test achieves a consistent false discovery rate (FDR) control, and has the true positive rate (TPR) approaching one asymptotically.
There has been some research related to our problem. One line of relevant research involves various VAR models that study brain-effective connectivity. Notably, Gorrostieta et al. (2012) proposed a mixed-effect VAR model, where the connectivity structure was decomposed into a group-specific fixed effect shared by all subjects in the same group, and a subject-specific random effect. Gorrostieta et al. (2013) instead proposed a Bayesian hierarchical VAR to account for subject-specific variation, and induced group-specific components through an elastic net prior. Chiang et al. (2017) developed a multisubject VAR that allows for simultaneous inference on effective connectivity at both the subject and the group level. These pioneering works have provided a useful statistical framework for modeling and comparing connectivity patterns between different groups. Nevertheless, they only work with discrete groups defined by categorical covariates, and are not applicable to continuous-valued covariates. In addition, none tackles an inference problem like (2) in a frequentist fashion.
Another line of relevant research involves community-based VAR models with applications to financial and social networks. Specifically, Zhu et al. (2023) proposed a network VAR that leverages a latent group structure to model heterogeneous transition patterns among network nodes, and simultaneously estimated both model parameters and node memberships. Chen et al. (2023) modeled the community effect using the true community profiles from a stochastic block model, and incorporated the unknown cross-sectional dependence through non-community-related latent factors.
The third line of relevant research studies general VAR models, in terms of both sparse estimation of the transition matrix (Hsu et al., 2008; Song and Bickel, 2011; Negahban and Wainwright, 2011; Han et al., 2015, among others), and its inference (Krampe et al., 2018; Zheng and Raskutti, 2019). However, they all treat the time series as fully observed, and cannot handle data with measurement error. Besides, the existing inference methods cannot handle the simultaneous testing problem in (2) for individual entries, but only perform global testing of the entire transition matrix. More recently, Lyu et al. (2022) proposed a simultaneous inference method for VAR with measurement error, but focused on a single-subject setting. By contrast, we target a multiple-subject setting, and consequently, our proposal differs from Lyu et al. (2022) in several ways. First of all, although one can naively apply the EM method of Lyu et al. (2022) to estimate A_i for each subject separately, this approach does not pool information across multiple subjects effectively. We instead propose an intermediate estimator that integrates information from all subjects to estimate each A_i. We also show numerically that our method performs better than the naive solution. Second, when establishing the EM estimation consistency, we do not impose any strong distributional assumption on the error E_i, and thus do not seek the consistency of the estimator of the population transition A itself. Instead, we obtain a form of uniform consistency across the estimators of all individual A_i's, i = 1, …, n, which suffices to ensure the theoretical properties of the subsequent testing procedure. Third, our inferential target is the population transition A, instead of the individual A_i that Lyu et al. (2022) targets. As such, we employ an additional tensor regression step to pool information across subjects, which leads to a different test statistic and a different theoretical analysis from Lyu et al. (2022). Finally, when establishing the theoretical guarantees of our test, we allow the time series dimension p to grow exponentially with the time series length T, in contrast to the polynomial growth rate in Lyu et al. (2022).
The rest of the article is organized as follows: we develop the modified EM algorithm in Section 2. We construct the test in Section 3. We report the simulations in Section 4, and illustrate with a brain connectivity analysis example in Section 5. We relegate all proofs and discussions of potential extensions to online Supplementary Materials.
2. EM ESTIMATION
2.1. Modified EM algorithm
To test the population transition A, we first turn our attention to the set of parameters Θ = (A_1, …, A_n, σ_ε², σ_η²). It is difficult and restrictive to impose a distribution on the error E_i in model (1), because the A_i's, and hence the E_i's, may display different supports for different subjects. On the other hand, for the purpose of inference, it suffices to estimate the A_i's, along with σ_ε² and σ_η², based on which we can construct our test statistic. Therefore, we develop a modified EM algorithm, conditioning on the E_i's, to estimate Θ.
Let X_i = (X_{i,1}, …, X_{i,T}) denote the observed time series for subject i, i = 1, …, n, and Θ^{(h−1)} the estimate of Θ at the (h − 1)th iteration. The E-step of the classical EM algorithm computes the expected complete-data log-likelihood,

Q(Θ | Θ^{(h−1)}) = Σ_{i=1}^{n} E{ log p(X_i, Z_i; Θ) | X_i, Θ^{(h−1)} },   (3)

where Z_i = (Z_{i,1}, …, Z_{i,T}), and the conditional expectations of the form E(Z_{i,t} Z_{i,t′}^⊤ | X_i, Θ^{(h−1)}), for any t, t′ ∈ [T], are obtained via Kalman filtering and smoothing (Ghahramani and Hinton, 1996).
The usual M-step obtains a dense estimate of A_i as

Ã_i^{(h)} = { Σ_{t=2}^{T} E(Z_{i,t} Z_{i,t−1}^⊤ | X_i, Θ^{(h−1)}) } { Σ_{t=2}^{T} E(Z_{i,t−1} Z_{i,t−1}^⊤ | X_i, Θ^{(h−1)}) }^{−1}.   (4)

Lyu et al. (2022) considered a sparse estimator of A_i using a generalized Dantzig selector. We instead consider the dense estimator in (4), because we are to further update this estimator later. We note that both the estimator in Lyu et al. (2022) and the one in (4) treat each subject i separately, and do not effectively aggregate information across multiple subjects. To address this issue, we next propose a modified M-step estimate of A_i, by introducing an intermediate estimator that pools information from multiple subjects together.
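The E-step moments and the dense M-step update in (4) can be sketched as follows. For clarity, this sketch obtains the conditional moments by direct Gaussian conditioning on the stacked joint distribution, which is mathematically equivalent to Kalman smoothing but costs O((Tp)³); a practical implementation would use the filter-smoother recursions of Ghahramani and Hinton (1996). All dimensions are illustrative:

```python
import numpy as np

def smoothed_moments(x, A, sig_eps2, sig_eta2):
    """E-step for one subject by direct Gaussian conditioning.

    x : (T, p) observed series; A : current transition estimate.
    Returns the smoothed means E[Z|X] and the accumulated second moments
    S10 = sum_t E[Z_t Z_{t-1}'|X] and S00 = sum_t E[Z_{t-1} Z_{t-1}'|X].
    """
    T, p = x.shape
    # Stationary covariance of Z_t: fixed point of S = A S A' + sig_eta2 * I.
    S = sig_eta2 * np.eye(p)
    for _ in range(500):
        S = A @ S @ A.T + sig_eta2 * np.eye(p)
    # Joint covariance of (Z_1, ..., Z_T): Cov(Z_s, Z_t) = A^{s-t} S for s >= t.
    SigZ = np.zeros((T * p, T * p))
    for t in range(T):
        blk = S
        for s in range(t, T):
            SigZ[s*p:(s+1)*p, t*p:(t+1)*p] = blk
            SigZ[t*p:(t+1)*p, s*p:(s+1)*p] = blk.T
            blk = A @ blk
    SigX = SigZ + sig_eps2 * np.eye(T * p)       # X = Z + measurement error
    G = SigZ @ np.linalg.inv(SigX)               # Cov(Z, X) SigX^{-1}
    mean = (G @ x.reshape(-1)).reshape(T, p)     # E[Z | X]
    Cov = SigZ - G @ SigZ                        # Cov[Z | X]
    S10 = np.zeros((p, p)); S00 = np.zeros((p, p))
    for t in range(1, T):
        C10 = Cov[t*p:(t+1)*p, (t-1)*p:t*p]
        C00 = Cov[(t-1)*p:t*p, (t-1)*p:t*p]
        S10 += C10 + np.outer(mean[t], mean[t-1])
        S00 += C00 + np.outer(mean[t-1], mean[t-1])
    return mean, S10, S00

# Dense M-step estimator (4): A_tilde = S10 @ inv(S00).
```

When the measurement error variance is negligible, the smoothed means reduce to the observations themselves, which is a useful sanity check.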
Stack together the Ã_i^{(h)}'s, i = 1, …, n, to form a tensor Ã^{(h)} ∈ ℝ^{p×p×n}, with Ã^{(h)}[·, ·, i] = Ã_i^{(h)}, and let W = (w_1, …, w_n)^⊤ ∈ ℝ^{n×d}. We consider the following intermediate estimator Â^{(h)} as the solution to the optimization problem,

minimize ‖A‖₁  subject to  ‖(Ã^{(h)} − A ×₃ W) ×₃ W^⊤‖max ≤ τ^{(h)},   (5)

where ‖ · ‖₁ denotes the element-wise sum of absolute values, and ‖ · ‖max denotes the element-wise max norm. The optimization in (5) can be solved by linear programming in a fiber-by-fiber parallel fashion, whereas the tolerance parameter τ^{(h)} is tuned via validation. We then obtain the M-step estimate Â_i^{(h)} of A_i by combining the intermediate estimator Â^{(h)} with the dense estimator Ã_i^{(h)}, as given in (6).
The motivation to use the intermediate estimator Â^{(h)} to estimate A_i is as follows. After a sufficient number of iterations, the maximizer in the M-step would be close to the truth, in that Ã_i^{(h)} ≈ A_i, for all i = 1, …, n. Combining with model (1) yields that Ã_i^{(h)} ≈ A ×₃ w_i + E_i. Rearranging the terms yields a tensor regression of the form,

Ã^{(h)} = A ×₃ W + ℰ^{(h)},

where ℰ^{(h)} ∈ ℝ^{p×p×n} with ℰ^{(h)}[·, ·, i] = E_i + (Ã_i^{(h)} − A_i). This regression enables us to integrate information from all subjects, by first obtaining an intermediate estimator for A, then estimating each individual A_i. Taking the derivative of the squared loss ‖Ã^{(h)} − A ×₃ W‖F²/2 with respect to A yields the gradient (A ×₃ W − Ã^{(h)}) ×₃ W^⊤, where ‖ · ‖F denotes the Frobenius norm. Together with the sparsity of A, this motivates the Dantzig-type optimization formulation in (5).
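Because this constraint decouples over the mode-3 fibers A_{j,k,·} ∈ ℝ^d, each fiber can be solved as a small linear program. The sketch below assumes a Dantzig-type constraint of the form ‖W^⊤(Wa − b)‖∞ ≤ τ per fiber, which is our reading of (5); the paper's exact formulation may differ in normalization:

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_fiber(G, b, tau):
    """Solve min ||a||_1 s.t. ||G a - b||_inf <= tau, with G = W'W, b = W' b_jk."""
    d = G.shape[1]
    # Split a = a_plus - a_minus with both parts nonnegative; minimize their sum.
    c = np.ones(2 * d)
    A_ub = np.vstack([np.hstack([G, -G]), np.hstack([-G, G])])
    b_ub = np.concatenate([b + tau, tau - b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (2 * d))
    return res.x[:d] - res.x[d:]

def intermediate_estimator(A_tilde, W, tau):
    """Fiber-by-fiber solution of the Dantzig-type program (5).

    A_tilde : p x p x n stacked dense M-step estimates; W : n x d covariates.
    """
    p = A_tilde.shape[0]
    d = W.shape[1]
    G = W.T @ W
    A_hat = np.zeros((p, p, d))
    for j in range(p):
        for k in range(p):
            b = W.T @ A_tilde[j, k, :]
            A_hat[j, k] = dantzig_fiber(G, b, tau)
    return A_hat
```

Each of the p² linear programs has only 2d variables, so they parallelize trivially, matching the "fiber-by-fiber parallel fashion" described above.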
We next update the noise variance estimates as

σ_ε^{2,(h)} = (nTp)^{−1} Σ_{i=1}^{n} Σ_{t=1}^{T} E( ‖X_{i,t} − Z_{i,t}‖₂² | X_i, Θ^{(h−1)} ),   σ_η^{2,(h)} = {n(T−1)p}^{−1} Σ_{i=1}^{n} Σ_{t=2}^{T} E( ‖Z_{i,t} − Â_i^{(h)} Z_{i,t−1}‖₂² | X_i, Θ^{(h−1)} ),   (7)

where the conditional expectations are obtained via Kalman filtering and smoothing in the E-step.
We terminate the algorithm when the parameter estimates in 2 consecutive iterations are close. We summarize our estimation procedure in Algorithm 1.
2.2. EM estimation consistency
We next establish a form of uniform consistency for our estimator from Algorithm 1, which is needed to guarantee the asymptotic properties of the subsequent test. Let 𝒫_E denote the distribution of the random error E_i.
Theorem 1
Suppose the following conditions hold almost surely in 𝒫_E.
(a) The initial parameter Θ^{(0)} is in a neighborhood of the truth specified in Section 1.1 of the online Supplementary Materials.
(b) The matrix W^⊤W/n is invertible, with its minimal eigenvalue bounded away from zero, and the entries of W are uniformly bounded.
(c) nT ≥ c₁ log p, and p ≥ c₂d², for some positive constants c₁, c₂.
(d) The number of iterations h satisfies h ≥ c₃⌈log(Tnp)⌉, for some positive constant c₃, where ⌈m⌉ denotes the smallest integer no smaller than m.
(e) For any l ≤ h, the tolerance parameter τ^{(l)} is bounded by a constant multiple of the statistical error rate of the dense M-step estimator.
Then, the EM estimator Θ^{(h)} of Algorithm 1 at the hth iteration satisfies the error bound (8), which holds with high probability, for some positive constants c₄ to c₈.
We make a few remarks about this theorem and its conditions. First of all, Condition (a) requires the initial value to be reasonably close to the truth in some technical sense, and we discuss it in more detail in Section 1.1 of the online Supplementary Materials. Condition (b) is mild, because in numerous applications, including our real data example, n and p are in the tens to hundreds, while d is small, so W^⊤W is invertible; besides, the variables in w_i are usually standardized. Condition (c) specifies the divergence rate of n, p with respect to T, d. It is weaker than a similar requirement in Lyu et al. (2022); in particular, we allow T to be fixed, while Lyu et al. (2022) required T to diverge. Condition (d) ensures a sufficient number of iterations, so that the computational error is dominated by the statistical error, and thus only the latter shows up in the error bound in (8). Condition (e) specifies the tolerance parameter to ensure that the M-step updates are reasonably accurate. All these conditions are reasonable and mild.
Second, both the number of subjects n and the length of the time series T appear in the denominators of the error bounds in (8), which implies that the estimator is consistent as either n or T diverges to infinity. Moreover, the estimation of the individual transition matrices is consistent if the time series dimension p scales logarithmically with nT. Meanwhile, the estimation error for the noise variances decays with a diverging p, indicating a phenomenon of "blessing of dimensionality."
Finally, this theorem does not seek to establish the consistency of the estimator of the population transition A, because, as we explained earlier, we do not impose any distributional assumption on E_i. Nevertheless, Theorem 1 is sufficient to guarantee the statistical properties of the subsequent simultaneous testing.
3. SIMULTANEOUS INFERENCE
3.1. Test statistic
We first construct the test statistic for the population transition tensor
in the hypotheses in (2). The construction involves 2 key components: we employ the auto-covariance of some residual term to approximate the individual
, then we regress this auto-covariance on the covariates through a tensor regression for the final test statistic for
.
First, we observe that the observed time series X_{i,t} follows an autoregressive structure under model (1),

X_{i,t} = A_i X_{i,t−1} + ξ_{i,t},

where the residual term ξ_{i,t} = η_{i,t} + ε_{i,t} − A_i ε_{i,t−1}. Then, the lag-1 auto-covariance of ξ_{i,t} is

Cov(ξ_{i,t}, ξ_{i,t−1}) = −σ_ε² A_i,

which suggests that we can approximate A_i through −σ_ε^{−2} Cov(ξ_{i,t}, ξ_{i,t−1}). Since ξ_{i,t} is not observed, we consider a sample estimate, based on the estimated model parameters (Â_i, σ̂_ε², σ̂_η²), with the residuals ξ̂_{i,t} = X_{i,t} − Â_i X_{i,t−1} replacing ξ_{i,t} in the sample lag-1 auto-covariance. Moreover, recognizing that this plug-in estimator is biased due to the estimation error in Â_i, we further consider a bias-corrected version. Note that (Â_i, σ̂_ε², σ̂_η²) can be any estimators of (A_i, σ_ε², σ_η²), and we show later that the estimators from the modified EM algorithm are sufficient for our testing purpose.
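The plug-in approximation of A_i from the residual auto-covariance can be sketched as below; the paper's bias correction is omitted here, and the normalization of the sample auto-covariance is an implementation choice:

```python
import numpy as np

def plugin_transition(x, A_hat, sig_eps2):
    """Plug-in estimate of A_i from the lag-1 auto-covariance of the residuals.

    x : (T, p) observed series for one subject; A_hat : current estimate of A_i.
    Uses E(xi_t xi_{t-1}') = -sig_eps2 * A_i for xi_t = x_t - A_hat x_{t-1}.
    """
    T, _ = x.shape
    xi = x[1:] - x[:-1] @ A_hat.T          # residuals, t = 2, ..., T
    S1 = xi[1:].T @ xi[:-1] / (T - 2)      # sample lag-1 auto-covariance
    return -S1 / sig_eps2
```

As a sanity check, when the true parameters are plugged in and T is large, the output concentrates around the true A_i.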
Next, we fit a tensor regression to obtain an estimate of the transition tensor A to build our test statistic. Specifically, stack together the bias-corrected estimators, i = 1, …, n, to form a tensor in ℝ^{p×p×n}, whose ith frontal slice is the bias-corrected estimator for subject i. We regress this tensor on the covariates W to obtain a least squares type estimate Â, by multiplying it in mode 3 with (W^⊤W)^{−1}W^⊤. The next proposition shows that Â centers around A, and characterizes its dominating randomness. In what follows, Δ₂, Δη, and Δϵ denote the maximal estimation errors of the Â_i's, σ̂_η², and σ̂_ε², respectively.
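The least squares tensor regression amounts to an ordinary least squares fit of every mode-3 fiber on W; a minimal sketch:

```python
import numpy as np

def tensor_ols(B, W):
    """Regress a p x p x n tensor B on covariates W (n x d) by least squares:
    each fiber B[j, k, :] is regressed on W, giving a p x p x d estimate."""
    M = np.linalg.solve(W.T @ W, W.T)        # (W'W)^{-1} W', shape d x n
    return np.einsum('jki,li->jkl', B, M)    # mode-3 product with M
```

When B is generated exactly as A ×₃ W, the regression recovers A, which serves as a quick correctness check.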
Proposition 1
There exist positive constants c1 to c4 and sr, such that, if nT ≥ c1log p, and
, then,
We remark that, as we expect the transition matrices A_i to be sparse, we require the error matrix E_i to be sparse too. The sparsity condition implies that the row-wise maximum number of nonzero entries of E_i is bounded by O(s_r), which helps curb the approximation error in Proposition 1.
We now construct our test statistic. For the null hypothesis H_{0;jkl}: A_{j,k,l} = a_{j,k,l}, consider the standardized statistic,

T_{j,k,l} = (Â_{j,k,l} − a_{j,k,l}) / ŝ_{j,k,l},   (9)

where the denominator ŝ_{j,k,l}, whose construction involves the vector outer product ○ of the residual terms, is a standard deviation estimator of Â_{j,k,l}. The next theorem shows that this test statistic follows a standard normal distribution under the null, asymptotically and uniformly over all entries.
Theorem 2
Suppose the following conditions hold.
Suppose
, and
.
Suppose log p = o(T ∧ n), where ∧ denotes the minimum.
Suppose Δ₂ = o_p(n^{−1/4}), Δη = o_p(n^{−1/2}), and Δϵ = o_p(n^{−1/2}).
Then, the test statistic in (9) converges to the standard normal distribution, uniformly over all (j, k, l) ∈ [p] × [p] × [d], as n → ∞.
3.2. Simultaneous testing procedure
We next develop a simultaneous testing procedure for the hypotheses (2) with a proper FDR control. We summarize the procedure in Algorithm 2.
Let ℋ₀ = {(j, k, l) ∈ 𝒮 : A_{j,k,l} = a_{j,k,l}} denote the set of true null hypotheses, and ℋ₁ = 𝒮 \ ℋ₀ the set of true alternatives. Given our test statistic T_{j,k,l} in (9), we reject H_{0;jkl} if |T_{j,k,l}| ≥ u for some thresholding value u. Then the false discovery proportion (FDP), the FDR, and the TPR of our testing problem are,

FDP(u) = Σ_{(j,k,l)∈ℋ₀} 𝕀(|T_{j,k,l}| ≥ u) / { Σ_{(j,k,l)∈𝒮} 𝕀(|T_{j,k,l}| ≥ u) ∨ 1 },   FDR(u) = E{FDP(u)},   TPR(u) = E{ Σ_{(j,k,l)∈ℋ₁} 𝕀(|T_{j,k,l}| ≥ u) / (|ℋ₁| ∨ 1) },

where ∨ denotes the maximum, and 𝕀(·) is the indicator function.
Our key idea is to pick a thresholding value u that rejects as many true alternatives as possible, while controlling the false discovery at the prespecified level α. In other words, we choose û = inf{u > 0 : FDP(u) ≤ α}. Since ℋ₀ in FDP(u) is unknown, but P(|T_{j,k,l}| ≥ u) ≈ 2{1 − Φ(u)} by the asymptotic normality of T_{j,k,l} under the null, where Φ(·) is the cumulative distribution function of a standard normal distribution, we propose to estimate the number of false rejections Σ_{(j,k,l)∈ℋ₀} 𝕀(|T_{j,k,l}| ≥ u) in FDP(u) by 2{1 − Φ(u)}|𝒮|. Moreover, we restrict the search of u to the range (0, {2 log(p²d)}^{1/2}], since the solution û falls in this range with probability tending to one, as we show later in the proof of Theorem 3.
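The search for the thresholding value can be sketched as a generic Liu-type procedure; the grid resolution and the fallback at the right endpoint are implementation choices, not taken from the paper:

```python
import numpy as np
from scipy.stats import norm

def fdr_threshold(T_stats, alpha):
    """Smallest u in (0, sqrt(2 log m)] with estimated FDP at most alpha,
    where the number of false rejections is estimated by 2 m (1 - Phi(u))."""
    T_abs = np.abs(np.ravel(T_stats))
    m = T_abs.size
    for u in np.linspace(0.0, np.sqrt(2 * np.log(m)), 2000):
        n_rej = np.sum(T_abs >= u)
        fdp_hat = 2 * m * norm.sf(u) / max(n_rej, 1)   # estimated FDP(u)
        if fdp_hat <= alpha:
            return u
    return np.sqrt(2 * np.log(m))   # fallback: reject only extreme statistics

# Rejection set: indices with |T_stats| >= fdr_threshold(T_stats, alpha).
```

On a mixture of null statistics and a moderate number of strong signals, the returned threshold sits between the null bulk and the signals, so most true alternatives are rejected.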
We next establish the theoretical guarantees of our testing procedure. We introduce 2 regularity conditions.
Assumption 1
Let
, and σj, k, l denote the standard deviation of the (j, k, l)th entry of
. There exist positive constants c1 and c2, such that
Assumption 2
Let π = (j, k, l) denote an index tuple,
denote the absolute correlation between the πth and the π′th entries of
, and
denote the maximal absolute correlation between any 2 different entries in
. There exist constants
, and c2, c3 > 0, such that
Assumption 1 is mild, as the required number of strong alternatives is only of the order of the logarithm of the logarithm of the total number of hypotheses. Intuitively, if the number of alternatives is too small, then few rejections can be made for any u, and the resulting FDR is close to one regardless of the thresholding value. Liu and Shao (2014) showed that this assumption is nearly necessary, in the sense that the FDR control for large-scale simultaneous testing would fail if the number of true alternatives is fixed. Assumption 2 is also mild, as it imposes a bound on the number of strongly correlated entry pairs. The bound is weak relative to the total possible number of pairs, so that a significant portion of entries are allowed to be strongly correlated. A similar assumption was also adopted in Xia et al. (2018) and Lyu et al. (2022).
The next theorem shows that our proposed test controls FDR and FDP, asymptotically.
Theorem 3
Suppose the following conditions hold almost surely in
.
Suppose Condition (b) of Theorem 1 holds.
Suppose log(p) log(p²d) = o(T), and p ≤ n^{c₁}, for some positive constant c₁.
Suppose
for some positive constant c2.
Suppose Δ₂ = o_p({n log(p²d)}^{−1/4}), Δη = o_p({n log(p²d)}^{−1/2}), and Δϵ = o_p({n log(p²d)}^{−1/2}).
Then, for any
,
We make a few remarks. First, the FDR control requires a stronger estimation consistency, that is, Condition (e) of Theorem 3, when compared with that for the asymptotic normality, that is, Condition (c) of Theorem 2. Besides, the dimension p is allowed to grow polynomially with the subject size n. This is reasonable because, intuitively, the normality only regulates the marginal behavior, whereas the FDR control deals with all entries simultaneously. Second, the dimension p is allowed to grow exponentially with the time series length T, in contrast to the polynomial growth rate in Lyu et al. (2022). This is because we pool all subjects to construct the test statistic. Finally, the slight deflation in the limiting FDR comes from substituting the unknown set of true nulls ℋ₀ with the full index set 𝒮 in the false rejection approximation.
The next theorem shows that the power of our proposed test approaches one asymptotically.
Theorem 4
Suppose the same conditions of Theorem 3 hold, together with the additional requirement that the signals of the true alternatives are sufficiently strong. Then the TPR of the proposed test approaches one asymptotically.
The above 2 theorems establish the sufficient conditions for a general set of estimators. The next corollary shows that, if we employ the EM estimators from Algorithm 1 to construct the test statistic, then our proposed test enjoys the desired FDR and FDP control, as well as the power property.
Corollary 1
Suppose the following conditions hold almost surely in 𝒫_E.
(a) Conditions (a), (b), (d), (e) in Theorem 1, and Conditions (a), (c), (d) in Theorem 3 hold.
(b) (s_r ∨ d)⁴ log(p) = o(nT).
If we adopt the EM estimators from Algorithm 1 to construct the test statistic, then, for any significance level, the FDR and FDP are controlled asymptotically as in Theorem 3. Moreover, under the additional condition of Theorem 4, the TPR approaches one asymptotically.
4. SIMULATIONS
4.1. Simulation setup
We carry out intensive simulations to study the finite-sample performance of both EM estimation and simultaneous inference. We also compare with some alternative solutions.
We consider 3 common network structures for the baseline population transition matrix, including the banded, Erdös-Rényi, and stochastic block structures; see Figure 1 for an illustration. For the nonzero entries of the baseline matrix, we first randomly assign values −1 or 1, then multiply all entries by a common factor, so that the stationarity condition is satisfied. We next randomly pick 20 entries of the slope coefficient matrices to be nonzero, and randomly assign values −0.25 or 0.25. We set the rest of the entries as zero. We set σϵ = ση = 0.8. We generate the covariates w_i from a multivariate truncated normal on [−1, 1] with mean zero and Toeplitz covariance matrix [0.5^{|j−k|}]_{jk}. We randomly set 20 entries in E_i nonzero, and randomly sample their values from Uniform[−0.2, 0.2]. We fix (d, T) = (5, 100) to mimic the dimensions of the real data example. Meanwhile, we vary (n, p) = (80, 30), (120, 30), (120, 70) to study the effects of the subject size and the dimension of the time series. We repeat each simulation 100 times. In our implementation, we tune the tolerance parameter τ^{(h)} in (5) via validation, where we use the first 25% of the time points for testing, the last 60% for training, and discard the middle 15% to reduce the temporal dependence between the training and testing samples. We choose the value that minimizes the average prediction error on the testing samples. For the initialization of the individual EM algorithm, we initialize every transition matrix at a common fixed value, and the error variances at 1e−3. We have experimented with some other initialization approaches in Section S4.2 of Lyu et al. (2022), and the results are relatively stable. We initialize the joint EM algorithm in the same way. Alternatively, one may apply the individual EM or the lasso to get a sparse estimator to initialize A, and average the subject-specific error variance estimators across subjects to initialize σϵ² and ση². We terminate our EM algorithm when the consecutive estimates are close, and we find that our algorithm converges fast, usually within 10 iterations.
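For instance, a banded baseline transition matrix satisfying the stationarity constraint can be generated as below; the bandwidth and the target spectral norm are illustrative choices, not the paper's exact scaling:

```python
import numpy as np

def banded_transition(p, bandwidth=1, target_norm=0.5, seed=0):
    """Banded transition matrix with +/-1 entries on the band, rescaled so that
    its spectral norm equals target_norm (< 1, ensuring stationarity)."""
    rng = np.random.default_rng(seed)
    A = np.zeros((p, p))
    for j in range(p):
        lo, hi = max(0, j - bandwidth), min(p, j + bandwidth + 1)
        A[j, lo:hi] = rng.choice([-1.0, 1.0], size=hi - lo)
    A *= target_norm / np.linalg.norm(A, 2)   # enforce ||A||_2 = target_norm
    return A
```

The Erdös-Rényi and stochastic block variants differ only in how the nonzero support is drawn; the rescaling step is the same.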
FIGURE 1.
Network structures of the baseline transition matrix, where the black dots represent the nonzero entries.
4.2. Simulation results
We evaluate our method in three ways: first, the estimation accuracy for the set of individual parameters (A_1, …, A_n, σϵ², ση²), since our test statistic is built on those estimates; second, the estimation accuracy for the population transition tensor A, which is an intermediate step of our method; and finally, the selection accuracy of our simultaneous inference for A, which is the main target of this article. We also compare with 2 alternative solutions. The first alternative is to apply the lasso in the regression of X_{i,t} on X_{i,t−1} to obtain a sparse estimate of A_i for each subject i = 1, …, n, separately. We note that this method essentially ignores the measurement error ε_{i,t} in model (1), and treats the observed X_{i,t} as if it were the same as the true signal Z_{i,t}. The second alternative is to apply the individual EM method of Lyu et al. (2022) to obtain a sparse estimate of A_i for each subject i = 1, …, n, again separately. For both alternative solutions, after obtaining the individual estimator Â_i for each subject, we further regress each element of Â_i on w_i through ordinary least squares (OLS), then apply hard thresholding to obtain a sparse estimate of A. We adopt the common universal thresholding level of Donoho and Johnstone (1994), computed with the usual variance estimate from the OLS regression of the elements of Â_i on w_i.
First, we evaluate the estimation accuracy for the individual parameters. Figure 2 reports the results based on 100 data replications. Our method performs the best in all settings. Meanwhile, the individual EM method suffers when the time series dimension p increases, since it does not pool information from all subjects.
FIGURE 2.
Estimation accuracy of the individual parameters under three network structures: the banded, Erdös-Rényi, and stochastic block structures (from top to bottom). Four evaluation criteria are reported (from left to right). Three methods are compared: the proposed modified EM method (red solid line), the individual EM method (blue dashed line), and the lasso method (black dotted line). For σϵ² and ση², the lasso method is not included, because it does not produce an estimate for either of them.
Next, we evaluate the estimation accuracy for the population transition tensor A. We first note that, in our method, we can obtain an estimator of A through the optimization in (5), but it only serves as an intermediate quantity for our subsequent estimation and inference. This intermediate estimator helps pool information across different subjects, and helps produce a modified EM estimator for each individual A_i. We then establish the uniform consistency across all these modified EM estimators, and build our test statistic upon them. But we do not seek to establish the consistency of this intermediate estimator of A itself, because this way we do not have to impose any distributional assumption on the error term E_i. Figure 3 reports the results based on 100 data replications. Our method again performs the best. Moreover, the lasso method that ignores the measurement error performs poorly when estimating the intercept, while the individual EM method that does not pool information across subjects performs poorly when estimating the slopes.
FIGURE 3.
Estimation accuracy of A under three network structures: the banded, Erdös-Rényi, and stochastic block structures (from left to right). Two evaluation criteria are reported (from top to bottom): the estimation error, in Frobenius norm, of the intercept and of the slopes. Three methods are compared: the proposed modified EM method (red solid line), the individual EM method (blue dashed line), and the lasso method (black dotted line). The latter 2 methods are coupled with an additional OLS regression with hard thresholding.
Finally, we evaluate the selection accuracy for A. We note that the 2 alternative solutions described above perform sparse estimation, not hypothesis testing. They can produce a sparse estimator of A, but do not produce any P-value quantification, nor any explicit FDR control. For hypothesis testing, we also add the individual EM-based testing method of Lyu et al. (2022) for comparison. Figure 4 reports the results based on 100 data replications. Our test both controls the FDR at the nominal level of 5% and maintains a reasonably high TPR. By contrast, the individual EM-based test suffers a low TPR on the slopes in all settings. Besides, its FDR on the slopes inflates, and its TPR on the intercept drops sharply when the dimension p increases. Moreover, the 2 sparse estimation solutions suffer a nearly zero TPR on both the intercept and the slopes.
FIGURE 4.
Selection accuracy of A under three network structures: the banded, Erdös-Rényi, and stochastic block structures (from top to bottom). Two sets of evaluation criteria are reported (from left to right): the FDR and TPR on the intercept, and the FDR and TPR on the slopes. Four methods are compared: the proposed test (red solid line), the individual EM-based test (black dash-dotted line), the individual EM sparse estimation method (blue dashed line), and the lasso sparse estimation method (black dotted line).
5. APPLICATION ON BRAIN CONNECTIVITY ANALYSIS
We illustrate our proposed method via the analysis of the task-evoked functional magnetic resonance imaging (fMRI) data from the Human Connectome Project (HCP; Van Essen et al., 2013). Our analysis focuses on the language task fMRI data, where the participants answer questions about Aesop's fables (story condition) or math problems (math condition). The data include n = 927 subjects from the HCP-1200 release, after removing missing values and applying quality controls (Sripada et al., 2020). The raw fMRI time series have been preprocessed via the HCP minimally preprocessed pipeline (Glasser et al., 2013) and registered into the standard MNI 2mm space. The voxel-level data have been summarized into region-level time series of length T = 316 following the Power brain atlas (Power et al., 2011), which consists of 264 regions of interest (ROIs) in 13 functional modules. Our analysis aims to study the variations of the brain connectivity patterns among multiple subjects for p = 89 ROIs in three functional modules, auditory (AD), default mode (DM), and salience (SA), which are related to language and cognitive processes (Poldrack, 2006). We consider the effects of d = 3 covariates, including age, sex, and the language task performance score. The joint EM algorithm is initialized in the same way as in the simulation study.
Table 1 reports the selection frequency of the connectivity patterns within each functional module as well as for all pairs of functional modules. It also reports the estimated population baseline connectivity pattern and the covariate effects, stratified into positive effects and negative effects. Figure 5 reports the connections selected by the proposed FDR control method at the chosen significance level, and Figure 6 summarizes the key findings among the selected connections, both within the functional modules and across module pairs. In particular, for the population baseline connectivity with positive effects, the within-module ROI pairs are more frequently connected (45.56% for AD, 12.07% for DM, 32.72% for SA) than the between-module ROI pairs (4.70% at the maximum). Moreover, AD presents intensive positive connectivities between the left and right hemispheres, while SA shows more activity in the prefrontal cortex (PFC) and the anterior cingulate cortex (ACC) (Saur et al., 2008). For the covariate effects, the language task accuracy has relatively more connections with positive effects within SA (1.54%), but relatively more connections with negative effects between DM and AD (3.58%) and between DM and SA (2.30%). This suggests that higher accuracy promotes brain activity within SA, in particular between the PFC and the ACC (Pulvermüller, 2018), but reduces the activity between DM and AD, and between DM and SA (Zhang et al., 2019). Such findings agree well with the literature.
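The entrywise selection step can be illustrated with a generic false discovery rate control procedure on a matrix of test statistics. To be clear, this is a Benjamini-Hochberg sketch on two-sided normal p-values, not the paper's exact thresholded simultaneous test; it only conveys the shape of the selection rule applied to each entry of the transition-matrix statistics.

```python
import numpy as np
from scipy.stats import norm

def bh_select(z, alpha=0.05):
    """Benjamini-Hochberg selection on a matrix of z-statistics.

    z : (p, p) array of entrywise test statistics
    Returns a boolean (p, p) matrix of selected (rejected) entries.
    """
    # two-sided p-values under the standard normal reference
    pvals = 2 * norm.sf(np.abs(z)).ravel()
    m = pvals.size
    order = np.argsort(pvals)
    # step-up comparison: p_(k) <= alpha * k / m
    thresh = alpha * np.arange(1, m + 1) / m
    below = pvals[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    keep = np.zeros(m, dtype=bool)
    keep[order[:k]] = True
    return keep.reshape(z.shape)
```

For example, with one strongly significant entry in an otherwise null 4-by-4 statistic matrix, only that entry is selected at level 0.05.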
TABLE 1.
Selection frequency, that is, the number of selected connections over the total number of possible connections, in percentage, among the three functional modules: AD, DM, and SA.
| | baseline | | | sex | | | age | | | language | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | AD | DM | SA | AD | DM | SA | AD | DM | SA | AD | DM | SA |
| *positive* | | | | | | | | | | | | |
| AD | 45.56 | 0.93 | 3.85 | 0.00 | 0.13 | 0.00 | 0.59 | 1.06 | 0.85 | 0.00 | 0.53 | 0.43 |
| DM | 1.72 | 12.07 | 2.39 | 0.13 | 0.86 | 0.48 | 0.66 | 0.21 | 0.86 | 0.66 | 0.36 | 0.57 |
| SA | 4.70 | 3.45 | 32.72 | 0.43 | 0.29 | 0.00 | 0.85 | 0.77 | 0.00 | 0.85 | 0.38 | 1.54 |
| *negative* | | | | | | | | | | | | |
| AD | 2.37 | 0.93 | 0.00 | 1.18 | 1.06 | 1.28 | 1.78 | 0.27 | 0.43 | 0.59 | 2.12 | 0.85 |
| DM | 2.25 | 0.33 | 2.68 | 0.53 | 0.51 | 0.29 | 0.40 | 0.42 | 0.48 | 1.46 | 0.36 | 1.25 |
| SA | 0.43 | 1.53 | 0.31 | 2.56 | 0.77 | 0.93 | 0.85 | 0.57 | 0.00 | 1.28 | 1.05 | 0.00 |
The direction of effective connectivity is from the column module to the row module. The upper half of the table shows positive effects, and the lower half shows negative effects.
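Given a binary matrix of selected connections and a module label per ROI, the selection frequencies reported in Table 1 can be computed as block-wise proportions. The sketch below is illustrative: the `selection_frequency` helper is hypothetical, and for simplicity it counts diagonal self-connections inside the within-module blocks, which the paper's tally may exclude.

```python
import numpy as np

def selection_frequency(selected, modules):
    """Percentage of selected directed connections for each module pair.

    selected : (p, p) boolean matrix; selected[i, j] = True means the
               connection from ROI j to ROI i was selected
               (column module -> row module, as in Table 1)
    modules  : length-p sequence of module labels, e.g. 'AD', 'DM', 'SA'
    Returns a dict {(row_module, col_module): percentage}.
    """
    mods = sorted(set(modules))
    modules = np.asarray(modules)
    freq = {}
    for a in mods:
        for b in mods:
            # sub-block of connections from module b into module a
            block = selected[np.ix_(modules == a, modules == b)]
            freq[(a, b)] = 100.0 * block.mean()
    return freq
```

For instance, with four ROIs split evenly between AD and DM, one selected AD-to-AD edge out of four possible yields a 25% within-AD frequency.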
FIGURE 5.
Selected entries by the proposed testing method. The color scale ranges from blue (negative test statistics) to red (positive ones).
FIGURE 6.
Summary of key findings among the selected connections, within the functional modules and across module pairs. The notations + and − represent connections with positive and negative effects, respectively.
We also run two additional data analyses. The first compares the prediction accuracy of our proposed method with two alternatives: the lasso method and the individual EM method of Lyu et al. (2023). For each subject, we withhold the observations at the first 50 time points as testing data, and train the model on the remaining time points. We obtain the estimated time series for the testing data from each method, and then compute the prediction error ratio. Our method achieves the smallest error ratio among the three methods. The second analysis includes additional covariates. In the above analysis, we have included age, sex, and language task performance score, as these are commonly studied in the literature. We additionally include four more covariates: math task accuracy, processing speed, general cognitive ability, and race. We apply our method to the entire time series, obtain the fitted values, compute the R2 value for each subject, and then average the R2 values across all subjects. The average R2 with d = 3 and d = 7 covariates is 0.792 and 0.795, respectively, which suggests that the three covariates capture about the same variation as the seven covariates. As such, we keep the analysis results with the three covariates.
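The two evaluation metrics above can be sketched as follows. Since the exact formulas are not reproduced in this text, the Frobenius-norm ratio for the prediction error and the 1 − SSE/SST form for R2 are assumed, conventional choices rather than the paper's stated definitions.

```python
import numpy as np

def error_ratio(x_true, x_hat):
    """Prediction error ratio on held-out data.

    Assumed form: ||X_hat - X||_F^2 / ||X||_F^2, computed over the
    (p, T_test) block of withheld time points for one subject.
    """
    return np.sum((x_hat - x_true) ** 2) / np.sum(x_true ** 2)

def r_squared(x_true, x_fit):
    """Per-subject R^2 of the fitted series (assumed 1 - SSE/SST form)."""
    sse = np.sum((x_true - x_fit) ** 2)
    sst = np.sum((x_true - x_true.mean()) ** 2)
    return 1.0 - sse / sst
```

A perfect fit gives an error ratio of 0 and an R2 of 1; in the analysis, these quantities would be computed per subject (withholding the first 50 of the T = 316 time points for the error ratio) and then averaged across the n = 927 subjects.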
ACKNOWLEDGMENTS
We thank the two reviewers, the Associate Editor, and the Editor for their constructive comments, which helped improve the paper.
Supplementary Material
Web Appendix containing the proofs of theorems in Section 2.2 and Section 3, auxiliary lemmas, additional discussions, and the computer code are available with this paper at the Biometrics website on Oxford Academic.
Contributor Information
Xiang Lyu, Division of Biostatistics, University of California, Berkeley, CA 94720, United States.
Jian Kang, Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, United States.
Lexin Li, Division of Biostatistics, University of California, Berkeley, CA 94720, United States.
FUNDING
Kang’s research was partially supported by NIH grants R01DA048993, R01MH105561, and NSF grant IIS2123777. Li’s research was partially supported by NIH grants R01AG061303, R01AG062542, and NSF grant CIF-2102227.
CONFLICT OF INTEREST
None declared.
DATA AVAILABILITY
The Human Connectome Project (HCP) Open Access data used in Section 5 are publicly available (Human Connectome Project, 2024) and can be downloaded from the ConnectomeDB (https://db.humanconnectome.org/) by registered users.
References
- Bullmore E., Sporns O. (2009). Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 10, 186–198.
- Chen E. Y., Fan J., Zhu X. (2023). Community network auto-regression for high-dimensional time series. Journal of Econometrics, 235, 1239–1256.
- Chen G., Glen D., Saad Z., Hamilton J. P., Thomason M., Gotlib I. et al. (2011). Vector autoregression, structural equation modeling, and their synthesis in neuroimaging data analysis. Computers in Biology and Medicine, 41, 1142–1155.
- Chiang S., Guindani M., Yeh H. J., Haneef Z., Stern J. M., Vannucci M. (2017). Bayesian vector autoregressive model for multi-subject effective connectivity inference using multi-modal neuroimaging data. Human Brain Mapping, 38, 1311–1332.
- Donoho D. L., Johnstone I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81, 425–455.
- Friston K. J. (2011). Functional and effective connectivity: a review. Brain Connectivity, 1, 13–36.
- Ghahramani Z., Hinton G. E. (1996). Parameter estimation for linear dynamical systems. Technical report. Toronto, Canada: University of Toronto.
- Glasser M. F., Sotiropoulos S. N., Wilson J. A., Coalson T. S., Fischl B., Andersson J. L. et al. (2013). The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage, 80, 105–124.
- Gorrostieta C., Fiecas M., Ombao H., Burke E., Cramer S. (2013). Hierarchical vector auto-regressive models and their applications to multi-subject effective connectivity. Frontiers in Computational Neuroscience, 7, 159.
- Gorrostieta C., Ombao H., Bédard P., Sanes J. N. (2012). Investigating brain connectivity using mixed effects vector autoregressive models. NeuroImage, 59, 3347–3355.
- Han F., Lu H., Liu H. (2015). A direct estimation of high dimensional stationary vector autoregressions. Journal of Machine Learning Research, 16, 3115–3150.
- Hsu N.-J., Hung H.-L., Chang Y.-M. (2008). Subset selection for vector autoregressive processes using lasso. Computational Statistics and Data Analysis, 52, 3645–3657.
- Human Connectome Project. (2024). ConnectomeDB. https://db.humanconnectome.org/.
- Krampe J., Kreiss J., Paparoditis E. (2018). Bootstrap based inference for sparse high-dimensional time series models. arXiv preprint arXiv:1806.11083.
- Liu W., Shao Q.-M. (2014). Phase transition and regularized bootstrap in large-scale t-tests with false discovery rate control. The Annals of Statistics, 42, 2003–2025.
- Lyu X., Kang J., Li L. (2023). Statistical inference for high-dimensional vector autoregression with measurement error. Statistica Sinica, 1–25.
- Negahban S., Wainwright M. J. (2011). Estimation of (near) low-rank matrices with noise and high-dimensional scaling. The Annals of Statistics, 39, 1069–1097.
- Poldrack R. A. (2006). Can cognitive processes be inferred from neuroimaging data? Trends in Cognitive Sciences, 10, 59–63.
- Power J. D., Cohen A. L., Nelson S. M., Wig G. S., Barnes K. A., Church J. A. et al. (2011). Functional network organization of the human brain. Neuron, 72, 665–678.
- Pulvermüller F. (2018). Neural reuse of action perception circuits for language, concepts and communication. Progress in Neurobiology, 160, 1–44.
- Saur D., Kreher B. W., Schnell S., Kümmerer D., Kellmeyer P., Vry M.-S. et al. (2008). Ventral and dorsal pathways for language. Proceedings of the National Academy of Sciences, 105, 18035–18040.
- Song S., Bickel P. J. (2011). Large vector auto regressions. arXiv preprint arXiv:1106.3915.
- Sripada C., Angstadt M., Rutherford S., Taxali A., Shedden K. (2020). Toward a “treadmill test” for cognition: improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41, 3186–3197.
- Van Essen D. C., Smith S. M., Barch D. M., Behrens T. E., Yacoub E., Ugurbil K. et al. (2013). The WU-Minn Human Connectome Project: an overview. NeuroImage, 80, 62–79.
- Wu T., Chan P., Hallett M. (2010). Effective connectivity of neural networks in automatic movements in Parkinson’s disease. NeuroImage, 49, 2581–2587.
- Xia Y., Cai T., Cai T. T. (2018). Multiple testing of submatrices of a precision matrix with applications to identification of between pathway interactions. Journal of the American Statistical Association, 113, 328–339.
- Zhang M., Savill N., Margulies D. S., Smallwood J., Jefferies E. (2019). Distinct individual differences in default mode network connectivity relate to off-task thought and text memory during reading. Scientific Reports, 9, 1–13.
- Zhang T., Wu J., Li F., Caffo B., Boatman-Reich D. (2015). A dynamic directional model for effective brain connectivity using electrocorticographic (ECoG) time series. Journal of the American Statistical Association, 110, 93–106.
- Zheng L., Raskutti G. (2019). Testing for high-dimensional network parameters in auto-regressive models. Electronic Journal of Statistics, 13, 4977–5043.
- Zhu X., Xu G., Fan J. (2023). Simultaneous estimation and group identification for network vector autoregressive model with heterogeneous nodes. Journal of Econometrics, 105564.