Summary:
Brain functional connectivity reveals the synchronization of brain systems through correlations in neurophysiological measures of brain activity. Growing evidence now suggests that the brain connectivity network experiences alterations in numerous neurological disorders, and thus differential brain network analysis may provide new insights into disease pathologies. Data from neurophysiological measurements are often multi-dimensional and in matrix form, posing a challenge for brain connectivity analysis. Existing graphical model estimation methods either assume a vector normal distribution, which in essence requires the columns of the matrix data to be independent, or fail to address the estimation of differential networks across different populations. To tackle these issues, we propose an innovative Matrix-Variate Differential Network (MVDN) model. We exploit the D-trace loss function and a Lasso-type penalty to directly estimate the spatial differential partial correlation matrix, and we use an alternating direction method of multipliers (ADMM) algorithm for the optimization problem. Theoretical and simulation studies demonstrate that MVDN significantly outperforms other state-of-the-art methods in dynamic differential network analysis. We illustrate the method with a functional connectivity analysis of an Attention Deficit Hyperactivity Disorder (ADHD) dataset. The hub nodes and differential interaction patterns identified are consistent with existing experimental studies.
Keywords: Brain network, Differential network analysis, fMRI, Matrix data, Graphical model, Variable Selection
1. Introduction
Brain connectivity analysis has been at the foreground of neuroscience research. Brain functional connectivity reveals the synchronization of brain systems through correlations in neurophysiological measurements of brain activity, which, when measured during the resting state, map the intrinsic functional architecture of the brain (Varoquaux and Craddock, 2013). Growing evidence now suggests that the brain connectivity network experiences alterations in the presence of numerous neurological disorders, including Alzheimer’s disease, attention deficit hyperactivity disorder (ADHD), and autism spectrum disorder (Fox and Greicius, 2010; Daliri and Behroozi, 2013).
Functional magnetic resonance imaging (fMRI) is one of the mainstream imaging modalities for studying brain functional connectivity. It is of great interest to investigate how the network of connected brain regions alters from the disease group to the control group, i.e., the differential network of brain regions between the two groups. Our motivating example is the ADHD-200 study of resting-state fMRI of children and adolescents. It includes two groups of subjects: combined-type ADHD and typically developing controls (TDC). An illustration of the data structure is given in Figure 1. In this paper, we focus on the problem of comparing brain functional connectivity patterns between the ADHD and TDC groups. Such differential network analysis will provide new insights into ADHD pathologies.
Figure 1.

Illustration of brain connectivity analysis: (A) Brain imaging data; (B) Matrix data; (C) Correlation structure; (D) Brain connectivity network; ➀ indicates directly estimating the differential network, i.e., the difference of two precision matrices; ➁ and ➂ indicate separately estimating the connectivity networks. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.
Differential network modelling has been widely applied in disease biology (Tian et al., 2016; Ji et al., 2016; Yuan et al., 2016; Ji et al., 2017; Zhang et al., 2017; Grimes et al., 2019), where the differential network is typically modelled as the difference of two precision matrices (Zhao et al., 2014; Xia et al., 2015; Yuan et al., 2015; He et al., 2018). However, the literature mentioned above mostly focuses on differential network modelling for vector-form data, while the observed fMRI data in brain functional connectivity studies are in matrix form (see Figure 1A–1B), where the row variables represent brain regions and the column variables represent time points. Typically, the number of brain regions used in brain differential network analysis is of the order 10^2 and the number of time points is around 150 to 200. The primary interest is to infer the spatial connectivity network. For matrix-form data, most existing literature focuses on estimating a single graph (Leng and Tang, 2012; Zhou, 2014) or jointly estimating multiple graphs (Zhu and Li, 2018) (see Figure 1C–1D ➁➂), while few studies investigate differential network modelling for matrix-form data directly (see Figure 1C–1D ➀). It is natural to first separately or jointly estimate the precision matrices and then subtract the estimates to obtain the differential network. However, the joint or separate estimation methods impose the restrictive assumption that both precision matrices are sparse and thus may yield unreliable conclusions, especially when hub nodes exist. Both the simulation study and the real data analysis in the following sections illustrate the inferiority of this straightforward approach, which motivates us to consider direct estimation of the differential network for two groups of matrix-form data. Another line of research in the neuroscience community focuses on graph-based multiple hypothesis testing methods, such as Zalesky et al. (2010); Pan et al. (2014); Kim and Pan (2015); Xia and Li (2017, 2019), which typically aim at controlling the family-wise error rate or the false discovery rate; the techniques involved are quite different from the penalized estimation methods considered here.
A typical example of matrix-valued data is fMRI data, where each row of the matrix corresponds to the time series of the blood-oxygen-level-dependent signal averaged over the voxels in one region of interest (ROI). For matrix-valued data, applying the existing differential network estimation methods for vector-form data (by assuming the row variables to follow a vector normal distribution) would ignore the dependence between the columns of the matrix data and may result in incorrect conclusions. For example, the columns in an fMRI study, which correspond to time series of repeatedly measured brain activity, are highly correlated. To tackle this issue, we propose a differential network model under the matrix-normal framework for matrix-variate data. The main contributions of the current work lie in the following aspects. First, we propose a Matrix-Variate Differential Network (MVDN) model which is particularly useful in modelling connectivity alteration for matrix-variate data, such as in brain connectivity analysis. Second, we propose a computationally efficient algorithm for estimating the MVDN, which directly estimates the spatial partial correlation matrix difference without attempting to estimate the partial correlation matrices separately. Third, compared with vector-normal graphical models, the proposed approach fully exploits the temporal information without assuming the columns of the matrix data to be independent. Finally, we show that the proposed method can identify all differential edges accurately with probability tending to 1 under mild regularity conditions as the number of spatial locations p, the number of time points q, and the sample size n go to infinity.
The rest of the paper is organized as follows. In Section 2, we describe our model, estimation method, and inferential procedure. In Section 3, we assess the performance of our method via Monte Carlo simulation studies. Section 4 illustrates the proposed method through the ADHD-200 study. We summarize our method and present some future directions in Section 5. Mathematical derivations are deferred to Web Appendix S1.
2. Methods
2.1. Notation and Model Setup
The following notation is adopted throughout the paper. Let A = [aij] be a square matrix of dimension d, define , , , . We denote the trace of A as Tr(A) and let Vec(A) be the vector obtained by stacking the columns of A. The notation ⊗ represents the Kronecker product. For a set , denote by the cardinality of . For two real numbers p, q, define p ∨ q = max(p, q).
Let Xp×q be the spatial-temporal matrix data from an imaging modality with p spatial locations and q time points. Vec(Xp×q) is formed by stacking the columns of Xp×q into a vector of length pq. Denote by ΣS and ΣT the covariance matrices of the p spatial locations and the q time points, respectively. We say Xp×q follows a matrix normal distribution with mean matrix Mp×q and the Kronecker product covariance structure Σ = ΣT ⊗ ΣS if and only if Vec(Xp×q) follows a multivariate normal distribution with mean Vec(Mp×q) and covariance Σ = ΣT ⊗ ΣS.
Correspondingly, we have Ω = Σ⁻¹ = ΩT ⊗ ΩS, where ΩS = ΣS⁻¹ and ΩT = ΣT⁻¹ denote the spatial and temporal precision matrices, respectively. Note that ΣS and ΣT are only identifiable up to a scaling factor, as ΣT ⊗ ΣS = (cΣT) ⊗ (c⁻¹ΣS) for any c > 0. In fact, in brain connectivity network analysis, the partial correlation is a commonly adopted correlation measure (Peng et al., 2009; Zhu and Li, 2018). Besides, the primary interest in brain connectivity analysis is to infer the connectivity network characterized by the spatial precision matrix ΩS, while the temporal precision matrix ΩT is of little interest and thus can be viewed as a nuisance parameter. In other words, under the matrix normal framework a region-by-region spatial partial correlation matrix characterizes the brain connectivity graph, in which nodes represent brain regions and links measure conditional dependence between the brain regions. Brain connectivity analysis is thus equivalently transformed into the estimation of the spatial partial correlation matrix. We remark here that the matrix-normal distribution framework has been widely adopted in real applications and is scientifically plausible in neuroimaging analysis; see, for example, Yin and Li (2012); Leng and Tang (2012); Zhou (2014); Xia and Li (2017); Zhu and Li (2018); Xia and Li (2019).
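To make the Kronecker product covariance structure concrete, the following R sketch (with illustrative dimensions and decay parameters, not those used in the article) simulates a zero-mean matrix-normal observation and notes the Kronecker identity satisfied by the covariance of its vectorization.

```r
## Illustrative sketch: simulate one zero-mean matrix-normal observation X
## with spatial covariance Sigma_S (rows) and temporal covariance Sigma_T
## (columns), so that cov(Vec(X)) = Sigma_T %x% Sigma_S.
set.seed(1)
p <- 4; q <- 3                                   # small illustrative dimensions
Sigma_S <- 0.5 ^ abs(outer(1:p, 1:p, "-"))       # spatial covariance (illustrative AR-type decay)
Sigma_T <- 0.3 ^ abs(outer(1:q, 1:q, "-"))       # temporal covariance (illustrative)

rmatnorm <- function(Sigma_S, Sigma_T) {
  p <- nrow(Sigma_S); q <- nrow(Sigma_T)
  Z <- matrix(rnorm(p * q), p, q)                # i.i.d. standard normal entries
  ## X = Sigma_S^{1/2} Z Sigma_T^{1/2} has row covariance Sigma_S and
  ## column covariance Sigma_T (up to the usual scaling ambiguity).
  t(chol(Sigma_S)) %*% Z %*% chol(Sigma_T)
}

X <- rmatnorm(Sigma_S, Sigma_T)
## The Kronecker identity behind the definition:
Sigma_vec <- kronecker(Sigma_T, Sigma_S)         # cov(Vec(X)), a (p*q) x (p*q) matrix
dim(Sigma_vec)
```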
Let X, Y be the spatial-temporal matrices of the diseased and the healthy control groups, respectively. We assume that X, Y follow a matrix normal distribution with the Kronecker product covariance structure. The mean matrices, without loss of generality, are assumed to be zero; that is, both X and Y follow zero-mean matrix normal distributions with their own Kronecker product covariance structures. Under the matrix normal framework, let
where is the partial correlation matrix of the spatial locations in the diseased group and is the diagonal matrix of . Similarly, for the healthy control group, we can define and .
The primary interest of this paper is to detect spatial locations whose connectivities, in terms of the spatial partial correlation, differ across the two groups. We define the differential network between the two groups as the difference between the two spatial partial correlation matrices, denoted by Δ. The remaining parameters are viewed as nuisance parameters.
2.2. Estimation
Let {X1,…, Xn1} and {Y1,…, Yn2}, each element of which is a matrix with dimension p×q, be two sets of i.i.d. random samples from the independent matrix normal populations X and Y, respectively. The D-trace loss function takes a very simple form and can be used to directly estimate the difference of two precision matrices (Yuan et al., 2015). For the convenience of theoretical development, we adopt the D-trace loss function adapted for matrix-valued data, i.e., we propose to estimate Δ by minimizing the following loss function:
where and denote the correlation versions of and , respectively. It can be shown that the Hessian matrix with respect to Δ of the loss function is , which indicates that the loss function L(Δ, , ) is convex with respect to Δ and has a unique minimizer at . In brain connectivity alteration detection, it is often the case that the number of altered connections in the spatial networks of the two groups is far smaller than the dimensionality, which motivates us to add a penalty to the D-trace loss function. Generally, we can add a decomposable non-convex penalty function of the form Pλ(Δ) = ∑j,k pλ(Δjk), such as the smoothly clipped absolute deviation (SCAD) penalty. For simplicity, we consider a Lasso penalty here. Finally, Δ is estimated by solving the following optimization problem:
| (1) |
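For intuition, a sketch of one plausible form for this objective is given below, assuming the lasso penalized D-trace loss of Yuan et al. (2015) with generic sample correlation matrices written here as $\widehat{R}^{X}$ and $\widehat{R}^{Y}$; the notation is illustrative and need not match the article's display (1) exactly.

```latex
% Sketch only: a lasso penalized D-trace objective in the spirit of
% Yuan et al. (2015), with generic sample correlation matrices
% \widehat{R}^{X} and \widehat{R}^{Y} for the two groups.
\widehat{\Delta}
  = \arg\min_{\Delta}\;
    \frac{1}{4}\Bigl(
        \bigl\langle \widehat{R}^{X}\Delta,\ \Delta\widehat{R}^{Y} \bigr\rangle
      + \bigl\langle \widehat{R}^{Y}\Delta,\ \Delta\widehat{R}^{X} \bigr\rangle
    \Bigr)
    - \bigl\langle \Delta,\ \widehat{R}^{X} - \widehat{R}^{Y} \bigr\rangle
    + \lambda \sum_{j,k} \lvert \Delta_{jk} \rvert,
  \qquad
  \langle A, B \rangle = \operatorname{Tr}(A^{\top} B).
```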
where λ is a tuning parameter and , are sample correlation matrices defined as
| (2) |
where for a square matrix C, . Note that the sample correlation matrices in (2) are based on the matrix-valued observations. The convergence rates of these estimators were established by Zhou (2014) in the high-dimensional regime, which facilitates our subsequent theoretical analysis. The optimization problem in (1) can be solved by the alternating direction method of multipliers (ADMM) algorithm. To present the detailed procedure, we first introduce two multivariate functions G(·,·,·,·) and H(·,·). For symmetric p×p matrices A and B, let and be the corresponding eigenvalue decompositions, respectively, where and . Define function G(·,·,·,·) as
where D = (Dij) with and ○ denotes the Hadamard (entrywise) product of two matrices. For a positive λ, define a map H(·,·) as follows:
In detail, we summarize the procedure in Algorithm 1. Note that auxiliary matrices Δ1, Δ2, Δ3 and Λ1, Λ2, Λ3 are involved in Algorithm 1; the details can be found in Web Appendix S2.
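As a rough illustration of two ingredients behind Algorithm 1, the sketch below constructs pooled sample spatial correlation matrices from matrix-valued observations (in the spirit of Zhou (2014), up to scaling conventions) and an entrywise soft-thresholding operator of the kind that typically plays the role of H(·, λ) in ADMM treatments of lasso-penalized problems; both are illustrative rather than the article's exact definitions.

```r
## Illustrative helpers (not the article's exact definitions).

## Pooled (Gemini-style) sample spatial correlation from a list of p x q
## observations: average X X^T over samples and time points, then rescale
## to a correlation matrix. Scaling conventions may differ from the article.
sample_spatial_cor <- function(X_list) {
  n <- length(X_list); q <- ncol(X_list[[1]])
  S <- Reduce("+", lapply(X_list, function(X) X %*% t(X))) / (n * q)
  cov2cor(S)                                   # correlation version of the pooled estimate
}

## Entrywise soft-thresholding, the usual proximal map of the lasso penalty;
## assumed here to play the role of H(., lambda) in the ADMM updates.
soft_threshold <- function(A, lambda) {
  sign(A) * pmax(abs(A) - lambda, 0)
}
```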

In the simulation study and real data analysis, we set the step size ρ = 1 and terminate the algorithm if, at the (k + 1)-th step, we have
The tuning parameter λ is selected by minimizing the Bayesian information criterion, i.e., λ is chosen to minimize (n1 + n2)L(λ) + log(n1 + n2)∥Δ∥0, where L(λ) represents the loss function based on either the matrix L1 norm or the Frobenius (LF) norm.
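In code, this selection rule amounts to evaluating the solution path over a grid of λ values and keeping the minimizer of the criterion. The sketch below assumes hypothetical helpers mvdn_fit() and dtrace_loss(), which are placeholders and not the interface of the released MVDN package.

```r
## Illustrative BIC selection over a grid of tuning parameters.
## mvdn_fit() and dtrace_loss() are hypothetical placeholders.
select_lambda <- function(lambdas, X_list, Y_list) {
  n1 <- length(X_list); n2 <- length(Y_list)
  bic <- sapply(lambdas, function(lambda) {
    Delta_hat <- mvdn_fit(X_list, Y_list, lambda)         # hypothetical solver call
    loss <- dtrace_loss(Delta_hat, X_list, Y_list)        # L(lambda), e.g. the matrix L1 norm version
    (n1 + n2) * loss + log(n1 + n2) * sum(Delta_hat != 0) # BIC as stated above
  })
  lambdas[which.min(bic)]
}
```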
2.3. Inference
Under certain regularity conditions (A1), (A2), and (A3), stated in Web Appendix S1, we have the following theorem.
Theorem 1:
Assume that Assumptions (A1), (A2) and (A3) hold and min(n1, n2) > log(p ∨ q)/q. Further assume that
| (3) |
for some constant c > 0. Then with the tuning parameter properly chosen, we have with probability tending to 1 as (n1, n2, p, q) → ∞, where and and sgn(x) = −I(x < 0) + I(x > 0).
The detailed proof is given in Web Appendix S1. Theorem 1 shows that the proposed estimator in (1) shares exactly the same support as the true differential partial correlation matrix Δ with probability tending to 1 under mild regularity conditions, indicating that the proposed method can identify all differential edges accurately as the dimensions p, q of the matrix-valued data and the sample size n go to infinity. In fact, as long as the minimum signal strength condition in (3) is satisfied, the estimator recovers not only the support of Δ but also the signs of its nonzero entries. For more details, please refer to Web Appendix S1.
3. Simulation studies
3.1. Data Generation Settings
In this subsection, we introduce the simulation designs for evaluating the performance of the proposed matrix-variate differential network estimation method. For the temporal covariance matrices, we consider the following two types of structures. Type 1 is an autoregressive model, with elements and with elements , 1 ⩽ i, i′ ⩽ q. Type 2 is a moving average model, with nonzero elements for |i − i′| ⩽ 3 and with nonzero elements for |i − i′| ⩽ 4. Note that both temporal covariance matrices are nuisance parameters. We mainly focus on the spatial graphs of the two groups, encoded in and respectively. We first generate the edge set for group X. To this end, we consider the following three graph structures: a hub graph, a scale-free graph, and a small-world graph, as shown in Figure 2.
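In code, the two temporal covariance types can be constructed as in the sketch below; the decay and band parameters are illustrative placeholders, not the exact values used in the article.

```r
## Illustrative construction of the two temporal covariance types; the numeric
## parameters are placeholders, not necessarily the article's values.
ar_cov <- function(q, rho = 0.5) {
  rho ^ abs(outer(1:q, 1:q, "-"))                 # Type 1: autoregressive decay in |i - i'|
}
ma_cov <- function(q, bandwidth = 3, value = 0.15) {
  D <- abs(outer(1:q, 1:q, "-"))
  S <- ifelse(D > 0 & D <= bandwidth, value, 0)   # Type 2: banded (moving-average-type) structure
  diag(S) <- 1
  S
}
```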
Figure 2.

Three types of graph used in our simulation studies: (A) Hub graph; (B) Scale-free graph; (C) Small-world graph. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.
Scenario 1: Hub graph We partition the p features into 5 equally-sized, non-overlapping sets: C1 ∪ C2 ∪ ⋯ ∪ C5 = {1,…, p}, |Cs| = p/5, Cs ∩ Ct = ∅, s, t = 1,…, 5. For the smallest i ∈ Cs, we set for all {j ≠ i : j ∈ Cs}.
Scenario 2: Scale-free graph Initially only one edge exists in . Then the graph grows such that each new node is connected to one existing node with probability proportional to the degree of that node. In practice, we use the R package “huge” to generate such a graph structure.
Scenario 3: Small-world graph The graph is generated by the R package “rags2ridges” with 5 starting neighbors and 5% probability of rewiring (for further detail, one may refer to Wieringen and Peeters (2016)).
The non-zero entries of are then determined by the edge set . The value of each nonzero entry of was generated from a uniform distribution with support [−0.5, −0.1] ∪ [0.1, 0.5]. To ensure positive definiteness of , let . We then proceed to generate the differential matrix Δ0. For Scenario 1, we randomly select two hub nodes from the 5 equally-sized, non-overlapping sets, and the differential matrix Δ0 is generated such that the connections of these two hub nodes change sign between and . For Scenarios 2 and 3, we randomly select 40% of the edges in , and the differential matrix Δ0 is generated such that the corresponding connections change sign between and . The covariance matrices and are generated by and , respectively. Finally, we generate n1 i.i.d. observations of X and n2 i.i.d. observations of Y from the corresponding matrix normal distributions.
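A condensed sketch of this generation scheme for Scenario 2 is given below; it uses huge::huge.generator for the scale-free skeleton and reuses the rmatnorm and ar_cov helpers sketched above, with illustrative constants (sampling ranges, diagonal boost, AR decay) that are not necessarily the article's exact choices.

```r
## Illustrative data generation for Scenario 2 (scale-free spatial graph);
## the constants below are placeholders, not the article's exact values.
library(huge)
set.seed(1)
p <- 50; q <- 50; n1 <- n2 <- 20

## 1. Scale-free edge set for group X (the article uses the "huge" package).
skeleton <- huge.generator(n = 100, d = p, graph = "scale-free")
A <- as.matrix(skeleton$theta)                       # 0/1 adjacency of the spatial graph

## 2. Spatial precision matrix of group X: nonzero entries uniform on +/-[0.1, 0.5],
##    with a diagonal boost to ensure positive definiteness.
ut <- which(upper.tri(A) & A == 1)                   # linear indices of upper-triangular edges
Omega_X <- matrix(0, p, p)
Omega_X[ut] <- runif(length(ut), 0.1, 0.5) * sample(c(-1, 1), length(ut), replace = TRUE)
Omega_X <- Omega_X + t(Omega_X)
diag(Omega_X) <- abs(min(eigen(Omega_X, only.values = TRUE)$values)) + 0.5

## 3. Group Y: flip the sign of a random 40% of the edges; these flipped edges
##    form the support of the differential matrix. (Positive definiteness of
##    Omega_Y should be re-checked in practice.)
flip <- sample(ut, round(0.4 * length(ut)))
Omega_Y <- Omega_X
Omega_Y[flip] <- -Omega_X[flip]
Omega_Y[lower.tri(Omega_Y)] <- t(Omega_Y)[lower.tri(Omega_Y)]
Delta0 <- Omega_Y - Omega_X                          # precision-scale differential matrix

## 4. Sample matrix-normal observations with an AR temporal covariance,
##    reusing ar_cov() and rmatnorm() from the earlier sketches.
Sigma_T <- ar_cov(q, rho = 0.5)
X_list <- replicate(n1, rmatnorm(solve(Omega_X), Sigma_T), simplify = FALSE)
Y_list <- replicate(n2, rmatnorm(solve(Omega_Y), Sigma_T), simplify = FALSE)
```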
We examine a range of spatial and temporal dimensions and the sample sizes: p = {50, 100, 300}, q = {50, 100} and n1 = n2 = {20, 50}, which are consistent with the usual setup in the functional connectivity analysis. All the simulation results are based on 100 replications.
We evaluate the performance of the estimation methods from the viewpoint of support recovery. The support recovery results are evaluated by the true positive rate (TPR), true discovery rate (TDR), and true negative rate (TNR) over a range of the tuning parameter λ. Suppose that the true difference matrix Δ has the support and its estimator has the support set . TPR, TDR, and TNR are defined as follows:
where TP and TN are the numbers of true positives and true negatives, respectively.
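Assuming the usual support-recovery conventions (TP and TN count correctly recovered nonzero and zero entries of Δ, respectively), these metrics can be computed as in the following sketch.

```r
## Illustrative support-recovery metrics for an estimate Delta_hat of the
## true differential matrix Delta, under the usual conventions.
support_metrics <- function(Delta_hat, Delta, tol = 1e-8) {
  est  <- abs(Delta_hat) > tol
  true <- abs(Delta) > tol
  TP <- sum(est & true)
  TN <- sum(!est & !true)
  c(TPR = TP / sum(true),          # recovered fraction of true differential edges
    TDR = TP / max(sum(est), 1),   # fraction of selected edges that are true
    TNR = TN / sum(!true))         # recovered fraction of true zeros
}
```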
We compare the proposed MVDN method with the joint multiple matrix Gaussian graphs estimation method in Zhu and Li (2018). Note that in Zhu and Li (2018), non-convex and convex penalties are proposed to jointly estimate matrix graphs and we simply denote the corresponding methods as Non-convex and Convex. Moreover, Zhu and Li (2018) showed the advantage of directly working with the matrix data rather than the vector-valued data after whitening in their simulation study, so there is no need to compare the matrix-variate methods with vector-valued methods such as Zhao et al. (2014), Cai et al. (2016) and He et al. (2017). Following the suggestion of one reviewer, we also compare MVDN with two well-known vector-valued methods with whitening, CLIME (Cai et al., 2011) and GLASSO (Friedman et al., 2008). Note that for CLIME and GLASSO, we separately estimate the precision matrix for each group and take the difference of the estimated precision matrices as an estimate of the differential network.
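For reference, the "estimate separately, then subtract" baseline can be sketched with the glasso package as below; the whitening step and tuning-parameter selection used in the actual comparison are omitted, the penalty level shown is an arbitrary placeholder, and the sketch reuses sample_spatial_cor and the simulated X_list, Y_list from the earlier sketches.

```r
## Schematic of the "estimate separately, then subtract" baseline (GLASSO);
## whitening of the temporal dependence and tuning are omitted here.
library(glasso)
S_X <- sample_spatial_cor(X_list)          # pooled spatial correlation, group X
S_Y <- sample_spatial_cor(Y_list)          # pooled spatial correlation, group Y
Omega_X_hat <- glasso(S_X, rho = 0.1)$wi   # separately estimated precision matrices
Omega_Y_hat <- glasso(S_Y, rho = 0.1)$wi
Delta_naive <- Omega_Y_hat - Omega_X_hat   # naive differential network estimate
```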
3.2. Evaluation of computational complexity
We carry out simulations for Scenario 1 to compare the computing speed of the different methods with n = {20, 50}, q = {50, 100} and p = {50, 100, 200, 300}. Computing time (in seconds) is measured on a single core of an Intel(R) Xeon(R) CPU E5-2697 v3 at 2.60 GHz with 128 GB of memory and is reported in Table 1.
Table 1.
Average time (in seconds) over 20 replications.
| N = 20 | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| q = 50 | q = 100 | |||||||||
| p | MVDN | Non-convex | Convex | GLASSO | CLIME | MVDN | Non-convex | Convex | GLASSO | CLIME |
| 50 | 0.31 | 38.35 | 38.95 | 3.13 | 13.48 | 0.32 | 32.08 | 30.60 | 13.78 | 24.90 |
| 100 | 2.26 | 240.38 | 242.77 | 8.34 | 115.31 | 2.06 | 182.99 | 168.91 | 29.65 | 140.44 |
| 200 | 18.71 | 2343.59 | 2362.52 | 36.14 | 1660.23 | 16.01 | 1488.51 | 1497.20 | 82.99 | 1679.67 |
| 300 | 72.33 | 10964.37 | 10996.85 | 132.60 | 10431.46 | 57.71 | 5509.32 | 5511.80 | 222.80 | 10068.62 |
| N = 50 | ||||||||||
| q = 50 | q = 100 | |||||||||
| p | MVDN | Non-convex | Convex | GLASSO | CLIME | MVDN | Non-convex | Convex | GLASSO | CLIME |
| 50 | 0.30 | 33.20 | 30.68 | 3.21 | 12.36 | 0.31 | 30.50 | 28.51 | 13.71 | 23.64 |
| 100 | 2.01 | 168.84 | 169.63 | 7.13 | 99.17 | 2.01 | 154.63 | 155.83 | 28.68 | 124.01 |
| 200 | 15.24 | 1277.72 | 1275.26 | 21.16 | 1217.06 | 14.16 | 1117.74 | 1167.39 | 64.73 | 1326.04 |
| 300 | 54.14 | 4653.04 | 4639.22 | 53.10 | 8372.24 | 49.15 | 3312.87 | 3320.16 | 122.91 | 7565.68 |
3.3. Simulation Results
To evaluate the support recovery performance of the methods, we plot the ROC curve and the Precision-Recall (PR) curve for various scenarios. Figure 3 depicts the ROC and PR curves for Scenario 1 (Hub graph) with p = 50, 100, 300 and the autoregressive temporal covariance structure, in which the solid line represents MVDN, and the dotted line and dashed line represent the Convex and Non-convex methods in Zhu and Li (2018), respectively. The top panels in Figure 3 are the ROC curves for p = 50, 100, 300, demonstrating that MVDN performs uniformly better than the Convex and Non-convex methods. Also, the Convex and Non-convex methods perform nearly the same, and both work much better than the CLIME and GLASSO methods. The advantage of the MVDN method is clearly shown by the Precision-Recall curves in the bottom panels of Figure 3. When the true positive rate (TPR) does not exceed 80%, the true discovery rate (TDR) of the MVDN method is always 100%, while those of the Convex method, Non-convex method, CLIME, and GLASSO do not exceed 50%. Figure 3 also shows that as the dimension p increases, the performance of all methods deteriorates, which is expected as the graph becomes large. It is noteworthy that the MVDN method generally performs robustly as p varies. Similar conclusions can be drawn from Figure 4 and Figure 5, which correspond to Scenario 2 (Scale-free graph) and Scenario 3 (Small-world graph), respectively. From Figure 3 to Figure 5, we can see that all methods perform similarly across the different graph structures and that the performance of our MVDN method is not sensitive to the tuning parameter λ. To save space, we put all other simulation results in Web Appendix S3, where we can also see that, for the same scenarios, as the sample sizes n1, n2 increase (with p, q fixed), the performance of all methods improves. We also design a simulation setting in which the data exhibit properties of resting-state fMRI data and do not follow the matrix-normal distribution. The results indicate that the MVDN method is not sensitive to the matrix-normal assumption; the detailed results are in Web Appendix S3. Furthermore, with n, p fixed, as q increases, the performance of all methods improves as we collect more temporal data. Table 1 shows the average time (in seconds) for the different methods, indicating that the MVDN method is computationally more efficient than its competitors.
Figure 3.

ROC curve (upper panels) and Precision-Recall curve (lower panels) for Scenario 1 with p = 50, 100, 300 and autoregressive temporal covariance structure. Solid, dotted, dashed, dot-and-dash, and long dashed lines represent MVDN, the convex method, the non-convex method, GLASSO and CLIME respectively. The sample size n1 = n2 = 20 and q = 50. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.
Figure 4.

ROC curve (upper panels) and Precision-Recall curve (lower panels) for Scenario 2 with p = 50, 100, 300 and autoregressive temporal covariance structure. Solid, dotted, dashed, dot-and-dash, and long dashed lines represent MVDN, the convex method, the non-convex method, GLASSO and CLIME respectively. The sample size n1 = n2 = 20 and q = 50. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.
Figure 5.

ROC curve (upper panels) and Precision-Recall curve (lower panels) for Scenario 3 with p = 50, 100, 300 and autoregressive temporal covariance structure. Solid lines represent MVDN, dotted lines represent the convex method, dashed lines represent the non-convex method, dot-and-dash lines represent GLASSO, and long dashed lines represent CLIME. The sample size n1 = n2 = 20 and q = 50. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.
In summary, the simulation results show that the MVDN method outperforms its competitors in all of the scenarios, illustrating its advantage. The joint estimation methods result in unreliable conclusions, especially when hub nodes exist, as discussed in the Introduction. In addition, the simulation results further show that the non-convex and convex methods outperform state-of-the-art vector-normal-based methods such as CLIME and GLASSO, which is consistent with the conclusion in Zhu and Li (2018) and further indicates the advantage of the MVDN method over vector-normal-based methods.
4. Application to Attention Deficit Hyperactivity Disorder (ADHD) data
Attention deficit hyperactivity disorder (ADHD), characterized by difficulty in paying attention, excessive activity, or difficulty in controlling behavior, is one of the most commonly diagnosed child-onset neurodevelopmental disorders. The prevalence of ADHD is estimated to be 5–10% worldwide, and the annual cost is estimated to range from $143 billion to $266 billion (Doshi et al., 2012). Despite being the most commonly studied and diagnosed mental disorder in children and adolescents, the exact cause of ADHD is unknown in the majority of cases.
In this study, we analyze a dataset from the ADHD-200 Global Competition. The dataset includes demographic information and resting-state fMRI of nearly one thousand children and adolescents, including both combined-type ADHD and typically developing controls (TDC). The data were collected from eight participating sites. We focus our analysis on the fMRI data from the New York University site only to avoid potential site bias, following the same strategy as in Ahn et al. (2015). The data that we analyze are the preprocessed version using the Athena pipeline (Bellec et al., 2017); see the ADHD-200 dataset website (2019). All fMRI scans were preprocessed by slice timing correction, motion correction, spatial smoothing, and denoising by regressing out motion parameters as well as white matter and cerebrospinal fluid time courses. Each brain image was parcellated into 116 regions of interest (ROIs) using the Anatomical Automatic Labeling (AAL) atlas. Following the strategy in Xia and Li (2019), we only use scans that passed the quality control, as indicated in the phenotypic data. Furthermore, if both scans of a subject passed the quality control, we choose the first scan; if neither scan passed, we exclude the subject from further analysis. We also remove subjects with missing diagnostic status or missing scans, which finally results in a dataset consisting of 96 combined-type ADHD subjects and 91 TDC subjects. The time series of voxels within the same ROI were then averaged, resulting in a spatial-temporal data matrix for each subject, with spatial dimension p = 116 and temporal dimension q = 172.
In Figure 6, we show the differential edges identified by the different methods; orange edges (this figure appears in color in the electronic version of this article, and any mention of color refers to that version) show an increase in partial correlation dependency from the TDC group to the ADHD group, and grey edges show a decrease. The tuning parameter λ for the MVDN method is selected by minimizing the Bayesian information criterion, i.e., λ is chosen to minimize (n1 + n2)L(λ) + log(n1 + n2)∥Δ∥0, where L(λ) represents the loss function based on the L1 norm introduced in Section 2. For the Non-convex and Convex methods, λ is selected by minimizing a prediction criterion using five-fold cross-validation, as described in Zhu and Li (2018). In Figure 6, (A)–(C) show the differential edges identified by MVDN from the axial, coronal, and sagittal views, respectively; (D)–(F) show the differential edges identified by the non-convex method from the axial, coronal, and sagittal views, respectively; and (G)–(I) show the differential edges identified by the convex method from the axial, coronal, and sagittal views, respectively. The detailed information on the 22 differential edges selected by the MVDN method is presented in Web Appendix S4. The convex and non-convex methods select so many differential edges (3001 edges for non-convex and 4744 edges for convex) that we only show the top 10% of edges with the largest absolute values in Figure 6, i.e., 300 edges for the non-convex method and 474 edges for the convex method.
Figure 6.

Differential edges and the associated brain regions identified by various procedures for the ADHD resting-state fMRI data. (A)–(C) correspond to the MVDN method, (D)–(F) correspond to the non-convex method, and (G)–(I) correspond to the convex method. Orange edges show an increase in partial correlation dependency from the TDC group to the ADHD group; grey edges show a decrease. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.
As we can see from Figure 6, the convex and non-convex methods select such dense sets of differential edges that it is difficult to draw any useful conclusions. For the MVDN method, from (A)–(C) in Figure 6, the affected brain regions in ADHD are mainly located in the cerebellum, paracentral lobule, inferior temporal gyrus, inferior occipital gyrus, and the orbital part of the middle frontal gyrus. The cerebellum is recognized as an important structure in cognition and in ADHD pathophysiology (Buckner, 2013); it is also found to be structurally connected with prefrontal and striatal circuits that are implicated in ADHD (Bostan et al., 2013). Moreover, growing evidence from structural neuroimaging studies has revealed that reduced volumes of the cerebellum or its subregions in ADHD correlate with attentional problems and clinical outcomes (Makris et al., 2013; Stoodley, 2014). Additionally, fMRI studies have revealed decreased cerebellar activation in ADHD during the performance of a number of cognitive tasks (Valera et al., 2005; Suskauer et al., 2008; Valera et al., 2010). Kucyi et al. (2015) also showed that in ADHD the cerebellar default mode network exhibits disrupted resting-state functional connectivity with several brain networks spanning sensory and association areas of the cerebral cortex. Another critical brain region detected by our MVDN method is the paracentral lobule, which is located on the medial surface of the cerebral hemisphere and includes parts of both the frontal and parietal lobes. The paracentral lobule controls motor and sensory innervation of the contralateral lower extremity and plays an important role in functional brain networks. Dickstein et al. (2006) adopted a meta-analysis approach and found that the right paracentral lobule has a greater probability of activation in patients with ADHD than in controls. The inferior temporal gyrus displays significant differences in resting-state functional connectivity with the anterior insula between adults with ADHD and healthy controls (Zhao et al., 2017). Lei et al. (2015) integrated several studies and concluded that there is significantly increased activation in the right inferior occipital gyrus during response inhibition in ADHD. In addition, the inferior occipital gyrus is also a potential target region for therapeutic interventions in ADHD: Shang et al. (2016) found that inattention improvement is correlated with increased intrinsic brain activity in the inferior occipital gyrus after treatment with atomoxetine. Finally, our MVDN method identifies the orbital part of the middle frontal gyrus as a crucial region related to ADHD. In fact, evidence from multiple modalities, including neurocognitive, neuropharmacological, and neuroimaging studies, has shown that the middle frontal gyrus region is impaired in pediatric and adult ADHD (Tafazoli et al., 2013).
We also select the tuning parameters for the convex and non-convex methods such that the numbers of differential edges detected by these two methods are comparable with that of the MVDN method. The results show that the edges selected by the convex and non-convex methods are entirely different from those selected by MVDN. The detailed information on the differential edges selected by the convex and non-convex methods is presented in Web Appendix S5.
In summary, our findings are in general consistent with evidence from a wide range of neuroscience and clinical studies. The proposed matrix-variate differential network framework can effectively capture dynamic brain connectivity alterations and provide insights into the neural network mechanisms of ADHD. The findings may motivate future investigations to explore the mechanisms of disrupted functional connectivity in ADHD and other neurological diseases, as well as to discover potential target regions for therapeutic interventions.
5. Discussion
In this paper, we established a matrix-variate differential network model assuming that the matrix-variate data follow a matrix-normal distribution, and proposed to exploit the D-trace loss function and a Lasso-type penalty to estimate the spatial differential partial correlation matrix directly. Theoretically, we showed that the proposed method can identify all differential edges accurately with probability tending to 1 in the high-dimensional setting under mild regularity conditions. Simulation studies demonstrated that the matrix-variate differential network model enjoys great advantages over existing methods. The proposed matrix-variate differential network model is very flexible and may provide a deeper understanding of the brain connectivity alteration mechanism.
Our work could be extended in the following aspects. First, in this paper, we exploit the D-trace loss function and a Lasso-type penalty to estimate the spatial differential partial correlation matrix. We could instead consider the constrained ℓ1 minimization approach as in Zhao et al. (2014). The theoretical properties of the resulting estimator would require investigation, but we expect that the empirical performance should be similar. Second, the matrix-normal distribution assumption for the brain connectivity data could be somewhat restrictive. Ning and Liu (2013) proposed a semiparametric extension of the matrix-normal distribution, termed the “matrix-nonparanormal” distribution. We could assume that the brain connectivity data follow the matrix-nonparanormal distribution and establish the corresponding differential network model. This extension is more challenging because there is a diverging number of unknown transformation functions. Another interesting direction would be using non-parametric techniques to assess the statistical significance of the identified edges. We briefly investigate this in Web Appendix S3, and more work is warranted. It would also be interesting to compare with multiple testing procedures such as the Network Based Statistic (NBS) method of Zalesky et al. (2010) in the future.
Supplementary Material
Acknowledgements
This work was supported by grants from the National Natural Science Foundation of China [grant numbers 81803336, 11801316]; the Natural Science Foundation of Shandong Province [grant numbers ZR2018BH033, ZR2019QA002]; the National Institute on Aging of the National Institutes of Health [grant number R01AG057555 to L.X.]; and the National Center for Advancing Translational Sciences [grant number UL1TR002345 to L.L.]. We would like to thank Professor Yunzhang Zhu for providing the simulation code for the Non-convex and Convex methods. We would also like to thank the Editor, the Associate Editor, and the anonymous reviewers for their constructive comments, which led to a major improvement of this article.
Footnotes
Supporting Information
Web Appendices, referenced in Sections 1–5, are available with this paper at the Biometrics website on Wiley Online Library. Code and an example are also available. The R package “MVDN” for implementing the methods is available at https://github.com/jijiadong/MVDN.
References
- Ahn M, Shen H, Lin W, et al. (2015). A sparse reduced rank framework for group analysis of functional neuroimaging data. Statistica Sinica 25, 295–312.
- Bellec P, Chu C, Chouinard-Decorte F, et al. (2017). The neuro bureau adhd-200 preprocessed repository. Neuroimage 144, 275.
- Bostan AC, Dum RP, and Strick PL (2013). Cerebellar networks with the cerebral cortex and basal ganglia. Trends in Cognitive Sciences 17, 241–254.
- Buckner R (2013). The cerebellum and cognitive function: 25 years of insight from anatomy and neuroimaging. Neuron 80, 807–815.
- Cai T, Li H, Liu W, and Xie J (2016). Joint estimation of multiple high-dimensional precision matrices. Statistica Sinica 26, 445–464.
- Cai T, Liu W, and Luo X (2011). A constrained ℓ1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association 106, 594–607.
- Daliri MR and Behroozi M (2013). Advantages and disadvantages of resting state functional connectivity magnetic resonance imaging for clinical applications. Omics Journal of Radiology 3, 1.
- Dickstein SG, Bannon K, Castellanos FX, et al. (2006). The neural correlates of attention deficit hyperactivity disorder: an ale meta-analysis. Journal of Child Psychology & Psychiatry 47, 1051–1062.
- Doshi JA, Hodgkins P, Kahle J, Sikirica V, Cangelosi MJ, Setyawan J, Erder MH, and Neumann PJ (2012). Economic impact of childhood and adult attention-deficit/hyperactivity disorder in the United States. Journal of the American Academy of Child & Adolescent Psychiatry 51, 990–1002.e2.
- Fox MD and Greicius M (2010). Clinical applications of resting state functional connectivity. Frontiers in Systems Neuroscience 4, 19.
- Friedman J, Hastie T, and Tibshirani R (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441.
- Grimes T, Potter SS, and Datta S (2019). Integrating gene regulatory pathways into differential network analysis of gene expression data. Scientific Reports 9, 5479.
- He Y, Ji J, Xie L, et al. (2018). A new insight into underlying disease mechanism through semi-parametric latent differential network model. BMC Bioinformatics 19, 493.
- He Y, Zhang X, Ji J, et al. (2017). Joint estimation of multiple high-dimensional gaussian copula graphical models. Australian & New Zealand Journal of Statistics 59, 289–310.
- Ji J, He D, Feng Y, et al. (2017). JDINAC: joint density-based non-parametric differential interaction network analysis and classification using high-dimensional sparse omics data. Bioinformatics 33, 3080–3087.
- Ji J, Yuan Z, Zhang X, et al. (2016). A powerful score-based statistical test for group difference in weighted biological networks. BMC Bioinformatics 17, 86.
- Kim J and Pan W (2015). Highly adaptive tests for group differences in brain functional connectivity. NeuroImage: Clinical 9, 625–639.
- Kucyi A, Hove MJ, Biederman J, et al. (2015). Disrupted functional connectivity of cerebellar default network areas in attention-deficit/hyperactivity disorder. Human Brain Mapping 36, 3373–3386.
- Lei D, Du M, Wu M, et al. (2015). Functional mri reveals different response inhibition between adults and children with adhd. Neuropsychology 29, 874.
- Leng C and Tang CY (2012). Sparse matrix graphical models. Journal of the American Statistical Association 107, 1187–1200.
- Makris N, Liang L, Biederman J, et al. (2013). Toward defining the neural substrates of adhd: A controlled structural mri study in medication-naive adults. Journal of Attention Disorders 19, 944–953.
- Ning Y and Liu H (2013). High-dimensional semiparametric bigraphical models. Biometrika 100, 655–670.
- Pan W, Kim J, Zhang Y, Shen X, and Wei P (2014). A powerful and adaptive association test for rare variants. Genetics 197, 1081–1095.
- Peng J, Wang P, Zhou N, et al. (2009). Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association 104, 735–746.
- Shang CY, Yan CG, Lin HY, et al. (2016). Differential effects of methylphenidate and atomoxetine on intrinsic brain activity in children with attention deficit hyperactivity disorder. Psychological Medicine 46, 3173–3185.
- Stoodley CJ (2014). Distinct regions of the cerebellum show gray matter decreases in autism, adhd, and developmental dyslexia. Frontiers in Systems Neuroscience 8, 92.
- Suskauer SJ, Fotedar SS, Blankner JG, et al. (2008). Functional magnetic resonance imaging evidence for abnormalities in response selection in attention deficit hyperactivity disorder: Differences in activation associated with response inhibition but not habitual motor response. Journal of Cognitive Neuroscience 20, 478–493.
- Tafazoli S, O’Neill J, Bejjani A, et al. (2013). 1H mrsi of middle frontal gyrus in pediatric adhd. Journal of Psychiatric Research 47, 505–512.
- the ADHD-200 dataset website (2019). http://neurobureau.projects.nitrc.org/ADHD200/Data.html. (Accessed March 7, 2019).
- Tian D, Gu Q, and Ma J (2016). Identifying gene regulatory network rewiring using latent differential graphical models. Nucleic Acids Research 44, e140.
- Valera EM, Faraone SV, Biederman J, et al. (2005). Functional neuroanatomy of working memory in adults with attention-deficit/hyperactivity disorder. Biological Psychiatry 57, 439–447.
- Valera EM, Spencer RMC, Zeffiro TA, et al. (2010). Neural substrates of impaired sensorimotor timing in adult attention-deficit/hyperactivity disorder. Biological Psychiatry 68, 359–367.
- Varoquaux G and Craddock RC (2013). Learning and comparing functional connectomes across subjects. Neuroimage 80, 405–415.
- Wieringen WNV and Peeters CFW (2016). Ridge estimation of inverse covariance matrices from high-dimensional data. Computational Statistics & Data Analysis 103, 284–303.
- Xia Y, Cai T, and Cai TT (2015). Testing differential networks with applications to detecting gene-by-gene interactions. Biometrika 102, 247–266.
- Xia Y and Li L (2017). Hypothesis testing of matrix graph model with application to brain connectivity analysis. Biometrics 73, 780–791.
- Xia Y and Li L (2019). Matrix graph hypothesis testing and application in brain connectivity alternation detection. Statistica Sinica 29, 303–328.
- Yin J and Li H (2012). Model selection and estimation in the matrix normal graphical model. Journal of Multivariate Analysis 107, 119.
- Yuan H, Xi R, and Deng M (2015). Differential network analysis via the lasso penalized D-trace loss. Biometrika 104, 755–770.
- Yuan Z, Ji J, Zhang X, et al. (2016). A powerful weighted statistic for detecting group differences of directed biological networks. Scientific Reports 6, 34159.
- Zalesky A, Fornito A, and Bullmore ET (2010). Network-based statistic: Identifying differences in brain networks. Neuroimage 53, 1197–1207.
- Zhang XF, Ou-Yang L, and Yan H (2017). Incorporating prior information into differential network analysis using nonparanormal graphical models. Bioinformatics 33, 2436.
- Zhao Q, Hui L, Yu X, et al. (2017). Abnormal resting-state functional connectivity of insular subregions and disrupted correlation with working memory in adults with attention deficit/hyperactivity disorder. Frontiers in Psychiatry 8, 200.
- Zhao SD, Cai TT, and Li H (2014). Direct estimation of differential networks. Biometrika 101, 253–268.
- Zhou S (2014). Gemini: Graph estimation with matrix variate normal instances. Annals of Statistics 42, 532–562.
- Zhu Y and Li L (2018). Multiple matrix gaussian graphs estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 80, 927–950.