Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2022 Jul 1.
Published in final edited form as: Electron J Stat. 2021 Sep 14;15(2):4192–4235. doi: 10.1214/21-ejs1887

Principal regression for high dimensional covariance matrices

Yi Zhao 1, Brian Caffo 2, Xi Luo 3; Alzheimer’s Disease Neuroimaging Initiative
PMCID: PMC9248851  NIHMSID: NIHMS1765239  PMID: 35782590

Abstract

This manuscript presents an approach to perform generalized linear regression with multiple high dimensional covariance matrices as the outcome. In many areas of study, such as resting-state functional magnetic resonance imaging (fMRI) studies, this type of regression can be utilized to characterize variation in the covariance matrices across units. Model parameters are estimated by maximizing a likelihood formulation of a generalized linear model, conditioning on a well-conditioned linear shrinkage estimator for multiple covariance matrices, where the shrinkage coefficients are proposed to be shared across matrices. Theoretical studies demonstrate that the proposed covariance matrix estimator is optimal achieving the uniformly minimum quadratic loss asymptotically among all linear combinations of the identity matrix and the sample covariance matrix. Under certain regularity conditions, the proposed estimator of the model parameters is consistent. The superior performance of the proposed approach over existing methods is illustrated through simulation studies. Implemented to a resting-state fMRI study acquired from the Alzheimer’s Disease Neuroimaging Initiative, the proposed approach identified a brain network within which functional connectivity is significantly associated with Apolipoprotein E ε4, a strong genetic marker for Alzheimer’s disease.

Keywords: Covariance matrix estimation, generalized linear regression, heteroscedasticity, shrinkage estimator

MSC2020 subject classifications: Primary 62J99, secondary 62H99

1. Introduction

In this manuscript, we study a regression problem with covariance matrices as the outcome under a high dimensional setting. Suppose yitp is a p-dimensional random vector, which is the tth acquisition from subject i, for t = 1, …, Ti and i = 1, …, n, where Ti is the number of observations of subject i and n is the number of subjects. Let Tmax = maxi Ti. The term “high dimensionality” refers to the scenario when Tmaxp and p increases to infinity. The data, yit, are assumed to follow a normal distribution with covariance matrix Σi. Here, without loss of generality, it is assumed that the distribution mean is zero as the study interest focuses on the covariance matrices. Let xiq denote the q-dimensional covariates of interest acquired from subject i. For the covariance matrices, we assume the following regression model. For i = 1, …, n, the data heteroscedasticity satisfies the following generalized linear regression model with a logarithmic link function,

log(γΣiγ)=xiβ, (1.1)

where γp is a linear projection and βq is the model coefficient. In xi, the first element is set to one to include the intercept term. Using a logarithmic link function, it is guaranteed that Σi’s are positive semi-definite. The goal is to estimate γ and β using the observed data {(yi1,,yiTi),xi}i=1n. In Model (1.1), γ is an unknown linear projection to be estimated such that the characteristic of the covariance matrices can be best captured by the covariates of interest.

One application of such a regression problem is to analyze covariate associated variations in brain coactivation in a functional magnetic resonance imaging (fMRI) study, where covariance/correlation matrices of the fMRI signals are generally utilized to reveal the coactivation patterns. Characterizing these patterns with population/individual covariates is of great interest in neuroimaging studies [34, 41]. Another example is the study of financial equities data. Considering a pool of stock values, covariance matrices over a period of time capture the comovement or synchronicity of the stocks. Firm and market-level information, such as industry type, firm’s cash flow, stock size, and book-to-market ratio, plays an essential role in determining the synchronicity. Quantifying such association is an important topic in financial theory [43].

Assuming Tmin = mini Ti > p and p is fixed, Zhao et al. [41] first studied Model (1.1) and proposed to estimate γ and β through a likelihood-based approach minimizing the negative log-likelihood function in the projection space. One sufficient condition to solve the likelihood-based criterion is that the sample covariance matrices are positive definite. Thus, the likelihood estimator is ill-posed when Tmax < p as the sample covariance matrices are rank-deficient. Additionally, it has been shown that when p increases, the sample covariance matrix performs poorly and can lead to invalid conclusions. For example, the largest eigenvalue of the sample covariance matrix is not a consistent estimator, and the eigenvectors can be nearly orthogonal to the truth [22]. To circumvent difficulties raised by the high dimensionality, one solution is to impose structural assumptions, such as bandable covariance matrices, sparse covariance matrices, spiked covariance matrices, covariances with a tensor product structure, and latent graphical models [see a review of 6, and references therein]. Based on structural assumptions, many regularization-based methods have been developed. However, most of these methods produce covariance estimates that may not always be positive definite (numerically), and this creates subsequent numerical convergence issues when the quadratic product with Σi is negative in (1.1). Moreover, most regularization methods can be computationally expensive on finding the solution and may require searching over different regularization parameters, not to mention the computational costs increase multiplicatively when computing over multiple covariance matrices. Further research is also needed to evaluate what structural assumptions are most appropriate for fMRI data. Another class of high-dimensional covariance matrix estimator is the shrinkage estimator. Daniels and Kass [11] considered two shrinkage estimators of the covariance matrix, a correlation shrinkage and a rotation shrinkage, offering a compromise between completely unstructured and structured estimators to improve the robustness. Ledoit and Wolf [24] introduced a well-conditioned estimator of the covariance matrix, which is an optimal linear combination of the identity matrix and the sample covariance matrix under squared error loss. This estimator is guaranteed to be positive definite and is easy to compute based on a simple and explicit formula. These advantages make it desirable for formulating the proposed estimator. Instead of a linear combination, Ledoit and Wolf [25] extended this work to nonlinear transformations of the sample eigenvalues and presented a way of finding the transformation that is asymptotically equivalent to the oracle linear combination. Based on Tyler’s robust M-estimator [37] and the linear shrinkage estimator [24], Chen et al. [8] and Pascal et al. [28], in parallel, introduced robust estimators of covariance matrices for elliptical distributed samples.

To model multiple covariance matrices, procedures include regression-type approaches [1, 9, 21, 14, 43]; (common) principal component analysis related methods [13, 5, 20, 15]; and methods based on other types of matrix decomposition, such as the Cholesky decomposition [30]. Among these, Fox and Dunson [14] introduced a scalable nonparametric covariance regression model applying low-rank approximation. Franks and Hoff [15] generalized a Bayesian hierarchical model studying the heterogeneity in the covariance matrices to high dimensional settings. Assuming that the ideal covariance structure exists in the eigenspace of the data covariance matrix, Chen et al. [7] introduced a regression-based approach to remove the scanner effects in covariance achieving the goal of harmonization. Compared to the above-mentioned approaches, Model (1.1) offers higher flexibility in modeling the relationship with the covariates. For example, x can be either continuous or categorical, and one can easily include interactions and/or polynomials of the covariates.

In the high dimensional setting considered in this study, γ and β, as well as n covariance matrices, will be estimated under Model (1.1). It is well known that the eigenvalues of the sample covariance matrix are more dispersed than the truth [24]. The class of linear combinations of the identity and sample covariance matrix corrects this dispersion issue by shrinking towards the identity matrix. The choice of the identity matrix can also be interpreted as a prior without strong structural assumptions or prior knowledge. Interestingly, it will be shown that estimating each covariance matrix separately, such as using the shrinkage estimator proposed in Ledoit and Wolf [24], leads to suboptimal estimation accuracy for γ, β, and Σi’s. Thus, we propose a linear shrinkage estimator of all the covariance matrices jointly, of which the shrinkage coefficients are shared across matrices. In addition, it is shown that the proposed shrinkage estimator leads to a consistent estimator of model coefficients. We first replace the sample covariance formulation with the proposed shrinkage estimator, and then estimate (γ, β) through maximizing a plug-in likelihood evaluated at the shrinkage estimator. In fMRI studies, shrinkage is also a popular technique to improve the reliability of subject-level functional connectivity captured by the covariance matrix. In the technique, when estimating individual covariance matrix, population-level information is borrowed as prior knowledge [38, 31, 27, 32, 29].

The framework proposed in this manuscript has three major contributions.

  1. This paper first studies a joint shrinkage estimator for multiple high dimensional covariance matrices, generalizing the linear shrinkage estimator for a single covariance matrix [24]. We show that the latter approach is suboptimal compared to the proposed joint covariance shrinkage estimator, where the shrinkage coefficients are shared across multiple matrices. Within this class of shrinkage estimators, we believe that this is among the first attempts to analyze the variations of a large number of covariance matrices associated with covariates in a regression setting under certain model assumptions.

  2. The proposed shrinkage estimator of the covariance matrices is well conditioned and has uniformly minimum quadratic risk asymptotically among all linear combinations (Theorem 3.3).

  3. Under certain regularity conditions, the proposed approach achieves consistent estimators of the parameters in Model (1.1) (Proposition 3.1).

The rest of the paper is organized as the following. Section 2 introduces the proposed shrinkage estimator of the covariance matrices and the pseudo-likelihood based method of estimating γ and β. Section 3 studies the asymptotic properties. In Section 4, the superior performance of the proposed approach over existing methods is demonstrated through simulation studies. Section 5 articulates an application to a resting-state fMRI data set acquired from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Section 6 concludes this paper with discussions. Technical proofs are collected in the appendix.

2. Methods

Considering the regression model (1.1), it is proposed to estimate the parameters by solving the following optimization problem.

minimize(β,γ)(β,γ)=12i=1nTi{xiβ+γΣ^iγexp(xiβ)},such that  γHγ=1, (2.1)

where Σ^i is an estimator of the covariance matrix Σi to be discussed later, which is positive definite, for i = 1, …, n; and H is a positive definite matrix in p×p, which is set to be the average of Σ^i’s, that is H=i=1nTiΣ^i/i=1nTi. It is essential to impose a constraint on γ, otherwise the objective function of (2.1) is minimized at γ = 0 with fixed β. When Σ^i=Si=t=1Tiyityit/Ti (i.e., the sample covariance matrix), which is the proposal in Zhao et al. [41], it is equivalent to minimize the negative log-likelihood function of {γ yit}i,t assuming the data are normally distributed. However, when Tmax = maxi Ti < p, problem (2.1) is ill-posed as Si’s are rank-deficient. Thus, the goal of this manuscript is to propose a well-conditioned estimator of Σi that yields optimal properties. To achieve this, a covariate-dependent linear shrinkage estimator, denoted as Σi*, is proposed, which yields the minimum expected squared loss under regression model (1.1), where the expectation is taken over the sample covariance matrix Si.

minimize(μ,ρ)1ni=1nE{γΣi*γexp(xiβ)}2,such that  Σi*=ρμI+(1ρ)Si,  for i=1,,n. (2.2)

The following theorem gives the solution to (2.2).

Theorem 2.1. For given (γ, β), the solution to optimization problem (2.2) is

Σi*=ψ2δ2μI+ϕ2δ2Si,  for i=1,,n, (2.3)

and the minimum value is

1ni=1nE{γΣi*γexp(xiβ)}2=ϕ2ψ2δ2, (2.4)

where

μ=1n(γγ)i=1nexp(xiβ),ϕ2=1ni=1nϕi2,ψ2=1ni=1nψi2,δ2=1ni=1nδi2,ϕi2={μ(γγ)exp(xiβ)}2,ψi2=E{γSiγexp(xiβ)}2,δi2=E{γSiγμ(γγ)}2;

and Lemma 2.1 shows that ψ22 + ϕ22 = 1.

Lemma 2.1. Fori ∈ {1, …, n}, δi2=ϕi2+ψi2, and thus δ2 = ϕ2 + ψ2.

According to Theorem 2.1, parameters ϕi2, ψi2 and δi2 are expected values as the objective is to minimize the expected squared loss. Thus, one cannot replace Σ^i with i* in (2.1) and solve for solution using the data. For implementation in practice, the following sample counterparts are used to compute (2.3) and thus Σ^i in (2.1). Let

δ^i2={γSiγμ(γγ)}2,ψ^i2=1Ti{γSiγexp(xiβ)}2,ϕ^i2=δ^i2ψ^i2,δ^2=1ni=1nδ^i2,ψ^2=1ni=1nmin(ψ^i2,δ^i2),ϕ^2=1ni=1nϕ^i2,

and

Si*=ψ^2δ^2μI+ϕ^2δ^2Si,  for i=1,,n. (2.5)

In Section 3, we show that Si* is a consistent estimator of i* and is uniformly optimal asymptotically among all the linear combinations of the sample covariance matrices and the identity matrix regarding the quadratic risk. The objective function (β, γ) is an approximation of the negative log-likelihood function if replacing Σ^i with the proposed shrinkage estimator Si*. Thus, optimizing (2.1) can be considered as a pseudo-likelihood approach under the normality assumption.

The proof of Theorem 2.1 and Lemma 2.1 is presented in Appendix Section A.1. Formulation (2.2) introduces a shrinkage estimator of the covariance matrix, where the shrinkage is shared across subjects and is optimal under the squared error loss. For each subject, i* is a linear combination of the sample covariance matrix Si and the identity matrix. The weighting parameters, ρ and μ, are population level parameters that are shared across subjects. This is equivalent to imposing a linear shrinkage on the sample eigenvalues. Assuming γ is a common eigenvector of all the covariance matrices, μ is the average eigenvalue corresponding to γ. The level of shrinkage is determined by the leverage between the accuracy of Si’s and the variation in the eigenvalues. If Si’s are accurate or the errors are small relative to the variation in the eigenvalues, less shrinkage will be imposed; otherwise, if Si’s are inaccurate and the errors are comparable or even higher than the eigenvalue variability, the sample covariance matrices will be shrank more.

Algorithm 1 summarizes the optimization procedure. As problem (2.1) is nonconvex, a series of random initializations of (γ, β) is considered and the one that achieves the minimum value of the objective function is the estimate. The initial values of γ can be set as the eigenvectors of the average sample covariance matrices, S¯=i=1nTiSi/i=1nTi; and the initial values of β is the corresponding solution to (2.1) by replacing Σ^i with a well-conditioned estimator, such as the estimator proposed in Ledoit and Wolf [24]. When p<i=1nTi, S¯ is of full rank, and the sample eigenvectors are consistent estimators assuming all the covariance matrices have the same eigendecomposition. Step 3 in the algorithm updates the covariance matrix estimators with a global shrinkage parameter. In Section 4, through simulation studies, we show that it improves the performance in estimating the covariance matrices and β with lower bias and higher stability. The details of updating γ and β in Step 4 can be found in Algorithm 1 in Zhao et al. [41].

graphic file with name nihms-1765239-f0001.jpg

For higher-order components, one can first remove the identified components and use the new data to estimate the next with an additional orthogonality constraint, that is, the new component is orthogonal to the identified ones. Different from Algorithm 2 in Zhao et al. [41], there is no need to include a rank-completion step as Si* is introduced to render the rank-deficiency issue. To determine the number of components, the metric of average deviation from diagonality is adopted [41]. Let Γ(k)p×k denote the first k estimated components, the average deviation from diagonality is defined as

DfD(Γ(k))=i=1n(det{diag(Γ(k)Si*Γ(k))}det(Γ(k)Si*Γ(k)))Ti/iT, (2.6)

where diag(A) is a diagonal matrix of the diagonal elements in a square matrix A, and det(A) is the determinant of A. If Γ(k) is a common diagonalization of Si*’s, that is, Γ(k)Si*Γ(k) is a diagonal matrix, for ∀ i = 1, …, n, then DfD(Γ(k)) = 1. In practice, k can be chosen before DfD increases far away from one or before a sudden jump occurs.

3. Asymptotic Properties

In this section, we study the asymptotic properties of the proposed estimators. For i = 1, …, n, it is assumed that Σi has the eigendecomposition of Σi=ΠiΛiΠi, where Λi = diag{λi1, …, λip} is a diagonal matrix and Πi = (πi1, …, πip) is an orthonormal rotation matrix; {λi1, …, λip} are the eigenvalues and the columns of Πi are the corresponding eigenvectors. Let Zi = YiΠi, where Yi=(yi1,,yiTi)Ti×p is the data matrix of subject i. Under the normality assumption, the columns of Zi = (zitj)t,j are uncorrelated, and the rows, zit=(zi1,,zip)p for t = 1, …, Ti, are normally distributed with mean zero and covariance matrix Λi. The following assumptions are imposed.

Assumption A1 There exists a constant C1 independent of Tmax such that p/TmaxC1, where Tmax = maxi Ti.

Assumption A2 Let N=i=1nTi, p/N → ∞ as n, Tmin → ∞, where Tmin = mini Ti.

Assumption A3 There exists a constant C2 independent of Tmin and Tmax such that j=1pE(zi1j8)/pC2, for ∀ i ∈ {1, …, n}.

Assumption A4 Let Q denote the set of all the quadruples that are made of four distinct integers between 1 and p, for ∀ i ∈ {1, …, n},

limTip2Ti2(j,k,l,m)Q{Cov(zi1jzi1k,zi1lzi1m)}2|Q|=0, (3.1)

where |Q| is the cardinality of set Q.

Assumption A5 All the covariance matrices share the same set of eigenvectors, i.e., Πi = Π, for i = 1, …, n. For each Σi, there exists (at least) a column, indexed by ji, such that γ = πiji and Model (1.1) is satisfied.

Assumption A1 allows the data dimension, p, to be greater than the (maximum) number of observations, Tmax, and to grow at the same rate as Tmax does. This is a common regularity condition for shrinkage estimators [24]. Assumption A2 guarantees that the average sample covariance matrix S¯=i=1nTiSi/N utilized in the initial step of Algorithm 1 is positive definite. Together with Assumption A5, the eigenvectors of S¯ are consistent estimators of Π [2]. Assumptions A3 and A4 regulate zit on higher-order moments, which is equivalent to imposing restrictions on the higher-order moments of yit. When the data are assumed to be normally distributed, both A3 and A4 are satisfied. Assumption A5 assumes that all the covariance matrices share the same eigenspace, though the ordering of the eigenvectors may differ. When p/Tmin → 0, Zhao et al. [41] relaxed this assumption to partial common diagonalization and demonstrated the method robustness through numerical examples. Studying the asymptotic properties under the relaxation is difficult and not available in existing literature, especially when p > Tmax.

Taking the eigenvectors of S¯ as the initial values of γ, the following proposition demonstrates the consistency of the proposed estimator.

Proposition 3.1. Under Assumptions A1–A5, as n,Tmin → ∞, the estimator of γ and β obtained by Algorithm 1 are asymptotically consistent.

To prove Proposition 3.1, we first study the asymptotic properties of Si* and show that Si* is the optimal linear shrinkage estimator of the covariance matrix under the squared loss. This is accomplished under the assumption that γ is given. As the initialization of γ is already a consistent estimator, the consistency of the solution after iteration follows. For β, it is firstly shown that the association between the shrinkage estimator, Σi*, and the covariates is the same as the covariance matrix, Σi, does (Lemma 3.3). Thus, it is equivalent to optimize problems (2.1) and (2.2) to solve for β, and the solution is a consistent estimator of β based on the pseudo-likelihood theory [16]. In the iteration step of Algorithm 1, Si* improves the estimation of the covariance matrices with lower squared loss, and in consequence, improves the estimation of γ and β. In Section 4, the improvement is demonstrated through simulation studies.

In Section 2, the optimization problem (2.2) introduces a linear combination of the sample covariance matrix and the identity matrix, Σi*, that achieves the minimum expected squared error. From Theorem 2.1, the solution has population-level parameters. Thus, the sample counterpart, Si*, is introduced. The following Lemma 3.1 first shows that asymptotically, the weighting parameters in Σi* are well-behaved. Lemma 3.2 demonstrates that the corresponding sample counterpart of the weighting parameters are consistent estimators. Theorem 3.1 demonstrates that Si* performs as well as Σi* does asymptotically.

Lemma 3.1. For given (γ, β), let Tmin = mini Ti, as Tmin → ∞, μ, ϕ2, ψ2 and δ2 are bounded.

Lemma 3.2. For given (γ, β), as Tmin → ∞,

  1. E(δ^i2δi2)20, for i = 1, , n, and thus E(δ^2δ2)20;

  2. E(ψ^i2ψi2)20, for i = 1, , n, and thus E(ψ^2ψ2)20;

  3. E(ϕ^i2ϕi2)20, for i = 1, , n, and thus E(ϕ^2ϕ2)20.

Theorem 3.1. Fori ∈ {1, …, n}, Si* is a consistent estimator of Σi*, that is, as Tmin = mini Ti → ∞,

ESi*Σi*20. (3.2)

Thus, the asymptotic expected loss of Si* and Σi* are identical, that is,

E{γSi*γexp(xiβ)}2E{γΣi*γexp(xiβ)}20. (3.3)

Next, we show that Si* uniformly achieves the minimum quadratic risk asymptotically over all linear combinations of the sample covariance matrix and the identity matrix. For given (γ, β), let Σi** denote the solution to the following optimization problem,

minimizeρ1,ρ21ni=1n{γΣi**γexp(xiβ)}2, such that  Σi**=ρ1I+ρ2Si,  for i=1,,n (3.4)

Theorem 3.2. Si* is a consistent estimator of Σi**, that is, as Tmin = mini Ti → ∞, for i = 1, …, n,

ESi*Σi**20. (3.5)

Then, Si* has the same asymptotic expected loss as Σi** does, that is,

E{γSi*γexp(xiβ)}2E{γΣi**γexp(xiβ)}20. (3.6)

Theorem 3.3. Assume (γ, β) is given. With a fixed n+, for any sequence of linear combinations {Σ^i}i=1n of the identity matrix and the sample covariance matrix, where the combination coefficients are constant over i ∈ {1, …, n}, the estimator Si* verifies:

limTinfTiT[1ni=1nE{γΣ^iγexp(xiβ)}21ni=1nE{γSi*γexp(xiβ)}2]0. (3.7)

In addition, every sequence of {Σ^i}i=1n. that performs as well as {Si*}i=1n identical to {Si*}i=1n in the limit:

limT[1ni=1nE{γΣ^iγexp(xiβ)}21ni=1nE{γSi*γexp(xiβ)}2]=0 (3.8)
EΣ^iSi*20, for i=1,,n. (3.9)

The difference between Σi** and Σi* is that Σi** minimizes the squared loss instead of the expected loss, while asymptotically they are equivalent (Theorems 3.1 and 3.2). Theorem 3.3 presents the main result that, with a fixed sample size n, the proposed shrinkage estimator {Si*}i=1n achieves the uniformly minimum (average) quadratic risk asymptotically among all linear combinations of the identity matrix and the sample covariance matrix. Here, “average” implies an average over the subjects, and “asymptotically” refers to that the number of observations within each subject increases to infinity. Therefore, Si* is asymptotically optimal. In addition, it is guaranteed that Si* is positive definite (see a discussion in Appendix Section A.8). Thus, there exits unique solution to the optimization problem (2.1).

Next, we study the asymptotic properties of the model coefficient estimator. Let β^ denote the solution to the optimization problem (2.1).

Lemma 3.3. For given γ, assume the linear shrinkage estimator, Σi*, satisfies

E(γΣi*γ)=exp(xiβ*), for i=1,,n, (3.10)

then

β*=β. (3.11)

Theorem 3.4. For given γ, assume Assumptions A1–A5 are satisfied, β^ is a consistent estimator of β as n, Tmin → ∞, where Tmin = mini Ti.

Lemma 3.3 implies that under the rotation γ, the expectation of the shrinkage estimator, Σi*, has the same association with the covariates as the true covariance matrix, Σi, does. Si* is a consistent estimator of Σi* and is positive definite. This substantiates the choice of Si* replacing the sample covariance matrix Si in the optimization problem. Theorem 3.4 states the consistency of β^.

4. Simulation Study

4.1. γ is known

In this section, we focus on examining the performance of the proposed method in estimating the covariance matrices and model coefficients by assuming the projection γ is known. Three methods are compared. (i) Estimate each individual covariance matrix using the estimator proposed in Ledoit and Wolf [24] and replace Σ^i with it in the optimization problem (2.1). We denote this approach as LW-CAP (Ledoit and Wolf based Covariate Assisted Principal regression), where the shrinkage is estimated on each individual covariance matrix. (ii) Estimate the covariance matrices using the proposed shrinkage estimator Si* in (2.5). We denote this approach as CS-CAP (Covariate dependent Shrinkage CAP), where the shrinkage parameters are assumed to be shared across subjects. (iii) Estimate each individual covariance matrix using the sample covariance matrix and plug into the optimization problem (2.1). This is the CAP approach proposed in Zhao et al. [41], which is only applicable when Tmin = mini Ti > p.

The covariance matrices are generated using the eigendecomposition Σi = ΠΛiΠ, where Π = (π1, …, πp) is an orthonormal matrix in p×p and Λi = diag{λi1, …, λip} is a diagonal matrix with the diagonal elements to be the eigenvalues, for i = 1, …, n. In Λi, the diagonal elements are exponentially decaying, where eigenvalues of the second and the fourth dimension (D2 and D4) satisfy the log-linear model in (1.1). We consider a case with a single predictor X (thus q = 2), which is generated from a Bernoulli distribution with probability 0.5 to be one. For D2, the coefficient β1 = −1; and for D4, β1 = 1. For the rest dimensions, λij, for i = 1, …, n, is generated from a log-normal distribution, where the mean of the corresponding normal distribution decreases from 5 to −1 over j. Cases when p = 20, 50, 100 are considered.

We first compare the three approaches, LW-CAP, CS-CAP and CAP, under sample sizes n = 50 and Ti = T = 50 for all i and present the result in Table 1. In the estimation, for dimension j, γ is set to be πj. In Table 1, we present the bias and the mean squared error (MSE) in estimating the eigenvalues and the model coefficient in D2 and D4. From the table, for both the eigenvalues and β1, CS-CAP yields lower estimation bias and MSE than LW-CAP does. When p < T, CS-CAP achieves a similar estimation bias as the CAP approach does in estimating the covariance matrices, while the MSE is slightly lower. For the estimation of β1, CS-CAP yields slightly lower bias. As the dimension p increases, the bias and MSE of eigenvalue estimates from LW-CAP increase; while the bias and MSE of the estimates from CS-CAP are similar at all p settings. This demonstrates the superiority of the proposed estimator in estimating the covariance matrices. Figure 1 presents the estimation bias and MSE of CS-CAP estimator at various levels of T when fixing n = 50 when p = 20. From the figure, as the number of observations within each subject increases, the estimates converge to the truth.

Table 1.

Bias and mean squared error (MSE) in estimating the eigenvalues of the covariance matrices and bias, MSE, and coverage probability (CP) in estimating β1 coefficient with sample sizes n = 50 and Ti = T = 50, for i = 1, …, n, when γ is known.

Method λ^ij β^1
Bias MSE Bias MSE CP
p = 20 D2 LW-CAP −6.520 225.360 0.053 0.006 0.795
CS-CAP −1.175 204.686 0.001 0.004 0.935
CAP −1.175 206.117 −0.003 0.004 0.935
D4 LW-CAP −7.422 277.888 −0.040 0.005 0.860
CS-CAP −1.223 249.881 0.001 0.004 0.905
CAP −1.223 251.595 0.005 0.004 0.910
p = 50 D2 LW-CAP −7.975 244.326 0.028 0.004 0.915
CS-CAP −1.428 202.141 0.008 0.003 0.935
CAP - - - - -
D4 LW-CAP −8.641 295.221 −0.012 0.004 0.915
CS-CAP −1.242 248.254 0.001 0.004 0.925
CAP - - - - -
p = 100 D2 LW-CAP −8.924 260.268 0.010 0.004 0.915
CS-CAP −0.973 203.151 −0.001 0.003 0.930
CAP - - - - -
D4 LW-CAP −10.487 331.864 −0.011 0.003 0.940
CS-CAP −1.705 245.754 −0.007 0.003 0.940
CAP - - - - -

Fig 1.

Fig 1.

Bias and mean squared error (MSE) in estimating the eigenvalues of the covariance matrices and bias, MSE, and coverage probability in estimating β1 coefficient using CS-CAP with the number of subjects n = 50 at various numbers of observations from each subject with p = 20 when γ is known.

4.2. γ is unknown

In this section, we evaluate the performance of the CS-CAP approach when γ is unknown and estimated by solving optimization problem (2.1) using Algorithm 1. The data are generated following the same procedure as in Section 4.1. To evaluate the performance in estimating the projection γ, we consider a similarity metric measured by |γ^,γ|, where 〈·, ·〉 is the inner product of two vectors and γ^ denotes the estimate of γ. When this metric is one, the two vectors are identical (up to sign flipping); and when this metric is zero, the two vectors are orthogonal. Case where p = 100 is studied. The performance of the CS-CAP approach is firstly compared to the LW-CAP approach with sample sizes n = 100 and Ti = T = 100. The results are presented in Table 2. From the table, the CS-CAP approach improves the performance with much lower MSE in estimating the eigenvalues, and lower MSE and higher coverage probability (CP) in estimating the β coefficient. After iterations, the CS-CAP approach yields an estimate of the projection with much higher similarity to the truth. To further examine the performance of the CS-CAP approach under finite sample size, combinations of sample sizes n = 50, 100, 500, 1000 and Ti = T = 50, 100, 500, 1000 are considered. Figure 2 presents the performance in estimating the second dimension (D2), including the bias, the MSE and the CP of β^1, the MSE of λ^ij, and the similarity of γ^ to the eigenvector of D2 (Appendix Section B.1 presents the results of the fourth dimension, D4). From the figure, as n, T → ∞, all estimates converge to the truth.

Table 2.

Bias, mean squared error (MSE), and coverage probability (CP) from 500 bootstrap samples in estimating the β1 coefficient, the similarity of γ^ to πj and the standard error (SE), and the MSE in estimating the eigenvalues λ^ij, for j = 2, 4. Data dimension p = 100, sample size n = 100 and Ti = T = 100.

Method β^1 λ^ λ^ij
Bias MSE CP |γ^,πj| (SE) MSE
D2 LW-CAP −0.027 0.002 0.782 0.653 (0.033) 1812.091
CS-CAP −0.023 0.001 0.855 0.931 (0.012) 173.225
D4 LW-CAP 0.018 0.002 0.770 0.666 (0.027) 2186.265
CS-CAP 0.019 0.001 0.845 0.926 (0.011) 231.856

Fig 2.

Fig 2.

Estimation performance of CS-CAP in estimating the second dimension (D2) when γ is unknown. For β^1, (a) bias, (b) mean squared error (MSE) and (c) coverage probability (CP) are presented, where CP is obtained from 500 bootstrap samples. For the eigenvalues λ^ij, (d) MSE is presented. For γ^, (e) similarity to π2 is presented. Data dimension p = 100. Sample sizes vary from n = 50, 100, 500, 100 and Ti = T = 50, 100, 500, 1000.

In Appendix Section B.2, the robustness of the proposed method to model misspecification is examined. Two types of model misspecification are considered, model misspecification in β and model misspecification in γ. When the log-linear model is misspecified, the proposed approach can correctly identify the linear projections under certain scenarios, while the estimate of model coefficients is biased. The proposed approach is robust to the setting that the eigenvectors of the covariance matrices are partially common, while it will not work when the eigenvectors are completely unique to each covariance matrix.

5. The Alzheimer’s Disease Neuroimaging Initiative Study

Data used in this study are obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD).

We apply the proposed approach to ADNI resting-state functional MRI (fMRI) data acquired at the baseline screening. AD is an irreversible neurodegenerative disease that destroys memory and related brain functions causing problems in cognition and behavior. Apolipoprotein E ε4 (APOE-ε4) has been consistently identified as a strong genetic risk factor for AD. With an increasing number of APOE-ε4 alleles, the lifetime risk of developing AD increases, and the age of onset decreases [10]. Thus, APOE-ε4 is generally treated as a potential therapeutic target [33]. In AD studies, resting-state fMRI is another emerging biomarker for diagnosis [23]. It is important to articulate the genetic impact on brain functional architecture. In this study, n = 194 subjects diagnosed with either MCI or AD are analyzed. Resting-state fMRI data collected at the initial screening are preprocessed. Time courses are extracted from p = 75 brain regions, including 60 cortical and 15 subcortical regions grouped into 10 functional modules, using the Harvard-Oxford Atlas in FSL [35]. For each time course, a subsample is taken with an effective sample size of T = 67 to remove the temporal dependence. The resulting data, denoted as yit (for i = 1, …, n and t = 1, …, T), are assumed to follow a multivariate normal distribution with mean zero and covariance Σi. The off-diagonal elements in Σi represent the pairwise functional connectivity between brain regions and Σi represents the brain functional architecture of subject i. In the regression model, APOE-ε4, sex, and age are entered as the covariates (xi’s). The validity of model assumptions is discussed in Appendix Section C.1.

The CS-CAP approach is applied to identify brain subnetworks within which the functional connectivity demonstrates a significant association with APOE-ε4. Using the deviation from diagonality criterion, CS-CAP identifies three components denoted as C1, C2, and C3. The model coefficients and 95% bootstrap confidence interval from 500 bootstrap samples are presented in Table 3. From the table, C3 is significantly associated with APOE-ε4 and age; C1 and C2 are significantly associated with sex and age. To better interpret C3, a fused lasso regression [36] is employed to sparsify the loading profile, similarly as in the sparse principal component analysis proposed in Zou et al. [42]. The fused lasso regularization is defined based on the modular information to impose local smoothness and consistency [19, 40]. Figure 3(a) presents the sparse loading profile colored by the corresponding functional module, and Figure 3(b) is the river plot illustrating the loading configuration. In C3, all regions with negative loadings are subcortical regions. Contributions to positive loadings are from regions in the default mode network (DMN), the ventral- and dorsal-attention networks, and the somato-motor network. Figure 3(c) presents these regions on a brain map. C3 is negatively associated with APOE-ε4 indicating that functional connectivity between regions in the same sign among APOE-ε4 carriers is lower, while connectivity between regions in the opposite signs among APOE-ε4 carriers is higher. The findings are in line with existing knowledge about AD. Compared to APOE-ε4 non-carriers, more functional connectivity between the left hippocampus and the insular/prefrontal cortex while more functional disconnection of the hippocampus has been observed in APOE-ε4 carriers [12]. Alterations in DMN connectivity in cognitively normal APOE-ε4 carriers have been reported across all age groups [3]. Increased connectivity in the limbic system, including the hippocampus, the amygdala, and the thalamus, has been detected in individuals with memory impairment [18, 17], though the effect of APOE-ε4 carriage lacks consensus [3]. It was shown that the limbic hyperconnectivity is positively associated with the memory performance, suggesting the preservation of brain function due to increased connectivity in the medial temporal lobe pathology [17].

Table 3.

Model coefficient estimate and 95% bootstrap confidence interval using the PS-CAP approach. The intervals are obtained over 500 bootstrap samples.

APOE-ε4 Sex Age
C1 0.012 (−0.031, 0.263) −0.431 (−0.636, −0.230) −0.227 (−0.319, −0.129)
C2 0.049 (−0.191, 0.309) −0.544 (−0.867, −0.186) −0.232 (−0.383, −0.066)
03 −0.156 (−0.270, −0.045) −0.061 (−0.201, 0.075) −0.241 (−0.328,−0.172)

Fig 3.

Fig 3.

(a)The sparsified loading profile, (b) the module river plot, and (c) regions with nonzero loadings in a brain map of C3. In (a) and (b), the figure and the legend are colored by brain functional modules. In (c), the brain maps are colored by the loading weights.

6. Discussion

In this study, we introduce an approach to perform linear regression with multiple high dimensional covariance matrices as the outcome. A linear shrinkage estimator of the covariance matrix is firstly introduced, where the shrinkage coefficients are shared parameters across subjects. It is shown that the proposed estimator is optimal achieving the uniformly minimum quadratic loss asymptotically among all linear combinations of the identity matrix and the sample covariance matrix. Replacing the sample covariance matrices with the proposed well-conditioned estimator in the likelihood function, the linear projection parameter and the model coefficient are shown to be consistently estimated. Through simulation studies, the proposed approach demonstrates superior performance in estimating the covariance matrices and the model coefficients with lower estimation bias and variation over the existing methods. Applying to a resting-state fMRI data set acquired from the ADNI study, the findings are consistent with existing knowledge about AD.

The proposed framework extends the proposal in Zhao et al. [41] to high dimensional scenario. When p is small, the proposed shrinkage estimator demonstrates lower squared loss than the sample covariance matrix as suggested in both theoretical results and simulation studies. Different from the linear shrinkage estimator introduced in Ledoit and Wolf [24], which was proposed for a single covariance matrix estimation, the shrinkage coefficients considered in this study are population level parameters shared across subjects. This is superior than the individual shrinkage as the proposed one leverages the accuracy of the sample covariance matrix and the variability in the eigenvalues across subjects.

In this study, the asymptotic properties are studied under the assumption that the covariance matrices have the same eigendecomposition. We leave the study of the consistency relaxing this assumption to future research. The proposed shrinkage estimator is optimal with respect to a squared risk. However, this may overshrink the small eigenvalues [11]. Other types of loss function, such as the Stein’s loss, will be considered in the future. In the ADNI application, we included an ad hoc procedure to select important brain regions for interpretation. A next-step research is to include the regularization on γ into the optimization or to introduce an efficient approach to draw inference on the loadings (such as a bootstrap sampling procedure).

Acknowledgments

Zhao was partially supported by NIH grant U54AG065181 and P30AG010133; Caffo by NIH grant R01EB029977 and P41EB031771; and Luo by NIH grant R01EB022911. Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI, National Institutes of Health Grant U01 AG024904 and Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

Appendix A: Theory and Proof

A.1. Proof of Theorem 2.1 and Lemma 2.1

Proof. Given (γ, β), E(γSiγ)=γΣiγ=exp(xiβ). For the objective function in (2.2), under the constraint that Σi*=ρμI+(1ρ)Si, we have

f(μ,ρ)=1ni=1nE{γΣi*γexp(xiβ)}2=1ni=1n[ρ2{μ(γγ)exp(xiβ)}2+(1ρ)2E{γSiγexp(xiβ)}2].

In order to minimize the objective function, as the objective function is convex, derivatives are firstly taken over μ and ρ.

For μ,

fμ=ρ21ni=1n2{μ(γγ)exp(xiβ)}(γγ)=0,μ=1n(γγ)i=1nexp(xiβ).

For ρ, let ϕi2={μ(γγ)exp(xiβ)}2 and ψi2=E{γSiγexp(xiβ)}2,

fρ=2ρ(1ni=1nϕi2)2(1ρ)(1ni=1nψi2)=0,ρ=i=1nψi2k=1nϕi2+i=1nψi2.

Let δi2=E{γSiγμ(γγ)}2, then δi2=ϕi2+ψi2. Let ϕ2=i=1nϕi2/n, ψ2=i=1nψi2/n, and δ2=i=1nδi2/n (thus, δ2 = ϕ2 + ψ2), the optimizer of problem (2.2) is

Σi*=ψ2δ2μI+ϕ2δ2Si,i=1,,n.

The minimum value of the function is

1ni=1nE{γΣi*γexp(xiβ)}2=1ni=1nE{ψ2δ2μγγ+ϕ2δ2γSiγψ2+ϕ2δ2exp(xiβ)}2=1ni=1n[E{ψ2δ2μγγψ2δ2exp(xiβ)}2+E{ϕ2δ2γSiγϕ2δ2exp(xiβ)}2]=1ni=1n(ψ4δ4ϕi2+ϕ4δ4ψi2)=ψ4ϕ2+ϕ4ψ2δ4=ϕ2ψ2δ2.

A.2. Proof of Proposition 3.1

Proof. Under Assumptions A2 and A5, the eigenvectors of S¯ are consistent estimators of Π. Replace γ with its estimate in Theorems 3.13.3 and Theorem 3.4, the consistency of β follows. □

A.3. Proof of Lemma 3.1

Proof. (1) For μ,

μ=1n(γγ)i=1nexp(xiβ)=1ni=1nγΣiγγγ1ni=1nΣi22.

Under Assumption A2,

1ni=1nΣi22=1ni=1nΛi221ni=1nΛiF2=1ni=1n{1pj=1pE(zi1j2)2}=1ni=1n{1pj=1pE(zi1j4)}1ni=1n1pj=1pE(zi1j)81ni=1nC2=C2,

where ∥·∥F is the Frobenius norm of a matrix.

(2) For ϕ2, upper limits of ϕi2 is derived first.

ϕi2={μ(γγ)exp(xiβ)}2μ2(γγ)2+{exp(xiβ)}2=μ2(γγ)2+(γΣiγ)2(μ2+Σi24)(γγ)2.

From the above derivation, we have

μ2C2, and Σi22=Λi22ΛiF2C2.

Since γ is given, without loss of generality, assume that ∥γ2 = 1, i.e., γγ = 1. Then,

ϕi22C2(γγ)=2C2.

Thus,

ϕ2=1ni=1nϕi22C2.

(3) For ψ2, analogously, ψi2 is considered first.

ψi2=E{γSiγexp(xiβ)}2=E{γ(SiΣi)γ}2(γγ)2ESiΣi22ESiΣiF2=1pj=1pk=1pE{(1Tit=1Tiyitjyitkσijk)2}=1pj=1pk=1pE{(1Tit=1Tizitjzitkλijk)2}=1pj=1pk=1pVar(1Tit=1Tizitjzitk)=1pj=1pk=1p1TiVar(zi1jzi1k)1pTij=1pk=1pE(zi1j2zi1k2)1pTij=1pk=1pEzi1j4Ezi1k4pTi(1pj=1pEzi1j4)2pTi(1pj=1pEzi1j4)pTi1pj=1pEzi1j8C1C2

Thus, for ψ2,

ψ2=1ni=1nψi21ni=1n(γγ)2C1C2=C1C2.

(4) Finally, for δ2,

δ2=ϕ2+ψ22C2+C1C2.

A.4. Proof of Lemma 3.2

Proof. In the proof of Lemma 3.2, here, it is assumed that γ is a column of Πi indexed by ji, for i = 1, …, n (Assumption A4).

(i) First, we prove the consistency of δ^i2.

δ^i2δi2={γSiγμ(γγ)}2E{γSiγμ(γγ)}2={(γSiγ)2E(γSiγ)2}2μ(γγ){(γSiγ)E(γSiγ)}

Under Assumption A4,

γSiγ=1Tit=1Tiγyityitγ=1Tit=1Tizitji2.
(γSiγ)2=1Ti2(t=1Tizitji2)2=1Ti2t=1Tizitji4+1Ti2tszitji2zisji2.
E(γSiγ)2=1Ti2TiEzi1ji4+1TiTi(Ti1)(Ezitji2)2=1TiEzi1ji4+Ti(Ti1)Ti2(γΣiγ)2.

For ∀ ϵ > 0,

{|(γSiγ)E(γSiγ)|ϵ}1ϵ2Var(γSiγ)=1ϵ2[E(γSiγ)2{E(γSiγ)}2]=1ϵ2{1Ti2Ezi1ji4+Ti(Ti1)Ti2(γΣiγ)2(γΣiγ)2}Ti0.
E(γSiγ)4=1Ti4E(t=1Tizitji2)4=1Ti4{tEzitji8+2tsEzitji4zisji4+2uE(ziuji4tszitji2zisji2)+uvtsE(zitji2zisji2ziuji2zivji2)}=1Ti4{TiEzi1ji8+2Ti(Ti1)(Ezi1ji4)2+2Ti2(Ti1)Ezi1ji4(Ezi1ji2)2+Ti2(Ti1)2(Ezi1ji2)4}.
{E(γSiγ)2}2=1Ti2(Ezi1ji4)2+2Ti(Ti1)Ti3Ezi1ji4(γΣiγ)2+Ti2(Ti1)2Ti4(γΣiγ)4.

For ∀ ϵ > 0,

{|(γSiγ)2E(γSiγ)2|ϵ}1ϵ2Var(γSiγ)2=1ϵ2[E(γSiγ)4{E(γSiγ)2}2]=1ϵ2{1Ti3Ezi1ji8+Ti2Ti3(Ezi1ji4)2}Ti0.

Therefore, as Tmin = mini Ti → ∞,

E(δ^i2δi2)20, for i=1,,n, and E(δ^2δ2)20.

(ii) Secondly, prove the consistency of ψ^i2, for i = 1, …, n.

ψ^i2ψi2=1Ti{γSiγexp(xiβ)}2E{γSiγexp(xiβ)}2.
E{γSiγexp(xiβ)}2=E{1Titzitji2exp(xiβ)}2=1Ti2tVar(zitji2)=1TiVar(zi1ji2).
ψ^i2ψi2=1Ti[{γSiγexp(xiβ)}2Var(zi1ji2)]=1Ti[(γSiγ)2Ezi1ji42exp(xiβ){γSiγexp(xiβ)}].

From above derivation and the fact that E(γSiγ)=γΣiγ=exp(xiβ), as Ti → ∞, for ∀ ϵ > 0,

{|(γSiγ)E(γSiγ)|ϵ}0.

As both (γ Siγ)2 and Ezi1ji4 are bounded, then, as Tmin = mini Ti → ∞,

E(ψ^i2ψi2)20, for i=1,,n.

Let ψ˜i2=min(ψ^i2,δ^i2).

ψ˜i2ψi2=min(ψ^i2,δ^i2)ψi2ψ^i2ψi2|ψ^i2ψi2|max(|ψ^i2ψi2|,|δ^i2δi2|).

δi2=ϕi2+ψi2ψi2, then

ψ˜i2ψi2=min(ψ^i2,δ^i2)ψi2=min(ψ^i2ψi2,δ^i2ψi2)min(ψ^i2ψi2,δ^i2δi2)min(|ψ^iψi2|,|δ^i2δi2|)max(|ψ^iψi2|,|δ^i2δi2|).
E(ψ˜i2ψi2)2E{max(|ψ^iψi2|,|δ^i2δi2|)2}E(ψ^i2ψi2)2+E(δ^i2δi2)2.

Therefore, as Tmin = mini Ti → ∞,

E(ψ˜i2ψi2)20, for i=1,,n, and E(ψ^2ψ2)20.

(iii) Lastly, ϕ^i2=δ^i2ψ^i2. The consistency of ϕ^i2 (for i = 1, …, n) and ϕ^2 are straightforward. □

A.5. Proof of Theorem 3.1

In order to prove Theorem 3.1, the following lemma is firstly introduced. This lemma is also used to prove Lemma A.2 in the next section.

Lemma A.1. If ai2 is a sequence of nonnegative random variables (implicitly indexed by Ti) whose expectations converge to zero, for i = 1, …, n, and κ1, κ2 are two nonrandom scalars, and

ai2δ^ik1δiκ22(δ^i2+δi2)a.s.,

then, as Tmin = mini Ti → ∞,

E(ai2δ^iκ1δiκ2)0.

Analogously, if a2 is a sequence of nonnegative random variables (implicitly indexed by Tmin = mini Ti) whose expectations converge to zero, and κ1, κ2 are two nonrandom scalars, and

a2δ^κ1δκ22(δ^2+δ2)  a.s., 

then, as Tmin = mini Ti → ∞,

E(a2δ^κ1δκ2)0.

Proof. For a fixed ϵ > 0, let Ti denote the set of indices Ti such that δi2ϵ/8. In Lemma 3.2, it is proved that E(δ^i2δi2)20. Thus, there exists an integer Ti1 such that ∀ TiTi1,

E|δ^i2δi2|ϵ/4.

For ∀ TiTi1 in the set Ti,

E(ai2δ^iκ1δiκ2)2(Eδ^i2+δi2)2(E|δ^i2δi2|+2δi2)2(ϵ4+2×ϵ8)=ϵ.

Consider the complementary of set Ti, since Eai20, there exists an integer Ti2 such that, ∀ TiTi2,

Ea2ϵκ1+κ2+124κ1+3κ2+1.

δi2 is bounded by 2C2+C1C2. Then, there exists an integer Ti3 such that, for ∀ TiTi3

(|δ^i2δi2|ϵ16)4ϵ16(2C2+C1C2)+ϵ.

Let 1{·} denote the indicator function. For ∀ Ti ≥ max(Ti2, Ti3) outside the set Ti, then

E(ai2δ^iκ1δiκ2)=E(ai2δ^iκ1δiκ21{δ^i2ϵ/16})+E(ai2δ^iκ1δiκ21{δ^i2>ϵ/16})E{2(δ^i2+δi2)1{δ^i2ϵ/16}}+(16ϵ)κ1(8ϵ)κ2E(ai21{δ^i2>ϵ/16})2{(2C2+C1C2)+ϵ16}(|δ^i2δi2|ϵ16)+(16ϵ)κ1(8ϵ)κ2E(ai2)2{(2C2+C1C2)+ϵ16}4ϵ16(2C2+C1C2)+ϵ+(16ϵ)κ1(8ϵ)κ2ϵκ1+κ2+124κ1+3κ2+1ϵ.

Bringing together the results inside and outside the set Ti, for ∀ Ti ≥ max(Ti1, Ti2, Ti3),

E(ai2δ^iκ1δiκ2)ϵ.

The proof of the second part follows the same strategy. □

Now, we prove Theorem 3.1.

Proof. We first prove that Si* is a consistent estimator of Σi*.

Si*Σi*2=maxγ0γ(Si*Σi*)γ2γγ=maxγ01γγ(ϕ^2δ^2ϕ2δ2)(γSiγμγγ)2=maxγ01γγ(ϕ^2δ^2ϕ2δ2)2δ^i2.
1ni=1nSi*Σi*2=maxγ01γγ(ϕ^2δ2ϕ2δ^2)2δ^4δ41ni=1nδ^i2=maxγ01γγ(ϕ^2δ2ϕ2δ^2)2δ^2δ4.

Using the fact that ϕ2δ2 and ϕ^2δ^2,

(ϕ^2δ2ϕ2δ^2)2δ^2δ4δ^22(δ^2+δ2).

In Lemma 3.2, it is shown that E(ϕ^2ϕ2)2 and E(δ^2δ2)2 converge to zero. In addition, Lemma 3.1 shows that ϕ2 and δ2 are bounded. Thus,

E(ϕ^2δ2ϕ2δ^2)2=E{(ϕ^2ϕ2)δ2ϕ2(δ^2δ2)}2δ4E(ϕ^2ϕ2)2+ϕ4E(δ^2δ2)20.

Let a2=(ϕ^2δ2ϕ2δ^2)2, κ1 = 2 and κ2 = 4, then Ea20, and using Lemma A.1,

E(ϕ^2δ2ϕ2δ^2)2δ^2δ40.

Thus,

1ni=1nESi*Σi*20.

And therefore, for ∀ i,

ESi*Σi*20.

For the second statement,

E|Si*Σi2Σi*Σi2|=E|Si*Σi*,Si*+Σi*2Σi|ESi*Σi*2ESi*+Σi*2Σi20.

Therefor,

E{γSi*γexp(xiβ)}2E{γΣi*γexp(xiβ)}20.

A.6. Proof of Theorem 3.2

Before proving Theorem 3.2, we first provide the solution to the optimization problem (3.4). Let

f(ρ1,ρ2)=1ni=1n{γ(ρ1I+ρ2Si)γexp(xiβ)}2.
fρ1=1ni=1n2(γγ){ρ1γγ+ρ2γSiγexp(xiβ)}=0
fρ2=1ni=1n2(γSiγ){ρ1(γγ)+ρ2(γSiγ)exp(xiβ)}=0.
ρ2=i(γSiγ)exp(xiβ)/n(iγSiγ/n)(iexp(xiβ)/n)i(γSiγ)2/n(iγSiγ/n)2
ρ1=1γγ{1ni=1nexp(xiβ)1ni=1nρ2(γSiγ)}=1γγ{(iγSiγ/n)(i(γSiγ)exp(xiβ)/n)i(γSiγ)2/n(iγSiγ/n)2(iexp(xiβ)/n)(i(γSiγ)2/n)i(γSiγ)2/n(iγSiγ/n)2}.

In order to prove Theorem 3.2, the following lemma is introduced.

Lemma A.2. For given (γ, β), let Tmin = mini Ti, as Tmin → ∞, fori ∈ {1, …, n},

E(|ϕ^i2ψ^i2δ^i2ϕi2ψi2δi2|)0.

Then, as n, Tmin → ∞,

E(|ϕ^2ψ^2δ^2ϕ2ψ2δ2|)0.

Proof.

ϕ^i2ψ^i2δ^i2ϕi2ψi2δi2=ϕ^i2ψ^i2δi2ϕi2ψi2δ^i2δ^i2δi2.

Let ai2=|ϕ^i2ψ^i2δi2ϕi2ψi2δ^i2|, κ1 = 2 and κ2 = 2. First need to verify the assumptions in Lemma A.1.

|ϕ^i2ψ^i2δ^i2ϕi2ψi2δi2|ϕ^i2ψ^i2δ^i2+ϕi2ψi2δi2ϕ^i2+ϕi2δ^i2+δi22(δ^i2+δi2),  a.s.. 

Furthermore,

E(|ϕ^i2ψ^i2δi2ϕi2ψi2δ^i2|)=E{|(ϕ^i2ψ^i2ϕi2ψi2)δi2ϕi2ψi2(δ^i2δi2)|}=E{|(ϕ^i2ϕi2)(ψ^i2ψi2)δi2+ϕi2(ψ^i2ψi2)δi2+(ϕ^i2ϕi2)ψi2δi2ϕi2ψi2(δ^i2δi2)|}E(ϕ^i2ϕi2)2E(ψ^i2ψi2)2δi2+ϕi2E|ψ^i2ψi2|δi2+E|ϕ^i2ϕi2|ψi2δi2ϕi2ψi2E|δ^i2δi2|.

The right-hand side converges to zero. Therefore, Eai20, conditions in Lemma A.1 are satisfied. Therefore,

E|ϕ^i2ψ^i2δ^i2ϕi2ψi2δi2|0.

Analogously, it can be shown that

E|ϕ^2ψ^2δ^2ϕ2ψ2δ2|0.

Next, we prove Theorem

Proof. Let αi = (γΣiγ)(γSiγ) − (γγ)}2 and α=i=1nαi/n. E(αi)=exp2(xiβ)μ2(γγ)2, then

Eα=1ni=1nexp2(xiβ)μ2(γγ)=ϕ2.

First, need to prove that αϕ2 converges to zero in quadratic mean.

Var(αi)=Var{(γΣiγ)(γSiγ)μ2(γγ)2}=Var{(γΣiγ)(γSiγ)}+Var{μ2(γγ)2}2Cov{(γΣiγ)(γSiγ),μ2(γγ)2}=Var{(γΣiγ)(γSiγ)}.
(γΣiγ)(γSiγ)=λiji(1Tit=1Tizitji2).
Var{(γΣiγ)(γSiγ)}=Var{1Tit=1Tiλijizitji2}=1TiVar(λijizi1ji2)1TiE(λijizi1ji2)21TiEλiji2zi1ji41Ti(Ezi1ji2)2Ezi1ji41Ti(Ezi1ji4)21TiEzi1ji8C2Ti.
Var(α)=1n2i=1nVar(αi)C2n2i=1n1Ti0, as Tmin=min iTi.

This proves that αϕ2 converges to 0 in quadratic mean. In the following, we prove that Si* is a consistent estimator of Σi**.

Si*=ψ^2δ^2μI+ϕ^2δ^2Si=δ^2ψ^2δ^2μI+ϕ^2δ^2Si=μI+ϕ^2δ^2(SiμI).Σi**=ρ1I+ρ2Si=(ρ1+ρ2μ)I+ρ2(SiμI).
1ni=1nSi*Σi**2=1ni=1n(μρ1ρ2μ)I+(ϕ^2δ^2ρ2)(SiμI)2=1ni=1n{maxγ01γγ(μρ1ρ2μ)(γγ)+(ϕ^2δ^2ρ2)(γSiγμ(γγ))2}maxγ0{(μρ1ρ2μ)2(γγ)+1γγ(ϕ^2δ^2ρ2)2δ^i2+2(μρ1ρ2μ)(ϕ^2δ^2ρ2)(1ni=1nγSiγμ(γγ))}.(μρ1ρ2μ)2=(iγSiγ/niexp(xiβ)/n)2(γγ)2{(iγSiγ/n)2i(γSiγ)2/n}2.{(iγSiγ/n)(iexp(xiβ)/n)i(γSiγ)exp(xiβ)/n}2(γγ)2{(iγSiγ/n)2i(γSiγ)2/n}2
E{1niγSiγ1niexp(xiβ)}2=1n2iE{γSiγexp(xiβ)}2+1n2iiE{γSiγexp(xiβ)}{γSiγexp(xiβ)}.
E{γSiγexp(xiβ)}2=E{γSiγE(γSiγ)}2=Var(γSiγ)=1TiEzi1ji4+Ti(Ti1)Ti2(γΣiγ)2(γΣiγ)2Ti0.

It is assumed that the samples/subjects are independent, therefore,

E{γSiγexp(xiβ)}{γSiγexp(xiβ)}=0.

Thus,

E{1niγSiγ1niexp(xiβ)}20, as Tmin.
E{(1niγSiγ)(1niexp(xiβ))1ni(γSiγ)exp(xiβ)}2E(1niγSiγ)2(1niexp(xiβ))2+E{1ni(γSiγ)exp(xiβ)}2.
E(1niγSiγ)2=1n2iE(γSiγ)2+1n2iiE(γSiγ)(γSiγ)=1n2i{1Ti2Ezitji4+1Ti2tsEzitji2zisji2}+1n2ii(1Ti2t=1TiEzitji2)(1Ti2t=1TiEzitji2)=1n2i{1TiEzi1ji4+Ti(Ti1)Ti2(γΣiγ)2}+1n2ii(1Ti(γΣiγ))(1Ti(γΣiγ))Tmin1n2i(γΣiγ)2.
E{1ni(γSiγ)exp(xiβ)}2=1n2iiE(γSiγ)2exp2(xiβ)+1n2iiE(γSiγ)exp(xiβ)E(γSiγ)exp(xiβ)=1n2i{1TiEzitji4+Ti(Ti1)Ti2(γΣiγ)2}(γΣiγ)2+1n2ii(γΣiγ)2(γΣiγ)2Tmin1n2i(γΣiγ)4+1n2ii(γΣiγ)2(γΣiγ)2.
E{(1niγSiγ)(1niexp(xiβ))1ni(γSiγ)exp(xiβ)}2E(1niγSiγ)2(1niexp(xiβ))2+E{1ni(γSiγ)exp(xiβ)}2Tmin1n2i(γΣiγ)2(1ni(γΣiγ))2+1n2i(γΣiγ)4+1n2ii(γΣiγ)2(γΣiγ)2.

The above quantity on the right is bounded by a constant from above. Therefore, as Tmin → ∞,

(μρ1ρ2μ)20.
(ϕ^2δ^2ρ2)2=(ϕ^2δ^2ϕ2δ^2)2+(ϕ2δ^2αδ^2)2+(αδ^2ρ2)2.

Since δ^4 is bounded,

E(ϕ^2ϕ2)20E(ϕ^2δ^2ϕ2δ^2)20.
E(ϕ2α)20E(ϕ2δ^2αδ^2)20.

Let ρ2=ρ2(1)/ρ2(2), where

ρ2(1)=1ni(γSiγ)exp(xiβ)(1niγSiγ)(1niexp(xiβ)),ρ2(2)=1ni(γSiγ)2(1niγSiγ)2.
E(αρ2(1))2=(1niexp(xiβ))2E{1ni(γSiγ)1niexp(xiβ)}20.
δ^2=1ni=1n{γSiγμ(γγ)}2=1ni=1n(γSiγ)22(1ni=1nγSiγ)(1ni=1nexp(xiβ))+(1ni=1nexp(xiβ))2.

It can be concluded that as Tmin → ∞,

E(δ^ρ22)2=E{1ni=1n(γSiγ)1ni=1nexp(xiβ)}20,

and

E(ϕ^2δ^2ρ2)20.E{1ni=1nSi*Σi**2}0,ESi*Σi**20.

This implies that

E{γSi*γexp(xiβ)}2E{γΣi**γexp(xiβ)}20.

A.7. Proof of Theorem 3.3

Proof. For the first statement,

limTmininfTiTmin[1ni=1nE{γΣ^iγexp(xiβ)}21ni=1nE{γSi*γexp(xiβ)}2]inf[1ni=1nE{γΣ^iγexp(xiβ)}21ni=1nE{γΣi**γexp(xiβ)}2]+lim[1ni=1nE{γΣi**γexp(xiβ)}21ni=1nE{γSi*γexp(xiβ)}2].

By Theorem 3.2, the second term on the right converges to zero, and the first term is ≥ 0 by the definition of Σi**.

For the second statement,

limTmin[1ni=1nE{γΣ^iγexp(xiβ)}21ni=1nE{γSi*γexp(xiβ)}2]=0limTmin[1ni=1nE{γΣ^iγexp(xiβ)}21ni=1nE{γΣi**γexp(xiβ)}2]=0limTminE{γΣ^iγexp(xiβ)}2E{γΣi**γexp(xiβ)}2=0limTminEγΣ^iγγΣi**γ2=0limTminEγΣ^iγγSi*γ2=0limTminEΣ^iSi*2=0.

This finishes the proof of this theorem.□

A.8. Si* is well-conditioned

In this section, we show that the proposed estimator Si* is well-conditioned and thus, invertible. This is achieved by two steps: for i = 1, …, n, (1) prove that the largest eigenvalue of Si* is bounded in probability; (2) prove that the smallest eigenvalue of Si* is bounded away from zero in probability. The proof follows the same strategy as in Ledoit and Wolf [24], but considers the case with multiple covariance matrices.

The covariance matrix Σi has the eigendecomposition as Σi=ΠiΛiΠi. Let Ui=Λi1/2Yi. Denote λmax(A) and λmin(A) as the maximum and minimum eigenvalue of a matrix A, respectively.

λmax(Si*)=λmax(ψ^2δ^2μI+ϕ^2δ^2Si)=ψ^2δ^2μ+ϕ^2δ^2λmax(Si).
μ=1ni=1nexp(xiβ)=1ni=1nλijimax iλmax(Λi).
λmax(Si)=λmax(1TiΛ1/2UiUiΛi1/2)λmax(1TiUiUi)λmax(Λi)λmax(1TiUiUi)max iλmax(Λi).

Assume that p/Tmax converges to a limit, denoted as c. Based on Assumption A1, cC1. Based on the results in Yin et al. [39], as Tmin = mini Ti → ∞, for i = 1, …, n,

lim  λmax(1TiUiUi)=(1+c)2,  a.s.

This implies that

{λmax(Si*)(1+c)2maxiλmax(Λi)}1,

and

{λmax(Si*)(1+C1)2maxiλmax(Λi)}1.

Therefore, if p/Tmax converges to a constant, the largest eigenvalue of Si* is bounded in probability. If p/Tmax has no limit, under Assumption A1, there exists a subsequence such that p/Tmax converges. Along this sequence, the largest eigenvalue of Si* is bounded in probability. This is true for any converging sequence, and in addition, the upper bound is independent of the particular subsequence. As a result, it holds for the whole sequence.

Next, we show that the smallest eigenvalue of Si* is bounded away from zero in probability. Analogously, we have

λmin(Si)=λmin(1TiΛ1/2UiUiΛi1/2)λmin(1TiUiUi)λmin(Λi)λmin(1TiUiUi)miniλmin(Λi).

First, assume p/Tmax converges to a constant c. If c ∈ (0, 1), based on the results in Bai and Yin [4],

lim  λmin(1TiUiUi)=(1c)2,   a.s.

Assume c ≤ 1 − κ for some κ ∈ (0, 1). One can conclude that

{λmin(Si*)(11κ)2miniλmin(Λi)}1.

When c > 1 − κ, we propose to identify a lower bound from the following

λmin(Si*)=λmin(ψ^2δ^2μI+ϕ^2δ^2Si)ψ^2δ^2μ.

Compare the right-hand side in the above to it population counterpart,

ψ^2δ^2μψ2δ2μ=μ{ψ^2ψ2δ2+ψ^2(1δ^21δ2)}.

From Lemmas 3.1 and 3.2, we can show that the above converges to zero in probability. First, consider ψ2=i=1nψi2/n, where ψi2=E{γ(SiΣi)γ}2. From the proof of Lemma 3.1,

ESiΣi2=1pTij=1pk=1pE(zi1j2zi1k2)1pTij=1pk=1pλijk2=pTi{1p2j=1pk=1pE(zi1j2zi1k2)}1pTij=1pλijj2.

As Tmin → ∞, the second term on the right-hand side converges to zero. For ϵ > 0, there exists a constant M > 0 such that when Tmin > M, j=1pλijj2/(pTi)<ϵ. Thus, ψi2(1κ)ϵ and ψ2 ≥ (1 − κ) − ϵ.

λmin(Si*)ψ^2δ^2μ=ψ2δ2μ+(ψ^2δ^2μψ2δ2μ)ψ2δ2μϵψ22C2+C1C2ϵ(1κ)ϵ2C2+C1C2ϵ.

For a choice of ϵ, we have

{λmin(Si*)1κ2(2C2+C1C2)}1.

Therefore, for both c ≤ 1 − κ and c > 1 − κ, the smallest eigenvalue of Si* is bounded away from zero. Analogous to the proof of the largest eigenvalue, for the case that p/Tmax does not have a limit, we can also have the conclusion for the whole sequence. Since both the largest and the smallest eigenvalues are bounded, Si* is well-conditioned and invertible.

A.9. Proof of Lemma 3.3 and Theorem 3.4

We first prove Lemma 3.3.

Proof.

\[
\mathrm{E}(\gamma^{\top}\Sigma_{i}^{*}\gamma)=\frac{\psi^{2}}{\delta^{2}}\mu(\gamma^{\top}\gamma)+\frac{\phi^{2}}{\delta^{2}}\mathrm{E}(\gamma^{\top}S_{i}\gamma)=\frac{\psi^{2}}{\delta^{2}}\mu(\gamma^{\top}\gamma)+\frac{\phi^{2}}{\delta^{2}}\exp(x_{i}^{\top}\beta)=\exp(x_{i}^{\top}\beta^{*}),
\]
\[
\frac{\sum_{i}\exp(x_{i}^{\top}\beta^{*})/n}{\sum_{i}\exp(x_{i}^{\top}\beta)/n}=\frac{\psi^{2}\mu(\gamma^{\top}\gamma)/\delta^{2}}{\sum_{i}\exp(x_{i}^{\top}\beta)/n}+\frac{\phi^{2}}{\delta^{2}}=\frac{\psi^{2}}{\delta^{2}}+\frac{\phi^{2}}{\delta^{2}}=1,
\]
\[
\frac{1}{n}\sum_{i=1}^{n}\exp(x_{i}^{\top}\beta^{*})=\frac{1}{n}\sum_{i=1}^{n}\exp(x_{i}^{\top}\beta).
\]

Therefore,

β*=β.

Next, we prove that the proposed estimator β̂ is a consistent estimator of β (Theorem 3.4).

Proof. Using the consistency of the pseudo maximum likelihood estimator [16] and the conclusion in Lemma 3.3, β̂ is a consistent estimator of β. □
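For intuition about the pseudo-likelihood step, the sketch below estimates β for a fixed projection by minimizing a working normal log-likelihood of the form Σi Ti{xi⊤β + exp(−xi⊤β) γ⊤Si*γ} (up to constants and a factor of one half); the simulated projected variances, the coefficient values, and the optimizer call are illustrative, and the paper's exact estimating procedure may differ in details.

```python
# A minimal sketch of a pseudo-likelihood fit of beta given a fixed projection:
# conditioning on projected variances v_i (stand-ins for gamma' S_i* gamma), the
# working normal likelihood gives the objective sum_i T * {x_i'beta + exp(-x_i'beta) * v_i}.
# Data generation and the BFGS call are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, q, T = 200, 2, 100
beta_true = np.array([0.5, 1.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
proj_var = np.array([                                    # stand-ins for gamma' S_i* gamma
    np.mean(rng.normal(scale=np.exp(X[i] @ beta_true / 2), size=T) ** 2)
    for i in range(n)
])

def neg_loglik(beta):
    eta = X @ beta
    return np.sum(T * (eta + np.exp(-eta) * proj_var))

beta_hat = minimize(neg_loglik, x0=np.zeros(q), method="BFGS").x
print("beta_hat =", np.round(beta_hat, 3))   # approaches beta_true as n, T grow
```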

Appendix B: Additional Simulation Results

B.1. γ unknown

Here, we present the performance of estimating the fourth dimension (D4) when γ is unknown (Figure B.1). From the figure, as n and T increase, the estimates of the covariance matrices, the projection, and the model coefficient converge to the truth.

B.2. Model misspecification

B.2.1. Model misspecification in β

In this section, we examine the performance of the proposed approach when the log-linear model (1.1) is misspecified. For illustration, we consider the case where the data dimension is p = 20, the sample size is n = 100, and Ti = T = 100. Two scenarios are considered. In the first scenario, the true model has two correlated covariates generated from a bivariate normal distribution with mean zero, standard deviation 0.5, and correlation 0.2:

\[
\log(\lambda_{ij})=\beta_{0}+\beta_{1}x_{i1}+\beta_{2}x_{i2}. \tag{B.1}
\]

In D2, |β1| = |β2| and in D4, |β1| = 2|β2|. Under the misspecified case, the second covariate, xi2, is ignored. Table B.1 presents the results using the proposed approach.
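A hedged sketch of this first scenario is given below: two correlated covariates drive the log eigenvalue as in (B.1), and the misspecified fit drops xi2, producing the omitted-variable bias reported in Table B.1. The eigenstructure, coefficient values, and the simple per-component least-squares fit are illustrative assumptions, not the full PS-CAP procedure.

```python
# A hedged sketch of the first misspecification scenario: two correlated
# covariates drive the log eigenvalue as in (B.1); the misspecified fit drops x_i2.
# Coefficient values and the simple least-squares fit are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
n, T = 100, 100
sd, rho = 0.5, 0.2
cov_x = sd**2 * np.array([[1.0, rho], [rho, 1.0]])
X = rng.multivariate_normal(np.zeros(2), cov_x, size=n)      # (x_i1, x_i2)

beta0, beta1, beta2 = 0.0, 1.0, 0.5                           # illustrative coefficient values
log_lam = beta0 + beta1 * X[:, 0] + beta2 * X[:, 1]           # model (B.1) for one component

# Observed projected variance for each subject (target eigenvalue + sampling noise).
y_var = np.array([
    np.mean(rng.normal(scale=np.exp(log_lam[i] / 2), size=T) ** 2)
    for i in range(n)
])

# Misspecified fit: regress the log variance on x_i1 only.
Z_mis = np.column_stack([np.ones(n), X[:, 0]])
b_mis = np.linalg.lstsq(Z_mis, np.log(y_var), rcond=None)[0]
print("misspecified beta1 estimate:", round(b_mis[1], 3))     # biased relative to beta1
```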

Fig B.1.

Estimation performance of PS-CAP in estimating the fourth dimension (D4) when γ is unknown. For β̂1, (a) bias, (b) mean squared error (MSE), and (c) coverage probability (CP) are presented, where CP is obtained from 500 bootstrap samples. For the eigenvalues λ̂ij, (d) MSE is presented. For γ̂, (e) the similarity to π4 is presented. Data dimension p = 100. Sample sizes vary over n = 50, 100, 500, 1000 and Ti = T = 50, 100, 500, 1000.

In the second scenario, the following log-linear model for the eigenvalues, which includes an interaction between the covariates, is considered:

\[
\log(\lambda_{ij})=\beta_{0}+\beta_{1}x_{i1}+\beta_{2}x_{i2}+\beta_{3}(x_{i1}\times x_{i2}), \tag{B.2}
\]

where xi1 is generated from a Bernoulli distribution with probability 0.5 of being one and xi2 is generated from a normal distribution with mean zero and standard deviation 0.5. Table B.2 presents the estimation results using the proposed method. Under the misspecified case, the interaction between the two covariates is ignored; thus, the table contains no estimate of β3.

From both tables, under either the correctly specified or the misspecified model, the proposed approach correctly identifies the components related to the covariates. Under the misspecified model, the estimate of β is biased.
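The following sketch mimics the second scenario: the log eigenvalue follows (B.2) with a Bernoulli covariate, a normal covariate, and their interaction, and the misspecified fit omits the interaction term. The coefficient values and the simple least-squares fits are illustrative, not the paper's procedure.

```python
# A hedged sketch of the second scenario (B.2): a Bernoulli covariate, a normal
# covariate, and their interaction drive the log eigenvalue; the misspecified
# fit omits the interaction. Coefficient values and fits are illustrative.
import numpy as np

rng = np.random.default_rng(4)
n, T = 100, 100
x1 = rng.binomial(1, 0.5, size=n)
x2 = rng.normal(scale=0.5, size=n)
beta = np.array([0.0, 1.0, 0.5, 0.8])                      # illustrative (beta0, beta1, beta2, beta3)
log_lam = beta[0] + beta[1] * x1 + beta[2] * x2 + beta[3] * x1 * x2

y_var = np.array([
    np.mean(rng.normal(scale=np.exp(log_lam[i] / 2), size=T) ** 2) for i in range(n)
])

Z_full = np.column_stack([np.ones(n), x1, x2, x1 * x2])
Z_mis = Z_full[:, :3]                                      # interaction ignored
b_full = np.linalg.lstsq(Z_full, np.log(y_var), rcond=None)[0]
b_mis = np.linalg.lstsq(Z_mis, np.log(y_var), rcond=None)[0]
print("full fit:", np.round(b_full, 3), " misspecified fit:", np.round(b_mis, 3))
```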

B.2.2. Model misspecification in γ

In this section, we discuss the robustness of the proposed approach to violations of the assumption that all the covariance matrices share the same eigenspace. One advantage of the proposed shrinkage estimator is that it does not change the eigenvectors relative to the sample covariance matrix. In Section E.6 of the supplementary materials of Zhao et al. [41], performance under a partial common diagonalization assumption was examined through a simulation study in which two eigencomponents are shared across subjects and the rest are unique to each subject. That method correctly identifies the shared component that is related to the covariates. Since the approach proposed in this study preserves the eigenstructure, under partial common diagonalization it will also correctly identify the common component that is related to the covariates.

Here, we also consider a case in which each covariance matrix has a unique eigenspace; that is, the covariance matrices are generated using the eigendecomposition Σi = ΠiΛiΠi⊤, where Πi is an orthonormal matrix in ℝp×p, for i = 1, …, n. The remaining simulation settings are the same as in Section 4.2. Table B.3 presents the results when p = 20, n = 100, and Ti = T = 100. For the estimation of γ, we compare with the average of the subject-specific eigenvectors (after scaling to unit 2-norm). For D2, though the correlation between the estimated γ and this average is 0.915, the estimate of β1 is off. For D4, both the estimate of γ and the estimate of β1 are far from the truth. Therefore, the assumption of partial common diagonalization is essential for the proposed framework.
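For concreteness, the snippet below sketches the comparison metric used in Table B.3: subject-specific eigenvectors of the targeted component are sign-aligned, averaged, rescaled to unit 2-norm, and compared with an estimated projection through the absolute inner product. The random eigenbases, the placeholder γ̂, and the component index are purely illustrative.

```python
# A hedged sketch of the Table B.3 comparison metric: average the subject-specific
# eigenvectors of the targeted component (after sign alignment), rescale to unit
# 2-norm, and compute |<gamma_hat, pi_bar>|. All inputs here are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(5)
n, p, j = 100, 20, 3                                  # j: index of the targeted component
Pis = [np.linalg.qr(rng.standard_normal((p, p)))[0] for _ in range(n)]  # subject eigenbases

# Align signs before averaging (eigenvectors are defined only up to sign).
ref = Pis[0][:, j]
vecs = np.array([Pi[:, j] * np.sign(Pi[:, j] @ ref) for Pi in Pis])
pi_bar = vecs.mean(axis=0)
pi_bar /= np.linalg.norm(pi_bar)                      # scale to unit 2-norm

gamma_hat = rng.standard_normal(p)                    # placeholder for the estimated projection
gamma_hat /= np.linalg.norm(gamma_hat)
similarity = abs(gamma_hat @ pi_bar)                  # |<gamma_hat, pi_bar_j>|
print("similarity:", round(similarity, 3))
```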

Appendix C: Additional Analysis of the ADNI Study

C.1. Validity of model assumptions

In resting-state fMRI studies, the output data are generally considered normally distributed. Within each time course, the data are temporally correlated up to at most lag two [26]. Thus, we subsample the data to remove the temporal correlation. Figure C.1 presents the normal Q-Q plot and the histogram of the data extracted from one brain region of one subject. From the figure, the marginal distribution is close to normal; thus, the normality assumption is reasonably satisfied.
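The sketch below illustrates this preprocessing step on a synthetic time course whose correlation vanishes beyond lag two: keeping every third time point removes the short-range dependence, and the retained values can then be checked for normality. The MA(2) coefficients and the Shapiro-Wilk check are illustrative choices, not the paper's pipeline.

```python
# A hedged sketch of the subsampling step: for a series with correlation of at
# most lag two (here a synthetic MA(2) process), keeping every third time point
# yields approximately independent values, which can then be checked for normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
T = 600
e = rng.normal(size=T + 2)
y = e[2:] + 0.4 * e[1:-1] + 0.2 * e[:-2]      # MA(2): correlation vanishes beyond lag 2
y_sub = y[::3]                                # keep every third acquisition
lag1 = np.corrcoef(y_sub[:-1], y_sub[1:])[0, 1]
print("lag-1 autocorrelation after subsampling:", round(lag1, 3))
print("Shapiro-Wilk normality p-value:", round(stats.shapiro(y_sub).pvalue, 3))
```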

In Section 3, five assumptions are imposed to achieve estimation consistency of the parameters. By setting C1 = 2, Assumption A1 is satisfied. In the fMRI dataset, the total number of observations across subjects (i.e., N = T1 + ⋯ + Tn) can increase faster than the number of variables (p); thus, Assumption A2 can be satisfied. Under the normality assumption, the eighth-order moment exists, and Assumptions A3 and A4 are valid. Assumption A5 concerns the population eigenvalues. We cannot easily assess this assumption using sample covariance matrices because of their large bias under the high-dimensional setting [22]; we can therefore provide only an empirical examination, whose results should be interpreted with caution due to this bias. To empirically assess the validity of Assumption A5, we first calculate the average sample covariance matrix and then compare its eigenvectors with the eigenvectors of each individual's sample covariance matrix. When the correlation between two eigenvectors is greater than 0.5, we declare a high similarity, allowing for variability and bias in the sample eigenvectors. About 67% of the eigenvectors show high similarity across subjects. Since the individual sample covariance matrices are rank-deficient, the eigenvectors are not unique; with about 67% overlap, the assumption of a common eigenstructure is partially satisfied. In addition, as discussed in Section B.2.2, a partially common eigenstructure does not impact the identification of the common components that are related to the covariates. The proposed approach identifies three components based on the metric of average deviation from diagonality, suggesting that these three components commonly diagonalize the covariance matrices. Finally, the assumption that the log-linear model is correctly specified is challenging to validate using data alone; the current model is specified based on domain knowledge and the study interest of AD research.
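The following snippet sketches the empirical check of Assumption A5 described above: it compares the eigenvectors of the average sample covariance matrix with each subject's sample eigenvectors, flags a pair as highly similar when the absolute inner product (a stand-in for the correlation used in the text) exceeds 0.5, and reports the overall proportion. The synthetic data share an exact common population eigenbasis, so only sampling variability drives mismatches; the matching rule is an illustrative assumption.

```python
# A hedged sketch of the empirical assessment of a common eigenstructure: compare
# eigenvectors of the average sample covariance with each subject's sample
# eigenvectors and report the proportion with absolute inner product > 0.5.
import numpy as np

rng = np.random.default_rng(7)
n, p, T = 30, 40, 60
Pi = np.linalg.qr(rng.standard_normal((p, p)))[0]       # common population eigenbasis
S_list = []
for _ in range(n):
    lam = rng.uniform(0.5, 3.0, size=p)
    Y = rng.standard_normal((T, p)) @ (Pi * np.sqrt(lam)).T   # rows ~ N(0, Pi diag(lam) Pi')
    S_list.append(Y.T @ Y / T)

S_bar = np.mean(S_list, axis=0)
V_bar = np.linalg.eigh(S_bar)[1]                        # eigenvectors of the average covariance

high_sim = []
for S in S_list:
    V = np.linalg.eigh(S)[1]
    # For each average eigenvector, find its best-matching subject eigenvector.
    sims = np.abs(V_bar.T @ V).max(axis=1)
    high_sim.append(sims > 0.5)
print("proportion of highly similar eigenvectors:", round(float(np.mean(high_sim)), 2))
```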

Table B.1.

Bias and mean squared error (MSE) in estimating β, and the similarity of γ̂ to πj with the standard error (SE), for j = 2, 4, under the misspecified and correctly specified versions of model (B.1). Data dimension p = 20, sample size n = 100, and Ti = T = 100.

                          β̂1              β̂2               γ̂
                          Bias     MSE    Bias     MSE     |⟨γ̂, πj⟩| (SE)
D2   Misspecified         0.105    0.014  -        -       0.994 (0.003)
     Correctly specified  0.002    0.001  <0.001   0.001   0.993 (0.003)
D4   Misspecified         −0.081   0.008  -        -       0.991 (0.004)
     Correctly specified  −0.015   0.001  0.008    0.001   0.983 (0.009)

Table B.2.

Bias and mean squared error (MSE) in estimating β, and the similarity of γ̂ to πj with the standard error (SE), for j = 2, 4, under the misspecified and correctly specified versions of model (B.2). Data dimension p = 20, sample size n = 100, and Ti = T = 100.

                          β̂1              β̂2               β̂3              γ̂
                          Bias     MSE    Bias     MSE     Bias     MSE    |⟨γ̂, πj⟩| (SE)
D2   Misspecified         <0.001   0.002  −0.252   0.066   -        -      0.993 (0.003)
     Correctly specified  0.002    0.001  −0.002   0.002   −0.001   0.003  0.993 (0.003)
D4   Misspecified         −0.010   0.001  −0.113   0.014   -        -      0.987 (0.006)
     Correctly specified  −0.010   0.001  0.015    0.002   −0.005   0.003  0.988 (0.006)

Table B.3.

Bias and mean squared error (MSE) in estimating β1, and the similarity of γ̂ to the average of πij (denoted π̄j) with the standard error (SE), for j = 2, 4, when each covariance matrix has a unique eigenspace. Data dimension p = 20, sample size n = 100, and Ti = T = 100.

          β̂1                              γ̂
     Truth    Bias      MSE     |⟨γ̂, π̄j⟩| (SE)
D2   −1       0.980     0.971   0.915 (0.031)
D4   1        −0.523    0.293   0.571 (0.060)

Fig C.1.

Normal Q-Q plot and histogram of the data extracted from one brain region of one subject.

Contributor Information

Yi Zhao, Department of Biostatistics and Health Data Science, Indiana University School of Medicine.

Brian Caffo, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health.

Xi Luo, Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston.

References

[1] Anderson T (1973). Asymptotically efficient estimation of covariance matrices with linear structure. The Annals of Statistics 1 135–141.
[2] Anderson TW (1963). Asymptotic theory for principal component analysis. The Annals of Mathematical Statistics 34 122–148.
[3] Badhwar A, Tam A, Dansereau C, Orban P, Hoffstaedter F and Bellec P (2017). Resting-state network dysfunction in Alzheimer’s disease: a systematic review and meta-analysis. Alzheimer’s & Dementia: Diagnosis, Assessment & Disease Monitoring 8 73–85.
[4] Bai Z and Yin Y (1993). Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. The Annals of Probability 1275–1294.
[5] Boik RJ (2002). Spectral models for covariance matrices. Biometrika 89 159–182.
[6] Cai TT, Ren Z and Zhou HH (2016). Estimating structured high-dimensional covariance and precision matrices: Optimal rates and adaptive estimation. Electronic Journal of Statistics 10 1–59.
[7] Chen AA, Beer JC, Tustison NJ, Cook PA, Shinohara RT and Shou H (2020). Removal of scanner effects in covariance improves multivariate pattern analysis in neuroimaging data. bioRxiv 858415.
[8] Chen Y, Wiesel A and Hero AO (2011). Robust shrinkage estimation of high-dimensional covariance matrices. IEEE Transactions on Signal Processing 59 4097–4107.
[9] Chiu TY, Leonard T and Tsui K-W (1996). The matrix-logarithmic covariance model. Journal of the American Statistical Association 91 198–210.
[10] Corder EH, Saunders AM, Strittmatter WJ, Schmechel DE, Gaskell PC, Small G, Roses AD, Haines J and Pericak-Vance MA (1993). Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer’s disease in late onset families. Science 261 921–923.
[11] Daniels MJ and Kass RE (2001). Shrinkage estimators for covariance matrices. Biometrics 57 1173–1184.
[12] De Marco M and Venneri A (2017). ApoE-dependent differences in functional connectivity support memory performance in early-stage Alzheimer’s disease (P4.094). Neurology 88.
[13] Flury BN (1984). Common principal components in k groups. Journal of the American Statistical Association 79 892–898.
[14] Fox EB and Dunson DB (2015). Bayesian nonparametric covariance regression. Journal of Machine Learning Research 16 2501–2542.
[15] Franks AM and Hoff P (2019). Shared subspace models for multi-group covariance estimation. Journal of Machine Learning Research 20 1–37.
[16] Gong G and Samaniego FJ (1981). Pseudo maximum likelihood estimation: theory and applications. The Annals of Statistics 861–869.
[17] Gour N, Felician O, Didic M, Koric L, Gueriot C, Chanoine V, Confort-Gouny S, Guye M, Ceccaldi M and Ranjeva JP (2014). Functional connectivity changes differ in early and late-onset Alzheimer’s disease. Human Brain Mapping 35 2978–2994.
[18] Gour N, Ranjeva J-P, Ceccaldi M, Confort-Gouny S, Barbeau E, Soulier E, Guye M, Didic M and Felician O (2011). Basal functional connectivity within the anterior temporal network is associated with performance on declarative memory tasks. NeuroImage 58 687–697.
[19] Grosenick L, Klingenberg B, Katovich K, Knutson B and Taylor JE (2013). Interpretable whole-brain prediction analysis with GraphNet. NeuroImage 72 304–321.
[20] Hoff PD (2009). A hierarchical eigenmodel for pooled covariance estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 971–992.
[21] Hoff PD and Niu X (2012). A covariance regression model. Statistica Sinica 22 729–753.
[22] Johnstone IM and Lu AY (2009). On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association 104.
[23] Koch W, Teipel S, Mueller S, Benninghoff J, Wagner M, Bokde AL, Hampel H, Coates U, Reiser M and Meindl T (2012). Diagnostic power of default mode network resting state fMRI in the detection of Alzheimer’s disease. Neurobiology of Aging 33 466–478.
[24] Ledoit O and Wolf M (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis 88 365–411.
[25] Ledoit O and Wolf M (2012). Nonlinear shrinkage estimation of large-dimensional covariance matrices. The Annals of Statistics 40 1024–1060.
[26] Lindquist MA (2008). The statistical analysis of fMRI data. Statistical Science 23 439–464.
[27] Mejia AF, Nebel MB, Barber AD, Choe AS, Pekar JJ, Caffo BS and Lindquist MA (2018). Improved estimation of subject-level functional connectivity using full and partial correlation with empirical Bayes shrinkage. NeuroImage 172 478–491.
[28] Pascal F, Chitour Y and Quek Y (2014). Generalized robust shrinkage estimator and its application to STAP detection problem. IEEE Transactions on Signal Processing 62 5640–5651.
[29] Pervaiz U, Vidaurre D, Woolrich MW and Smith SM (2020). Optimising network modelling methods for fMRI. NeuroImage 211 116604.
[30] Pourahmadi M, Daniels MJ and Park T (2007). Simultaneous modelling of the Cholesky decomposition of several covariance matrices. Journal of Multivariate Analysis 98 568–587.
[31] Rahim M, Thirion B and Varoquaux G (2017). Population-shrinkage of covariance to estimate better brain functional connectivity. In International Conference on Medical Image Computing and Computer-Assisted Intervention 460–468. Springer.
[32] Rahim M, Thirion B and Varoquaux G (2019). Population shrinkage of covariance (PoSCE) for better individual brain functional-connectivity estimation. Medical Image Analysis 54 138–148.
[33] Safieh M, Korczyn AD and Michaelson DM (2019). ApoE4: an emerging therapeutic target for Alzheimer’s disease. BMC Medicine 17 1–17.
[34] Seiler C and Holmes S (2017). Multivariate heteroscedasticity models for functional brain connectivity. Frontiers in Neuroscience 11 696.
[35] Smith SM, Jenkinson M, Woolrich MW, Beckmann CF, Behrens TE, Johansen-Berg H, Bannister PR, De Luca M, Drobnjak I, Flitney DE et al. (2004). Advances in functional and structural MR image analysis and implementation as FSL. NeuroImage 23 S208–S219.
[36] Tibshirani R, Saunders M, Rosset S, Zhu J and Knight K (2005). Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 91–108.
[37] Tyler DE (1987). A distribution-free M-estimator of multivariate scatter. The Annals of Statistics 234–251.
[38] Varoquaux G, Gramfort A, Poline J-B and Thirion B (2010). Brain covariance selection: better individual functional connectivity models using population prior. In Advances in Neural Information Processing Systems 2334–2342.
[39] Yin Y-Q, Bai Z-D and Krishnaiah PR (1988). On the limit of the largest eigenvalue of the large dimensional sample covariance matrix. Probability Theory and Related Fields 78 509–521.
[40] Zhao Y, Lindquist MA and Caffo BS (2020). Sparse principal component based high-dimensional mediation analysis. Computational Statistics & Data Analysis 142 106835.
[41] Zhao Y, Wang B, Mostofsky SH, Caffo BS and Luo X (2021). Covariate assisted principal regression for covariance matrix outcomes. Biostatistics 22 629–645.
[42] Zou H, Hastie T and Tibshirani R (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics 15 265–286.
[43] Zou T, Lan W, Wang H and Tsai C-L (2017). Covariance regression analysis. Journal of the American Statistical Association 112 266–281.
