Abstract
We consider asymptotically exact inference on the leading canonical correlation directions and strengths between two high-dimensional vectors under sparsity restrictions. Our main contribution is a novel representation of the Canonical Correlation Analysis problem, based on which one can operationalize a one-step bias correction on reasonable initial estimators. The resulting analytic guarantees are adaptive over suitable structural restrictions on the high-dimensional nuisance parameters, which, in this set-up, correspond to the covariance matrices of the variables of interest. We supplement the theoretical guarantees behind our procedures with extensive numerical studies.
Keywords: sparse canonical correlation analysis, asymptotically valid confidence intervals, one-step bias correction, high-dimensional nuisance parameters
1. Introduction
Statistical analysis of biomedical applications requires methods that can handle complex data structures. In particular, formal and systematic Exploratory Data Analysis (EDA) is an important first step when attempting to understand the relationship between high-dimensional variables. Key examples include eQTL mapping [20, 66] and epigenetics [26, 29, 30, 57]. In greater generality, EDA can be an essential part of any study involving the integration of multiple biological datasets, such as genetic markers, gene expression and disease phenotypes [37, 45]. In each of these examples, it is critically important to understand the relationships between the high-dimensional variables of interest. Linear relationships are often the most straightforward and intuitive models used in this regard, lending themselves well to interpretation. Consequently, a large volume of statistical literature has been devoted to exploring linear relationships through variants of the classical statistical toolbox of Canonical Correlation Analysis (CCA) [28]. We focus in this paper on some of the most fundamental inferential questions in the context of high-dimensional CCA.
To formally set up these inferential questions in the CCA framework, we consider i.i.d. data on two random vectors $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$ with joint covariance matrix
$$\Sigma = \begin{pmatrix} \Sigma_x & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_y \end{pmatrix}.$$
The first canonical correlation $\rho_1$ is defined as the maximum possible correlation between two linear combinations of $X$ and $Y$. More specifically, consider the following optimization problem:
$$\max_{u \in \mathbb{R}^p,\, v \in \mathbb{R}^q} u^{T}\Sigma_{xy}v \quad \text{subject to} \quad u^{T}\Sigma_x u = v^{T}\Sigma_y v = 1. \tag{1.1}$$
The maximum value attained in (1.1) is $\rho_1$, and the solutions to (1.1) are commonly referred to as the first canonical directions, which we will denote by $\alpha$ and $\beta$, respectively. This paper considers inference on $\alpha$, $\beta$, and associated quantities of interest. In most scientific problems, the first canonical correlation coefficient is of prime interest as it summarizes the 'maximum linear association' between $X$ and $Y$, thereby motivating our choice of inferential target.
Early developments in the theory and application of CCA have been well documented in the statistical literature, and we refer the interested reader to [1] and the references therein for further details. These classical results have been widely used to provide statistical inference (i.e. asymptotically valid hypothesis tests, confidence intervals and p-values) across a vast range of disciplines, such as psychology, agriculture, oceanography and others. However, the current surge of interest in CCA, motivated by modern high-throughput biological experiments, requires re-thinking several aspects of the traditional theory and methods. In particular, in many contemporary datasets, the number of samples is often comparable with, or much smaller than, the number of variables in the study. This renders vanilla CCA inconsistent and inadequate without further structural assumptions [3, 13, 48]. A natural constraint that has gained popularity is that of sparsity, i.e. when only an (unknown) small collection of variables, rather than the entire collection of high-dimensional variables, is associated with each other. Sparse Canonical Correlation Analysis (SCCA) [66] has been developed to target such low-dimensional structures and subsequently provide consistent estimation in the context of high-dimensional CCA. Although such structured CCA problems have witnessed renewed enthusiasm from both theoretical and applied communities, most papers have heavily focused on the key aspects of estimation (in suitable norms) and relevant scalable algorithms—see, for example, [19, 24, 25, 48, 49]. However, asymptotically valid inference is yet to be explored systematically in the context of SCCA. In particular, none of the existing estimation methods for SCCA lend themselves to uncertainty quantification, i.e. inference on $\rho_1$, $\alpha$ or $\beta$. This is unsurprising, given that the existing estimators are based on penalized methods. Thus, they are asymptotically biased, super-efficient for estimating zero coordinates and not tractable in terms of estimating underlying asymptotic distributions [42–44, 55]. This complicates the construction of asymptotically valid confidence intervals for $\rho_1$ and for the coordinates of $\alpha$ and $\beta$. In the absence of such intervals, bootstrap or permutation tests are typically used in practice [66]. However, these methods are often only empirically justified and might suffer from subtle pathological issues that underlie standard re-sampling techniques in penalized estimation frameworks [16–18]. This paper takes a step towards resolving these fundamental issues with inference in the context of SCCA.
1.1 Main contribution
The main result of this paper is a method to construct asymptotically valid confidence intervals for $\alpha$ and $\beta$. Our method is based on a one-step bias correction performed on preliminary sparse estimators of the canonical directions. The resulting bias-corrected estimators have an asymptotic linear influence function type expansion (see e.g. [60] for asymptotic influence function expansions) with $\sqrt{n}$-scaling (see Theorem 4.1 and Proposition 4.2) under suitable sparsity conditions on the truth. This representation is subsequently exploited to build confidence intervals for a variety of relevant lower dimensional functions of the top canonical directions; see Corollary 1 and Corollary 3 and the discussions that follow. Finally, we will show that the entire de-biased vector is asymptotically equivalent to a high-dimensional Gaussian vector in a suitably uniform sense; see Proposition 4.2, which enables the control of family-wise error rates.
The bias correction procedure crucially relies on a novel representation of $\alpha$ and $\beta$ as the unique minimizers (up to a sign flip) of a smooth objective (see Lemma 3.1), which may be of independent interest. The uniqueness criterion is indispensable here since, otherwise, a crucial local convexity property (see Lemma 3.2), which we exploit to deal with the high dimensionality of the problem, is not guaranteed. We also discuss why the commonly used representations of the top canonical directions are difficult to work with, owing either to the lack of such local convexity properties or to the lack of a non-cumbersome derivation of the one-step bias correction. We elaborate on these subtleties in Section 3.3.
Further, we pay special attention to adapting to underlying sparsity structures of the marginal precision matrices ($\Sigma_x^{-1}$ and $\Sigma_y^{-1}$) of the high-dimensional variables ($X$ and $Y$) under study. These serve as high-dimensional nuisance parameters in the SCCA problem. Consequently, our construction of asymptotically valid confidence intervals for the top canonical correlation strength and directions is agnostic over the structures (e.g. sparsity of the precision matrices of $X$ and $Y$) of these complex nuisance parameters. The de-biasing procedure can be implemented using our R package [31].
Finally, we supplement our methods for inference with suitable constructions of initial estimators of canonical correlation directions as well as nuisance parameters under suitable sparsity assumptions. The construction of these estimators, although motivated by existing ideas, requires careful modifications to tackle inference on the first canonical correlation strength and directions while treating remaining directions as nuisance parameters.
2. Mathematical formalism
In this section, we collect some assumptions and notations that will be used throughout the rest of the paper.
2.1 Structural assumptions
Throughout this paper, we will assume that $X$ and $Y$ are centred sub-Gaussian random vectors (see [63] for more details) with joint covariance matrix $\Sigma$ and sub-Gaussian norms bounded by some constant (see p. 28 of [62] for the definition). The sub-Gaussianity assumption is a standard requirement that can be found in the related literature [5, 25, 32, 39, 49]. Assuming that the data are sub-Gaussian, we can obtain tighter concentration bounds on the sample covariance matrix. Although we are not aware of any efficient method for testing the sub-Gaussianity of a random vector $X$ or $Y$, there are several well-known examples of sub-Gaussian vectors. For instance, a multivariate Gaussian distribution is sub-Gaussian if the maximum eigenvalue of its variance matrix is bounded [38]. Moreover, random vectors uniformly distributed on the Euclidean sphere in $\mathbb{R}^p$ with the origin as their centre and radius $\sqrt{p}$ are also sub-Gaussian. Another example is when a vector's elements are independent and have uniformly bounded sub-Gaussian norms, which is satisfied when each element is uniformly bounded [63]. We refer the readers to [63] for more examples.
We will let $\Sigma_{xy}$ have a fixed rank $r$ (implying that apart from $\rho_1$, there are $r - 1$ additional canonical correlations [1]). Since the cross-covariance matrix $\Sigma_{xy}$ has rank $r$, it can be shown that [cf. 19, 25]
$$\Sigma_{xy} = \Sigma_x U \Lambda V^{T} \Sigma_y, \tag{2.1}$$
where $U$ and $V$ are $p \times r$ and $q \times r$ dimensional matrices satisfying $U^{T}\Sigma_x U = I_r$ and $V^{T}\Sigma_y V = I_r$, respectively. The $\Lambda$ in (2.1) is a diagonal matrix, whose diagonal entries are the canonical correlations, i.e.
$$\Lambda = \operatorname{diag}(\rho_1, \rho_2, \ldots, \rho_r), \qquad \rho_1 \geq \rho_2 \geq \cdots \geq \rho_r > 0.$$
The matrices $U$ and $V$ need not be unique unless the canonical correlations, i.e. the $\rho_i$'s, are all distinct. Indeed, we will at the very least require the uniqueness of $\alpha$ and $\beta$. Otherwise, these quantities are not even identifiable. To that end, we will make the following assumption, which is common in the literature since it grants uniqueness of $\alpha$ and $\beta$ up to a sign flip [cf. 19, 25, 49].
Assumption 2.1.
(Eigengap Assumption) There exists
so that
for all
and
.
Note that Assumption 2.1 also implies that stays bounded away from zero. There exist formal tests for verifying the eigengap assumption in the asymptotic regime when
[3, 68]. However, to the best of our knowledge, no such tests currently exist for
. A possible way to inspect the eigengap in this scenario is to estimate the canonical correlations and plot them against their index, which is called a screeplot. However, even the methods that consistently estimate the canonical correlations in the
setting require the eigengap assumption [49]. Therefore, using a screeplot to assess the eigengap in this scenario may be unreliable.
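To illustrate the scree-plot diagnostic in a classical low-dimensional regime, the following minimal sketch (the sample size, dimensions and the way correlation is induced are all illustrative choices, not the paper's design) computes the sample canonical correlations as the square roots of the eigenvalues of $\hat\Sigma_x^{-1}\hat\Sigma_{xy}\hat\Sigma_y^{-1}\hat\Sigma_{yx}$ and plots them against their index:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, p, q = 500, 10, 8                                # low-dimensional example: n much larger than p, q
X = rng.standard_normal((n, p))
Y = 0.5 * X[:, :q] + rng.standard_normal((n, q))    # induce some correlation between X and Y

Sx = np.cov(X, rowvar=False)
Sy = np.cov(Y, rowvar=False)
Sxy = np.cov(np.hstack([X, Y]), rowvar=False)[:p, p:]

# Squared sample canonical correlations = eigenvalues of Sx^{-1} Sxy Sy^{-1} Syx.
M = np.linalg.solve(Sx, Sxy) @ np.linalg.solve(Sy, Sxy.T)
rho_hat = np.sqrt(np.clip(np.sort(np.linalg.eigvals(M).real)[::-1], 0.0, None))

plt.plot(np.arange(1, rho_hat.size + 1), rho_hat, "o-")
plt.xlabel("index")
plt.ylabel("estimated canonical correlation")
plt.title("Scree plot of sample canonical correlations")
plt.show()
```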
Our next regularity assumption, which requires $\Sigma_x$ and $\Sigma_y$ to be positive definite and bounded in operator norm, is also common in the SCCA literature [24, 25, 39, 49]. If $\Sigma_x$ and $\Sigma_y$ are not bounded in operator norm, it can be shown that our sub-Gaussianity assumption is violated [63].
Assumption 2.2.
(Bounded eigenvalue Assumption) There exists
such that the eigenvalues of
and
are bounded below by
and bounded above by
for all
and
.
Although we require Assumptions 2.1 and 2.2 to hold for all $p$ and $q$, since our results are asymptotic in nature, it suffices if these assumptions hold for all sufficiently large $p$ and $q$. Checking whether the smallest eigenvalues of $\Sigma_x$ and $\Sigma_y$ are bounded away from zero in high-dimensional settings can be challenging, but it can be done under structural assumptions such as sparsity [8, 10]. In the context of proportional asymptotics (i.e. dimensions proportional to the sample size), one can also appeal to results from classical random matrix theory concerning the largest eigenvalue [2] to check whether the largest eigenvalues of $\Sigma_x$ and $\Sigma_y$ are bounded above without additional sparsity assumptions on them. However, a rigorous treatment of the above is beyond the scope of this paper.
However, if $\Sigma_x$ (and similarly $\Sigma_y$) has certain structure, Assumption 2.2 follows. We list a few examples below. (1) $\Sigma_x$ is a spike covariance matrix with finitely many spikes. This model has gained extensive attention in the recent high-dimensional literature [36]. (2) $\Sigma_x$ is an autoregressive matrix of order one, i.e. $(\Sigma_x)_{ij} = a^{|i-j|}$ for some $a \in (0, 1)$. In this case, the eigenvalues lie in the interval $((1-a)/(1+a),\ (1+a)/(1-a))$ (cf. [59]). (3) $\Sigma_x$ is a banded Toeplitz matrix, i.e. $(\Sigma_x)_{ij} = \sigma_{|i-j|}$, where $\sigma_k = 0$ for $k > K$ for some finite $K$. This model has seen use in the high-dimensional statistics literature [11]. In this case, Assumption 2.2 can be proved using the Gershgorin circle theorem [27].
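As a quick numerical illustration of examples (2) and (3) (the dimension, autoregressive parameter, band width and entry decay below are hypothetical choices), one can check the eigenvalue bounds directly and via the Gershgorin circle theorem:

```python
import numpy as np

p, a, K = 200, 0.5, 3                         # hypothetical dimension, AR(1) parameter, band width
idx = np.arange(p)

# Example (2): AR(1) covariance with entries a^{|i-j|}; eigenvalues stay bounded as p grows.
Sigma_ar = a ** np.abs(idx[:, None] - idx[None, :])
eig = np.linalg.eigvalsh(Sigma_ar)
print("AR(1) eigenvalue range:", eig.min(), eig.max())

# Example (3): banded Toeplitz covariance, here with entries 0.3^{|i-j|} truncated beyond lag K.
lags = np.abs(idx[:, None] - idx[None, :])
Sigma_band = np.where(lags <= K, 0.3 ** lags, 0.0)

# Gershgorin: every eigenvalue lies within (diagonal entry) +/- (row sum of absolute off-diagonals).
radius = np.abs(Sigma_band).sum(axis=1) - np.abs(np.diag(Sigma_band))
print("Gershgorin interval:", (np.diag(Sigma_band) - radius).min(),
      (np.diag(Sigma_band) + radius).max())
```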
2.2 Notation
We will denote the set of all positive integers by . For a matrix
, we denote its
th row and
th column by
and
, respectively. Also, let
and
denote the largest and smallest eigenvalue of
, respectively. We denote the gradient of a function
by
or
, where we reserve the notation
for the hessian. The
th element of any vector
is denoted by
. We use the notation
to denote the usual
norm of a p-dimensional vector for any
. For a matrix
,
and
will denote the Frobenius and the operator norm, respectively. We denote by
the elementwise supremum of
. The norm
will denote
. For any
,
will denote the largest integer smaller than or equal to
.
will denote a positive constant whose value may change from line to line throughout the paper.
The results in this paper are mostly asymptotic (in ) in nature and thus require some standard asymptotic notations. If
and
are two sequences of real numbers then
(and
) implies that
(and
) as
, respectively. Similarly
(and
) implies that
for some
(and
for some
). Alternatively,
will also imply
and
will imply that
for some
).
We will denote the set of the indices of the non-zero rows in and
by
and
, respectively. We let
and
be the cardinalities of
and
and use
to denote the total sparsity. We further denote by
and
the number of non-zero elements of
and
, respectively. The supports of
and
will similarly be denoted by
and
, respectively. We will discuss the precise requirements on these sparsities and the necessities of such assumptions in detail in Section 4.1.
Our method requires initial estimators of ,
, and
. We let
and
be the initial estimators of
and
, respectively. Also, we denote the empirical estimates of
,
, and
, by
,
, and
, respectively. The estimate
of
is
![]() |
(2.2) |
The quantity may not be positive for any
and
. Therefore, mostly we will use
as an estimate of
.
This paper provides many rate results that involve the term . This term arises due to the union bound and is equivalent to the
term in the asymptotic rate results of high-dimensional Lasso. To simplify notation, we will denote this term as
, using the following equation:
![]() |
(2.3) |
3. Methodology
This section discusses the intuitions and details of the proposed methodology that we will analyse in later sections. The discussions are divided across three main subsections. Subsection 3.1 presents the driving intuition behind obtaining de-biased estimators of general parameters of interest, defined through a generic optimization framework. Subsequently, Subsection 3.2 translates this intuition to a working principle in the context of SCCA. In particular, we design a suitable optimization criterion that allows a principled application of the general de-biasing method and lends itself to rigorous theoretical analyses. Finally, Subsection 3.3 elaborates on the benefit of designing this specific optimization objective function over other possible choices of optimization problems that define the leading canonical directions.
3.1 The De-biasing method in general
We first discuss the simple intuition behind reducing the bias of estimators defined through estimating equations. Suppose we are interested in estimating $\theta_0$, which minimizes the function $L$. If $L$ is smooth, then $\theta_0$ solves the equation $\nabla L(\theta) = 0$. Suppose $\hat{\theta}$ is in a small neighbourhood of $\theta_0$. The Taylor series expansion of $\nabla L$ around $\theta_0$ yields $\nabla L(\hat{\theta}) = \nabla L(\theta_0) + \nabla^2 L(\tilde{\theta})(\hat{\theta} - \theta_0)$, where $\tilde{\theta}$ lies on the line segment joining $\hat{\theta}$ and $\theta_0$. If $L$ has finitely many global minima, then $L$ cannot be flat at $\theta_0$. In that case, $L$ is strongly convex in some neighbourhood of $\theta_0$. Therefore $\nabla^2 L(\tilde{\theta})$ is positive definite, leading to $\theta_0 \approx \hat{\theta} - [\nabla^2 L(\tilde{\theta})]^{-1}\nabla L(\hat{\theta})$ since $\nabla L(\theta_0) = 0$. Suppose $\widehat{\nabla L}$ and $\widehat{\nabla^2 L}$ are reliable estimators of $\nabla L$ and $\nabla^2 L$, respectively. Correcting the first-order bias of $\hat{\theta}$ then yields the de-biased estimator $\hat{\theta} - [\widehat{\nabla^2 L}(\hat{\theta})]^{-1}\widehat{\nabla L}(\hat{\theta})$. Thus, to find a bias-corrected estimator of $\theta_0$, it suffices to find a smooth function which is minimized at $\theta_0$ and has at most finitely many global minima. This simple intuition is the backbone of our strategy.
Remark 3.1.
(Positive definiteness of $\nabla^2 L$) The positive definiteness of $\nabla^2 L$ is important as this is a requirement for most existing methods for estimating the inverse of a high-dimensional matrix. These methods proceed via estimating the columns of the inverse separately through a quadratic optimization step. Unless the original matrix is positive definite, these intermediate optimization problems are unbounded. Therefore, the associated algorithms are likely to diverge even with many observations. For more details, see Section 1 of [32] (see also Section 2.1 of [69]).
3.2 The De-biasing method for SCCA
To operationalize the intuition described above in Section 3.1, we begin with a lemma which represents $\alpha$ and $\beta$ (suitably scaled) as the unique minimizers (up to a sign flip) of a smooth objective function. We defer the proof of Lemma 3.1 to Supplement E.
Lemma 3.1.
Under Assumption 2.1, for any
, we have
where
.
The proof of Lemma 3.1 hinges on a seminal result in low-rank matrix approximation dating back to [23], which implies that for any matrix $M$ with singular value decomposition $M = \sum_i d_i u_i v_i^{T}$, where $d_1 \geq d_2 \geq \cdots \geq 0$,
$$\min_{N \in \mathcal{M}_k} \|M - N\|_F^2 \ \text{ is attained at } \ N = \sum_{i=1}^{k} d_i u_i v_i^{T}, \tag{3.1}$$
where $\mathcal{M}_k$ is the set of all matrices (of the same dimensions as $M$) with rank at most $k$. We will use this result with $k = 1$ in the proof of Lemma 3.1. In that case, if $d_1 = d_2$, then the minimizer in (3.1) is not unique, which is the reason why it is necessary to impose Assumption 2.1 in Lemma 3.1. Our primary inferential method for leading canonical directions builds on Lemma 3.1 and, consequently, corrects for the bias of estimating suitably scaled versions of $\alpha$ and $\beta$ using preliminary plug-in estimators from the literature. It is worth noting that we focus on the leading canonical directions up to a multiplicative factor since, from our inferential point of view, this quantity is enough to explore the nature of the projection operators onto these directions. In particular, the test
is equivalent to tests for no-signal such as
.
Remark 3.2.
Suppose
is as in Lemma 3.1. It can be shown that the other stationary points of
, to be denoted by
, correspond to the canonical pairs with correlations
,
. Moreover, the hessian of
at
has both positive and negative eigenvalues, indicating that the function is neither concave nor convex at these points. Therefore, all these stationary points are saddle points. Consequently, any minimum of
is a global minimum irrespective of the choice of
.
Now note that
![]() |
(3.2) |
and hence by symmetry, the hessian, , of
at
is given by
![]() |
We note the flexibility of our approach with regard to the choice of . This allows us to work with a more amenable form of the hessian and its inverse that we need to estimate. We subsequently set
so that the estimation of the cross term
can be avoided. In particular, when
and
, then
. We denote the hessian in this case as
![]() |
(3.3) |
A plug-in estimator of
is given by
![]() |
Because our is a sufficiently well-behaved function, it possesses a positive definite hessian at the minima
, thereby demonstrating the crucial strong convexity property mentioned in Remark 3.1. This property of
is the content of our following lemma, the proof of which can be found in Supplement E.
Lemma 3.2.
Under Assumptions 2.1 and 2.2, the matrix
defined in (3.3) is positive definite with minimum eigenvalue
where
is as in Assumption 2.2.
Lemma 3.1 and Lemma 3.2 subsequently allow us to construct de-biased estimators of the leading canonical directions as follows. Suppose and
are estimators of
and
, where
and
are the preliminary estimators of
and
, and
is as defined in (2.2). Our construction of de-biased estimators for SCCA now relies on two objects: (a) estimators of
and
, which are simply given by
![]() |
(3.4) |
and (b) an estimator of
, the inverse of
. Construction of such an estimator can be very involved. To tackle this challenge, we develop a version of the nodewise Lasso algorithm (see Supplement C.4 for details) popularized in recent research [61].
Following the intuitions discussed in Section 3.1, we can then complete the construction of the de-biased estimators, whose final form can be written as
![]() |
(3.5) |
In Section 5, we will discuss how our proposed method connects to the broader scope of de-biased inference in high-dimensional problems. Regarding the targets of our estimators, we note that if estimates
, then
also estimates
. However, if
approximates
instead, then
instead approximates
. A similar phenomenon can be observed for
as well. Our theoretical analyses of these estimators will be designed accordingly.
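The exact variant of the nodewise Lasso used here, which operates on the plug-in hessian, is detailed in Supplement C.4 and is not reproduced below. As a reference point, the following sketch implements the standard nodewise Lasso of [61] for a generic centred data matrix; the penalty scaling in the comment is an illustrative choice:

```python
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_lasso(Z, lam):
    """Nodewise-Lasso estimate of the precision matrix of the rows of the n x d matrix Z."""
    n, d = Z.shape
    Theta = np.zeros((d, d))
    for j in range(d):
        others = [k for k in range(d) if k != j]
        fit = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
        fit.fit(Z[:, others], Z[:, j])
        gamma = fit.coef_
        resid = Z[:, j] - Z[:, others] @ gamma
        # tau_j^2 as in [61]; equals ||resid||^2 / n + lam * ||gamma||_1 at a lasso solution.
        tau2 = resid @ Z[:, j] / n
        Theta[j, j] = 1.0 / tau2
        Theta[j, others] = -gamma / tau2
    return Theta

# Illustrative call with the usual sqrt(log d / n) penalty scaling:
# Theta_hat = nodewise_lasso(Z, lam=0.5 * np.sqrt(np.log(Z.shape[1]) / Z.shape[0]))
```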
Next, we will construct a de-biased estimator of using
and
. As we will see in Section 4, we will require stricter assumptions on the nuisance parameters to correct the first-order bias of
. In particular, we will require the columns of
and
to be sparse. We will also require estimators of
and
that are column sparse. Section 4 will provide a more detailed discussion of the sparsity requirements on
and
and their corresponding sparse estimators. For now, we will assume that we have access to
and
, which are column-sparse estimators of
and
, respectively.
Recall that denotes the estimator of
based on
and
. Our estimator of
can be constructed as
, where
![]() |
We want to clarify that in constructing and
, we use
and
as before. We do not use
and
there.
Before moving on to the theoretical properties of our proposed methods, we make a brief digression to note that there exist many formulations of the optimization program in (1.1) for which the leading canonical directions are the global optima. We therefore close this section with a discussion of why the particular formulation in Lemma 3.1 is beneficial for our purpose.
3.3 Subtleties with other representations of
and
The most intuitive approach to characterize is to conceptualize it as the maximizer of the constrained maximization problem (1.1). This leads to the Lagrangian
![]() |
(3.6) |
where and
are the Lagrange multipliers. Denoting
, it can be verified that since
is a stationary point of (1.1),
also solves
. Using the first-order Taylor series expansion of
, one can then show that any
in a small neighbourhood of
has the approximate expansion
![]() |
If we then replace by an estimator of
, we can use the above expansion to estimate the first-order bias of this estimator provided
is suitably well-behaved and estimable. However, by strong max-min property [cf. Section 5.4.1 6],
satisfies
![]() |
(3.7) |
which implies is a saddle point of
. Thus
fails to be positive definite. In fact, any constrained optimization program fails to provide a Lagrangian with positive definite hessian, and thus violates the requirements outlined in Section 3.1. We have already pointed out in Remark 3.1 that statistical tools for efficient estimation of the inverse of a high-dimensional matrix are scarce unless the matrix under consideration is positive definite. Therefore, we refrain from using the constrained optimization formulation in (1.1) for the de-biasing procedure.
For any , the function
![]() |
however, is a valid choice for the outlined in Subsection 3.1 since its only global minimizers are
, which also indicates strong convexity at
. However, the gradient and the hessian of this function takes a complicated form. Therefore, establishing asymptotic results for the de-biased estimator based on this
is significantly more cumbersome than its counterpart based on the
in Lemma 3.1. Hence, we refrain from using this objective function for our de-biasing procedure as well.
3.4 Possible extension to higher order canonical directions
Let us denote by where we remind the readers that
were the canonical direction pairs corresponding to
for
. Till now, we discussed the estimation of
. In this section, we will briefly outline the de-biased estimation of
for
because they may be of interest in some applications. To this end, we first present a lemma that generalizes Lemma 3.1 to higher order canonical directions. The proof of Lemma 3.3 can be found in Supplement E. This lemma shows that similar to
, the scaled higher order canonical directions can be presented as the minimizer of an unconstrained optimization problem up to a sign flip.
Lemma 3.3.
Suppose
for some
, where we take
. Then for any
, we have
Here
where
As in Lemma 3.1, the condition is required to ensure that
is identifiable up to a sign flip. The proof of Lemma 3.3 is similar to Lemma 3.1 and relies on (3.1).
An important observation from Lemma 3.3 is that the formula for is identical to that of
in Lemma 3.1, except the
in Lemma 3.1 has been replaced by
. Therefore, the gradient and hessian of
and
are the same, except the
in the former case is replaced by
in the latter case. Since
, we can proceed as in the case of
and show that when
, the hessian of
takes the form
![]() |
When , similar to Lemma 3.2, we can show that
is positive definite.
In light of the previous discussion, we can construct a bias-corrected estimator of by following the same approach as in (3.5), provided that we can obtain a reliable estimate of
and we have
and
consistent preliminary estimators of
for
. To obtain such preliminary estimators, we may extend our modified COLAR algorithm to the rank
case, which will basically implement [25]’s COLAR algorithm with a rank parameter of
. The estimation of
requires estimators of all canonical directions up to order
. Therefore we will estimate
recursively. To do so, let us assume that we already have at our disposal bias-corrected estimators of
for
, denoted by
. An estimator of
can then be obtained using the following formula:
![]() |
We can then use to obtain the de-biased estimator of
. Note that, for
, the de-biased estimators of the canonical correlations will depend on the nuisance matrices
and
in a more complicated manner. Therefore, the
-consistency of
may require stronger restrictions on
and
than those imposed by Assumptions 2.1 and 2.2. Theoretical analysis of the higher order bias-corrected estimator is beyond the scope of this paper, and we leave it as a topic for future research.
4. Asymptotic theory for the de-biased estimator
In this section, we establish the theoretical properties of our proposed estimators under a high-dimensional sparse asymptotic framework. To set up our main theoretical results, we present assumptions on sparsities of the true canonical directions and desired conditions on initial estimators of in Subsection 4.1. The construction of estimators with these desired properties is discussed in Supplement B and Supplement C. Subsequently, we present the main asymptotic results and their implications for the construction of confidence intervals of relevant quantities of interest in Subsection 4.2.
4.1 Assumptions on
,
and
For the de-biasing procedure to be successful, it is important that and
are both
and
consistent for
and
with suitable rates of convergence. In particular, we will require them to satisfy the following condition.
Condition 4.1.
(Preliminary estimator condition) The preliminary estimators
and
of
and
satisfy the following for some
,
, and
as defined in (2.3):
and
We present discussions regarding the necessity of the rates mentioned above as well as the motivation behind the exponent in Section 6. Moreover, we also discuss the construction of estimators satisfying Condition 4.1 in Supplement B. Our method for developing these initial estimators is motivated by the recent results in [25], who jointly estimate
and
up to an orthogonal rotation with desired
guarantees. However, our situation is somewhat different since we need to estimate
and
up to a sign flip, which might not be obtained from the joint estimation of all the directions up to orthogonal rotation. This is an important distinction since the remaining directions act as nuisance parameters in our set-up. The asymptotics of the sign-flipped version require crucial modification of the arguments of [25]. The analysis of this modified procedure presented in Supplement 1 allows us to extract both the desired
and
guarantees.
We will also require an assumption on the sparsities and
, the number of non-zero rows of
and
, respectively. We present this next while deferring the discussions on the necessity of such assumptions to Section 6.
Assumption 4.1.
(Sparsity Assumption) We assume
,
and
where
and
is as in Condition 4.1.
In high-dimensional settings, low-dimensional structure is often necessary to estimate elements at -rate. Sparsity is one common and convenient form of low-dimensional structure that is frequently assumed in high-dimensional statistics [19, 24, 32, 49]. However, verifying this assumption mathematically can be a challenging information-theoretic question [14]. Therefore, subject matter knowledge is often required to justify the sparsity assumption. For instance, in genetic studies, only a small number of genetic variants or features are typically associated with a particular disease or phenotype of interest, which justifies the sparsity assumption for the canonical covariates between the genetic features and phenotypes [67]. We defer the discussion on the specific rates appearing in Assumption 4.1 to Section 6.
Finally, our last condition pertains to the estimator on
. Most methods for estimating precision matrices can be adopted to estimate
using an estimator of
. However, care is needed since
needs to satisfy certain rates of convergence for the de-biased estimators in (3.5) to be
-consistent. We present this condition below.
Condition 4.2.
(Inverse hessian condition) The estimator
satisfies
and
where
is as in Condition 4.1.
We defer the discussion on the construction of to Supplement C, where, in particular, we will show that the nodewise Lasso type estimator, which appeals to the ideas in [61], satisfies Condition 4.2.
4.2 Theoretical analyses
In what follows, we present only the results on inference for . Results for
can be obtained in a parallel fashion. Before stating the main theorem, we introduce a few additional notations. We partition the
column of
conformably w.r.t. the dimensions of
and
as
, where
and
. We subsequently define the random variable
![]() |
(4.1) |
and its associated variance as
![]() |
(4.2) |
Since and
are sub-Gaussian, it can be shown that all moments of
, and in particular the
’s are finite under Assumption 2.2. Indeed, we show the same through the proof of Theorem 4.1. Finally define
![]() |
(4.3) |
With this we are ready to state the main theorem of this paper. This theorem is proved in Supplement F.
Theorem 4.1.
(Asymptotic representation of
) Suppose
and
are centred sub-Gaussian vectors, and Assumptions 2.1, 2.2 and 4.1 hold. Further suppose
satisfies Condition 4.2, and
and
satisfy Condition 4.1. In particular, suppose
and
satisfy
(4.4) Then the estimator
defined in (3.5) satisfies
, where
is as defined in (4.3), and
is a random vector satisfying
. Here
and
are as in (2.3) and Condition 4.1, respectively. If
and
satisfy
(4.5) instead, then
satisfies
, where
.
A few remarks are in order about the statement and implications of Theorem 4.1. First, (4.4) and (4.5) correspond to the cases when concentrate around
and
, respectively. We point out that (4.4) and (4.5) essentially do not impose extra restrictions on top of Condition 4.1. If
are reasonable estimators satisfying Condition 4.1, and they are chosen so as to ensure
, then we can expect either (4.4) or (4.5) to hold. However, Condition 4.1 and
alone are not sufficient to eliminate pathological cases where
and
do not converge anywhere. For example, consider the trivial case where
for even
, and
for odd
. In this case,
and
satisfy Condition 4.1 and
, but they do not converge in
for any
. The purpose of (4.4) and (4.5) is to disentangle the sign flip from the asymptotic convergence of
and
, which helps to eliminate such uninteresting pathological cases from consideration.
Second, we note that under Assumption 4.1, . The importance of Theorem 4.1 subsequently lies in the fact that it establishes the equivalence between
and the more tractable random vector
under Assumption 4.1. In particular, one can immediately derive a simple yet relevant corollary about the asymptotically normal nature of the distribution of our de-biased estimators.
Proof outline of Theorem 4.1. For the sake of simplicity, we only consider the case when (4.4) holds. The key step of proving Theorem 4.1 is to decompose into four terms as follows:
![]() |
(4.6) |
The first term, i.e. the -term, is the main contributing term in the above expansion because, as the name suggests, it is asymptotically equivalent to
(up to a term of order
). To show this, the main tool we use is the concentration of the sample covariance matrices around their population versions in the
norm (see Lemma D.5), which is an elementary result for sub-Gaussian random vectors (cf. [7, 32]). The remaining three terms on the right-hand side (RHS) of (4.6) are error terms of order
.
First, we encounter the cross-product term because we approximate and
by
and
, respectively. We control the error in estimating
using Condition 4.2. To control the estimation error of
, we observe that its random elements are basically sample covariance matrices, which, as previously mentioned, concentrate in the
norm.
Second, the Taylor series approximation term occurs because our de-biasing method is essentially based on the first-order Taylor series approximation . The error due to this approximation is small because
and
are asymptotically close to
and
in the
and
norms; see Condition 4.1.
The final term, i.e. the preliminary estimation error term, is again a cross-product term. To show that this term is of order , we exploit the
-consistency of
and
.
Now we will present an important corollary of Theorem 4.1, which underscores that equals
and an error term of smaller order. Using the central limit theorem, it can be shown that the marginals of
converge in distribution to centred gaussian random variables. In particular, we can show that
![]() |
(4.7) |
However, in Corollary 1, we decide to provide inference on instead of
, because (a) the former is unaffected by the sign flip of
and (b) the sign of
is typically of little interest. As a specific example, testing
is equivalent to testing
. More importantly, one of the central objects of interest in low-dimensional representations obtained through SCCA is the projection operators onto the leading canonical directions. It is easy to see that for this operator, it is sufficient to understand the squared
and the cross-terms
, respectively. The proof of Corollary 1 is deferred to Supplement H.
Corollary 1.
Under the set up of Theorem 4.1, for any
, the following assertions hold:
(a). If
, then
converges in distribution to a centred Gaussian random variable with variance
where the
’s are as defined in (4.2).
(b). If
, then
converges in distribution to a central Chi-squared random variable with degrees of freedom one and scale parameter
.
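For concreteness, the generic delta-method computations that match the structure of Corollary 1 can be coded as follows (a sketch: the function and variable names are placeholders, and the variance formula below is the generic delta-method form standing in for the paper's exact expression built from the quantities in (4.2), which is passed in as `omega2_hat`):

```python
import numpy as np
from scipy import stats

def inference_on_squared_coordinate(a_deb, omega2_hat, n, level=0.95):
    """Delta-method CI for the squared coordinate and a chi-squared test of H0: coordinate = 0.

    a_deb      : de-biased estimate of one coordinate of the leading canonical direction
    omega2_hat : consistent estimate of its asymptotic variance
    n          : sample size
    """
    z = stats.norm.ppf(0.5 + level / 2.0)
    se = 2.0 * abs(a_deb) * np.sqrt(omega2_hat / n)   # delta-method sd of a_deb**2 (non-zero case)
    ci = (a_deb**2 - z * se, a_deb**2 + z * se)
    chi2_stat = n * a_deb**2 / omega2_hat             # approximately chi-squared_1 under the null
    p_value = stats.chi2.sf(chi2_stat, df=1)
    return ci, p_value
```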
Next, we present results on the uniform nature of the joint asymptotically normal behaviour for the entire vector . To this end, we verify in our next proposition that if
, the convergence in (4.7) is uniform across
when restricted to suitably well-behaved sets. A proof of Proposition 4.2 is in Section G.
Proposition 4.2.
Let
be the set of all hyperrectangles in
and let
be the covariance matrix of the
-variate random vector
. Assume the set-up of Theorem 4.1. Suppose the
’s defined in (4.2) satisfy
for some
and
. Then if (4.4) holds,
and if (4.5) holds (the sign-flip case), then
where
is a random vector distributed as
.
Proof outline of Proposition 4.2. The proof of Proposition 4.2, which can be found in Supplement H, relies on a Berry–Esseen-type result. As indicated earlier, decomposes into
and a remainder term of smaller order. The marginals of
converge weakly to centred Gaussian distributions, which ultimately leads to (4.7). We apply a Berry–Esseen-type result from [21] on
to show that the weak convergence of marginals can be strengthened to establish convergence over rectangular sets.
The lower bound requirement on the variance of the marginal limiting distribution, i.e. ’s, is typical for Berry–Esseen-type theorems—see e.g. [21]. As a specific example, we provide Corollary 2 below to establish the validity of
for some
when
is jointly Gaussian. The proof of Corollary 2 can be found in Supplement G.
Corollary 2.
Suppose
are jointly Gaussian, and
is bounded away from zero and one. Further suppose
. Then under the set-up of Theorem 4.1, the assertion of
for some
used in Proposition 4.2 holds.
Proposition 4.2 can be used, as mentioned earlier, for inference on the non-diagonal elements of the matrix . This is the content of the following corollary—the proof of which can be found in Supplement H.
Corollary 3.
Consider the set-up of Proposition 4.2. Suppose
is positive definite. Let
, and
. Denote by
the covariance between
and
, where
’s are as defined in (4.1). Then the following assertions hold:
Suppose. Then
Suppose. Then
where
,
and
.
Here once again, we observe that the de-biased estimators of have different asymptotic behaviour depending on whether
or not, which parallels the behaviour of the de-biased estimators of the diagonal elements we demonstrated earlier through Corollary 1.
Remark 4.1.
Proposition 4.2 can also be used to simultaneously test the null hypotheses
. The uniform convergence in Proposition 4.2 can be used to justify multiple hypothesis testing for the coordinates of
whenever the corresponding p-values are defined through rectangular rejection regions based on
. To this end, one can use standard methods like Benjamini and Hochberg (BH) and Benjamini and Yekutieli (BY) procedures for FDR control. The simultaneous testing procedure can thereby also be connected to variable selection procedures. However, we do not pursue it here since specialized methods are available for the latter in SCCA context [39].
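A minimal sketch of the BH step alluded to above, applied to coordinate-wise p-values (for instance those produced by the chi-squared statistics discussed after Corollary 1); the FDR level is an illustrative choice:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.10):
    """Return the indices rejected by the BH procedure at FDR level q."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    if not below.any():
        return np.array([], dtype=int)
    k = np.max(np.where(below)[0])        # largest index meeting the BH threshold
    return order[: k + 1]
```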
We end our discussions regarding the inference of with a method for consistent estimation of the
’s. Indeed, this will allow us to develop tests for the hypotheses
or to build confidence intervals for
. To this end, we partition
, where
and
. Because
for
, we estimate
using the pseudo-observations
, which are defined by
![]() |
To this end, we propose to use a sample-splitting technique to estimate ,
,
,
and
from one half of the sample, and construct the pseudo-observations using the other half of the sample. Then we can estimate
as the sample variance of the pseudo-observations. Since
and
are sub-Gaussian, under the set-up of Theorem 4.1, it can be shown that the resulting estimator is consistent.
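A schematic of this sample-splitting variance estimator is given below (the pseudo-observation formula itself is the display above and is supplied here as a user-provided callable; all names are placeholders):

```python
import numpy as np

def split_variance_estimate(X, Y, fit_nuisances, pseudo_obs, seed=0):
    """Sample-splitting estimate of the asymptotic variance of one de-biased coordinate.

    fit_nuisances : maps (X1, Y1) to the plug-in quantities needed to form pseudo-observations
    pseudo_obs    : maps (X2, Y2, nuisances) to one pseudo-observation per sample in the second half
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    first_half = np.zeros(n, dtype=bool)
    first_half[rng.permutation(n)[: n // 2]] = True
    nuisances = fit_nuisances(X[first_half], Y[first_half])       # nuisances from one half
    xi = pseudo_obs(X[~first_half], Y[~first_half], nuisances)    # pseudo-observations from the other half
    return np.var(xi, ddof=1)                                     # sample variance of the pseudo-observations
```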
4.2.1 Inference on
Our final theoretical result pertains to the asymptotic distribution of . However, we require
and
to be column-sparse (see Assumption 4.2 below) to establish the
-consistency of
.
Assumption 4.2.
A column of
or
can have a maximum of
many non-zero elements, where
is a positive integer. In other words,
or
are bounded by
for each
and
.
We now explain the need for Assumption 4.2. The assumption is crucial because the first-order bias of
, where
is as defined in Theorem 4.1. These terms cannot be effectively controlled under the weaker Assumption 2.2 as
may not be sparse or small in
norm, which necessitates stronger assumptions on
. Also, to control the first-order bias of
, we require estimators of
and
with
and
guarantees—see Condition 4.3. In high dimensions, the existence of such estimators is not guaranteed without structural assumptions on
and
[5].
Assumption 4.2 implies a low degree of within-group association among the and
variables. If the number of
and
variables are large, and they are also highly associated within themselves, decoding the association between
and
becomes difficult. To overcome this challenge, we require sparsity assumptions on the covariance matrices to precisely estimate the strength of the association between
and
in high dimensions. Sparse covariance matrices are also a common assumption in high-dimensional statistics [5, 9, 11]. In fact, many methods have been developed specifically to estimate sparse covariance matrices in high-dimensional settings [5, 9, 11]. The sparsity assumption on
and
can often be justified for genetic and network data by utilizing subject-matter knowledge (cf. [15, 53, 56]). Also,
and
will automatically satisfy Assumption 4.2 if they are banded matrices, a widely used structural assumption for high-dimensional covariance matrices [5, 11, 58].
The analog of in the context of principal component analysis (PCA) is the square of the largest eigenvalue [32], for which a de-biased estimator can be derived without column-sparsity assumptions, as in Assumption 4.2. This discrepancy is explained by the fact that in PCA,
is not a nuisance parameter; its information on principal components is solely stored in
. However, for CCA, both
and
are nuisance parameters, thus implying the need for different assumptions in sparse PCA compared with sparse CCA. It might be possible to find an
-consistent estimator of
with less restrictive assumptions. However, since our goal here is to obtain de-biased estimators of canonical covariates, further investigation in this direction is beyond the scope of this paper.
The following condition states the consistency requirements of and
.
Condition 4.3.
and
satisfy the following conditions:
(4.8)
Condition 4.3 requires operator norm consistency, but this can be relaxed to the condition that
![]() |
However, we choose to state the condition in terms of operator norm consistency because this form is more prevalent in the sparse covariance matrix estimation literature [5, 8].
Example: Under our assumptions, the coordinate-thresholding estimator proposed by [5] satisfies Condition 4.3. Specifically, we define the estimators and
as
and
, where
. Theorem 1 of [5] guarantees the operator norm consistency of these estimators, while their
norm consistency is ensured by Theorem 4 of [9] for sufficiently large
.
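A sketch of a coordinate- (hard-) thresholding covariance estimator in the spirit of [5] is given below; the constant in the threshold is a tuning choice and the value used in the comment is illustrative:

```python
import numpy as np

def threshold_covariance(Z, c=1.0):
    """Hard-threshold the sample covariance of the n x p data matrix Z at c * sqrt(log p / n)."""
    n, p = Z.shape
    S = np.cov(Z, rowvar=False)
    t = c * np.sqrt(np.log(p) / n)
    S_thr = np.where(np.abs(S) >= t, S, 0.0)
    np.fill_diagonal(S_thr, np.diag(S))     # diagonal entries are typically left unthresholded
    return S_thr

# Illustrative use:  Sx_hat = threshold_covariance(X, c=1.0)
```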
We are now ready to state our final theorem, a proof of which can be found in Supplement H.
Theorem 4.3.
Suppose
and
satisfy Assumption 4.2 and the conditions of Theorem 4.1 are met. Suppose, in addition,
, where
is as in Assumption 4.2. Further suppose either (4.4) or (4.5) holds.
Then for
,
where
. When
,
weakly converges to a random variable with distribution function
where
Here
is the distribution function of the standard Gaussian random variable. Moreover, when the observations are Gaussian,
.
We note several points concerning the statement of Theorem 4.3.
(1) variance of the parametric MLE: In the Gaussian case, the value of
matches the variance of the parametric MLE of
under the Gaussian model [1, p.505]. This alignment is often seen in de-biased estimators, e.g. the de-biased estimator of the principal eigenvalue [32].
(2) Extreme
When
, the limiting distribution of
is discontinuous because it is a truncated estimator, where the truncation is applied at
. When
, unlike the
case, the truncation does not affect the asymptotic distribution of
.
(3) Sparsity condition: Theorem 4.3 imposes stricter conditions on
than those specified in Theorem 4.1. Although we have not investigated the strictness of this assumption, a similar stricter sparsity requirement can be found in [32], where a
-consistent estimator of the maximum eigenvalue is constructed for the sparse PCA problem. To compare our set-up with the PCA case, we consider our nuisance parameters
and
as identity matrices, which implies
. In such a case, our sparsity condition reduces to
, which is similar to that of [32].
(4) Interplay between
and
: The condition for sparsity,
, necessitates that either
or
be of a smaller order. It is worth noting that
tends to be large in cases where the elements of
and
are highly correlated with each other. Hence, we can interpret the interplay between
and
in the following manner: either the correlation within the variables
and
or the correlation between these variables must be sufficiently low-dimensional for our bias correction method to be effective.
To construct an estimator of , we use a sample splitting procedure. First, we split the sample into two parts of roughly equal size:
and
. We use the sample part
to estimate
,
and
. Next, we use these estimators to construct the pseudo-observations
, where the
’s and
’s (
) come from
. Finally, we obtain the estimator of
by averaging the squares of these pseudo-observations, i.e.
. This estimator will be consistent under the Conditions of Theorem 4.1, and hence can be used in combination with Theorem 4.3 to construct confidence intervals for
. To improve the efficiency of this estimator, we can repeat its construction after swapping the sample parts, and take the average of the two resulting estimators.
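Schematically (with placeholder names and user-supplied callables for the nuisance fits and the pseudo-observations defined above), the cross-fitted estimator reads:

```python
import numpy as np

def crossfit_sigma2(X, Y, fit_nuisances, pseudo_obs, seed=0):
    """Cross-fitted estimate: average of squared pseudo-observations over both sample splits."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n)
    halves = (perm[: n // 2], perm[n // 2:])
    estimates = []
    for fit_idx, eval_idx in (halves, halves[::-1]):
        nuis = fit_nuisances(X[fit_idx], Y[fit_idx])     # nuisances from one half
        xi = pseudo_obs(X[eval_idx], Y[eval_idx], nuis)  # pseudo-observations from the other half
        estimates.append(np.mean(xi ** 2))
    return np.mean(estimates)                            # averaging the two splits improves efficiency
```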
Proof outline of Theorem 4.3. The key step of this proof is to show that for all
. For
, we immediately obtain Theorem 4.3 because it can be easily shown that, in this case,
with probability approaching one. When
, weak convergence follows after some additional algebraic manipulations.
To show the weak convergence of , we need to express
as the sum of a Lindeberg–Feller CLT term and an
term. Our first step is to express
in terms of
,
and
as follows:
![]() |
(4.9) |
This representation is useful as it shows that the random part of depends only on
,
and
. The second step is to show that the right-hand side of (4.9) can be written as an average of independent, centred random variables that allows us to apply the Lindeberg–Feller CLT.
5. Connection to related literature
The study of asymptotic inference in the context of SCCA naturally connects to the popular research direction of de-biased/de-sparsified inference in high-dimensional models [4, 12, 32–35, 50–52, 61, 71]. This line of research, starting essentially from the seminal work of [71], more or less follows the general prescription laid out in Section 3.1. Similar to our case, these methods also often depend on potentially high-dimensional parameters, thereby requiring sufficiently well-behaved initial estimators. For example, asymptotically valid confidence intervals for the coordinates of a sparse linear regression vector rely critically on good initial estimators of the regression vector and of the nuisance parameter, namely the precision matrix of the covariates [35, 61, 71]. The construction of a suitable estimating equation is, however, somewhat case-specific and can be involved, depending on the nature of the high-dimensional nuisance parameters. Since SCCA involves several high-dimensional nuisance parameters, including the covariance matrices and
, special attention is required in deriving inferential procedures.
Among the methods mentioned above, our approach bears the greatest resemblance to the method recently espoused by [32] in the context of sparse principal component analysis. However, due to the presence of high-dimensional nuisance parameters and
, the CCA problem is, in general, more challenging than that of PCA [24, 25]. Although the main idea of de-biasing stays the same, our method crucially differs from [32] in two key steps: (a) development of a suitable objective function and (b) construction of preliminary estimators. We elaborate on these two steps below.
(a) Section 3.1 highlights the significance of selecting an appropriate objective function for any de-biasing technique. The objective function used in [32] is based on the well-established fact that the first principal component extraction problem can be expressed as an unconstrained Frobenius norm minimization problem. However, no such analogous representation was previously available in the CCA literature. Therefore, we had to construct such an objective function, described in Lemma 3.1, which serves as the basis for our de-biasing process.
(b) Similar to our approach, [32] applies the de-biasing process to preliminary estimators. However, they solve a penalized version of the non-convex PCA optimization problem, where the search space is limited to a small neighbourhood of a consistent estimator of the first principal component. Any stationary point of their resulting optimization program consistently estimates the first principal component, avoiding the need to find global minima, but the program is non-convex. In contrast, our SCCA method, inspired by [25], obtains preliminary estimators by solving a convex program instead.
Compared with [32]’s PCA case or the linear regression case [35, 61], the presence of high-dimensional nuisance parameters in CCA makes proving the -consistency of the debiased estimators more technically involved. In particular, to prove the
-consistency of the de-biased estimator of
, we needed to introduce sparsity assumptions on the nuisance matrices
and
.
Another important direction of high-dimensional CCA concerns the proportional regime, i.e. when and
converge to a constant [3, 47, 68]. In [3], the authors analyse the behaviour of the sample canonical correlation coefficient
for Gaussian data. They demonstrate that there exists a threshold value
, such that if the population canonical correlation
is greater than
, then there exists
so that
converges weakly to a centred normal distribution whose variance depends on
,
and
. On the other hand, if
, then
exhibits Tracy–Widom limits and concentrates at the right edge of the Tracy–Widom Law, specifically at
. Subsequent works such as [47] and [68] have relaxed the normality assumptions of the data to fourth- or eighth-moment conditions. However, these studies focus on inferring
, whereas our paper deals with the inference on coordinates of
and
. Moreover, our analyses are motivated by
regime in contrast to the proportional asymptotic regimes considered in [3, 47, 68]. Our numerical experiments show that vanilla CCA is well-suited for scenarios with moderate dimensions and high signal-to-noise ratio, but may struggle when dealing with high dimensions without additional regularity assumptions. In contrast, the proposed de-biased CCA method is specifically designed to handle high-dimensional data under sparsity assumptions.
6. On the conditions and assumptions of section 4
In this section, we provide a detailed discussion on assumptions made for the sake of theoretical developments in Section 4.1.
Discussion on Condition 4.1. First, some remarks are in order regarding the range of in Condition 4.1. Theorem 3.2 of [25] implies that it is impossible for
to be strictly less than
since the minimax rate of the
error is roughly
under Assumption 2.1 and Assumption 2.2. If
is larger, i.e.
and
have slower rates of convergence, we pay a price in terms of the sparsity restriction
in Assumption 4.1. Supplement B shows that estimators satisfying Condition 4.1 with
exist. In fact, most SCCA estimators with theoretical guarantees have an
error guarantee of
with
. The interested reader can refer to [19, 24, 25] and references therein. In view of the above, we let
.
In light of Condition 4.1, and
with a faster rate of convergence, i.e.
, is preferable. COLAR and [19]’s estimator attain this minimax rate when
. We do not yet know if there are SCCA estimators which attain the minimax rate for
while only estimating the first canonical direction. For
, the estimation problem becomes substantially harder because the remaining
canonical directions start acting as high-dimensional nuisance parameters. A trade-off between computational and estimation efficiency likely arises in the presence of these additional nuisance parameters. In particular, it is plausible that the minimax rate of
may not be achievable by polynomial-time algorithms in this case. To gather intuition about this, it is instructive to look at the literature on estimating the first principal component direction in high dimensions under sparsity. In this case, to the best of our knowledge, polynomial-time algorithms attain the minimax rate only in the single spike model or a slightly relaxed version of the latter. We refer the interested reader to [64] for more details. The algorithms that do succeed in estimating the first principal component under multiple spikes at the desired minimax rate attempt to solve the underlying non-convex problem, and hence are not immediately clear to be polynomial-time [32, 46, 70]. In this case, [70] and [46]’s methods essentially reduce to power methods that induce sparsity by iterative thresholding. [19]’s method tries to borrow this idea in the context of SCCA in the rank one case; see Remark B.2 for a discussion on the problems that their method may face in the presence of nuisance canonical directions.
Finally for the inferential question, it is natural to consider an extension of ideas from sparse PCA as developed in [32]. When translated to SCCA, their approach will aim to solve
![]() |
(6.1) |
where is a constant, and
![]() |
We conjecture that for a suitably chosen , the resulting estimators will satisfy Condition 4.1 with
. However, (6.1) is non-convex and solving (6.1) is computationally challenging for large
and
. Analogous to [32], one can simplify the problem by searching for any stationary point of (6.1) over a smaller feasible set, namely a small neighbourhood of a consistent preliminary estimator of
and
. However, while this first stage does guarantee good initialization, the underlying optimization problem still remains non-convex. Since the aim of the paper is efficient inference of
and
with theoretically guaranteed computational efficiency, we remain with the modified COLAR estimators and refrain from exploring the route mentioned above.
Discussion on Assumption 4.1: It is natural to wonder whether the condition is at all necessary, especially since it is much stricter than
, which is sufficient for the
consistency of
and
presented in Theorem B.1 of Supplement B. However, current literature on inference in high-dimensional sparse models bears evidence that the restriction
might be unavoidable. In fact, this sparsity requirement is a staple in most de-biasing approaches whose preliminary estimators are minimax optimal, including sparse PCA [32] and sparse generalized linear models [35, 61]. Indeed, in case of sparse linear regression, [12] shows that this sparsity is necessary for adaptive inference. We believe similar results hold for our case as well. However, further inquiry in that direction is beyond the scope of this paper.
It is also natural to ask why Assumption 4.1 involves sparsity restrictions not only on and
, but also on the other columns of
and
. This restriction stems from the initial estimation procedure of
and
. Although we estimate only the first pair of canonical directions, the remaining canonical directions act as nuisance parameters. Thus, to efficiently estimate
and
, we need to separate the other covariates from
and
. Therefore, we need to estimate the other covariates’ effect efficiently enough. Consequently, we require some regularity assumptions on these nuisance parameters as precisely quantified by Assumption 4.1.
Discussion on Condition 4.2: This is a standard assumption in de-biasing literature with similar assumptions appearing in the sparse PCA [32] and sparse generalized linear models literature [61]—both of whom use the nodewise lasso algorithm to construct . We remark in passing that [35]’s construction of the de-biased lasso does not require the analog of
, which is the precision matrix estimator in their case, to satisfy any condition like Condition 4.2. Instead, it requires the
’s to be small. It is unknown whether such constructions work in the more complicated scenario of CCA or PCA.
7. Numerical experiments
7.1 Preliminaries
In this section, we explore aspects of finite sample behaviour of the methods discussed in earlier sections. Further numerical experiments are collected in Supplement A where we compare the bias of our method with popular SCCA alternatives. We start with some preliminary discussions on the choice for the set-up, initial estimators and tuning parameters.
Set-up: The set-up, under which we will conduct our comparisons, can be described by specifying the nuisance parameters (marginal covariance matrices of and
) along with the strength (
), sparsity, rank and the joint distribution of
. For the marginal covariance matrices of
and
, motivated by previously studied cases in the literature [25, 49] we shall consider two cases as follows:
- Identity. This will correspond to the case where
- Sparse-inverse. This will correspond to the case where
is the correlation matrix obtained from
, where
, and
is a sparse matrix with the form
Analogous to [49] and [25], we shall also take to be a rank one matrix, where we consider the canonical vectors
and
with sparsity
as follows:
![]() |
The canonical correlation quantifies the signal strength in our set-up. We explore three different values, 0.2, 0.5 and 0.9, which will be referred to as the small, medium and high signal strength settings, respectively. The joint distribution of the data is taken to be Gaussian with mean zero. Throughout, we consider three combinations of sample size and dimensions, corresponding to the dimensions being small, moderate and moderately high, respectively. Finally, we always use 1000 Monte Carlo samples.
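For concreteness, the following minimal Python sketch generates data from the identity-covariance setting with a rank-one cross-covariance. The specific sparse directions (unit vectors supported on the first s coordinates) and the numerical values of the sample size and dimensions are illustrative assumptions, not the exact choices used in our experiments.

```python
import numpy as np

def simulate_cca_data(n, p, q, rho, s, seed=0):
    """Draw n i.i.d. Gaussian samples (X, Y) whose marginal covariances are
    identity matrices and whose cross-covariance is rho * outer(alpha, beta),
    so that (alpha, beta) are the leading canonical directions and rho is the
    leading canonical correlation."""
    rng = np.random.default_rng(seed)
    # Hypothetical sparse directions: unit vectors supported on the first s coordinates.
    alpha = np.zeros(p); alpha[:s] = 1.0 / np.sqrt(s)
    beta = np.zeros(q); beta[:s] = 1.0 / np.sqrt(s)
    # Joint covariance of (X, Y) in the identity-covariance case.
    sigma = np.eye(p + q)
    sigma[:p, p:] = rho * np.outer(alpha, beta)
    sigma[p:, :p] = sigma[:p, p:].T
    z = rng.multivariate_normal(np.zeros(p + q), sigma, size=n)
    return z[:, :p], z[:, p:], alpha, beta

# e.g. the "medium signal" setting with illustrative dimensions
X, Y, alpha, beta = simulate_cca_data(n=200, p=50, q=50, rho=0.5, s=5)
```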
Initial Estimators and Tuning Parameters: We construct the preliminary estimators using the modified COLAR algorithm (see Algorithm 1); in the rank-one case, the latter coincides with the COLAR estimator of [25]. Recall the three combinations of sample size and dimensions described above. One reason we do not accommodate higher dimensions is that the COLAR algorithm does not scale well with the dimensions (this was also noticed by [49]). We also do not consider smaller sample sizes, since de-biasing procedures generally require the sample size to be at least moderately large [cf. 32].
In our proposed methods, tuning parameters arise from two sources: (a) computation of the preliminary estimators and (b) precision matrix estimation. To implement the modified COLAR algorithm, we mostly follow the code for COLAR provided by the authors of [25]. The COLAR penalty parameters were left at their default values as specified in the COLAR code, and the tolerance level for the first step of the COLAR algorithm was fixed at its default value, with a maximum of 200 iterations. Next, consider the tuning strategy for the nodewise lasso algorithm (Algorithm 2), which involves a lasso penalty parameter and a scaling parameter. Theorem C.1 proposes a common theoretical choice of the scaling parameter for all coordinates. In our simulations, this parameter is instead determined empirically; separate values are used for the identity and sparse-inverse cases in the two smaller dimension settings, and a single value in the moderately high dimension setting. The lasso penalty parameter of the nodewise lasso is chosen in line with the recommendation of [32].
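The sketch below illustrates the standard nodewise lasso construction of a precision matrix estimate, in the spirit of [61] and [32], on which Algorithm 2 is based. The penalty level shown in the usage comment is a generic order-of-magnitude choice, not the empirically tuned values described above, and the exact scaling parameter from Theorem C.1 is not reproduced.

```python
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_lasso_precision(X, lam):
    """Minimal sketch of the nodewise lasso precision-matrix estimator:
    regress each column on the remaining columns with the lasso, then
    assemble each row of the precision estimate from the lasso coefficients
    and the (penalized) residual variance."""
    n, p = X.shape
    theta = np.zeros((p, p))
    for j in range(p):
        idx = [k for k in range(p) if k != j]
        fit = Lasso(alpha=lam, fit_intercept=False).fit(X[:, idx], X[:, j])
        gamma = fit.coef_
        resid = X[:, j] - X[:, idx] @ gamma
        # tau_j^2 includes the l1 penalty term, as in the usual definition.
        tau2 = resid @ resid / n + lam * np.abs(gamma).sum()
        theta[j, j] = 1.0 / tau2
        theta[j, idx] = -gamma / tau2
    return theta

# Illustrative penalty level of order sqrt(log p / n):
# theta_hat = nodewise_lasso_precision(X, lam=np.sqrt(np.log(X.shape[1]) / X.shape[0]))
```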
Targets of Inference: We present our results for the first and the 20th element of the first canonical direction. The former stands for a typical non-zero element, while the latter represents a typical zero element. For each element, we compute confidence intervals and test the null hypothesis that the element is zero. For the latter, we use a chi-squared test based on the asymptotic null distribution given in part two of Corollary 1. As mentioned earlier, this test is equivalent to testing whether the corresponding element of the canonical direction is zero. The construction of the confidence intervals, which we discuss next, is a little more subtle.
We construct two types of confidence interval. For any nominal level, the first confidence interval, which will be referred to as the ordinary interval from now on, is given by
![]() |
(7.1) |
Here, the multiplier appearing in (7.1) is the appropriate quantile of the standard Gaussian distribution. Corollary 1 shows that the asymptotic coverage of the above confidence interval attains the nominal level when the target element is non-zero. When the target element is zero, however, the above confidence interval can have asymptotic coverage higher than the nominal level; to see why, note that the de-biased estimator converges at a faster rate in this case by Corollary 1. Since both the length and the centre of the ordinary interval depend on the preliminary estimates, the coverage can suffer greatly if the preliminary estimator underestimates the signal. Therefore, we construct another confidence interval by relaxing the length of the ordinary interval. This second interval, to be referred to as the conservative interval from now on, is obtained by substituting the estimated signal strength appearing in the standard deviation term of (7.1) with a conservative bound, thereby inflating the interval length. Clearly, the conservative interval can have higher coverage than the ordinary one, which motivates our nomenclature.
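The following minimal sketch shows how Wald-type ordinary and conservative intervals, and the associated chi-squared test of a zero coordinate, can be computed once a de-biased estimate and an estimated asymptotic standard error are available. The variance formula of Corollary 1 and the exact conservative substitution in (7.1) are not reproduced here; the inflation factor is treated as a user-supplied argument.

```python
import numpy as np
from scipy import stats

def ordinary_and_conservative_ci(est, se, alpha=0.05, inflate=1.0):
    """Wald-type intervals around a de-biased estimate `est` with estimated
    asymptotic standard error `se`. The 'conservative' interval simply
    inflates the standard error by `inflate` >= 1, a stand-in for the
    substitution in the standard-deviation term of (7.1)."""
    z = stats.norm.ppf(1 - alpha / 2)
    ordinary = (est - z * se, est + z * se)
    conservative = (est - z * inflate * se, est + z * inflate * se)
    return ordinary, conservative

def chi_squared_test(est, se):
    """Wald chi-squared test of the null that the coordinate is zero,
    based on the asymptotic Gaussian limit of the de-biased estimator."""
    t = (est / se) ** 2
    return 1 - stats.chi2.cdf(t, df=1)

# Illustrative numbers only
ci_ord, ci_cons = ordinary_and_conservative_ci(est=0.31, se=0.08, inflate=2.0)
pval = chi_squared_test(est=0.31, se=0.08)
```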
Comparator: We compare our de-biasing procedure with vanilla CCA. In this case, inference can be carried out directly on the canonical direction elements, without any additional scaling. To derive confidence intervals using vanilla CCA, we employed the fixed-dimension asymptotic formula provided by [1]. As usual, we use 1000 Monte Carlo samples. The formula depends on the canonical directions and the canonical correlation; we estimated the directions using the vanilla CCA estimators presented in [1]. However, we observed that the vanilla CCA estimator of the canonical correlation performed poorly as the signal strength increased, which is unsurprising because this estimator is known to be inconsistent in the high-dimensional case [3, 47, 68]. Therefore, we utilized the estimator proposed by [3] for the canonical correlation, which is known to be consistent in this asymptotic regime [3, 68]. Since vanilla CCA does not run when the dimensions exceed the sample size, we could produce these confidence intervals only in the small-dimension setting. We collect the numerical results in Fig. 1.
Fig. 1.
Confidence intervals obtained by vanilla (ordinary) CCA for the first 100 replications. The rejection probability and the coverage are calculated based on all 1000 replications; the sample size and dimensions correspond to the small setting. The left column corresponds to the first element, which is non-zero, and the right column corresponds to the 20th element, which is zero. The top row corresponds to the marginal covariance matrices being the identity matrix, and the bottom row to them being the sparse-inverse matrix.
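As background for the comparator, the sketch below computes the classical (vanilla) first canonical pair via an SVD of the whitened sample cross-covariance. The asymptotic confidence interval formula of [1] and the canonical correlation correction of [3] are not reproduced; this is only a minimal illustration of the vanilla estimator, valid when the sample size exceeds both dimensions.

```python
import numpy as np

def vanilla_cca_first_pair(X, Y):
    """Classical first canonical pair via an SVD of the whitened
    cross-covariance; requires n to exceed both dimensions so that the
    sample covariance matrices are invertible."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    sxx, syy, sxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n

    def inv_sqrt(S):
        # Inverse symmetric square root via the eigendecomposition.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    K = inv_sqrt(sxx) @ sxy @ inv_sqrt(syy)
    U, d, Vt = np.linalg.svd(K)
    rho_hat = d[0]                      # leading sample canonical correlation
    alpha_hat = inv_sqrt(sxx) @ U[:, 0] # leading sample canonical directions
    beta_hat = inv_sqrt(syy) @ Vt[0, :]
    return rho_hat, alpha_hat, beta_hat
```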
7.2 Results
We divide the presentation of our results into coordinates with and without signal, followed by a discussion of the distinction between the asymptotic and finite-sample behaviour of our method.
Inference when there is no signal: When the target element is zero, both confidence intervals (CIs) exhibit high coverage, often exceeding the nominal level, across all settings; see Figures M.1 and M.2 in Supplement M.1. This is unsurprising in view of the discussion in the previous paragraph. Fig. 1 also shows that in the no-signal case, the de-biased CCA confidence intervals perform significantly better than those based on vanilla CCA. The conservative confidence intervals have substantially larger lengths, which is understandable given the ratio between the conservative and the ordinary CI lengths in this case. The length of the confidence intervals generally decreases as the signal strength increases, as expected. The rejection frequency of the tests (the type I error in this scenario) generally stays below the nominal level, especially at medium to high signal strength.
Inference when there is signal: When the target element is non-zero, the ordinary intervals exhibit poor coverage at low and medium signal strengths regardless of the underlying covariance structure, although the performance seems worse for the sparse-inverse matrices. Figure 2 indicates that this underperformance is due to the underestimation of small signals, which is tied to the high negative bias of the preliminary estimator in these cases; see the histograms in Fig. M.3 in Supplement M.1. This issue is discussed in more detail in Supplement A. Figure 2 also shows that if the signal is small, the confidence intervals crowd near the origin. Also, at high signal strength, the coverage of the ordinary intervals fails to reach the desired nominal level. Figure 1 demonstrates that when the signal-to-noise ratio is high, vanilla CCA achieves performance comparable with our method. However, at moderate signal strength, vanilla CCA is outperformed by de-biased CCA, and in the low signal-to-noise regime, vanilla CCA performs as poorly as de-biased CCA. These results highlight the limitations of vanilla CCA in scenarios with moderate to low signal strength.
Fig. 2.
Ordinary confidence intervals for the non-zero coordinate.
The relaxation of the ordinary confidence interval length, which leads to the conservative intervals, substantially improves the coverage, with the improvement being dramatic at low signal strength. In the latter case, the conservative intervals enjoy high coverage, well above the nominal level in the moderate and higher settings. In this case, the relaxation generally results in a four-fold or higher increase in the confidence interval length. As the signal strength increases, the increase in the confidence interval length gets smaller and, consequently, the gain in coverage diminishes. This is unsurprising, since the ratio between the lengths of the conservative and the ordinary intervals is inversely related to the signal strength. One should be cautious with the relaxation, however, because it may lead to the inclusion not only of the true signal, as desired, but also of zero. This can be clearly seen in the medium signal strength case of the sparse-inverse matrix; compare the middle column of Fig. 2(b) with that of Fig. 3(b). The inclusion of the origin does not bring any advantage for the relaxed intervals in the no-signal case either, because, as discussed earlier, the ordinary intervals are themselves efficient in that case, with the relaxed versions hardly making any improvement.
Fig. 3.
Conservative confidence intervals for the non-zero coordinate.
Discussion on Asymptotics: The performance of the confidence intervals improves as the sample size and dimensions increase; see, for example, the illustration in Fig. M.4 in Supplement M.1, where the triplet of sample size and dimensions has been doubled. Interestingly, the asymptotics kick in for the corresponding tests as soon as the signal strength reaches the medium level: the test attains high power at medium signal strength and perfect power of one at high signal strength. This phenomenon is a consequence of the super-efficiency of the de-biased estimator at zero, as expounded upon by Corollary 1. Since the test exploits this faster convergence under the null hypothesis, it has better precision than the confidence intervals, which are oblivious to this fact. Thus, the test may reject in many situations where the confidence intervals, even the ordinary one, include zero. In practice, if one faces such a situation, one should conclude that either the signal strength is too small or the sample size is insufficient for the confidence intervals to be precise.
Discussion on the Performance of De-biased SCCA: We conclude that, since the de-biased estimators build on sparse preliminary estimators that are super-efficient at zero, inference faces no obstacle when the true signal is zero. In the presence of a signal, the tests are generally reliable if the signal strength is at least moderate. In contrast, the ordinary confidence intervals, which rely directly on Corollary 1, struggle whenever the initial COLAR estimators incur a bias too large for the de-biasing step to overcome; this is generally observed at low to medium signal strength. The conservative intervals can partially solve this problem at the cost of increased length. At present, the estimation guarantees required by Condition 4.1 are only available for COLAR-type estimators. The performance of the ordinary confidence intervals may improve if one can construct an SCCA preliminary estimator with similarly strong theoretical guarantees but better empirical performance in picking up small signals. Searching for a different SCCA preliminary estimator is important for another reason: COLAR does not scale to ultra-high dimensions, because it relies on semidefinite programming, whose scalability issues are well noted [22].
8. Real data application
Physiological functions in human bodies are controlled by complex pathways, whose deregulation can lead to various diseases. Therefore, it is important to understand the interaction between different factors participating in these biological pathways, such as proteins, genes, etc. We consider two important pathways: (a) Cytokine-cytokine receptor interaction pathway and (b) Adipocytokine signaling pathway. Cytokines are released in response to inflammation in the body, and pathway (a) is thus related to viral infection, cell growth, differentiation, and cancer progression [40]. Pathway (b) is involved in fat metabolism and insulin resistance, thus playing a vital role in diabetes [54]. We wish to study the linear association between the genes and proteins involved in these pathways. To that end, we use the microarray and proteomic datasets analysed by [41], which are originally from the National Cancer Institute, and available at http://discover.nci.nih.gov/cellminer/.
The dataset contains 60 human cancer cell lines. We use 59 of the 60 observations because one has missing microarray information. Although the microarray data contain information on many genes, we consider only those involved in pathways (a) and (b); to this end, we use genscript.com (https://www.genscript.com/) to obtain the list of genes involved in these pathways. The proteomic dataset likewise records measurements on a set of proteins. We center and scale all variables prior to our analysis, and if a gene or protein has more than one measurement, we average across all the measurements.
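A minimal sketch of this preprocessing is given below, assuming a data frame with samples in rows and gene or protein identifiers as column names (this layout, and the function name, are illustrative assumptions rather than a description of our actual pipeline).

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Average repeated measurements sharing a column name, then center
    and scale every variable. Samples are in rows; gene/protein
    identifiers are the column names (assumed layout)."""
    # Average columns sharing the same identifier (transpose so rows can be grouped).
    df = df.T.groupby(level=0).mean().T
    # Center and scale each variable.
    return (df - df.mean()) / df.std(ddof=1)
```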
It is well known that certain biological pathways involve only a small number of interacting genes and proteins, which justifies the possibility of the leading canonical directions being sparse [41]. Additionally, Fig. M.7 in Supplement M.2 shows that the majority of genes and proteins exhibit negligible correlation, further supporting this biological fact. On the other hand, Fig. M.8 in Supplement M.2 hints at the existence of low-dimensional structure in the covariance matrices of both the genes and the proteins, which is required for the consistency of our nodewise lasso estimator. However, it seems unlikely that the genes, or the proteins, are totally uncorrelated among themselves, which calls into question the applicability of popular methods suited only to diagonal covariance matrices, e.g. PMA [66].
Apart from the de-biased estimators, we also examine the SCCA estimates of the leading canonical covariates obtained by the methods of [49], [25], [66] and [65]. The first three methods were implemented as discussed in Supplement A. To apply the method of [65], we used the code provided by the authors with the default choice of tuning parameters. Among these methods, only that of [66] requires the marginal covariance matrices to be diagonal. For these methods, we say a gene or protein is ‘detected’ if the corresponding loading, i.e. the corresponding element of the estimated canonical direction, is non-zero.
We construct confidence intervals, both ordinary and conservative, and test the null hypothesis that the corresponding loading is zero for each gene and each protein, as discussed in Section 7. We apply the false discovery rate corrections of Benjamini and Hochberg (BH) as well as Benjamini and Yekutieli (BY); the latter does not assume independent p-values. Table 1 tabulates the number of detections by the above-mentioned methods. Even after false discovery rate adjustment, the confidence intervals of most discoveries include zero. We discussed this situation in Section 7, where it was indicated that this can occur if the signal strength is small or the sample size is insufficient. To be conservative, we consider only those genes and proteins whose ordinary interval excludes zero. These discoveries are reported in Tables 2 and 3 along with the confidence intervals. Pictorial representations of the confidence intervals can be found in Fig. M.9 and Fig. M.10 in Supplement M.2.
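For reference, the BH and BY corrections can be applied to a vector of per-coordinate p-values as in the sketch below; the p-values shown are hypothetical placeholders.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical vector of per-coordinate p-values from the chi-squared tests.
pvals = np.array([2.0e-07, 0.03, 0.41, 8.8e-09, 0.27])

# Benjamini-Hochberg (valid under independence / positive dependence).
reject_bh, p_bh, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
# Benjamini-Yekutieli (valid under arbitrary dependence).
reject_by, p_by, _, _ = multipletests(pvals, alpha=0.05, method="fdr_by")

print("Detected (BH):", np.flatnonzero(reject_bh))
print("Detected (BY):", np.flatnonzero(reject_by))
```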
Table 1.
Number of detections: number of non-zero loadings in different SCCA estimators and number of detections by our tests (DB) after BH and BY false discovery rate correction. For the SCCA estimators, the size of their intersection with DB+BY is given in parentheses
Variable | Mai and Zhang | Wilms and Croux | Gao et al. | Witten et al. | DB+BH | DB+BY |
---|---|---|---|---|---|---
Pathway (a) | | | | | |
Genes | 2 (2) | 1 (1) | 3 (3) | 41 (5) | 13 | 6
Proteins | 4 (3) | 1 (1) | 7 (5) | 13 (5) | 36 | 22
Pathway (b) | | | | | |
Genes | 2 (1) | 1 (1) | 4 (3) | 11 (2) | 8 | 5
Proteins | 7 (1) | 1 (1) | 9 (1) | 12 (1) | 22 | 2
Table 2.
Discovered genes and proteins from pathway (a). The confidence intervals are obtained using the methods described in Section 7. The P-values are the original P-values before false discovery rate correction. All genes and proteins were also detected by the Benjamini and Yekutieli method
Gene | P-value | 95% CI | Relaxed CI | Discovered by
---|---|---|---|---
CLCF1 | 2.0E-07 | (0.055, 0.39) | (0, 0.58) | Witten et al.
EGFR | 8.8E-09 | (0.11, 0.58) | (0, 0.74) | Mai and Zhang, Witten et al., Gao et al.
LIF | 1.6E-05 | (0.022, 0.45) | (0, 0.68) | Witten et al., Gao et al.
PDGFC | 1.4E-07 | (0.094, 0.64) | (0, 0.82) | Witten et al.
TNFRSF12A | 7.8E-11 | (0.15, 0.60) | (0.01, 0.75) | Mai and Zhang, Witten et al., Gao et al., Wilms and Croux

Protein | P-value | 95% CI | Relaxed CI | Discovered by
---|---|---|---|---
ANXA2 | 1.3E-15 | (0.13, 0.38) | (0.01, 0.51) | Mai and Zhang, Witten et al., Gao et al., Wilms and Croux
CDH2 | 5.1E-09 | (0.22, 1.1) | (0.12, 1.23) | Mai and Zhang, Witten et al., Gao et al.
FN1 | 4.2E-07 | (0.96, 7.6) | (0.96, 7.6) | none
GTF2B | 6.7E-05 | (0.034, 4.0) | (0.034, 4.0) | none
KRT20 | 1.2E-05 | (0.015, 0.27) | (0, 0.48) | none
MVP | 2.6E-05 | (0.021, 0.59) | (0, 0.82) | Witten et al.
Table 3.
Discovered genes and proteins from pathway (b). The confidence intervals are obtained using the methods described in Section 7. The P-values are the original P-values before false discovery rate correction. The displayed genes and proteins were also detected by the Benjamini and Yekutieli method
Gene | P-value | 95% CI | Relaxed CI | Discovered by |
---|---|---|---|---
ACSL5 | 2.9E-05 | (0.014, 0.45) | (0, 0.68) | none
RXRG | 4.1E-10 | (0.073, 0.32) | (0, 0.47) | Wilms and Croux, Gao et al., Mai and Zhang
TNFRSF1B | 1.1E-09 | (0.49, 2.2) | (0.49, 2.2) | none

Protein | P-value | 95% CI | Relaxed CI | Discovered by
---|---|---|---|---
ANXA2 | 2.7E-74 | (1.1, 1.7) | (1.1, 1.7) | none
Using the Gene Ontology toolkit available at http://geneontology.org/, we observe that our discovered genes from pathway (a) are mainly involved in biological processes like positive regulation of gliogenesis and molecular functions like growth factor activity, while the selected proteins play a role in regulating membrane assembly, enzyme function, and other cellular functions. The Gene Ontology toolkit also suggests that the discovered genes from pathway (b) are involved in positive regulation of cellular processes and molecular functions like growth factor activity. The only discovered protein in pathway (b) is ANXA2, which, according to UniProt at https://www.uniprot.org, is a membrane-binding protein involved in RNA binding and host-virus infection.
Contributor Information
Nilanjana Laha, Department of Statistics, Texas A&M, College Station, TX 77843, USA.
Nathan Huey, Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA.
Brent Coull, Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA.
Rajarshi Mukherjee, Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA.
Funding
The research of Rajarshi Mukherjee is partially supported by National Institutes of Health grant P42ES030990. The work of Nilanjana Laha is partially supported by National Science Foundation grant DMS 2311098. The research of Brent Coull is supported by National Institutes of Health grants ES000002 and ES030990.
Data availability statement
No new data were generated or analysed in support of this review.
References
- 1. Anderson, T. (2003) An Introduction to Multivariate Statistical Analysis. Wiley Series in Probability and Statistics. Wiley.
- 2. Bai, Z. D. (2008) Methodologies in spectral analysis of large dimensional random matrices, a review. Advances in Statistics. World Scientific, pp. 174–240.
- 3. Bao, Z., Hu, J., Pan, G. & Zhou, W. (2019) Canonical correlation coefficients of high-dimensional Gaussian vectors: finite rank case. Ann. Stat., 47, 612–640.
- 4. Bellec, P. C. & Zhang, C.-H. (2019) De-biasing the lasso with degrees-of-freedom adjustment. arXiv preprint arXiv:1902.08885.
- 5. Bickel, P. J. & Levina, E. (2008) Covariance regularization by thresholding. Ann. Stat., 36, 2577–2604.
- 6. Boyd, S., Boyd, S. P. & Vandenberghe, L. (2004) Convex Optimization. Cambridge University Press.
- 7. Bühlmann, P. & van de Geer, S. (2011) Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Science & Business Media.
- 8. Cai, T. & Liu, W. (2011) Adaptive thresholding for sparse covariance matrix estimation. J. Am. Stat. Assoc., 106, 672–684.
- 9. Cai, T. T. & Zhou, H. H. (2012a) Minimax estimation of large covariance matrices under ℓ1-norm. Stat. Sin., 1319–1349.
- 10. Cai, T. T. & Zhou, H. H. (2012b) Optimal rates of convergence for sparse covariance matrix estimation. Ann. Stat., 40, 2389–2420.
- 11. Cai, T. T., Ren, Z. & Zhou, H. H. (2013) Optimal rates of convergence for estimating Toeplitz covariance matrices. Probab. Theory Relat. Fields, 156, 101–143.
- 12. Cai, T. T., Guo, Z., et al. (2017) Confidence intervals for high-dimensional linear regression: minimax rates and adaptivity. Ann. Stat., 45, 615–626.
- 13. Cai, T. T., Zhang, A., et al. (2018) Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics. Ann. Stat., 46, 60–89.
- 14. Carpentier, A. & Verzelen, N. (2019) Adaptive estimation of the sparsity in the Gaussian vector model. Ann. Stat., 47, 93–126.
- 15. Carvalho, C. M., Chang, J., Lucas, J. E., Nevins, J. R. & Wang, Q. (2010) Dissecting high-dimensional phenotypes with Bayesian sparse factor analysis of genetic covariance matrices. Bioinformatics, 26, 2680–2688.
- 16. Chatterjee, A. & Lahiri, S. (2010) Asymptotic properties of the residual bootstrap for lasso estimators. Proc. Am. Math. Soc., 138, 4497–4509.
- 17. Chatterjee, A. & Lahiri, S. N. (2011) Bootstrapping lasso estimators. J. Am. Stat. Assoc., 106, 608–625.
- 18. Chatterjee, A. & Lahiri, S. N. (2013) Rates of convergence of the adaptive lasso estimators to the oracle distribution and higher order refinements by the bootstrap. Ann. Stat., 41, 1232–1259.
- 19. Chen, M., Gao, C., Ren, Z. & Zhou, H. H. (2013) Sparse CCA via precision adjusted iterative thresholding. arXiv preprint arXiv:1311.6186.
- 20. Chen, X., Han, L. & Carbonell, J. (2012) Structured sparse canonical correlation analysis. Artificial Intelligence and Statistics. PMLR, pp. 199–207.
- 21. Chernozhukov, V., Chetverikov, D. & Kato, K. (2017) Central limit theorems and bootstrap in high dimensions. Ann. Probab., 45, 2309–2352.
- 22. Dey, S. S., Mazumder, R. & Wang, G. (2018) A convex integer programming approach for optimal sparse PCA. arXiv preprint arXiv:1810.09062.
- 23. Eckart, C. & Young, G. (1936) The approximation of one matrix by another of lower rank. Psychometrika, 1, 211–218.
- 24. Gao, C., Ma, Z., Ren, Z. & Zhou, H. H. (2015) Minimax estimation in sparse canonical correlation analysis. Ann. Stat., 43, 2168–2197.
- 25. Gao, C., Ma, Z. & Zhou, H. H. (2017) Sparse CCA: adaptive estimation and computational barriers. Ann. Stat., 45, 2074–2101.
- 26. Holm, K., Hegardt, C., Staaf, J., Vallon-Christersson, J., Jönsson, G., Olsson, H., Borg, Å. & Ringnér, M. (2010) Molecular subtypes of breast cancer are associated with characteristic DNA methylation patterns. Breast Cancer Res., 12, 1–16.
- 27. Horn, R. A. & Johnson, C. R. (2012) Matrix Analysis. Cambridge University Press.
- 28. Hotelling, H. (1992) Relations between two sets of variates. Breakthroughs in Statistics. Springer, pp. 162–190.
- 29. Hu, W., Lin, D., Calhoun, V. D. & Wang, Y.-P. (2016) Integration of SNPs-fMRI-methylation data with sparse multi-CCA for schizophrenia study. 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, pp. 3310–3313.
- 30. Hu, W., Lin, D., Cao, S., Liu, J., Chen, J., Calhoun, V. D. & Wang, Y.-P. (2017) Adaptive sparse multiple canonical correlation analysis with application to imaging (epi)genomics study of schizophrenia. IEEE Trans. Biomed. Eng., 65, 1–399.
- 31. Huey, N. & Laha, N. (2021) R package for computing de-biased estimators: de.bias.cca. https://github.com/nilanjanalaha/de.bias.CCA.
- 32. Janková, J. & van de Geer, S. (2021) De-biased sparse PCA: inference for eigenstructure of large covariance matrices. IEEE Trans. Inf. Theory, 67, 2507–2527.
- 33. Janková, J. & van de Geer, S. (2016) Confidence regions for high-dimensional generalized linear models under sparsity. arXiv preprint arXiv:1610.01353.
- 34. Janková, J. & van de Geer, S. (2017) Honest confidence regions and optimality in high-dimensional precision matrix estimation. Test, 26, 143–162.
- 35. Javanmard, A. & Montanari, A. (2014) Confidence intervals and hypothesis testing for high-dimensional regression. J. Mach. Learn. Res., 15, 2869–2909.
- 36. Johnstone, I. M. & Lu, A. Y. (2009) On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc., 104, 682–693.
- 37. Kang, M., Zhang, B., Wu, X., Liu, C. & Gao, J. (2013) Sparse generalized canonical correlation analysis for biological model integration: a genetic study of psychiatric disorders. 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, pp. 1490–1493.
- 38. Kuchibhotla, A. K. & Chakrabortty, A. (2022) Moving beyond sub-Gaussianity in high-dimensional statistics: applications in covariance estimation and linear regression. Inf. Inference, 11, 1389–1456.
- 39. Laha, N. & Mukherjee, R. (2022) On support recovery with sparse CCA: information theoretic and computational limits. IEEE Trans. Inf. Theory, 69, 1695–1738.
- 40. Lee, M. & Rhee, I. (2017) Cytokine signaling in tumor progression. Immune Netw., 17, 214–227.
- 41. Lee, W., Lee, D., Lee, Y. & Pawitan, Y. (2011) Sparse canonical covariance analysis for high-throughput data. Stat. Appl. Genet. Mol. Biol., 10.
- 42. Leeb, H. & Pötscher, B. M. (2005) Model selection and inference: facts and fiction. Econometric Theory, 21, 21–59.
- 43. Leeb, H. & Pötscher, B. M. (2006) Can one estimate the conditional distribution of post-model-selection estimators? Ann. Stat., 34, 2554–2591.
- 44. Leeb, H. & Pötscher, B. M. (2008) Sparse estimators and the oracle property, or the return of Hodges’ estimator. J. Econometrics, 142, 201–211.
- 45. Lin, D., Zhang, J., Li, J., Calhoun, V. D., Deng, H.-W. & Wang, Y.-P. (2013) Group sparse canonical correlation analysis for genomic data integration. BMC Bioinformatics, 14, 1–16.
- 46. Ma, Z., et al. (2013) Sparse principal component analysis and iterative thresholding. Ann. Stat., 41, 772–801.
- 47. Ma, Z. & Yang, F. (2021) Sample canonical correlation coefficients of high-dimensional random vectors with finite rank correlations. arXiv preprint arXiv:2102.03297.
- 48. Ma, Z. & Li, X. (2020) Subspace perspective on canonical correlation analysis: dimension reduction and minimax rates. Bernoulli, 26, 432–470.
- 49. Mai, Q. & Zhang, X. (2019) An iterative penalized least squares approach to sparse canonical correlation analysis. Biometrics, 75, 734–744.
- 50. Mitra, R. & Zhang, C.-H. (2016) The benefit of group sparsity in group inference with de-biased scaled group lasso. Electron. J. Stat., 10, 1829–1873.
- 51. Neykov, M., Ning, Y., Liu, J. S. & Liu, H. (2018) A unified theory of confidence regions and testing for high-dimensional estimating equations. Stat. Sci., 33, 427–443.
- 52. Ning, Y. & Liu, H. (2017) A general theory of hypothesis tests and confidence regions for sparse high dimensional models. Ann. Stat., 45, 158–195.
- 53. Peng, Y., Gao, F. & Zhang, J. (2015) Sparse inverse covariance matrix estimation for functional connectivity analysis of fMRI data. Brain Connect., 5, 607–621.
- 54. Pittas, A. G., Joseph, N. A. & Greenberg, A. S. (2004) Adipocytokines and insulin resistance. J. Clin. Endocrinol. Metab., 89, 447–452.
- 55. Pötscher, B. M. & Leeb, H. (2009) On the distribution of penalized maximum likelihood estimators: the lasso, SCAD, and thresholding. J. Multivariate Anal., 100, 2065–2082.
- 56. Ravikumar, P., Wainwright, M. J. & Lafferty, J. D. (2008) Covariance estimation in high-dimensional sparse Gaussian graphical models. J. Mach. Learn. Res., 9, 2579–2607.
- 57. Sofer, T., Maity, A., Coull, B., Baccarelli, A. A., Schwartz, J. & Lin, X. (2012) Multivariate gene selection and testing in studying the exposure effects on a gene set. Stat. Biosci., 4, 319–338.
- 58. Sun, T. & Zhang, C.-H. (2012) Banded and sparse covariance matrices: estimation and applications. J. Multivariate Anal., 110, 149–165.
- 59. Trench, W. F. (1999) Asymptotic distribution of the spectra of a class of generalized Kac–Murdock–Szegö matrices. Linear Algebra Appl., 294, 181–192.
- 60. Tsiatis, A. (2007) Semiparametric Theory and Missing Data. Springer Science & Business Media.
- 61. van de Geer, S., Bühlmann, P., Ritov, Y. & Dezeure, R. (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Stat., 42, 1166–1202.
- 62. Vershynin, R. (2010) Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027.
- 63. Vershynin, R. (2018) High-Dimensional Probability: An Introduction with Applications in Data Science, vol. 47. Cambridge University Press.
- 64. Wang, T., Berthet, Q. & Samworth, R. J. (2016) Statistical and computational trade-offs in estimation of sparse principal components. Ann. Stat., 44, 1896–1930.
- 65. Wilms, I. & Croux, C. (2015) Sparse canonical correlation analysis from a predictive point of view. Biom. J., 57, 834–851.
- 66. Witten, D. M., Tibshirani, R. & Hastie, T. (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10, 515–534.
- 67. Wu, Y. & Liu, Y. (2014) Sparse Modeling and Estimation for High-Dimensional Genetic Data. Cambridge University Press.
- 68. Yang, F. (2022) Limiting distribution of the sample canonical correlation coefficients of high-dimensional random vectors. Electron. J. Probab., 27, 1–71.
- 69. Yuan, M. (2010) High dimensional inverse covariance matrix estimation via linear programming. J. Mach. Learn. Res., 11, 2261–2286.
- 70. Yuan, X.-T. & Zhang, T. (2013) Truncated power method for sparse eigenvalue problems. J. Mach. Learn. Res., 14, 899–925.
- 71. Zhang, C.-H. & Zhang, S. S. (2014) Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. B, 76, 217–242.