Published in final edited form as: J Multivar Anal. 2016 Jan 1;143:107–126. doi: 10.1016/j.jmva.2015.08.019

Inference for High-dimensional Differential Correlation Matrices *

T. Tony Cai and Anru Zhang
PMCID: PMC4612396  NIHMSID: NIHMS727286  PMID: 26500380

Abstract

Motivated by differential co-expression analysis in genomics, we consider in this paper estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. The minimax rate of convergence is established and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlation matrices with approximately sparse differences. Simulation results show that the procedure significantly outperforms two other natural methods that are based on separate estimation of the individual correlation matrices. The procedure is also illustrated through an analysis of a breast cancer dataset, which provides evidence at the gene co-expression level that several genes, of which a subset has been previously verified, are associated with breast cancer. Hypothesis testing on the differential correlation matrices is also considered. A test, which is particularly well suited for testing against sparse alternatives, is introduced. In addition, other related problems, including estimation of a single sparse correlation matrix, estimation of differential covariance matrices, and estimation of differential cross-correlation matrices, are also discussed.

Keywords: Adaptive thresholding, covariance matrix, differential co-expression analysis, differential correlation matrix, optimal rate of convergence, sparse correlation matrix, thresholding

1 Introduction

Statistical inference on the correlation structure has a wide array of applications, ranging from gene co-expression network analysis (Carter et al., 2004; Lee et al., 2004; Zhang et al., 2008; Dubois et al., 2010; Fuller et al., 2007) to brain intelligence analysis (Shaw et al., 2006). For example, understanding the correlations between the genes is critical for the construction of the gene co-expression network. See Kostka and Spang (2004), Lai et al. (2004), and Fuller et al. (2007). Driven by these and other applications in genomics, signal processing, empirical finance, and many other fields, making sound inference on the high-dimensional correlation structure is becoming a crucial problem.

In addition to the correlation structure of a single population, the difference between the correlation matrices of two populations is of significant interest. Differential gene expression analysis is widely used in genomics to identify disease-associated genes for complex diseases. Conventional methods mainly focus on comparisons of the mean expression levels between the disease and control groups. In some cases, clinical disease characteristics such as survival or tumor stage do not have significant associations with gene expression, but there may be significant effects on gene co-expression related to the clinical outcome (Shedden and Taylor, 2005; Hudson et al., 2009; Bandyopadhyay et al., 2010). Recent studies have shown that changes in the correlation networks across different stages of disease, or between case and control groups, are also important for identifying dysfunctional gene expressions in disease. See, for example, de la Fuente (2010). This differential co-expression network analysis has become an important complement to the original differential expression analysis, as differential correlations among the genes may reflect the rewiring of genetic networks between two different conditions (see Shedden and Taylor, 2005; Bandyopadhyay et al., 2010; de la Fuente, 2010; Ideker and Krogan, 2012; Fukushima, 2013).

Motivated by these applications, we consider in this paper optimal estimation of the differential correlation matrix. Specifically, suppose we observe two independent sets of p-dimensional i.i.d. random samples $X^{(t)}=\{X_1^{(t)},\ldots,X_{n_t}^{(t)}\}$ with mean $\mu_t$, covariance matrix $\Sigma_t$, and correlation matrix $R_t$, where t = 1 and 2. The goal is to estimate the differential correlation matrix $D=R_1-R_2$. A particular focus of the paper is on estimating an approximately sparse differential correlation matrix in the high-dimensional setting where the dimension is much larger than the sample sizes, i.e., $p\gg\max(n_1,n_2)$. The estimation accuracy is evaluated under both the spectral norm loss and the Frobenius norm loss.

A naive approach to estimating the differential correlation matrix $D=R_1-R_2$ is to first estimate the covariance matrices $\Sigma_1$ and $\Sigma_2$ separately, then normalize to obtain estimators $\hat R_1$ and $\hat R_2$ of the individual correlation matrices $R_1$ and $R_2$, and finally take the difference $\hat D=\hat R_1-\hat R_2$ as the estimator of the differential correlation matrix D. A simple estimate of a correlation matrix is the sample correlation matrix. However, in the high-dimensional setting, the sample correlation matrix is a poor estimate. Significant advances have been made in the last few years on optimal estimation of a high-dimensional covariance matrix. Regularization methods such as banding, tapering, and thresholding have been proposed. In particular, Cai et al. (2010) established the optimal rate of convergence and Cai and Yuan (2012) developed an adaptive estimator of bandable covariance matrices. For sparse covariance matrices where each row and each column has relatively few nonzero entries, Bickel and Levina (2008) introduced a thresholding estimator and obtained rates of convergence; Cai and Liu (2011) proposed an adaptive thresholding procedure and Cai and Zhou (2012) established the minimax rates of convergence for estimating sparse covariance matrices.

Structural assumptions on the individual correlation matrices $R_1$ and $R_2$ are crucial for the good performance of the difference estimator. These assumptions, however, may not hold in practice. For example, gene transcriptional networks often contain so-called hub nodes where the corresponding gene expressions are correlated with many other gene expressions. See, for example, Barabási and Oltvai (2004) and Barabási et al. (2011). In such settings, some of the rows and columns of $R_1$ and $R_2$ have many nonzero entries, which means that $R_1$ and $R_2$ are not sparse. In genomic applications, the correlation matrices are rarely bandable as the genes are not ordered in any particular way.

In this paper, we propose a direct estimation method for the differential correlation matrix $D=R_1-R_2$ without first estimating $R_1$ and $R_2$ individually. This direct estimation method assumes that D is approximately sparse, but otherwise does not impose any structural assumptions on the individual correlation matrices $R_1$ and $R_2$. An adaptive thresholding procedure is introduced and analyzed. The estimator can still perform well even when the individual correlation matrices cannot be estimated consistently. For example, direct estimation can recover the differential correlation network accurately even in the presence of hub nodes in $R_1$ and $R_2$, as long as the differential correlation network is approximately sparse. The key is that sparsity is assumed for D, not for $R_1$ or $R_2$.

Theoretical performance guarantees are provided for the direct estimator of the differential correlation matrix. Minimax rates of convergence are established for the collections of paired correlation matrices with approximately sparse differences. The proposed estimator is shown to be adaptively rate-optimal. In comparison to adaptive estimation of a single sparse covariance matrix considered in Cai and Liu (2011), both the procedure and its technical analysis are different and more involved. Technically speaking, correlation matrix estimators are harder to analyze than covariance matrix estimators, and the two-sample setting in our problem further increases the difficulty.

Numerical performance of the proposed estimator is investigated through simulations. The results indicate a significant advantage of estimating the differential correlation matrix directly. The estimator outperforms two other natural alternatives that are based on separate estimation of $R_1$ and $R_2$. To further illustrate the merit of the method, we apply the procedure to the analysis of a breast cancer dataset from the study by van de Vijver et al. (2002) and investigate the differential co-expressions among genes in different tumor stages of breast cancer. The adaptive thresholding procedure is applied to analyze the alteration of correlations across different grades of tumor. The study provides evidence at the gene co-expression level that several genes, of which a subset has been previously verified, are associated with breast cancer.

In addition to optimal estimation of the differential correlation matrix, we also consider hypothesis testing of the differential correlation matrices, $H_0: R_1-R_2=0$ versus $H_1: R_1-R_2\ne 0$. We propose a test which is particularly well suited for testing against sparse alternatives. The same ideas and techniques can also be used to treat other related problems. We also consider estimation of a single sparse correlation matrix from one random sample, estimation of differential covariance matrices, and estimation of differential cross-correlation matrices.

The rest of the paper is organized as follows. Section 2 presents in detail the adaptive thresholding procedure for estimating the differential correlation matrix. The theoretical properties of the proposed estimator are analyzed in Section 3. In Section 4, simulation studies are carried out to investigate the numerical performance of the thresholding estimator and Section 5 illustrates the procedure through an analysis of a breast cancer dataset. Hypothesis testing on the differential correlation matrices is discussed in Section 6.1, and other related problems are considered in the rest of Section 6. All the proofs are given in the Appendix.

2 Estimation of Differential Correlation Matrix

We consider in this section estimation of the differential correlation matrix and introduce a data-driven adaptive thresholding estimator. The theoretical and numerical properties of the estimator are investigated in Sections 3 and 4 respectively.

Let $X^{(t)}=(X_1^{(t)},\ldots,X_p^{(t)})^\top$ be a p-variate random vector with mean $\mu_t$, covariance matrix $\Sigma_t=(\sigma_{ij,t})_{1\le i,j\le p}$, and correlation matrix $R_t=(r_{ij,t})_{1\le i,j\le p}$, for t = 1 and 2. Suppose we observe two i.i.d. random samples, $\{X_1^{(1)},\ldots,X_{n_1}^{(1)}\}$ from $X^{(1)}$ and $\{X_1^{(2)},\ldots,X_{n_2}^{(2)}\}$ from $X^{(2)}$, and the two samples are independent. The goal is to estimate the differential correlation matrix $D=R_1-R_2$ under the assumption that D is approximately sparse.

Given the two random samples, the sample covariance matrices and sample correlation matrices are defined as, for t = 1 and 2,

$$\hat\Sigma_t=(\hat\sigma_{ij,t})_{1\le i,j\le p}=\frac{1}{n_t}\sum_{k=1}^{n_t}\bigl(X_k^{(t)}-\bar X^{(t)}\bigr)\bigl(X_k^{(t)}-\bar X^{(t)}\bigr)^\top, \tag{1}$$
$$\hat R_t=(\hat r_{ij,t})_{1\le i,j\le p}=\operatorname{diag}(\hat\Sigma_t)^{-1/2}\cdot\hat\Sigma_t\cdot\operatorname{diag}(\hat\Sigma_t)^{-1/2}, \tag{2}$$

where $\bar X^{(t)}=\frac{1}{n_t}\sum_{k=1}^{n_t}X_k^{(t)}$ and $\operatorname{diag}(\hat\Sigma_t)$ is the diagonal matrix with the same diagonal as $\hat\Sigma_t$. We propose a thresholding estimator of the differential correlation matrix D that individually thresholds the entries of the difference of the two sample correlation matrices, $\hat R_1-\hat R_2$, with the threshold adaptive to the noise level of each entry. A key to the construction of the procedure is the estimation of the noise levels of the individual entries of $\hat R_1-\hat R_2$, as these entries are random variables themselves.

We first provide some intuition before formally introducing the estimates of the noise levels of the individual entries of $\hat R_1-\hat R_2$. Note that $E\{(X_i^{(t)}-\mu_{i,t})(X_j^{(t)}-\mu_{j,t})\}=\sigma_{ij,t}$ and $\mu_{i,t}\approx\bar X_i^{(t)}=\frac{1}{n_t}\sum_{k=1}^{n_t}X_{ik}^{(t)}$. Define

$$\theta_{ij,t}=\operatorname{var}\bigl((X_i^{(t)}-\mu_{i,t})(X_j^{(t)}-\mu_{j,t})\bigr),\quad 1\le i,j\le p,\ t=1,2. \tag{3}$$

Then one can intuitively write

$$\hat\sigma_{ij,t}=\frac{1}{n_t}\sum_{k=1}^{n_t}(X_{ik}^{(t)}-\bar X_i^{(t)})(X_{jk}^{(t)}-\bar X_j^{(t)})\approx\sigma_{ij,t}+\Bigl(\frac{\theta_{ij,t}}{n_t}\Bigr)^{1/2}z_{ij,t}, \tag{4}$$

where $z_{ij,t}$ is approximately normal with mean 0 and variance 1. Hence, $\theta_{ij,t}/n_t$ measures the uncertainty of the sample covariance $\hat\sigma_{ij,t}$. Based on the first-order Taylor expansion of the trivariate function $x(yz)^{-1/2}$ for $x\in\mathbb{R}$ and $y,z>0$,

$$\frac{\hat x}{(\hat y\hat z)^{1/2}}=\frac{x}{(yz)^{1/2}}+\frac{\hat x-x}{(yz)^{1/2}}-\frac{x}{(yz)^{1/2}}\Bigl(\frac{\hat y-y}{2y}+\frac{\hat z-z}{2z}\Bigr)+o(|\hat x-x|)+o(|\hat y-y|)+o(|\hat z-z|), \tag{5}$$

the entries $\hat r_{ij,t}$ of the sample correlation matrix $\hat R_t=(\hat r_{ij,t})$ can be approximated by

$$\hat r_{ij,t}=\frac{\hat\sigma_{ij,t}}{(\hat\sigma_{ii,t}\hat\sigma_{jj,t})^{1/2}}\approx\frac{\sigma_{ij,t}}{(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}+\Bigl(\frac{\theta_{ij,t}}{n_t\sigma_{ii,t}\sigma_{jj,t}}\Bigr)^{1/2}z_{ij,t}-\frac{\sigma_{ij,t}}{2(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}\Bigl\{\Bigl(\frac{\theta_{ii,t}}{n_t\sigma_{ii,t}^2}\Bigr)^{1/2}z_{ii,t}+\Bigl(\frac{\theta_{jj,t}}{n_t\sigma_{jj,t}^2}\Bigr)^{1/2}z_{jj,t}\Bigr\}=r_{ij,t}+\Bigl(\frac{\xi_{ij,t}}{n_t}\Bigr)^{1/2}z_{ij,t}-\frac{r_{ij,t}}{2}\Bigl\{\Bigl(\frac{\xi_{ii,t}}{n_t}\Bigr)^{1/2}z_{ii,t}+\Bigl(\frac{\xi_{jj,t}}{n_t}\Bigr)^{1/2}z_{jj,t}\Bigr\}, \tag{6}$$

where we denote

$$\xi_{ij,t}=\frac{\theta_{ij,t}}{\sigma_{ii,t}\sigma_{jj,t}},\quad 1\le i,j\le p,\ t=1,2.$$

It then follows from (6) that

$$\hat r_{ij,1}-\hat r_{ij,2}\approx r_{ij,1}-r_{ij,2}+\Bigl(\frac{\xi_{ij,1}}{n_1}\Bigr)^{1/2}z_{ij,1}-\frac{r_{ij,1}}{2}\Bigl\{\Bigl(\frac{\xi_{ii,1}}{n_1}\Bigr)^{1/2}z_{ii,1}+\Bigl(\frac{\xi_{jj,1}}{n_1}\Bigr)^{1/2}z_{jj,1}\Bigr\}-\Bigl[\Bigl(\frac{\xi_{ij,2}}{n_2}\Bigr)^{1/2}z_{ij,2}-\frac{r_{ij,2}}{2}\Bigl\{\Bigl(\frac{\xi_{ii,2}}{n_2}\Bigr)^{1/2}z_{ii,2}+\Bigl(\frac{\xi_{jj,2}}{n_2}\Bigr)^{1/2}z_{jj,2}\Bigr\}\Bigr],\quad 1\le i,j\le p, \tag{7}$$

where the random variables $z_{ij,1}$ and $z_{ij,2}$ are approximately normal with mean 0 and variance 1, but not necessarily independent for 1 ≤ i, j ≤ p.

Equation (7) suggests that estimation of $r_{ij,1}-r_{ij,2}$ is similar to the sparse covariance matrix estimation considered in Cai and Liu (2011), where entries are adaptively thresholded according to their individual noise levels. However, the setting here is more complicated, as $\hat r_{ij,1}-\hat r_{ij,2}$ is not an unbiased estimate of $r_{ij,1}-r_{ij,2}$ and the noise levels are harder to estimate. These features make the technical analysis more involved. The noise levels are unknown but can be estimated from the observed data. Specifically, we estimate $\theta_{ij,t}$ and $\xi_{ij,t}$ by the following data-driven quantities,

$$\hat\theta_{ij,t}=\frac{1}{n_t}\sum_{k=1}^{n_t}\bigl\{(X_{ik}^{(t)}-\bar X_i^{(t)})(X_{jk}^{(t)}-\bar X_j^{(t)})-\hat\sigma_{ij,t}\bigr\}^2, \tag{8}$$
$$\hat\xi_{ij,t}=\frac{\hat\theta_{ij,t}}{\hat\sigma_{ii,t}\hat\sigma_{jj,t}}=\frac{1}{n_t\hat\sigma_{ii,t}\hat\sigma_{jj,t}}\sum_{k=1}^{n_t}\bigl\{(X_{ik}^{(t)}-\bar X_i^{(t)})(X_{jk}^{(t)}-\bar X_j^{(t)})-\hat\sigma_{ij,t}\bigr\}^2. \tag{9}$$

We are now ready to introduce the adaptive thresholding estimator of R1R2 using data-driven threshold levels. Let sλ(z) be a thresholding function satisfying the following conditions:

  • (C1)

    |sλ(z)| ≤ c|y| for all z, y satisfying |z − y| ≤ λ, for some c > 0;

  • (C2)

    sλ(z) = 0 for |z| ≤ λ;

  • (C3)

    |sλ(z) − z| ≤ λ, for all z ∈ ℝ.

Note that the commonly used soft thresholding function $s_\lambda(z)=\operatorname{sgn}(z)(|z|-\lambda)_+$ and the adaptive lasso rule $s_\lambda(z)=z(1-|\lambda/z|^\eta)_+$ with η ≥ 1 satisfy these three conditions. See Rothman et al. (2009) and Cai and Liu (2011). Although the hard thresholding function $s_\lambda(z)=z\cdot 1_{\{|z|\ge\lambda\}}$ does not satisfy Condition (C1), the technical arguments given in this paper still work with very minor changes.
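For concreteness, the following minimal NumPy sketch (ours, not code from the paper; the function names are hypothetical) illustrates the three thresholding rules discussed above, vectorized over a matrix of inputs.

```python
import numpy as np

def soft_threshold(z, lam):
    # s_lam(z) = sgn(z)(|z| - lam)_+ ; satisfies (C1)-(C3)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def adaptive_lasso_threshold(z, lam, eta=4):
    # s_lam(z) = z(1 - |lam/z|^eta)_+ with eta >= 1; satisfies (C1)-(C3)
    with np.errstate(divide="ignore", invalid="ignore"):
        factor = 1.0 - np.abs(np.where(z != 0, lam / z, np.inf)) ** eta
    return z * np.maximum(factor, 0.0)

def hard_threshold(z, lam):
    # s_lam(z) = z * 1{|z| >= lam}; violates (C1) but works with minor changes
    return z * (np.abs(z) >= lam)
```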

We propose to estimate the sparse differential correlation matrix D by the entrywise thresholding estimator $\hat D^\ast=(\hat d_{ij})_{p\times p}$ defined as

$$\hat d_{ij}=s_{\lambda_{ij}}(\hat r_{ij,1}-\hat r_{ij,2}),\quad 1\le i,j\le p,$$

where $s_\lambda(z)$ is a thresholding function satisfying (C1)–(C3) and the threshold level $\lambda_{ij}$ is given by $\lambda_{ij}=\lambda_{ij,1}+\lambda_{ij,2}$ with

$$\lambda_{ij,t}=\tau\Bigl(\frac{\log p}{n_t}\Bigr)^{1/2}\Bigl\{\hat\xi_{ij,t}^{1/2}+\frac{|\hat r_{ij,t}|}{2}\bigl(\hat\xi_{ii,t}^{1/2}+\hat\xi_{jj,t}^{1/2}\bigr)\Bigr\},\quad 1\le i,j\le p,\ t=1,2. \tag{10}$$

Here $\hat\xi_{ij,t}$ are given by (9) and the thresholding constant τ can be chosen empirically through cross-validation. See Section 4.1 for more discussion of the empirical choice of τ.
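To make the procedure concrete, here is a short NumPy sketch of the estimator built from (1), (2), (8), (9), and (10). It is an illustrative implementation under the formulas above, not the authors' code; the function names are ours, and the hard rule is used for simplicity (any rule satisfying (C1)–(C3) can be substituted).

```python
import numpy as np

def corr_and_noise_levels(X):
    """Sample correlation matrix (1)-(2) and noise-level estimates (8)-(9)
    for one n x p sample matrix X."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)                     # center each coordinate
    S = Xc.T @ Xc / n                           # Sigma_hat, eq. (1), 1/n scaling
    d = np.diag(S)
    R = S / np.sqrt(np.outer(d, d))             # R_hat, eq. (2)
    # theta_hat_ij = mean_k {(X_ik - Xbar_i)(X_jk - Xbar_j) - sigma_hat_ij}^2, eq. (8)
    theta = np.mean((Xc[:, :, None] * Xc[:, None, :] - S) ** 2, axis=0)
    xi = theta / np.outer(d, d)                 # xi_hat, eq. (9)
    return R, xi

def lambda_matrix(R, xi, n, tau):
    """Entrywise threshold levels lambda_ij,t from eq. (10)."""
    p = R.shape[0]
    dxi = np.sqrt(np.diag(xi))
    return tau * np.sqrt(np.log(p) / n) * (
        np.sqrt(xi) + 0.5 * np.abs(R) * (dxi[:, None] + dxi[None, :]))

def differential_corr_estimate(X1, X2, tau):
    """Adaptive thresholding estimator D_hat* of D = R1 - R2 (hard rule)."""
    R1, xi1 = corr_and_noise_levels(X1)
    R2, xi2 = corr_and_noise_levels(X2)
    lam = (lambda_matrix(R1, xi1, X1.shape[0], tau)
           + lambda_matrix(R2, xi2, X2.shape[0], tau))  # lambda_ij = lambda_ij1 + lambda_ij2
    D = R1 - R2
    return np.where(np.abs(D) >= lam, D, 0.0)
```

The n × p × p intermediate array in the $\hat\theta$ computation is the memory bottleneck for large p; a loop over index pairs trades speed for memory.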

3 Theoretical Properties

We now analyze the theoretical properties of the data-driven thresholding estimator $\hat D^\ast$ proposed in the last section. We will establish the minimax rate of convergence for estimating the differential correlation matrix D over certain classes of paired correlation matrices $(R_1,R_2)$ with approximately sparse difference $D=R_1-R_2$ under the spectral norm loss. The results show that $\hat D^\ast$ is rate-optimal under mild conditions.

3.1 Rate Optimality of the Thresholding Estimator

We consider the following class of paired correlation matrices in ℝp×p with approximately sparse difference

$$\mathcal{G}_q(s_0(p))=\Bigl\{(R_1,R_2):\ R_1,R_2\succeq 0;\ \operatorname{diag}(R_1)=\operatorname{diag}(R_2)=1;\ \max_i\sum_{j=1}^{p}|r_{ij,1}-r_{ij,2}|^q\le s_0(p)\Bigr\} \tag{11}$$

for some 0 ≤ q < 1. Here $R_1,R_2\succeq 0$ and $\operatorname{diag}(R_1)=\operatorname{diag}(R_2)=1$ mean that $R_1$ and $R_2$ are symmetric, positive semi-definite, and have all diagonal entries equal to 1. For $(R_1,R_2)\in\mathcal{G}_q(s_0(p))$, the difference $R_1-R_2$ is approximately sparse in the sense that each row vector of $R_1-R_2$ lies in the $\ell_q$ ball with radius $s_0(p)$, where 0 ≤ q < 1. When q = 0, this constraint becomes the commonly used exact sparsity condition.

Let

$$Y_i^{(t)}=\frac{X_i^{(t)}-\mu_{i,t}}{\{\operatorname{var}(X_i^{(t)})\}^{1/2}},\quad i=1,\ldots,p,\ t=1,2.$$

We assume that each $Y_i^{(t)}$ is sub-Gaussian, i.e. there exist constants K, η > 0 such that for all 1 ≤ i ≤ p and t = 1, 2,

$$E\,e^{u(Y_i^{(t)})^2}\le K\quad\text{for }|u|\le\eta. \tag{12}$$

In addition, we assume for some constant ν0 > 0

$$\min_{1\le i,j\le p;\ t=1,2}\operatorname{var}\bigl(Y_i^{(t)}Y_j^{(t)}\bigr)\ge\nu_0. \tag{13}$$

The following theorem provides an upper bound for the risk of the thresholding estimator $\hat D^\ast$ under the spectral norm loss.

Theorem 3.1 (Upper bound)

Suppose $\log p=o(\min(n_1,n_2)^{1/3})$ and (12) and (13) hold. Suppose the thresholding function $s_\lambda(z)$ satisfies Conditions (C1)–(C3). Then the thresholding estimator $\hat D^\ast$ defined in (2) and (10) with τ > 4 satisfies

$$\sup_{(R_1,R_2)\in\mathcal{G}_q(s_0(p))}E\|\hat D^\ast-(R_1-R_2)\|^2\le C(s_0^2(p)+1)\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{1-q} \tag{14}$$
$$\sup_{(R_1,R_2)\in\mathcal{G}_q(s_0(p))}E\|\hat D^\ast-(R_1-R_2)\|_{\ell_1}^2\le C(s_0^2(p)+1)\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{1-q} \tag{15}$$
$$\sup_{(R_1,R_2)\in\mathcal{G}_q(s_0(p))}E\|\hat D^\ast-(R_1-R_2)\|_F^2\le Cp(s_0(p)+1)\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{1-q/2} \tag{16}$$

for some constant C > 0 that does not depend on n1, n2 or p.

Remark 3.1

Condition (13) holds naturally when the $X^{(t)}$ are jointly Gaussian. To see this, suppose $\rho_{ij,t}$ is the correlation between $Y_i^{(t)}$ and $Y_j^{(t)}$. Then one can write $Y_j^{(t)}=\rho_{ij,t}Y_i^{(t)}+(1-\rho_{ij,t}^2)^{1/2}W$, where $Y_i^{(t)}$ and $W$ are independent standard Gaussian variables. A direct calculation gives $\operatorname{var}(Y_i^{(t)}Y_j^{(t)})=1+\rho_{ij,t}^2\ge 1$, which implies that (13) holds with $\nu_0=1$. Condition (13) is used in Lemma 6.1 to show that $\hat\theta_{ij,t}$ is a good estimate of $\theta_{ij,t}$ and that $|\hat\sigma_{ij,t}-\sigma_{ij,t}|$ can be bounded by $C(\hat\theta_{ij,t}\log p/n_t)^{1/2}$ with high probability.

Theorem 3.1 gives the rate of convergence for the thresholding estimator $\hat D^\ast$. The following result provides the lower bound for the minimax risk of estimating the differential correlation matrix $D=R_1-R_2$ with $(R_1,R_2)\in\mathcal{G}_q(s_0(p))$.

Theorem 3.2 (Lower Bound)

Suppose $\log p=o(\min(n_1,n_2))$ and $s_0(p)\le M\min(n_1,n_2)^{(1-q)/2}(\log p)^{-(3-q)/2}$ for some constant M > 0. Then the minimax risk for estimating $D=R_1-R_2$ satisfies

$$\inf_{\hat D}\sup_{(R_1,R_2)\in\mathcal{G}_q(s_0(p))}E\|\hat D-(R_1-R_2)\|^2\ge c\,s_0^2(p)\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{1-q}, \tag{17}$$
$$\inf_{\hat D}\sup_{(R_1,R_2)\in\mathcal{G}_q(s_0(p))}E\|\hat D-(R_1-R_2)\|_{\ell_1}^2\ge c\,s_0^2(p)\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{1-q}, \tag{18}$$
$$\inf_{\hat D}\sup_{(R_1,R_2)\in\mathcal{G}_q(s_0(p))}E\|\hat D-(R_1-R_2)\|_F^2\ge c\,s_0(p)\,p\,\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{1-q/2}, \tag{19}$$

for some constant c > 0.

Theorems 3.1 and 3.2 together yield the minimax rate of convergence

$$s_0^2(p)\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{1-q}$$

for estimating $D=R_1-R_2$ with $(R_1,R_2)\in\mathcal{G}_q(s_0(p))$ under the spectral norm loss, and show that the thresholding estimator $\hat D^\ast$ defined in (2) and (10) is adaptively rate-optimal.

Remark 3.2

The technical analysis here for the difference of two correlation matrices is more complicated than that for estimating a single sparse covariance matrix considered in Cai and Liu (2011). As can be seen from (7), the "signal + noise" expression of $\hat r_{ij,1}-\hat r_{ij,2}$, the difference of the sample correlation matrices has six "noise terms", and all six terms must be dealt with in the theoretical analysis of Theorem 3.1.

4 Numerical Studies

We investigate in this section the numerical performance of the adaptive thresholding estimator of the differential correlation matrix through simulations. The method is applied to the analysis of a breast cancer dataset in the next section.

In the previous sections, we proposed the entrywise thresholding method for estimating $R_1-R_2$ and studied the theoretical properties of $\hat D^\ast$ with a fixed τ > 4. However, the theoretical choice of τ may not be optimal for finite sample performance, as the following example shows. Let $R_1$ and $R_2$ be 200 × 200 matrices such that $R_{1,ij}=(-1)^{|i-j|}\times\max(1-|i-j|/10,\,0)\times(1_{\{i=j\}}+f_if_j1_{\{i\ne j\}})$ and $R_{2,ij}=\max(1-|i-j|/10,\,0)\times(1_{\{i=j\}}+f_if_j1_{\{i\ne j\}})$. Here $1_{\{\cdot\}}$ is the indicator function and $f_1,\ldots,f_{200}$ are i.i.d. random variables uniformly distributed on [0, 1]. In this setting, both $R_1$ and $R_2$ are sparse, but their difference is sparser still. We set $\Sigma_t=R_t$ and generate 200 independent samples from $X^{(1)}\sim N(0,\Sigma_1)$ and 200 independent samples from $X^{(2)}\sim N(0,\Sigma_2)$. For various values of τ ∈ [0, 5], we implement the proposed method with hard thresholding and repeat the experiment 100 times. The average losses in spectral, ℓ1, and Frobenius norms are shown in Figure 1. Clearly, τ > 4 is not the best choice in this example.

Figure 1. Average spectral, ℓ1, and Frobenius norm losses for τ ∈ [0, 5]; p = 100, n1 = n2 = 50.

Empirically, we find that the numerical performance of the estimator can often be improved by using a data-driven choice of τ based on cross-validation. We thus begin by introducing the following K-fold cross-validation method for the empirical selection of τ.

4.1 Empirical Choice of τ

For an integer K ≥ 2, we randomly split both samples $X^{(1)}=\{X_1^{(1)},\ldots,X_{n_1}^{(1)}\}$ and $X^{(2)}=\{X_1^{(2)},\ldots,X_{n_2}^{(2)}\}$ into two groups H times, yielding $X_{I_{1h}}^{(1)},X_{T_{1h}}^{(1)},X_{I_{2h}}^{(2)}$ and $X_{T_{2h}}^{(2)}$, where h = 1, …, H indexes the h-th split. For t = 1 and 2, the size of the first group $X_{I_{th}}^{(t)}$ is approximately $(K-1)n_t/K$ and the size of the second group $X_{T_{th}}^{(t)}$ is approximately $n_t/K$. We then calculate the corresponding sample correlation matrices $\hat R_{I_{1h}}^{(1)},\hat R_{T_{1h}}^{(1)},\hat R_{I_{2h}}^{(2)}$ and $\hat R_{T_{2h}}^{(2)}$ for all four sub-samples. Partition the interval [0, 5] into the equi-spaced grid $\{0,\frac1N,\frac2N,\ldots,\frac{5N}N\}$. For each value of τ in this grid, we obtain the thresholding estimator $\hat D_{I_h}$ defined in (2) and (10) with thresholding constant τ based on the sub-samples $X_{I_{1h}}^{(1)}$ and $X_{I_{2h}}^{(2)}$. Denote the average loss of each τ over the second sub-samples $X_{T_{1h}}^{(1)}$ and $X_{T_{2h}}^{(2)}$ by

$$L(\tau)=\frac1H\sum_{h=1}^{H}\bigl\|\hat D_{I_h}-\bigl(\hat R_{T_{1h}}^{(1)}-\hat R_{T_{2h}}^{(2)}\bigr)\bigr\|_F^2.$$

We select

$$\hat\tau=\mathop{\arg\min}_{\tau\in\{0,\frac1N,\ldots,\frac{5N}N\}}L(\tau)$$

as our empirical choice of the thresholding constant τ, and calculate the final estimator $\hat D^\ast(\hat\tau)$ with the thresholding constant $\hat\tau$ based on the full samples $X^{(1)}$ and $X^{(2)}$.
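A minimal sketch of this selection scheme, reusing corr_and_noise_levels and differential_corr_estimate from the sketch in Section 2 (an assumption; K, H, and N are tuning choices, and the code is illustrative rather than the authors' implementation):

```python
import numpy as np

def choose_tau(X1, X2, K=5, H=5, N=20, seed=0):
    """Pick tau minimizing L(tau) over the grid {0, 1/N, ..., 5N/N},
    using H random (K-1)/K-vs-1/K splits of each sample."""
    rng = np.random.default_rng(seed)
    grid = np.arange(5 * N + 1) / N
    loss = np.zeros(len(grid))
    for _ in range(H):
        train, test = [], []
        for X in (X1, X2):
            idx = rng.permutation(X.shape[0])
            m = X.shape[0] // K                 # second group of size ~ n_t / K
            test.append(X[idx[:m]])
            train.append(X[idx[m:]])
        R1_te, _ = corr_and_noise_levels(test[0])
        R2_te, _ = corr_and_noise_levels(test[1])
        for g, tau in enumerate(grid):
            D_hat = differential_corr_estimate(train[0], train[1], tau)
            loss[g] += np.sum((D_hat - (R1_te - R2_te)) ** 2) / H  # Frobenius^2
    return grid[int(np.argmin(loss))]

# usage: tau_hat = choose_tau(X1, X2); D_star = differential_corr_estimate(X1, X2, tau_hat)
```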

4.2 Estimation of Differential Correlation Matrix

The adaptive thresholding estimator is easy to implement. We consider the following two models under which the differential correlation matrix is sparse.

  1. Model 1 (Random Sparse Difference) $R_1$ and $R_2$ are p-dimensional symmetric positive definite matrices. $R_1=\operatorname{diag}(B_1,I_{\frac p2\times\frac p2})$ is a fixed matrix, where $B_1\in\mathbb{R}^{\frac p2\times\frac p2}$ with $B_{1,ij}=1$ if i = j and $B_{1,ij}=0\cdot2$ if i ≠ j, and $I_{\frac p2\times\frac p2}$ is the $\frac p2\times\frac p2$ identity matrix. $R_2$ is randomly generated as $R_2=\operatorname{diag}(B_1+\lambda D_0,I_{\frac p2\times\frac p2})$, where $D_0\in\mathbb{R}^{\frac p2\times\frac p2}$ with
$$D_{0,ij}=\begin{cases}1,&\text{with probability }0\cdot05,\\ 0,&\text{with probability }0\cdot9,\\ -1,&\text{with probability }0\cdot05,\end{cases}$$

     and λ is a constant that ensures the positive definiteness of $R_2$.

  2. Model 2 (Banded Difference) In this setting, the p-dimensional matrices $R_1$ and $R_2$ satisfy $R_{1,ij}=0\cdot2\times 1_{\{i=j\}}+0\cdot8\times(-1)^{|i-j|}\times\max(1-|i-j|/10,\,0)$ and $R_{2,ij}=R_{1,ij}+0\cdot2\times 1_{\{i\ne j\}}\times\max(1-|i-j|/3,\,0)$. Here $1_{\{\cdot\}}$ is the indicator function.

In each of the two settings, we set $\Sigma_t=\operatorname{diag}(|\omega_t|^{1/2})\,R_t\,\operatorname{diag}(|\omega_t|^{1/2})$ for t = 1, 2, where $\omega_1,\omega_2\in\mathbb{R}^p$ are two independent draws from $N(0,I_p)$ and $|\omega_t|^{1/2}$ is taken entrywise. These operations make the covariance matrices $\Sigma_1$ and $\Sigma_2$ have different values along their diagonals.
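For illustration, here is a small sketch (ours; the helper name model2_covariances is hypothetical) that constructs the Model 2 matrices and the rescaled covariances described above.

```python
import numpy as np

def model2_covariances(p, seed=0):
    """Build R1, R2 of Model 2 and Sigma_t = diag(|w_t|^{1/2}) R_t diag(|w_t|^{1/2})."""
    rng = np.random.default_rng(seed)
    d = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # |i - j|
    R1 = 0.2 * (d == 0) + 0.8 * (-1.0) ** d * np.maximum(1 - d / 10, 0)
    R2 = R1 + 0.2 * (d != 0) * np.maximum(1 - d / 3, 0)
    covs = []
    for R in (R1, R2):
        w = np.sqrt(np.abs(rng.standard_normal(p)))            # |omega_t|^{1/2}, entrywise
        covs.append(R * np.outer(w, w))                        # diag-scaling of R
    return covs

# samples: X1 = rng.multivariate_normal(np.zeros(p), Sigma1, size=n1), etc.
```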

We generate i.i.d. samples from X(1) ~ N(0, Σ1) and X(2) ~ N(0, Σ2) for various values of p, n1, and n2 and then apply the proposed algorithm with 5-fold cross-validation for the selection of the thresholding constant τ. For each setting, both the hard thresholding and adaptive-Lasso thresholding (Rothman et al. (2009)),

$$s_\lambda(z)=z\cdot\max\bigl(1-|\lambda/z|^\eta,\,0\bigr)\quad\text{with }\eta=4, \tag{20}$$

are used. For comparison, we also implement three natural estimators of D.

  1. The covariance matrices $\Sigma_1$ and $\Sigma_2$ are estimated individually by the adaptive thresholding method proposed in Cai and Liu (2011) with 5-fold cross-validation, and the estimates $\hat\Sigma_1^\ast$ and $\hat\Sigma_2^\ast$ are then normalized to yield estimators of $R_1$ and $R_2$,
$$\hat R_t^\ast=\operatorname{diag}(\hat\Sigma_t^\ast)^{-1/2}\,\hat\Sigma_t^\ast\,\operatorname{diag}(\hat\Sigma_t^\ast)^{-1/2},\quad t=1,2,$$

     and finally $D=R_1-R_2$ is estimated by the difference $\hat R_1^\ast-\hat R_2^\ast$.

  2. The correlation matrices are estimated separately as $\tilde R_1$ and $\tilde R_2$ using the method proposed in Section 6.2, and D is estimated by the difference $\tilde R_1-\tilde R_2$.

  3. D is estimated directly by the difference of the sample correlation matrices, $\hat R_1-\hat R_2$.

The numerical results for the two models are summarized in Tables 1 and 2, respectively. In each case, we compare the performance of the direct estimator $\hat D^\ast$ with that of $\hat R_1^\ast-\hat R_2^\ast$, $\tilde R_1-\tilde R_2$, and $\hat R_1-\hat R_2$ under the spectral norm, matrix ℓ1 norm, and Frobenius norm losses. For both models, it is easy to see that the direct thresholding estimator $\hat D^\ast$ significantly outperforms the alternatives. Under Model 1, the individual correlation matrices $R_1$ and $R_2$ are "dense" in the sense that half of the rows and columns contain many nonzero entries, but their difference D is sparse. In this case, $R_1$ and $R_2$ cannot be estimated consistently, and the two difference estimators $\hat R_1^\ast-\hat R_2^\ast$ and $\tilde R_1-\tilde R_2$ based on the individual estimators of $R_1$ and $R_2$ perform very poorly, while the direct estimator $\hat D^\ast$ performs very well. Moreover, the numerical performance of the thresholding estimators does not depend on the specific thresholding rule in a significant way: hard thresholding and adaptive lasso behave similarly.

Table 1.

Comparison of $\hat D^\ast$ with $\hat R_1^\ast-\hat R_2^\ast$, $\tilde R_1-\tilde R_2$, and the sample difference $\hat R_1-\hat R_2$ under Model 1.

Columns: p, n1, n2; then $\hat D^\ast$, $\hat R_1^\ast-\hat R_2^\ast$, $\tilde R_1-\tilde R_2$ under hard thresholding; $\hat D^\ast$, $\hat R_1^\ast-\hat R_2^\ast$, $\tilde R_1-\tilde R_2$ under adaptive-lasso thresholding; and the sample difference $\hat R_1-\hat R_2$ in the last column.
Spectral Norm
100 50 50 0·50(0·41) 1·75(1·37) 6·94(1·07) 0·33(0·31) 1·51(1·75) 6·17(0·98) 7·28(0·93)
100 100 100 0·34(0·21) 3·79(3·17) 4·74(0·55) 0·28(0·23) 3·53(2·71) 4·49(0·71) 5·02(0·65)
100 200 200 0·29(0·19) 4·22(2·14) 3·23(0·47) 0·24(0·13) 4·52(1·86) 3·14(0·54) 3·55(0·47)
100 500 500 0·24(0·10) 1·72(0·35) 2·07(0·32) 0·22(0·08) 1·82(0·35) 1·87(0·26) 2·23(0·25)
500 50 50 0·56(0·77) 3·02(2·76) 31·88(4·04) 0·40(0·65) 3·47(4·63) 29·15(4·26) 34·66(3·84)
500 100 100 0·41(0·48) 8·09(11·02) 23·38(4·52) 0·34(0·39) 12·99(13·26) 21·82(3·23) 24·19(2·77)
500 200 200 0·32(0·40) 22·22(13·06) 15·67(3·30) 0·26(0·34) 21·31(9·29) 14·61(2·24) 16·50(1·97)
500 500 500 0·20(0·19) 7·80(1·37) 7·80(1·29) 0·18(0·14) 8·21(1·69) 8·70(1·39) 10·46(1·31)

Matrix ℓ1 Norm
100 50 50 0·89(0·68) 3·63(2·97) 18·91(1·55) 0·78(0·80) 3·14(3·20) 16·88(1·42) 21·33(1·61)
100 100 100 0·64(0·25) 7·34(4·85) 13·42(1·08) 0·70(0·87) 7·03(4·14) 12·72(1·18) 14·97(1·14)
100 200 200 0·64(0·34) 9·63(1·85) 9·22(0·75) 0·61(0·37) 9·37(1·77) 8·67(0·87) 10·60(0·81)
100 500 500 0·58(0·22) 4·54(0·56) 5·92(0·54) 0·56(0·21) 4·85(0·61) 5·33(0·45) 6·69(0·44)
500 50 50 1·69(3·09) 7·85(8·14) 97·28(6·64) 1·37(2·87) 9·49(11·93) 87·02(7·31) 112·40(8·97)
500 100 100 1·06(1·28) 20·12(19·50) 64·98(5·93) 1·17(1·47) 27·95(22·75) 65·60(5·07) 79·66(5·37)
500 200 200 0·97(1·48) 51·64(10·13) 46·74(5·01) 0·95(1·24) 47·87(8·50) 45·54(3·70) 55·77(3·70)
500 500 500 0·68(0·54) 23·19(2·86) 23·47(2·56) 0·69(0·52) 24·67(2·92) 27·21(2·09) 35·32(2·04)

Frobenius Norm
100 50 50 1·40(1·32) 4·34(2·51) 19·01(0·37) 1·06(1·04) 3·26(2·61) 16·60(0·42) 19·87(0·38)
100 100 100 0·96(0·59) 7·14(3·67) 13·38(0·23) 0·94(0·81) 6·23(3·31) 12·10(0·27) 14·05(0·25)
100 200 200 0·89(0·57) 9·09(1·03) 9·30(0·20) 0·84(0·50) 8·15(1·07) 8·42(0·25) 9·94(0·18)
100 500 500 0·85(0·32) 4·37(0·27) 5·92(0·11) 0·82(0·30) 4·43(0·25) 5·26(0·11) 6·39(0·10)
500 50 50 3·33(5·63) 11·18(7·71) 95·05(0·91) 2·27(3·92) 9·57(9·31) 83·40(1·24) 99·97(0·93)
500 100 100 2·18(2·98) 20·17(14·55) 61·37(2·09) 2·10(2·47) 22·54(16·69) 60·58(0·85) 70·40(0·67)
500 200 200 1·77(2·39) 45·06(5·89) 42·11(1·67) 1·63(1·96) 39·52(5·14) 41·88(0·73) 49·53(0·49)
500 500 500 1·27(1·09) 20·17(0·99) 21·74(0·65) 1·22(0·84) 20·48(0·91) 25·47(0·41) 31·34(0·33)

5 Analysis of a Breast Cancer Dataset

Identifying gene expression networks can be helpful for designing more effective treatments based on the condition of individual patients. de la Fuente (2010) demonstrated that gene expression networks can vary across disease states and that differential correlations in gene expression (i.e., co-expression) are useful in disease studies.

In this section, we consider the dataset "70pathwaygenes-by-grade" from the study by van de Vijver et al. (2002) and investigate the differential co-expressions among genes in different tumor stages of breast cancer. The dataset contains 295 patient records with 1624 gene expressions, categorized into three groups based on the histological grade of tumor ("Good", "Intermediate", and "Poor") with 74, 101, and 119 records, respectively. We denote these three groups of samples by X(1), X(2) and X(3). To analyze the alteration of correlations across different grades of tumor, we apply our adaptive thresholding method with cross-validation to estimate the differential correlation matrices among these gene expressions from different stages.

The numbers of gene pairs with significant differences in correlation are listed in Table 3. The results show that the correlation structures of the "Good" and "Intermediate" groups are similar, while there are significant changes between the "Good" and "Poor" groups.

Table 3.

The number of gene pairs that have significant differential correlation between two groups of different tumor grades

Good vs. Intermediate | Intermediate vs. Poor | Good vs. Poor
# of selected pairs | 0 | 2 | 152

More interestingly, by combining the "Good" and "Intermediate" sub-samples and comparing them with the "Poor" group, we find significant differences between their correlation structures. There are 4526 pairs of genes that have significantly different correlations between the "Good + Intermediate" and "Poor" groups. For each given gene, we count the number of genes whose correlation with it differs significantly between the two groups, and rank all the genes by these counts. That is, we rank the genes by the size of the support of each row of $\hat D^\ast$. The top ten genes are listed in Table 4.

Table 4.

The top ten genes appearing most frequently in the selected pairs in "Good + Intermediate" vs. "Poor"

Gene | Number of pairs
growth differentiation factor 5 (GDF5) 67
transcription factor 7-like 1 (TCF7L1) 64
3′-phosphoadenosine 5′-phosphosulfate synthase 1 (PAPSS1) 51
secreted frizzled-related protein 1 (SFRP1) 43
gamma-aminobutyric acid A receptor, pi (GABRP) 41
mannosidase, alpha, class 2B, member 2 (MAN2B2) 37
desmocollin 2 (DSC2) 36
transforming growth factor, beta 3 (TGFB3) 35
CRADD 35
ELOVL fatty acid elongase 5 (ELOVL5) 32
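The ranking behind Table 4 can be read directly off the support of $\hat D^\ast$. A small illustrative sketch (ours; it assumes a gene_names list aligned with the rows of $\hat D^\ast$):

```python
import numpy as np

def top_genes(D_hat, gene_names, top=10):
    """Rank genes by the number of significantly differentially correlated
    partners, i.e. the off-diagonal support of each row of D_hat*."""
    support = (D_hat != 0)
    np.fill_diagonal(support, False)        # exclude the gene itself
    counts = support.sum(axis=1)
    order = np.argsort(counts)[::-1][:top]
    return [(gene_names[k], int(counts[k])) for k in order]
```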

Among these ten genes, six (GDF5, TCF7L1, PAPSS1, SFRP1, GABRP, and TGFB3) have previously been studied and verified in the literature as being associated with breast cancer (see Margheri et al. (2012), Shy et al. (2013), Xu et al. (2012), Klopocki et al. (2004), Zafrakas et al. (2006), and Ghellal et al. (2000), respectively). Take GDF5 and TCF7L1 for example. The overproduction of transforming growth factor beta-1 (TGFβ), a multifunctional cytokine, is an important characteristic of late tumor progression, and based on the study by Margheri et al. (2012), TGFβ produced by breast cancer cells induces the expression of GDF5 in endothelial cells. The findings in Shy et al. (2013) suggested the important role played by TCF7L1 in breast cancer. Although these biological studies mainly focus on the behavior of single gene expressions, our study provides evidence at the gene co-expression level that these genes are related to breast cancer.

We should point out that the two well-known genes related to breast cancer, BRCA1 and BRCA2, were not detected by our method. This is mainly because our method focuses on differential gene co-expressions, not on changes in the gene expression levels.

6 Other Related Problems

We have so far focused on optimal estimation of the differential correlation matrix. In addition to optimal estimation, hypothesis testing of the differential correlation matrix is also an important problem. In this section we consider testing the hypotheses $H_0: R_1-R_2=0$ versus $H_1: R_1-R_2\ne 0$ and propose a test which is particularly well suited for testing against sparse alternatives.

Similar ideas and techniques can also be used to treat several other related problems, including estimation of a single sparse correlation matrix from one random sample, estimation of the differential covariance matrices, and estimation of the differential cross-correlation matrices. We also briefly discuss these problems in this section.

6.1 Testing Differential Correlation Matrices

Suppose we are given two sets of independent and identically distributed samples $X^{(t)}=\{X_1^{(t)},\ldots,X_{n_t}^{(t)}\}$ with mean $\mu_t$, covariance matrix $\Sigma_t$, and correlation matrix $R_t$, where t = 1 and 2, and wish to test the hypotheses

$$H_0: R_1-R_2=0\quad\text{versus}\quad H_1: R_1-R_2\ne 0. \tag{21}$$

This testing problem is similar to, but also different from, testing the equality of two high-dimensional covariance matrices, which has been considered in several recent papers. See, for example, Schott (2007), Srivastava and Yanagihara (2010), Li et al. (2012), and Cai et al. (2013). Here we are particularly interested in testing against sparse alternatives and follow similar ideas as those in Cai et al. (2013).

To construct the test statistic, we need a more precise understanding of the sample correlation coefficients $\hat r_{ij,t}$. It follows from (5) that

$$\hat r_{ij,t}=\frac{\hat\sigma_{ij,t}}{(\hat\sigma_{ii,t}\hat\sigma_{jj,t})^{1/2}}\approx\frac{\sigma_{ij,t}}{(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}+\frac{\hat\sigma_{ij,t}-\sigma_{ij,t}}{(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}-\frac{\sigma_{ij,t}}{2(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}\Bigl(\frac{\hat\sigma_{ii,t}-\sigma_{ii,t}}{\sigma_{ii,t}}+\frac{\hat\sigma_{jj,t}-\sigma_{jj,t}}{\sigma_{jj,t}}\Bigr)\approx r_{ij,t}+\frac{1}{n_t}\sum_{k=1}^{n_t}\Bigl[\frac{(X_{ik}^{(t)}-\bar X_i^{(t)})(X_{jk}^{(t)}-\bar X_j^{(t)})-\sigma_{ij,t}}{(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}-\frac{r_{ij,t}}{2}\Bigl(\frac{(X_{ik}^{(t)}-\bar X_i^{(t)})^2-\sigma_{ii,t}}{\sigma_{ii,t}}+\frac{(X_{jk}^{(t)}-\bar X_j^{(t)})^2-\sigma_{jj,t}}{\sigma_{jj,t}}\Bigr)\Bigr].$$

Since $\bar X_i^{(t)}\approx\mu_{i,t}$, $\bar X_j^{(t)}\approx\mu_{j,t}$, and $E(X_{ik}^{(t)}-\bar X_i^{(t)})(X_{jk}^{(t)}-\bar X_j^{(t)})\approx\sigma_{ij,t}$, we introduce

$$\eta_{ij,t}=\operatorname{var}\Bigl[\frac{(X_i^{(t)}-\mu_{i,t})(X_j^{(t)}-\mu_{j,t})}{(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}-\frac{r_{ij,t}}{2}\Bigl(\frac{(X_i^{(t)}-\mu_{i,t})^2}{\sigma_{ii,t}}+\frac{(X_j^{(t)}-\mu_{j,t})^2}{\sigma_{jj,t}}\Bigr)\Bigr].$$

Then, asymptotically as n, p → ∞,

$$\hat r_{ij,t}-r_{ij,t}\approx\Bigl(\frac{\eta_{ij,t}}{n_t}\Bigr)^{1/2}z_{ij,t},\quad\text{where }z_{ij,t}\sim N(0,1).$$

The true value of $\eta_{ij,t}$ is unknown but can be estimated by

$$\hat\eta_{ij,t}=\frac{1}{n_t}\sum_{k=1}^{n_t}\Bigl\{\frac{(X_{ik}^{(t)}-\bar X_i^{(t)})(X_{jk}^{(t)}-\bar X_j^{(t)})-\hat\sigma_{ij,t}}{(\hat\sigma_{ii,t}\hat\sigma_{jj,t})^{1/2}}-\frac{\hat r_{ij,t}}{2}\Bigl(\frac{(X_{ik}^{(t)}-\bar X_i^{(t)})^2-\hat\sigma_{ii,t}}{\hat\sigma_{ii,t}}+\frac{(X_{jk}^{(t)}-\bar X_j^{(t)})^2-\hat\sigma_{jj,t}}{\hat\sigma_{jj,t}}\Bigr)\Bigr\}^2=\frac{1}{n_t}\sum_{k=1}^{n_t}\Bigl\{\frac{(X_{ik}^{(t)}-\bar X_i^{(t)})(X_{jk}^{(t)}-\bar X_j^{(t)})}{(\hat\sigma_{ii,t}\hat\sigma_{jj,t})^{1/2}}-\frac{\hat r_{ij,t}}{2}\Bigl(\frac{(X_{ik}^{(t)}-\bar X_i^{(t)})^2}{\hat\sigma_{ii,t}}+\frac{(X_{jk}^{(t)}-\bar X_j^{(t)})^2}{\hat\sigma_{jj,t}}\Bigr)\Bigr\}^2.$$

We define the test statistic by

$$M_n=\max_{1\le i\le j\le p}T_{ij},$$

where

$$T_{ij}=\frac{(\hat r_{ij,1}-\hat r_{ij,2})^2}{\hat\eta_{ij,1}/n_1+\hat\eta_{ij,2}/n_2},\quad 1\le i,j\le p.$$

Under regularity conditions (similar to (C1)–(C3) in Cai et al. (2013)), the asymptotic null distribution of $M_n$ can be shown to be the type I extreme value distribution. More precisely,

$$P\bigl(M_n-4\log p+\log\log p\le t\bigr)\to\exp\bigl(-(8\pi)^{-1/2}e^{-t/2}\bigr) \tag{22}$$

for any given t ∈ ℝ.

The asymptotic null distribution (22) can then be used to construct a test for testing the hypothesis H0 : R1R2 = 0. For a given significance level 0 < α < 1, define the test Ψα by

$$\Psi_\alpha=I\bigl(M_n\ge 4\log p-\log\log p+\tau_\alpha\bigr) \tag{23}$$

where $\tau_\alpha=-\log(8\pi)-2\log\log\{(1-\alpha)^{-1}\}$ is the 1 − α quantile of the type I extreme value distribution with cumulative distribution function $\exp(-(8\pi)^{-1/2}e^{-x/2})$. The hypothesis $H_0: R_1-R_2=0$ is rejected whenever $\Psi_\alpha=1$. Like the test proposed in Cai et al. (2013) for testing the equality of two covariance matrices, the test $\Psi_\alpha$ defined in (23) can be shown to be particularly well suited for testing $H_0: R_1-R_2=0$ against sparse alternatives.
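A minimal sketch of the test (ours, not the authors' code): it computes $\hat\eta_{ij,t}$ via the second expression above, forms $M_n$, and compares it with the critical value in (23).

```python
import numpy as np

def corr_and_eta(X):
    """Sample correlations and the variance estimates eta_hat_ij above."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    d = np.diag(S)
    R = S / np.sqrt(np.outer(d, d))
    prods = (Xc[:, :, None] * Xc[:, None, :]) / np.sqrt(np.outer(d, d))
    sq = Xc ** 2 / d                             # (X_ik - Xbar_i)^2 / sigma_hat_ii
    eta = np.mean((prods - 0.5 * R * (sq[:, :, None] + sq[:, None, :])) ** 2, axis=0)
    return R, eta

def max_type_test(X1, X2, alpha=0.05):
    """Return M_n and the level-alpha rejection decision of (23)."""
    R1, e1 = corr_and_eta(X1)
    R2, e2 = corr_and_eta(X2)
    p = R1.shape[0]
    T = (R1 - R2) ** 2 / (e1 / X1.shape[0] + e2 / X2.shape[0])
    M_n = T[np.triu_indices(p, k=1)].max()       # max over i < j
    tau_alpha = -np.log(8 * np.pi) - 2 * np.log(np.log(1 / (1 - alpha)))
    reject = M_n >= 4 * np.log(p) - np.log(np.log(p)) + tau_alpha
    return M_n, reject
```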

6.2 Optimal Estimation of a Sparse Correlation Matrix

The ideas and technical tools can also be used for estimation of a single correlation matrix from one random sample, which is a simpler problem. Suppose we observe an independent and identically distributed sample $X=\{X_1,\ldots,X_n\}$ from a p-dimensional distribution with mean $\mu\in\mathbb{R}^p$, covariance matrix Σ, and correlation matrix $R\in\mathbb{R}^{p\times p}$. When R is approximately sparse, it can be naturally estimated by a thresholding estimator as follows. Let $\bar X_i=\frac1n\sum_{k=1}^nX_{ik}$. Define the sample covariance matrix $\hat\Sigma=(\hat\sigma_{ij})_{1\le i,j\le p}$ and the sample correlation matrix $\hat R=(\hat r_{ij})_{1\le i,j\le p}$ respectively by

$$\hat\sigma_{ij}=\frac1n\sum_{k=1}^n(X_{ik}-\bar X_i)(X_{jk}-\bar X_j)\quad\text{and}\quad\hat r_{ij}=\frac{\hat\sigma_{ij}}{(\hat\sigma_{ii}\hat\sigma_{jj})^{1/2}}.$$

As in (8) and (9), we define

$$\hat\theta_{ij}=\frac1n\sum_{k=1}^n\bigl\{(X_{ik}-\bar X_i)(X_{jk}-\bar X_j)-\hat\sigma_{ij}\bigr\}^2, \tag{24}$$
$$\hat\xi_{ij}=\frac{\hat\theta_{ij}}{\hat\sigma_{ii}\hat\sigma_{jj}}=\frac{1}{n\hat\sigma_{ii}\hat\sigma_{jj}}\sum_{k=1}^n\bigl\{(X_{ik}-\bar X_i)(X_{jk}-\bar X_j)-\hat\sigma_{ij}\bigr\}^2, \tag{25}$$
$$\lambda_{ij}=\tau\Bigl(\frac{\log p}n\Bigr)^{1/2}\Bigl\{\hat\xi_{ij}^{1/2}+\frac{|\hat r_{ij}|}2\bigl(\hat\xi_{ii}^{1/2}+\hat\xi_{jj}^{1/2}\bigr)\Bigr\}, \tag{26}$$

where τ is the thresholding constant that can be chosen empirically through cross-validation. The correlation matrix R is then estimated by $\hat R^\ast=(\hat r_{ij}^\ast)_{1\le i,j\le p}$ with

$$\hat r_{ij}^\ast=s_{\lambda_{ij}}(\hat r_{ij}).$$

We consider the following class of approximately sparse correlation matrices

$$\mathcal{G}_q^1(s_0(p))=\Bigl\{R=(r_{ij}):\ R\succeq 0;\ \operatorname{diag}(R)=1;\ \max_j\sum_{i=1,i\ne j}^p|r_{ij}|^q\le s_0(p)\Bigr\},\quad 0\le q<1.$$

The following theoretical results for $\hat R^\ast$ can be established using a similar analysis.

Proposition 6.1

Suppose $\log p=o(n^{1/3})$ and X satisfies (12) and (13). For τ > 6, there exists a constant C, not depending on n or p, such that

$$\sup_{R\in\mathcal{G}_q^1(s_0(p))}E\|\hat R^\ast-R\|^2\le C(s_0^2(p)+1)\Bigl(\frac{\log p}n\Bigr)^{1-q} \tag{27}$$
$$\sup_{R\in\mathcal{G}_q^1(s_0(p))}E\|\hat R^\ast-R\|_{\ell_1}^2\le C(s_0^2(p)+1)\Bigl(\frac{\log p}n\Bigr)^{1-q} \tag{28}$$
$$\sup_{R\in\mathcal{G}_q^1(s_0(p))}E\|\hat R^\ast-R\|_F^2\le Cp(s_0(p)+1)\Bigl(\frac{\log p}n\Bigr)^{1-q/2}. \tag{29}$$

Moreover, when $\log p=o(n)$ and $s_0(p)\le Mn^{(1-q)/2}(\log p)^{-(3-q)/2}$ for some constant M > 0, the rate in (27) is optimal, as we also have the lower bounds

$$\inf_{\hat R}\sup_{R\in\mathcal{G}_q^1(s_0(p))}E\|\hat R-R\|^2\ge c\,s_0^2(p)\Bigl(\frac{\log p}n\Bigr)^{1-q} \tag{30}$$
$$\inf_{\hat R}\sup_{R\in\mathcal{G}_q^1(s_0(p))}E\|\hat R-R\|_{\ell_1}^2\ge c\,s_0^2(p)\Bigl(\frac{\log p}n\Bigr)^{1-q} \tag{31}$$
$$\inf_{\hat R}\sup_{R\in\mathcal{G}_q^1(s_0(p))}E\|\hat R-R\|_F^2\ge c\,p\,s_0(p)\Bigl(\frac{\log p}n\Bigr)^{1-q/2}. \tag{32}$$

Remark 6.1

Cai and Liu (2011) proposed an adaptive thresholding estimator $\hat\Sigma^\ast$ of a sparse covariance matrix Σ. This estimator leads naturally to an estimator $\tilde R=(\tilde r_{ij})$ of a sparse correlation matrix R by normalizing $\hat\Sigma^\ast=(\hat\sigma_{ij}^\ast)$ via $\tilde r_{ij}=\hat\sigma_{ij}^\ast(\hat\sigma_{ii}^\ast\hat\sigma_{jj}^\ast)^{-1/2}$. The correlation matrix estimator $\tilde R$ has properties similar to those of the estimator $\hat R^\ast$ introduced above; for example, $\tilde R$ and $\hat R^\ast$ achieve the same rate of convergence.

6.3 Optimal Estimation of Sparse Differential Covariance Matrices

Our analysis can also be used for estimation of sparse differential covariance matrices $\Delta=\Sigma_1-\Sigma_2$. Define $\theta_{ij,t}$ as in (3) and its estimate $\hat\theta_{ij,t}$ as in (8). Similar to the estimation of the differential correlation matrix $D=R_1-R_2$, we estimate $\Delta=\Sigma_1-\Sigma_2$ by adaptive entrywise thresholding. Specifically, we define the thresholding estimator $\hat\Delta^\ast=(\hat\delta_{ij})_{p\times p}$ by

$$\hat\delta_{ij}=s_{\gamma_{ij}}(\hat\sigma_{ij,1}-\hat\sigma_{ij,2}),\quad 1\le i,j\le p, \tag{33}$$

where $\gamma_{ij}$ is the thresholding level given by

$$\gamma_{ij}=\tau\Bigl\{\Bigl(\frac{\log p}{n_1}\hat\theta_{ij,1}\Bigr)^{1/2}+\Bigl(\frac{\log p}{n_2}\hat\theta_{ij,2}\Bigr)^{1/2}\Bigr\}. \tag{34}$$

As in the last section, $s_\lambda(z)$ belongs to the class of thresholding functions satisfying Conditions (C1)–(C3), and the thresholding constant τ can be chosen empirically by cross-validation.
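An illustrative sketch of (33)–(34) (ours), again with the hard rule standing in for a general $s_\lambda$ satisfying (C1)–(C3):

```python
import numpy as np

def differential_cov_estimate(X1, X2, tau):
    """Adaptive thresholding estimator (33) of Delta = Sigma1 - Sigma2
    with entrywise levels gamma_ij from (34)."""
    stats = []
    for X in (X1, X2):
        n, p = X.shape
        Xc = X - X.mean(axis=0)
        S = Xc.T @ Xc / n
        theta = np.mean((Xc[:, :, None] * Xc[:, None, :] - S) ** 2, axis=0)  # (8)
        stats.append((S, np.sqrt(theta * np.log(p) / n)))
    (S1, g1), (S2, g2) = stats
    gamma = tau * (g1 + g2)                          # (34)
    Delta = S1 - S2
    return np.where(np.abs(Delta) >= gamma, Delta, 0.0)
```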

We consider the following class of paired covariance matrices with approximately sparse differences, for 0 ≤ q < 1,

$$\mathcal{F}_q(s_0(p))=\Bigl\{(\Sigma_1,\Sigma_2):\ \Sigma_1,\Sigma_2\succeq 0;\ \max_{1\le i\le p,\,t=1,2}\sigma_{ii,t}\le B;\ \max_i\sum_{j=1}^p|\sigma_{ij,1}-\sigma_{ij,2}|^q\le s_0(p)\Bigr\}. \tag{35}$$

Under the same conditions as those in Theorems 3.1 and 3.2, a similar analysis can be used to derive the minimax upper and lower bounds. It can be shown that the estimator Δ̂* given in (33) with τ > 4 satisfies

$$\sup_{(\Sigma_1,\Sigma_2)\in\mathcal{F}_q(s_0(p))}E\|\hat\Delta^\ast-(\Sigma_1-\Sigma_2)\|^2\le C(s_0^2(p)+1)\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{1-q} \tag{36}$$

for some constant C > 0. Furthermore, the following minimax lower bound holds,

$$\inf_{\hat\Delta}\sup_{(\Sigma_1,\Sigma_2)\in\mathcal{F}_q(s_0(p))}E\|\hat\Delta-(\Sigma_1-\Sigma_2)\|^2\ge c\,s_0^2(p)\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{1-q} \tag{37}$$

for some constant c > 0. Equations (36) and (37) together show that the thresholding estimator Δ̂* defined in (33) and (34) is rate-optimal.

6.4 Estimation of Differential Cross-Correlation Matrices

In many applications such as phenome-wide association studies (PheWAS), which aim to study the relationship between a range of phenotypes and a set of genomic markers, the main focus is on the cross-correlations between the components of X and those of Y. That is, the object of interest is a submatrix of the correlation matrix of the random vector $(X^\top,Y^\top)^\top$. More specifically, let $X=(X_1,\ldots,X_{p_1})^\top$ be a $p_1$-dimensional random vector and $Y=(Y_1,\ldots,Y_{p_2})^\top$ be a $p_2$-dimensional random vector. In the PheWAS setting, X may be the phenotypic disease conditions of interest and Y a vector of genomic markers.

Suppose we have two independent and identically distributed samples of (X, Y) pairs, one from the case group and one from the control group,

$$\begin{pmatrix}X_k^{(1)}\\ Y_k^{(1)}\end{pmatrix}=\bigl(X_{1k}^{(1)},\ldots,X_{p_1k}^{(1)},Y_{1k}^{(1)},\ldots,Y_{p_2k}^{(1)}\bigr)^\top,\ k=1,\ldots,n_1;\qquad\begin{pmatrix}X_k^{(2)}\\ Y_k^{(2)}\end{pmatrix}=\bigl(X_{1k}^{(2)},\ldots,X_{p_1k}^{(2)},Y_{1k}^{(2)},\ldots,Y_{p_2k}^{(2)}\bigr)^\top,\ k=1,\ldots,n_2.$$

Here, for t = 1, 2, $(X_k^{(t)\top},Y_k^{(t)\top})^\top$, $k=1,\ldots,n_t$, are independent and identically distributed samples from some distribution with mean $\mu_t$, covariance matrix $\Sigma_t$, and correlation matrix $R_t$ given by

$$\mu_t=\begin{pmatrix}\mu_{X,t}\\ \mu_{Y,t}\end{pmatrix},\qquad\Sigma_t=\begin{bmatrix}\Sigma_{XX,t}&\Sigma_{XY,t}\\ \Sigma_{YX,t}&\Sigma_{YY,t}\end{bmatrix},\qquad R_t=\begin{bmatrix}R_{XX,t}&R_{XY,t}\\ R_{YX,t}&R_{YY,t}\end{bmatrix}.$$

In applications such as PheWAS, it is of special interest to estimate the differential cross-correlation matrix of X and Y, i.e. $D_{XY}=R_{XY,1}-R_{XY,2}\in\mathbb{R}^{p_1\times p_2}$. Again, we introduce the following set of paired correlation matrices with sparse differential cross-correlations,

$$\mathcal{G}_q(s_0(p_1,p_2))=\Bigl\{(R_1,R_2):\ R_1,R_2\succeq 0;\ \operatorname{diag}(R_1)=\operatorname{diag}(R_2)=1;\ \max_{1\le i\le p_1}\sum_{j=1}^{p_2}\bigl|(r_{XY})_{ij,1}-(r_{XY})_{ij,2}\bigr|^q\le s_0(p_1,p_2)\Bigr\},\quad 0\le q<1.$$

The thresholding procedure proposed in Section 2 can be applied to estimate DXY,

$$(\hat d_{XY})_{ij}=s_{\lambda_{ij}}\bigl((\hat r_{XY})_{ij,1}-(\hat r_{XY})_{ij,2}\bigr),\quad 1\le i\le p_1,\ 1\le j\le p_2, \tag{38}$$

where $\hat R_{XY}$ is the sample cross-correlation matrix of X and Y and $\lambda_{ij}$ is given by (10). Similar to Theorem 3.1, the following theoretical results hold for the estimator $\hat D_{XY}=\bigl((\hat d_{XY})_{ij}\bigr)$.

Proposition 6.2

Suppose $p=p_1+p_2$, $\log p=o(\min(n_1,n_2)^{1/3})$, and (12) and (13) hold. Suppose the thresholding function $s_\lambda(z)$ satisfies Conditions (C1)–(C3). Then $\hat D_{XY}$ defined in (38) with the thresholding constant τ > 4 satisfies

$$\sup_{(R_1,R_2)\in\mathcal{G}_q(s_0(p_1,p_2))}E\|\hat D_{XY}-(R_{XY,1}-R_{XY,2})\|^2\le C(s_0^2(p_1,p_2)+1)\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{1-q} \tag{39}$$
$$\sup_{(R_1,R_2)\in\mathcal{G}_q(s_0(p_1,p_2))}E\|\hat D_{XY}-(R_{XY,1}-R_{XY,2})\|_{\ell_1}^2\le C(s_0^2(p_1,p_2)+1)\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{1-q} \tag{40}$$
$$\sup_{(R_1,R_2)\in\mathcal{G}_q(s_0(p_1,p_2))}E\|\hat D_{XY}-(R_{XY,1}-R_{XY,2})\|_F^2\le Cp(s_0(p_1,p_2)+1)\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{1-q/2} \tag{41}$$

for some constant C > 0 that does not depend on n1, n2 or p.

The proof of Proposition 6.2 is similar to that of Theorem 3.1, by analyzing the block $\hat D_{XY}-(R_{XY,1}-R_{XY,2})$ instead of the whole matrix $\hat D^\ast-(R_1-R_2)$. We omit the detailed proof here.

Table 2.

Comparison of $\hat D^\ast$ with $\hat R_1^\ast-\hat R_2^\ast$, $\tilde R_1-\tilde R_2$, and the sample difference $\hat R_1-\hat R_2$ under Model 2.

Columns: p, n1, n2; then $\hat D^\ast$, $\hat R_1^\ast-\hat R_2^\ast$, $\tilde R_1-\tilde R_2$ under hard thresholding; $\hat D^\ast$, $\hat R_1^\ast-\hat R_2^\ast$, $\tilde R_1-\tilde R_2$ under adaptive-lasso thresholding; and the sample difference $\hat R_1-\hat R_2$ in the last column.
Spectral Norm
100 50 50 0·98(1·00) 4·61(1·49) 7·25(0·87) 0·71(0·70) 4·47(1·44) 6·05(0·74) 8·29(0·98)
100 100 100 0·70(0·51) 2·88(0·81) 5·01(0·57) 0·62(0·47) 2·93(0·87) 4·25(0·52) 5·83(0·59)
100 200 200 0·60(0·35) 1·93(0·55) 3·53(0·42) 0·48(0·24) 1·98(0·57) 2·98(0·37) 4·07(0·47)
100 500 500 0·47(0·14) 1·23(0·27) 2·32(0·23) 0·46(0·17) 1·30(0·36) 1·98(0·21) 2·66(0·27)
500 50 50 0·97(0·99) 5·03(1·00) 20·61(1·07) 0·80(0·75) 4·55(0·96) 16·91(0·90) 24·96(1·16)
500 100 100 0·79(0·62) 3·17(0·49) 13·64(0·59) 0·59(0·41) 3·14(0·63) 11·13(0·53) 16·39(0·71)
500 200 200 0·60(0·36) 2·13(0·30) 9·12(0·42) 0·51(0·31) 2·11(0·35) 7·44(0·37) 10·94(0·50)
500 500 500 0·51(0·20) 1·34(0·16) 5·63(0·23) 0·49(0·20) 1·35(0·22) 4·65(0·21) 6·78(0·29)

Matrix ℓ1 Norm
100 50 50 1·84(2·66) 10·61(3·48) 19·11(1·55) 1·26(1·86) 9·88(3·14) 16·18(1·55) 21·92(1·61)
100 100 100 1·18(1·44) 6·73(2·24) 13·62(1·08) 1·10(1·26) 6·73(2·12) 11·73(1·20) 15·87(1·11)
100 200 200 0·98(0·98) 4·53(1·47) 9·79(0·85) 0·71(0·71) 4·68(1·46) 8·39(0·89) 11·37(0·97)
100 500 500 0·67(0·48) 2·95(0·89) 6·44(0·56) 0·65(0·59) 3·08(1·03) 5·58(0·47) 7·47(0·53)
500 50 50 1·79(2·65) 11·03(2·80) 79·71(2·64) 1·64(2·26) 10·38(3·02) 64·46(2·73) 97·88(2·55)
500 100 100 1·45(1·75) 7·66(1·79) 56·52(2·16) 1·02(1·35) 7·73(2·40) 45·65(1·86) 69·42(1·76)
500 200 200 1·02(1·18) 4·97(1·09) 39·86(1·39) 0·83(1·14) 5·03(1·27) 31·90(1·15) 49·11(1·33)
500 500 500 0·81(0·70) 3·15(0·72) 25·34(0·77) 0·82(0·77) 3·27(1·00) 20·39(0·77) 31·36(0·79)

Frobenius Norm
100 50 50 3·36(2·53) 13·82(1·83) 18·46(0·81) 2·66(1·47) 12·13(2·00) 15·87(0·79) 19·92(0·94)
100 100 100 2·67(1·19) 9·46(1·28) 13·26(0·55) 2·54(1·10) 8·77(1·18) 11·51(0·54) 14·32(0·58)
100 200 200 2·43(0·69) 6·94(0·72) 9·75(0·39) 2·26(0·51) 6·68(0·76) 8·59(0·37) 10·49(0·43)
100 500 500 2·24(0·34) 5·29(0·36) 6·96(0·19) 2·25(0·46) 5·28(0·44) 6·33(0·17) 7·39(0·19)
500 50 50 6·77(4·86) 34·24(3·33) 91·09(0·85) 6·18(3·86) 27·39(3·39) 75·97(0·83) 100·71(0·92)
500 100 100 6·19(2·98) 22·76(1·92) 64·37(0·56) 5·30(1·79) 20·12(2·21) 53·72(0·56) 71·23(0·58)
500 200 200 5·32(1·49) 16·34(1·18) 45·79(0·44) 5·10(1·36) 15·01(1·15) 38·36(0·42) 50·61(0·43)
500 500 500 5·00(0·62) 12·14(0·59) 29·77(0·27) 4·99(0·69) 11·76(0·70) 25·27(0·24) 32·80(0·25)

Appendix: Proofs

We prove the main theorems in the Appendix. Throughout the Appendix, we denote by C a constant that does not depend on p, n1, and n2 and may vary from place to place.

Proof of Theorem 3.1

To prove this theorem, we consider the following three events separately:

$$A_1=\Bigl\{\max_{i,j,t}\frac{|\hat\sigma_{ij,t}-\sigma_{ij,t}|}{(\hat\theta_{ij,t}\log p/n_t)^{1/2}}\le\frac\tau4+3,\ \text{ and }\ \max_{i,j,t}\frac{|\hat\theta_{ij,t}-\theta_{ij,t}|}{\sigma_{ii,t}\sigma_{jj,t}}\le\varepsilon\Bigr\}, \tag{42}$$
$$A_2=\Bigl\{\max_{i,j,t}\frac{|\hat\sigma_{ij,t}-\sigma_{ij,t}|}{(\hat\theta_{ij,t}\log p/n_t)^{1/2}}>\frac\tau4+3,\ \max_{i,j,t}\frac{|\hat\theta_{ij,t}-\theta_{ij,t}|}{\sigma_{ii,t}\sigma_{jj,t}}\le\varepsilon,\ \text{ and }\ \max_{i,j,t}\frac{|\hat\sigma_{ij,t}-\sigma_{ij,t}|}{(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}\le\min(0.5,\,C_1C_3)\Bigr\}, \tag{43}$$
$$A_3=(A_1\cup A_2)^c. \tag{44}$$

Here ε is a fixed constant satisfying 0 < ε < ν0/2, where ν0 is introduced in (13); C1 and C3 are constants, not depending on p, n1, n2, that will be specified in Lemma 6.1.

1. First we show that under the event A1,

$$\|\hat D^\ast-(R_1-R_2)\|^2\le Cs_0^2(p)\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{1-q}, \tag{45}$$
$$\|\hat D^\ast-(R_1-R_2)\|_{\ell_1}^2\le Cs_0^2(p)\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{1-q}, \tag{46}$$
$$\|\hat D^\ast-(R_1-R_2)\|_F^2\le Cps_0(p)\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{1-q/2}. \tag{47}$$

In fact,

$$E|Y_i^{(t)}|^4\le\frac{2}{\eta^2}E\exp\bigl(\eta(Y_i^{(t)})^2\bigr)\le\frac{2K}{\eta^2}$$

for all 1 ≤ i ≤ p, so

$$\theta_{ij,t}=\operatorname{var}\bigl((X_i^{(t)}-\mu_{i,t})(X_j^{(t)}-\mu_{j,t})\bigr)\le E(X_i^{(t)}-\mu_{i,t})^2(X_j^{(t)}-\mu_{j,t})^2\le\bigl(E(X_i^{(t)}-\mu_{i,t})^4E(X_j^{(t)}-\mu_{j,t})^4\bigr)^{1/2}=\sigma_{ii,t}\sigma_{jj,t}\bigl(E|Y_i^{(t)}|^4E|Y_j^{(t)}|^4\bigr)^{1/2}\le C\sigma_{ii,t}\sigma_{jj,t},\qquad\theta_{ij,t}=\operatorname{var}(Y_i^{(t)}Y_j^{(t)})\cdot\sigma_{ii,t}\sigma_{jj,t}\overset{(13)}{\ge}\nu_0\sigma_{ii,t}\sigma_{jj,t}. \tag{48}$$

So by the definition of A1, we have

$$\hat\theta_{ij,t}\le\theta_{ij,t}+|\hat\theta_{ij,t}-\theta_{ij,t}|\le C\sigma_{ii,t}\sigma_{jj,t}\quad\text{for all }i,j,t, \tag{49}$$
$$\hat\theta_{ij,t}\ge\theta_{ij,t}-|\hat\theta_{ij,t}-\theta_{ij,t}|\ge\nu_0\sigma_{ii,t}\sigma_{jj,t}-\varepsilon\sigma_{ii,t}\sigma_{jj,t}\ge\frac{\nu_0}2\sigma_{ii,t}\sigma_{jj,t}. \tag{50}$$

Hence,

$$\Bigl|\frac{\hat\sigma_{ii,t}}{\sigma_{ii,t}}-1\Bigr|=\frac{|\hat\sigma_{ii,t}-\sigma_{ii,t}|}{\sigma_{ii,t}}\overset{(42)}{\le}\frac{\tau/4+3}{\sigma_{ii,t}}\Bigl(\frac{\hat\theta_{ii,t}\log p}{n_t}\Bigr)^{1/2}\overset{(49)}{\le}C\Bigl(\frac{\log p}{n_t}\Bigr)^{1/2}, \tag{51}$$
$$\Bigl|\frac{\sigma_{ii,t}}{\hat\sigma_{ii,t}}-1\Bigr|=\frac{|\hat\sigma_{ii,t}-\sigma_{ii,t}|}{\hat\sigma_{ii,t}}\overset{(42)}{\le}(\tau/4+3)\Bigl(\frac{\log p}{n_t}\Bigr)^{1/2}\frac{\hat\theta_{ii,t}^{1/2}}{\hat\sigma_{ii,t}}. \tag{52}$$

Let $x=\sigma_{ii,t}/\hat\sigma_{ii,t}$ and $y=\sigma_{jj,t}/\hat\sigma_{jj,t}$. By (51) and $(\log p/n_t)^{1/2}\to0$, we have $\max(|x-1|,|y-1|)\le C(\log p/n_t)^{1/2}$ when $n_t$ is large enough. Thus, for large $n_t$,

$$\bigl|(xy)^{1/2}-1\bigr|=\frac{|xy-1|}{(xy)^{1/2}+1}\le\frac{|x-1|+x|y-1|}{2-\max(|x-1|,|y-1|)}\le\frac{\max(1,x)}{2-\max(|x-1|,|y-1|)}\bigl(|x-1|+|y-1|\bigr)\le\Bigl(\frac12+O\Bigl(\Bigl(\frac{\log p}{n_t}\Bigr)^{1/2}\Bigr)\Bigr)\bigl(|x-1|+|y-1|\bigr). \tag{53}$$

It then follows from the assumption $\log p=o(n_t^{1/3})$ that for large $n_t$,

$$\hat\xi_{ij,t}=\frac{\hat\theta_{ij,t}}{\hat\sigma_{ii,t}\hat\sigma_{jj,t}}\overset{(49)}{\le}C\frac{\sigma_{ii,t}\sigma_{jj,t}}{\hat\sigma_{ii,t}\hat\sigma_{jj,t}}\overset{(51)}{\le}C \tag{54}$$

and

$$\begin{aligned}|\hat r_{ij,t}-r_{ij,t}|&=\Bigl|\frac{\hat\sigma_{ij,t}}{(\hat\sigma_{ii,t}\hat\sigma_{jj,t})^{1/2}}-\frac{\sigma_{ij,t}}{(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}\Bigr|\le\frac{|\hat\sigma_{ij,t}-\sigma_{ij,t}|}{(\hat\sigma_{ii,t}\hat\sigma_{jj,t})^{1/2}}+\frac{|\sigma_{ij,t}|}{(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}\Bigl|\Bigl(\frac{\sigma_{ii,t}\sigma_{jj,t}}{\hat\sigma_{ii,t}\hat\sigma_{jj,t}}\Bigr)^{1/2}-1\Bigr|\\
&\overset{(42)(53)}{\le}\Bigl(\frac\tau4+3\Bigr)\Bigl(\frac{\log p\,\hat\theta_{ij,t}}{n_t\hat\sigma_{ii,t}\hat\sigma_{jj,t}}\Bigr)^{1/2}+|r_{ij,t}|\Bigl(\frac12+O\Bigl(\Bigl(\frac{\log p}{n_t}\Bigr)^{1/2}\Bigr)\Bigr)\Bigl(\Bigl|\frac{\sigma_{ii,t}}{\hat\sigma_{ii,t}}-1\Bigr|+\Bigl|\frac{\sigma_{jj,t}}{\hat\sigma_{jj,t}}-1\Bigr|\Bigr)\\
&\overset{(52)}{\le}\Bigl(\frac\tau4+3\Bigr)\Bigl(\frac{\log p}{n_t}\Bigr)^{1/2}\Bigl\{\hat\xi_{ij,t}^{1/2}+|r_{ij,t}|\Bigl(\frac12+O\Bigl(\Bigl(\frac{\log p}{n_t}\Bigr)^{1/2}\Bigr)\Bigr)\Bigl(\frac{\hat\theta_{ii,t}^{1/2}}{\hat\sigma_{ii,t}}+\frac{\hat\theta_{jj,t}^{1/2}}{\hat\sigma_{jj,t}}\Bigr)\Bigr\}\\
&\le\Bigl(\frac\tau2+2\Bigr)\Bigl(\frac{\log p}{n_t}\Bigr)^{1/2}\Bigl\{\hat\xi_{ij,t}^{1/2}+\frac{|r_{ij,t}|}2\bigl(\hat\xi_{ii,t}^{1/2}+\hat\xi_{jj,t}^{1/2}\bigr)\Bigr\}\\
&\le\Bigl(\frac\tau2+2\Bigr)\Bigl(\frac{\log p}{n_t}\Bigr)^{1/2}\Bigl\{\hat\xi_{ij,t}^{1/2}+\frac{|\hat r_{ij,t}|}2\bigl(\hat\xi_{ii,t}^{1/2}+\hat\xi_{jj,t}^{1/2}\bigr)\Bigr\}+|\hat r_{ij,t}-r_{ij,t}|\Bigl(\frac\tau4+1\Bigr)\Bigl(\frac{\log p}{n_t}\Bigr)^{1/2}\bigl(\hat\xi_{ii,t}^{1/2}+\hat\xi_{jj,t}^{1/2}\bigr)\\
&\overset{(54)}{\le}\Bigl(\frac\tau2+2\Bigr)\Bigl(\frac{\log p}{n_t}\Bigr)^{1/2}\Bigl\{\hat\xi_{ij,t}^{1/2}+\frac{|\hat r_{ij,t}|}2\bigl(\hat\xi_{ii,t}^{1/2}+\hat\xi_{jj,t}^{1/2}\bigr)\Bigr\}+C\Bigl(\frac{\log p}{n_t}\Bigr)^{1/2}|\hat r_{ij,t}-r_{ij,t}|.\end{aligned}$$

Note the replacement of $|r_{ij,t}|/2$ by $|\hat r_{ij,t}|/2$ above. Rearranging this inequality gives a bound for $|\hat r_{ij,t}-r_{ij,t}|$:

$$|\hat r_{ij,t}-r_{ij,t}|\le\frac{(\frac\tau2+2)(\frac{\log p}{n_t})^{1/2}\bigl\{\hat\xi_{ij,t}^{1/2}+\frac{|\hat r_{ij,t}|}2(\hat\xi_{ii,t}^{1/2}+\hat\xi_{jj,t}^{1/2})\bigr\}}{1-C(\frac{\log p}{n_t})^{1/2}}\le\tau\Bigl(\frac{\log p}{n_t}\Bigr)^{1/2}\Bigl\{\hat\xi_{ij,t}^{1/2}+\frac{|\hat r_{ij,t}|}2\bigl(\hat\xi_{ii,t}^{1/2}+\hat\xi_{jj,t}^{1/2}\bigr)\Bigr\}\overset{(10)}{=}\lambda_{ij,t}. \tag{55}$$

Inequality (55) implies

$$\bigl|(\hat r_{ij,1}-\hat r_{ij,2})-(r_{ij,1}-r_{ij,2})\bigr|\le\lambda_{ij,1}+\lambda_{ij,2}=\lambda_{ij}\quad\text{for all }1\le i,j\le p. \tag{56}$$

Next, by (56) and Conditions (C1) and (C3) on $s_\lambda(z)$,

$$\bigl|s_{\lambda_{ij}}(\hat r_{ij,1}-\hat r_{ij,2})-(r_{ij,1}-r_{ij,2})\bigr|\le\bigl|s_{\lambda_{ij}}(\hat r_{ij,1}-\hat r_{ij,2})\bigr|+|r_{ij,1}-r_{ij,2}|\le(1+c)|r_{ij,1}-r_{ij,2}|, \tag{57}$$
$$\bigl|s_{\lambda_{ij}}(\hat r_{ij,1}-\hat r_{ij,2})-(r_{ij,1}-r_{ij,2})\bigr|\le\bigl|s_{\lambda_{ij}}(\hat r_{ij,1}-\hat r_{ij,2})-(\hat r_{ij,1}-\hat r_{ij,2})\bigr|+\bigl|(\hat r_{ij,1}-\hat r_{ij,2})-(r_{ij,1}-r_{ij,2})\bigr|\le2\lambda_{ij}, \tag{58}$$

which implies

$$\bigl|s_{\lambda_{ij}}(\hat r_{ij,1}-\hat r_{ij,2})-(r_{ij,1}-r_{ij,2})\bigr|\le(2\lambda_{ij})^{1-q}(1+c)^q|r_{ij,1}-r_{ij,2}|^q, \tag{59}$$
$$\bigl|s_{\lambda_{ij}}(\hat r_{ij,1}-\hat r_{ij,2})-(r_{ij,1}-r_{ij,2})\bigr|^2\le(2\lambda_{ij})^{2-q}(1+c)^q|r_{ij,1}-r_{ij,2}|^q, \tag{60}$$

where 0 ≤ q < 1. Hence,

$$\begin{aligned}\|\hat D^\ast-(R_1-R_2)\|_{\ell_1}&=\max_i\sum_{j=1}^p\bigl|s_{\lambda_{ij}}(\hat r_{ij,1}-\hat r_{ij,2})-(r_{ij,1}-r_{ij,2})\bigr|\overset{(59)}{\le}\max_i2^{1-q}(1+c)^q\sum_{j=1}^p\lambda_{ij}^{1-q}|r_{ij,1}-r_{ij,2}|^q\\
&\overset{(10)}{\le}\max_i2^{1-q}(1+c)^q\sum_{j=1}^p\tau^{1-q}(\log p)^{(1-q)/2}\Bigl(\frac{\hat\xi_{ij,1}^{1/2}+|\hat r_{ij,1}|(\hat\xi_{ii,1}^{1/2}+\hat\xi_{jj,1}^{1/2})/2}{n_1^{1/2}}+\frac{\hat\xi_{ij,2}^{1/2}+|\hat r_{ij,2}|(\hat\xi_{ii,2}^{1/2}+\hat\xi_{jj,2}^{1/2})/2}{n_2^{1/2}}\Bigr)^{1-q}|r_{ij,1}-r_{ij,2}|^q\\
&\overset{(54)}{\le}C\max_i\sum_{j=1}^p(\log p)^{(1-q)/2}\Bigl(\frac1{n_1^{1/2}}+\frac1{n_2^{1/2}}\Bigr)^{1-q}|r_{ij,1}-r_{ij,2}|^q\le Cs_0(p)\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{(1-q)/2},\end{aligned}$$

which yields (46). (45) also holds since $\|A\|\le\|A\|_{\ell_1}$ for any symmetric matrix A. Similarly,

$$\begin{aligned}\|\hat D^\ast-(R_1-R_2)\|_F^2&=\sum_{i=1}^p\sum_{j=1}^p\bigl|s_{\lambda_{ij}}(\hat r_{ij,1}-\hat r_{ij,2})-(r_{ij,1}-r_{ij,2})\bigr|^2\overset{(60)}{\le}2^{2-q}(1+c)^q\sum_{i=1}^p\sum_{j=1}^p\lambda_{ij}^{2-q}|r_{ij,1}-r_{ij,2}|^q\\
&\overset{(10)(54)}{\le}Cp\max_i\sum_{j=1}^p(\log p)^{(2-q)/2}\Bigl(\frac1{n_1^{1/2}}+\frac1{n_2^{1/2}}\Bigr)^{2-q}|r_{ij,1}-r_{ij,2}|^q\le Cps_0(p)\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{1-q/2},\end{aligned}$$

which implies (47).

2. For A2, we wish to prove

$$\int_{A_2}\|\hat D^\ast-(R_1-R_2)\|^2\,dP\le Cp^{-\tau/4+1}\log p\Bigl(\frac1{n_1}+\frac1{n_2}\Bigr), \tag{61}$$
$$\int_{A_2}\|\hat D^\ast-(R_1-R_2)\|_{\ell_1}^2\,dP\le Cp^{-\tau/4+1}\log p\Bigl(\frac1{n_1}+\frac1{n_2}\Bigr), \tag{62}$$
$$\int_{A_2}\|\hat D^\ast-(R_1-R_2)\|_F^2\,dP\le Cp^{-\tau/4+1}\log p\Bigl(\frac1{n_1}+\frac1{n_2}\Bigr). \tag{63}$$

To prove these probability bounds, we introduce the following lemma, which reveals the relationship between $\hat\theta_{ij,t},\theta_{ij,t}$ and $\hat\sigma_{ij,t},\sigma_{ij,t}$.

Lemma 6.1

For any τ > 0,

$$\operatorname{pr}\Bigl(\max_{i,j,t}|\hat\sigma_{ij,t}-\sigma_{ij,t}|>(\tau/4+3)\bigl(\hat\theta_{ij,t}\log p/n_t\bigr)^{1/2}\Bigr)\le C(\log p)^{-1/2}p^{-\tau/4-1}; \tag{64}$$

there exist constants C1, C2, C3, not depending on p, n1, n2, such that

$$\operatorname{pr}\Bigl(\max_{i,j}\frac{|\hat\sigma_{ij,t}-\sigma_{ij,t}|}{(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}>C_1x\Bigr)\le C_2p^2\exp(-n_tx^2)\quad\text{for all }0<x\le C_3,\ t=1,2; \tag{65}$$

and for any ε > 0 and M > 0,

$$\operatorname{pr}\Bigl(\max_{i,j,t}|\hat\theta_{ij,t}-\theta_{ij,t}|/(\sigma_{ii,t}\sigma_{jj,t})>\varepsilon\Bigr)\le Cp^{-M}\Bigl(\frac1{n_1}+\frac1{n_2}\Bigr). \tag{66}$$

The proof of Lemma 6.1 is given later. Note that (64) immediately leads to

$$\operatorname{pr}(A_2)\le C(\log p)^{-1/2}p^{-\tau/4-1}. \tag{67}$$

By the definition of A2 in (43), (49) still holds. Moreover, by the definition of A2, $|\hat\sigma_{ii,t}-\sigma_{ii,t}|/\sigma_{ii,t}\le0.5$, which leads to $\hat\sigma_{ii,t}\ge0.5\sigma_{ii,t}$. Thus,

$$\hat\xi_{ij,t}=\frac{\hat\theta_{ij,t}}{\hat\sigma_{ii,t}\hat\sigma_{jj,t}}\le\frac{C\sigma_{ii,t}\sigma_{jj,t}}{(0.5\sigma_{ii,t})(0.5\sigma_{jj,t})}=4C. \tag{68}$$

For convenience, we denote the random variable

$$T=\max_{i,j,t}\frac{|\hat\sigma_{ij,t}-\sigma_{ij,t}|}{(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}. \tag{69}$$

Under A2, we have T ≤ 0.5. Then for all 1 ≤ i, j ≤ p and t = 1, 2,

$$\hat r_{ij,t}-r_{ij,t}=\frac{\hat\sigma_{ij,t}}{(\hat\sigma_{ii,t}\hat\sigma_{jj,t})^{1/2}}-\frac{\sigma_{ij,t}}{(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}=\frac{\hat\sigma_{ij,t}/(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}{(\hat\sigma_{ii,t}/\sigma_{ii,t})^{1/2}(\hat\sigma_{jj,t}/\sigma_{jj,t})^{1/2}}-\frac{\sigma_{ij,t}}{(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}\le\frac{\sigma_{ij,t}/(\sigma_{ii,t}\sigma_{jj,t})^{1/2}+T}{(1-T)^{1/2}(1-T)^{1/2}}-\frac{\sigma_{ij,t}}{(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}=\frac{r_{ij,t}+T}{1-T}-r_{ij,t}\le(1+2T)(r_{ij,t}+T)-r_{ij,t}\le4T.$$

A similar calculation leads to $\hat r_{ij,t}-r_{ij,t}\ge-4T$. Then, by (C3) on $s_{\lambda_{ij}}(z)$,

$$\|\hat D^\ast-(R_1-R_2)\|_{\ell_1}^2=\max_i\Bigl(\sum_{j=1}^p\bigl|s_{\lambda_{ij}}(\hat r_{ij,1}-\hat r_{ij,2})-(r_{ij,1}-r_{ij,2})\bigr|\Bigr)^2\le\max_i\Bigl(\sum_{j=1}^p\bigl\{\bigl|s_{\lambda_{ij}}(\hat r_{ij,1}-\hat r_{ij,2})-(\hat r_{ij,1}-\hat r_{ij,2})\bigr|+\bigl|(\hat r_{ij,1}-\hat r_{ij,2})-(r_{ij,1}-r_{ij,2})\bigr|\bigr\}\Bigr)^2\le\max_i\Bigl(\sum_{j=1}^p(\lambda_{ij}+8T)\Bigr)^2\overset{(10)(68)}{\le}Cp^2\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}+T^2\Bigr). \tag{70}$$

In addition, since $\|\cdot\|\le\|\cdot\|_{\ell_1}$, we also have $\|\hat D^\ast-(R_1-R_2)\|^2\le Cp^2\bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}+T^2\bigr)$. Similarly,

$$\|\hat D^\ast-(R_1-R_2)\|_F^2=\sum_{i=1}^p\sum_{j=1}^p\bigl|s_{\lambda_{ij}}(\hat r_{ij,1}-\hat r_{ij,2})-(r_{ij,1}-r_{ij,2})\bigr|^2\le2\sum_{i=1}^p\sum_{j=1}^p\bigl\{\bigl|s_{\lambda_{ij}}(\hat r_{ij,1}-\hat r_{ij,2})-(\hat r_{ij,1}-\hat r_{ij,2})\bigr|^2+\bigl|(\hat r_{ij,1}-\hat r_{ij,2})-(r_{ij,1}-r_{ij,2})\bigr|^2\bigr\}\le Cp^2\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}+T^2\Bigr). \tag{71}$$

Therefore,

$$\begin{aligned}\int_{A_2}\|\hat D^\ast-(R_1-R_2)\|_{\ell_1}^2\,dP&\overset{(70)}{\le}\int_{A_2}Cp^2\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}+T^2\Bigr)dP\\
&\le Cp^2\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)\operatorname{pr}(A_2)+Cp^2\int_0^{\min(0.5,C_1C_3)}2x\,\operatorname{pr}(\{T\ge x\}\cap A_2)\,dx\\
&\le Cp^2\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)\operatorname{pr}(A_2)+Cp^2\int_0^{C_1\{M\log p(1/n_1+1/n_2)\}^{1/2}}2x\,\operatorname{pr}(A_2)\,dx+Cp^2\int_{C_1\{M\log p(1/n_1+1/n_2)\}^{1/2}}^{\min(0.5,C_1C_3)}2x\,\operatorname{pr}(T\ge x)\,dx\\
&\overset{(65)}{\le}Cp^2\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)\operatorname{pr}(A_2)+Cp^2\int_{C_1\{M\log p(1/n_1+1/n_2)\}^{1/2}}^{+\infty}2x\,C_2p^2\bigl(e^{-n_1(x/C_1)^2}+e^{-n_2(x/C_1)^2}\bigr)\,dx\\
&\overset{(67)}{\le}Cp^{-\tau/4+1}\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)+Cp^{-M}\Bigl(\frac1{n_1}+\frac1{n_2}\Bigr).\end{aligned} \tag{72}$$

Similarly, we have

$$\int_{A_2}\|\hat D^\ast-(R_1-R_2)\|^2\,dP\le Cp^{-\tau/4+1}\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)+Cp^{-M}\Bigl(\frac1{n_1}+\frac1{n_2}\Bigr),\qquad\int_{A_2}\|\hat D^\ast-(R_1-R_2)\|_F^2\,dP\le Cp^{-\tau/4+1}\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)+Cp^{-M}\Bigl(\frac1{n_1}+\frac1{n_2}\Bigr),$$

which finishes the proof of (61), (62), and (63) when we choose M > τ/4 − 1.

3. For A3, (66) and $\log p=o(n^{1/3})$ lead to

$$\operatorname{pr}(A_3)\le\operatorname{pr}\Bigl(\max_{i,j,t}\frac{|\hat\theta_{ij,t}-\theta_{ij,t}|}{\sigma_{ii,t}\sigma_{jj,t}}>\varepsilon\Bigr)+\operatorname{pr}\Bigl(\max_{i,j,t}\frac{|\hat\sigma_{ij,t}-\sigma_{ij,t}|}{(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}>\min(0.5,C_1C_3)\Bigr)\le Cp^{-M}\Bigl(\frac1{n_1}+\frac1{n_2}\Bigr)+C_2p^2\Bigl(e^{-n_1\min(\frac1{2C_1},C_3)^2}+e^{-n_2\min(\frac1{2C_1},C_3)^2}\Bigr)\le Cp^{-M}\Bigl(\frac1{n_1}+\frac1{n_2}\Bigr). \tag{73}$$

Besides, since $r_{ij,t}$ and $\hat r_{ij,t}$ are population and sample correlations, $|r_{ij,t}|\le1$ and $|\hat r_{ij,t}|\le1$. By (C1) with y = z, $|s_\lambda(z)|\le c|z|$ for all $z\in\mathbb{R}$. Thus,

$$\bigl|s_{\lambda_{ij}}(\hat r_{ij,1}-\hat r_{ij,2})-(r_{ij,1}-r_{ij,2})\bigr|\le|r_{ij,1}|+|r_{ij,2}|+\bigl|s_{\lambda_{ij}}(\hat r_{ij,1}-\hat r_{ij,2})\bigr|\le2+c|\hat r_{ij,1}-\hat r_{ij,2}|\le2+2c,$$

which yields

$$\|\hat D^\ast-(R_1-R_2)\|_{\ell_1}^2=\max_i\Bigl(\sum_{j=1}^p\bigl|s_{\lambda_{ij}}(\hat r_{ij,1}-\hat r_{ij,2})-(r_{ij,1}-r_{ij,2})\bigr|\Bigr)^2\le(2+2c)^2p^2. \tag{74}$$

Similarly, $\|\hat D^\ast-(R_1-R_2)\|^2\le(2+2c)^2p^2$ and $\|\hat D^\ast-(R_1-R_2)\|_F^2\le(2+2c)^2p^2$. Therefore,

$$\int_{A_3}\|\hat D^\ast-(R_1-R_2)\|^2\,dP\overset{(73)}{\le}Cp^{-M+2}\Bigl(\frac1{n_1}+\frac1{n_2}\Bigr), \tag{75}$$
$$\int_{A_3}\|\hat D^\ast-(R_1-R_2)\|_{\ell_1}^2\,dP\overset{(73)}{\le}Cp^{-M+2}\Bigl(\frac1{n_1}+\frac1{n_2}\Bigr), \tag{76}$$
$$\int_{A_3}\|\hat D^\ast-(R_1-R_2)\|_F^2\,dP\overset{(73)}{\le}Cp^{-M+2}\Bigl(\frac1{n_1}+\frac1{n_2}\Bigr). \tag{77}$$

Finally, we combine the bounds on A1, A2, and A3. When τ > 4 and M > 2, we have

$$E\|\hat D^\ast-(R_1-R_2)\|^2=\Bigl(\int_{A_1}+\int_{A_2}+\int_{A_3}\Bigr)\|\hat D^\ast-(R_1-R_2)\|^2\,dP\overset{(45)(61)(75)}{\le}C(s_0^2(p)+1)\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{1-q}, \tag{78}$$

which proves (14). (15) and (16) follow similarly from (46), (62), (76) and (47), (63), (77).

Proof of Lemma 6.1

(64) follows directly from (25) in Cai and Liu (2011). For (66), the proof is essentially the same as that of (26) in Cai and Liu (2011), using $x=((M+2)\log p+\log n)^{1/2}$ instead of $x=((M+2)\log p)^{1/2}$ in their proof. We now focus on the proof of (65). Without loss of generality, we may translate X and assume that $\mu_1=\mu_2=0$. Note that we have the following formulation,

$$\frac{\hat\sigma_{ij,t}-\sigma_{ij,t}}{(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}=\frac1{n_t}\sum_{k=1}^{n_t}\frac{X_{ik}^{(t)}X_{jk}^{(t)}-\sigma_{ij,t}}{(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}-\frac{\bar X_i^{(t)}\bar X_j^{(t)}}{(\sigma_{ii,t}\sigma_{jj,t})^{1/2}}=\frac1{n_t}\sum_{k=1}^{n_t}\bigl(Y_{ik}^{(t)}Y_{jk}^{(t)}-r_{ij,t}\bigr)-\bar Y_i^{(t)}\bar Y_j^{(t)}. \tag{79}$$

Since

$$E\bigl(Y_i^{(t)}Y_j^{(t)}-r_{ij,t}\bigr)^2e^{\frac\eta2|Y_i^{(t)}Y_j^{(t)}-r_{ij,t}|}\le\frac4{\eta^2}Ee^{\eta|Y_i^{(t)}Y_j^{(t)}-r_{ij,t}|}\le\frac4{\eta^2}\Bigl(Ee^{\eta(Y_i^{(t)}Y_j^{(t)}-r_{ij,t})}+Ee^{-\eta(Y_i^{(t)}Y_j^{(t)}-r_{ij,t})}\Bigr)\le\frac8{\eta^2}\Bigl(Ee^{\eta(Y_i^{(t)})^2}+Ee^{\eta(Y_j^{(t)})^2}\Bigr)e^{\eta|r_{ij,t}|}\le C_4,$$

where C4 is a constant not depending on n1, n2, or p. Thus, setting $\bar B_n^2=n_tC_4$ and applying Lemma 1 in Cai and Liu (2011), we have

$$\operatorname{pr}\Bigl(\Bigl|\frac1{n_t}\sum_{k=1}^{n_t}\bigl(Y_{ik}^{(t)}Y_{jk}^{(t)}-r_{ij,t}\bigr)\Bigr|\ge C_{\eta/2}C_4^{1/2}x\Bigr)\le\exp(-n_tx^2) \tag{80}$$

for all $0<x\le C_1^{1/2}$, where $C_{\eta/2}=\eta/2+2$. Next, for $\bar Y_i^{(t)}$, we similarly apply Lemma 1 in Cai and Liu (2011) and get

$$\operatorname{pr}\bigl(|\bar Y_i^{(t)}|\ge C_5x\bigr)\le\exp(-n_tx^2) \tag{81}$$

for all $0<x\le C_5^{1/2}$. Combining (80) and (81),

$$\operatorname{pr}\Bigl(\max_{i,j}\Bigl|\frac1{n_t}\sum_{k=1}^{n_t}\bigl(Y_{ik}^{(t)}Y_{jk}^{(t)}-r_{ij,t}\bigr)\Bigr|\le C_{\eta/2}C_4^{1/2}x\ \text{ and }\ \max_{i,t}|\bar Y_i^{(t)}|\le C_5x\Bigr)\ge1-2p^2\exp(-n_tx^2) \tag{82}$$

for all $0<x\le\min(C_1^{1/2},C_5^{1/2})$. Finally, (79) and (82) yield (65).

Proof of Theorem 3.2

Without loss of generality, we assume $n_1\le n_2$. For $(R_1,R_2)\in\mathcal{G}_q(s_0(p))$, set $\Sigma_2=R_2=I_{p\times p}$ and suppose this is known. The estimation of the sparse difference then reduces to the estimation of the sparse correlation matrix $R_1$, so the lower bound for estimating a single sparse correlation matrix can be used to prove the theorem.

We follow the idea of Cai and Zhou (2012) and define the set of covariance matrices with unit diagonal

$$\mathcal{F}_q(s_0(p))=\Bigl\{\Sigma:\ \Sigma\succeq0;\ \operatorname{diag}(\Sigma)=1;\ \max_i\sum_{j=1}^p|\sigma_{ij}|^q\le s_0(p)\Bigr\}.$$

We have $\{(R_1,I):R_1\in\mathcal{F}_q(s_0(p))\}\subseteq\mathcal{G}_q(s_0(p))$. Besides, the proof of Theorem 2 in Cai and Zhou (2012) shows that

$$\inf_{\hat\Sigma}\sup_{\Sigma\in\mathcal{F}_q(s_0(p))}E\|\hat\Sigma-\Sigma\|^2\ge Cs_0^2(p)\Bigl(\frac{\log p}n\Bigr)^{1-q}. \tag{83}$$

Since the correlation matrix equals the covariance matrix (i.e. R = Σ) when diag(Σ) = 1, we have

$$\inf_{\hat D}\sup_{(R_1,R_2)\in\mathcal{G}_q(s_0(p))}E\|\hat D-(R_1-R_2)\|^2\ge\inf_{\hat D}\sup_{(R_1,I):R_1\in\mathcal{F}_q(s_0(p))}E\|\hat D-(R_1-I)\|^2\ge\inf_{\hat R_1}\sup_{R_1\in\mathcal{F}_q(s_0(p))}E\|\hat R_1-R_1\|^2=\inf_{\hat\Sigma_1}\sup_{\Sigma_1\in\mathcal{F}_q(s_0(p)),\,\operatorname{diag}(\Sigma_1)=1}E\|\hat\Sigma_1-\Sigma_1\|^2\ge Cs_0^2(p)\Bigl(\frac{\log p}{n_1}\Bigr)^{1-q}\ge\frac C2s_0^2(p)\Bigl(\frac{\log p}{n_1}+\frac{\log p}{n_2}\Bigr)^{1-q}, \tag{84}$$

which implies (17). Since $\|\cdot\|\le\|\cdot\|_{\ell_1}$ for symmetric matrices, (18) also follows immediately.

Similarly, (19) follows from Theorem 4 of Cai and Zhou (2012).

Proof of Proposition 6.1

The proof of Proposition 6.1 is similar to that of Theorem 3.1. For the upper bound, we again split into three events:

$$A_1=\Bigl\{\max_{i,j}\frac{|\hat\sigma_{ij}-\sigma_{ij}|}{(\hat\theta_{ij}\log p/n)^{1/2}}\le\tau/4+3\ \text{ and }\ \max_{i,j}\frac{|\hat\theta_{ij}-\theta_{ij}|}{\sigma_{ii}\sigma_{jj}}\le\varepsilon\Bigr\}, \tag{85}$$
$$A_2=\Bigl\{\max_{i,j}\frac{|\hat\sigma_{ij}-\sigma_{ij}|}{(\hat\theta_{ij}\log p/n)^{1/2}}>\tau/4+3,\ \max_{i,j}\frac{|\hat\theta_{ij}-\theta_{ij}|}{\sigma_{ii}\sigma_{jj}}\le\varepsilon,\ \text{ and }\ \max_{i,j}\frac{|\hat\sigma_{ij}-\sigma_{ij}|}{(\sigma_{ii}\sigma_{jj})^{1/2}}\le\min(0.5,C_1C_3)\Bigr\}, \tag{86}$$
$$A_3=(A_1\cup A_2)^c. \tag{87}$$

Here ε is a fixed constant satisfying 0 < ε < ν0/2, where ν0 was introduced in (13); C1 and C3 are the constants specified in Lemma 6.1. Similarly to the proof of Theorem 3.1, we can prove the following statements.

  1. Under A1,
$$\|\hat R^\ast-R\|^2\le Cs_0^2(p)\Bigl(\frac{\log p}n\Bigr)^{1-q},\qquad\|\hat R^\ast-R\|_{\ell_1}^2\le Cs_0^2(p)\Bigl(\frac{\log p}n\Bigr)^{1-q},\qquad\|\hat R^\ast-R\|_F^2\le Cps_0(p)\Bigl(\frac{\log p}n\Bigr)^{1-q/2}.$$
  2. For A2,
$$\int_{A_2}\|\hat R^\ast-R\|^2\,dP\le Cp^{-\tau/4+1}\frac{\log p}n,\qquad\int_{A_2}\|\hat R^\ast-R\|_{\ell_1}^2\,dP\le Cp^{-\tau/4+1}\frac{\log p}n,\qquad\int_{A_2}\|\hat R^\ast-R\|_F^2\,dP\le Cp^{-\tau/4+1}\frac{\log p}n.$$
  3. For A3,
$$\int_{A_3}\|\hat R^\ast-R\|^2\,dP\le\frac{Cp^{-M+2}}n,\qquad\int_{A_3}\|\hat R^\ast-R\|_{\ell_1}^2\,dP\le\frac{Cp^{-M+2}}n,\qquad\int_{A_3}\|\hat R^\ast-R\|_F^2\,dP\le\frac{Cp^{-M+2}}n.$$

The rest of the proof, including the lower bound results, is omitted here as it is essentially the same as that of Theorem 3.1.

Footnotes

*

The research was supported in part by NSF Grant DMS-1208982 and NIH Grant R01 CA127334.


References

  1. Bandyopadhyay S, Mehta M, Kuo D, Sung MK, Chuang R, Jaehnig EJ, Bodenmiller B, Licon K, Copeland W, Shales M, Fiedler D, Dutkowski J, Guénolé A, van Attikum H, Shokat KM, Kolodner RD, Huh WK, Aebersold R, Keogh MC, Krogan NJ, Ideker T. Rewiring of genetic networks in response to DNA damage. Science. 2010;330(6009):1385–1389. doi: 10.1126/science.1195618.
  2. Barabási AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet. 2011;12(1):56–68. doi: 10.1038/nrg2918.
  3. Barabási AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004;5(2):101–113. doi: 10.1038/nrg1272.
  4. Bickel P, Levina E. Covariance regularization by thresholding. The Annals of Statistics. 2008;36:2577–2604.
  5. Cai TT, Liu W. Adaptive thresholding for sparse covariance matrix estimation. Journal of the American Statistical Association. 2011;106:672–684.
  6. Cai TT, Liu W, Xia Y. Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association. 2013;108:265–277.
  7. Cai TT, Yuan M. Adaptive covariance matrix estimation through block thresholding. The Annals of Statistics. 2012;40:2014–2042.
  8. Cai TT, Zhang CH, Zhou HH. Optimal rates of convergence for covariance matrix estimation. The Annals of Statistics. 2010;38:2118–2144.
  9. Cai TT, Zhou HH. Optimal rates of convergence for sparse covariance matrix estimation. The Annals of Statistics. 2012;40:2389–2420.
  10. Carter SL, Brechbühler CM, Griffin M, Bond AT. Gene co-expression network topology provides a framework for molecular characterization of cellular state. Bioinformatics. 2004;20(14):2242–2250. doi: 10.1093/bioinformatics/bth234.
  11. de la Fuente A. From "differential expression" to "differential networking": identification of dysfunctional regulatory networks in diseases. Trends in Genetics. 2010;26:326–333. doi: 10.1016/j.tig.2010.05.001.
  12. Dubois PC, Trynka G, Franke L, Hunt KA, Romanos J, Curtotti A, Zhernakova A, Heap GA, Ádány R, Aromaa A, et al. Multiple common variants for celiac disease influencing immune gene expression. Nature Genetics. 2010;42(4):295–302. doi: 10.1038/ng.543.
  13. Fukushima A. DiffCorr: an R package to analyze and visualize differential correlations in biological networks. Gene. 2013;518(1):209–214. doi: 10.1016/j.gene.2012.11.028.
  14. Fuller TF, Ghazalpour A, Aten JE, Drake TA, Lusis AJ, Horvath S. Weighted gene coexpression network analysis strategies applied to mouse weight. Mammalian Genome. 2007;18(6–7):463–472. doi: 10.1007/s00335-007-9043-3.
  15. Ghellal A, Li C, Hayes M, Byrne G, Bundred N, Kumar S. Prognostic significance of TGF beta 1 and TGF beta 3 in human breast carcinoma. Anticancer Research. 2000;20:4413–4418.
  16. Hudson NJ, Reverter A, Dalrymple BP. A differential wiring analysis of expression data correctly identifies the gene containing the causal mutation. PLoS Comput Biol. 2009;5(5):e1000382. doi: 10.1371/journal.pcbi.1000382.
  17. Ideker T, Krogan NJ. Differential network biology. Molecular Systems Biology. 2012;8(1):565. doi: 10.1038/msb.2011.99.
  18. Klopocki E, Kristiansen G, Wild PJ, Klaman I, Castanos-Velez E, Singer G, Stöhr R, Simon R, Sauter G, Leibiger H, Essers L, Weber B, Hermann K, Rosenthal A, Hartmann A, Dahl E. Loss of SFRP1 is associated with breast cancer progression and poor prognosis in early stage tumors. Int J Oncol. 2004;25:641–649.
  19. Kostka D, Spang R. Finding disease specific alterations in the co-expression of genes. Bioinformatics. 2004;20(suppl 1):i194–i199. doi: 10.1093/bioinformatics/bth909.
  20. Lai Y, Wu B, Chen L, Zhao H. A statistical method for identifying differential gene–gene co-expression patterns. Bioinformatics. 2004;20:3146–3155. doi: 10.1093/bioinformatics/bth379.
  21. Lee HK, Hsu AK, Sajdak J, Qin J, Pavlidis P. Coexpression analysis of human genes across many microarray data sets. Genome Research. 2004;14(6):1085–1094. doi: 10.1101/gr.1910904.
  22. Li J, Chen SX. Two sample tests for high-dimensional covariance matrices. The Annals of Statistics. 2012;40(2):908–940.
  23. Margheri F, Schiavone N, Papucci L, Magnelli L, Serratì S, Chillà A, Laurenzana A, Bianchini F, Calorini L, Torre E, Dotor J, Feijoo E, Fibbi G, Del Rosso M. GDF5 regulates TGFβ-dependent angiogenesis in breast carcinoma MCF-7 cells: in vitro and in vivo control by anti-TGFβ peptides. PLoS ONE. 2012;7:e50342. doi: 10.1371/journal.pone.0050342.
  24. Rothman A, Levina E, Zhu J. Generalized thresholding of large covariance matrices. Journal of the American Statistical Association. 2009;104:177–186.
  25. Schott JR. A test for the equality of covariance matrices when the dimension is large relative to the sample sizes. Computational Statistics & Data Analysis. 2007;51(12):6535–6542.
  26. Shaw P, Greenstein D, Lerch J, Clasen L, Lenroot R, Gogtay N, Evans A, Rapoport J, Giedd J. Intellectual ability and cortical development in children and adolescents. Nature. 2006;440(7084):676–679. doi: 10.1038/nature04513.
  27. Shedden K, Taylor J. Differential correlation detects complex associations between gene expression and clinical outcomes in lung adenocarcinomas. In: Methods of Microarray Data Analysis. Springer; 2005. pp. 121–131.
  28. Shy B, Wu C, Khramtsova G, Zhang J, Olopade O, Goss K, Merrill B. Regulation of TCF7L1 DNA binding and protein stability as principal mechanisms of Wnt/β-catenin signaling. Cell Reports. 2013;4:1–9. doi: 10.1016/j.celrep.2013.06.001.
  29. Srivastava MS, Yanagihara H. Testing the equality of several covariance matrices with fewer observations than the dimension. Journal of Multivariate Analysis. 2010;101(6):1319–1329.
  30. van de Vijver M, He Y, van't Veer L, Dai H, Hart AA, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ, Parrish M, Atsma D, Witteveen A, Glas A, Delahaye L, van der Velde T, Bartelink H, Rodenhuis S, Rutgers ET, Friend SH, Bernards R. A gene-expression signature as a predictor of survival in breast cancer. New England Journal of Medicine. 2002;347:1999–2009. doi: 10.1056/NEJMoa021967.
  31. Xu Y, Liu X, Guo F, Ning Y, Zhi X, Wang X, Chen S, Yin L, Li X. Effect of estrogen sulfation by SULT1E1 and PAPSS on the development of estrogen-dependent cancers. Cancer Science. 2012;103:1000–1009. doi: 10.1111/j.1349-7006.2012.02258.x.
  32. Zafrakas M, Chorovicer M, Klaman I, Kristiansen G, Wild PJ, Heindrichs U, Knüchel R, Dahl E. Systematic characterisation of GABRP expression in sporadic breast cancer and normal breast tissue. International Journal of Cancer. 2006;118(6):1453–1459. doi: 10.1002/ijc.21517.
  33. Zhang J, Li J, Deng H. Class-specific correlations of gene expressions: identification and their effects on clustering analyses. The American Journal of Human Genetics. 2008;83(2):269–277. doi: 10.1016/j.ajhg.2008.07.009.
