Published in final edited form as: J Am Stat Assoc. 2018 Nov 19;114(525):358–369. doi: 10.1080/01621459.2017.1407774

Optimal Estimation of Genetic Relatedness in High-dimensional Linear Models

Zijian Guo 1, Wanjie Wang 2, T Tony Cai 3, Hongzhe Li 4

Abstract

Estimating the genetic relatedness between two traits based on genome-wide association data is an important problem in genetics research. In the framework of high-dimensional linear models, we introduce two measures of genetic relatedness and develop optimal estimators for them. One is the genetic covariance, defined as the inner product of the two regression vectors; the other is the genetic correlation, the inner product normalized by their lengths. We propose functional de-biased estimators (FDEs), which consist of an initial estimation step based on the plug-in scaled Lasso estimator and a further bias correction step. We also develop estimators of the quadratic functionals of the regression vectors, which can be used to estimate the heritability of each trait. The estimators are shown to be minimax rate-optimal and can be efficiently implemented. Simulation results show that FDEs provide better estimates of the genetic relatedness than simple plug-in estimates. FDE is also applied to an analysis of a yeast segregant data set with multiple traits to estimate the genetic relatedness among these traits.

Keywords: Genetic correlations, genome-wide association studies, inner product, quadratic functional, minimax rate of convergence

1. Introduction

1.1. Motivation and Background

Genome-wide association studies (GWAS) have led to the identification of thousands of genetic variants or single nucleotide polymorphisms (SNPs) that are associated with various complex phenotypes (Manolio, 2010). Results from these GWAS have shown that many complex phenotypes share common genetic variants, including various autoimmune diseases (Zhernakova et al., 2009) and psychiatric disorders (Lee et al., 2013). This empirical evidence of shared genetic etiology for various phenotypes provides important insights into common pathophysiologies of related disorders that can be explored for drug repositioning and for studying disease etiology. Such knowledge of genetic sharing can potentially be exploited to increase the accuracy of genetic risk prediction (Maier et al., 2015; Wray et al., 2007; Purcell et al., 2009). The concept of genetic relatedness, or genetic correlation, has been proposed to describe the shared genetic associations between pairs of quantitative traits based on GWAS data. This is in contrast to the traditional approaches of estimating co-heritability based on twin or family studies, where measurements of both traits are required on the same set of individuals. Due to the availability of GWAS data sets for many important traits, there has been significant recent interest in methods for quantifying and estimating the genetic relatedness between two traits based on large-scale genetic association data.

Several measures of genetic relatedness have been proposed using GWAS data. Lee et al. (2012) and Yang et al. (2013) extended the mixed-effect model framework to estimate the genetic covariance and genetic correlation between two traits. In their models, each individual's trait value is associated with a random genetic effect, which is correlated across individuals by virtue of sharing some of the genetic variants affecting the traits, and an environmental random effect. Co-heritability is then defined as the ratio of the covariance of the genetic random effects to the square root of the product of the total variances. The mixed-effect model approach requires knowledge of the identity of the causal variants, and hence of the covariance matrix, which is not available in practice. Lee et al. (2012) and Yang et al. (2013) approximated the genetic relationship between every pair of individuals across the set of causal variants by the genetic relationship across the set of all genotyped variants. However, the very large number of variants used for estimating the genetic correlations, most of them likely not causative, might mask the correlations on the set of causal variants, leading to inaccurate and suboptimal estimation of heritability (Golan and Rosset, 2011). Bulik-Sullivan et al. (2015) studied the genetic relatedness based on another random effects model for the two traits and developed a cross-trait linkage disequilibrium (LD) score regression to estimate the genetic covariance and genetic correlation. This approach is similar to the mixed-effect model approach of Yang et al. (2013) but has the advantage of using only the GWAS summary statistics. Lee and van der Werf (2016) developed an algorithm for multivariate linear mixed model analysis and demonstrated its use in estimating co-heritability.

To alleviate the difficulty of estimating the covariance matrix in the commonly used mixed-effect model framework for estimating heritability or co-heritability, we take a regression approach with fixed genetic effects in high-dimensional settings. High-dimensional linear regression provides a natural framework for GWAS in order to identify the trait-associated genetic variants, and its advantages over simple univariate analysis have been demonstrated (Wu et al., 2009). Heritability estimation in high-dimensional regression has been studied in Bonnet et al. (2015); Verzelen and Gassiat (2016); Janson et al. (2016). However, high-dimensional regression analysis has not been explored to study the genetic relatedness between two traits based on genetic association data. The goal of this paper is to define two quantities that can be used to measure the genetic relatedness between a pair of traits based on GWAS data in the framework of high-dimensional linear models. Our definitions of genetic relatedness reflect the covariance or correlation of the trait-associated genetic variants. This is different from the mixed-effects model-based approaches, where the genetic relatedness is defined through the variance/covariance matrix of the individual-specific random effects and the data from all the genetic variants are used to approximate the true covariance matrix.

1.2. Definition and Problem Formulation

A pair of trait values (y, w) is modeled as linear combinations of p genetic variants plus error terms that include environmental and unmeasured genetic effects,

$$y_{n_1\times 1}=X_{n_1\times p}\,\beta_{p\times 1}+\epsilon_{n_1\times 1}\quad\text{and}\quad w_{n_2\times 1}=Z_{n_2\times p}\,\gamma_{p\times 1}+\delta_{n_2\times 1},$$ (1)

where the rows $X_{i\cdot}$ are i.i.d. $p$-dimensional sub-Gaussian random vectors with covariance matrix $\Sigma$, the rows $Z_{i\cdot}$ are i.i.d. $p$-dimensional sub-Gaussian random vectors with covariance matrix $\Gamma$, and the error $(\epsilon,\delta)$ follows the multivariate normal distribution with mean zero and covariance

$$\begin{pmatrix}\sigma_1^2 I_{n_1\times n_1} & 0_{n_1\times n_2}\\ 0_{n_2\times n_1} & \sigma_2^2 I_{n_2\times n_2}\end{pmatrix}$$

and is assumed to be independent of X and Z.

In the study of genetic relatedness, the pair of traits $y$ and $w$ are assumed to have mean zero, and the $j$th column of $X$, $X_{\cdot j}$, and the $j$th column of $Z$, $Z_{\cdot j}$, are the numerically coded genetic markers at the $j$th genetic variant and are assumed to have mean zero and variance 1. Under this model, if the columns of $X$ and $Z$ are independent, then for the $i$-th observation,

$$\mathrm{Var}(y_i)=\sum_j\beta_j^2+\sigma_1^2=\|\beta\|_2^2+\sigma_1^2,\quad\text{and}\quad \mathrm{Var}(w_i)=\sum_j\gamma_j^2+\sigma_2^2=\|\gamma\|_2^2+\sigma_2^2,$$

therefore $\|\beta\|_2^2/(\|\beta\|_2^2+\sigma_1^2)$ and $\|\gamma\|_2^2/(\|\gamma\|_2^2+\sigma_2^2)$ can be interpreted as the narrow-sense heritability of each trait (Bulik-Sullivan et al., 2015).

Based on this model, one measure of genetic relatedness is the inner product of the regression coefficients

$$I(\beta,\gamma)=\langle\beta,\gamma\rangle,$$ (2)

which measures the shared genetic effects between these two traits. Bulik-Sullivan et al. (2015) defined this quantity as the genetic covariance due to the $p$ genetic variants. Alternatively, a normalized inner product, called the genetic correlation,

$$R(\beta,\gamma)=\frac{\langle\beta,\gamma\rangle}{\|\beta\|_2\|\gamma\|_2}\,\mathbf{1}(\|\beta\|_2\|\gamma\|_2>0),$$ (3)

can also be used. In the case where one of $\|\beta\|_2$ and $\|\gamma\|_2$ vanishes, the ratio is defined as zero, which indicates no correlation between the two traits when one of the regression vectors is zero. With this normalization, $R(\beta,\gamma)$ always lies between $-1$ and $1$ and can be used to compare the genetic relatedness among multiple pairs of traits. Note that to exhibit genetic correlation, the directions of the effects must also be consistently aligned.
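As a quick illustration of definitions (2) and (3), the following is a minimal Python sketch (the code sketches in this paper are illustrations only; the authors' own software, mentioned in Section 6, is in Matlab) transcribing the two measures as functions of the coefficient vectors.

import numpy as np

def genetic_covariance(beta, gamma):
    # I(beta, gamma) = <beta, gamma>, display (2)
    return float(beta @ gamma)

def genetic_correlation(beta, gamma):
    # R(beta, gamma), display (3); defined as zero when either vector vanishes
    denom = np.linalg.norm(beta) * np.linalg.norm(gamma)
    return float(beta @ gamma) / denom if denom > 0 else 0.0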

Although Bulik-Sullivan et al. (2015) defined (2) and (3) as the genetic covariance and genetic correlation, they treated $\beta$ and $\gamma$ as random vectors with a particular covariance form and then proposed to apply LD regression to estimate the expectation of $\langle\beta,\gamma\rangle$. The focus of this paper is to develop estimators of $I(\beta,\gamma)$ and $R(\beta,\gamma)$ based on two GWAS data sets with genotypes measured on the same set of genetic markers, denoted by $(y_i,X_{i\cdot}),\ i=1,\dots,n_1$, and $(w_i,Z_{i\cdot}),\ i=1,\dots,n_2$.

1.3. Methods and Main Results

A naive approach is to estimate $\beta$ and $\gamma$ first and then plug the estimators into the expressions (2) and (3). For the problem of interest, there are usually more genetic markers than samples, that is, $p\gg\max\{n_1,n_2\}$. However, for any given trait, one expects that only a few of these markers have nonzero effects. One can apply any high-dimensional sparse regression method, such as the Lasso (Tibshirani, 1996), the scaled Lasso (Sun and Zhang, 2012), or marginal regression with screening (McCarthy et al., 2008; Fan et al., 2012), to estimate these sparse regression coefficients. The resulting plug-in estimators, however, have several drawbacks for estimating the genetic relatedness. The Lasso shrinks the estimates towards 0; in particular, some weak effects might be shrunken exactly to 0, yet the accumulation of these weak effects may contribute significantly to the trait variability. It is also possible that some genetic variants have strong effects on one trait and weak effects on the other. Due to shrinkage, plugging in Lasso-type estimators fails to capture the contribution of such variants to the genetic relatedness. Marginal regression calculates a regression score between the trait and each single marker (i.e., $y$ and $X_{\cdot j}$, $1\le j\le p$) and screens for large scores. This approach also suffers in the presence of weak effects, as the marginal scores must be large enough to survive the screening step.

We propose a two-step procedure to estimate the genetic relatedness measure $I(\beta,\gamma)$ defined in (2): step 1 estimates the inner product $I(\beta,\gamma)$ by the plug-in scaled Lasso estimator, and step 2 corrects the bias of this plug-in estimator. Similar two-step procedures are proposed to estimate the quadratic functionals $\|\beta\|_2^2$ and $\|\gamma\|_2^2$. To estimate the normalized inner product $R(\beta,\gamma)$ defined in (3), we plug the estimators of the inner product and the quadratic functionals into the definition (3). Due to the correction step, we call our estimators Functional De-biased Estimators (FDEs).

FDEs are shown to achieve the minimax optimal convergence rates for estimating $I(\beta,\gamma)$ and $R(\beta,\gamma)$. The optimality of FDEs results from the unique way of balancing the bias and variance for estimating $I(\beta,\gamma)$ and $R(\beta,\gamma)$. To illustrate this, we focus on the estimation of $I(\beta,\gamma)$ and compare the FDE with two plug-in estimators: the plug-in of the scaled Lasso estimators (Sun and Zhang, 2012) and the plug-in of the de-biased Lasso estimators (Javanmard and Montanari, 2014; van de Geer et al., 2014; Zhang and Zhang, 2014). Note that the scaled Lasso estimator achieves the optimal convergence rate for estimating the whole vector $\beta$, and the de-biased estimator achieves the optimal convergence rate for estimating a single coordinate $\beta_i$. However, simply plugging in the scaled Lasso estimators or the de-biased Lasso estimators does not lead to a good estimator of $I(\beta,\gamma)$: the plug-in of the scaled Lasso estimators suffers from a large bias, while the plug-in of the de-biased Lasso estimators suffers from an inflated variance.

In contrast, the FDE of $I(\beta,\gamma)$ balances the bias and variance in the optimal way. Specifically, in the correction step of the FDE, the bias caused by plugging in the scaled Lasso estimator is corrected by adding the minimum amount of variance. As demonstrated in the simulation studies, FDE consistently outperforms both the plug-in of the scaled Lasso estimators and the plug-in of the de-biased Lasso estimators. In addition, FDEs are robust to dependency among genetic markers and work for a broad class of dependency structures.

The theoretical analysis given in Section 3 establishes the optimal convergence rates for estimating $I(\beta,\gamma)$, $R(\beta,\gamma)$, $\|\beta\|_2^2$ and $\|\gamma\|_2^2$. To facilitate the discussion, we control the $\ell_2$ norms of the regression coefficients $\beta$ and $\gamma$ as $c\eta_0\le\|\beta\|_2\le CM_0$ and $c\eta_0\le\|\gamma\|_2\le CM_0$, where $c$, $C$ are positive constants independent of $n$ and $p$. Here, we present the most interesting regime where the signals are strong in the sense that $\eta_0\ge C\sqrt{k\log p/n}$, where $p$ is the dimension, $n$ is the sample size, $k$ is the maximum sparsity of $\beta$ and $\gamma$, and $C$ is a positive constant independent of $k$, $n$, $p$. We have shown that the optimal rate of convergence for estimating $I(\beta,\gamma)$, $\|\beta\|_2^2$ and $\|\gamma\|_2^2$ is

$$M_0\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n}.$$

The optimal rate depends not only on p, n and k, but also the upper bound for the signal strength M0. In addition, we have shown that the optimal convergence rate of estimating R(β,γ) is

$$\frac{1}{\eta_0}\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{1}{\eta_0^2}\cdot\frac{k\log p}{n}.$$

In contrast to estimating $I(\beta,\gamma)$, $\|\beta\|_2^2$ and $\|\gamma\|_2^2$, the optimal rate scales with the inverse of the lower bound on the signal strength, $1/\eta_0$. The estimators $\hat I(\beta,\gamma)$, $\hat Q(\beta)$, $\hat Q(\gamma)$ and $\hat R(\beta,\gamma)$ proposed in Section 2 are shown to adaptively achieve the optimal rates for estimating $I(\beta,\gamma)$, $Q(\beta)$, $Q(\gamma)$ and $R(\beta,\gamma)$, respectively.

1.4. Notation and Definitions

Basic notation and definitions used in the rest of the paper are collected here. For a matrix $X\in\mathbb{R}^{n\times p}$, $X_{i\cdot}$, $X_{\cdot j}$, and $X_{i,j}$ denote respectively the $i$-th row, $j$-th column, and $(i,j)$-th entry of $X$; $X_{i,-j}$ denotes the $i$-th row of $X$ excluding the $j$-th coordinate, and $X_{\cdot,-j}$ denotes the sub-matrix of $X$ excluding the $j$-th column. Let $[p]=\{1,2,\dots,p\}$. For a subset $J\subset[p]$, $X_J$ denotes the sub-matrix of $X$ consisting of the columns $X_{\cdot j}$ with $j\in J$; for a vector $x\in\mathbb{R}^p$, $x_J$ is the sub-vector of $x$ with indices in $J$ and $x_{-J}$ is the sub-vector with indices in $J^c$. For a vector $x\in\mathbb{R}^p$, the $\ell_q$ norm of $x$ is defined as $\|x\|_q=\left(\sum_{i=1}^{p}|x_i|^q\right)^{1/q}$ for $q\ge 1$, with $\|x\|_0$ denoting the number of non-zero elements of $x$ and $\|x\|_\infty=\max_{1\le j\le p}|x_j|$. For a matrix $A$ and $1\le q\le\infty$, $\|A\|_q=\sup_{\|x\|_q=1}\|Ax\|_q$ is the matrix $\ell_q$ operator norm; in particular, $\|A\|_2$ is the spectral norm. For a symmetric matrix $A$, $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ denote respectively its smallest and largest eigenvalues. For a set $S$, $|S|$ denotes its cardinality. For $a\in\mathbb{R}$, $a_+=\max\{a,0\}$ and $\mathrm{sign}(a)$ is the sign of $a$, i.e., $\mathrm{sign}(a)=1$ if $a>0$, $\mathrm{sign}(a)=-1$ if $a<0$ and $\mathrm{sign}(0)=0$. Define the sub-Gaussian norm $\|x\|_{\psi_2}$ of a random vector $x\in\mathbb{R}^p$ as $\|x\|_{\psi_2}=\sup_{v\in S^{p-1}}\sup_{q\ge1}q^{-1/2}\left(\mathbb{E}|v^\top x|^q\right)^{1/q}$, where $S^{p-1}$ is the unit sphere in $\mathbb{R}^p$. The random vector $x\in\mathbb{R}^p$ is said to be sub-Gaussian if its sub-Gaussian norm is bounded; see Vershynin (2012) for more on sub-Gaussian random variables. For the design matrices $X\in\mathbb{R}^{n_1\times p}$ and $Z\in\mathbb{R}^{n_2\times p}$, we define the corresponding sample covariance matrices as $\hat\Sigma=X^\top X/n_1$ and $\hat\Gamma=Z^\top Z/n_2$. Let $z_{\alpha/2}$ denote the upper $\alpha/2$ quantile of the standard normal distribution. For two positive sequences $a_n$ and $b_n$, $a_n\lesssim b_n$ means $a_n\le Cb_n$ for all $n$, $a_n\gtrsim b_n$ if $b_n\lesssim a_n$, and $a_n\asymp b_n$ if $a_n\lesssim b_n$ and $b_n\lesssim a_n$. $c$ and $C$ are used to denote generic positive constants that may vary from place to place. For any two sequences $a_n$ and $b_n$, we write $b_n\ll a_n$ if $\limsup b_n/a_n=0$.

1.5. Organization of the Paper

The rest of the paper is organized as follows. Section 2 presents the procedures for estimating $I(\beta,\gamma)$, $Q(\beta)$, $Q(\gamma)$, and $R(\beta,\gamma)$ in detail. In Section 3, minimax convergence rates for the estimation problems are established and the proposed estimators are shown to attain the optimal rates. In Section 4, simulation studies are conducted to evaluate the empirical performance of FDEs. A yeast cross data set is used to illustrate the estimators in Section 5. A discussion is provided in Section 6. The proofs of the main theorems are presented in Section 7. The remaining proofs and extended simulation studies are given in the supplementary materials.

2. Estimation Methods

2.1. Estimation of I(β,γ)

Since the inner product I(β,γ) is of significant interest in its own right, we first consider the estimation of I(β,γ)=β,γ. The scaled Lasso estimators for high-dimensional linear model (1) are defined through the following optimization algorithm (Sun and Zhang, 2012),

$$\{\hat\beta,\hat\sigma_1\}=\operatorname*{arg\,min}_{\beta\in\mathbb{R}^p,\,\sigma_1\in\mathbb{R}^+}\ \frac{\|y-X\beta\|_2^2}{2n_1\sigma_1}+\frac{\sigma_1}{2}+\frac{\lambda_0}{\sqrt{n_1}}\sum_{j=1}^{p}\frac{\|X_{\cdot j}\|_2}{\sqrt{n_1}}|\beta_j|,$$ (4)

and

$$\{\hat\gamma,\hat\sigma_2\}=\operatorname*{arg\,min}_{\gamma\in\mathbb{R}^p,\,\sigma_2\in\mathbb{R}^+}\ \frac{\|w-Z\gamma\|_2^2}{2n_2\sigma_2}+\frac{\sigma_2}{2}+\frac{\lambda_0}{\sqrt{n_2}}\sum_{j=1}^{p}\frac{\|Z_{\cdot j}\|_2}{\sqrt{n_2}}|\gamma_j|,$$ (5)

where $\lambda_0=\sqrt{2.01\log p}$.
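The scaled Lasso can be computed by alternating between a Lasso fit at a fixed noise level and a noise-level update (Sun and Zhang, 2012). The following Python sketch, assuming scikit-learn is available and the columns of X are standardized, illustrates this iteration for (4); it is an illustration only, not the paper's own implementation (which uses the equivalent square-root Lasso; see Section 4).

import numpy as np
from sklearn.linear_model import Lasso

def scaled_lasso(X, y, lam0, n_iter=20, tol=1e-6):
    # lam0 plays the role of lambda_0 = sqrt(2.01 * log p) in display (4)
    n = X.shape[0]
    sigma = np.std(y)  # initial guess for the noise level
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # For fixed sigma (and standardized columns), (4) reduces to a Lasso
        # with penalty level sigma * lam0 / sqrt(n); sklearn minimizes
        # ||y - X b||_2^2 / (2n) + alpha * ||b||_1.
        fit = Lasso(alpha=sigma * lam0 / np.sqrt(n), fit_intercept=False)
        fit.fit(X, y)
        beta = fit.coef_
        sigma_new = np.linalg.norm(y - X @ beta) / np.sqrt(n)
        if abs(sigma_new - sigma) < tol:
            sigma = sigma_new
            break
        sigma = sigma_new
    return beta, sigma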

To construct an optimal estimator of $I(\beta,\gamma)$, it is helpful to analyze the error of the plug-in estimator $\langle\hat\beta,\hat\gamma\rangle$,

$$\langle\hat\beta,\hat\gamma\rangle-\langle\beta,\gamma\rangle=\langle\hat\gamma,\hat\beta-\beta\rangle+\langle\hat\beta,\hat\gamma-\gamma\rangle-\langle\hat\beta-\beta,\hat\gamma-\gamma\rangle.$$ (6)

The last term on the right-hand side, $\langle\hat\beta-\beta,\hat\gamma-\gamma\rangle$, is "small", but the first two terms, $\langle\hat\gamma,\hat\beta-\beta\rangle$ and $\langle\hat\beta,\hat\gamma-\gamma\rangle$, can be large. This motivates the proposed estimator: we first estimate these two terms and then subtract the estimates from $\langle\hat\beta,\hat\gamma\rangle$ to obtain the final estimator of $I(\beta,\gamma)$.

The intuition for estimating $\langle\hat\gamma,\beta-\hat\beta\rangle$ is given first. Since

$$\frac{1}{n_1}X^\top(y-X\hat\beta)=\hat\Sigma(\beta-\hat\beta)+\frac{1}{n_1}X^\top\epsilon,$$ (7)

multiplying both sides of (7) by a vector $u\in\mathbb{R}^p$ yields

$$\frac{1}{n_1}u^\top X^\top(y-X\hat\beta)=u^\top\hat\Sigma(\beta-\hat\beta)+\frac{1}{n_1}u^\top X^\top\epsilon,$$ (8)

which can be written as

$$\frac{1}{n_1}u^\top X^\top(y-X\hat\beta)-\langle\hat\gamma,\beta-\hat\beta\rangle=(\hat\Sigma u-\hat\gamma)^\top(\beta-\hat\beta)+\frac{1}{n_1}u^\top X^\top\epsilon.$$ (9)

If the vector $u$ can be chosen such that the right-hand side of (9) is "small", then $u^\top X^\top(y-X\hat\beta)/n_1$ is a good estimator of $\langle\hat\gamma,\beta-\hat\beta\rangle$. Since the first term on the right-hand side of (9) is upper bounded as $|(\hat\Sigma u-\hat\gamma)^\top(\beta-\hat\beta)|\le\|\hat\Sigma u-\hat\gamma\|_\infty\|\hat\beta-\beta\|_1$, we control the right-hand side of (9) by constructing a projection vector $u$ such that $\|\hat\Sigma u-\hat\gamma\|_\infty$ is constrained, while the second term $u^\top X^\top\epsilon/n_1$ is controlled by minimizing its variance $\sigma_1^2u^\top\hat\Sigma u/n_1$. This leads to the following convex optimization problem for identifying the projection vector $u$ for estimating $\langle\hat\gamma,\beta-\hat\beta\rangle$,

$$\hat u_1=\operatorname*{arg\,min}_{u\in\mathbb{R}^p}\left\{u^\top\hat\Sigma u:\ \|\hat\Sigma u-\hat\gamma\|_\infty\le\|\hat\gamma\|_2\frac{\lambda_1}{\sqrt{n_1}}\right\},$$ (10)

where $\lambda_1=12\lambda_{\max}^2(\Sigma)\sqrt{\log p}$.

Remark 1.

The solution to the above optimization problem might not be unique, and $\hat u_1$ is defined as any minimizer of (10). The theory established in Section 3 holds for any minimizer of (10). The optimization problem (10) is solved through its equivalent Lagrange dual problem, which is computationally efficient and scales well to high dimensions. See Step 2 in Table 1 for more details.
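For concreteness, the constrained program (10) can also be handed directly to a generic convex solver. The following is a minimal Python sketch, assuming cvxpy is installed; the paper itself solves the equivalent Lagrange dual (Table 1, Step 2), so this primal form is an illustration rather than the authors' implementation.

import numpy as np
import cvxpy as cp

def projection_direction(X, target, lam1):
    # Solve (10): minimize u' Sigma_hat u subject to
    # ||Sigma_hat u - target||_inf <= ||target||_2 * lam1 / sqrt(n)
    n, p = X.shape
    Sigma_hat = X.T @ X / n
    u = cp.Variable(p)
    mu = np.linalg.norm(target) * lam1 / np.sqrt(n)
    constraints = [cp.norm(Sigma_hat @ u - target, 'inf') <= mu]
    # sum_squares(X u)/n equals u' Sigma_hat u and is DCP-friendly
    problem = cp.Problem(cp.Minimize(cp.sum_squares(X @ u) / n), constraints)
    problem.solve()
    return u.value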

Table 1:

FDE algorithm without sample splitting for estimating the inner product, quadratic functionals and the normalized inner product.

Input: design matrices X, Z; response vectors y, w; tuning parameters $\lambda_0$, $\lambda$.
Output: $\hat I(\beta,\gamma)$, $\hat Q(\beta)$, $\hat Q(\gamma)$ and $\hat R(\beta,\gamma)$.

Initial Lasso estimators:
1. Scaled Lasso: Calculate $\hat\beta$ and $\hat\gamma$ from (4) and (5) with the tuning parameter $\lambda_0$.
Inner product calculation:
2. Projection vector $\hat u_1$: Calculate $\hat u_1=\operatorname{arg\,min}_u\ u^\top X^\top Xu/(4n_1)+u^\top\hat\gamma+\lambda_t\|u\|_1$, where $\lambda_t=\lambda_{t-1}/1.5$ and $\lambda_0=\lambda/\sqrt{n_1}$. Repeat with $\lambda_t$ replaced by $\lambda_{t+1}$ until $\hat u_1$ cannot be solved, or $t\ge10$.
3. Projection vector $\hat u_2$: Calculate $\hat u_2=\operatorname{arg\,min}_u\ u^\top Z^\top Zu/(4n_2)+u^\top\hat\beta+\lambda_t\|u\|_1$, where $\lambda_t=\lambda_{t-1}/1.5$ and $\lambda_0=\lambda/\sqrt{n_2}$. Repeat with $\lambda_t$ replaced by $\lambda_{t+1}$ until $\hat u_2$ cannot be solved, or $t\ge10$.
4. Correction: $\hat I(\beta,\gamma)=\langle\hat\beta,\hat\gamma\rangle+\hat u_1^\top X^\top(y-X\hat\beta)/n_1+\hat u_2^\top Z^\top(w-Z\hat\gamma)/n_2$.
Quadratic functional calculation:
5. Projection vector $\hat u_3$: Calculate $\hat u_3=\operatorname{arg\,min}_u\ u^\top X^\top Xu/(4n_1)+u^\top\hat\beta+\lambda_t\|u\|_1$, where $\lambda_t=\lambda_{t-1}/1.5$ and $\lambda_0=\lambda/\sqrt{n_1}$. Repeat with $\lambda_t$ replaced by $\lambda_{t+1}$ until $\hat u_3$ cannot be solved, or $t\ge10$.
6. Projection vector $\hat u_4$: Calculate $\hat u_4=\operatorname{arg\,min}_u\ u^\top Z^\top Zu/(4n_2)+u^\top\hat\gamma+\lambda_t\|u\|_1$, where $\lambda_t=\lambda_{t-1}/1.5$ and $\lambda_0=\lambda/\sqrt{n_2}$. Repeat with $\lambda_t$ replaced by $\lambda_{t+1}$ until $\hat u_4$ cannot be solved, or $t\ge10$.
7. Correction: $\hat Q(\beta)=\left(\|\hat\beta\|_2^2+2\hat u_3^\top X^\top(y-X\hat\beta)/n_1\right)_+$,
      $\hat Q(\gamma)=\left(\|\hat\gamma\|_2^2+2\hat u_4^\top Z^\top(w-Z\hat\gamma)/n_2\right)_+$.
Ratio calculation:
8. $\hat R(\beta,\gamma)=\mathrm{sign}(\hat I(\beta,\gamma))\min\left\{\frac{|\hat I(\beta,\gamma)|}{\sqrt{\hat Q(\beta)\hat Q(\gamma)}}\mathbf{1}\{\hat Q(\beta)\hat Q(\gamma)>0\},\ 1\right\}$.

Once the projection vector $\hat u_1$ is obtained, $\langle\hat\gamma,\beta-\hat\beta\rangle$ is estimated by $\hat u_1^\top X^\top(y-X\hat\beta)/n_1$. Similarly, the projection vector for estimating $\langle\hat\beta,\gamma-\hat\gamma\rangle$ can be obtained via the convex program

$$\hat u_2=\operatorname*{arg\,min}_{u\in\mathbb{R}^p}\left\{u^\top\hat\Gamma u:\ \|\hat\Gamma u-\hat\beta\|_\infty\le\|\hat\beta\|_2\frac{\lambda_2}{\sqrt{n_2}}\right\},$$ (11)

where $\lambda_2=12\lambda_{\max}^2(\Gamma)\sqrt{\log p}$. Then $\langle\hat\beta,\gamma-\hat\gamma\rangle$ is estimated by $\hat u_2^\top Z^\top(w-Z\hat\gamma)/n_2$.

The final estimator $\hat I(\beta,\gamma)$ of $I(\beta,\gamma)$ is given by

$$\hat I(\beta,\gamma)=\langle\hat\beta,\hat\gamma\rangle+\hat u_1^\top\frac{1}{n_1}X^\top(y-X\hat\beta)+\hat u_2^\top\frac{1}{n_2}Z^\top(w-Z\hat\gamma).$$ (12)

It is clear from the above discussion that the key idea in the construction of the final estimator $\hat I(\beta,\gamma)$ is to identify the projection vectors $\hat u_1$ and $\hat u_2$ such that $\langle\hat\gamma,\beta-\hat\beta\rangle$ and $\langle\hat\beta,\gamma-\hat\gamma\rangle$ are well approximated. It will be shown in Section 3 that the estimator $\hat I(\beta,\gamma)$ is adaptively minimax rate-optimal.
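Putting the pieces together, a minimal sketch of the estimator (12) reads as follows; scaled_lasso and projection_direction are the hypothetical helpers sketched above.

import numpy as np

def fde_inner_product(X, y, Z, w, lam0, lam1, lam2):
    n1, n2 = X.shape[0], Z.shape[0]
    # Step 1: initial scaled Lasso fits, displays (4) and (5)
    beta_hat, _ = scaled_lasso(X, y, lam0)
    gamma_hat, _ = scaled_lasso(Z, w, lam0)
    # Step 2: projection directions, displays (10) and (11)
    u1 = projection_direction(X, gamma_hat, lam1)
    u2 = projection_direction(Z, beta_hat, lam2)
    # Step 3: bias-corrected plug-in estimator, display (12)
    return (beta_hat @ gamma_hat
            + u1 @ X.T @ (y - X @ beta_hat) / n1
            + u2 @ Z.T @ (w - Z @ gamma_hat) / n2)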

Remark 2.

As mentioned, simply plugging in the Lasso, scaled Lasso, or de-biased estimator does not lead to a good estimator of $I(\beta,\gamma)$. Another natural approach is to first threshold the de-biased estimator to obtain a sparse estimator of the coefficient vectors (see Zhang and Zhang (2014, Section 3.3) and Guo et al. (2016, equation (10)) for details) and then plug in this thresholded estimator; we refer to this as the thresholded estimator. Simulations in Section 4 demonstrate that the proposed estimator defined in (12) outperforms all three plug-in estimators based on the scaled Lasso, de-biased, and thresholded estimators.

2.2. Estimation of Q(β) and Q(γ)

In order to estimate the normalized inner product $R(\beta,\gamma)$, it is necessary to estimate the quadratic functionals $Q(\beta)=\|\beta\|_2^2$ and $Q(\gamma)=\|\gamma\|_2^2$. To this end, we randomly split the data $(y,X)$ into two subsamples $(y^{(1)},X^{(1)})$ and $(y^{(2)},X^{(2)})$, each with sample size $n_1/2$, and the data $(w,Z)$ into two subsamples $(w^{(1)},Z^{(1)})$ and $(w^{(2)},Z^{(2)})$, each with sample size $n_2/2$.

With a slight abuse of notation, let $\hat\beta$ and $\hat\gamma$ denote the optimizers of the scaled Lasso algorithm (4) applied to $(y^{(1)},X^{(1)})$ and (5) applied to $(w^{(1)},Z^{(1)})$, respectively, with the sample sizes $n_1$ and $n_2$ replaced by $n_1/2$ and $n_2/2$. Again, the simple plug-in estimator $\|\hat\beta\|_2^2$ of $Q(\beta)$ is not a good estimator of $\|\beta\|_2^2$ because of the following error decomposition,

$$\|\hat\beta\|_2^2-\|\beta\|_2^2=2\langle\hat\beta,\hat\beta-\beta\rangle-\|\hat\beta-\beta\|_2^2,$$ (13)

where the second term on the right-hand side of (13) is "small", but the first can be large. Accordingly, the term $2\langle\hat\beta,\beta-\hat\beta\rangle$ is estimated first and then added to $\|\hat\beta\|_2^2$ to obtain the final estimator of $\|\beta\|_2^2$. To estimate $\langle\hat\beta,\beta-\hat\beta\rangle$, a projection vector $u$ is identified such that the following difference is controlled,

$$\frac{1}{n_1/2}u^\top(X^{(2)})^\top(y^{(2)}-X^{(2)}\hat\beta)-\langle\hat\beta,\beta-\hat\beta\rangle=(\hat\Sigma^{(2)}u-\hat\beta)^\top(\beta-\hat\beta)+\frac{1}{n_1/2}u^\top(X^{(2)})^\top\epsilon^{(2)},$$ (14)

with $\hat\Sigma^{(2)}=(X^{(2)})^\top X^{(2)}/(n_1/2)$. Define the projection vector $\hat u_3$ as the solution to the following optimization problem

$$\hat u_3=\operatorname*{arg\,min}_{u\in\mathbb{R}^p}\left\{u^\top\hat\Sigma^{(2)}u:\ \|\hat\Sigma^{(2)}u-\hat\beta\|_\infty\le\|\hat\beta\|_2\frac{\lambda_1}{\sqrt{n_1/2}}\right\},$$ (15)

where $\lambda_1=12\lambda_{\max}^2(\Sigma)\sqrt{\log p}$. We then estimate $\langle\hat\beta,\beta-\hat\beta\rangle$ by $\hat u_3^\top(X^{(2)})^\top(y^{(2)}-X^{(2)}\hat\beta)/(n_1/2)$ and propose the final estimator of $\|\beta\|_2^2$ as

$$\hat Q(\beta)=\left(\|\hat\beta\|_2^2+2\hat u_3^\top\frac{1}{n_1/2}(X^{(2)})^\top(y^{(2)}-X^{(2)}\hat\beta)\right)_+.$$ (16)

Similarly, the estimator of $\|\gamma\|_2^2$ is given by

$$\hat Q(\gamma)=\left(\|\hat\gamma\|_2^2+2\hat u_4^\top\frac{1}{n_2/2}(Z^{(2)})^\top(w^{(2)}-Z^{(2)}\hat\gamma)\right)_+,$$ (17)

where

$$\hat u_4=\operatorname*{arg\,min}_{u\in\mathbb{R}^p}\left\{u^\top\hat\Gamma^{(2)}u:\ \|\hat\Gamma^{(2)}u-\hat\gamma\|_\infty\le\|\hat\gamma\|_2\frac{\lambda_2}{\sqrt{n_2/2}}\right\},$$ (18)

with $\hat\Gamma^{(2)}=(Z^{(2)})^\top Z^{(2)}/(n_2/2)$ and $\lambda_2=12\lambda_{\max}^2(\Gamma)\sqrt{\log p}$.

Remark 3.

Sample splitting is used here for the purpose of the theoretical analysis. In the simulation studies (Section 4), the performance of the proposed estimator without sample splitting is also investigated; see Steps 5–7 in Table 1. The proposed estimator without sample splitting performs even better numerically than the one with sample splitting, since more observations are used in constructing the initial estimators $\|\hat\beta\|_2^2$ and $\|\hat\gamma\|_2^2$ and the projection vectors $\hat u_3$ and $\hat u_4$.
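The sample-splitting construction (15)-(16) can be sketched in a few lines, again reusing the hypothetical scaled_lasso and projection_direction helpers from above.

import numpy as np

def fde_quadratic(X, y, lam0, lam1, rng=None):
    # Estimator (16) of Q(beta) = ||beta||_2^2 with sample splitting
    rng = np.random.default_rng(0) if rng is None else rng
    n = X.shape[0]
    idx = rng.permutation(n)
    half1, half2 = idx[: n // 2], idx[n // 2 :]
    # Initial scaled Lasso fit on the first half of the data
    beta_hat, _ = scaled_lasso(X[half1], y[half1], lam0)
    # Projection direction (15) computed from the second half
    X2, y2 = X[half2], y[half2]
    u3 = projection_direction(X2, beta_hat, lam1)
    # Bias correction and truncation at zero, display (16)
    correction = 2 * u3 @ X2.T @ (y2 - X2 @ beta_hat) / len(half2)
    return max(float(beta_hat @ beta_hat + correction), 0.0)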

2.3. Estimation of R(β,γ)

Given the estimators I^(β,γ), Q^(β) and Q^(γ) constructed in Sections 2.1 and 2.2, a natural estimator for the normalized inner product R(β,γ) is given by

$$\hat R(\beta,\gamma)=\mathrm{sign}(\hat I(\beta,\gamma))\min\left\{\frac{|\hat I(\beta,\gamma)|}{\sqrt{\hat Q(\beta)\hat Q(\gamma)}}\mathbf{1}\{\hat Q(\beta)\hat Q(\gamma)>0\},\ 1\right\},$$ (19)

where $\hat I(\beta,\gamma)$, $\hat Q(\beta)$ and $\hat Q(\gamma)$ are the estimators of $\langle\beta,\gamma\rangle$, $\|\beta\|_2^2$ and $\|\gamma\|_2^2$ defined in (12), (16) and (17), respectively. It is possible that one of $\hat Q(\beta)$ and $\hat Q(\gamma)$ is 0 when $\|\beta\|_2^2$ or $\|\gamma\|_2^2$ is close to zero; in this case, the normalized inner product $R(\beta,\gamma)$ is estimated as 0. Since $R(\beta,\gamma)$ always lies between $-1$ and $1$, the estimator $\hat R(\beta,\gamma)$ is truncated to ensure that it stays within this range. The FDE algorithm without sample splitting for calculating the estimators $\hat I(\beta,\gamma)$, $\hat Q(\beta)$, $\hat Q(\gamma)$, and $\hat R(\beta,\gamma)$ is detailed in Table 1.
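The final ratio step (19) is a direct transcription; a minimal sketch:

import numpy as np

def fde_correlation(I_hat, Q_beta, Q_gamma):
    # Truncated ratio estimator, display (19)
    denom = np.sqrt(Q_beta * Q_gamma)
    if denom == 0:
        return 0.0  # one of the quadratic functionals is estimated as zero
    return float(np.sign(I_hat) * min(abs(I_hat) / denom, 1.0))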

3. Theoretical Analysis

3.1. Upper Bound Analysis

The sample sizes $n_1$ and $n_2$ are assumed to be of the same order, that is, $n_1\asymp n_2$. Let $n=\min\{n_1,n_2\}$ be the smaller of the two sample sizes. The following assumptions are introduced to facilitate the theoretical analysis.

(A1) The population covariance matrices $\Sigma$ and $\Gamma$ satisfy $1/M_1\le\lambda_{\min}(\Sigma)\le\lambda_{\max}(\Sigma)\le M_1$ and $1/M_1\le\lambda_{\min}(\Gamma)\le\lambda_{\max}(\Gamma)\le M_1$, where $M_1\ge1$ is a positive constant. The random design matrix $X$ is assumed to be independent of the other random design matrix $Z$. The noise levels $\sigma_1$ and $\sigma_2$ satisfy $\max\{\sigma_1,\sigma_2\}\le M_2$, where $M_2>0$ is a positive constant.

(A2) The $\ell_2$ norms of the coefficient vectors $\beta$ and $\gamma$ are bounded away from zero in the sense that

$$\min\{\|\beta\|_2,\|\gamma\|_2\}\ge\eta_0\ge C\sqrt{\frac{k\log p}{n}},\quad\text{where}\quad k=\max\{\|\beta\|_0,\|\gamma\|_0\}.$$ (20)

Assumption (A1) places a condition on the spectra of the covariance matrices $\Sigma$ and $\Gamma$ and an upper bound on the noise levels $\sigma_1$ and $\sigma_2$. Assumption (A2) requires the total signal strength to be bounded away from zero by $\eta_0$; it is only used in the upper bound analysis for the normalized inner product $R(\beta,\gamma)$.

The following theorem establishes the convergence rates of the estimators I^(β,γ), Q^(β), and Q^(γ), proposed in (12), (16) and (17), respectively.

Theorem 1.

Suppose that assumption (A1) holds and $k=\max\{\|\beta\|_0,\|\gamma\|_0\}\le cn/\log p$ for some $c>0$. Then for any fixed constant $0<\alpha<1/4$, with probability at least $1-4\alpha-p^{-c_0}$, we have

$$|\hat I(\beta,\gamma)-I(\beta,\gamma)|\lesssim(\|\beta\|_2+\|\gamma\|_2)\left(\frac{z_{\alpha/2}}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n},$$ (21)
$$|\hat Q(\beta)-Q(\beta)|\lesssim\|\beta\|_2\left(\frac{z_{\alpha/2}}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n},$$ (22)
$$|\hat Q(\gamma)-Q(\gamma)|\lesssim\|\gamma\|_2\left(\frac{z_{\alpha/2}}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n},$$ (23)

where c0 is a positive constant.

The upper bound for estimating $\langle\beta,\gamma\rangle$ depends not only on $k$, $n$ and $p$, but also scales with the signal strengths $\|\beta\|_2$ and $\|\gamma\|_2$. For the estimation of the quadratic functional $Q(\beta)$ (or $Q(\gamma)$), the convergence rate depends on $\|\beta\|_2$ (or $\|\gamma\|_2$). The following theorem establishes the convergence rate of the estimator $\hat R(\beta,\gamma)$ proposed in (19).

Theorem 2.

Suppose that assumptions (A1) and (A2) hold and $k=\max\{\|\beta\|_0,\|\gamma\|_0\}\le cn/\log p$ for some $c>0$. Then for any fixed constant $0<\alpha<1/4$, with probability at least $1-4\alpha-p^{-c_0}$, we have

$$|\hat R(\beta,\gamma)-R(\beta,\gamma)|\lesssim\frac{1}{\eta_0}\left(\frac{z_{\alpha/2}}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{1}{\eta_0^2}\cdot\frac{k\log p}{n},$$ (24)

where c0 is a positive constant.

In contrast to Theorem 1, Theorem 2 requires the extra assumption (A2) on the signal strengths $\|\beta\|_2$ and $\|\gamma\|_2$. The convergence rate for estimating $R(\beta,\gamma)$ scales with the inverse of the signal strength, $1/\eta_0$. This is different from the error bounds in Theorem 1, where the estimation accuracy scales with the signal strength. The lower bound results established in Theorem 3 demonstrate the necessity of assumption (A2) for the estimation of $R(\beta,\gamma)$.

3.2. Minimax Lower Bounds

This section establishes the minimax lower bounds of estimating I(β,γ), Q(β), Q(γ) and R(β,γ). We first introduce parameter spaces for θ=(β,Σ,σ1,γ,Γ,σ2), which is defined as the product of parameter spaces for θβ=(β,Σ,σ1) and θγ=(γ,Γ,σ2). We define the following parameter space for both θβ=(β,Σ,σ1) and θγ=(γ,Γ,σ2),

$$\mathcal{G}(k,M_0)=\left\{(\beta,\Sigma,\sigma_1):\ \|\beta\|_0\le k,\ \|\beta\|_2\le M_0,\ \frac{1}{M_1}\le\lambda_{\min}(\Sigma)\le\lambda_{\max}(\Sigma)\le M_1,\ \sigma_1\le M_2\right\},$$ (25)

where $M_1\ge1$ and $M_2>0$ are positive constants. The parameter space defined in (25) requires that the signal $\beta$ contain at most $k$ non-zero coefficients and that the $\ell_2$ norm $\|\beta\|_2$ be upper bounded by $M_0$, where $M_0$ is allowed to grow with $n$ and $p$. The lower bound results in Theorem 3 show that the difficulty of estimating $I(\beta,\gamma)$, $Q(\beta)$ and $Q(\gamma)$ depends on $M_0$. The other conditions, $1/M_1\le\lambda_{\min}(\Sigma)\le\lambda_{\max}(\Sigma)\le M_1$ and $\sigma_1\le M_2$, are regularity conditions. Based on the definition (25), the parameter space for $(\beta,\Sigma,\sigma_1,\gamma,\Gamma,\sigma_2)$ is defined as the product of two parameter spaces,

$$\Theta(k,M_0)=\left\{\theta=(\beta,\Sigma,\sigma_1,\gamma,\Gamma,\sigma_2):\ (\beta,\Sigma,\sigma_1)\in\mathcal{G}(k,M_0),\ (\gamma,\Gamma,\sigma_2)\in\mathcal{G}(k,M_0)\right\}.$$ (26)

For establishing the optimal bounds for $R(\beta,\gamma)$, we define the following parameter space

$$\Theta(k,\eta_0)=\left\{\theta=(\beta,\Sigma,\sigma_1,\gamma,\Gamma,\sigma_2):\ (\beta,\Sigma,\sigma_1)\in\mathcal{G}(k,\eta_0),\ (\gamma,\Gamma,\sigma_2)\in\mathcal{G}(k,\eta_0)\right\},$$ (27)

where

$$\mathcal{G}(k,\eta_0)=\left\{(\beta,\Sigma,\sigma_1):\ \|\beta\|_0\le k,\ \|\beta\|_2\ge\eta_0,\ \frac{1}{M_1}\le\lambda_{\min}(\Sigma)\le\lambda_{\max}(\Sigma)\le M_1,\ \sigma_1\le M_2\right\},$$

with $\eta_0\ge0$. In contrast to the parameter space $\mathcal{G}(k,M_0)$, where $\|\beta\|_2$ is upper bounded by $M_0$, the parameter space $\mathcal{G}(k,\eta_0)$ requires the signal strength $\|\beta\|_2$ to be lower bounded by $\eta_0$, where $\eta_0$ is allowed to grow with $n$ and $p$. The lower bound in Theorem 3 shows that the difficulty of estimating $R(\beta,\gamma)$ depends on $\eta_0$.

The following theorem establishes the minimax lower bounds for the convergence rates of estimating the inner product I(β,γ), the quadratic functionals Q(β) and Q(γ) and the normalized inner product R(β,γ).

Theorem 3.

Suppose $k\le c\min\{n/\log p,\ p^{\nu}\}$ for some constants $c>0$ and $0\le\nu<\frac12$. Then

$$\inf_{\tilde I}\sup_{\theta\in\Theta(k,M_0)}\mathbb{P}_\theta\left(|\tilde I-I(\beta,\gamma)|\ge c\min\left\{M_0\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n},\ M_0^2\right\}\right)\ge\frac14,$$ (28)
$$\inf_{\tilde Q}\sup_{\theta_\beta\in\mathcal{G}(k,M_0)}\mathbb{P}_{\theta_\beta}\left(|\tilde Q-Q(\beta)|\ge c\min\left\{M_0\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n},\ M_0^2\right\}\right)\ge\frac14,$$ (29)
$$\inf_{\tilde Q}\sup_{\theta_\gamma\in\mathcal{G}(k,M_0)}\mathbb{P}_{\theta_\gamma}\left(|\tilde Q-Q(\gamma)|\ge c\min\left\{M_0\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n},\ M_0^2\right\}\right)\ge\frac14,$$ (30)
$$\inf_{\tilde R}\sup_{\theta\in\Theta(k,\eta_0)}\mathbb{P}_\theta\left(|\tilde R-R(\beta,\gamma)|\ge c\min\left\{\frac{1}{\eta_0}\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{1}{\eta_0^2}\cdot\frac{k\log p}{n},\ 1\right\}\right)\ge\frac14.$$ (31)

Remark 4.

Estimation of quadratic functionals has been extensively studied in the classical Gaussian sequence model; see, for example, Donoho and Nussbaum (1990); Efromovich and Low (1996); Laurent and Massart (2000); Cai and Low (2005, 2006); Collier et al. (2015). In the regime $k\le c\min\{n/\log p,p^{\nu}\}$ for some constants $c>0$ and $0<\nu<\frac12$, Theorem 2 of Collier et al. (2015) gives a lower bound, $\min\{M_0/\sqrt n+k\log p/n,\ M_0^2\}$, for estimating $\|\beta\|_2^2$ in the sequence model. In contrast, an extra term $M_0\,k\log p/n$ appears in the lower bound given in (29) for estimating $\|\beta\|_2^2$ in high-dimensional linear regression. One intuitive reason for this extra term is that high-dimensional linear regression involves an additional inversion step compared with the Gaussian sequence model. Estimation of the quadratic functional $\|\beta\|_2^2$ in high-dimensional linear regression is thus fundamentally harder than in the Gaussian sequence model. For high-dimensional linear regression, the term $k\log p/n$ in the lower bound (29) can also be established by the general lower bound techniques developed in Cai and Guo (2017a); see Section 8 of Cai and Guo (2017a) for details.

3.3. Optimality of FDEs

In this section, we establish the optimality of FDEs by combining Theorems 1 and 2 over the parameter spaces Θ(k,M0) and Θ(k,η0) defined in (26) and (27), respectively.

Corollary 1.

Suppose $k\le c\min\{n/\log p,\ p^{\nu}\}$ and $M_0\ge\eta_0\ge C\sqrt{k\log p/n}$ for some constants $C,c>0$ and $0\le\nu<\frac12$. Then

$$\inf_{\theta\in\Theta(k,M_0)}\mathbb{P}_\theta\left(|\hat I-I(\beta,\gamma)|\le C\left(M_0\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n}\right)\right)\ge1-4\alpha-p^{-c_0},$$ (32)
$$\inf_{\theta_\beta\in\mathcal{G}(k,M_0)}\mathbb{P}_{\theta_\beta}\left(|\hat Q-Q(\beta)|\le C\left(M_0\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n}\right)\right)\ge1-4\alpha-p^{-c_0},$$ (33)
$$\inf_{\theta_\gamma\in\mathcal{G}(k,M_0)}\mathbb{P}_{\theta_\gamma}\left(|\hat Q-Q(\gamma)|\le C\left(M_0\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n}\right)\right)\ge1-4\alpha-p^{-c_0},$$ (34)
$$\inf_{\theta\in\Theta(k,\eta_0)}\mathbb{P}_\theta\left(|\hat R-R(\beta,\gamma)|\le C\left(\frac{1}{\eta_0}\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{1}{\eta_0^2}\cdot\frac{k\log p}{n}\right)\right)\ge1-4\alpha-p^{-c_0},$$ (35)

where 0<α<1/4 and c0 is a positive constant.

Combined with Theorem 3, Corollary 1 implies that, for $M_0\ge C\sqrt{k\log p/n}$, the estimators $\hat I(\beta,\gamma)$, $\hat Q(\beta)$ and $\hat Q(\gamma)$ proposed in (12), (16) and (17) achieve the minimax lower bounds (28), (29) and (30) within a constant factor; that is, the FDEs are minimax rate-optimal. On the other hand, if $M_0\ll\sqrt{k\log p/n}$, estimation of $I(\beta,\gamma)$, $Q(\beta)$ and $Q(\gamma)$ is uninteresting, as the trivial estimator 0 achieves the minimax lower bound in this case. For the estimation of $R(\beta,\gamma)$, under the assumption $\eta_0\ge C\sqrt{k\log p/n}$, Corollary 1 shows that the estimator $\hat R(\beta,\gamma)$ given in (19) achieves the minimax lower bound $\frac{1}{\eta_0}\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{1}{\eta_0^2}\cdot\frac{k\log p}{n}$ in (31). Hence $\hat R(\beta,\gamma)$ is a rate-optimal estimator of $R(\beta,\gamma)$ under assumption (A2). When $\eta_0\ll\sqrt{k\log p/n}$, estimation of $R(\beta,\gamma)$ becomes trivial, as the simple estimator 0 attains the minimax lower bound. This demonstrates the necessity of assumption (A2) in Theorem 2.

4. Simulation Evaluations and Comparisons

We compare the finite-sample performance of several estimators of $I(\beta,\gamma)$ and $R(\beta,\gamma)$ using simulations. These estimators include the plug-in scaled Lasso estimator (Sun and Zhang, 2012), the plug-in de-biased estimator (Javanmard and Montanari, 2014; van de Geer et al., 2014; Zhang and Zhang, 2014), the plug-in thresholded estimator (Zhang and Zhang, 2014, Section 3.3) and the proposed FDE. Specifically, they are defined as

  • FDE: The inner product $I(\beta,\gamma)$ is estimated by $\hat I(\beta,\gamma)$ in (12) and the ratio $R(\beta,\gamma)$ is estimated by $\hat R(\beta,\gamma)$ in (19). We consider FDE with sample splitting (FDE-S) and without sample splitting (FDE-NS) for $\hat R(\beta,\gamma)$.

  • Plug-in scaled Lasso estimator (Lasso): The inner product $I(\beta,\gamma)$ is estimated by $\langle\hat\beta,\hat\gamma\rangle$ and the normalized inner product $R(\beta,\gamma)$ is estimated by $[\langle\hat\beta,\hat\gamma\rangle/(\|\hat\beta\|_2\|\hat\gamma\|_2)]\mathbf{1}\{\|\hat\beta\|_2\|\hat\gamma\|_2>0\}$.

  • Plug-in de-biased estimator (De-biased): Denote the de-biased Lasso estimators by $\tilde\beta$ and $\tilde\gamma$. The inner product $I(\beta,\gamma)$ is estimated by $\langle\tilde\beta,\tilde\gamma\rangle$ and the normalized inner product $R(\beta,\gamma)$ is estimated by $[\langle\tilde\beta,\tilde\gamma\rangle/(\|\tilde\beta\|_2\|\tilde\gamma\|_2)]\mathbf{1}\{\|\tilde\beta\|_2\|\tilde\gamma\|_2>0\}$.

  • Plug-in thresholded estimator (Thresholded): Denote the thresholded estimators by $\bar\beta$ and $\bar\gamma$. The inner product $I(\beta,\gamma)$ is estimated by $\langle\bar\beta,\bar\gamma\rangle$ and the normalized inner product $R(\beta,\gamma)$ is estimated by $[\langle\bar\beta,\bar\gamma\rangle/(\|\bar\beta\|_2\|\bar\gamma\|_2)]\mathbf{1}\{\|\bar\beta\|_2\|\bar\gamma\|_2>0\}$.

Implementation of the de-biased, thresholded and FDE estimators requires the scaled Lasso estimators $\hat\beta$ and $\hat\gamma$ in the initial step. The scaled Lasso estimator is implemented via the equivalent square-root Lasso algorithm (Belloni et al., 2011). The theoretical tuning parameter is $\lambda_0=\sqrt{2.01\log p/n}$, which may be conservative in numerical studies. Instead, the tuning parameter is chosen as $\lambda_0=b\sqrt{2.01\log p/n}$, and the performance of all estimators is evaluated across a grid of values $b\in\{.25,.5,.75,1\}$ (see Supplementary Material, Section A.1). The results showed that $b=.5$ is a good choice for all the estimators. Hence, $\lambda_0=.5\sqrt{2.01\log p/n}$ was used for the numerical studies in this section and in Section 5. To implement the FDE algorithm, the other tuning parameter $\lambda$ is chosen as $\sqrt{2.01\log p/n}$ for the correction Steps 2, 3, 5 and 6 in Table 1.

Comparisons of the estimates of $I(\beta,\gamma)$ and $R(\beta,\gamma)$ are presented below; results on estimating the quadratic functionals are given in the Supplementary Material, Section A.2. For each setting, with the parameters $(p,n_1,n_2,s,s_1,s_2)$, $\Sigma$, $\Gamma$, $F_\beta$, $F_\gamma$ specified, we generate the data and compare the different methods as follows (a code sketch of steps 1-3 is given after the list):

  1. Generate sets $S_1\subset[p]$ and $S_2\subset[p]$, with $|S_1|=s_1$, $|S_2|=s_2$ and $|S_1\cap S_2|=s$. For $\beta\in\mathbb{R}^p$ and $\gamma\in\mathbb{R}^p$, generate $\beta_j\sim F_\beta$ and $\gamma_l\sim F_\gamma$ for $j\in S_1$, $l\in S_2$, and set $\beta_j=0$ and $\gamma_l=0$ for $j\notin S_1$, $l\notin S_2$.

  2. Generate $X_{i\cdot}\overset{\text{i.i.d.}}{\sim}N(0,\Sigma)$, $1\le i\le n_1$, and $Z_{i\cdot}\overset{\text{i.i.d.}}{\sim}N(0,\Gamma)$, $1\le i\le n_2$.

  3. Generate the noise $\epsilon_i\overset{\text{i.i.d.}}{\sim}N(0,1)$, $1\le i\le n_1$, and $\delta_i\overset{\text{i.i.d.}}{\sim}N(0,1)$, $1\le i\le n_2$. Generate the outcomes as $y=X\beta+\epsilon$ and $w=Z\gamma+\delta$.

  4. With X, y, Z, and w, estimate I(β,γ) and R(β,γ) through different estimators.

  5. Repeat 2–4 for L times.
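As referenced above, here is a minimal Python sketch of steps 1-3. The samplers f_beta and f_gamma stand in for the distributions $F_\beta$ and $F_\gamma$; they, and the helper itself, are assumptions of this sketch rather than the authors' implementation.

import numpy as np

def simulate(p, n1, n2, s, s1, s2, Sigma, Gamma, f_beta, f_gamma, rng):
    # Step 1: supports S1, S2 with |S1| = s1, |S2| = s2, |S1 ∩ S2| = s
    shared = rng.choice(p, size=s, replace=False)
    rest = np.setdiff1d(np.arange(p), shared)
    S1 = np.concatenate([shared, rng.choice(rest, size=s1 - s, replace=False)])
    S2 = np.concatenate([shared,
                         rng.choice(np.setdiff1d(rest, S1), size=s2 - s,
                                    replace=False)])
    beta, gamma = np.zeros(p), np.zeros(p)
    beta[S1], gamma[S2] = f_beta(s1, rng), f_gamma(s2, rng)
    # Step 2: Gaussian designs
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n1)
    Z = rng.multivariate_normal(np.zeros(p), Gamma, size=n2)
    # Step 3: standard normal noise and outcomes
    y = X @ beta + rng.standard_normal(n1)
    w = Z @ gamma + rng.standard_normal(n2)
    return X, y, Z, w, beta, gamma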

We evaluate the performance of an estimator by the mean squared error (MSE), which is defined as

$$\mathrm{MSE}(\hat T)=\frac{1}{L}\sum_{l=1}^{L}\left(\hat T(X,y,Z,w;l)-T\right)^2,$$ (36)

for a given quantity $T$ and its estimate $\hat T(X,y,Z,w;l)$ from the $l$-th replication. We consider two different settings with two sets of parameters $(p,n_1,n_2,s,s_1,s_2)$, $\Sigma$, $\Gamma$, $F_\beta$ and $F_\gamma$, and the simulation for each setting is repeated $L=300$ times.
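A minimal Monte Carlo loop for (36) might look as follows, where draw_data is a hypothetical closure performing steps 2-3 with the coefficient vectors held fixed across replications.

import numpy as np

def mse(estimator, truth, draw_data, L=300):
    # MSE criterion, display (36)
    errors = [(estimator(*draw_data(l)) - truth) ** 2 for l in range(L)]
    return float(np.mean(errors))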

Experiment 1.

The parameters are set as follows: $(p,n_1,n_2)=(600,400,400)$, the sparsity parameters $(s,s_1,s_2)=(15,30,25)$, and the covariance matrices $\Sigma$ and $\Gamma$ satisfy $\Sigma_{ij}=\Gamma_{ij}=(0.8)^{|i-j|}$. For given positive values $\tau_1$ and $\tau_2$, the signals of $\beta$ satisfy $\beta_{j_i}=(1+i/s_1)\tau_1/2$ for $j_i\in S_1$, $i=1,2,\dots,s_1$, and the signals of $\gamma$ satisfy $\gamma_j=\tau_2$ for $j\in S_2$. This simulation aims to investigate the case where the coefficients for one regression are much larger than those for the other, by varying the signal strength parameters as $(\tau_1,\tau_2)\in\{(3.0,.1),(2.6,.2),(2.2,.3),(1.8,.4),(.1,1.6),(.2,1.4),(.3,1.2),(.4,1.0)\}$.

The results are summarized in Table 2. For all combinations of the signal strength parameters, in terms of estimating the inner product $I(\beta,\gamma)$, FDE consistently outperformed the plug-in estimates based on the Lasso and the thresholded Lasso. Moreover, as the difference between $\tau_1$ and $\tau_2$ increased, the advantage of FDE over these plug-in estimates became larger. The same pattern was observed for the estimation of the normalized inner product $R(\beta,\gamma)$, where FDE-NS consistently performed better than the other methods. Although De-biased performed well in terms of estimating $I(\beta,\gamma)$, it performed much worse than FDE-NS for estimating $R(\beta,\gamma)$.

Table 2:

Mean square errors (MSE) of the estimates of the inner product I(β,γ) and the normalized inner product R(β,γ) for various signal strength parameters. Lasso: plugin estimator with the scaled Lasso estimator; De-biased: plug-in estimator with the de-biased estimator; Thresholded: plug-in estimator with the thresholded estimator; FDE: the proposed estimator I^(β,γ); FDE-S: the proposed estimator R^(β,γ) with sample splitting; FDE-NS: the proposed estimator R^(β,γ) without sample splitting.

Strength parameters, (τ1,τ2)
(1.8, .4) (2.2, .3) (2.6, .2) (3, .1) (.1, 1.6) (.2, 1.4) (.3, 1.2) (.4, 1)
I(β,γ) Truth 8.088 7.414 5.841 3.370 1.797 3.145 4.044 4.493

MSE
Lasso 9.295 11.564 12.560 7.279 2.377 4.889 5.409 4.800
De-biased 1.733 2.191 2.324 1.386 .449 .838 .985 .886
Thresholded 2.029 3.377 6.463 5.789 1.877 3.024 2.432 1.546
FDE 1.847 2.471 2.662 2.118 .734 .995 1.028 .986

R(β,γ) Truth .5314 .5314 .5314 .5314 .5314 .5314 .5314 .5314

MSE
Lasso .0023 .0075 .0332 .1260 .1574 .0624 .0227 .0087
De-biased .0208 .0415 .0864 .1590 .1736 .1068 .0627 .0373
Thresholded .0045 .0139 .0585 .1753 .0964 .0981 .0389 .0153
FDE-S .0337 .0303 .0621 .0678 .2130 .1199 .0694 .0616
FDE-NS .0036 .0064 .0163 .0580 .0892 .0237 .0116 .0061

As discussed in Section 2, the sample splitting in estimating the normalized inner product was introduced only to facilitate the theoretical analysis and might not be necessary for the algorithm. Our simulation results indicate that the proposed estimator without sample splitting (FDE-NS) performed quite well in all settings, even better than FDE-S, because more samples were used in the estimation and correction steps. This observation led us to use the estimator without sample splitting (FDE-NS) in the real data analysis in Section 5.

Experiment 2.

The parameters are set as follows: $(p,n_1,n_2)=(800,400,400)$, the signal strength parameters $(\tau_1,\tau_2)=(.2,.1)$, and the covariance matrices $\Sigma$ and $\Gamma$ satisfy $\Sigma_{ij}=\Gamma_{ij}=(0.8)^{|i-j|}$. The signals of $\beta$ satisfy $\beta_{j_i}=(1+i/s_1)\tau_1/2$ for $j_i\in S_1$, $i=1,2,\dots,s_1$, and the signals of $\gamma$ satisfy $\gamma_j=\tau_2$ for $j\in S_2$. This setting is designed to investigate the relationship between the performance of the estimators and the signal sparsity level: we vary the sparsity levels of $\beta$ and $\gamma$ as $(s_1,s_2)\in\{(40,40),(50,50),(60,60),(70,70),(80,80),(90,90),(100,100),(110,110)\}$ and fix the number of common signals at $s=20$. Since the number of associated variants is very large for both coefficient vectors, large values of $\tau_1$ and $\tau_2$ would induce strong signals for which all methods perform well. Instead, we consider a more challenging setting where the signal magnitudes are small, that is, $\tau_1=.2$ and $\tau_2=.1$.

The results are summarized in Table 3. Clearly, FDE outperformed the other methods, and when the signals became denser, the improvement of FDE over the other methods was more pronounced. For the estimation of $R(\beta,\gamma)$, the results show that FDE-NS consistently outperformed the other estimators; as the number of signals increased, the MSE of FDE-NS decreased quickly.

Table 3:

Mean square errors (MSE) of the estimates of the inner product I(β,γ) and the normalized inner product R(β,γ) for various sparsity parameters. Lasso: plug-in estimator with the scaled Lasso estimator; De-biased: plug-in estimator with the de-biased estimator; Thresholded: plug-in estimator with the thresholded estimator; FDE: the proposed estimator I^(β,γ); FDE-S: the proposed estimator R^(β,γ) with sample splitting; FDE-NS: the proposed estimator R^(β,γ) without sample splitting.

 Sparsity parameter, s1(s2=s1)
40 50 60 70 80 90 100 110
I(β,γ) Truth .190 .170 .219 .212 .179 .221 .183 .221

MSE
Lasso .032 .025 .039 .036 .024 .035 .023 .028
De-biased .015 .015 .018 .017 .024 .027 .040 .066
Thresholded .027 .021 .031 .029 .020 .025 .018 .018
FDE .020 .014 .021 .022 .011 .013 .008 .008

R(β,γ) Truth .4027 .2908 .3122 .2592 .1914 .2110 .1573 .1725

MSE
Lasso .1157 .0517 .0539 .0370 .0166 .0180 .0097 .0079
De-biased .1267 .0601 .0659 .0411 .0160 .0173 .0063 .0059
Thresholded .1392 .0687 .0732 .0504 .0262 .0277 .0155 .0142
FDE-S .1154 .1225 .0779 .0574 .0456 .0499 .0450 .0493
FDE-NS .0847 .0340 .0368 .0294 .0115 .0091 .0055 .0047

5. Genetic Relatedness of Yeast Colony Growth Traits Based on Genome-Wide Association Data

Bloom et al. (2013) reported a large-scale genome-wide association study of 46 quantitative traits based on 1,008 Saccharomyces cerevisiae segregants crossbred from a laboratory strain and a wine strain. The data set included 11,623 unique genotype markers. Since many of these markers are highly correlated and differ only in a few samples, Bloom et al. (2013) further selected a set of 4,410 markers that are weakly dependent based on the linkage disequilibrium information. Specifically, these markers were selected by picking the marker closest to each centimorgan position on the genetic map. The marker genotypes are coded as 1 or −1, according to the strain they came from, and satisfy the sub-Gaussian conditions. The traits of interest were the end-point colony sizes normalized by the control growth under 46 different growth media, including Hydrogen Peroxide, Diamide, Calcium, Yeast Nitrogen Base (YNB) and Yeast extract Peptone Dextrose (YPD). Bloom et al. (2013) showed that the genetic variants are associated with many of these trait values. It is therefore important to study the genetic relatedness among these related traits.

To demonstrate the genetic relatedness among these traits, eight traits were considered, including the normalized colony sizes under Calcium Chloride (Calcium), Diamide, Hydrogen Peroxide (Hydrogen), Paraquat, Raffinose, 6-Azauracil (Azauracil), YNB, and YPD. Each trait was normalized to have variance 1, so the quadratic functional represents the total genetic effect on each trait and provides an estimate of its heritability. FDE was applied without sample splitting to every pair of these 8 traits, for a total of 28 pairs. The results are summarized in Table 4, including estimates of the heritability of each trait and of the genetic covariance and genetic correlation for each of the 28 pairs. The estimated heritability of these traits ranged from 0.22 for Raffinose to 0.67 for YPD. About two thirds of the pairs had an estimated genetic correlation smaller than 0.1, indicating relatively weak genetic correlations among these traits.

Table 4:

FDE estimates of the heritability (bold diagonal), genetic covariance (above the diagonal) and genetic correlation (below the diagonal) for each pair of the 8 colony growth traits of the yeast segregants.

Traits Calcium Diamide Hydrogen Paraquat Raffinose Azauracil YNB YPD
Calcium .3314 −.0189 −.1003 .0084 .0927 .0095 .0656 −.0134
Diamide −.0286 .4390 .0598 −.0039 .0500 .0446 −.0159 .0803
Hydrogen −.1579 .0942 .4033 .0576 −.1040 .0601 .0672 .0637
Paraquat .0117 −.0053 .0799 .5199 .0023 .0365 .1148 .1029
Raffinose .1972 .1065 −.2213 .0049 .2208 .0137 .0830 .0331
Azauracil .0172 .0809 .1089 .0661 .0248 .3045 −.0259 .0703
YNB .0968 −.0235 .0991 .1693 .1224 −.0383 .4594 .4246
YPD −.0164 .0983 .0779 .1259 .0405 .0860 .5195 .6680

To further demonstrate the genetic relatedness among these pairs, for each trait, a Z-score was calculated by regressing the trait value $y$ on each genetic marker $X_{\cdot j}$, for $1\le j\le p$. A larger absolute value of the Z-score implies a stronger effect of the marker on the trait. For any pair of traits, the scatter plot of the Z-scores provides a way of revealing the shared genetic relationship between them. The scatter plots of the Z-scores for all 28 pairs of traits are included in Section D of the Supplemental Materials. Figure 1 (a) shows the plots for several pairs of traits, including pairs with a large positive $I(\beta,\gamma)$ (YPD vs. YNB and Paraquat vs. YNB), pairs with a large negative $I(\beta,\gamma)$ (Raffinose vs. Hydrogen and Calcium vs. Hydrogen), and pairs with $I(\beta,\gamma)$ near 0 (Paraquat vs. Diamide and Paraquat vs. Raffinose). The plots clearly indicate a strong positive genetic covariance between YPD and YNB. The genetic covariance between Paraquat and YNB/YPD is smaller. The Raffinose/Hydrogen and Calcium/Hydrogen pairs clearly show negative genetic correlation. There are several genetic variants with very large effects on Hydrogen, but they are not associated with the other traits such as Raffinose and Calcium; the shared genetic variants have relatively weak effects, leading to smaller genetic covariances. The plots on the bottom show pairs of traits with weak genetic covariances. These plots indicate that the proposed genetic correlation measures can indeed capture the genetic sharing among different related traits.

Figure 1:

Scatter plots of marginal regression Z-score statistics for six pairs of traits ranked by the estimated genetic covariance (gcov) based on FDE (a) or LD regression (b), including the pairs with large positive genetic covariance (left panel), negative genetic covariance (middle panel), and small genetic covariance (right panel).

Figure 2 shows the six pairs of phenotypes ranked by the estimated genetic correlations from FDE, including the two with the largest positive genetic correlations, the two with the largest negative genetic correlations and two with small genetic correlations. The pairs identified agree with the marginal Z-scores very well.

Figure 2:

Scatter plots of marginal regression Z-score statistics for six pairs of traits ranked by the estimated genetic correlation (gcor) based on FDE, including the pairs with large positive genetic correlation (left panel), negative genetic correlation (middle panel), and small genetic correlation (right panel).

As a comparison, we also obtained the estimated genetic covariance for each pair of traits using the LD regression method proposed by Bulik-Sullivan et al. (2015). The pairs of traits with large positive, negative or weak estimated covariance are presented in Figure 1 (b). The pairs with the largest positive and negative estimated covariance differ from those identified by FDE. Comparison of the scatter plots of the Z-scores in Figure 1 indicates that the pairs identified by FDE agree better with the marginal Z-statistics.

6. Discussion

Motivated by the problem of estimating the genetic relatedness between two traits using GWAS data, we have considered the problem of estimating different functionals of the regression coefficients of two linear models, including the inner product $\langle\beta,\gamma\rangle$, the quadratic functionals $Q(\beta)$ and $Q(\gamma)$, and the ratio $R(\beta,\gamma)$. The proposed method is different from plugging in the de-biased estimators proposed in Javanmard and Montanari (2014); van de Geer et al. (2014); Zhang and Zhang (2014): the correction procedures are applied to the inner product and quadratic functionals directly, which balances the bias and variance specifically for these functionals and hence yields minimax rate-optimal estimators. The proposed estimators were shown in simulations to have smaller estimation errors than directly plugging in the de-biased estimators across different settings. Results from the analysis of the yeast segregant data suggest that the yeast colony growth sizes are under similar genetic controls for certain growth media, such as YPD and YNB, but this is not true for all pairs of growth media considered.

The algorithm for obtaining these estimates only involves applying the Lasso several times, which can be implemented efficiently using coordinate descent algorithms. The Matlab code implementing the proposed estimation methods is available at http://statgene.med.upenn.edu/software.html. An important direction for future research is to quantify the uncertainty of the proposed estimators; the upper bound analysis of (21)-(23) and (24) indicates the possibility of constructing confidence intervals, centered at the proposed estimators and of parametric length $1/\sqrt n$, under additional sparsity and other regularity conditions.

7. Proofs

In this section, we prove Theorem 1 and the bounds (29) and (30) of Theorem 3. The proofs of Theorem 2, of (28) and (31) of Theorem 3, and of the supporting lemmas are presented in the supplementary materials.

7.1. Proof of Theorem 1

For simplicity of notation, we assume $n_1=n_2$ and use $n=n_1=n_2$ to denote the sample size throughout the proof; the proofs extend easily to the case $n_1\asymp n_2$. Without loss of generality, we assume that the sub-Gaussian norms of the random vectors $X_{i\cdot}$ and $Z_{i\cdot}$ are also upper bounded by $M_1$, that is, $\max\{\|X_{i\cdot}\|_{\psi_2}^2,\|Z_{i\cdot}\|_{\psi_2}^2\}\le M_1$.

Proof of (21)

The upper bound is based on the following decomposition,

$$\begin{aligned}\hat I(\beta,\gamma)-I(\beta,\gamma)&=\langle\hat\beta,\hat\gamma\rangle+\hat u_1^\top\frac{1}{n}X^\top(y-X\hat\beta)+\hat u_2^\top\frac{1}{n}Z^\top(w-Z\hat\gamma)-\langle\beta,\gamma\rangle\\&=\left(\hat u_1^\top\frac{1}{n}X^\top(y-X\hat\beta)-\langle\hat\gamma,\beta-\hat\beta\rangle\right)+\left(\hat u_2^\top\frac{1}{n}Z^\top(w-Z\hat\gamma)-\langle\hat\beta,\gamma-\hat\gamma\rangle\right)-\langle\hat\beta-\beta,\hat\gamma-\gamma\rangle\\&=\hat u_1^\top\frac{1}{n}X^\top\epsilon+(\hat\Sigma\hat u_1-\hat\gamma)^\top(\beta-\hat\beta)+\hat u_2^\top\frac{1}{n}Z^\top\delta+(\hat\Gamma\hat u_2-\hat\beta)^\top(\gamma-\hat\gamma)-\langle\hat\beta-\beta,\hat\gamma-\gamma\rangle.\end{aligned}$$ (37)

The following lemmas control the terms in (37); similar results were established in the analysis of the Lasso, the scaled Lasso and the de-biased Lasso (Cai and Guo, 2017b; Ren et al., 2015; Sun and Zhang, 2012; Ye and Zhang, 2010). The proofs of these lemmas can be found in the supplementary material, Section D.

Lemma 1.

Suppose that assumption (A1) holds and $k=\max\{\|\beta\|_0,\|\gamma\|_0\}\le cn/\log p$ for some $c>0$. Then with probability at least $1-p^{-c_0}$, we have

$$\|\hat\beta-\beta\|_1\le Ck\sqrt{\frac{\log p}{n}},\qquad \|\hat\gamma-\gamma\|_1\le Ck\sqrt{\frac{\log p}{n}},$$ (38)
$$\|\hat\beta-\beta\|_2\le C\sqrt{\frac{k\log p}{n}},\qquad \|\hat\gamma-\gamma\|_2\le C\sqrt{\frac{k\log p}{n}},$$ (39)

where c0 and C are positive constants.

Lemma 2.

Suppose that assumption (A1) holds and $k=\max\{\|\beta\|_0,\|\gamma\|_0\}\le cn/\log p$ for some $c>0$. Then with probability at least $1-2\alpha-p^{-c_0}$, we have

$$|(\hat\Sigma\hat u_1-\hat\gamma)^\top(\beta-\hat\beta)|\le C\|\hat\gamma\|_2\|\beta\|_0\frac{\log p}{n}\quad\text{and}\quad|(\hat\Gamma\hat u_2-\hat\beta)^\top(\gamma-\hat\gamma)|\le C\|\hat\beta\|_2\|\gamma\|_0\frac{\log p}{n};$$ (40)
$$\left|\hat u_1^\top\frac{1}{n}X^\top\epsilon\right|\le C\|\hat\gamma\|_2\frac{z_{\alpha/2}}{\sqrt n}\quad\text{and}\quad\left|\hat u_2^\top\frac{1}{n}Z^\top\delta\right|\le C\|\hat\beta\|_2\frac{z_{\alpha/2}}{\sqrt n},$$ (41)

where c0 and C are positive constants.

By the decomposition (37) and the inequalities (39), (40) and (41), we obtain

$$|\hat I(\beta,\gamma)-I(\beta,\gamma)|\le C(\|\hat\beta\|_2+\|\hat\gamma\|_2)\left(\frac{z_{\alpha/2}}{\sqrt n}+\frac{k\log p}{n}\right)+C\frac{k\log p}{n}.$$

Replacing $\|\hat\beta\|_2$ and $\|\hat\gamma\|_2$ by $\|\beta\|_2$ and $\|\gamma\|_2$ via (39) and the triangle inequality establishes (21).

Proof of (22) and (23)

The proof of (23) is similar to that of (22), so only the proof of (22) is presented in the following. We introduce the estimator $\bar Q(\beta)=\|\hat\beta\|_2^2+2\hat u_3^\top(X^{(2)})^\top(y^{(2)}-X^{(2)}\hat\beta)/(n/2)$; since $Q(\beta)$ is non-negative and $\hat Q(\beta)=(\bar Q(\beta))_+$, we have $|\hat Q(\beta)-Q(\beta)|\le|\bar Q(\beta)-Q(\beta)|$. We decompose the difference between $\bar Q(\beta)$ and $Q(\beta)$,

$$\bar Q(\beta)-Q(\beta)=\|\hat\beta\|_2^2-\|\beta\|_2^2+2\hat u_3^\top\frac{1}{n/2}(X^{(2)})^\top(y^{(2)}-X^{(2)}\hat\beta)=2(\hat\Sigma^{(2)}\hat u_3-\hat\beta)^\top(\beta-\hat\beta)+2\hat u_3^\top\frac{1}{n/2}(X^{(2)})^\top\epsilon^{(2)}-\|\hat\beta-\beta\|_2^2.$$

Combined with the above argument, the upper bound (22) follows from (39) and the following lemma, whose proof can be found in the supplementary material Section D.

Lemma 3.

Suppose that assumption (A1) holds and $k=\max\{\|\beta\|_0,\|\gamma\|_0\}\le cn/\log p$ for some $c>0$. Then with probability at least $1-p^{-c_0}-\alpha$,

$$\left|\hat u_3^\top\frac{1}{n/2}(X^{(2)})^\top\epsilon^{(2)}\right|\le C\|\hat\beta\|_2\frac{z_{\alpha/2}}{\sqrt n},$$ (42)
$$\left|(\hat\Sigma^{(2)}\hat u_3-\hat\beta)^\top(\beta-\hat\beta)\right|\le C\|\hat\beta\|_2\frac{k\log p}{n},$$ (43)

where c0 and C are positive constants.

7.2. Proof of (29) and (30) in Theorem 3

We first introduce the notation used in the proofs of the lower bound results. Let $\pi$ denote a prior distribution supported on the parameter space $\mathcal{H}$. Let $f_\pi(z)$ denote the density function of the marginal distribution of the random variable $Z$ with the prior $\pi$ on $\mathcal{H}$; more specifically, $f_\pi(z)=\int f_\theta(z)\pi(\theta)d\theta$. We define the $\chi^2$ distance between two density functions $f_1$ and $f_0$ by

$$\chi^2(f_1,f_0)=\int\frac{(f_1(z)-f_0(z))^2}{f_0(z)}dz=\int\frac{f_1^2(z)}{f_0(z)}dz-1$$ (44)

and the $L_1$ distance by $L_1(f_1,f_0)=\int|f_1(z)-f_0(z)|dz$. It is well known that

$$L_1(f_1,f_0)\le\sqrt{\chi^2(f_1,f_0)}.$$ (45)
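For completeness, (45) follows from the Cauchy-Schwarz inequality:

$$L_1(f_1,f_0)=\int\frac{|f_1(z)-f_0(z)|}{\sqrt{f_0(z)}}\sqrt{f_0(z)}\,dz\le\left(\int\frac{(f_1(z)-f_0(z))^2}{f_0(z)}\,dz\right)^{1/2}\left(\int f_0(z)\,dz\right)^{1/2}=\sqrt{\chi^2(f_1,f_0)}.$$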

The proof of the lower bound is based on the following version of Le Cam’s Lemma (LeCam (1973); Yu (1997); Ren et al. (2015)).

Lemma 4.

Let $T(\theta)$ denote a functional of $\theta$. Suppose that $\mathcal{H}_0=\{\theta_0\}$, $\mathcal{H}_0,\mathcal{H}_1\subset\Theta$, and $d=\min_{\theta\in\mathcal{H}_1}|T(\theta)-T(\theta_0)|$. Let $\pi$ denote a prior on the parameter space $\mathcal{H}_1$. Then we have

$$\inf_{\hat T}\sup_{\theta\in\mathcal{H}_0\cup\mathcal{H}_1}\mathbb{P}_\theta\left(|\hat T-T(\theta)|\ge\frac{d}{2}\right)\ge\frac{1-L_1(f_\pi,f_{\theta_0})}{2}.$$ (46)

The proofs of (29) and (30) are applications of Lemma 4. The key is to construct the parameter spaces $\mathcal{H}_0=\{\theta_0\}$ and $\mathcal{H}_1$ and the prior on $\mathcal{H}_1$ such that (i) $\mathcal{H}_0,\mathcal{H}_1\subset\Theta$, (ii) the $L_1$ distance $L_1(f_\pi,f_{\theta_0})$ is controlled, and (iii) the distance $d=\min_{\theta\in\mathcal{H}_1}|T(\theta)-T(\theta_0)|$ is maximized. In the following, we provide a detailed proof of (29); the proof of (30) is similar and is omitted. In the discussion of the lower bound results, we assume that the designs $X_{i\cdot}$ and $Z_{i\cdot}$ follow joint normal distributions with zero means. The lower bound (29) can be decomposed into the following three lower bounds,

$$\inf_{\tilde Q}\sup_{\theta_\beta\in\mathcal{G}(k,M_0)}\mathbb{P}_{\theta_\beta}\left(|\tilde Q-Q(\beta)|\ge c\min\left\{\frac{k\log p}{n},\ M_0^2\right\}\right)\ge\frac14.$$ (47)

For $M_0\ge C\sqrt{k\log p/n}$,

$$\inf_{\tilde Q}\sup_{\theta_\beta\in\mathcal{G}(k,M_0)}\mathbb{P}_{\theta_\beta}\left(|\tilde Q-Q(\beta)|\ge cM_0\frac{k\log p}{n}\right)\ge\frac14,$$ (48)
$$\inf_{\tilde Q}\sup_{\theta_\beta\in\mathcal{G}(k,M_0)}\mathbb{P}_{\theta_\beta}\left(|\tilde Q-Q(\beta)|\ge cM_0\frac{1}{\sqrt n}\right)\ge\frac14,$$ (49)

where $c>0$ is a positive constant. For $M_0\ge C\sqrt{k\log p/n}$, combining (48), (49) and (47), we have

$$\inf_{\tilde Q}\sup_{\theta_\beta\in\mathcal{G}(k,M_0)}\mathbb{P}_{\theta_\beta}\left(|\tilde Q-Q(\beta)|\ge c\max\left\{M_0\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right),\ \frac{k\log p}{n}\right\}\right)\ge\frac14.$$ (50)

For $M_0\le C\sqrt{k\log p/n}$, by (47), we have

$$\inf_{\tilde Q}\sup_{\theta_\beta\in\mathcal{G}(k,M_0)}\mathbb{P}_{\theta_\beta}\left(|\tilde Q-Q(\beta)|\ge cM_0^2\right)\ge\frac14.$$ (51)

We can establish (29) by combining (50) and (51). In the following, we will establish the lower bounds (48), (49) and (47) separately.

Proof of (48)

Under the Gaussian random design model, $V_i=(y_i,X_{i\cdot})\in\mathbb{R}^{p+1}$ follows a joint Gaussian distribution with mean 0. Let $\Sigma^v$ denote the covariance matrix of $V_i$. For the indices of $\Sigma^v$, we use 0 as the index of $y_i$ and $\{1,\dots,p\}$ as the indices of $(X_{i1},\dots,X_{ip})\in\mathbb{R}^p$. Decompose $\Sigma^v$ into blocks $\begin{pmatrix}\Sigma_{yy}^v&(\Sigma_{xy}^v)^\top\\ \Sigma_{xy}^v&\Sigma_{xx}^v\end{pmatrix}$, where $\Sigma_{yy}^v$, $\Sigma_{xx}^v$ and $\Sigma_{xy}^v$ denote the variance of $y_i$, the covariance matrix of $X_{i\cdot}$, and the covariance of $y_i$ and $X_{i\cdot}$, respectively. Let $\Omega=\Sigma^{-1}$ denote the precision matrix. There exists a bijective mapping $h:\Sigma^v\mapsto(\beta,\Omega,\sigma_1)$ with inverse $h^{-1}:(\beta,\Omega,\sigma_1)\mapsto\Sigma^v$, where $h^{-1}((\beta,\Omega,\sigma_1))=\begin{pmatrix}\beta^\top\Omega^{-1}\beta+\sigma_1^2&\beta^\top\Omega^{-1}\\ \Omega^{-1}\beta&\Omega^{-1}\end{pmatrix}$ and

$$h(\Sigma^v)=\left((\Sigma_{xx}^v)^{-1}\Sigma_{xy}^v,\ (\Sigma_{xx}^v)^{-1},\ \sqrt{\Sigma_{yy}^v-(\Sigma_{xy}^v)^\top(\Sigma_{xx}^v)^{-1}\Sigma_{xy}^v}\right).$$ (52)

Based on the bijection, it is sufficient to control the χ2 distance between two multivariate Gaussian distributions. We introduce the null parameter space

$$\mathcal{G}_0=\left\{\theta_0=(\beta,\mathrm{I},\sigma_0):\ \beta=(\beta_1,\eta_0,0,\dots,0)\ \text{with}\ \beta_1=-\frac{M_0}{2}\ \text{and}\ \sigma_0=\frac{M_2}{8}\right\},$$

where $0\le\eta_0\le M_0/2$. Note that $\mathcal{G}_0\subset\mathcal{G}(k,M_0)$. Define $p_1=p-2$. Based on the mapping $h$, we have the corresponding null parameter space for $\Sigma^v$, $\mathcal{F}_0=\{\Sigma_0^v\}$, where

$$\Sigma_0^v=\begin{pmatrix}\beta_1^2+\eta_0^2+\sigma_0^2&\beta_1&\eta_0&0\\ \beta_1&1&0&0\\ \eta_0&0&1&0\\ 0&0&0&\mathrm{I}_{p_1\times p_1}\end{pmatrix}.$$

We then introduce the alternative parameter space for $\Sigma^v$, which induces a parameter space for $(\beta,\Omega,\sigma_1)$ through the mapping $h$. Define $\mathcal{F}_1=\{\Sigma_\alpha^v:\alpha\in\mathcal{B}(p_1,k,\rho)\}$, where

$$\Sigma_\alpha^v=\begin{pmatrix}\beta_1^2+\eta_0^2+\sigma_0^2&\beta_1&\eta_0&\rho_0\alpha^\top\\ \beta_1&1&0&\alpha^\top\\ \eta_0&0&1&0\\ \rho_0\alpha&\alpha&0&\mathrm{I}_{p_1\times p_1}\end{pmatrix},$$ (53)

with $\rho_0=\beta_1+\sigma_0$ and

$$\mathcal{B}(p_1,k,\rho)=\left\{\alpha:\ \alpha\in\mathbb{R}^{p_1},\ \|\alpha\|_0=k-2,\ \alpha_i\in\{0,\rho\}\ \text{for}\ 1\le i\le p_1\right\}.$$ (54)

Then we construct the corresponding parameter space 𝓖1 for (β,Ω,σ1), which is induced by the mapping h and the parameter space 𝓕1,

$$\mathcal{G}_1=\left\{(\beta,\Omega,\sigma_1):\ (\beta,\Omega,\sigma_1)=h(\Sigma^v)\ \text{for some}\ \Sigma^v\in\mathcal{F}_1\right\}.$$ (55)

Similar to equation (7.15) in Cai and Guo (2017b), one can show that for $(\beta',\Omega,\sigma_1)\in\mathcal{G}_1$, the corresponding first coordinate is $\beta_1'=\frac{\beta_1-\|\alpha\|_2^2\rho_0}{1-\|\alpha\|_2^2}$, and the difference between $\beta_1'$ and $\beta_1$ is

$$\beta_1'-\beta_1=\frac{\|\alpha\|_2^2(\beta_1-\rho_0)}{1-\|\alpha\|_2^2}=-\frac{\sigma_0\|\alpha\|_2^2}{1-\|\alpha\|_2^2}.$$

By taking $\rho=\sqrt{\log(4p_1/k^2)/(2n)}$, we have $|\beta_1'-\beta_1|\le C_0\,k\log p/n$ and hence $\beta_1'^2+\eta_0^2+k\rho^2\le M_0^2$ for $M_0\ge C\sqrt{k\log p/n}$. Similar to the arguments between (7.15)-(7.18) in Cai and Guo (2017b), one can show that $\mathcal{G}_1\subset\mathcal{G}(k,M_0)$. Let $\pi$ denote the uniform prior over the parameter space $\mathcal{G}_1$ induced by the uniform prior of $\alpha$ over $\mathcal{B}(p_1,k,\rho)$ with $\rho=\sqrt{\log(4p_1/k^2)/(2n)}$. The control of $L_1(f_\pi,f_{\theta_0})$ is established in the following lemma, which follows from Lemma 2 of Cai and Guo (2017b) and is established in (7.21) of Cai and Guo (2017b).

Lemma 5.

Suppose that $k\le c\min\{n/\log p,\ p^\gamma\}$, where $0\le\gamma<\frac12$ and $c$ is a sufficiently small positive constant. For $\rho=\sqrt{\log(4p_1/k^2)/(2n)}$, we have $\|\alpha\|_2\le\min\{1-1/M_1,\ M_1-1\}$ and

$$L_1(f_\pi,f_{\theta_0})\le\frac14.$$ (56)

To apply Lemma 4, we consider the functional $T(\theta)=\|\beta\|_2^2$ and calculate the distance

$$d=\left|\beta_1'^2+\left(\frac{\sigma_0}{1-\|\alpha\|_2^2}\right)^2\|\alpha\|_2^2-\beta_1^2\right|=\frac{\sigma_0\|\alpha\|_2^2}{1-\|\alpha\|_2^2}\left|\beta_1'+\beta_1-\frac{\sigma_0}{1-\|\alpha\|_2^2}\right|.$$ (57)

Since $\beta_1=-M_0/2$ and $|\beta_1'-\beta_1|\lesssim k\log p/n$, we have $\beta_1'<0$ and hence

$$\frac{\sigma_0\|\alpha\|_2^2}{1-\|\alpha\|_2^2}\left|\beta_1'+\beta_1-\frac{\sigma_0}{1-\|\alpha\|_2^2}\right|\ge c\frac{k\log p}{n}\sigma_0M_0,$$

where the last inequality follows from the facts that $\beta_1<0$, $\beta_1'<0$ and $-\frac{\sigma_0}{1-\|\alpha\|_2^2}<0$, together with $\|\alpha\|_2^2=(k-2)\rho^2\asymp k\log p/n$. Combined with (56), an application of Lemma 4 leads to (48).

Proof of (49)

We construct the following parameter spaces,

$$\mathcal{G}_0=\left\{\theta_0=(\beta,\mathrm{I},\sigma_0):\ \beta=(\beta_1,\eta_0,0,\dots,0)\ \text{with}\ \beta_1=\frac{M_0}{2}\right\},\qquad \mathcal{G}_1=\left\{\theta_1=(\beta,\mathrm{I},\sigma_0):\ \beta=\left(\beta_1+\frac{\sigma_0\bar\epsilon}{\sqrt n},\eta_0,0,\dots,0\right)\right\},$$ (58)

where $0\le\eta_0\le M_0/2$ and $\bar\epsilon=\sqrt{\log(17/16)/2}$. Since $M_0\ge C\sqrt{k\log p/n}$, we have $\left(\beta_1+\frac{\sigma_0\bar\epsilon}{\sqrt n}\right)^2+\eta_0^2\le M_0^2$ and hence $\mathcal{G}_0,\mathcal{G}_1\subset\mathcal{G}(k,M_0)$.

The proof of the following lemma can be found in the supplementary material Section D.

Lemma 6.

If $\bar\epsilon=\sqrt{\log(17/16)/2}$, then we have

$$L_1(f_\pi,f_{\theta_0})\le\frac14.$$ (59)

To apply Lemma 4, we take η0=0 and calculate the distance

$$d=\left|\left(\beta_1+\frac{\sigma_0\bar\epsilon}{\sqrt n}\right)^2-\beta_1^2\right|=\left|\frac{2\beta_1\sigma_0\bar\epsilon}{\sqrt n}+\frac{\sigma_0^2\bar\epsilon^2}{n}\right|\ge cM_0\frac{1}{\sqrt n}.$$

Applying Lemma 4, we establish (49).

Proof of (47)

We introduce the following null and alternative parameter spaces,

$$\mathcal{G}_0=\left\{(\beta,\mathrm{I},\sigma_1):\ \beta=(\eta_0,0,0,\dots,0)\right\},\qquad \mathcal{G}_1=\left\{(\beta,\mathrm{I},\sigma_1):\ \beta=(\eta_0,0,\alpha)\ \text{with}\ \alpha\in\mathcal{B}_1(p,k,\rho)\right\},$$ (60)

where $0\le\eta_0\le M_0/2$ and

$$\mathcal{B}_1(p,k,\rho)=\left\{\alpha:\ \alpha\in\mathbb{R}^{p-2},\ \|\alpha\|_0=k-1,\ \alpha_i\in\{0,\rho\}\right\}.$$ (61)

Let $\pi$ denote the prior over the parameter space $\mathcal{G}_1$ induced by the uniform prior of $\alpha$ over $\mathcal{B}_1(p,k,\rho)$, where $\rho=\min\left\{\sqrt{\log\left(4p_1/(k-1)^2\right)/(2n)},\ \sqrt{(M_0^2-\eta_0^2)/(k-1)}\right\}$. The control of $L_1(f_\pi,f_{\theta_0})$ is established in the following lemma, which follows from Lemma 7 of Cai and Guo (2017c) and is established in (1.6) of Cai and Guo (2017c).

Lemma 7.

Suppose that $k\le c\min\{n/\log p,\ p^\gamma\}$, where $0\le\gamma<\frac12$ and $c$ is a sufficiently small positive constant. For $\rho=\min\left\{\sqrt{\log\left(4p_1/(k-1)^2\right)/(2n)},\ \sqrt{(M_0^2-\eta_0^2)/(k-1)}\right\}$, we have $L_1(f_\pi,f_{\theta_0})\le\frac18$.

By specifying $\eta_0=0$, the spaces $\mathcal{G}_0$ and $\mathcal{G}_1$ defined in (60) are proper subspaces of the parameter space $\mathcal{G}(k,M_0)$. To apply Lemma 4, we calculate the distance $d=\|\alpha\|_2^2\ge c\min\{k\log p/n,\ M_0^2\}$. Applying Lemma 4, we establish (47).


Acknowledgement

We would like to thank Alexandre Tsybakov for helpful discussions on Section 3.2, and the reviewer and the Associate Editor for their helpful comments.

Footnotes

Supplementary Material

Supplement to “Optimal Estimation of Genetic Relatedness in High-dimensional Linear Regressions”. (.pdf file)

References

1. Belloni A, Chernozhukov V, and Wang L (2011), “Square-root lasso: pivotal recovery of sparse signals via conic programming,” Biometrika, 98(4), 791–806.
2. Bloom JS, Ehrenreich IM, Loo WT, Lite T-LV, and Kruglyak L (2013), “Finding the sources of missing heritability in a yeast cross,” Nature, 494(7436), 234–237.
3. Bonnet A, Gassiat E, and Lévy-Leduc C (2015), “Heritability estimation in high dimensional sparse linear mixed models,” Electronic Journal of Statistics, 9(2), 2099–2129.
4. Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh P-R, Duncan L, Perry JR, Patterson N, Robinson EB, et al. (2015), “An atlas of genetic correlations across human diseases and traits,” Nature Genetics.
5. Cai TT, and Guo Z (2017a), “Accuracy assessment for high-dimensional linear regression,” The Annals of Statistics, to appear.
6. Cai TT, and Guo Z (2017b), “Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity,” The Annals of Statistics, 45(2), 615–646.
7. Cai TT, and Guo Z (2017c), “Supplement to ‘Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity’,” The Annals of Statistics, 45(2).
8. Cai TT, and Low MG (2005), “Nonquadratic estimators of a quadratic functional,” The Annals of Statistics, 33(6), 2930–2956.
9. Cai TT, and Low MG (2006), “Optimal adaptive estimation of a quadratic functional,” The Annals of Statistics, 34(5), 2298–2325.
10. Collier O, Comminges L, and Tsybakov AB (2015), “Minimax estimation of linear and quadratic functionals on sparsity classes,” The Annals of Statistics, to appear.
11. Donoho DL, and Nussbaum M (1990), “Minimax quadratic estimation of a quadratic functional,” Journal of Complexity, 6(3), 290–323.
12. Efromovich S, and Low M (1996), “On optimal adaptive estimation of a quadratic functional,” The Annals of Statistics, 24(3), 1106–1125.
13. Fan J, Han X, and Gu W (2012), “Estimating false discovery proportion under arbitrary covariance dependence,” Journal of the American Statistical Association, 107(499), 1019–1035.
14. Golan D, and Rosset S (2011), “Accurate estimation of heritability in genome wide studies using random effects models,” Bioinformatics, 27, i317–i323.
15. Guo Z, Kang H, Cai TT, and Small DS (2016), “Confidence intervals for causal effects with invalid instruments using two-stage hard thresholding with voting,” arXiv preprint arXiv:1603.05224.
16. Janson L, Barber RF, and Candès E (2016), “EigenPrism: inference for high dimensional signal-to-noise ratios,” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
17. Javanmard A, and Montanari A (2014), “Confidence intervals and hypothesis testing for high-dimensional regression,” The Journal of Machine Learning Research, 15(1), 2869–2909.
18. Laurent B, and Massart P (2000), “Adaptive estimation of a quadratic functional by model selection,” The Annals of Statistics, 28(5), 1302–1338.
19. Le Cam L (1973), “Convergence of estimates under dimensionality restrictions,” The Annals of Statistics, 1(1), 38–53.
20. Lee SH, Ripke S, Neale BM, Faraone SV, Purcell SM, Perlis RH, Mowry BJ, Thapar A, Goddard ME, and Witte JS (2013), “Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs,” Nature Genetics, 45.
21. Lee SH, Yang J, Goddard ME, Visscher PM, and Wray NR (2012), “Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood,” Bioinformatics, 28(19), 2540–2542.
22. Lee S, and van der Werf J (2016), “MTG2: an efficient algorithm for multivariate linear mixed model analysis based on genomic information,” Bioinformatics, 32(9), 1420–1422.
23. Maier R, Moser G, Chen G-B, Ripke S, Coryell W, Potash JB, Scheftner WA, Shi J, Weissman MM, Hultman CM, et al. (2015), “Joint analysis of psychiatric disorders increases accuracy of risk prediction for schizophrenia, bipolar disorder, and major depressive disorder,” The American Journal of Human Genetics, 96(2), 283–294.
24. Manolio T (2010), “Genomewide association studies and assessment of the risk of disease,” New England Journal of Medicine, 363, 166–176.
25. McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JP, and Hirschhorn JN (2008), “Genome-wide association studies for complex traits: consensus, uncertainty and challenges,” Nature Reviews Genetics, 9(5), 356–369.
26. Purcell SM, Wray NR, Stone JL, Visscher PM, O’Donovan MC, Sullivan PF, Sklar P, Ruderfer DM, McQuillin A, Morris DW, et al. (2009), “Common polygenic variation contributes to risk of schizophrenia and bipolar disorder,” Nature, 460(7256), 748–752.
27. Ren Z, Sun T, Zhang C-H, and Zhou HH (2015), “Asymptotic normality and optimalities in estimation of large Gaussian graphical models,” The Annals of Statistics, 43(3), 991–1026.
28. Sun T, and Zhang C-H (2012), “Scaled sparse linear regression,” Biometrika, 99(4), 879–898.
29. Tibshirani R (1996), “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.
30. van de Geer S, Bühlmann P, Ritov Y, and Dezeure R (2014), “On asymptotically optimal confidence regions and tests for high-dimensional models,” The Annals of Statistics, 42(3), 1166–1202.
31. Vershynin R (2012), “Introduction to the non-asymptotic analysis of random matrices,” in Compressed Sensing: Theory and Applications, eds. Eldar Y, and Kutyniok G, Cambridge University Press, pp. 210–268.
32. Verzelen N, and Gassiat E (2016), “Adaptive estimation of high-dimensional signal-to-noise ratios,” arXiv preprint arXiv:1602.08006.
33. Wray NR, Goddard ME, and Visscher PM (2007), “Prediction of individual genetic risk to disease from genome-wide association studies,” Genome Research, 17(10), 1520–1528.
34. Wu T, Chen Y, Hastie T, Sobel E, and Lange K (2009), “Genome-wide association analysis by lasso penalized logistic regression,” Bioinformatics, 25, 714–721.
35. Yang L, Neale BM, Liu L, Lee SH, Wray NR, Ji N, Li H, Qian Q, Wang D, Li J, et al. (2013), “Polygenic transmission and complex neuro developmental network for attention deficit hyperactivity disorder: Genome-wide association study of both common and rare variants,” American Journal of Medical Genetics Part B: Neuropsychiatric Genetics, 162(5), 419–430.
36. Ye F, and Zhang C-H (2010), “Rate minimaxity of the Lasso and Dantzig selector for the ℓq loss in ℓr balls,” The Journal of Machine Learning Research, 11, 3519–3540.
37. Yu B (1997), “Assouad, Fano, and Le Cam,” in Festschrift for Lucien Le Cam, Springer, pp. 423–435.
38. Zhang C-H, and Zhang SS (2014), “Confidence intervals for low dimensional parameters in high dimensional linear models,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1), 217–242.
39. Zhernakova A, van Diemen C, and Wijmenga C (2009), “Detecting shared pathogenesis from the shared genetics of immune-related diseases,” Nature Reviews Genetics, 10, 43–45.
