Published in final edited form as: J Am Stat Assoc. 2018 Nov 19;114(525):358–369. doi: 10.1080/01621459.2017.1407774

Optimal Estimation of Genetic Relatedness in High-dimensional Linear Models

Zijian Guo 1, Wanjie Wang 2, T Tony Cai 3, Hongzhe Li 4

Abstract

Estimating the genetic relatedness between two traits based on genome-wide association data is an important problem in genetics research. In the framework of high-dimensional linear models, we introduce two measures of genetic relatedness and develop optimal estimators for them. One is the genetic covariance, defined as the inner product of the two regression vectors; the other is the genetic correlation, the inner product normalized by their lengths. We propose functional de-biased estimators (FDEs), which consist of an initial estimation step based on the plug-in scaled Lasso estimator and a further bias correction step. We also develop estimators of the quadratic functionals of the regression vectors, which can be used to estimate the heritability of each trait. The estimators are shown to be minimax rate-optimal and can be efficiently implemented. Simulation results show that FDEs provide better estimates of the genetic relatedness than simple plug-in estimates. FDE is also applied to an analysis of a yeast segregant data set with multiple traits to estimate the genetic relatedness among these traits.

Keywords: Genetic correlations, genome-wide association studies, inner product, quadratic functional, minimax rate of convergence

1. Introduction

1.1. Motivation and Background

Genome-wide association studies (GWAS) have led to the identification of thousands of genetic variants or single nucleotide polymorphisms (SNPs) that are associated with various complex phenotypes (Manolio, 2010). Results from these GWAS have shown that many complex phenotypes share common genetic variants, including various autoimmune diseases (Zhernakova et al., 2009) and psychiatric disorders (Lee et al., 2013). This empirical evidence of shared genetic etiology for various phenotypes provides important insights into common pathophysiologies of related disorders that can be explored for drug repositioning and for studying disease etiology. Such knowledge of genetic sharing can potentially be exploited to increase the accuracy of genetic risk prediction (Maier et al., 2015; Wray et al., 2007; Purcell et al., 2009). The concept of genetic relatedness, or genetic correlation, has been proposed to describe the shared genetic associations between pairs of quantitative traits based on GWAS data. This is in contrast to the traditional approaches of estimating co-heritability based on twin or family studies, where measurements of both traits are required on the same set of individuals. Due to the availability of GWAS data sets for many important traits, there has been significant recent interest in methods for quantifying and estimating the genetic relatedness between two traits based on large-scale genetic association data.

Several measures of genetic relatedness have been proposed using GWAS data. Lee et al. (2012) and Yang et al. (2013) extended the mixed-effect model framework to estimate the genetic covariance and genetic correlation between two traits. In their models, each individual's trait value is associated with a random genetic effect, which is correlated across individuals by virtue of sharing some of the genetic variants affecting the traits, and an environmental random effect. Co-heritability is then defined as the ratio of the covariance of the genetic random effects to the square root of the product of the total variances. The mixed-effect model approach requires knowledge of the identity of the causal variants, and hence of the covariance matrix, which is not available in practice. Lee et al. (2012) and Yang et al. (2013) approximated the genetic relationship between every pair of individuals across the set of causal variants by the genetic relationship across the set of all genotyped variants. However, the very large number of variants used for estimating the genetic correlations, most of them likely not causative, might mask the correlations on the set of causal variants, leading to inaccurate and suboptimal estimation of heritability (Golan and Rosset, 2011). Bulik-Sullivan et al. (2015) studied the genetic relatedness based on another random effects model for the two traits and developed a cross-trait linkage disequilibrium (LD) score regression to estimate the genetic covariance and genetic correlation. This approach is similar to the mixed-effect model approach of Yang et al. (2013) but has the advantage of using only the GWAS summary statistics. Lee and van der Werf (2016) developed an algorithm for multivariate linear mixed model analysis and demonstrated its use in estimating co-heritability.

To alleviate the difficulty of estimating the covariance matrix in the commonly used mixed-effect model framework for estimating heritability or co-heritability, we take a regression approach with fixed genetic effects in high-dimensional settings. High-dimensional linear regression provides a natural framework for GWAS in order to identify the trait-associated genetic variants, and its advantages over simple univariate analysis have been demonstrated (Wu et al., 2009). Heritability estimation in high-dimensional regression has been studied in Bonnet et al. (2015); Verzelen and Gassiat (2016); Janson et al. (2016). However, high-dimensional regression analysis has not been explored to study the genetic relatedness between two traits based on genetic association data. The goal of this paper is to define two quantities that can be used to measure the genetic relatedness between a pair of traits based on GWAS data in the framework of high-dimensional linear models. Our definitions of genetic relatedness reflect the covariance or correlation of the trait-associated genetic variants. This is different from the mixed-effects model-based approaches, where the genetic relatedness is defined through the variance/covariance matrix of the individual-specific random effects and the data from all the genetic variants are used to approximate the true covariance matrix.

1.2. Definition and Problem Formulation

A pair of trait values (y, w) is modeled as linear combinations of p genetic variants plus error terms that include environmental and unmeasured genetic effects,

$$y_{n_1\times 1}=X_{n_1\times p}\,\beta_{p\times 1}+\epsilon_{n_1\times 1}\quad\text{and}\quad w_{n_2\times 1}=Z_{n_2\times p}\,\gamma_{p\times 1}+\delta_{n_2\times 1},$$ (1)

where the rows $X_{i\cdot}$ are i.i.d. $p$-dimensional sub-Gaussian random vectors with covariance matrix $\Sigma$, the rows $Z_{i\cdot}$ are i.i.d. $p$-dimensional sub-Gaussian random vectors with covariance matrix $\Gamma$, and the error $(\epsilon,\delta)$ follows the multivariate normal distribution with mean zero and covariance

$$\begin{pmatrix}\sigma_1^2 I_{n_1\times n_1} & 0_{n_1\times n_2}\\ 0_{n_2\times n_1} & \sigma_2^2 I_{n_2\times n_2}\end{pmatrix}$$

and is assumed to be independent of X and Z.

In the study of genetic relatedness, the pair of traits $y$ and $w$ are assumed to have mean zero, and the $j$th column of $X$, $X_{\cdot j}$, and the $j$th column of $Z$, $Z_{\cdot j}$, are the numerically coded genetic markers at the $j$th genetic variant and are assumed to have mean zero and variance 1. Under this model, if the columns of $X$ and $Z$ are independent, then for the $i$-th observation,

$$\mathrm{Var}(y_i)=\sum_j\beta_j^2+\sigma_1^2=\|\beta\|_2^2+\sigma_1^2,\quad\text{and}\quad \mathrm{Var}(w_i)=\sum_j\gamma_j^2+\sigma_2^2=\|\gamma\|_2^2+\sigma_2^2,$$

therefore $\|\beta\|_2^2/(\|\beta\|_2^2+\sigma_1^2)$ and $\|\gamma\|_2^2/(\|\gamma\|_2^2+\sigma_2^2)$ can be interpreted as the narrow-sense heritability of each trait (Bulik-Sullivan et al., 2015).

Based on this model, one measure of genetic relatedness is the inner product of the regression coefficients

$$I(\beta,\gamma)=\langle\beta,\gamma\rangle,$$ (2)

which measures the shared genetic effects between these two traits. Bulik-Sullivan et al. (2015) defined this quantity as the genetic covariance due to the $p$ genetic variants. Alternatively, a normalized inner product, called the genetic correlation,

$$R(\beta,\gamma)=\frac{\langle\beta,\gamma\rangle}{\|\beta\|_2\|\gamma\|_2}\,\mathbf{1}(\|\beta\|_2\|\gamma\|_2>0),$$ (3)

can also be used. In the case where one of $\|\beta\|_2$ and $\|\gamma\|_2$ vanishes, the ratio is defined as zero, which indicates no correlation between the two traits when one of the regression vectors is zero. With this normalization, $R(\beta,\gamma)$ always lies between $-1$ and $1$ and can be used to compare the genetic relatedness among multiple pairs of traits. Note that to exhibit genetic correlation, the directions of the effects must also be consistently aligned.
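As a quick illustration of definitions (2) and (3), the following is a minimal Python sketch (the code sketches in this paper are illustrations only; the authors' own software, mentioned in Section 6, is in Matlab) transcribing the two measures as functions of the coefficient vectors.

import numpy as np

def genetic_covariance(beta, gamma):
    # I(beta, gamma) = <beta, gamma>, display (2)
    return float(beta @ gamma)

def genetic_correlation(beta, gamma):
    # R(beta, gamma), display (3); defined as zero when either vector vanishes
    denom = np.linalg.norm(beta) * np.linalg.norm(gamma)
    return float(beta @ gamma) / denom if denom > 0 else 0.0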

Although Bulik-Sullivan et al. (2015) defined (2) and (3) as the genetic covariance and genetic correlation, they treated $\beta$ and $\gamma$ as random vectors with a particular covariance form and then proposed to apply LD regression to estimate the expectation of $\langle\beta,\gamma\rangle$. The focus of this paper is to develop estimators of $I(\beta,\gamma)$ and $R(\beta,\gamma)$ based on two GWAS data sets with genotypes measured on the same set of genetic markers, denoted by $(y_i,X_{i\cdot}),\ i=1,\dots,n_1$, and $(w_i,Z_{i\cdot}),\ i=1,\dots,n_2$.

1.3. Methods and Main Results

A naive approach is to estimate $\beta$ and $\gamma$ first and then plug the estimators into the expressions (2) and (3). For the problem of interest, there are usually more genetic markers than samples, that is, $p\gg\max\{n_1,n_2\}$. However, for any given trait, one expects that only a few of these markers have nonzero effects. One can apply any high-dimensional sparse regression method, such as the Lasso (Tibshirani, 1996), the scaled Lasso (Sun and Zhang, 2012), or marginal regression with screening (McCarthy et al., 2008; Fan et al., 2012), to estimate these sparse regression coefficients. The resulting plug-in estimators, however, have several drawbacks for estimating the genetic relatedness. The Lasso shrinks the estimates towards 0; in particular, some weak effects might be shrunken exactly to 0, yet the accumulation of these weak effects may contribute significantly to the trait variability. It is also possible that some genetic variants have strong effects on one trait and weak effects on the other. Due to shrinkage, plugging in Lasso-type estimators fails to capture the contribution of such variants to the genetic relatedness. Marginal regression calculates a regression score between the trait and each single marker (i.e., $y$ and $X_{\cdot j}$, $1\le j\le p$) and screens for large scores. This approach also suffers in the presence of weak effects, as the marginal scores must be large enough to survive the screening step.

We propose a two-step procedure to estimate the genetic relatedness measure $I(\beta,\gamma)$ defined in (2): step 1 estimates the inner product $I(\beta,\gamma)$ by the plug-in scaled Lasso estimator, and step 2 corrects the bias of this plug-in estimator. Similar two-step procedures are proposed to estimate the quadratic functionals $\|\beta\|_2^2$ and $\|\gamma\|_2^2$. To estimate the normalized inner product $R(\beta,\gamma)$ defined in (3), we plug the estimators of the inner product and the quadratic functionals into the definition (3). Due to the correction step, we call our estimators Functional De-biased Estimators (FDEs).

FDEs are shown to achieve the minimax optimal convergence rates for estimating $I(\beta,\gamma)$ and $R(\beta,\gamma)$. The optimality of FDEs results from the unique way of balancing the bias and variance for estimating $I(\beta,\gamma)$ and $R(\beta,\gamma)$. To illustrate this, we focus on the estimation of $I(\beta,\gamma)$ and compare the FDE with two plug-in estimators: the plug-in of the scaled Lasso estimators (Sun and Zhang, 2012) and the plug-in of the de-biased Lasso estimators (Javanmard and Montanari, 2014; van de Geer et al., 2014; Zhang and Zhang, 2014). Note that the scaled Lasso estimator achieves the optimal convergence rate for estimating the whole vector $\beta$, and the de-biased estimator achieves the optimal convergence rate for estimating a single coordinate $\beta_i$. However, simply plugging in the scaled Lasso estimators or the de-biased Lasso estimators does not lead to a good estimator of $I(\beta,\gamma)$: the plug-in of the scaled Lasso estimators suffers from a large bias, while the plug-in of the de-biased Lasso estimators suffers from an inflated variance.

In contrast, the FDE of $I(\beta,\gamma)$ balances the bias and variance in the optimal way. Specifically, in the correction step of the FDE, the bias caused by plugging in the scaled Lasso estimator is corrected by adding the minimum amount of variance. As demonstrated in the simulation studies, FDE consistently outperforms both the plug-in of the scaled Lasso estimators and the plug-in of the de-biased Lasso estimators. In addition, FDEs are robust to dependency among genetic markers and work for a broad class of dependency structures.

The theoretical analysis given in Section 3 establishes the optimal convergence rates for estimating $I(\beta,\gamma)$, $R(\beta,\gamma)$, $\|\beta\|_2^2$ and $\|\gamma\|_2^2$. To facilitate the discussion, we control the $\ell_2$ norms of the regression coefficients $\beta$ and $\gamma$ as $c\eta_0\le\|\beta\|_2\le CM_0$ and $c\eta_0\le\|\gamma\|_2\le CM_0$, where $c$, $C$ are positive constants independent of $n$ and $p$. Here, we present the most interesting regime where the signals are strong in the sense that $\eta_0\ge C\sqrt{k\log p/n}$, where $p$ is the dimension, $n$ is the sample size, $k$ is the maximum sparsity of $\beta$ and $\gamma$, and $C$ is a positive constant independent of $k$, $n$, $p$. We have shown that the optimal rate of convergence for estimating $I(\beta,\gamma)$, $\|\beta\|_2^2$ and $\|\gamma\|_2^2$ is

$$M_0\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n}.$$

The optimal rate depends not only on p, n and k, but also the upper bound for the signal strength M0. In addition, we have shown that the optimal convergence rate of estimating R(β,γ) is

$$\frac{1}{\eta_0}\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{1}{\eta_0^2}\cdot\frac{k\log p}{n}.$$

In contrast to estimating $I(\beta,\gamma)$, $\|\beta\|_2^2$ and $\|\gamma\|_2^2$, the optimal rate scales with the inverse of the lower bound on the signal strength, $1/\eta_0$. The estimators $\hat I(\beta,\gamma)$, $\hat Q(\beta)$, $\hat Q(\gamma)$ and $\hat R(\beta,\gamma)$ proposed in Section 2 are shown to adaptively achieve the optimal rates for estimating $I(\beta,\gamma)$, $Q(\beta)$, $Q(\gamma)$ and $R(\beta,\gamma)$, respectively.

1.4. Notation and Definitions

Basic notation and definitions used in the rest of the paper are collected here. For a matrix $X\in\mathbb{R}^{n\times p}$, $X_{i\cdot}$, $X_{\cdot j}$, and $X_{i,j}$ denote respectively the $i$-th row, $j$-th column, and $(i,j)$-th entry of $X$; $X_{i,-j}$ denotes the $i$-th row of $X$ excluding the $j$-th coordinate, and $X_{\cdot,-j}$ denotes the sub-matrix of $X$ excluding the $j$-th column. Let $[p]=\{1,2,\dots,p\}$. For a subset $J\subset[p]$, $X_J$ denotes the sub-matrix of $X$ consisting of the columns $X_{\cdot j}$ with $j\in J$; for a vector $x\in\mathbb{R}^p$, $x_J$ is the sub-vector of $x$ with indices in $J$ and $x_{-J}$ is the sub-vector with indices in $J^c$. For a vector $x\in\mathbb{R}^p$, the $\ell_q$ norm of $x$ is defined as $\|x\|_q=\left(\sum_{i=1}^{p}|x_i|^q\right)^{1/q}$ for $q\ge 1$, with $\|x\|_0$ denoting the number of non-zero elements of $x$ and $\|x\|_\infty=\max_{1\le j\le p}|x_j|$. For a matrix $A$ and $1\le q\le\infty$, $\|A\|_q=\sup_{\|x\|_q=1}\|Ax\|_q$ is the matrix $\ell_q$ operator norm; in particular, $\|A\|_2$ is the spectral norm. For a symmetric matrix $A$, $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ denote respectively its smallest and largest eigenvalues. For a set $S$, $|S|$ denotes its cardinality. For $a\in\mathbb{R}$, $a_+=\max\{a,0\}$ and $\mathrm{sign}(a)$ is the sign of $a$, i.e., $\mathrm{sign}(a)=1$ if $a>0$, $\mathrm{sign}(a)=-1$ if $a<0$ and $\mathrm{sign}(0)=0$. Define the sub-Gaussian norm $\|x\|_{\psi_2}$ of a random vector $x\in\mathbb{R}^p$ as $\|x\|_{\psi_2}=\sup_{v\in S^{p-1}}\sup_{q\ge1}q^{-1/2}\left(\mathbb{E}|v^\top x|^q\right)^{1/q}$, where $S^{p-1}$ is the unit sphere in $\mathbb{R}^p$. The random vector $x\in\mathbb{R}^p$ is said to be sub-Gaussian if its sub-Gaussian norm is bounded; see Vershynin (2012) for more on sub-Gaussian random variables. For the design matrices $X\in\mathbb{R}^{n_1\times p}$ and $Z\in\mathbb{R}^{n_2\times p}$, we define the corresponding sample covariance matrices as $\hat\Sigma=X^\top X/n_1$ and $\hat\Gamma=Z^\top Z/n_2$. Let $z_{\alpha/2}$ denote the upper $\alpha/2$ quantile of the standard normal distribution. For two positive sequences $a_n$ and $b_n$, $a_n\lesssim b_n$ means $a_n\le Cb_n$ for all $n$, $a_n\gtrsim b_n$ if $b_n\lesssim a_n$, and $a_n\asymp b_n$ if $a_n\lesssim b_n$ and $b_n\lesssim a_n$. $c$ and $C$ are used to denote generic positive constants that may vary from place to place. For any two sequences $a_n$ and $b_n$, we write $b_n\ll a_n$ if $\limsup b_n/a_n=0$.

1.5. Organization of the Paper

The rest of the paper is organized as follows. Section 2 presents the procedures for estimating $I(\beta,\gamma)$, $Q(\beta)$, $Q(\gamma)$, and $R(\beta,\gamma)$ in detail. In Section 3, minimax convergence rates for the estimation problems are established and the proposed estimators are shown to attain the optimal rates. In Section 4, simulation studies are conducted to evaluate the empirical performance of FDEs. A yeast cross data set is used to illustrate the estimators in Section 5. A discussion is provided in Section 6. The proofs of the main theorems are presented in Section 7. The remaining proofs and extended simulation studies are given in the supplementary materials.

2. Estimation Methods

2.1. Estimation of I(β,γ)

Since the inner product I(β,γ) is of significant interest in its own right, we first consider the estimation of I(β,γ)=β,γ. The scaled Lasso estimators for high-dimensional linear model (1) are defined through the following optimization algorithm (Sun and Zhang, 2012),

$$\{\hat\beta,\hat\sigma_1\}=\operatorname*{arg\,min}_{\beta\in\mathbb{R}^p,\,\sigma_1\in\mathbb{R}^+}\ \frac{\|y-X\beta\|_2^2}{2n_1\sigma_1}+\frac{\sigma_1}{2}+\frac{\lambda_0}{\sqrt{n_1}}\sum_{j=1}^{p}\frac{\|X_{\cdot j}\|_2}{\sqrt{n_1}}|\beta_j|,$$ (4)

and

$$\{\hat\gamma,\hat\sigma_2\}=\operatorname*{arg\,min}_{\gamma\in\mathbb{R}^p,\,\sigma_2\in\mathbb{R}^+}\ \frac{\|w-Z\gamma\|_2^2}{2n_2\sigma_2}+\frac{\sigma_2}{2}+\frac{\lambda_0}{\sqrt{n_2}}\sum_{j=1}^{p}\frac{\|Z_{\cdot j}\|_2}{\sqrt{n_2}}|\gamma_j|,$$ (5)

where $\lambda_0=\sqrt{2.01\log p}$.
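The scaled Lasso can be computed by alternating between a Lasso fit at a fixed noise level and a noise-level update (Sun and Zhang, 2012). The following Python sketch, assuming scikit-learn is available and the columns of X are standardized, illustrates this iteration for (4); it is an illustration only, not the paper's own implementation (which uses the equivalent square-root Lasso; see Section 4).

import numpy as np
from sklearn.linear_model import Lasso

def scaled_lasso(X, y, lam0, n_iter=20, tol=1e-6):
    # lam0 plays the role of lambda_0 = sqrt(2.01 * log p) in display (4)
    n = X.shape[0]
    sigma = np.std(y)  # initial guess for the noise level
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # For fixed sigma (and standardized columns), (4) reduces to a Lasso
        # with penalty level sigma * lam0 / sqrt(n); sklearn minimizes
        # ||y - X b||_2^2 / (2n) + alpha * ||b||_1.
        fit = Lasso(alpha=sigma * lam0 / np.sqrt(n), fit_intercept=False)
        fit.fit(X, y)
        beta = fit.coef_
        sigma_new = np.linalg.norm(y - X @ beta) / np.sqrt(n)
        if abs(sigma_new - sigma) < tol:
            sigma = sigma_new
            break
        sigma = sigma_new
    return beta, sigma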

To construct an optimal estimator of $I(\beta,\gamma)$, it is helpful to analyze the error of the plug-in estimator $\langle\hat\beta,\hat\gamma\rangle$,

$$\langle\hat\beta,\hat\gamma\rangle-\langle\beta,\gamma\rangle=\langle\hat\gamma,\hat\beta-\beta\rangle+\langle\hat\beta,\hat\gamma-\gamma\rangle-\langle\hat\beta-\beta,\hat\gamma-\gamma\rangle.$$ (6)

The last term on the right-hand side, $\langle\hat\beta-\beta,\hat\gamma-\gamma\rangle$, is "small", but the first two terms, $\langle\hat\gamma,\hat\beta-\beta\rangle$ and $\langle\hat\beta,\hat\gamma-\gamma\rangle$, can be large. This motivates the proposed estimator: we first estimate these two terms and then subtract the estimates from $\langle\hat\beta,\hat\gamma\rangle$ to obtain the final estimator of $I(\beta,\gamma)$.

The intuition for estimating $\langle\hat\gamma,\beta-\hat\beta\rangle$ is given first. Since

$$\frac{1}{n_1}X^\top(y-X\hat\beta)=\hat\Sigma(\beta-\hat\beta)+\frac{1}{n_1}X^\top\epsilon,$$ (7)

multiplying both sides of (7) by a vector $u\in\mathbb{R}^p$ yields

$$\frac{1}{n_1}u^\top X^\top(y-X\hat\beta)=u^\top\hat\Sigma(\beta-\hat\beta)+\frac{1}{n_1}u^\top X^\top\epsilon,$$ (8)

which can be written as

$$\frac{1}{n_1}u^\top X^\top(y-X\hat\beta)-\langle\hat\gamma,\beta-\hat\beta\rangle=(\hat\Sigma u-\hat\gamma)^\top(\beta-\hat\beta)+\frac{1}{n_1}u^\top X^\top\epsilon.$$ (9)

If the vector $u$ can be chosen such that the right-hand side of (9) is "small", then $u^\top X^\top(y-X\hat\beta)/n_1$ is a good estimator of $\langle\hat\gamma,\beta-\hat\beta\rangle$. Since the first term on the right-hand side of (9) is upper bounded as $|(\hat\Sigma u-\hat\gamma)^\top(\beta-\hat\beta)|\le\|\hat\Sigma u-\hat\gamma\|_\infty\|\hat\beta-\beta\|_1$, we control the right-hand side of (9) by constructing a projection vector $u$ such that $\|\hat\Sigma u-\hat\gamma\|_\infty$ is constrained, while the second term $u^\top X^\top\epsilon/n_1$ is controlled by minimizing its variance $\sigma_1^2u^\top\hat\Sigma u/n_1$. This leads to the following convex optimization problem for identifying the projection vector $u$ for estimating $\langle\hat\gamma,\beta-\hat\beta\rangle$,

$$\hat u_1=\operatorname*{arg\,min}_{u\in\mathbb{R}^p}\left\{u^\top\hat\Sigma u:\ \|\hat\Sigma u-\hat\gamma\|_\infty\le\|\hat\gamma\|_2\frac{\lambda_1}{\sqrt{n_1}}\right\},$$ (10)

where $\lambda_1=12\lambda_{\max}^2(\Sigma)\sqrt{\log p}$.

Remark 1.

The solution to the above optimization problem might not be unique, and $\hat u_1$ is defined as any minimizer of (10). The theory established in Section 3 holds for any minimizer of (10). The optimization problem (10) is solved through its equivalent Lagrange dual problem, which is computationally efficient and scales well to high dimensions. See Step 2 in Table 1 for more details.
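For concreteness, the constrained program (10) can also be handed directly to a generic convex solver. The following is a minimal Python sketch, assuming cvxpy is installed; the paper itself solves the equivalent Lagrange dual (Table 1, Step 2), so this primal form is an illustration rather than the authors' implementation.

import numpy as np
import cvxpy as cp

def projection_direction(X, target, lam1):
    # Solve (10): minimize u' Sigma_hat u subject to
    # ||Sigma_hat u - target||_inf <= ||target||_2 * lam1 / sqrt(n)
    n, p = X.shape
    Sigma_hat = X.T @ X / n
    u = cp.Variable(p)
    mu = np.linalg.norm(target) * lam1 / np.sqrt(n)
    constraints = [cp.norm(Sigma_hat @ u - target, 'inf') <= mu]
    # sum_squares(X u)/n equals u' Sigma_hat u and is DCP-friendly
    problem = cp.Problem(cp.Minimize(cp.sum_squares(X @ u) / n), constraints)
    problem.solve()
    return u.value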

Table 1:

FDE algorithm without sample splitting for estimating the inner product, quadratic functionals and the normalized inner product.

Input: design matrices X, Z; response vectors y, w; tuning parameters $\lambda_0$, $\lambda$.
Output: $\hat I(\beta,\gamma)$, $\hat Q(\beta)$, $\hat Q(\gamma)$ and $\hat R(\beta,\gamma)$.

Initial Lasso estimators:
1. Scaled Lasso: Calculate $\hat\beta$ and $\hat\gamma$ from (4) and (5) with the tuning parameter $\lambda_0$.
Inner product calculation:
2. Projection vector $\hat u_1$: Calculate $\hat u_1=\operatorname{arg\,min}_u\ u^\top X^\top Xu/(4n_1)+u^\top\hat\gamma+\lambda_t\|u\|_1$, where $\lambda_t=\lambda_{t-1}/1.5$ and $\lambda_0=\lambda/\sqrt{n_1}$. Repeat with $\lambda_t$ replaced by $\lambda_{t+1}$ until $\hat u_1$ cannot be solved, or $t\ge10$.
3. Projection vector $\hat u_2$: Calculate $\hat u_2=\operatorname{arg\,min}_u\ u^\top Z^\top Zu/(4n_2)+u^\top\hat\beta+\lambda_t\|u\|_1$, where $\lambda_t=\lambda_{t-1}/1.5$ and $\lambda_0=\lambda/\sqrt{n_2}$. Repeat with $\lambda_t$ replaced by $\lambda_{t+1}$ until $\hat u_2$ cannot be solved, or $t\ge10$.
4. Correction: $\hat I(\beta,\gamma)=\langle\hat\beta,\hat\gamma\rangle+\hat u_1^\top X^\top(y-X\hat\beta)/n_1+\hat u_2^\top Z^\top(w-Z\hat\gamma)/n_2$.
Quadratic functional calculation:
5. Projection vector $\hat u_3$: Calculate $\hat u_3=\operatorname{arg\,min}_u\ u^\top X^\top Xu/(4n_1)+u^\top\hat\beta+\lambda_t\|u\|_1$, where $\lambda_t=\lambda_{t-1}/1.5$ and $\lambda_0=\lambda/\sqrt{n_1}$. Repeat with $\lambda_t$ replaced by $\lambda_{t+1}$ until $\hat u_3$ cannot be solved, or $t\ge10$.
6. Projection vector $\hat u_4$: Calculate $\hat u_4=\operatorname{arg\,min}_u\ u^\top Z^\top Zu/(4n_2)+u^\top\hat\gamma+\lambda_t\|u\|_1$, where $\lambda_t=\lambda_{t-1}/1.5$ and $\lambda_0=\lambda/\sqrt{n_2}$. Repeat with $\lambda_t$ replaced by $\lambda_{t+1}$ until $\hat u_4$ cannot be solved, or $t\ge10$.
7. Correction: $\hat Q(\beta)=\left(\|\hat\beta\|_2^2+2\hat u_3^\top X^\top(y-X\hat\beta)/n_1\right)_+$,
      $\hat Q(\gamma)=\left(\|\hat\gamma\|_2^2+2\hat u_4^\top Z^\top(w-Z\hat\gamma)/n_2\right)_+$.
Ratio calculation:
8. $\hat R(\beta,\gamma)=\mathrm{sign}(\hat I(\beta,\gamma))\min\left\{\frac{|\hat I(\beta,\gamma)|}{\sqrt{\hat Q(\beta)\hat Q(\gamma)}}\mathbf{1}\{\hat Q(\beta)\hat Q(\gamma)>0\},\ 1\right\}$.

Once the projection vector $\hat u_1$ is obtained, $\langle\hat\gamma,\beta-\hat\beta\rangle$ is estimated by $\hat u_1^\top X^\top(y-X\hat\beta)/n_1$. Similarly, the projection vector for estimating $\langle\hat\beta,\gamma-\hat\gamma\rangle$ can be obtained via the convex program

$$\hat u_2=\operatorname*{arg\,min}_{u\in\mathbb{R}^p}\left\{u^\top\hat\Gamma u:\ \|\hat\Gamma u-\hat\beta\|_\infty\le\|\hat\beta\|_2\frac{\lambda_2}{\sqrt{n_2}}\right\},$$ (11)

where $\lambda_2=12\lambda_{\max}^2(\Gamma)\sqrt{\log p}$. Then $\langle\hat\beta,\gamma-\hat\gamma\rangle$ is estimated by $\hat u_2^\top Z^\top(w-Z\hat\gamma)/n_2$.

The final estimator $\hat I(\beta,\gamma)$ of $I(\beta,\gamma)$ is given by

$$\hat I(\beta,\gamma)=\langle\hat\beta,\hat\gamma\rangle+\hat u_1^\top\frac{1}{n_1}X^\top(y-X\hat\beta)+\hat u_2^\top\frac{1}{n_2}Z^\top(w-Z\hat\gamma).$$ (12)

It is clear from the above discussion that the key idea in the construction of the final estimator $\hat I(\beta,\gamma)$ is to identify the projection vectors $\hat u_1$ and $\hat u_2$ such that $\langle\hat\gamma,\beta-\hat\beta\rangle$ and $\langle\hat\beta,\gamma-\hat\gamma\rangle$ are well approximated. It will be shown in Section 3 that the estimator $\hat I(\beta,\gamma)$ is adaptively minimax rate-optimal.
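Putting the pieces together, a minimal sketch of the estimator (12) reads as follows; scaled_lasso and projection_direction are the hypothetical helpers sketched above.

import numpy as np

def fde_inner_product(X, y, Z, w, lam0, lam1, lam2):
    n1, n2 = X.shape[0], Z.shape[0]
    # Step 1: initial scaled Lasso fits, displays (4) and (5)
    beta_hat, _ = scaled_lasso(X, y, lam0)
    gamma_hat, _ = scaled_lasso(Z, w, lam0)
    # Step 2: projection directions, displays (10) and (11)
    u1 = projection_direction(X, gamma_hat, lam1)
    u2 = projection_direction(Z, beta_hat, lam2)
    # Step 3: bias-corrected plug-in estimator, display (12)
    return (beta_hat @ gamma_hat
            + u1 @ X.T @ (y - X @ beta_hat) / n1
            + u2 @ Z.T @ (w - Z @ gamma_hat) / n2)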

Remark 2.

As mentioned, simply plugging in the Lasso, scaled Lasso, or de-biased estimator does not lead to a good estimator of $I(\beta,\gamma)$. Another natural approach is to first threshold the de-biased estimator to obtain a sparse estimator of the coefficient vectors (see Zhang and Zhang (2014, Section 3.3) and Guo et al. (2016, equation (10)) for details) and then plug in this thresholded estimator; we refer to this as the thresholded estimator. Simulations in Section 4 demonstrate that the proposed estimator defined in (12) outperforms all three plug-in estimators based on the scaled Lasso, de-biased, and thresholded estimators.

2.2. Estimation of Q(β) and Q(γ)

In order to estimate the normalized inner product $R(\beta,\gamma)$, it is necessary to estimate the quadratic functionals $Q(\beta)=\|\beta\|_2^2$ and $Q(\gamma)=\|\gamma\|_2^2$. To this end, we randomly split the data $(y,X)$ into two subsamples $(y^{(1)},X^{(1)})$ and $(y^{(2)},X^{(2)})$, each with sample size $n_1/2$, and the data $(w,Z)$ into two subsamples $(w^{(1)},Z^{(1)})$ and $(w^{(2)},Z^{(2)})$, each with sample size $n_2/2$.

With a slight abuse of notation, let $\hat\beta$ and $\hat\gamma$ denote the optimizers of the scaled Lasso algorithm (4) applied to $(y^{(1)},X^{(1)})$ and (5) applied to $(w^{(1)},Z^{(1)})$, respectively, with the sample sizes $n_1$ and $n_2$ replaced by $n_1/2$ and $n_2/2$. Again, the simple plug-in estimator $\|\hat\beta\|_2^2$ of $Q(\beta)$ is not a good estimator of $\|\beta\|_2^2$ because of the following error decomposition,

$$\|\hat\beta\|_2^2-\|\beta\|_2^2=2\langle\hat\beta,\hat\beta-\beta\rangle-\|\hat\beta-\beta\|_2^2,$$ (13)

where the second term on the right-hand side of (13) is "small", but the first can be large. Accordingly, the term $2\langle\hat\beta,\beta-\hat\beta\rangle$ is estimated first and then added to $\|\hat\beta\|_2^2$ to obtain the final estimator of $\|\beta\|_2^2$. To estimate $\langle\hat\beta,\beta-\hat\beta\rangle$, a projection vector $u$ is identified such that the following difference is controlled,

$$\frac{1}{n_1/2}u^\top(X^{(2)})^\top(y^{(2)}-X^{(2)}\hat\beta)-\langle\hat\beta,\beta-\hat\beta\rangle=(\hat\Sigma^{(2)}u-\hat\beta)^\top(\beta-\hat\beta)+\frac{1}{n_1/2}u^\top(X^{(2)})^\top\epsilon^{(2)},$$ (14)

with $\hat\Sigma^{(2)}=(X^{(2)})^\top X^{(2)}/(n_1/2)$. Define the projection vector $\hat u_3$ as the solution to the following optimization problem

$$\hat u_3=\operatorname*{arg\,min}_{u\in\mathbb{R}^p}\left\{u^\top\hat\Sigma^{(2)}u:\ \|\hat\Sigma^{(2)}u-\hat\beta\|_\infty\le\|\hat\beta\|_2\frac{\lambda_1}{\sqrt{n_1/2}}\right\},$$ (15)

where $\lambda_1=12\lambda_{\max}^2(\Sigma)\sqrt{\log p}$. We then estimate $\langle\hat\beta,\beta-\hat\beta\rangle$ by $\hat u_3^\top(X^{(2)})^\top(y^{(2)}-X^{(2)}\hat\beta)/(n_1/2)$ and propose the final estimator of $\|\beta\|_2^2$ as

$$\hat Q(\beta)=\left(\|\hat\beta\|_2^2+2\hat u_3^\top\frac{1}{n_1/2}(X^{(2)})^\top(y^{(2)}-X^{(2)}\hat\beta)\right)_+.$$ (16)

Similarly, the estimator of $\|\gamma\|_2^2$ is given by

$$\hat Q(\gamma)=\left(\|\hat\gamma\|_2^2+2\hat u_4^\top\frac{1}{n_2/2}(Z^{(2)})^\top(w^{(2)}-Z^{(2)}\hat\gamma)\right)_+,$$ (17)

where

$$\hat u_4=\operatorname*{arg\,min}_{u\in\mathbb{R}^p}\left\{u^\top\hat\Gamma^{(2)}u:\ \|\hat\Gamma^{(2)}u-\hat\gamma\|_\infty\le\|\hat\gamma\|_2\frac{\lambda_2}{\sqrt{n_2/2}}\right\},$$ (18)

with $\hat\Gamma^{(2)}=(Z^{(2)})^\top Z^{(2)}/(n_2/2)$ and $\lambda_2=12\lambda_{\max}^2(\Gamma)\sqrt{\log p}$.

Remark 3.

Sample splitting is used here for the purpose of the theoretical analysis. In the simulation studies (Section 4), the performance of the proposed estimator without sample splitting is also investigated; see Steps 5–7 in Table 1. The proposed estimator without sample splitting performs even better numerically than the one with sample splitting, since more observations are used in constructing the initial estimators $\|\hat\beta\|_2^2$ and $\|\hat\gamma\|_2^2$ and the projection vectors $\hat u_3$ and $\hat u_4$.
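The sample-splitting construction (15)-(16) can be sketched in a few lines, again reusing the hypothetical scaled_lasso and projection_direction helpers from above.

import numpy as np

def fde_quadratic(X, y, lam0, lam1, rng=None):
    # Estimator (16) of Q(beta) = ||beta||_2^2 with sample splitting
    rng = np.random.default_rng(0) if rng is None else rng
    n = X.shape[0]
    idx = rng.permutation(n)
    half1, half2 = idx[: n // 2], idx[n // 2 :]
    # Initial scaled Lasso fit on the first half of the data
    beta_hat, _ = scaled_lasso(X[half1], y[half1], lam0)
    # Projection direction (15) computed from the second half
    X2, y2 = X[half2], y[half2]
    u3 = projection_direction(X2, beta_hat, lam1)
    # Bias correction and truncation at zero, display (16)
    correction = 2 * u3 @ X2.T @ (y2 - X2 @ beta_hat) / len(half2)
    return max(float(beta_hat @ beta_hat + correction), 0.0)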

2.3. Estimation of R(β,γ)

Given the estimators I^(β,γ), Q^(β) and Q^(γ) constructed in Sections 2.1 and 2.2, a natural estimator for the normalized inner product R(β,γ) is given by

$$\hat R(\beta,\gamma)=\mathrm{sign}(\hat I(\beta,\gamma))\min\left\{\frac{|\hat I(\beta,\gamma)|}{\sqrt{\hat Q(\beta)\hat Q(\gamma)}}\mathbf{1}\{\hat Q(\beta)\hat Q(\gamma)>0\},\ 1\right\},$$ (19)

where $\hat I(\beta,\gamma)$, $\hat Q(\beta)$ and $\hat Q(\gamma)$ are the estimators of $\langle\beta,\gamma\rangle$, $\|\beta\|_2^2$ and $\|\gamma\|_2^2$ defined in (12), (16) and (17), respectively. It is possible that one of $\hat Q(\beta)$ and $\hat Q(\gamma)$ is 0 when $\|\beta\|_2^2$ or $\|\gamma\|_2^2$ is close to zero; in this case, the normalized inner product $R(\beta,\gamma)$ is estimated as 0. Since $R(\beta,\gamma)$ always lies between $-1$ and $1$, the estimator $\hat R(\beta,\gamma)$ is truncated to ensure that it stays within this range. The FDE algorithm without sample splitting for calculating the estimators $\hat I(\beta,\gamma)$, $\hat Q(\beta)$, $\hat Q(\gamma)$, and $\hat R(\beta,\gamma)$ is detailed in Table 1.
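The final ratio step (19) is a direct transcription; a minimal sketch:

import numpy as np

def fde_correlation(I_hat, Q_beta, Q_gamma):
    # Truncated ratio estimator, display (19)
    denom = np.sqrt(Q_beta * Q_gamma)
    if denom == 0:
        return 0.0  # one of the quadratic functionals is estimated as zero
    return float(np.sign(I_hat) * min(abs(I_hat) / denom, 1.0))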

3. Theoretical Analysis

3.1. Upper Bound Analysis

The sample sizes $n_1$ and $n_2$ are assumed to be of the same order, that is, $n_1\asymp n_2$. Let $n=\min\{n_1,n_2\}$ be the smaller of the two sample sizes. The following assumptions are introduced to facilitate the theoretical analysis.

(A1) The population covariance matrices $\Sigma$ and $\Gamma$ satisfy $1/M_1\le\lambda_{\min}(\Sigma)\le\lambda_{\max}(\Sigma)\le M_1$ and $1/M_1\le\lambda_{\min}(\Gamma)\le\lambda_{\max}(\Gamma)\le M_1$, where $M_1\ge1$ is a positive constant. The random design matrix $X$ is assumed to be independent of the other random design matrix $Z$. The noise levels $\sigma_1$ and $\sigma_2$ satisfy $\max\{\sigma_1,\sigma_2\}\le M_2$, where $M_2>0$ is a positive constant.

(A2) The $\ell_2$ norms of the coefficient vectors $\beta$ and $\gamma$ are bounded away from zero in the sense that

$$\min\{\|\beta\|_2,\|\gamma\|_2\}\ge\eta_0\ge C\sqrt{\frac{k\log p}{n}},\quad\text{where}\quad k=\max\{\|\beta\|_0,\|\gamma\|_0\}.$$ (20)

Assumption (A1) places a condition on the spectra of the covariance matrices $\Sigma$ and $\Gamma$ and an upper bound on the noise levels $\sigma_1$ and $\sigma_2$. Assumption (A2) requires the total signal strength to be bounded away from zero by $\eta_0$; it is only used in the upper bound analysis for the normalized inner product $R(\beta,\gamma)$.

The following theorem establishes the convergence rates of the estimators I^(β,γ), Q^(β), and Q^(γ), proposed in (12), (16) and (17), respectively.

Theorem 1.

Suppose that assumption (A1) holds and $k=\max\{\|\beta\|_0,\|\gamma\|_0\}\le cn/\log p$ for some $c>0$. Then for any fixed constant $0<\alpha<1/4$, with probability at least $1-4\alpha-p^{-c_0}$, we have

$$|\hat I(\beta,\gamma)-I(\beta,\gamma)|\lesssim(\|\beta\|_2+\|\gamma\|_2)\left(\frac{z_{\alpha/2}}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n},$$ (21)
$$|\hat Q(\beta)-Q(\beta)|\lesssim\|\beta\|_2\left(\frac{z_{\alpha/2}}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n},$$ (22)
$$|\hat Q(\gamma)-Q(\gamma)|\lesssim\|\gamma\|_2\left(\frac{z_{\alpha/2}}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n},$$ (23)

where c0 is a positive constant.

The upper bound for estimating $\langle\beta,\gamma\rangle$ depends not only on $k$, $n$ and $p$, but also scales with the signal strengths $\|\beta\|_2$ and $\|\gamma\|_2$. For the estimation of the quadratic functional $Q(\beta)$ (or $Q(\gamma)$), the convergence rate depends on $\|\beta\|_2$ (or $\|\gamma\|_2$). The following theorem establishes the convergence rate of the estimator $\hat R(\beta,\gamma)$ proposed in (19).

Theorem 2.

Suppose that assumptions (A1) and (A2) hold and $k=\max\{\|\beta\|_0,\|\gamma\|_0\}\le cn/\log p$ for some $c>0$. Then for any fixed constant $0<\alpha<1/4$, with probability at least $1-4\alpha-p^{-c_0}$, we have

$$|\hat R(\beta,\gamma)-R(\beta,\gamma)|\lesssim\frac{1}{\eta_0}\left(\frac{z_{\alpha/2}}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{1}{\eta_0^2}\cdot\frac{k\log p}{n},$$ (24)

where c0 is a positive constant.

In contrast to Theorem 1, Theorem 2 requires the extra assumption (A2) on the signal strengths $\|\beta\|_2$ and $\|\gamma\|_2$. The convergence rate for estimating $R(\beta,\gamma)$ scales with the inverse of the signal strength, $1/\eta_0$. This is different from the error bounds in Theorem 1, where the estimation accuracy scales with the signal strength. The lower bound results established in Theorem 3 demonstrate the necessity of assumption (A2) for the estimation of $R(\beta,\gamma)$.

3.2. Minimax Lower Bounds

This section establishes the minimax lower bounds of estimating I(β,γ), Q(β), Q(γ) and R(β,γ). We first introduce parameter spaces for θ=(β,Σ,σ1,γ,Γ,σ2), which is defined as the product of parameter spaces for θβ=(β,Σ,σ1) and θγ=(γ,Γ,σ2). We define the following parameter space for both θβ=(β,Σ,σ1) and θγ=(γ,Γ,σ2),

$$\mathcal{G}(k,M_0)=\left\{(\beta,\Sigma,\sigma_1):\ \|\beta\|_0\le k,\ \|\beta\|_2\le M_0,\ \frac{1}{M_1}\le\lambda_{\min}(\Sigma)\le\lambda_{\max}(\Sigma)\le M_1,\ \sigma_1\le M_2\right\},$$ (25)

where $M_1\ge1$ and $M_2>0$ are positive constants. The parameter space defined in (25) requires that the signal $\beta$ contain at most $k$ non-zero coefficients and that the $\ell_2$ norm $\|\beta\|_2$ be upper bounded by $M_0$, where $M_0$ is allowed to grow with $n$ and $p$. The lower bound results in Theorem 3 show that the difficulty of estimating $I(\beta,\gamma)$, $Q(\beta)$ and $Q(\gamma)$ depends on $M_0$. The other conditions, $1/M_1\le\lambda_{\min}(\Sigma)\le\lambda_{\max}(\Sigma)\le M_1$ and $\sigma_1\le M_2$, are regularity conditions. Based on the definition (25), the parameter space for $(\beta,\Sigma,\sigma_1,\gamma,\Gamma,\sigma_2)$ is defined as the product of two parameter spaces,

$$\Theta(k,M_0)=\left\{\theta=(\beta,\Sigma,\sigma_1,\gamma,\Gamma,\sigma_2):\ (\beta,\Sigma,\sigma_1)\in\mathcal{G}(k,M_0),\ (\gamma,\Gamma,\sigma_2)\in\mathcal{G}(k,M_0)\right\}.$$ (26)

For establishing the optimal bounds for $R(\beta,\gamma)$, we define the following parameter space

$$\Theta(k,\eta_0)=\left\{\theta=(\beta,\Sigma,\sigma_1,\gamma,\Gamma,\sigma_2):\ (\beta,\Sigma,\sigma_1)\in\mathcal{G}(k,\eta_0),\ (\gamma,\Gamma,\sigma_2)\in\mathcal{G}(k,\eta_0)\right\},$$ (27)

where

$$\mathcal{G}(k,\eta_0)=\left\{(\beta,\Sigma,\sigma_1):\ \|\beta\|_0\le k,\ \|\beta\|_2\ge\eta_0,\ \frac{1}{M_1}\le\lambda_{\min}(\Sigma)\le\lambda_{\max}(\Sigma)\le M_1,\ \sigma_1\le M_2\right\},$$

with $\eta_0\ge0$. In contrast to the parameter space $\mathcal{G}(k,M_0)$, where $\|\beta\|_2$ is upper bounded by $M_0$, the parameter space $\mathcal{G}(k,\eta_0)$ requires the signal strength $\|\beta\|_2$ to be lower bounded by $\eta_0$, where $\eta_0$ is allowed to grow with $n$ and $p$. The lower bound in Theorem 3 shows that the difficulty of estimating $R(\beta,\gamma)$ depends on $\eta_0$.

The following theorem establishes the minimax lower bounds for the convergence rates of estimating the inner product I(β,γ), the quadratic functionals Q(β) and Q(γ) and the normalized inner product R(β,γ).

Theorem 3.

Suppose $k\le c\min\{n/\log p,\ p^{\nu}\}$ for some constants $c>0$ and $0\le\nu<\frac12$. Then

$$\inf_{\tilde I}\sup_{\theta\in\Theta(k,M_0)}\mathbb{P}_\theta\left(|\tilde I-I(\beta,\gamma)|\ge c\min\left\{M_0\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n},\ M_0^2\right\}\right)\ge\frac14,$$ (28)
$$\inf_{\tilde Q}\sup_{\theta_\beta\in\mathcal{G}(k,M_0)}\mathbb{P}_{\theta_\beta}\left(|\tilde Q-Q(\beta)|\ge c\min\left\{M_0\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n},\ M_0^2\right\}\right)\ge\frac14,$$ (29)
$$\inf_{\tilde Q}\sup_{\theta_\gamma\in\mathcal{G}(k,M_0)}\mathbb{P}_{\theta_\gamma}\left(|\tilde Q-Q(\gamma)|\ge c\min\left\{M_0\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n},\ M_0^2\right\}\right)\ge\frac14,$$ (30)
$$\inf_{\tilde R}\sup_{\theta\in\Theta(k,\eta_0)}\mathbb{P}_\theta\left(|\tilde R-R(\beta,\gamma)|\ge c\min\left\{\frac{1}{\eta_0}\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{1}{\eta_0^2}\cdot\frac{k\log p}{n},\ 1\right\}\right)\ge\frac14.$$ (31)

Remark 4.

Estimation of quadratic functionals has been extensively studied in the classical Gaussian sequence model; see, for example, Donoho and Nussbaum (1990); Efromovich and Low (1996); Laurent and Massart (2000); Cai and Low (2005, 2006); Collier et al. (2015). In the regime $k\le c\min\{n/\log p,p^{\nu}\}$ for some constants $c>0$ and $0<\nu<\frac12$, Theorem 2 of Collier et al. (2015) gives a lower bound, $\min\{M_0/\sqrt n+k\log p/n,\ M_0^2\}$, for estimating $\|\beta\|_2^2$ in the sequence model. In contrast, an extra term $M_0\,k\log p/n$ appears in the lower bound given in (29) for estimating $\|\beta\|_2^2$ in high-dimensional linear regression. One intuitive reason for this extra term is that high-dimensional linear regression involves an additional inversion step compared with the Gaussian sequence model. Estimation of the quadratic functional $\|\beta\|_2^2$ in high-dimensional linear regression is thus fundamentally harder than in the Gaussian sequence model. For high-dimensional linear regression, the term $k\log p/n$ in the lower bound (29) can also be established by the general lower bound techniques developed in Cai and Guo (2017a); see Section 8 of Cai and Guo (2017a) for details.

3.3. Optimality of FDEs

In this section, we establish the optimality of FDEs by combining Theorems 1 and 2 over the parameter spaces Θ(k,M0) and Θ(k,η0) defined in (26) and (27), respectively.

Corollary 1.

Suppose $k\le c\min\{n/\log p,\ p^{\nu}\}$ and $M_0\ge\eta_0\ge C\sqrt{k\log p/n}$ for some constants $C,c>0$ and $0\le\nu<\frac12$. Then

$$\inf_{\theta\in\Theta(k,M_0)}\mathbb{P}_\theta\left(|\hat I-I(\beta,\gamma)|\le C\left(M_0\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n}\right)\right)\ge1-4\alpha-p^{-c_0},$$ (32)
$$\inf_{\theta_\beta\in\mathcal{G}(k,M_0)}\mathbb{P}_{\theta_\beta}\left(|\hat Q-Q(\beta)|\le C\left(M_0\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n}\right)\right)\ge1-4\alpha-p^{-c_0},$$ (33)
$$\inf_{\theta_\gamma\in\mathcal{G}(k,M_0)}\mathbb{P}_{\theta_\gamma}\left(|\hat Q-Q(\gamma)|\le C\left(M_0\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{k\log p}{n}\right)\right)\ge1-4\alpha-p^{-c_0},$$ (34)
$$\inf_{\theta\in\Theta(k,\eta_0)}\mathbb{P}_\theta\left(|\hat R-R(\beta,\gamma)|\le C\left(\frac{1}{\eta_0}\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{1}{\eta_0^2}\cdot\frac{k\log p}{n}\right)\right)\ge1-4\alpha-p^{-c_0},$$ (35)

where 0<α<1/4 and c0 is a positive constant.

Combined with Theorem 3, Corollary 1 implies that, for $M_0\ge C\sqrt{k\log p/n}$, the estimators $\hat I(\beta,\gamma)$, $\hat Q(\beta)$ and $\hat Q(\gamma)$ proposed in (12), (16) and (17) achieve the minimax lower bounds (28), (29) and (30) within a constant factor; that is, the FDEs are minimax rate-optimal. On the other hand, if $M_0\ll\sqrt{k\log p/n}$, estimation of $I(\beta,\gamma)$, $Q(\beta)$ and $Q(\gamma)$ is uninteresting, as the trivial estimator 0 achieves the minimax lower bound in this case. For the estimation of $R(\beta,\gamma)$, under the assumption $\eta_0\ge C\sqrt{k\log p/n}$, Corollary 1 shows that the estimator $\hat R(\beta,\gamma)$ given in (19) achieves the minimax lower bound $\frac{1}{\eta_0}\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right)+\frac{1}{\eta_0^2}\cdot\frac{k\log p}{n}$ in (31). Hence $\hat R(\beta,\gamma)$ is a rate-optimal estimator of $R(\beta,\gamma)$ under assumption (A2). When $\eta_0\ll\sqrt{k\log p/n}$, estimation of $R(\beta,\gamma)$ becomes trivial, as the simple estimator 0 attains the minimax lower bound. This demonstrates the necessity of assumption (A2) in Theorem 2.

4. Simulation Evaluations and Comparisons

We compare the finite-sample performance of several estimators of $I(\beta,\gamma)$ and $R(\beta,\gamma)$ using simulations. These estimators include the plug-in scaled Lasso estimator (Sun and Zhang, 2012), the plug-in de-biased estimator (Javanmard and Montanari, 2014; van de Geer et al., 2014; Zhang and Zhang, 2014), the plug-in thresholded estimator (Zhang and Zhang, 2014, Section 3.3) and the proposed FDE. Specifically, they are defined as

  • FDE: The inner product $I(\beta,\gamma)$ is estimated by $\hat I(\beta,\gamma)$ in (12) and the ratio $R(\beta,\gamma)$ is estimated by $\hat R(\beta,\gamma)$ in (19). We consider FDE with sample splitting (FDE-S) and without sample splitting (FDE-NS) for $\hat R(\beta,\gamma)$.

  • Plug-in scaled Lasso estimator (Lasso): The inner product $I(\beta,\gamma)$ is estimated by $\langle\hat\beta,\hat\gamma\rangle$ and the normalized inner product $R(\beta,\gamma)$ is estimated by $[\langle\hat\beta,\hat\gamma\rangle/(\|\hat\beta\|_2\|\hat\gamma\|_2)]\mathbf{1}\{\|\hat\beta\|_2\|\hat\gamma\|_2>0\}$.

  • Plug-in de-biased estimator (De-biased): Denote the de-biased Lasso estimators by $\tilde\beta$ and $\tilde\gamma$. The inner product $I(\beta,\gamma)$ is estimated by $\langle\tilde\beta,\tilde\gamma\rangle$ and the normalized inner product $R(\beta,\gamma)$ is estimated by $[\langle\tilde\beta,\tilde\gamma\rangle/(\|\tilde\beta\|_2\|\tilde\gamma\|_2)]\mathbf{1}\{\|\tilde\beta\|_2\|\tilde\gamma\|_2>0\}$.

  • Plug-in thresholded estimator (Thresholded): Denote the thresholded estimators by $\bar\beta$ and $\bar\gamma$. The inner product $I(\beta,\gamma)$ is estimated by $\langle\bar\beta,\bar\gamma\rangle$ and the normalized inner product $R(\beta,\gamma)$ is estimated by $[\langle\bar\beta,\bar\gamma\rangle/(\|\bar\beta\|_2\|\bar\gamma\|_2)]\mathbf{1}\{\|\bar\beta\|_2\|\bar\gamma\|_2>0\}$.

Implementation of the de-biased, thresholded and FDE estimators requires the scaled Lasso estimators $\hat\beta$ and $\hat\gamma$ in the initial step. The scaled Lasso estimator is implemented via the equivalent square-root Lasso algorithm (Belloni et al., 2011). The theoretical tuning parameter is $\lambda_0=\sqrt{2.01\log p/n}$, which may be conservative in numerical studies. Instead, the tuning parameter is chosen as $\lambda_0=b\sqrt{2.01\log p/n}$, and the performance of all estimators is evaluated across a grid of values $b\in\{.25,.5,.75,1\}$ (see Supplementary Material, Section A.1). The results showed that $b=.5$ is a good choice for all the estimators. Hence, $\lambda_0=.5\sqrt{2.01\log p/n}$ was used for the numerical studies in this section and in Section 5. To implement the FDE algorithm, the other tuning parameter $\lambda$ is chosen as $\sqrt{2.01\log p/n}$ for the correction Steps 2, 3, 5 and 6 in Table 1.

Comparisons of the estimates of $I(\beta,\gamma)$ and $R(\beta,\gamma)$ are presented below; results on estimating the quadratic functionals are given in the Supplementary Material, Section A.2. For each setting, with the parameters $(p,n_1,n_2,s,s_1,s_2)$, $\Sigma$, $\Gamma$, $F_\beta$, $F_\gamma$ specified, we generate the data and compare the different methods as follows (a code sketch of steps 1-3 is given after the list):

  1. Generate sets $S_1\subset[p]$ and $S_2\subset[p]$, with $|S_1|=s_1$, $|S_2|=s_2$ and $|S_1\cap S_2|=s$. For $\beta\in\mathbb{R}^p$ and $\gamma\in\mathbb{R}^p$, generate $\beta_j\sim F_\beta$ and $\gamma_l\sim F_\gamma$ for $j\in S_1$, $l\in S_2$, and set $\beta_j=0$ and $\gamma_l=0$ for $j\notin S_1$, $l\notin S_2$.

  2. Generate $X_{i\cdot}\overset{\text{i.i.d.}}{\sim}N(0,\Sigma)$, $1\le i\le n_1$, and $Z_{i\cdot}\overset{\text{i.i.d.}}{\sim}N(0,\Gamma)$, $1\le i\le n_2$.

  3. Generate the noise $\epsilon_i\overset{\text{i.i.d.}}{\sim}N(0,1)$, $1\le i\le n_1$, and $\delta_i\overset{\text{i.i.d.}}{\sim}N(0,1)$, $1\le i\le n_2$. Generate the outcomes as $y=X\beta+\epsilon$ and $w=Z\gamma+\delta$.

  4. With X, y, Z, and w, estimate I(β,γ) and R(β,γ) through different estimators.

  5. Repeat 2–4 for L times.
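As referenced above, here is a minimal Python sketch of steps 1-3. The samplers f_beta and f_gamma stand in for the distributions $F_\beta$ and $F_\gamma$; they, and the helper itself, are assumptions of this sketch rather than the authors' implementation.

import numpy as np

def simulate(p, n1, n2, s, s1, s2, Sigma, Gamma, f_beta, f_gamma, rng):
    # Step 1: supports S1, S2 with |S1| = s1, |S2| = s2, |S1 ∩ S2| = s
    shared = rng.choice(p, size=s, replace=False)
    rest = np.setdiff1d(np.arange(p), shared)
    S1 = np.concatenate([shared, rng.choice(rest, size=s1 - s, replace=False)])
    S2 = np.concatenate([shared,
                         rng.choice(np.setdiff1d(rest, S1), size=s2 - s,
                                    replace=False)])
    beta, gamma = np.zeros(p), np.zeros(p)
    beta[S1], gamma[S2] = f_beta(s1, rng), f_gamma(s2, rng)
    # Step 2: Gaussian designs
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n1)
    Z = rng.multivariate_normal(np.zeros(p), Gamma, size=n2)
    # Step 3: standard normal noise and outcomes
    y = X @ beta + rng.standard_normal(n1)
    w = Z @ gamma + rng.standard_normal(n2)
    return X, y, Z, w, beta, gamma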

We evaluate the performance of an estimator by the mean squared error (MSE), which is defined as

$$\mathrm{MSE}(\hat T)=\frac{1}{L}\sum_{l=1}^{L}\left(\hat T(X,y,Z,w;l)-T\right)^2,$$ (36)

for a given quantity $T$ and its estimate $\hat T(X,y,Z,w;l)$ from the $l$-th replication. We consider two different settings with two sets of parameters $(p,n_1,n_2,s,s_1,s_2)$, $\Sigma$, $\Gamma$, $F_\beta$ and $F_\gamma$, and the simulation for each setting is repeated $L=300$ times.
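A minimal Monte Carlo loop for (36) might look as follows, where draw_data is a hypothetical closure performing steps 2-3 with the coefficient vectors held fixed across replications.

import numpy as np

def mse(estimator, truth, draw_data, L=300):
    # MSE criterion, display (36)
    errors = [(estimator(*draw_data(l)) - truth) ** 2 for l in range(L)]
    return float(np.mean(errors))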

Experiment 1.

The parameters are set as follows: $(p,n_1,n_2)=(600,400,400)$, the sparsity parameters $(s,s_1,s_2)=(15,30,25)$, and the covariance matrices $\Sigma$ and $\Gamma$ satisfy $\Sigma_{ij}=\Gamma_{ij}=(0.8)^{|i-j|}$. For given positive values $\tau_1$ and $\tau_2$, the signals of $\beta$ satisfy $\beta_{j_i}=(1+i/s_1)\tau_1/2$ for $j_i\in S_1$, $i=1,2,\dots,s_1$, and the signals of $\gamma$ satisfy $\gamma_j=\tau_2$ for $j\in S_2$. This simulation aims to investigate the case where the coefficients for one regression are much larger than those for the other, by varying the signal strength parameters as $(\tau_1,\tau_2)\in\{(3.0,.1),(2.6,.2),(2.2,.3),(1.8,.4),(.1,1.6),(.2,1.4),(.3,1.2),(.4,1.0)\}$.

The results are summarized in Table 2. For all combinations of the signal strength parameters, in terms of estimating the inner product $I(\beta,\gamma)$, FDE consistently outperformed the plug-in estimates based on the Lasso and the thresholded Lasso. Moreover, as the difference between $\tau_1$ and $\tau_2$ increased, the advantage of FDE over these plug-in estimates became larger. The same pattern was observed for the estimation of the normalized inner product $R(\beta,\gamma)$, where FDE-NS consistently performed better than the other methods. Although De-biased performed well in terms of estimating $I(\beta,\gamma)$, it performed much worse than FDE-NS for estimating $R(\beta,\gamma)$.

Table 2:

Mean square errors (MSE) of the estimates of the inner product I(β,γ) and the normalized inner product R(β,γ) for various signal strength parameters. Lasso: plugin estimator with the scaled Lasso estimator; De-biased: plug-in estimator with the de-biased estimator; Thresholded: plug-in estimator with the thresholded estimator; FDE: the proposed estimator I^(β,γ); FDE-S: the proposed estimator R^(β,γ) with sample splitting; FDE-NS: the proposed estimator R^(β,γ) without sample splitting.

Strength parameters, (τ1,τ2)
(1.8, .4) (2.2, .3) (2.6, .2) (3, .1) (.1, 1.6) (.2, 1.4) (.3, 1.2) (.4, 1)
I(β,γ) Truth 8.088 7.414 5.841 3.370 1.797 3.145 4.044 4.493

MSE
Lasso 9.295 11.564 12.560 7.279 2.377 4.889 5.409 4.800
De-biased 1.733 2.191 2.324 1.386 .449 .838 .985 .886
Thresholded 2.029 3.377 6.463 5.789 1.877 3.024 2.432 1.546
FDE 1.847 2.471 2.662 2.118 .734 .995 1.028 .986

R(β,γ) Truth .5314 .5314 .5314 .5314 .5314 .5314 .5314 .5314

MSE
Lasso .0023 .0075 .0332 .1260 .1574 .0624 .0227 .0087
De-biased .0208 .0415 .0864 .1590 .1736 .1068 .0627 .0373
Thresholded .0045 .0139 .0585 .1753 .0964 .0981 .0389 .0153
FDE-S .0337 .0303 .0621 .0678 .2130 .1199 .0694 .0616
FDE-NS .0036 .0064 .0163 .0580 .0892 .0237 .0116 .0061

As discussed in Section 2, the sample splitting in estimating the normalized inner product was introduced only to facilitate the theoretical analysis and might not be necessary for the algorithm. Our simulation results indicate that the proposed estimator without sample splitting (FDE-NS) performed quite well in all settings, even better than FDE-S, because more samples were used in the estimation and correction steps. This observation led us to use the estimator without sample splitting (FDE-NS) in the real data analysis in Section 5.

Experiment 2.

The parameters are set as follows: $(p,n_1,n_2)=(800,400,400)$, the signal strength parameters $(\tau_1,\tau_2)=(.2,.1)$, and the covariance matrices $\Sigma$ and $\Gamma$ satisfy $\Sigma_{ij}=\Gamma_{ij}=(0.8)^{|i-j|}$. The signals of $\beta$ satisfy $\beta_{j_i}=(1+i/s_1)\tau_1/2$ for $j_i\in S_1$, $i=1,2,\dots,s_1$, and the signals of $\gamma$ satisfy $\gamma_j=\tau_2$ for $j\in S_2$. This setting is designed to investigate the relationship between the performance of the estimators and the signal sparsity level: we vary the sparsity levels of $\beta$ and $\gamma$ as $(s_1,s_2)\in\{(40,40),(50,50),(60,60),(70,70),(80,80),(90,90),(100,100),(110,110)\}$ and fix the number of common signals at $s=20$. Since the number of associated variants is very large for both coefficient vectors, large values of $\tau_1$ and $\tau_2$ would induce strong signals for which all methods perform well. Instead, we consider a more challenging setting where the signal magnitudes are small, that is, $\tau_1=.2$ and $\tau_2=.1$.

The results are summarized in Table 3. Clearly, FDE outperformed the other methods, and when the signals became denser, the improvement of FDE over the other methods was more pronounced. For the estimation of $R(\beta,\gamma)$, the results show that FDE-NS consistently outperformed the other estimators; as the number of signals increased, the MSE of FDE-NS decreased quickly.

Table 3:

Mean square errors (MSE) of the estimates of the inner product I(β,γ) and the normalized inner product R(β,γ) for various sparsity parameters. Lasso: plug-in estimator with the scaled Lasso estimator; De-biased: plug-in estimator with the de-biased estimator; Thresholded: plug-in estimator with the thresholded estimator; FDE: the proposed estimator I^(β,γ); FDE-S: the proposed estimator R^(β,γ) with sample splitting; FDE-NS: the proposed estimator R^(β,γ) without sample splitting.

 Sparsity parameter, s1(s2=s1)
40 50 60 70 80 90 100 110
I(β,γ) Truth .190 .170 .219 .212 .179 .221 .183 .221

MSE
Lasso .032 .025 .039 .036 .024 .035 .023 .028
De-biased .015 .015 .018 .017 .024 .027 .040 .066
Thresholded .027 .021 .031 .029 .020 .025 .018 .018
FDE .020 .014 .021 .022 .011 .013 .008 .008

R(β,γ) Truth .4027 .2908 .3122 .2592 .1914 .2110 .1573 .1725

MSE
Lasso .1157 .0517 .0539 .0370 .0166 .0180 .0097 .0079
De-biased .1267 .0601 .0659 .0411 .0160 .0173 .0063 .0059
Thresholded .1392 .0687 .0732 .0504 .0262 .0277 .0155 .0142
FDE-S .1154 .1225 .0779 .0574 .0456 .0499 .0450 .0493
FDE-NS .0847 .0340 .0368 .0294 .0115 .0091 .0055 .0047

5. Genetic Relatedness of Yeast Colony Growth Traits Based on Genome-Wide Association Data

Bloom et al. (2013) reported a large-scale genome-wide association study of 46 quantitative traits based on 1,008 Saccharomyces cerevisiae segregants crossbred from a laboratory strain and a wine strain. The data set included 11,623 unique genotype markers. Since many of these markers are highly correlated and differ only in a few samples, Bloom et al. (2013) further selected a set of 4,410 markers that are weakly dependent based on the linkage disequilibrium information. Specifically, these markers were selected by picking the marker closest to each centimorgan position on the genetic map. The marker genotypes are coded as 1 or −1, according to the strain they came from, and satisfy the sub-Gaussian conditions. The traits of interest were the end-point colony sizes normalized by the control growth under 46 different growth media, including Hydrogen Peroxide, Diamide, Calcium, Yeast Nitrogen Base (YNB) and Yeast extract Peptone Dextrose (YPD). Bloom et al. (2013) showed that the genetic variants are associated with many of these trait values. It is therefore important to study the genetic relatedness among these related traits.

To demonstrate the genetic relatedness among these traits, eight traits were considered, including the normalized colony sizes under Calcium Chloride (Calcium), Diamide, Hydrogen Peroxide (Hydrogen), Paraquat, Raffinose, 6-Azauracil (Azauracil), YNB, and YPD. Each trait was normalized to have variance 1, so the quadratic functional represents the total genetic effect on each trait and provides an estimate of its heritability. FDE was applied without sample splitting to every pair of these 8 traits, for a total of 28 pairs. The results are summarized in Table 4, including estimates of the heritability of each trait and of the genetic covariance and genetic correlation for each of the 28 pairs. The estimated heritability of these traits ranged from 0.22 for Raffinose to 0.67 for YPD. About two thirds of the pairs had an estimated genetic correlation smaller than 0.1, indicating relatively weak genetic correlations among these traits.

Table 4:

FDE estimates of the heritability (bold diagonal), genetic covariance (above the diagonal) and genetic correlation (below the diagonal) for each pair of the 8 colony growth traits of the yeast segregants.

Traits Calcium Diamide Hydrogen Paraquat Raffinose Azauracil YNB YPD
Calcium .3314 −.0189 −.1003 .0084 .0927 .0095 .0656 −.0134
Diamide −.0286 .4390 .0598 −.0039 .0500 .0446 −.0159 .0803
Hydrogen −.1579 .0942 .4033 .0576 −.1040 .0601 .0672 .0637
Paraquat .0117 −.0053 .0799 .5199 .0023 .0365 .1148 .1029
Raffinose .1972 .1065 −.2213 .0049 .2208 .0137 .0830 .0331
Azauracil .0172 .0809 .1089 .0661 .0248 .3045 −.0259 .0703
YNB .0968 −.0235 .0991 .1693 .1224 −.0383 .4594 .4246
YPD −.0164 .0983 .0779 .1259 .0405 .0860 .5195 .6680

To further demonstrate the genetic relatedness among these pairs, for each trait, a Z-score was calculated by regressing the trait value $y$ on each genetic marker $X_{\cdot j}$, for $1\le j\le p$. A larger absolute value of the Z-score implies a stronger effect of the marker on the trait. For any pair of traits, the scatter plot of the Z-scores provides a way of revealing the shared genetic relationship between them. The scatter plots of the Z-scores for all 28 pairs of traits are included in Section D of the Supplemental Materials. Figure 1 (a) shows the plots for several pairs of traits, including pairs with a large positive $I(\beta,\gamma)$ (YPD vs. YNB and Paraquat vs. YNB), pairs with a large negative $I(\beta,\gamma)$ (Raffinose vs. Hydrogen and Calcium vs. Hydrogen), and pairs with $I(\beta,\gamma)$ near 0 (Paraquat vs. Diamide and Paraquat vs. Raffinose). The plots clearly indicate a strong positive genetic covariance between YPD and YNB. The genetic covariance between Paraquat and YNB/YPD is smaller. The Raffinose/Hydrogen and Calcium/Hydrogen pairs clearly show negative genetic correlation. There are several genetic variants with very large effects on Hydrogen, but they are not associated with the other traits such as Raffinose and Calcium; the shared genetic variants have relatively weak effects, leading to smaller genetic covariances. The plots on the bottom show pairs of traits with weak genetic covariances. These plots indicate that the proposed genetic correlation measures can indeed capture the genetic sharing among different related traits.

Figure 1:

Scatter plots of marginal regression Z-score statistics for six pairs of traits ranked by the estimated genetic covariance (gcov) based on FDE (a) or LD regression (b), including the pairs with large positive genetic covariance (left panel), negative genetic covariance (middle panel), and small genetic covariance (right panel).

Figure 2 shows the six pairs of phenotypes ranked by the estimated genetic correlations from FDE, including the two with the largest positive genetic correlations, the two with the largest negative genetic correlations and two with small genetic correlations. The pairs identified agree with the marginal Z-scores very well.

Figure 2:

Scatter plots of marginal regression Z-score statistics for six pairs of traits ranked by the estimated genetic correlation (gcor) based on FDE, including the pairs with large positive genetic correlation (left panel), negative genetic correlation (middle panel), and small genetic correlation (right panel).

As a comparison, we also obtained the estimated genetic covariance for each pair of traits using the LD regression method proposed by Bulik-Sullivan et al. (2015). The pairs of traits with large positive, negative or weak estimated covariance are presented in Figure 1 (b). The pairs with the largest positive and negative estimated covariance differ from those identified by FDE. Comparison of the scatter plots of the Z-scores in Figure 1 indicates that the pairs identified by FDE agree better with the marginal Z-statistics.

6. Discussion

Motivated by the problem of estimating the genetic relatedness between two traits using GWAS data, we have considered the problem of estimating different functionals of the regression coefficients of two linear models, including the inner product $\langle\beta,\gamma\rangle$, the quadratic functionals $Q(\beta)$ and $Q(\gamma)$, and the ratio $R(\beta,\gamma)$. The proposed method is different from plugging in the de-biased estimators proposed in Javanmard and Montanari (2014); van de Geer et al. (2014); Zhang and Zhang (2014): the correction procedures are applied to the inner product and quadratic functionals directly, which balances the bias and variance specifically for these functionals and hence yields minimax rate-optimal estimators. The proposed estimators were shown in simulations to have smaller estimation errors than directly plugging in the de-biased estimators across different settings. Results from the analysis of the yeast segregant data suggest that the yeast colony growth sizes are under similar genetic controls for certain growth media, such as YPD and YNB, but this is not true for all pairs of growth media considered.

The algorithm for obtaining these estimates only involves applying the Lasso several times, which can be implemented efficiently using coordinate descent algorithms. The Matlab code implementing the proposed estimation methods is available at http://statgene.med.upenn.edu/software.html. An important direction for future research is to quantify the uncertainty of the proposed estimators; the upper bound analysis of (21)-(23) and (24) indicates the possibility of constructing confidence intervals, centered at the proposed estimators and of parametric length $1/\sqrt n$, under additional sparsity and other regularity conditions.

7. Proofs

In this section, we prove Theorem 1 and the bounds (29) and (30) of Theorem 3. The proofs of Theorem 2, of (28) and (31) of Theorem 3, and of the supporting lemmas are presented in the supplementary materials.

7.1. Proof of Theorem 1

For simplicity of notation, we assume $n_1=n_2$ and use $n=n_1=n_2$ to denote the sample size throughout the proof; the proofs extend easily to the case $n_1\asymp n_2$. Without loss of generality, we assume that the sub-Gaussian norms of the random vectors $X_{i\cdot}$ and $Z_{i\cdot}$ are also upper bounded by $M_1$, that is, $\max\{\|X_{i\cdot}\|_{\psi_2}^2,\|Z_{i\cdot}\|_{\psi_2}^2\}\le M_1$.

Proof of (21)

The upper bound is based on the following decomposition,

$$\begin{aligned}\hat I(\beta,\gamma)-I(\beta,\gamma)&=\langle\hat\beta,\hat\gamma\rangle+\hat u_1^\top\frac{1}{n}X^\top(y-X\hat\beta)+\hat u_2^\top\frac{1}{n}Z^\top(w-Z\hat\gamma)-\langle\beta,\gamma\rangle\\&=\left(\hat u_1^\top\frac{1}{n}X^\top(y-X\hat\beta)-\langle\hat\gamma,\beta-\hat\beta\rangle\right)+\left(\hat u_2^\top\frac{1}{n}Z^\top(w-Z\hat\gamma)-\langle\hat\beta,\gamma-\hat\gamma\rangle\right)-\langle\hat\beta-\beta,\hat\gamma-\gamma\rangle\\&=\hat u_1^\top\frac{1}{n}X^\top\epsilon+(\hat\Sigma\hat u_1-\hat\gamma)^\top(\beta-\hat\beta)+\hat u_2^\top\frac{1}{n}Z^\top\delta+(\hat\Gamma\hat u_2-\hat\beta)^\top(\gamma-\hat\gamma)-\langle\hat\beta-\beta,\hat\gamma-\gamma\rangle.\end{aligned}$$ (37)

The following lemmas control the terms in (37); similar results were established in the analysis of the Lasso, the scaled Lasso and the de-biased Lasso (Cai and Guo, 2017b; Ren et al., 2015; Sun and Zhang, 2012; Ye and Zhang, 2010). The proofs of these lemmas can be found in the supplementary material, Section D.

Lemma 1.

Suppose that assumption (A1) holds and $k=\max\{\|\beta\|_0,\|\gamma\|_0\}\le cn/\log p$ for some $c>0$. Then with probability at least $1-p^{-c_0}$, we have

$$\|\hat\beta-\beta\|_1\le Ck\sqrt{\frac{\log p}{n}},\qquad \|\hat\gamma-\gamma\|_1\le Ck\sqrt{\frac{\log p}{n}},$$ (38)
$$\|\hat\beta-\beta\|_2\le C\sqrt{\frac{k\log p}{n}},\qquad \|\hat\gamma-\gamma\|_2\le C\sqrt{\frac{k\log p}{n}},$$ (39)

where c0 and C are positive constants.

Lemma 2.

Suppose that assumption (A1) holds and $k=\max\{\|\beta\|_0,\|\gamma\|_0\}\le cn/\log p$ for some $c>0$. Then with probability at least $1-2\alpha-p^{-c_0}$, we have

$$|(\hat\Sigma\hat u_1-\hat\gamma)^\top(\beta-\hat\beta)|\le C\|\hat\gamma\|_2\|\beta\|_0\frac{\log p}{n}\quad\text{and}\quad|(\hat\Gamma\hat u_2-\hat\beta)^\top(\gamma-\hat\gamma)|\le C\|\hat\beta\|_2\|\gamma\|_0\frac{\log p}{n};$$ (40)
$$\left|\hat u_1^\top\frac{1}{n}X^\top\epsilon\right|\le C\|\hat\gamma\|_2\frac{z_{\alpha/2}}{\sqrt n}\quad\text{and}\quad\left|\hat u_2^\top\frac{1}{n}Z^\top\delta\right|\le C\|\hat\beta\|_2\frac{z_{\alpha/2}}{\sqrt n},$$ (41)

where c0 and C are positive constants.

By the decomposition (37) and the inequalities (39), (40) and (41), we obtain

$$|\hat I(\beta,\gamma)-I(\beta,\gamma)|\le C(\|\hat\beta\|_2+\|\hat\gamma\|_2)\left(\frac{z_{\alpha/2}}{\sqrt n}+\frac{k\log p}{n}\right)+C\frac{k\log p}{n}.$$

Replacing $\|\hat\beta\|_2$ and $\|\hat\gamma\|_2$ by $\|\beta\|_2$ and $\|\gamma\|_2$ via (39) and the triangle inequality establishes (21).

Proof of (22) and (23)

The proof of (23) is similar to that of (22), so only the proof of (22) is presented in the following. We introduce the estimator $\bar Q(\beta)=\|\hat\beta\|_2^2+2\hat u_3^\top(X^{(2)})^\top(y^{(2)}-X^{(2)}\hat\beta)/(n/2)$; since $Q(\beta)$ is non-negative and $\hat Q(\beta)=(\bar Q(\beta))_+$, we have $|\hat Q(\beta)-Q(\beta)|\le|\bar Q(\beta)-Q(\beta)|$. We decompose the difference between $\bar Q(\beta)$ and $Q(\beta)$,

$$\bar Q(\beta)-Q(\beta)=\|\hat\beta\|_2^2-\|\beta\|_2^2+2\hat u_3^\top\frac{1}{n/2}(X^{(2)})^\top(y^{(2)}-X^{(2)}\hat\beta)=2(\hat\Sigma^{(2)}\hat u_3-\hat\beta)^\top(\beta-\hat\beta)+2\hat u_3^\top\frac{1}{n/2}(X^{(2)})^\top\epsilon^{(2)}-\|\hat\beta-\beta\|_2^2.$$

Combined with the above argument, the upper bound (22) follows from (39) and the following lemma, whose proof can be found in the supplementary material Section D.

Lemma 3.

Suppose that assumption (A1) holds and $k=\max\{\|\beta\|_0,\|\gamma\|_0\}\le cn/\log p$ for some $c>0$. Then with probability at least $1-p^{-c_0}-\alpha$,

$$\left|\hat u_3^\top\frac{1}{n/2}(X^{(2)})^\top\epsilon^{(2)}\right|\le C\|\hat\beta\|_2\frac{z_{\alpha/2}}{\sqrt n},$$ (42)
$$\left|(\hat\Sigma^{(2)}\hat u_3-\hat\beta)^\top(\beta-\hat\beta)\right|\le C\|\hat\beta\|_2\frac{k\log p}{n},$$ (43)

where c0 and C are positive constants.

7.2. Proof of (29) and (30) in Theorem 3

We first introduce the notation used in the proofs of the lower bound results. Let $\pi$ denote a prior distribution supported on the parameter space $\mathcal{H}$. Let $f_\pi(z)$ denote the density function of the marginal distribution of the random variable $Z$ with the prior $\pi$ on $\mathcal{H}$; more specifically, $f_\pi(z)=\int f_\theta(z)\pi(\theta)d\theta$. We define the $\chi^2$ distance between two density functions $f_1$ and $f_0$ by

$$\chi^2(f_1,f_0)=\int\frac{(f_1(z)-f_0(z))^2}{f_0(z)}dz=\int\frac{f_1^2(z)}{f_0(z)}dz-1$$ (44)

and the $L_1$ distance by $L_1(f_1,f_0)=\int|f_1(z)-f_0(z)|dz$. It is well known that

$$L_1(f_1,f_0)\le\sqrt{\chi^2(f_1,f_0)}.$$ (45)
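For completeness, (45) follows from the Cauchy-Schwarz inequality:

$$L_1(f_1,f_0)=\int\frac{|f_1(z)-f_0(z)|}{\sqrt{f_0(z)}}\sqrt{f_0(z)}\,dz\le\left(\int\frac{(f_1(z)-f_0(z))^2}{f_0(z)}\,dz\right)^{1/2}\left(\int f_0(z)\,dz\right)^{1/2}=\sqrt{\chi^2(f_1,f_0)}.$$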

The proof of the lower bound is based on the following version of Le Cam’s Lemma (LeCam (1973); Yu (1997); Ren et al. (2015)).

Lemma 4.

Let $T(\theta)$ denote a functional of $\theta$. Suppose that $\mathcal{H}_0=\{\theta_0\}$, $\mathcal{H}_0,\mathcal{H}_1\subset\Theta$, and $d=\min_{\theta\in\mathcal{H}_1}|T(\theta)-T(\theta_0)|$. Let $\pi$ denote a prior on the parameter space $\mathcal{H}_1$. Then we have

$$\inf_{\hat T}\sup_{\theta\in\mathcal{H}_0\cup\mathcal{H}_1}\mathbb{P}_\theta\left(|\hat T-T(\theta)|\ge\frac{d}{2}\right)\ge\frac{1-L_1(f_\pi,f_{\theta_0})}{2}.$$ (46)

The proofs of (29) and (30) are applications of Lemma 4. The key is to construct the parameter spaces $\mathcal{H}_0=\{\theta_0\}$ and $\mathcal{H}_1$ and the prior on $\mathcal{H}_1$ such that (i) $\mathcal{H}_0,\mathcal{H}_1\subset\Theta$, (ii) the $L_1$ distance $L_1(f_\pi,f_{\theta_0})$ is controlled, and (iii) the distance $d=\min_{\theta\in\mathcal{H}_1}|T(\theta)-T(\theta_0)|$ is maximized. In the following, we provide a detailed proof of (29); the proof of (30) is similar and is omitted. In the discussion of the lower bound results, we assume that the designs $X_{i\cdot}$ and $Z_{i\cdot}$ follow joint normal distributions with zero means. The lower bound (29) can be decomposed into the following three lower bounds,

$$\inf_{\tilde Q}\sup_{\theta_\beta\in\mathcal{G}(k,M_0)}\mathbb{P}_{\theta_\beta}\left(|\tilde Q-Q(\beta)|\ge c\min\left\{\frac{k\log p}{n},\ M_0^2\right\}\right)\ge\frac14.$$ (47)

For $M_0\ge C\sqrt{k\log p/n}$,

$$\inf_{\tilde Q}\sup_{\theta_\beta\in\mathcal{G}(k,M_0)}\mathbb{P}_{\theta_\beta}\left(|\tilde Q-Q(\beta)|\ge cM_0\frac{k\log p}{n}\right)\ge\frac14,$$ (48)
$$\inf_{\tilde Q}\sup_{\theta_\beta\in\mathcal{G}(k,M_0)}\mathbb{P}_{\theta_\beta}\left(|\tilde Q-Q(\beta)|\ge cM_0\frac{1}{\sqrt n}\right)\ge\frac14,$$ (49)

where $c>0$ is a positive constant. For $M_0\ge C\sqrt{k\log p/n}$, combining (48), (49) and (47), we have

$$\inf_{\tilde Q}\sup_{\theta_\beta\in\mathcal{G}(k,M_0)}\mathbb{P}_{\theta_\beta}\left(|\tilde Q-Q(\beta)|\ge c\max\left\{M_0\left(\frac{1}{\sqrt n}+\frac{k\log p}{n}\right),\ \frac{k\log p}{n}\right\}\right)\ge\frac14.$$ (50)

For $M_0\le C\sqrt{k\log p/n}$, by (47), we have

$$\inf_{\tilde Q}\sup_{\theta_\beta\in\mathcal{G}(k,M_0)}\mathbb{P}_{\theta_\beta}\left(|\tilde Q-Q(\beta)|\ge cM_0^2\right)\ge\frac14.$$ (51)

We can establish (29) by combining (50) and (51). In the following, we will establish the lower bounds (48), (49) and (47) separately.

Proof of (48)

Under the Gaussian random design model, $V_i=(y_i,X_{i\cdot})\in\mathbb{R}^{p+1}$ follows a joint Gaussian distribution with mean 0. Let $\Sigma^v$ denote the covariance matrix of $V_i$. For the indices of $\Sigma^v$, we use 0 as the index of $y_i$ and $\{1,\dots,p\}$ as the indices of $(X_{i1},\dots,X_{ip})\in\mathbb{R}^p$. Decompose $\Sigma^v$ into blocks $\begin{pmatrix}\Sigma_{yy}^v&(\Sigma_{xy}^v)^\top\\ \Sigma_{xy}^v&\Sigma_{xx}^v\end{pmatrix}$, where $\Sigma_{yy}^v$, $\Sigma_{xx}^v$ and $\Sigma_{xy}^v$ denote the variance of $y_i$, the covariance matrix of $X_{i\cdot}$, and the covariance of $y_i$ and $X_{i\cdot}$, respectively. Let $\Omega=\Sigma^{-1}$ denote the precision matrix. There exists a bijective mapping $h:\Sigma^v\mapsto(\beta,\Omega,\sigma_1)$ with inverse $h^{-1}:(\beta,\Omega,\sigma_1)\mapsto\Sigma^v$, where $h^{-1}((\beta,\Omega,\sigma_1))=\begin{pmatrix}\beta^\top\Omega^{-1}\beta+\sigma_1^2&\beta^\top\Omega^{-1}\\ \Omega^{-1}\beta&\Omega^{-1}\end{pmatrix}$ and

$$h(\Sigma^v)=\left((\Sigma_{xx}^v)^{-1}\Sigma_{xy}^v,\ (\Sigma_{xx}^v)^{-1},\ \sqrt{\Sigma_{yy}^v-(\Sigma_{xy}^v)^\top(\Sigma_{xx}^v)^{-1}\Sigma_{xy}^v}\right).$$ (52)

Based on the bijection, it is sufficient to control the χ2 distance between two multivariate Gaussian distributions. We introduce the null parameter space

$$\mathcal{G}_0=\left\{\theta_0=(\beta,\mathrm{I},\sigma_0):\ \beta=(\beta_1,\eta_0,0,\dots,0)\ \text{with}\ \beta_1=-\frac{M_0}{2}\ \text{and}\ \sigma_0=\frac{M_2}{8}\right\},$$

where $0\le\eta_0\le M_0/2$. Note that $\mathcal{G}_0\subset\mathcal{G}(k,M_0)$. Define $p_1=p-2$. Based on the mapping $h$, we have the corresponding null parameter space for $\Sigma^v$, $\mathcal{F}_0=\{\Sigma_0^v\}$, where

$$\Sigma_0^v=\begin{pmatrix}\beta_1^2+\eta_0^2+\sigma_0^2&\beta_1&\eta_0&0\\ \beta_1&1&0&0\\ \eta_0&0&1&0\\ 0&0&0&\mathrm{I}_{p_1\times p_1}\end{pmatrix}.$$

We then introduce the alternative parameter space for $\Sigma^v$, which induces a parameter space for $(\beta,\Omega,\sigma_1)$ through the mapping $h$. Define $\mathcal{F}_1=\{\Sigma_\alpha^v:\alpha\in\mathcal{B}(p_1,k,\rho)\}$, where

$$\Sigma_\alpha^v=\begin{pmatrix}\beta_1^2+\eta_0^2+\sigma_0^2&\beta_1&\eta_0&\rho_0\alpha^\top\\ \beta_1&1&0&\alpha^\top\\ \eta_0&0&1&0\\ \rho_0\alpha&\alpha&0&\mathrm{I}_{p_1\times p_1}\end{pmatrix},$$ (53)

with $\rho_0=\beta_1+\sigma_0$ and

$$\mathcal{B}(p_1,k,\rho)=\left\{\alpha:\ \alpha\in\mathbb{R}^{p_1},\ \|\alpha\|_0=k-2,\ \alpha_i\in\{0,\rho\}\ \text{for}\ 1\le i\le p_1\right\}.$$ (54)

Then we construct the corresponding parameter space 𝓖1 for (β,Ω,σ1), which is induced by the mapping h and the parameter space 𝓕1,

$$\mathcal{G}_1=\left\{(\beta,\Omega,\sigma_1):\ (\beta,\Omega,\sigma_1)=h(\Sigma^v)\ \text{for some}\ \Sigma^v\in\mathcal{F}_1\right\}.$$ (55)

Similar to equation (7.15) in Cai and Guo (2017b), one can show that for $(\beta',\Omega,\sigma_1)\in\mathcal{G}_1$, the corresponding first coordinate is $\beta_1'=\frac{\beta_1-\|\alpha\|_2^2\rho_0}{1-\|\alpha\|_2^2}$, and the difference between $\beta_1'$ and $\beta_1$ is

$$\beta_1'-\beta_1=\frac{\|\alpha\|_2^2(\beta_1-\rho_0)}{1-\|\alpha\|_2^2}=-\frac{\sigma_0\|\alpha\|_2^2}{1-\|\alpha\|_2^2}.$$

By taking $\rho=\sqrt{\log(4p_1/k^2)/(2n)}$, we have $|\beta_1'-\beta_1|\le C_0\,k\log p/n$ and hence $\beta_1'^2+\eta_0^2+k\rho^2\le M_0^2$ for $M_0\ge C\sqrt{k\log p/n}$. Similar to the arguments between (7.15)-(7.18) in Cai and Guo (2017b), one can show that $\mathcal{G}_1\subset\mathcal{G}(k,M_0)$. Let $\pi$ denote the uniform prior over the parameter space $\mathcal{G}_1$ induced by the uniform prior of $\alpha$ over $\mathcal{B}(p_1,k,\rho)$ with $\rho=\sqrt{\log(4p_1/k^2)/(2n)}$. The control of $L_1(f_\pi,f_{\theta_0})$ is established in the following lemma, which follows from Lemma 2 of Cai and Guo (2017b) and is established in (7.21) of Cai and Guo (2017b).

Lemma 5.

Suppose that $k\le c\min\{n/\log p,\ p^\gamma\}$, where $0\le\gamma<\frac12$ and $c$ is a sufficiently small positive constant. For $\rho=\sqrt{\log(4p_1/k^2)/(2n)}$, we have $\|\alpha\|_2\le\min\{1-1/M_1,\ M_1-1\}$ and

$$L_1(f_\pi,f_{\theta_0})\le\frac14.$$ (56)

To apply Lemma 4, we consider the functional $T(\theta)=\|\beta\|_2^2$ and calculate the distance

$$d=\left|\beta_1'^2+\left(\frac{\sigma_0}{1-\|\alpha\|_2^2}\right)^2\|\alpha\|_2^2-\beta_1^2\right|=\frac{\sigma_0\|\alpha\|_2^2}{1-\|\alpha\|_2^2}\left|\beta_1'+\beta_1-\frac{\sigma_0}{1-\|\alpha\|_2^2}\right|.$$ (57)

Since $\beta_1=-M_0/2$ and $|\beta_1'-\beta_1|\lesssim k\log p/n$, we have $\beta_1'<0$ and hence

$$\frac{\sigma_0\|\alpha\|_2^2}{1-\|\alpha\|_2^2}\left|\beta_1'+\beta_1-\frac{\sigma_0}{1-\|\alpha\|_2^2}\right|\ge c\frac{k\log p}{n}\sigma_0M_0,$$

where the last inequality follows from the facts that $\beta_1<0$, $\beta_1'<0$ and $-\frac{\sigma_0}{1-\|\alpha\|_2^2}<0$, together with $\|\alpha\|_2^2=(k-2)\rho^2\asymp k\log p/n$. Combined with (56), an application of Lemma 4 leads to (48).

Proof of (49)

We construct the following parameter spaces,

$$\mathcal{G}_0=\left\{\theta_0=(\beta,\mathrm{I},\sigma_0):\ \beta=(\beta_1,\eta_0,0,\dots,0)\ \text{with}\ \beta_1=\frac{M_0}{2}\right\},\qquad \mathcal{G}_1=\left\{\theta_1=(\beta,\mathrm{I},\sigma_0):\ \beta=\left(\beta_1+\frac{\sigma_0\bar\epsilon}{\sqrt n},\eta_0,0,\dots,0\right)\right\},$$ (58)

where $0\le\eta_0\le M_0/2$ and $\bar\epsilon=\sqrt{\log(17/16)/2}$. Since $M_0\ge C\sqrt{k\log p/n}$, we have $\left(\beta_1+\frac{\sigma_0\bar\epsilon}{\sqrt n}\right)^2+\eta_0^2\le M_0^2$ and hence $\mathcal{G}_0,\mathcal{G}_1\subset\mathcal{G}(k,M_0)$.

The proof of the following lemma can be found in the supplementary material Section D.

Lemma 6.

If $\bar\epsilon=\sqrt{\log(17/16)/2}$, then we have

$$L_1(f_\pi,f_{\theta_0})\le\frac14.$$ (59)

To apply Lemma 4, we take η0=0 and calculate the distance

$$d=\left|\left(\beta_1+\frac{\sigma_0\bar\epsilon}{\sqrt n}\right)^2-\beta_1^2\right|=\left|\frac{2\beta_1\sigma_0\bar\epsilon}{\sqrt n}+\frac{\sigma_0^2\bar\epsilon^2}{n}\right|\ge cM_0\frac{1}{\sqrt n}.$$

Applying Lemma 4, we establish (49).

Proof of (47)

We introduce the following null and alternative parameter spaces,

$$\mathcal{G}_0=\left\{(\beta,\mathrm{I},\sigma_1):\ \beta=(\eta_0,0,0,\dots,0)\right\},\qquad \mathcal{G}_1=\left\{(\beta,\mathrm{I},\sigma_1):\ \beta=(\eta_0,0,\alpha)\ \text{with}\ \alpha\in\mathcal{B}_1(p,k,\rho)\right\},$$ (60)

where $0\le\eta_0\le M_0/2$ and

$$\mathcal{B}_1(p,k,\rho)=\left\{\alpha:\ \alpha\in\mathbb{R}^{p-2},\ \|\alpha\|_0=k-1,\ \alpha_i\in\{0,\rho\}\right\}.$$ (61)

Let $\pi$ denote the prior over the parameter space $\mathcal{G}_1$ induced by the uniform prior of $\alpha$ over $\mathcal{B}_1(p,k,\rho)$, where $\rho=\min\left\{\sqrt{\log\left(4p_1/(k-1)^2\right)/(2n)},\ \sqrt{(M_0^2-\eta_0^2)/(k-1)}\right\}$. The control of $L_1(f_\pi,f_{\theta_0})$ is established in the following lemma, which follows from Lemma 7 of Cai and Guo (2017c) and is established in (1.6) of Cai and Guo (2017c).

Lemma 7.

Suppose that $k\le c\min\{n/\log p,\ p^\gamma\}$, where $0\le\gamma<\frac12$ and $c$ is a sufficiently small positive constant. For $\rho=\min\left\{\sqrt{\log\left(4p_1/(k-1)^2\right)/(2n)},\ \sqrt{(M_0^2-\eta_0^2)/(k-1)}\right\}$, we have $L_1(f_\pi,f_{\theta_0})\le\frac18$.

By specifying $\eta_0=0$, the spaces $\mathcal{G}_0$ and $\mathcal{G}_1$ defined in (60) are proper subspaces of the parameter space $\mathcal{G}(k,M_0)$. To apply Lemma 4, we calculate the distance $d=\|\alpha\|_2^2\ge c\min\{k\log p/n,\ M_0^2\}$. Applying Lemma 4, we establish (47).


Acknowledgement

We would like to thank Alexandre Tsybakov for helpful discussions on Section 3.2, and the reviewer and the Associate Editor for their helpful comments.

Footnotes

Supplementary Material

Supplement to “Optimal Estimation of Genetic Relatedness in High-dimensional Linear Regressions”. (.pdf file)

References

1. Belloni A, Chernozhukov V, and Wang L (2011), “Square-root lasso: pivotal recovery of sparse signals via conic programming,” Biometrika, 98(4), 791–806.
2. Bloom JS, Ehrenreich IM, Loo WT, Lite T-LV, and Kruglyak L (2013), “Finding the sources of missing heritability in a yeast cross,” Nature, 494(7436), 234–237.
3. Bonnet A, Gassiat E, and Lévy-Leduc C (2015), “Heritability estimation in high dimensional sparse linear mixed models,” Electronic Journal of Statistics, 9(2), 2099–2129.
4. Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh P-R, Duncan L, Perry JR, Patterson N, Robinson EB, et al. (2015), “An atlas of genetic correlations across human diseases and traits,” Nature Genetics.
5. Cai TT, and Guo Z (2017a), “Accuracy assessment for high-dimensional linear regression,” The Annals of Statistics, to appear.
6. Cai TT, and Guo Z (2017b), “Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity,” The Annals of Statistics, 45(2), 615–646.
7. Cai TT, and Guo Z (2017c), “Supplement to ‘Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity’,” The Annals of Statistics, 45(2).
8. Cai TT, and Low MG (2005), “Nonquadratic estimators of a quadratic functional,” The Annals of Statistics, 33(6), 2930–2956.
9. Cai TT, and Low MG (2006), “Optimal adaptive estimation of a quadratic functional,” The Annals of Statistics, 34(5), 2298–2325.
10. Collier O, Comminges L, and Tsybakov AB (2015), “Minimax estimation of linear and quadratic functionals on sparsity classes,” The Annals of Statistics, to appear.
11. Donoho DL, and Nussbaum M (1990), “Minimax quadratic estimation of a quadratic functional,” Journal of Complexity, 6(3), 290–323.
12. Efromovich S, and Low M (1996), “On optimal adaptive estimation of a quadratic functional,” The Annals of Statistics, 24(3), 1106–1125.
13. Fan J, Han X, and Gu W (2012), “Estimating false discovery proportion under arbitrary covariance dependence,” Journal of the American Statistical Association, 107(499), 1019–1035.
14. Golan D, and Rosset S (2011), “Accurate estimation of heritability in genome wide studies using random effects models,” Bioinformatics, 27, i317–i323.
15. Guo Z, Kang H, Cai TT, and Small DS (2016), “Confidence intervals for causal effects with invalid instruments using two-stage hard thresholding with voting,” arXiv preprint arXiv:1603.05224.
16. Janson L, Barber RF, and Candès E (2016), “EigenPrism: inference for high dimensional signal-to-noise ratios,” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
17. Javanmard A, and Montanari A (2014), “Confidence intervals and hypothesis testing for high-dimensional regression,” The Journal of Machine Learning Research, 15(1), 2869–2909.
18. Laurent B, and Massart P (2000), “Adaptive estimation of a quadratic functional by model selection,” The Annals of Statistics, 28(5), 1302–1338.
19. Le Cam L (1973), “Convergence of estimates under dimensionality restrictions,” The Annals of Statistics, 1(1), 38–53.
20. Lee SH, Ripke S, Neale BM, Faraone SV, Purcell SM, Perlis RH, Mowry BJ, Thapar A, Goddard ME, and Witte JS (2013), “Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs,” Nature Genetics, 45.
21. Lee SH, Yang J, Goddard ME, Visscher PM, and Wray NR (2012), “Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood,” Bioinformatics, 28(19), 2540–2542.
22. Lee S, and van der Werf J (2016), “MTG2: an efficient algorithm for multivariate linear mixed model analysis based on genomic information,” Bioinformatics, 32(9), 1420–1422.
23. Maier R, Moser G, Chen G-B, Ripke S, Coryell W, Potash JB, Scheftner WA, Shi J, Weissman MM, Hultman CM, et al. (2015), “Joint analysis of psychiatric disorders increases accuracy of risk prediction for schizophrenia, bipolar disorder, and major depressive disorder,” The American Journal of Human Genetics, 96(2), 283–294.
24. Manolio T (2010), “Genomewide association studies and assessment of the risk of disease,” New England Journal of Medicine, 363, 166–176.
25. McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JP, and Hirschhorn JN (2008), “Genome-wide association studies for complex traits: consensus, uncertainty and challenges,” Nature Reviews Genetics, 9(5), 356–369.
26. Purcell SM, Wray NR, Stone JL, Visscher PM, O’Donovan MC, Sullivan PF, Sklar P, Ruderfer DM, McQuillin A, Morris DW, et al. (2009), “Common polygenic variation contributes to risk of schizophrenia and bipolar disorder,” Nature, 460(7256), 748–752.
27. Ren Z, Sun T, Zhang C-H, and Zhou HH (2015), “Asymptotic normality and optimalities in estimation of large Gaussian graphical models,” The Annals of Statistics, 43(3), 991–1026.
28. Sun T, and Zhang C-H (2012), “Scaled sparse linear regression,” Biometrika, 99(4), 879–898.
29. Tibshirani R (1996), “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.
30. van de Geer S, Bühlmann P, Ritov Y, and Dezeure R (2014), “On asymptotically optimal confidence regions and tests for high-dimensional models,” The Annals of Statistics, 42(3), 1166–1202.
31. Vershynin R (2012), “Introduction to the non-asymptotic analysis of random matrices,” in Compressed Sensing: Theory and Applications, eds. Eldar Y, and Kutyniok G, Cambridge University Press, pp. 210–268.
32. Verzelen N, and Gassiat E (2016), “Adaptive estimation of high-dimensional signal-to-noise ratios,” arXiv preprint arXiv:1602.08006.
33. Wray NR, Goddard ME, and Visscher PM (2007), “Prediction of individual genetic risk to disease from genome-wide association studies,” Genome Research, 17(10), 1520–1528.
34. Wu T, Chen Y, Hastie T, Sobel E, and Lange K (2009), “Genome-wide association analysis by lasso penalized logistic regression,” Bioinformatics, 25, 714–721.
35. Yang L, Neale BM, Liu L, Lee SH, Wray NR, Ji N, Li H, Qian Q, Wang D, Li J, et al. (2013), “Polygenic transmission and complex neuro developmental network for attention deficit hyperactivity disorder: Genome-wide association study of both common and rare variants,” American Journal of Medical Genetics Part B: Neuropsychiatric Genetics, 162(5), 419–430.
36. Ye F, and Zhang C-H (2010), “Rate minimaxity of the Lasso and Dantzig selector for the ℓq loss in ℓr balls,” The Journal of Machine Learning Research, 11, 3519–3540.
37. Yu B (1997), “Assouad, Fano, and Le Cam,” in Festschrift for Lucien Le Cam, Springer, pp. 423–435.
38. Zhang C-H, and Zhang SS (2014), “Confidence intervals for low dimensional parameters in high dimensional linear models,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1), 217–242.
39. Zhernakova A, van Diemen C, and Wijmenga C (2009), “Detecting shared pathogenesis from the shared genetics of immune-related diseases,” Nature Reviews Genetics, 10, 43–45.
