American Journal of Human Genetics. 2017 Nov 2;101(5):737–751. doi: 10.1016/j.ajhg.2017.09.022

Local Genetic Correlation Gives Insights into the Shared Genetic Architecture of Complex Traits

Huwenbo Shi 1, Nicholas Mancuso 2, Sarah Spendlove 4, Bogdan Pasaniuc 1,2,3
PMCID: PMC5673668  PMID: 29100087

Abstract

Although genetic correlations between complex traits provide valuable insights into epidemiological and etiological studies, a precise quantification of which genomic regions disproportionately contribute to the genome-wide correlation is currently lacking. Here, we introduce ρ-HESS, a technique to quantify the correlation between pairs of traits due to genetic variation at a small region in the genome. Our approach requires GWAS summary data only and makes no distributional assumption on the causal variant effect sizes while accounting for linkage disequilibrium (LD) and overlapping GWAS samples. We analyzed large-scale GWAS summary data across 36 quantitative traits, and identified 25 genomic regions that contribute significantly to the genetic correlation among these traits. Notably, we find 6 genomic regions that contribute to the genetic correlation of 10 pairs of traits that show negligible genome-wide correlation, further showcasing the power of local genetic correlation analyses. Finally, we report the distribution of local genetic correlations across the genome for 55 pairs of traits that show putative causal relationships.

Keywords: genetic correlation, genetic covariance, complex traits, etiology, epidemiology, genome-wide association study, summary statistics, heritability

Introduction

Genomic regions that harbor variants contributing to multiple traits provide valuable insights into the underlying biological mechanisms with which genetic variation impacts complex traits.1, 2, 3, 4, 5, 6, 7 Therefore, both de novo discovery of such regions and the quantification of the correlation in effect sizes at known shared regions are important to epidemiological and etiological studies. For example, genetic variants associated with multiple traits in genome-wide association studies (GWASs) can be used as instrumental variables in Mendelian randomization analyses to suggest causal relationships among complex traits.7, 8, 9, 10 Unfortunately, many risk variants are left undetected by existing GWASs due to a combination of high polygenicity (i.e., many variants of small effects) and limited sample sizes, which together limit the power to detect genetic variants of small effect.11 To improve accuracy at sub-GWAS significant regions, recent works1, 2 proposed to utilize the posterior probability of two traits sharing a causal variant at a given risk region to detect genetic overlap.
Although powerful in detecting shared genetic risk variants, the posterior probability does not convey the direction or magnitude of the genetic effect at the overlapped genomic regions.1, 2 Alternative approaches have used genetic correlation (i.e., correlation of the genetic components of two traits), which summarizes both direction and magnitude of effects, to gain insights into genetic overlap of complex traits.12, 13, 14 Traditional methods to estimate genetic correlation require individual genotype and trait measurements on the same set of individuals and are therefore hindered by the limited availability of large-scale individual-level data due to privacy concerns.12, 14, 15 More recent works have shown that GWAS summary data (i.e., effect sizes and standard errors at all variants typed in the study) are sufficient to estimate genome-wide genetic correlation under a polygenic trait architecture by aggregating information across all typed variants in the study.16, 17

In this work, we investigate the correlation between traits due to typed genetic variants from a small region in the genome (i.e., local genetic correlation) as a means to identify genomic regions that contribute disproportionately to the genetic sharing between traits. We introduce methods that estimate the local genetic correlation from GWAS summary data while allowing for overlapping GWAS samples and linkage disequilibrium (LD) among variants. We partition the genome-wide genetic sharing across approximately independent LD regions of 1.6 Mb in width on average.18 To allow for a broad range of causal effect sizes, our approach makes no distributional assumptions on the causal effect sizes by treating them as fixed quantities. Our method can be viewed as a natural extension to pairs of traits of recently proposed methods that quantify local SNP heritability from GWAS summary data under a fixed-effect model.19

We illustrate the utility of local genetic correlation through an analysis of GWAS summary data of 36 quantitative complex traits. We identify 25 genomic regions that show significant local genetic correlation across 27 pairs of traits; e.g., region chr2: 21M–23M that harbors APOB (MIM: 107730) shows a significant genetic correlation for the pair of traits high-density lipoprotein (HDL) and triglycerides (TG). Notably, 6 (out of the 25) regions show significant local genetic correlation although the genome-wide genetic correlation is not significantly different from 0; e.g., region chr6: 134M–136M shows a significant local genetic correlation for mean cell volume (MCV) and platelet count (PLT) although the genome-wide genetic correlation between MCV and PLT is negligible (0.02, 95% CI [−0.04, 0.07]). This shows that these traits are correlated at a local level (e.g., due to pleiotropy and/or shared pathways) in ways that are not reflected in the genome-wide correlation (due to the balancing effect of other loci; e.g., positive correlation partially canceling a negative correlation, see Figure 1). Regions with significant local genetic correlations can also be used to identify new risk loci. For example, although the region chr8: 9.2M–9.6M shows a significant local genetic correlation between HDL and LDL, it does not harbor a GWAS-significant variant for either HDL or LDL. Finally, we explore putative causal relations between all the 36 studied traits using a recently proposed approach2 and report 55 instances of pairs with putative causality. For most of these pairs, we show that the local genetic correlation ascertained for GWAS signals specific to each trait is consistent with the putative causal relation while providing a directly interpretable quantity of the magnitude of effect.

Figure 1.


Examples of Two Different Distributions of Local Genetic Covariances that Result in the Same Total Genetic Covariance

Local genetic covariances are shown at the top of each bar; in both examples the total genetic covariance is ρg,total = 0.05. In the left example, the total genetic covariance is a summation of a large positive local genetic covariance at region 1 and two smaller negative local genetic covariances at region 2 and region 3 (e.g., regions 2 and 3 impact traits through a different pathway than region 1). In the right example, the total genetic covariance is a summation of small positive local genetic covariances (e.g., all three regions impact both traits through the same pathway). Positive local genetic covariance can be interpreted as a locus driving a pathway that regulates two traits in the same direction, and negative local genetic covariance as a locus driving them in opposite directions.

Material and Methods

Overview of Methods

Genetic covariance measures the similarity between a pair of traits driven by genetic variations and enjoys wide applications in understanding relations between complex traits.13, 20, 21 Genetic covariance is traditionally estimated as a single measure across the entire genome to capture the genome-wide contribution of genetic variations to the correlation between phenotypes. Here, we introduce local genetic covariance, the similarity between pairs of traits driven by genetic variations localized at a specific region in the genome (e.g., one LD block), as a principled way to partition the shared genetic risk between traits. For example, a high genome-wide genetic covariance can be driven by one genomic region containing a shared risk variant or by a large number of regions, each with a small contribution reflecting putative causal relations (where all risk variants for one trait are risk variants for the other trait) and/or pleiotropy (risk variants contributing to both traits through shared pathways) (see Figure 1). Whereas genetic covariance quantifies the magnitude of co-variation of the genetic components of two traits in their original scale, genetic correlation quantifies co-variation in a standardized scale and is therefore comparable across pairs of traits and/or genomic regions for which magnitude of effect size may differ. As a motivating example, consider two traits modeled by $\phi = x_1\beta_1 + x_2\beta_2 + \epsilon$ and $\psi = x_1\gamma_1 + x_2\gamma_2 + \delta$, where $x_1$ and $x_2$ represent two independent SNPs. In the special case where $\gamma$ is proportional to $\beta$ by a factor of $\alpha$, i.e., $\gamma = \alpha\beta$, the genetic covariance between the two traits is $\alpha(\beta_1^2 + \beta_2^2)$ and is governed by $\alpha$. However, the genetic correlation between the two traits is always 1 for positive $\alpha$ (−1 for negative $\alpha$) regardless of the magnitude of $\alpha$.
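The motivating example can be checked numerically. The sketch below assumes two independent, standardized SNPs (so the LD matrix is the identity); the effect sizes and the helper name `genetic_cov_and_corr` are illustrative and not taken from the paper.

```python
import math

def genetic_cov_and_corr(beta, gamma):
    """Covariance and correlation of the genetic components of two traits,
    assuming independent, standardized SNPs (identity LD matrix)."""
    cov = sum(b * g for b, g in zip(beta, gamma))
    var_phi = sum(b * b for b in beta)    # genetic variance of trait 1
    var_psi = sum(g * g for g in gamma)   # genetic variance of trait 2
    return cov, cov / math.sqrt(var_phi * var_psi)

beta = [0.3, 0.4]  # illustrative effect sizes; beta_1^2 + beta_2^2 = 0.25
for alpha in (0.5, 2.0, -2.0):
    gamma = [alpha * b for b in beta]
    cov, corr = genetic_cov_and_corr(beta, gamma)
    # The covariance scales with alpha; the correlation only keeps its sign.
    print(alpha, round(cov, 4), round(corr, 4))
```

The covariance equals α × 0.25 in each case, while the correlation stays at 1 or −1.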

We start by defining local genetic covariance under the fixed effect model, making a distinction between genetic covariance and covariance of the causal effects, β and γ (see below). We then describe methods to estimate genetic covariance followed by an approach to standardize the local genetic covariance to estimate local genetic correlation.

Local Genetic Covariance under Fixed-Effect Model

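Let $\phi = x^\top\beta + \epsilon$ and $\psi = x^\top\gamma + \delta$ be two traits measured at an individual, standardized so that $E[\phi] = E[\psi] = 0$ and $\mathrm{Var}[\phi] = \mathrm{Var}[\psi] = 1$, where $\beta, \gamma \in \mathbb{R}^p$ are the fixed effect size vectors for the two traits; $x \in \mathbb{R}^p$, the genotype vector of the individual at $p$ SNPs, standardized so that $E[x] = 0$ and $\mathrm{Var}[x] = V$, the LD matrix; and $\epsilon, \delta$, random environmental effects independent of $x$, $\beta$, $\gamma$, with $E[\epsilon] = E[\delta] = 0$, $\mathrm{Var}[\epsilon] = \sigma_\epsilon^2$, $\mathrm{Var}[\delta] = \sigma_\delta^2$, and $\mathrm{Cov}[\epsilon, \delta] = \rho_e$. Under these assumptions, one can decompose the phenotypic covariance, $\rho$, between $\phi$ and $\psi$ into a summation of genetic covariance and environmental covariance, as

$$\begin{aligned}
\rho = \mathrm{Cov}[\phi, \psi] &= E[\phi\psi] - E[\phi]E[\psi] = E[(x^\top\beta + \epsilon)(x^\top\gamma + \delta)] \\
&= E[(x^\top\beta)(x^\top\gamma)] + E[\epsilon\delta] = \mathrm{Cov}[x^\top\beta, x^\top\gamma] + \mathrm{Cov}[\epsilon, \delta] \\
&= \beta^\top E[xx^\top]\gamma + \mathrm{Cov}[\epsilon, \delta] = \beta^\top V\gamma + \rho_e,
\end{aligned}$$
(Equation 1)

where $\rho_g = \mathrm{Cov}[x^\top\beta, x^\top\gamma] = \beta^\top V\gamma$ is the genetic covariance between the two traits (i.e., covariance between the genetic components of the two traits, $x^\top\beta$ and $x^\top\gamma$), and $\rho_e$ the environmental covariance (i.e., covariance between the environmental effects of two traits, $\epsilon$ and $\delta$). The magnitude and sign of local genetic covariance can be interpreted as the effect and direction of the local genetic component of one trait on that of the other. Thus, given the true effect size vectors, $\beta$, $\gamma$, and the LD matrix $V$, one can obtain $\rho_g$ by plugging in these quantities.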

Genetic Covariance versus Covariance of the Causal Effects

An alternative approach to the covariance of the genetic components of the traits is to quantify the covariance of the causal effects (i.e., $\rho_{g,\text{causal}} = \beta^\top\gamma$). In the special case where there is no LD (i.e., $V = I$, the identity matrix), genetic covariance and covariance of the causal effects coincide, $\rho_g = \beta^\top V\gamma = \beta^\top I\gamma = \beta^\top\gamma = \rho_{g,\text{causal}}$. However, in general genetic covariance differs from covariance of the causal effects as a function of the LD between the causal variants. More importantly, high local genetic covariance does not necessarily imply high covariance of the causal effects. In fact, high genetic covariance can be attained even when causal variants are different between the traits. To illustrate the difference, consider an example involving two SNPs. Let $\beta = (1, 0)^\top$ and $\gamma = (0, 1)^\top$ be the causal effect vectors of the two traits, i.e., the two traits have two distinct sets of causal variants, and let

$$V = \begin{bmatrix} 1.0 & 0.9 \\ 0.9 & 1.0 \end{bmatrix}$$

be the LD matrix between the SNPs. In this example, the covariance of the causal effects is $\rho_{g,\text{causal}} = \beta^\top\gamma = 0$, whereas the genetic covariance is $\rho_g = \beta^\top V\gamma = 0.9$. Thus, at a region where the causal variants are distinct for the two traits, covariance of the causal effects is always zero, whereas genetic covariance may be non-zero depending on the LD (see Figure 2). The two definitions measure genetic sharing at different levels of resolution. Local genetic covariance measures sharing at regional level, giving a measure of how similar the regional genetic components are between the two traits, and has applications in predicting the regional genetic component of one trait from that of the other. In contrast, local causal effect covariance measures sharing at an individual SNP level, giving a measure of how similar the causal effects are between the two traits. Consider a scenario where two traits are each driven locally by a different SNP in the same gene. In this case, the local causal effect covariance is zero since the two traits share no causal SNP. However, the local genetic covariance is non-zero if the two SNPs are in LD, which induces similarity in the genetic component of the two traits and is an indication of the gene being shared across the two traits. Although in this work we focus on genetic covariance, for completeness we discuss an estimator for covariance of the causal effects ($\rho_{g,\text{causal}}$) in Appendix A.
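The two-SNP example can be reproduced in a few lines. This is a minimal sketch using plain Python lists; the helper name `bilinear` is an assumption introduced here for illustration.

```python
def bilinear(a, M, b):
    """Compute a^T M b for small dense vectors and matrices given as lists."""
    return sum(a[i] * M[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))

V = [[1.0, 0.9], [0.9, 1.0]]         # LD matrix from the example
identity = [[1.0, 0.0], [0.0, 1.0]]  # no-LD case
beta, gamma = [1.0, 0.0], [0.0, 1.0] # distinct causal SNPs for the two traits

rho_g = bilinear(beta, V, gamma)                # genetic covariance
rho_g_causal = bilinear(beta, identity, gamma)  # covariance of causal effects
print(rho_g, rho_g_causal)
```

With LD of 0.9 between the two SNPs, the genetic covariance is 0.9 even though the causal effect covariance is 0.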

Figure 2.


Distribution of Simulated Genetic Covariance and Causal Effect Covariance across 100 LD-Independent Regions on Chromosome 1 Binned by Average LD between Causal Variants

The red lines represent the average local genetic covariance in each bin. For each region, we simulated 2 traits, each with 3 causal variants with effect sizes set to 0.01, and with no shared causal variants (see Figure S1 for the case where the two traits share causal variants). Genetic covariance varies with respect to LD whereas causal effect covariance is always 0 (horizontal dotted line). Since genetic covariance can be thought of as an upper bound on the prediction accuracy attainable when using causal effects from one trait to predict another, a positive genetic covariance indicates that non-zero prediction accuracy could be attained by virtue of LD tagging.

Estimating Local Genetic Covariance from GWAS Summary Data

In two GWASs involving n1 individuals for trait 1 (ϕ), n2 individuals for trait 2 (ψ), and ns shared individuals, we assume

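$$\begin{bmatrix} \phi \\ \phi_s \end{bmatrix} = \begin{bmatrix} Y \\ X_s \end{bmatrix}\beta + \begin{bmatrix} \epsilon \\ \epsilon_s \end{bmatrix}, \qquad \begin{bmatrix} \psi \\ \psi_s \end{bmatrix} = \begin{bmatrix} Z \\ X'_s \end{bmatrix}\gamma + \begin{bmatrix} \delta \\ \delta_s \end{bmatrix},$$
(Equation 2)

where $(\phi, \phi_s) \in \mathbb{R}^{n_1}$ and $(\psi, \psi_s) \in \mathbb{R}^{n_2}$ are the standardized trait values of all individuals in each GWAS; $(Y, X_s) \in \mathbb{R}^{n_1 \times p}$ and $(Z, X'_s) \in \mathbb{R}^{n_2 \times p}$, column standardized genotype matrices of all individuals in each GWAS, where $X_s$ and $X'_s$ represent the genotype matrices for the same set of individuals and SNPs but standardized differently in each GWAS; and $(\epsilon, \epsilon_s) \in \mathbb{R}^{n_1}$ and $(\delta, \delta_s) \in \mathbb{R}^{n_2}$ are environmental effects of all individuals in each GWAS. We use the subscript $s$ to represent individuals shared by both GWASs. We further assume that $E[\epsilon] = E[\delta] = E[\epsilon_s] = E[\delta_s] = 0$, $\mathrm{Var}[\epsilon] = \mathrm{Var}[\epsilon_s] = \sigma_\epsilon^2 I$, $\mathrm{Var}[\delta] = \mathrm{Var}[\delta_s] = \sigma_\delta^2 I$, $\mathrm{Cov}[\epsilon, \delta] = 0$, and $\mathrm{Cov}[\epsilon_s, \delta_s] = \rho_e I$.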

In a traditional GWAS, we obtain marginal effect size estimates, βˆgwas and γˆgwas, as

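$$\begin{aligned}
\hat\beta_{\text{gwas}} &= \frac{1}{n_1}\begin{bmatrix} Y \\ X_s \end{bmatrix}^\top \begin{bmatrix} \phi \\ \phi_s \end{bmatrix} = \frac{1}{n_1}\left(Y^\top Y + X_s^\top X_s\right)\beta + \frac{1}{n_1}\left(Y^\top\epsilon + X_s^\top\epsilon_s\right), \\
\hat\gamma_{\text{gwas}} &= \frac{1}{n_2}\begin{bmatrix} Z \\ X'_s \end{bmatrix}^\top \begin{bmatrix} \psi \\ \psi_s \end{bmatrix} = \frac{1}{n_2}\left(Z^\top Z + {X'_s}^\top X'_s\right)\gamma + \frac{1}{n_2}\left(Z^\top\delta + {X'_s}^\top\delta_s\right).
\end{aligned}$$
(Equation 3)

Assuming individuals in both GWASs are drawn from the same population with LD matrix $V$, we have $\hat\beta_{\text{gwas}} \sim N\!\left(V\beta, \frac{\sigma_\epsilon^2}{n_1}V\right)$ and $\hat\gamma_{\text{gwas}} \sim N\!\left(V\gamma, \frac{\sigma_\delta^2}{n_2}V\right)$. We also find

$$\mathrm{Cov}[\hat\beta_{\text{gwas}}, \hat\gamma_{\text{gwas}}] = E[\hat\beta_{\text{gwas}}\hat\gamma_{\text{gwas}}^\top] - (V\beta)(V\gamma)^\top = \frac{\rho_e}{n_1 n_2}E[X_s^\top X'_s] = \frac{\rho_e n_s}{n_1 n_2}V,$$
(Equation 4)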

where the last equality follows from Isserlis’ theorem.22

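Under infinite sample sizes, $\mathrm{Var}[\hat\beta_{\text{gwas}}] = \mathrm{Var}[\hat\gamma_{\text{gwas}}] = \mathrm{Cov}[\hat\beta_{\text{gwas}}, \hat\gamma_{\text{gwas}}] = 0$, and we have $\beta = V^{-1}\hat\beta_{\text{gwas}}$, $\gamma = V^{-1}\hat\gamma_{\text{gwas}}$. Thus, local genetic covariance, $\rho_{g,\text{local}}$, can be computed as

$$\rho_{g,\text{local}} = \left(\hat\beta_{\text{gwas}}^\top V^{-1}\right)V\left(V^{-1}\hat\gamma_{\text{gwas}}\right) = \hat\beta_{\text{gwas}}^\top V^{-1}\hat\gamma_{\text{gwas}}.$$
(Equation 5)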

However, when sample sizes are finite, from bilinear form theory,23 the covariance between βˆgwas and γˆgwas creates bias, resulting in

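$$E\!\left[\hat\beta_{\text{gwas}}^\top V^{-1}\hat\gamma_{\text{gwas}}\right] = \beta^\top V\gamma + \frac{\rho_e n_s}{n_1 n_2}\mathrm{tr}\!\left(V^{-1}V\right) = \beta^\top V\gamma + \frac{p\,(\rho - \rho_{g,\text{local}})\,n_s}{n_1 n_2}.$$
(Equation 6)

Correcting for bias, we arrive at the unbiased estimator

$$\hat\rho_{g,\text{local}} = \frac{n_1 n_2\,\hat\beta_{\text{gwas}}^\top V^{-1}\hat\gamma_{\text{gwas}} - n_s p\rho}{n_1 n_2 - n_s p}.$$
(Equation 7)

For a rank-deficient LD matrix $V$, one replaces $V^{-1}$ with the pseudo-inverse $V^\dagger$ and $p$ with $q = \mathrm{rank}(V)$, yielding the unbiased estimator

$$\hat\rho_{g,\text{local}} = \frac{n_1 n_2\,\hat\beta_{\text{gwas}}^\top V^\dagger\hat\gamma_{\text{gwas}} - n_s q\rho}{n_1 n_2 - n_s q}.$$
(Equation 8)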

Thus, in order to obtain an unbiased estimate of genetic covariance between a pair of traits, one needs to know their phenotypic covariance. When phenotypic covariance is not available, one can obtain an estimate from genome-wide summary association data using cross-trait LD Score regression,16

$$E[z_{\phi,j}z_{\psi,j} \mid l_j] = \frac{\sqrt{n_1 n_2}\,\rho_g}{p}\,l_j + \frac{\rho n_s}{\sqrt{n_1 n_2}},$$
(Equation 9)

where $z_{\phi,j}$ and $z_{\psi,j}$ are the Z-scores of SNP $j$ in the two traits, and $l_j$ the LD score of SNP $j$. Cross-trait LD Score regression regresses the product of Z-scores at each SNP against its LD score, $l_j$, and accounts for bias generated by overlapping samples through the intercept term, $\frac{\rho n_s}{\sqrt{n_1 n_2}}$,16 from which one can obtain an estimate of phenotypic covariance, $\rho$.
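Recovering the phenotypic covariance from the regression intercept is a one-line inversion of the intercept term in Equation 9. The sketch below assumes the intercept has already been estimated elsewhere; the function name and the example sample sizes are hypothetical.

```python
import math

def phenotypic_cov_from_intercept(intercept, n1, n2, ns):
    """Invert intercept = rho * ns / sqrt(n1 * n2) to recover rho.

    intercept: cross-trait LD Score regression intercept (assumed given)
    n1, n2:    GWAS sample sizes for the two traits
    ns:        number of individuals shared by the two GWASs
    """
    if ns <= 0:
        # Without shared individuals the intercept carries no information on rho.
        raise ValueError("ns must be positive to recover rho from the intercept")
    return intercept * math.sqrt(n1 * n2) / ns

# Illustrative numbers only: full overlap makes the intercept equal to rho.
print(phenotypic_cov_from_intercept(0.3, 50_000, 50_000, 50_000))
```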

In the special case when $\hat\beta_{\text{gwas}}$ and $\hat\gamma_{\text{gwas}}$ are obtained for the same trait on the same set of individuals (i.e., $\hat\beta_{\text{gwas}} = \hat\gamma_{\text{gwas}}$, $n_1 = n_2 = n_s$, $\rho = 1$), Equation 7 reduces to the local SNP-heritability estimator.19 When $n_s = 0$ (i.e., no shared individuals between the GWASs), the unbiased estimator is simply $\hat\rho_{g,\text{local}} = \hat\beta_{\text{gwas}}^\top V^{-1}\hat\gamma_{\text{gwas}}$. An interpretation for this simple formula is that in the absence of sample overlap, the covariance in the noise, $\epsilon$ and $\delta$, is 0 and thus does not introduce bias into the estimate of $\rho_{g,\text{local}}$.
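The bias correction and its two special cases can be illustrated with a short function. This is a sketch of Equation 7 that takes the bilinear form as a precomputed number; the function name and the numeric inputs are assumptions made for illustration.

```python
def local_genetic_cov(bilinear_form, n1, n2, ns, p, rho):
    """Bias-corrected local genetic covariance (Equation 7).

    bilinear_form: beta_hat^T V^{-1} gamma_hat, computed elsewhere
    n1, n2, ns:    GWAS sample sizes and number of shared individuals
    p:             number of SNPs (use q = rank(V) for Equation 8)
    rho:           phenotypic covariance between the two traits
    """
    denom = n1 * n2 - ns * p
    if denom <= 0:
        raise ValueError("n1 * n2 must exceed ns * p")
    return (n1 * n2 * bilinear_form - ns * p * rho) / denom

# No sample overlap: the estimator equals the raw bilinear form.
print(local_genetic_cov(0.002, 50_000, 60_000, 0, 500, 0.3))

# Same trait, same individuals (n1 = n2 = ns = n, rho = 1): the expression
# simplifies to (n * b - p) / (n - p), the heritability form noted in the text.
n, p, b = 50_000, 500, 0.012
print(local_genetic_cov(b, n, n, n, p, 1.0), (n * b - p) / (n - p))
```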

Following bilinear form theory,23 we can estimate the variance for ρˆg,local as

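$$\mathrm{Var}[\hat\rho_{g,\text{local}}] = \left(\frac{n_1 n_2}{n_1 n_2 - n_s p}\right)^2\left[\left(\frac{p\rho_e n_s}{n_1 n_2}\right)^2 + \frac{\sigma_\epsilon^2\sigma_\delta^2\,p}{n_1 n_2} + \frac{\sigma_\delta^2\,h^2_{g\phi,\text{local}}}{n_2} + \frac{\sigma_\epsilon^2\,h^2_{g\psi,\text{local}}}{n_1} + \frac{2 n_s\rho_e\rho_{g,\text{local}}}{n_1 n_2}\right].$$
(Equation 10)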

For rank deficient LD matrix with rank(V)=q, one replaces p with q in Equation 10.

Accounting for Statistical Noise in LD Estimates

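Limited sample size of external reference panels creates statistical noise in the estimated LD matrix that biases our estimates. Following our previous work,19 we apply truncated-SVD regularization24 to remove noise in external reference LD. We note that $\hat\beta_{\text{gwas}}^\top V^\dagger\hat\gamma_{\text{gwas}} = \sum_{i=1}^{q} s_i = \sum_{i=1}^{q}\frac{1}{w_i}(\hat\beta_{\text{gwas}}^\top u_i)(\hat\gamma_{\text{gwas}}^\top u_i)$, where $w_i$ and $u_i$ are the eigenvalues and eigenvectors of the LD matrix $V$ and $q = \mathrm{rank}(V)$. We use $\hat s_i = \frac{1}{\hat w_i}(\hat\beta_{\text{gwas}}^\top\hat u_i)(\hat\gamma_{\text{gwas}}^\top\hat u_i)$ to denote the counterpart obtained from the external reference LD matrix $\hat V$. We show through simulations that the bulk of $\hat\beta_{\text{gwas}}^\top V^\dagger\hat\gamma_{\text{gwas}}$ comes from $s_i$ where $i \ll q$ and that $s_i \approx \hat s_i$ for $i \ll q$, thus justifying truncated-SVD as an appropriate regularization method when only external reference LD ($\hat V$) is available.

Let $g(\hat\beta_{\text{gwas}}, \hat\gamma_{\text{gwas}}, k) = \sum_{i=1}^{k}\hat s_i = \sum_{i=1}^{k}\frac{1}{\hat w_i}(\hat\beta_{\text{gwas}}^\top\hat u_i)(\hat\gamma_{\text{gwas}}^\top\hat u_i)$ be the truncated-SVD regularized estimate for $\hat\beta_{\text{gwas}}^\top V^\dagger\hat\gamma_{\text{gwas}}$; then it can be shown that

$$E[g(\hat\beta_{\text{gwas}}, \hat\gamma_{\text{gwas}}, k)] = \frac{n_s k(\rho - \rho_g)}{n_1 n_2} + \sum_{i=1}^{k}\hat w_i(\beta^\top\hat u_i)(\gamma^\top\hat u_i).$$
(Equation 11)

Assuming $\hat w_i = w_i$ and $\hat u_i = u_i$ for $i \le k$, Equation 11 is a biased approximation of $\rho_{g,\text{local}}$, with bias $\frac{n_s k(\rho - \rho_g)}{n_1 n_2}$. Correcting for the bias, we arrive at the estimator

$$\hat\rho_{g,\text{local}} = \frac{n_1 n_2\,g(\hat\beta_{\text{gwas}}, \hat\gamma_{\text{gwas}}, k) - n_s\rho k}{n_1 n_2 - n_s k},$$
(Equation 12)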

which has variance

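$$\mathrm{Var}[\hat\rho_{g,\text{local}}] = \left(\frac{n_1 n_2}{n_1 n_2 - n_s k}\right)^2\left[\left(\frac{k\rho_e n_s}{n_1 n_2}\right)^2 + \frac{\sigma_\epsilon^2\sigma_\delta^2\,k}{n_1 n_2} + \frac{\sigma_\delta^2\,h^2_{g\phi,\text{local}}}{n_2} + \frac{\sigma_\epsilon^2\,h^2_{g\psi,\text{local}}}{n_1} + \frac{2 n_s\rho_e\rho_{g,\text{local}}}{n_1 n_2}\right].$$
(Equation 13)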
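The truncated sum and its bias correction can be sketched as follows. The code takes the eigenvalues and eigenvectors of the reference LD matrix as inputs (in practice they would come from a linear algebra library), and the demonstration uses a 2×2 LD matrix whose eigendecomposition is known in closed form. All function names and numbers are illustrative assumptions, not the authors' implementation.

```python
import math

def truncated_bilinear(beta_hat, gamma_hat, eigvals, eigvecs, k):
    """g(beta_hat, gamma_hat, k): sum over the k largest eigenvalues w_i of
    (1 / w_i) * (beta_hat . u_i) * (gamma_hat . u_i)."""
    pairs = sorted(zip(eigvals, eigvecs), key=lambda t: -t[0])[:k]
    total = 0.0
    for w, u in pairs:
        b = sum(x * y for x, y in zip(beta_hat, u))
        c = sum(x * y for x, y in zip(gamma_hat, u))
        total += b * c / w
    return total

def corrected_local_cov(g, n1, n2, ns, k, rho):
    """Equation 12: remove the sample-overlap bias from the truncated sum."""
    return (n1 * n2 * g - ns * rho * k) / (n1 * n2 - ns * k)

# The LD matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r with
# eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
r = 0.9
s = 1 / math.sqrt(2)
eigvals = [1 + r, 1 - r]
eigvecs = [[s, s], [s, -s]]

# Noise-free marginal effects V @ beta and V @ gamma for beta=(1,0), gamma=(0,1).
beta_hat, gamma_hat = [1.0, 0.9], [0.9, 1.0]

print(truncated_bilinear(beta_hat, gamma_hat, eigvals, eigvecs, 2))  # full sum
print(truncated_bilinear(beta_hat, gamma_hat, eigvals, eigvecs, 1))  # top term only
```

Keeping both eigenvectors recovers the genetic covariance of 0.9 from the earlier two-SNP example; keeping only the leading one gives 0.95, showing that the leading term carries most of the sum at the cost of a small truncation bias.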

Extension to Multiple Independent Regions

For genome partitioned into m regions, let

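$$\begin{aligned}
\phi &= x_1^\top\beta_1 + \cdots + x_m^\top\beta_m + \epsilon, \\
\psi &= x_1^\top\gamma_1 + \cdots + x_m^\top\gamma_m + \delta
\end{aligned}$$
(Equation 14)

denote the phenotype measurements of two traits at an individual, where we assume that SNPs in different regions are independent, i.e., $E[x_{ik}x_{jl}] = 0$ for all $i \ne j$, $k \in \{1, \ldots, p_i\}$, and $l \in \{1, \ldots, p_j\}$, where $p_i$ and $p_j$ are the number of SNPs in region $i$ and $j$. Under these assumptions, we decompose the phenotypic covariance, $\rho$, between $\phi$ and $\psi$, into a summation of per-region genetic covariance and environmental covariance

$$\begin{aligned}
\rho = \mathrm{Cov}[\phi, \psi] &= E[(x_1^\top\beta_1 + \cdots + x_m^\top\beta_m + \epsilon)(x_1^\top\gamma_1 + \cdots + x_m^\top\gamma_m + \delta)] \\
&= E[(x_1^\top\beta_1)(x_1^\top\gamma_1)] + \cdots + E[(x_m^\top\beta_m)(x_m^\top\gamma_m)] + E[\epsilon\delta] \\
&= \sum_{i=1}^{m}\mathrm{Cov}[x_i^\top\beta_i, x_i^\top\gamma_i] + \mathrm{Cov}[\epsilon, \delta] = \sum_{i=1}^{m}\beta_i^\top V_i\gamma_i + \rho_e,
\end{aligned}$$
(Equation 15)

where $\rho_{g,\text{local},i} = \mathrm{Cov}[x_i^\top\beta_i, x_i^\top\gamma_i] = \beta_i^\top V_i\gamma_i$ is the local genetic covariance between the pair of traits attributed to genetic variants at region $i$. Following strategies outlined in previous sections, we arrive at the estimator for genetic covariance at the $i$th region,

$$\hat\rho_{g,\text{local},i} = \frac{n_1 n_2\,g(\hat\beta_{\text{gwas},i}, \hat\gamma_{\text{gwas},i}, k_i) - n_s\left(\rho - \sum_{j=1, j\ne i}^{m}\hat\rho_{g,\text{local},j}\right)k_i}{n_1 n_2 - n_s k_i},$$
(Equation 16)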

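which defines a system of linear equations involving $m$ unknown variables and $m$ equations. Following bilinear form theory, we obtain the variance estimate for $\hat\rho_{g,\text{local},i}$ as

$$\begin{aligned}
\mathrm{Var}[\hat\rho_{g,\text{local},i}] ={}& \left(\frac{n_1 n_2}{n_1 n_2 - n_s k_i}\right)^2\left[\left(\frac{k_i\rho_e n_s}{n_1 n_2}\right)^2 + \frac{\sigma_\epsilon^2\sigma_\delta^2\,k_i}{n_1 n_2} + \frac{\sigma_\delta^2\,h^2_{g\phi,\text{local},i}}{n_2} + \frac{\sigma_\epsilon^2\,h^2_{g\psi,\text{local},i}}{n_1} + \frac{2 n_s\rho_e\rho_{g,\text{local},i}}{n_1 n_2}\right] \\
&+ \sum_{j=1, j\ne i}^{m}\left(\frac{n_s k_j}{n_1 n_2 - n_s k_i}\right)^2\mathrm{Var}[\hat\rho_{g,\text{local},j}],
\end{aligned}$$
(Equation 17)

which also defines a system of linear equations with $m$ equations and $m$ variables. In the special case where there is no sample overlap ($n_s = 0$), $\hat\rho_{g,\text{local},i}$ reduces to $g(\hat\beta_{\text{gwas},i}, \hat\gamma_{\text{gwas},i}, k_i)$ with $\mathrm{Var}[\hat\rho_{g,\text{local},i}] = \frac{\sigma_\epsilon^2\sigma_\delta^2\,k_i}{n_1 n_2} + \frac{\sigma_\delta^2\,h^2_{g\phi,\text{local},i}}{n_2} + \frac{\sigma_\epsilon^2\,h^2_{g\psi,\text{local},i}}{n_1} \approx \frac{\sigma_\epsilon^2\sigma_\delta^2\,k_i}{n_1 n_2}$, i.e., both the local genetic covariance and its variance can be estimated independently of all other windows.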
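The coupled equations for the per-region covariances (Equation 16) need not be solved with a general linear solver. Writing S for the sum of all local estimates and substituting S minus the ith estimate for the sum over the other regions gives the ith estimate as g_i − n_s k_i (ρ − S) / (n_1 n_2), with S = (n_1 n_2 Σ g_i − n_s ρ Σ k_i) / (n_1 n_2 − n_s Σ k_i). The sketch below implements this algebraic rearrangement; it is an illustration derived from Equation 16, not the authors' code, and the inputs are made up.

```python
def solve_local_covs(g, k, n1, n2, ns, rho):
    """Solve the m coupled equations of Equation 16 in closed form.

    g[i]: truncated bilinear form at region i
    k[i]: number of eigenvectors used at region i
    """
    nn = n1 * n2
    # Sum of all local estimates (matches Equation 19 when all k[i] are equal).
    total = (nn * sum(g) - ns * rho * sum(k)) / (nn - ns * sum(k))
    return [g_i - ns * k_i * (rho - total) / nn for g_i, k_i in zip(g, k)]

# Illustrative inputs: three regions, partial sample overlap.
g = [0.002, -0.001, 0.0005]
k = [30, 50, 40]
n1, n2, ns, rho = 50_000, 60_000, 20_000, 0.3
estimates = solve_local_covs(g, k, n1, n2, ns, rho)

# Check: each estimate satisfies Equation 16 given the others.
for i, est in enumerate(estimates):
    others = sum(estimates) - est
    rhs = (n1 * n2 * g[i] - ns * (rho - others) * k[i]) / (n1 * n2 - ns * k[i])
    print(i, est, rhs)
```

With no sample overlap (`ns = 0`) the function returns the inputs `g` unchanged, in line with the special case described above.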

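When $k_1 = \cdots = k_m = k$, i.e., all regions use the same number of eigenvectors in the truncated-SVD regularization, summing over $i$ on both sides of Equation 16 yields

$$\begin{aligned}
\hat\rho_g = \sum_{i=1}^{m}\hat\rho_{g,\text{local},i} &= \frac{n_1 n_2}{n_1 n_2 - n_s k}\sum_{i=1}^{m}g(\hat\beta_{\text{gwas},i}, \hat\gamma_{\text{gwas},i}, k) - \frac{k n_s}{n_1 n_2 - n_s k}\sum_{i=1}^{m}\left(\rho - \sum_{j=1, j\ne i}^{m}\hat\rho_{g,\text{local},j}\right) \\
&= \frac{n_1 n_2}{n_1 n_2 - n_s k}\sum_{i=1}^{m}g(\hat\beta_{\text{gwas},i}, \hat\gamma_{\text{gwas},i}, k) - \frac{k n_s}{n_1 n_2 - n_s k}\sum_{i=1}^{m}\left(\rho - \hat\rho_g + \hat\rho_{g,\text{local},i}\right) \\
&= \frac{n_1 n_2}{n_1 n_2 - n_s k}\sum_{i=1}^{m}g(\hat\beta_{\text{gwas},i}, \hat\gamma_{\text{gwas},i}, k) + \frac{k n_s m - k n_s}{n_1 n_2 - n_s k}\hat\rho_g - \frac{k n_s m\rho}{n_1 n_2 - n_s k}.
\end{aligned}$$
(Equation 18)

Solving for $\hat\rho_g$ yields

$$\hat\rho_g = \frac{n_1 n_2\sum_{i=1}^{m}g(\hat\beta_{\text{gwas},i}, \hat\gamma_{\text{gwas},i}, k) - k n_s m\rho}{n_1 n_2 - k n_s m},$$
(Equation 19)

which has variance

$$\mathrm{Var}[\hat\rho_g] = \left(\frac{n_1 n_2}{n_1 n_2 - k n_s m}\right)^2\sum_{i=1}^{m}\mathrm{Var}[g(\hat\beta_{\text{gwas},i}, \hat\gamma_{\text{gwas},i}, k)].$$
(Equation 20)

Thus, if $k$ is chosen such that $(n_1 n_2 - k n_s m)$ is small (i.e., $\frac{n_1 n_2}{n_1 n_2 - k n_s m}$ large), the estimate of total genetic covariance will have large standard error. To reduce standard error in the estimates (at the cost of some bias), we recommend choosing $k$ such that $\frac{n_1 n_2}{n_1 n_2 - k n_s m}$ is less than 2. When testing for statistical significance, we assume that the estimates of local and genome-wide genetic covariance and correlation follow a normal distribution.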
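The recommendation on k translates into a simple bound: the factor n_1 n_2 / (n_1 n_2 − k n_s m) is below 2 exactly when k < n_1 n_2 / (2 n_s m). The sketch below computes the factor and the largest admissible integer k; the function names and the sample sizes are hypothetical, while m = 1,703 is the number of regions analyzed in this work.

```python
import math

def inflation_factor(n1, n2, ns, k, m):
    """n1*n2 / (n1*n2 - k*ns*m); its square scales the variance in Equation 20."""
    return n1 * n2 / (n1 * n2 - k * ns * m)

def max_eigenvectors(n1, n2, ns, m, limit=2.0):
    """Largest integer k keeping the inflation factor strictly below `limit`.

    Returns None when ns == 0, because the factor is then 1 for every k.
    """
    if ns == 0:
        return None
    bound = n1 * n2 * (1 - 1 / limit) / (ns * m)  # factor < limit  <=>  k < bound
    return max(math.ceil(bound) - 1, 0)

# Hypothetical fully overlapping GWASs of 50,000 individuals, 1,703 regions.
n1 = n2 = ns = 50_000
m = 1_703
k = max_eigenvectors(n1, n2, ns, m)
print(k, inflation_factor(n1, n2, ns, k, m))
```

Larger samples or smaller overlap relax the bound, so more eigenvectors can be retained per region.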

Standardizing Local Genetic Covariance

We estimate the local genetic correlation for the ith region as

$$\hat r_{g,\text{local},i} = \frac{\hat\rho_{g,\text{local},i}}{\sqrt{\hat h^2_{g\phi,\text{local},i}\,\hat h^2_{g\psi,\text{local},i}}},$$
(Equation 21)

where $\hat h^2_{g\phi,\text{local},i}$ and $\hat h^2_{g\psi,\text{local},i}$ denote the local SNP heritability of trait $\phi$ and $\psi$ at the $i$th region. In some cases, this estimator of local genetic correlation may yield an estimate with magnitude greater than 1, and we cap the estimate at −1 or 1. In simulations, we show that $\hat r_{g,\text{local},i}$ is approximately unbiased when both traits are heritable at the $i$th region. In practice, however, the terms $\hat h^2_{g\phi,\text{local},i}$ and $\hat h^2_{g\psi,\text{local},i}$ can be close to zero, greatly inflating the standard error of $\hat r_{g,\text{local},i}$. Thus, we recommend estimating local genetic correlation only at regions with significant local SNP heritability. One can also estimate local genetic correlation at a set of regions. For example, to estimate genetic correlation at regions indexed by the index set $C$, one applies the formula

$$\hat r_{g,C} = \frac{\sum_{i\in C}\hat\rho_{g,\text{local},i}}{\sqrt{\sum_{i\in C}\hat h^2_{g\phi,\text{local},i}\,\sum_{i\in C}\hat h^2_{g\psi,\text{local},i}}}.$$
(Equation 22)

We estimate standard error of local genetic correlation at a single region through a parametric bootstrap approach25 and local genetic correlation at a set of regions through jackknife.
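The standardization step can be sketched as two small functions, one for a single region (with the cap at −1 or 1) and one for a set of regions. The function names and inputs are illustrative; whether the set-level estimate is also capped is not stated in the text, so the sketch leaves it uncapped.

```python
import math

def local_corr(cov, h2_phi, h2_psi):
    """Equation 21: local genetic correlation, capped to [-1, 1]."""
    if h2_phi <= 0 or h2_psi <= 0:
        raise ValueError("both local SNP heritability estimates must be positive")
    r = cov / math.sqrt(h2_phi * h2_psi)
    return max(-1.0, min(1.0, r))

def set_corr(covs, h2_phis, h2_psis):
    """Equation 22: correlation over a set of regions (left uncapped here)."""
    return sum(covs) / math.sqrt(sum(h2_phis) * sum(h2_psis))

# Illustrative per-region estimates.
print(local_corr(0.0005, 0.001, 0.001))  # within bounds
print(local_corr(0.002, 0.001, 0.001))   # raw ratio is 2, capped at 1
print(set_corr([0.0005, -0.0002], [0.001, 0.0008], [0.001, 0.0005]))
```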

Simulation Framework

Starting from half (202 individuals) of the EUR reference panel from the 1000 Genomes Project,26 we simulated genotype data for 50,000 individuals at HapMap327 SNPs with minor allele frequency (MAF) greater than 5% in 100 randomly selected LD-independent regions defined in Berisa and Pickrell18 on chromosome 1 using HAPGEN2.27 We used the other half of the EUR reference panel (203 individuals) to obtain external reference LD matrices.

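We simulated phenotypes from the genotypes according to the linear model $\phi = X\beta + \epsilon$ and $\psi = X\gamma + \delta$, where $X$ is the column-standardized genotype matrix. We drew the effects of causal SNPs ($\beta_C$, $\gamma_C$) from the distribution

$$N\!\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \frac{h^2_{g\phi}}{|C|}I & \frac{\rho_g}{|C|}I \\ \frac{\rho_g}{|C|}I & \frac{h^2_{g\psi}}{|C|}I \end{bmatrix}\right),$$
(Equation 23)

where $C$ is the index set of causal SNPs, and set the effects of all other SNPs to be zero. We then drew ($\epsilon$, $\delta$) from the distribution

$$N\!\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} (1 - h^2_{g\phi})I & \rho_e I \\ \rho_e I & (1 - h^2_{g\psi})I \end{bmatrix}\right).$$
(Equation 24)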

Finally, we simulated GWAS summary statistics using methods outlined in previous sections. For each β and γ drawn from the normal distribution, we simulated 1,000 sets of summary statistics by varying ϵ and δ and applied ρ-HESS to estimate genetic covariance and genetic correlation for each set of the simulated summary statistics.
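Drawing causal effects from Equation 23 amounts to sampling each SNP's pair of effects from a bivariate normal, which can be done with a 2×2 Cholesky factor. The sketch below covers only this step, not the genotype simulation with HAPGEN2 or the summary-statistics step; the function name, seed, and parameter values are assumptions made for illustration.

```python
import math
import random

def draw_causal_effects(n_causal, h2_phi, h2_psi, rho_g, rng):
    """Draw (beta_j, gamma_j) for each causal SNP from Equation 23.

    Each pair has variances h2_phi/|C| and h2_psi/|C| and covariance rho_g/|C|.
    Requires rho_g**2 <= h2_phi * h2_psi so the covariance matrix is valid.
    """
    a = h2_phi / n_causal
    b = rho_g / n_causal
    c = h2_psi / n_causal
    # Cholesky factor [[l11, 0], [l21, l22]] of [[a, b], [b, c]].
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    betas, gammas = [], []
    for _ in range(n_causal):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        betas.append(l11 * z1)
        gammas.append(l21 * z1 + l22 * z2)
    return betas, gammas

rng = random.Random(2017)  # arbitrary seed for reproducibility
betas, gammas = draw_causal_effects(20_000, 0.5, 0.5, 0.25, rng)

# With no LD between causal SNPs, these sums approximate h2_phi, h2_psi, rho_g.
print(sum(b * b for b in betas),
      sum(g * g for g in gammas),
      sum(b * g for b, g in zip(betas, gammas)))
```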

Empirical Datasets

We obtained GWAS summary data for 36 quantitative complex traits and diseases from 15 GWAS consortia or institutions (see Table 1), all of which are based on individuals of European ancestry and have sample size greater than 20,000. We used approximately independent genomic regions previously defined18 to partition the genome and restricted our analyses to HapMap3 SNPs with minor allele frequency (MAF) greater than 5% in the European population in the 1000 Genomes data.26 We also removed strand-ambiguous SNPs prior to our analyses. We follow the method previously outlined19 to estimate and re-inflate λgc and to choose the number of eigenvectors to include in estimating local genetic covariance and SNP heritability.

Table 1.

A Summary of the 36 GWAS Summary Datasets Analyzed

Trait Name | Abbreviation | Consortium | # Gen Corr (All Consortia) | # Gen Corr (outside Consortium) | Approx. Sample Size
Age at menarche28 | AM | REPROGEN | 21 (4) | 21 (4) | 133K
Body mass index29 | BMI | GIANT | 27 (17) | 23 (14) | 231K
Height30 | HEIGHT | GIANT | 17 (2) | 13 (1) | 241K
Hip circumference31 | HIP | GIANT | 23 (14) | 19 (10) | 144K
Waist circumference31 | WC | GIANT | 26 (18) | 22 (15) | 153K
Waist-to-hip ratio31 | WHR | GIANT | 27 (19) | 23 (16) | 143K
Haemoglobin32 | HB | HAEMGEN | 21 (10) | 18 (8) | 51K
Mean cell haemoglobin32 | MCH | HAEMGEN | 9 (1) | 8 (1) | 44K
MCH concentration32 | MCHC | HAEMGEN | 6 (4) | 2 (1) | 47K
Mean cell volume32 | MCV | HAEMGEN | 12 (3) | 10 (1) | 49K
Packed cell volume32 | PCV | HAEMGEN | 18 (11) | 14 (8) | 45K
Red blood cell count32 | RBC | HAEMGEN | 20 (10) | 17 (8) | 46K
Number of platelets33 | PLT | HAEMGEN | 9 (1) | 6 (1) | 67K
Fasting glucose34 | FG | MAGIC | 19 (9) | 16 (8) | 46K
Fasting insulin34 | FI | MAGIC | 20 (12) | 18 (12) | 46K
HBA1C35 | HBA1C | MAGIC | 19 (14) | 18 (13) | 46K
HOMA-B34 | HOMA-B | MAGIC | 17 (11) | 15 (11) | 46K
HOMA-IR34 | HOMA-IR | MAGIC | 21 (12) | 21 (12) | 46K
High-density lipoprotein36 | HDL | GLGC | 23 (12) | 21 (11) | 96K
Low-density lipoprotein36 | LDL | GLGC | 19 (6) | 17 (4) | 91K
Total cholesterol36 | TC | GLGC | 18 (3) | 15 (1) | 96K
Triglycerides36 | TG | GLGC | 26 (14) | 23 (11) | 92K
Forearm BMD37 | FA | GEFOS | 4 (1) | 2 (0) | 53K
Femoral neck BMD37 | FN | GEFOS | 4 (2) | 2 (0) | 53K
Lumbar spine BMD37 | LS | GEFOS | 7 (1) | 5 (0) | 53K
Education years38 | EY | SSGAC | 26 (5) | 24 (4) | 294K
Neuroticism39 | NEURO | SSGAC | 5 (2) | 3 (0) | 171K
Subjective well-being39 | SWB | SSGAC | 4 (1) | 2 (0) | 298K
Age first birth40 | AFB | BIOS | 23 (5) | 23 (5) | 251K
Birth weight41 | BW | EGG | 13 (1) | 13 (1) | 68K
Urinary albumin-to-creatinine ratio42 | UACR | DCCT-EDIC | 11 (1) | 11 (1) | 53K
Rest heart rate43 | HR | EPPINGA | 14 (0) | 14 (0) | 265K
Serum urate concentrations44 | URATE | GUGC | 25 (14) | 25 (14) | 107K
Body fat45 | BF | Lu et al. | 26 (17) | 26 (17) | 58K
Extra-glomerular filtration rate of creatinin46 | CRN | CKDGEN | 10 (1) | 10 (1) | 133K
Age at menopause47 | MP | BCAC | 6 (0) | 6 (0) | 70K

We list the total number of traits with significant non-zero genome-wide genetic correlation (two-tailed p < 0.05/630) and the total number of traits outside the consortium with significant non-zero genome-wide genetic correlation in the fourth and fifth columns, respectively. The number of traits for which the magnitude of genetic correlation is both significantly non-zero and greater than 0.2 is shown in parentheses.

Local Genetic Correlation at Regions Ascertained for GWAS Signals

Recent works leverage the difference in correlations of Z-scores at genomic regions ascertained for GWAS signals specific to each trait to prioritize putative causal models between pairs of complex traits.2, 3 We evaluated the local genetic correlation at regions harboring GWAS signals specific to each trait across all 298 pairs of traits exhibiting significant genome-wide genetic correlation. We estimate local genetic correlations only for pairs of traits for which the number of loci harboring GWAS hits specific to each trait is greater than 10. We regard a pair of traits as showing a putative causal relationship when the confidence intervals (1.96 times jackknife standard error on each side) of the ascertained local genetic correlations ($\hat r_{g,\text{local},\text{trait1}}$ and $\hat r_{g,\text{local},\text{trait2}}$) do not overlap, and one of the confidence intervals overlaps with 0 whereas the other does not.
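One reading of the confidence-interval conditions above is as a decision rule applied to each pair of traits. The sketch below encodes that reading; the function names and the example estimates are hypothetical, and the rule is an interpretation of the text rather than the authors' published code.

```python
def confidence_interval(estimate, se, z=1.96):
    """Symmetric interval: estimate +/- z * standard error."""
    return (estimate - z * se, estimate + z * se)

def overlaps(a, b):
    """True when closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def putative_causal(r1, se1, r2, se2):
    """Both conditions: the two intervals are disjoint, and exactly one of
    them contains 0."""
    ci1 = confidence_interval(r1, se1)
    ci2 = confidence_interval(r2, se2)
    zero = (0.0, 0.0)
    disjoint = not overlaps(ci1, ci2)
    one_contains_zero = overlaps(ci1, zero) != overlaps(ci2, zero)
    return disjoint and one_contains_zero

# Hypothetical ascertained correlations and jackknife standard errors.
print(putative_causal(0.60, 0.10, 0.05, 0.10))  # disjoint, one interval spans 0
print(putative_causal(0.60, 0.10, 0.50, 0.10))  # intervals overlap
print(putative_causal(0.60, 0.05, 0.30, 0.05))  # disjoint, neither spans 0
```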

Results

Local Genetic Correlation Estimation in Simulations

We evaluated the performance of our approach (ρ-HESS) through simulations across a wide range of disease architectures. We included cross-trait LDSC,16 an approach that assumes a random-effect model, in the comparison for completeness. When LD is estimated in-sample, ρ-HESS provides an unbiased estimate of local genetic covariance and nearly unbiased estimates of genetic correlation (i.e., local genetic covariance divided by the square root of the product of the two traits' local SNP heritabilities; see Material and Methods) (Figure S2). Next, we quantified the performance in the more realistic case when in-sample LD is unavailable and needs to be estimated from external reference panels. Although both cross-trait LDSC and ρ-HESS provide accurate estimates of genetic correlation, we observe higher accuracy and precision for ρ-HESS (Figures 3, S4, S6, and S7). We attribute the lower standard error of ρ-HESS to the truncated-SVD regularization of the LD matrix, which effectively reduces the degrees of freedom of the bilinear form in Equation 7 (Figure S10). Different genomic regions vary in their total amount of LD and we observed that the accuracy of genetic correlation estimation decreases with the total amount of regional LD (Figure S11). This is expected as high LD regions lead to high rank deficiencies in the LD matrix and small eigenvalues, thus increasing the level of statistical noise in the estimation. We also evaluated the performance of local genetic correlation estimation in simulations where we varied the number of causal variants in each region. Overall, we observe that our estimator of genetic covariance and correlation is not sensitive to the underlying polygenicity (i.e., number of causal SNPs) (Figures 3, S5, S8, and S9). Finally, we also evaluated the performance of the estimator when causal variants are all drawn from DHS regions48 and observed that the performance is not sensitive to the uneven distribution of causal variants (Figure S3).

Figure 3.


Performance of ρ-HESS and Cross-trait LDSC using External Reference LD across 100 LD-Independent Regions, with Each Region Having 1,000 Simulations

Here, each dot represents the mean (across 100 regions) of the average performance (across 1,000 simulations per region), with error bars representing 1.96 times the standard error on both sides. Overall, ρ-HESS provides approximately unbiased estimates of local genetic covariance (A) and correlation (B) and is not sensitive to the underlying genetic architectures (covariance in C and correlation in D). We also observe that ρ-HESS is less biased, is more consistent, and has smaller standard error than cross-trait LDSC.

Local Genetic Correlation across 36 Quantitative Traits

We analyzed GWAS summary data from 36 complex traits to obtain local genetic correlations at 1,703 approximately LD-independent regions in the genome (∼1.6 Mb in width on average).18 First, as a quality control step, we aggregated the local estimates into genome-wide estimates of genetic correlation (see Material and Methods) and compared to the cross-trait LDSC estimates. Reassuringly, we find a high degree of consistency with genetic correlations estimated by cross-trait LDSC regression (R = 0.77; Figures 4 and S13). Our estimator provides lower standard errors as compared to cross-trait LDSC (likely due to the truncated-SVD regularization procedure) and yields consistently lower estimates for pairs of traits from the same consortium where we conservatively assume full sample overlap (see Discussion). Overall, we identify 298 pairs of traits with significant genome-wide genetic correlation (p < 0.05/630). These include previously reported correlations, e.g., body mass index (BMI) and triglyceride (TG), as well as complex traits that have not been studied before using genetic correlation, e.g., red blood cell count (RBC) and fasting insulin (FI) (Figure 4).

Figure 4.

Genetic Correlation across the 36 Complex Traits Obtained by ρ-HESS and Cross-trait LDSC17

The magnitude of the correlation is represented by the color and the size of the square. Among the 630 pairs of traits, ρ-HESS (top half) identified 298 pairs and cross-trait LDSC (bottom half) identified 115 pairs showing significant genetic correlation (marked with dots).

Next, we searched for genomic regions that disproportionately contribute to the genetic correlation of the 36 analyzed traits; we excluded the HLA region due to complex LD patterns. We identify 25 genomic regions that show both significant local genetic correlation (two-tailed p < 0.05/1,703) and significant local SNP heritability (one-tailed p < 0.05/1,703) (see Table 2, Figures S14–S16). For example, the estimate of local genetic correlation between HDL and TG at chr11: 116M–117M is −0.82 (95% CI [−0.95, −0.69]), suggesting highly shared genetic architecture at this region for HDL and TG. Indeed, the region chr11: 116M–117M harbors APOA1 (MIM: 107680), which is known to be associated with multiple lipid traits.36 Interestingly, 4 out of the 25 regions do not contain GWAS-significant SNPs (p < 5 × 10⁻⁸) for either one or both traits and can be viewed as new risk regions for these traits.
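As a rough illustration of how a reported estimate and confidence interval relate to the significance threshold, the sketch below recovers a standard error from a symmetric 95% CI and computes a two-tailed p value under a normal approximation. The normal approximation and the helper names `se_from_ci95` and `two_tailed_p` are assumptions made here, and the test used in the paper may differ in detail. The numbers are the HDL and TG values at chr11: 116M–117M quoted above.

```python
from math import erfc, sqrt

def se_from_ci95(lower, upper):
    """Standard error implied by a symmetric 95% CI (estimate +/- 1.96 * SE)."""
    return (upper - lower) / (2 * 1.96)

def two_tailed_p(estimate, se):
    """Two-tailed p value for H0: estimate = 0 under a normal approximation."""
    z = estimate / se
    return erfc(abs(z) / sqrt(2))   # equals 2 * (1 - Phi(|z|))

se = se_from_ci95(-0.95, -0.69)
p = two_tailed_p(-0.82, se)
passes = p < 0.05 / 1703            # Bonferroni threshold across 1,703 regions
```

This back-calculation only applies when the interval is symmetric; intervals in Table 2 that are truncated at ±1 do not determine the standard error this way.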

Table 2.

Loci that Show Significant Local Genetic Covariance (Two-Tailed p < 0.05/1,703) and Local SNP Heritability (One-Tailed p < 0.05/1,703) for Both Traits

Trait1 Trait2 Locus h²g,local,trait1 h²g,local,trait2 rg,local
AM HEIGHT chr9: 107M–109M 0.15 (0.02) 0.05 (0.01) 0.61 ([0.34,0.87])
BMI HIP chr16: 53M–55M 0.22 (0.02) 0.19 (0.03) 0.99 ([0.76,1.00])
BMI HIP chr18: 57M–59M 0.14 (0.02) 0.13 (0.02) 0.99 ([0.71,1.00])
BMI WC chr16: 53M–55M 0.22 (0.02) 0.21 (0.03) 1.00 ([0.78,1.00])
BMI WC chr18: 57M–59M 0.14 (0.02) 0.13 (0.02) 1.00 ([0.72,1.00])
BW HEIGHT chr12: 65M–67M 0.14 (0.02) 0.23 (0.02) 0.93 ([0.70,1.00])
HDL TG chr2: 21M–23M 0.16 (0.03) 0.22 (0.03) −0.94 ([−1.00, −0.65])
HDL TG chr8: 19M–20M 0.65 (0.04) 0.82 (0.04) −1.00 ([−1.00, −0.91])
HDL TG chr11: 116M–117M 0.40 (0.04) 1.27 (0.06) −0.82 ([−0.95, −0.69])
HDL TG chr15: 58M–59M 1.18 (0.06) 0.18 (0.03) 0.89 ([0.68,1.00])
HEIGHT HIP chr16: 4M–5M 0.06 (0.01) 0.10 (0.02) 0.73 ([0.41,1.00])
HIP WC chr16: 53M–55M 0.19 (0.03) 0.21 (0.03) 0.99 ([0.73,1.00])
HIP WC chr18: 57M–59M 0.13 (0.02) 0.13 (0.02) 1.00 ([0.69,1.00])
LDL TG chr1: 61M–63M 0.14 (0.03) 0.28 (0.03) 0.98 ([0.67,1.00])
LDL TG chr2: 21M–23M 0.84 (0.05) 0.22 (0.03) 0.62 ([0.46,0.78])
LDL TG chr8: 126M–128M 0.16 (0.03) 0.32 (0.04) 0.94 ([0.63,1.00])
LDL TG chr19: 18M–19M 0.18 (0.03) 0.21 (0.03) 0.99 ([0.72,1.00])
PLT RBC chr6: 134M–136M 0.26 (0.05) 0.66 (0.09) −0.99 ([−1.00, −0.69])

HDL HEIGHT chr11: 47M–49M 0.17 (0.02) 0.07 (0.01) 0.61 ([0.42,0.80])
HDL LDL chr2: 21M–23M 0.16 (0.03) 0.84 (0.05) −0.56 ([−0.74, −0.39])
HDL LDL chr8: 9M–9M 0.14 (0.02) 0.12 (0.02) 0.99 ([0.70,1.00])
MCH MCV chr6: 24M–25M 0.49 (0.07) 0.37 (0.06) 0.97 ([0.67,1.00])
MCH MCV chr6: 134M–136M 0.86 (0.09) 0.70 (0.08) 0.98 ([0.76,1.00])
MCH PLT chr6: 134M–136M 0.86 (0.09) 0.26 (0.05) 1.00 ([0.72,1.00])
MCH RBC chr6: 134M–136M 0.86 (0.09) 0.66 (0.09) −0.98 ([−1.00, −0.75])
MCV PLT chr6: 134M–136M 0.70 (0.08) 0.26 (0.05) 1.00 ([0.72,1.00])
MCV RBC chr6: 134M–136M 0.70 (0.08) 0.66 (0.09) −0.98 ([−1.00, −0.74])
MP HEIGHT chr5: 175M–177M 0.31 (0.04) 0.10 (0.01) −0.63 ([−0.82, −0.45])
URATE MCH chr6: 24M–25M 0.13 (0.02) 0.53 (0.07) 0.56 ([0.33,0.79])
URATE MCV chr6: 24M–25M 0.13 (0.02) 0.41 (0.06) 0.66 ([0.39,0.92])

We list pairs of traits for which the genome-wide genetic correlation is significant (two-tailed p < 0.05/630) in the top half of this table and pairs for which it is negligible in the bottom half. Here, we focus only on the pairs of traits excluding TC (see Table S1 for pairs of traits involving TC). Numbers in parentheses represent standard errors for local SNP heritability estimates and 95% confidence intervals for local genetic correlation estimates.

Since genetic correlation is an aggregation of local genetic covariance, for pairs of traits with highly positive or negative genetic correlation, we expect the distribution of local genetic covariances to be shifted toward the positive or negative side (see Figure S17), whereas for pairs of traits with low genetic correlation, we expect the distribution of local genetic covariances to be centered around zero (see Figures 5 and 6). Indeed, pairs of traits with higher genome-wide genetic correlation tend to harbor more loci with significant local genetic covariance (see Figure S14). For instance, only one region exhibits significant local genetic covariance for the pair of traits age at menarche (AM) and height (rg = 0.13, 95% CI [0.10, 0.13]), whereas four loci show significant local genetic covariance for the pair of traits LDL and TG (rg = 0.45, 95% CI [0.42, 0.49]).

Figure 5.

Manhattan-Style Plots Showing the Estimates of Local Genetic Covariance for the Pairs of Traits HDL and LDL

Although the genome-wide genetic correlation between HDL and LDL does not reach the significance level (p < 0.05/630), 11 loci exhibit significant local genetic covariance.

Figure 6.

Manhattan-Style Plots Showing the Estimates of Local Genetic Covariance for the Pairs of Traits BMI and TG

The local genetic covariance between BMI and TG is mostly one-sided, which suggests a plausible causal relationship between the two traits.

Local Correlations for Pairs of Traits with Negligible Genome-wide Correlation

Several pairs of traits show negligible genome-wide genetic correlation although they share GWAS risk regions. For example, HDL and LDL share several GWAS risk loci36 but the genome-wide genetic correlation is negligible (−0.05, 95% CI [−0.09, −0.01]).16 The absence of significant genome-wide genetic correlation between these pairs of traits can be attributed to a symmetric distribution of local genetic covariance (positive local genetic covariance cancels out negative local genetic covariance, see Figure 1) and/or a lack of power to declare significance for genome-wide genetic correlation. Thus, we hypothesize that at the region-specific level, many loci may manifest significant local genetic covariance even if the genome-wide genetic correlation between a pair of traits is not significant. Indeed, 11 genomic regions show significant local genetic correlation (two-tailed p < 0.05/1,703) for HDL and LDL (see Figure 5). Some of these loci, e.g., chr2: 21M–23M, chr11: 116M–117M, and chr19: 44M–46M, harbor APOB, APOA1, and APOE (MIM: 107741), respectively, which are known to be involved in lipid genetics.36, 49, 50 Across all pairs of traits with non-significant genome-wide correlation, we identify 6 regions across 10 pairs of traits with significant local genetic correlation (two-tailed p < 0.05/1,703) and local SNP heritability (one-tailed p < 0.05/1,703) (see Table 2, Figure S16). For example, the region chr6: 134M–136M harbors HBS1L (MIM: 612450)32, 51 and contributes to local genetic covariance across many blood traits (MCH, MCV, RBC, and PLT).
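The cancellation argument can be made concrete with a toy calculation. The sketch below assumes that the genome-wide genetic covariance and SNP heritabilities are sums of their local counterparts, and that the correlation is the summed covariance divided by the square root of the product of the summed heritabilities; all numbers and the function name `genome_wide_correlation` are hypothetical.

```python
import numpy as np

def genome_wide_correlation(local_cov, local_h2_a, local_h2_b):
    """Aggregate local covariances and heritabilities into one correlation."""
    return float(np.sum(local_cov) / np.sqrt(np.sum(local_h2_a) * np.sum(local_h2_b)))

# Four hypothetical regions with equal local heritability for both traits
local_h2_a = np.array([0.004, 0.004, 0.004, 0.004])
local_h2_b = np.array([0.004, 0.004, 0.004, 0.004])
# Two regions with positive and two with negative local covariance
local_cov = np.array([0.002, -0.002, 0.003, -0.003])

local_corr = local_cov / np.sqrt(local_h2_a * local_h2_b)   # per-region correlations
overall = genome_wide_correlation(local_cov, local_h2_a, local_h2_b)
```

In this toy setting every region has a local correlation of magnitude 0.5 or 0.75, yet the aggregated correlation is zero because the signed covariances offset each other.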

Genetic Correlation Ascertained for GWAS Risk Loci

Assessing the correlation in the effects at genomic regions ascertained for trait-specific GWAS regions can be used to prioritize putative causal models between complex traits. We utilized a recently proposed approach2 to assign putative causal relation to 55 pairs of traits. Restricting to 40 of the 55 pairs of traits that contain at least 10 regions with trait-specific GWAS signals (see Material and Methods), we quantified the local genetic correlation at genomic regions containing GWAS loci specific to each trait (see Table S2, Figure 7). Overall, the local genetic correlation is highly consistent with the putative causal relationships inferred by correlating the top signals at these loci.2 For example, when considering body mass index (BMI) and triglyceride levels (TG), the correlation at BMI-specific regions is significantly greater than that at TG-specific regions (r̂g,local,BMI = 0.47, 95% CI [0.37, 0.57] versus r̂g,local,TG = −0.02, 95% CI [−0.14, 0.10]), indicating that loci that increase BMI tend to consistently increase TG, whereas loci that increase TG do not consistently affect BMI, consistent with the putative model that BMI causally increases TG (see Figure 6).2, 3 We also observe correlations consistent with a model in which years of education (EY) consistently decreases hemoglobin level (HB), LDL, and TG (see Table S2), in line with previous conclusions on the effect of education on health.52, 53 However, we note that educational attainment (or other studied traits) may be confounded by other factors such as social status and that one should exercise caution when inferring causality from genetic data.
Finally, we also report pairs of traits in which the genetic correlation approach attains different results from bi-directional regression on the top signals.2 For example, when considering body mass index (BMI) and age at menarche (AM), the local correlation approach does not yield different estimates (r̂g,local,BMI = −0.49, 95% CI [−0.63, −0.35] versus r̂g,local,AM = −0.47, 95% CI [−0.59, −0.35]), whereas the approach of Pickrell et al.2 suggests a putative causal relation. This discrepancy can be due to different model assumptions, e.g., single causal variant versus allelic heterogeneity, with further investigations needed to assign causality from these data.
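One informal way to read the two comparisons above is to test whether the estimates at the two sets of ascertained regions differ. The sketch below is an assumption-laden illustration, not the procedure used in the paper: it takes each estimate as the midpoint of its reported 95% CI, recovers the standard error from the CI width, and treats the two estimates as independent. The function names are hypothetical.

```python
from math import erfc, sqrt

def center_and_se(lower, upper):
    """Midpoint and implied standard error of a symmetric 95% CI."""
    return (lower + upper) / 2, (upper - lower) / (2 * 1.96)

def difference_test(ci_a, ci_b):
    """z statistic and two-tailed p value for the difference of two estimates."""
    est_a, se_a = center_and_se(*ci_a)
    est_b, se_b = center_and_se(*ci_b)
    z = (est_a - est_b) / sqrt(se_a ** 2 + se_b ** 2)   # assumes independence
    return z, erfc(abs(z) / sqrt(2))

# BMI-specific versus TG-specific regions (CIs quoted in the text)
z_bmi_tg, p_bmi_tg = difference_test((0.37, 0.57), (-0.14, 0.10))
# BMI-specific versus AM-specific regions (CIs quoted in the text)
z_bmi_am, p_bmi_am = difference_test((-0.63, -0.35), (-0.59, -0.35))
```

Under these assumptions the BMI and TG estimates are clearly separated, whereas the BMI and AM estimates are not, which matches the qualitative reading given in the text.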

Figure 7.

Estimates of Local Genetic Correlation at Loci Ascertained for GWAS Risk Variants for Eight Example Pairs of Traits that Show Plausible Causal Relationship

We obtained standard error using a jackknife approach. Error bars represent 1.96 times the standard error on each side.
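A delete-one jackknife over regions can be sketched as follows. This is a generic illustration under the assumption that the statistic is recomputed with each region left out in turn; the statistic `ratio_correlation`, the helper `jackknife_se`, and the simulated inputs are hypothetical and are not taken from the paper.

```python
import numpy as np

def jackknife_se(statistic, *arrays):
    """Delete-one jackknife standard error of statistic(*arrays)."""
    n = len(arrays[0])
    # Recompute the statistic with region i removed from every input array
    loo = np.array([statistic(*(np.delete(a, i) for a in arrays)) for i in range(n)])
    return float(np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2)))

def ratio_correlation(cov, h2_a, h2_b):
    """Summed covariance over the square root of the product of summed heritabilities."""
    return cov.sum() / np.sqrt(h2_a.sum() * h2_b.sum())

rng = np.random.default_rng(0)
n_regions = 12                                   # hypothetical number of ascertained regions
h2_a = rng.uniform(0.001, 0.005, n_regions)      # toy local heritabilities, trait A
h2_b = rng.uniform(0.001, 0.005, n_regions)      # toy local heritabilities, trait B
cov = 0.5 * np.sqrt(h2_a * h2_b) + rng.normal(0.0, 0.0005, n_regions)

estimate = ratio_correlation(cov, h2_a, h2_b)
se = jackknife_se(ratio_correlation, cov, h2_a, h2_b)
ci = (estimate - 1.96 * se, estimate + 1.96 * se)   # error bars as in the figure
```

For the sample mean, this formula reduces to the familiar standard error of the mean, which gives a quick sanity check of the implementation.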

Discussion

We have described ρ-HESS, a method to estimate local genetic correlation from GWAS summary association data. Through extensive simulations, we demonstrated that our method is approximately unbiased and provides consistent results irrespective of causal architecture. We analyzed large-scale GWAS summary association data of 36 quantitative traits. Compared with cross-trait LDSC, our method identified considerably more pairs of traits displaying significant genome-wide genetic correlation, likely because of the truncated-SVD regularization of the LD matrix, which decreases the standard error of the estimates. We identify genomic regions that are significantly correlated across pairs of traits regardless of the significance of genome-wide correlation. Finally, we performed bi-directional analyses over the local genetic correlations to identify putative causal relationships, and report local genetic correlations at loci harboring GWAS signal specific to each trait.

We conclude with several limitations highlighting areas for future work. First, our estimator requires the phenotypic correlation between two traits, as well as the number of shared individuals between the two GWASs. We estimate the phenotypic correlation through cross-trait LDSC, assuming full sample overlap between GWASs within the same consortium and no sample overlap between GWASs across two consortia. Second, we note that our bi-directional analyses over local genetic correlation can be further extrapolated to infer putative causal models between complex traits. We refrain from making conclusive causal inferences from the bi-directional analyses because exact inference of causal relations is largely complicated by unobserved confounders such as socioeconomic status, population stratification, and/or biological pathways. Furthermore, most of the GWAS summary association data are adjusted for covariates such as age and gender to increase statistical power,54 and previous works have shown that adjusting for covariates can potentially lead to false positives.55 Third, in our real data analyses, we made the assumption that the loci are independent of each other. In reality, however, correlations may exist across adjacent loci due to long-range LD and can lead to biased estimates. Nevertheless, we note that previous works have indicated the effect of LD leakage to be minimal,19, 56 and we conjecture that this statement still holds in estimating local genetic correlation. Lastly, we use truncated-SVD to regularize the LD matrix and to reduce the standard error in the estimates of local genetic correlation, at the cost of introducing bias. Currently, we use a fixed number of eigenvectors in the truncated-SVD regularization across all loci. However, this approach may not be optimal for genomic regions with different LD structure, and we leave a principled approach for estimating the number of eigenvectors as future work.

Acknowledgments

This research was supported by NIH (United States Public Health Service) grants R01-HG009120, R01-GM053275, and U01-CA194393. We are grateful to Gleb Kichaev, Malika Kumar, Suraj Alva, and James Boocock for their helpful discussions that greatly improved the quality of this manuscript. We also thank Dr. Nicole Soranzo for kindly sharing summary data for the platelet traits.

Published: November 2, 2017

Footnotes

Supplemental Data include 17 figures and 2 tables and can be found with this article online at https://doi.org/10.1016/j.ajhg.2017.09.022.

Appendix A

Quantifying Shared Genetics via Covariance of the Causal Effects

An alternative measure of shared genetics is the covariance of the causal effects (β and γ) of the two traits. Under the fixed-effect model, we define covariance of the causal effects, ρg,causal, as the dot product between the causal effect size vectors of the two traits,

$$\rho_{g,\text{causal}} = \beta^{\top}\gamma.$$ (Equation A1)

Here, we make the assumption that the average effect size of each SNP is 0.

The definition of covariance of the causal effects in Equation A1 coincides with genetic covariance under the random-effect model. As shown in the supplementary data of Bulik-Sullivan et al.,16 if one assumes that β and γ have zero mean and

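$$\operatorname{Var}[(\beta, \gamma)] = \frac{1}{p}\begin{bmatrix} h_{g\phi}^{2} & \rho_g \\ \rho_g & h_{g\psi}^{2} \end{bmatrix},$$ (Equation A2)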

then it can be shown that the genetic covariance between two traits is

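$$\operatorname{Cov}[x^{\top}\beta, x^{\top}\gamma] = \sum_{i=1}^{p}\sum_{j=1}^{p} E[x_i x_j \beta_i \gamma_j] = \sum_{i=1}^{p} E[x_i^{2} \beta_i \gamma_i] = \sum_{i=1}^{p} E[x_i^{2}]\,E[\beta_i \gamma_i] = \rho_g.$$ (Equation A3)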

The random-effect model makes the implicit assumption that many SNPs are causal, which is appropriate for genome-wide analysis but not for local analysis, where few SNPs are likely to be causal.
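The identity in Equation A3 can be checked numerically. The sketch below is a Monte Carlo illustration under stated assumptions: effects are drawn per SNP from a bivariate normal with the covariance of Equation A2, and genotypes are drawn as independent standardized values (an LD-free simplification made here for brevity). All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_draws = 20, 100_000                     # number of SNPs, Monte Carlo replicates
h2_phi, h2_psi, rho_g = 0.5, 0.4, 0.2        # hypothetical heritabilities and covariance

# Per-SNP covariance of (beta_i, gamma_i) as in Equation A2
cov_effects = np.array([[h2_phi, rho_g], [rho_g, h2_psi]]) / p
effects = rng.multivariate_normal([0.0, 0.0], cov_effects, size=(n_draws, p))
beta, gamma = effects[..., 0], effects[..., 1]

# Standardized genotypes with E[x_i^2] = 1 (independent SNPs assumed here)
x = rng.standard_normal((n_draws, p))

g_phi = np.sum(x * beta, axis=1)             # genetic component of trait phi
g_psi = np.sum(x * gamma, axis=1)            # genetic component of trait psi
empirical_cov = float(np.cov(g_phi, g_psi)[0, 1])
```

With these settings `empirical_cov` should land close to `rho_g`, up to Monte Carlo error.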

Estimating Covariance of the Causal Effects from GWAS Summary Data

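For completeness, we derive an estimator for $\rho_{g,\text{causal}}$. We assume a linear model for the two traits (see Material and Methods). The effect size estimates from GWAS, $\hat{\beta}_{\text{gwas}}$ and $\hat{\gamma}_{\text{gwas}}$, follow $\hat{\beta}_{\text{gwas}} \sim N\!\left(V\beta, \frac{1-h_{\phi}^{2}}{n_1}V\right)$ and $\hat{\gamma}_{\text{gwas}} \sim N\!\left(V\gamma, \frac{1-h_{\psi}^{2}}{n_2}V\right)$, with $\operatorname{Cov}[\hat{\beta}_{\text{gwas}}, \hat{\gamma}_{\text{gwas}}] = \frac{\rho_e n_s}{n_1 n_2}V$, where $n_1$ and $n_2$ are the sample sizes of the two GWASs and $n_s$ is the number of shared samples (see Material and Methods).

As the sample sizes, $n_1$ and $n_2$, of the two GWASs go to infinity, we have $\beta_{\text{gwas}} = \lim_{n \to \infty}\hat{\beta}_{\text{gwas}} = V\beta$ and $\gamma_{\text{gwas}} = \lim_{n \to \infty}\hat{\gamma}_{\text{gwas}} = V\gamma$, which implies $\beta = V^{-1}\beta_{\text{gwas}}$ and $\gamma = V^{-1}\gamma_{\text{gwas}}$, suggesting the following estimator for covariance of the causal effects,

$$\rho_{g,\text{causal}} = \beta^{\top}\gamma = \beta_{\text{gwas}}^{\top}V^{-2}\gamma_{\text{gwas}}.$$ (Equation A4)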

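In reality, however, the finite sample sizes of GWASs result in noise in the estimates of β and γ, creating bias in the estimate of $\rho_{g,\text{causal}}$. From bilinear form theory, it can be shown that

$$E[\hat{\beta}_{\text{gwas}}^{\top}V^{-2}\hat{\gamma}_{\text{gwas}}] = \beta^{\top}\gamma + \frac{n_s\rho_e}{n_1 n_2}\operatorname{tr}(V^{-2}V) = \beta^{\top}\gamma + \frac{n_s\rho_e}{n_1 n_2}\operatorname{tr}(V^{-1}),$$ (Equation A5)

suggesting the unbiased estimator of $\rho_{g,\text{causal}}$,

$$\hat{\rho}_{g,\text{causal}} = \hat{\beta}_{\text{gwas}}^{\top}V^{-2}\hat{\gamma}_{\text{gwas}} - \frac{n_s\rho_e}{n_1 n_2}\operatorname{tr}(V^{-1}),$$ (Equation A6)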

where the environmental covariance can be estimated through cross-trait LD Score regression.16
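The estimator in Equation A6 translates directly into a few lines of linear algebra. The sketch below is illustrative only: it assumes the LD matrix V is symmetric and invertible (in practice LD matrices are often rank-deficient and need regularization), and the function name `causal_effect_covariance`, the toy LD matrix, and all numerical inputs are hypothetical.

```python
import numpy as np

def causal_effect_covariance(beta_hat, gamma_hat, ld, n1, n2, ns, rho_e):
    """Bias-corrected estimate: beta_hat^T V^-2 gamma_hat - ns*rho_e/(n1*n2) * tr(V^-1)."""
    v_inv_beta = np.linalg.solve(ld, beta_hat)      # V^-1 beta_hat
    v_inv_gamma = np.linalg.solve(ld, gamma_hat)    # V^-1 gamma_hat
    quadratic = v_inv_beta @ v_inv_gamma            # equals beta_hat^T V^-2 gamma_hat for symmetric V
    bias = ns * rho_e / (n1 * n2) * np.trace(np.linalg.inv(ld))
    return float(quadratic - bias)

# Toy region with 3 SNPs (hypothetical LD and causal effects)
ld = np.array([[1.0, 0.3, 0.1],
               [0.3, 1.0, 0.2],
               [0.1, 0.2, 1.0]])
beta = np.array([0.02, 0.00, -0.01])
gamma = np.array([0.01, 0.03, 0.00])

# Noise-free marginal effects (the infinite-sample limit V beta and V gamma)
beta_gwas, gamma_gwas = ld @ beta, ld @ gamma

no_overlap = causal_effect_covariance(beta_gwas, gamma_gwas, ld,
                                      n1=50_000, n2=60_000, ns=0, rho_e=0.1)
with_overlap = causal_effect_covariance(beta_gwas, gamma_gwas, ld,
                                        n1=50_000, n2=60_000, ns=20_000, rho_e=0.1)
```

With noise-free inputs and no shared samples the function recovers the dot product of the causal effects; with shared samples the correction term is subtracted, which is the adjustment needed once sampling noise is present.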

Web Resources

Supplemental Data

Document S1. Figures S1–S17 and Tables S1 and S2
mmc1.pdf (523KB, pdf)
Document S2. Article plus Supplemental Data
mmc2.pdf (1.9MB, pdf)

References

  • 1. Giambartolomei C., Vukcevic D., Schadt E.E., Franke L., Hingorani A.D., Wallace C., Plagnol V. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 2014;10:e1004383. doi: 10.1371/journal.pgen.1004383.
  • 2. Pickrell J.K., Berisa T., Liu J.Z., Ségurel L., Tung J.Y., Hinds D.A. Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 2016;48:709–717. doi: 10.1038/ng.3570.
  • 3. Mancuso N., Shi H., Goddard P., Kichaev G., Gusev A., Pasaniuc B. Integrating gene expression with summary association statistics to identify genes associated with 30 complex traits. Am. J. Hum. Genet. 2017;100:473–487. doi: 10.1016/j.ajhg.2017.01.031.
  • 4. Gusev A., Ko A., Shi H., Bhatia G., Chung W., Penninx B.W., Jansen R., de Geus E.J., Boomsma D.I., Wright F.A. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 2016;48:245–252. doi: 10.1038/ng.3506.
  • 5. Price A.L., Spencer C.C., Donnelly P. Progress and promise in understanding the genetic basis of common diseases. Proc. Biol. Sci. 2015;282:20151684. doi: 10.1098/rspb.2015.1684.
  • 6. Sheehan N.A., Didelez V., Burton P.R., Tobin M.D. Mendelian randomisation and causal inference in observational epidemiology. PLoS Med. 2008;5:e177. doi: 10.1371/journal.pmed.0050177.
  • 7. Voight B.F., Peloso G.M., Orho-Melander M., Frikke-Schmidt R., Barbalic M., Jensen M.K., Hindy G., Hólm H., Ding E.L., Johnson T. Plasma HDL cholesterol and risk of myocardial infarction: a mendelian randomisation study. Lancet. 2012;380:572–580. doi: 10.1016/S0140-6736(12)60312-2.
  • 8. Lawlor D.A., Harbord R.M., Sterne J.A., Timpson N., Davey Smith G. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat. Med. 2008;27:1133–1163. doi: 10.1002/sim.3034.
  • 9. Davey Smith G., Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum. Mol. Genet. 2014;23(R1):R89–R98. doi: 10.1093/hmg/ddu328.
  • 10. Smith G.D., Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int. J. Epidemiol. 2003;32:1–22. doi: 10.1093/ije/dyg070.
  • 11. Yang J., Benyamin B., McEvoy B.P., Gordon S., Henders A.K., Nyholt D.R., Madden P.A., Heath A.C., Martin N.G., Montgomery G.W. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 2010;42:565–569. doi: 10.1038/ng.608.
  • 12. Lee S.H., Yang J., Goddard M.E., Visscher P.M., Wray N.R. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics. 2012;28:2540–2542. doi: 10.1093/bioinformatics/bts474.
  • 13. Bulik-Sullivan B.K., Loh P.R., Finucane H.K., Ripke S., Yang J., Patterson N., Daly M.J., Price A.L., Neale B.M., Schizophrenia Working Group of the Psychiatric Genomics Consortium. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 2015;47:291–295. doi: 10.1038/ng.3211.
  • 14. Neale M., Cardon L. Methodology for Genetic Studies of Twins and Families. Volume 67. Springer Science & Business Media; 1992.
  • 15. Haseman J.K., Elston R.C. The investigation of linkage between a quantitative trait and a marker locus. Behav. Genet. 1972;2:3–19. doi: 10.1007/BF01066731.
  • 16. Bulik-Sullivan B., Finucane H.K., Anttila V., Gusev A., Day F.R., Loh P.R., ReproGen Consortium, Psychiatric Genomics Consortium, Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3, Duncan L., Perry J.R. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 2015;47:1236–1241. doi: 10.1038/ng.3406.
  • 17. Pasaniuc B., Price A.L. Dissecting the genetics of complex traits using summary association statistics. Nat. Rev. Genet. 2017;18:117–127. doi: 10.1038/nrg.2016.142.
  • 18. Berisa T., Pickrell J.K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics. 2016;32:283–285. doi: 10.1093/bioinformatics/btv546.
  • 19. Shi H., Kichaev G., Pasaniuc B. Contrasting the genetic architecture of 30 complex traits from summary association data. Am. J. Hum. Genet. 2016;99:139–153. doi: 10.1016/j.ajhg.2016.05.013.
  • 20. Hegmann J.P., Possidente B. Estimating genetic correlations from inbred strains. Behav. Genet. 1981;11:103–114. doi: 10.1007/BF01065621.
  • 21. Carey G. Inference about genetic correlations. Behav. Genet. 1988;18:329–338. doi: 10.1007/BF01260933.
  • 22. Isserlis L. On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables. Biometrika. 1918;12:134–139.
  • 23. Searle S.R. Linear Models. John Wiley & Sons, Inc.; 1971. p. 65.
  • 24. Hansen P.C. The truncated SVD as a method for regularization. BIT. 1987;27:534–553.
  • 25. Efron B. Bayesian inference and the parametric bootstrap. Ann. Appl. Stat. 2012;6:1971–1997. doi: 10.1214/12-AOAS571.
  • 26. Abecasis G.R., Auton A., Brooks L.D., DePristo M.A., Durbin R.M., Handsaker R.E., Kang H.M., Marth G.T., McVean G.A., 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632.
  • 27. International HapMap Consortium. The International HapMap Project. Nature. 2003;426:789–796. doi: 10.1038/nature02168.
  • 28. Perry J.R., Day F., Elks C.E., Sulem P., Thompson D.J., Ferreira T., He C., Chasman D.I., Esko T., Thorleifsson G., Australian Ovarian Cancer Study, GENICA Network, kConFab, LifeLines Cohort Study, InterAct Consortium, Early Growth Genetics (EGG) Consortium. Parent-of-origin-specific allelic associations among 106 genomic loci for age at menarche. Nature. 2014;514:92–97. doi: 10.1038/nature13545.
  • 29. Locke A.E., Kahali B., Berndt S.I., Justice A.E., Pers T.H., Day F.R., Powell C., Vedantam S., Buchkovich M.L., Yang J., LifeLines Cohort Study, ADIPOGen Consortium, AGEN-BMI Working Group, CARDIOGRAMplusC4D Consortium, CKDGen Consortium, GLGC, ICBP, MAGIC Investigators, MuTHER Consortium, MIGen Consortium, PAGE Consortium, ReproGen Consortium, GENIE Consortium, International Endogene Consortium. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518:197–206. doi: 10.1038/nature14177.
  • 30. Wood A.R., Esko T., Yang J., Vedantam S., Pers T.H., Gustafsson S., Chu A.Y., Estrada K., Luan J., Kutalik Z., Electronic Medical Records and Genomics (eMERGE) Consortium, MIGen Consortium, PAGE Consortium, LifeLines Cohort Study. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 2014;46:1173–1186. doi: 10.1038/ng.3097.
  • 31. Shungin D., Winkler T.W., Croteau-Chonka D.C., Ferreira T., Locke A.E., Mägi R., Strawbridge R.J., Pers T.H., Fischer K., Justice A.E., ADIPOGen Consortium, CARDIOGRAMplusC4D Consortium, CKDGen Consortium, GEFOS Consortium, GENIE Consortium, GLGC, ICBP, International Endogene Consortium, LifeLines Cohort Study, MAGIC Investigators, MuTHER Consortium, PAGE Consortium, ReproGen Consortium. New genetic loci link adipose and insulin biology to body fat distribution. Nature. 2015;518:187–196. doi: 10.1038/nature14132.
  • 32. van der Harst P., Zhang W., Mateo Leach I., Rendon A., Verweij N., Sehmi J., Paul D.S., Elling U., Allayee H., Li X. Seventy-five genetic loci influencing the human red blood cell. Nature. 2012;492:369–375. doi: 10.1038/nature11677.
  • 33. Gieger C., Radhakrishnan A., Cvejic A., Tang W., Porcu E., Pistis G., Serbanovic-Canic J., Elling U., Goodall A.H., Labrune Y. New gene functions in megakaryopoiesis and platelet formation. Nature. 2011;480:201–208. doi: 10.1038/nature10659.
  • 34. Dupuis J., Langenberg C., Prokopenko I., Saxena R., Soranzo N., Jackson A.U., Wheeler E., Glazer N.L., Bouatia-Naji N., Gloyn A.L., DIAGRAM Consortium, GIANT Consortium, Global BPgen Consortium, Anders Hamsten on behalf of Procardis Consortium, MAGIC investigators. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat. Genet. 2010;42:105–116. doi: 10.1038/ng.520.
  • 35. Soranzo N., Sanna S., Wheeler E., Gieger C., Radke D., Dupuis J., Bouatia-Naji N., Langenberg C., Prokopenko I., Stolerman E., WTCCC. Common variants at 10 genomic loci influence hemoglobin A1(C) levels via glycemic and nonglycemic pathways. Diabetes. 2010;59:3229–3239. doi: 10.2337/db10-0502.
  • 36. Willer C.J., Schmidt E.M., Sengupta S., Peloso G.M., Gustafsson S., Kanoni S., Ganna A., Chen J., Buchkovich M.L., Mora S., Global Lipids Genetics Consortium. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 2013;45:1274–1283. doi: 10.1038/ng.2797.
  • 37. Zheng H.F., Forgetta V., Hsu Y.H., Estrada K., Rosello-Diez A., Leo P.J., Dahia C.L., Park-Min K.H., Tobias J.H., Kooperberg C., AOGC Consortium, UK10K Consortium. Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture. Nature. 2015;526:112–117. doi: 10.1038/nature14878.
  • 38. Okbay A., Beauchamp J.P., Fontana M.A., Lee J.J., Pers T.H., Rietveld C.A., Turley P., Chen G.B., Emilsson V., Meddens S.F., LifeLines Cohort Study. Genome-wide association study identifies 74 loci associated with educational attainment. Nature. 2016;533:539–542. doi: 10.1038/nature17671.
  • 39. Okbay A., Baselmans B.M., De Neve J.E., Turley P., Nivard M.G., Fontana M.A., Meddens S.F., Linnér R.K., Rietveld C.A., Derringer J. Genetic variants associated with subjective well-being, depressive symptoms, and neuroticism identified through genome-wide analyses. Nat. Genet. 2016;48:624–633. doi: 10.1038/ng.3552.
  • 40. Barban N., Jansen R., de Vlaming R., Vaez A., Mandemakers J.J., Tropf F.C., Shen X., Wilson J.F., Chasman D.I., Nolte I.M., BIOS Consortium, LifeLines Cohort Study. Genome-wide analysis identifies 12 loci influencing human reproductive behavior. Nat. Genet. 2016;48:1462–1472. doi: 10.1038/ng.3698.
  • 41. Horikoshi M., Beaumont R.N., Day F.R., Warrington N.M., Kooijman M.N., Fernandez-Tajes J., Feenstra B., van Zuydam N.R., Gaulton K.J., Grarup N., CHARGE Consortium Hematology Working Group. Genome-wide associations for birth weight and correlations with adult disease. Nature. 2016;538:248–252. doi: 10.1038/nature19806.
  • 42. Teumer A., Tin A., Sorice R., Gorski M., Yeo N.C., Chu A.Y., Li M., Li Y., Mijatovic V., Ko Y.A., DCCT/EDIC. Genome-wide association studies identify genetic loci associated with albuminuria in diabetes. Diabetes. 2016;65:803–817. doi: 10.2337/db15-1313.
  • 43. Eppinga R.N., Hagemeijer Y., Burgess S., Hinds D.A., Stefansson K., Gudbjartsson D.F., van Veldhuisen D.J., Munroe P.B., Verweij N., van der Harst P. Identification of genomic loci associated with resting heart rate and shared genetic predictors with all-cause mortality. Nat. Genet. 2016;48:1557–1563. doi: 10.1038/ng.3708.
  • 44. Köttgen A., Albrecht E., Teumer A., Vitart V., Krumsiek J., Hundertmark C., Pistis G., Ruggiero D., O’Seaghdha C.M., Haller T., LifeLines Cohort Study, CARDIoGRAM Consortium, DIAGRAM Consortium, ICBP Consortium, MAGIC Consortium. Genome-wide association analyses identify 18 new loci associated with serum urate concentrations. Nat. Genet. 2013;45:145–154. doi: 10.1038/ng.2500.
  • 45. Lu Y., Day F.R., Gustafsson S., Buchkovich M.L., Na J., Bataille V., Cousminer D.L., Dastani Z., Drong A.W., Esko T. New loci for body fat percentage reveal link between adiposity and cardiometabolic disease risk. Nat. Commun. 2016;7:10495. doi: 10.1038/ncomms10495.
  • 46. Pattaro C., Teumer A., Gorski M., Chu A.Y., Li M., Mijatovic V., Garnaas M., Tin A., Sorice R., Li Y., ICBP Consortium, AGEN Consortium, CARDIOGRAM, CHARGe-Heart Failure Group, ECHOGen Consortium. Genetic associations at 53 loci highlight cell types and biological pathways relevant for kidney function. Nat. Commun. 2016;7:10023. doi: 10.1038/ncomms10023.
  • 47. Day F.R., Ruth K.S., Thompson D.J., Lunetta K.L., Pervjakova N., Chasman D.I., Stolk L., Finucane H.K., Sulem P., Bulik-Sullivan B., PRACTICAL consortium, kConFab Investigators, AOCS Investigators, Generation Scotland, EPIC-InterAct Consortium, LifeLines Cohort Study. Large-scale genomic analyses link reproductive aging to hypothalamic signaling, breast cancer susceptibility and BRCA1-mediated DNA repair. Nat. Genet. 2015;47:1294–1303. doi: 10.1038/ng.3412.
  • 48. Trynka G., Sandor C., Han B., Xu H., Stranger B.E., Liu X.S., Raychaudhuri S. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 2013;45:124–130. doi: 10.1038/ng.2504.
  • 49. Getz G.S., Reardon C.A. Apoprotein E as a lipid transport and signaling protein in the blood, liver, and artery wall. J. Lipid Res. 2009;50(Suppl):S156–S161. doi: 10.1194/jlr.R800058-JLR200.
  • 50. Pallaud C., Gueguen R., Sass C., Grow M., Cheng S., Siest G., Visvikis S. Genetic influences on lipid metabolism trait variability within the Stanislas Cohort. J. Lipid Res. 2001;42:1879–1890.
  • 51. Soranzo N., Spector T.D., Mangino M., Kühnel B., Rendon A., Teumer A., Willenborg C., Wright B., Chen L., Li M. A genome-wide meta-analysis identifies 22 loci associated with eight hematological parameters in the HaemGen consortium. Nat. Genet. 2009;41:1182–1190. doi: 10.1038/ng.467.
  • 52. Silles M.A. The causal effect of education on health: evidence from the United Kingdom. Econ. Educ. Rev. 2009;28:122–128.
  • 53. Baker D.P., Leon J., Smith Greenaway E.G., Collins J., Movit M. The education effect on population health: a reassessment. Popul. Dev. Rev. 2011;37:307–332. doi: 10.1111/j.1728-4457.2011.00412.x.
  • 54. Mefford J., Witte J.S. The covariate’s dilemma. PLoS Genet. 2012;8:e1003096. doi: 10.1371/journal.pgen.1003096.
  • 55. Aschard H., Vilhjálmsson B.J., Joshi A.D., Price A.L., Kraft P. Adjusting for heritable covariates can bias effect estimates in genome-wide association studies. Am. J. Hum. Genet. 2015;96:329–339. doi: 10.1016/j.ajhg.2014.12.021.
  • 56. Loh P.R., Bhatia G., Gusev A., Finucane H.K., Bulik-Sullivan B.K., Pollack S.J., Schizophrenia Working Group of Psychiatric Genomics Consortium, de Candia T.R., Lee S.H., Wray N.R., Kendler K.S. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat. Genet. 2015;47:1385–1392. doi: 10.1038/ng.3431.


Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
