Abstract
Although genetic correlations between complex traits provide valuable insights into epidemiological and etiological studies, a precise quantification of which genomic regions disproportionately contribute to the genome-wide correlation is currently lacking. Here, we introduce ρ-HESS, a technique to quantify the correlation between pairs of traits due to genetic variation at a small region in the genome. Our approach requires GWAS summary data only and makes no distributional assumption on the causal variant effect sizes while accounting for linkage disequilibrium (LD) and overlapping GWAS samples. We analyzed large-scale GWAS summary data across 36 quantitative traits, and identified 25 genomic regions that contribute significantly to the genetic correlation among these traits. Notably, we find 6 genomic regions that contribute to the genetic correlation of 10 pairs of traits that show negligible genome-wide correlation, further showcasing the power of local genetic correlation analyses. Finally, we report the distribution of local genetic correlations across the genome for 55 pairs of traits that show putative causal relationships.
Keywords: genetic correlation, genetic covariance, complex traits, etiology, epidemiology, genome-wide association study, summary statistics, heritability
Introduction
Genomic regions that harbor variants contributing to multiple traits provide valuable insights into the underlying biological mechanisms with which genetic variation impacts complex traits.1, 2, 3, 4, 5, 6, 7 Therefore, both de novo discovery of such regions and quantification of the correlation in effect sizes at known shared regions are important to epidemiological and etiological studies. For example, genetic variants associated with multiple traits in genome-wide association studies (GWASs) can be used as instrumental variables in Mendelian randomization analyses to suggest causal relationships among complex traits.7, 8, 9, 10 Unfortunately, many risk variants are left undetected by existing GWASs due to a combination of high polygenicity (i.e., many variants of small effect) and limited sample sizes, which restrict the power to detect genetic variants of small effect.11 To improve accuracy at sub-GWAS significant regions, recent works1, 2 proposed to utilize the posterior probability of two traits sharing a causal variant at a given risk region to detect genetic overlap.
Although powerful in detecting shared genetic risk variants, the posterior probability does not convey the direction or magnitude of the genetic effect at the overlapped genomic regions.1, 2 Alternative approaches have used genetic correlation (i.e., correlation of the genetic components of two traits), that summarizes both direction and magnitude of effects, to gain insights into genetic overlap of complex traits.12, 13, 14 Traditional methods to estimate genetic correlation are hindered by the lack of availability of large-scale individual-level data due to privacy concerns as they require individual genotype and trait measurements on the same set of individuals.12, 14, 15 More recent works have shown that GWAS summary data (i.e., effect sizes and standard errors at all variants typed in the study) are sufficient to estimate genome-wide genetic correlation under a polygenic trait architecture by aggregating information across all typed variants in the study.16, 17
In this work, we investigate the correlation between traits due to typed genetic variants from a small region in the genome (i.e., local genetic correlation) as a means to identify genomic regions that contribute disproportionately to the genetic sharing between traits. We introduce methods that estimate the local genetic correlation from GWAS summary data while allowing for overlapping GWAS samples and linkage disequilibrium (LD) among variants. We partition the genome-wide genetic sharing across approximately independent LD regions of 1.6 Mb in width on average.18 To allow for a broad range of causal effect sizes, our approach makes no distributional assumptions on the causal effect sizes by treating them as fixed quantities. Our method can be viewed as a natural extension to pairs of traits of recently proposed methods that quantify local SNP heritability from GWAS summary data under a fixed-effect model.19
We illustrate the utility of local genetic correlation through an analysis of GWAS summary data of 36 quantitative complex traits. We identify 25 genomic regions that show significant local genetic correlation across 27 pairs of traits; e.g., region chr2: 21M–23M, which harbors APOB (MIM: 107730), shows a significant genetic correlation for the pair of traits high-density lipoprotein (HDL) and triglycerides (TG). Notably, 6 (out of the 25) regions show significant local genetic correlation although the genome-wide genetic correlation is not significantly different from 0; e.g., region chr6: 134M–136M shows a significant local genetic correlation for mean cell volume (MCV) and platelet count (PLT) although the genome-wide genetic correlation between MCV and PLT is negligible (0.02, 95% CI [−0.04, 0.07]). This shows that these traits are correlated at a local level (e.g., due to pleiotropy and/or shared pathways) in a way that is not reflected in the genome-wide correlation (due to the balancing effect of other loci; e.g., positive correlation partially canceling a negative correlation, see Figure 1). Regions with significant local genetic correlations can also be used to identify new risk loci. For example, although the region chr8: 9.2M–9.6M shows a significant local genetic correlation between HDL and LDL, it does not harbor a GWAS-significant variant for either HDL or LDL. Finally, we explore putative causal relations between all the 36 studied traits using a recently proposed approach2 and report 55 instances of pairs with putative causality. For most of these pairs, we show that the local genetic correlation ascertained for GWAS signals specific to each trait is consistent with the putative causal relation while providing a directly interpretable quantity of the magnitude of effect.
Material and Methods
Overview of Methods
Genetic covariance measures the similarity between a pair of traits driven by genetic variations and enjoys wide applications in understanding relations between complex traits.13, 20, 21 Genetic covariance is traditionally estimated as a single measure across the entire genome to capture the genome-wide contribution of genetic variations to the correlation between phenotypes. Here, we introduce local genetic covariance, the similarity between pairs of traits driven by genetic variations localized at a specific region in the genome (e.g., one LD block), as a principled way to partition the shared genetic risk between traits. For example, a high genome-wide genetic covariance can be driven by one genomic region containing a shared risk variant or by a large number of regions, each with a small contribution reflecting putative causal relations (where all risk variants for one trait are risk variants for the other trait) and/or pleiotropy (risk variants contributing to both traits through shared pathways) (see Figure 1). Whereas genetic covariance quantifies the magnitude of co-variation of the genetic components of two traits in their original scale, genetic correlation quantifies co-variation in a standardized scale and is therefore comparable across pairs of traits and/or genomic regions for which magnitude of effect size may differ. As a motivating example, consider two traits modeled by $\phi = x_1\beta_1 + x_2\beta_2 + \epsilon$ and $\psi = x_1\gamma_1 + x_2\gamma_2 + \delta$, where x1 and x2 represent two independent (standardized) SNPs. In the special case where $\gamma = (\gamma_1, \gamma_2)$ is proportional to $\beta = (\beta_1, \beta_2)$ by a factor of α, i.e., $\gamma = \alpha\beta$, the genetic covariance between the two traits is $\alpha(\beta_1^2 + \beta_2^2)$ and is governed by α. However, the genetic correlation between the two traits is always 1 for positive α (−1 for negative α) regardless of the magnitude of α.
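The motivating example can be checked numerically. The following is a minimal sketch, assuming two independent, standardized SNPs (so that the LD matrix is the identity) and effect vectors with γ = αβ; the function name and the numerical effect sizes are hypothetical and chosen only for illustration.

```python
import math

def genetic_cov_and_corr(beta, gamma):
    """Genetic covariance and correlation for independent, standardized SNPs.

    With no LD (identity LD matrix), the covariance of the genetic components
    is the dot product of the effect vectors.
    """
    cov = sum(b * g for b, g in zip(beta, gamma))
    var_phi = sum(b * b for b in beta)    # genetic variance of trait 1
    var_psi = sum(g * g for g in gamma)   # genetic variance of trait 2
    corr = cov / math.sqrt(var_phi * var_psi)
    return cov, corr

beta = [0.1, 0.2]  # hypothetical effect sizes for trait 1
for alpha in (0.5, 2.0, -3.0):
    gamma = [alpha * b for b in beta]     # trait 2 effects proportional to trait 1
    cov, corr = genetic_cov_and_corr(beta, gamma)
    print(f"alpha={alpha:+.1f}  covariance={cov:+.3f}  correlation={corr:+.1f}")
```

The covariance scales linearly with α, whereas the correlation only retains the sign of α.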
We start by defining local genetic covariance under the fixed effect model, making a distinction between genetic covariance, $\beta^\top V \gamma$, and covariance of the causal effects, $\beta^\top \gamma$ (see below). We then describe methods to estimate genetic covariance followed by an approach to standardize the local genetic covariance to estimate local genetic correlation.
Local Genetic Covariance under Fixed-Effect Model
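Let $\phi = x^\top\beta + \epsilon$ and $\psi = x^\top\gamma + \delta$ be two traits measured in an individual, standardized so that $E[\phi] = E[\psi] = 0$ and $\mathrm{Var}[\phi] = \mathrm{Var}[\psi] = 1$, where $\beta, \gamma \in \mathbb{R}^p$ are the fixed effect size vectors for the two traits; $x \in \mathbb{R}^p$, the genotype vector of the individual at p SNPs, standardized so that $E[x] = 0$ and $\mathrm{Var}[x] = V$, the LD matrix; and $\epsilon$, $\delta$, random environmental effects independent of $x$, with $E[\epsilon] = E[\delta] = 0$, $\mathrm{Var}[\epsilon] = \sigma_\epsilon^2$, $\mathrm{Var}[\delta] = \sigma_\delta^2$, and $\mathrm{Cov}[\epsilon, \delta] = \rho_e$. Under these assumptions, one can decompose the phenotypic covariance, ρ, between ϕ and ψ into a summation of genetic covariance and environmental covariance, as
$$\rho = \mathrm{Cov}[\phi, \psi] = \mathrm{Cov}[x^\top\beta, x^\top\gamma] + \mathrm{Cov}[\epsilon, \delta] = \beta^\top V \gamma + \rho_e$$ (Equation 1)
where $\rho_g = \beta^\top V \gamma$ is the genetic covariance between the two traits (i.e., covariance between the genetic components of the two traits, $x^\top\beta$ and $x^\top\gamma$), and ρe the environmental covariance (i.e., covariance between the environmental effects of two traits, ϵ and δ). The magnitude and sign of local genetic covariance can be interpreted as the effect and direction of the local genetic component of one trait on that of the other. Thus, given the true effect size vectors, $\beta$, $\gamma$, and the LD matrix $V$, one can obtain ρg by plugging in these quantities.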
Genetic Covariance versus Covariance of the Causal Effects
An alternative approach to the covariance of the genetic components of the traits is to quantify the covariance of the causal effects (i.e., $\beta^\top\gamma$). In the special case where there is no LD (i.e., $V = I$, the identity matrix), genetic covariance and covariance of the causal effects coincide, $\beta^\top V \gamma = \beta^\top\gamma$. However, in general genetic covariance differs from covariance of the causal effects as a function of the LD between the causal variants. More importantly, high local genetic covariance does not necessarily imply high covariance of the causal effects. In fact, high genetic covariance can be attained even when causal variants are different between the traits. To illustrate the difference, consider an example involving two SNPs. Let $\beta = (\beta_1, 0)^\top$ and $\gamma = (0, \gamma_2)^\top$ be the causal effect vectors of the two traits, i.e., the two traits have two distinct sets of causal variants. And let
$$V = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}$$
be the LD matrix between the SNPs. In this example, the covariance of the causal effects is $\beta^\top\gamma = 0$, whereas the genetic covariance is $\beta^\top V \gamma = r\beta_1\gamma_2$. Thus, at a region where the causal variants are distinct for the two traits, covariance of the causal effects is always zero, whereas genetic covariance may be non-zero depending on the LD (see Figure 2). The two definitions measure genetic sharing at different levels of resolution. Local genetic covariance measures sharing at the regional level, giving a measure of how similar the regional genetic components are between the two traits, and has applications in predicting the regional genetic component of one trait from that of the other. In contrast, local causal effect covariance measures sharing at the individual SNP level, giving a measure of how similar the causal effects are between the two traits. Consider a scenario where two traits are each driven locally by a different SNP in the same gene. In this case, the local causal effect covariance is zero since the two traits share no causal SNP. However, the local genetic covariance is non-zero if the two SNPs are in LD, which induces similarity in the genetic component of the two traits and is an indication of the gene being shared across the two traits. Although in this work we focus on genetic covariance, for completeness we discuss an estimator for covariance of the causal effects ($\beta^\top\gamma$) in Appendix A.
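The two-SNP example can be reproduced with a few lines of code. This is a minimal sketch under the assumption that trait 1 is driven only by the first SNP and trait 2 only by the second, with pairwise LD r between them; the helper names and the numerical values are hypothetical.

```python
def causal_effect_covariance(beta, gamma):
    """Covariance of the causal effects: the plain dot product of effect vectors."""
    return sum(b * g for b, g in zip(beta, gamma))

def genetic_covariance(beta, V, gamma):
    """Genetic covariance: the bilinear form of the effect vectors through the LD matrix V."""
    p = len(beta)
    return sum(beta[i] * V[i][j] * gamma[j] for i in range(p) for j in range(p))

beta = [0.3, 0.0]    # trait 1: only SNP 1 is causal (hypothetical effect)
gamma = [0.0, 0.2]   # trait 2: only SNP 2 is causal (hypothetical effect)

for r in (0.0, 0.5, -0.9):
    V = [[1.0, r], [r, 1.0]]  # LD matrix for the two SNPs
    print(
        f"r={r:+.1f}  causal-effect covariance={causal_effect_covariance(beta, gamma):+.3f}"
        f"  genetic covariance={genetic_covariance(beta, V, gamma):+.3f}"
    )
```

The causal-effect covariance stays at zero for every r, while the genetic covariance equals r times the product of the two effects and changes sign with the LD.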
Estimating Local Genetic Covariance from GWAS Summary Data
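In two GWASs involving n1 individuals for trait 1 (ϕ), n2 individuals for trait 2 (ψ), and ns shared individuals, we assume
$$\phi = X\beta + \epsilon, \qquad \psi = Y\gamma + \delta$$ (Equation 2)
where $\phi \in \mathbb{R}^{n_1}$ and $\psi \in \mathbb{R}^{n_2}$ are the standardized trait values of all individuals in each GWAS; $X \in \mathbb{R}^{n_1 \times p}$ and $Y \in \mathbb{R}^{n_2 \times p}$, column-standardized genotype matrices of all individuals in each GWAS, where $X_s$ and $Y_s$ represent the genotype matrices for the same set of individuals and SNPs but standardized differently in each GWAS; and $\epsilon$ and $\delta$ are environmental effects of all individuals in each GWAS. We use the subscript s to represent individuals shared by both GWASs. We further assume that $E[\epsilon] = 0$, $E[\delta] = 0$, $\mathrm{Var}[\epsilon] = \sigma_\epsilon^2 I$, $\mathrm{Var}[\delta] = \sigma_\delta^2 I$, and $\mathrm{Cov}[\epsilon_s, \delta_s] = \rho_e I$.
In a traditional GWAS, we obtain marginal effect size estimates, $\hat\beta$ and $\hat\gamma$, as
$$\hat\beta = \frac{1}{n_1} X^\top \phi, \qquad \hat\gamma = \frac{1}{n_2} Y^\top \psi$$ (Equation 3)
Assuming individuals in both GWASs are drawn from the same population with LD matrix $V$, we have $E[\hat\beta] = V\beta$, $E[\hat\gamma] = V\gamma$. We also find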
(Equation 4) |
where the last equality follows from Isserlis’ theorem.22
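To make the role of LD in the marginal estimates concrete, the sketch below computes marginal effects from a column-standardized genotype matrix. It assumes the usual GWAS estimator, the genotype-phenotype cross product divided by the sample size, and uses a noise-free phenotype so that the relation between marginal effects and the in-sample LD matrix holds exactly. All names, the simulated genotypes, and the effect sizes are hypothetical.

```python
import numpy as np

def standardize_columns(G):
    """Center each SNP column and scale it to unit variance."""
    return (G - G.mean(axis=0)) / G.std(axis=0)

def marginal_effects(X, y):
    """Marginal effect estimates from standardized genotypes X and phenotype y."""
    return X.T @ y / X.shape[0]

rng = np.random.default_rng(0)
n, p = 500, 5
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # toy genotypes coded 0/1/2
X = standardize_columns(G)
V_hat = X.T @ X / n                                   # in-sample LD matrix

beta = np.array([0.2, 0.0, -0.1, 0.0, 0.05])          # hypothetical causal effects
phi_noise_free = X @ beta                             # genetic component only
beta_hat = marginal_effects(X, phi_noise_free)

# Marginal effects equal the causal effects "smeared" by in-sample LD.
print(np.allclose(beta_hat, V_hat @ beta))
```

With environmental noise added to the phenotype, the same relation holds in expectation rather than exactly.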
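Under infinite sample sizes, $n_1, n_2 \to \infty$, and we have $\hat\beta = V\beta$, $\hat\gamma = V\gamma$. Thus, local genetic covariance, $\rho_g$, can be computed as
$$\rho_g = \beta^\top V \gamma = \hat\beta^\top V^{-1} \hat\gamma$$ (Equation 5)
However, when sample sizes are finite, from bilinear form theory,23 the covariance between $\hat\beta$ and $\hat\gamma$ creates bias, resulting in
$$E[\hat\beta^\top V^{-1} \hat\gamma] = \rho_g + \frac{n_s p}{n_1 n_2}(\rho - \rho_g)$$ (Equation 6)
Correcting for bias, we arrive at the unbiased estimator
$$\hat\rho_g = \frac{n_1 n_2\, \hat\beta^\top V^{-1} \hat\gamma - n_s p \rho}{n_1 n_2 - n_s p}$$ (Equation 7)
For rank-deficient LD matrix $V$, one replaces $V^{-1}$ with the pseudo-inverse ($V^{\dagger}$) and p with $q = \mathrm{rank}(V)$, yielding the unbiased estimator
$$\hat\rho_g = \frac{n_1 n_2\, \hat\beta^\top V^{\dagger} \hat\gamma - n_s q \rho}{n_1 n_2 - n_s q}$$ (Equation 8)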
Thus, in order to obtain an unbiased estimate of genetic covariance between a pair of traits, one needs to know their phenotypic covariance. When phenotypic covariance is not available, one can obtain an estimate from genome-wide summary association data using cross-trait LD Score regression,16
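$$E[z_{\phi j} z_{\psi j}] = \frac{\sqrt{n_1 n_2}\, \rho_g^{\mathrm{GW}}}{M}\, l_j + \frac{\rho\, n_s}{\sqrt{n_1 n_2}}$$ (Equation 9)
where $z_{\phi j}$ and $z_{\psi j}$ are the Z-scores of SNP j in the two traits, lj the LD score of SNP j, $\rho_g^{\mathrm{GW}}$ the genome-wide genetic covariance, and M the number of SNPs. Cross-trait LD Score regression regresses the product of Z-scores at each SNP against its LD score, lj, and accounts for bias generated by overlapping samples through the intercept term, $\rho n_s / \sqrt{n_1 n_2}$,16 from which one can obtain an estimate of phenotypic covariance, ρ.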
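As an illustration of how the phenotypic covariance can be recovered from the regression intercept, the sketch below fits an ordinary least-squares line to products of Z-scores against LD scores and rescales the intercept. It assumes the standard cross-trait LD Score regression intercept, ρ·ns/√(n1·n2); the unweighted fit is a simplification (the published method uses weighted regression with a block jackknife), and the function names and toy data are hypothetical.

```python
import math

def ols_slope_intercept(x, y):
    """Unweighted least-squares fit of y on x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def phenotypic_cov_from_intercept(intercept, n1, n2, ns):
    """Invert intercept = rho * ns / sqrt(n1 * n2); requires sample overlap (ns > 0)."""
    if ns <= 0:
        raise ValueError("intercept carries no information on rho when ns == 0")
    return intercept * math.sqrt(n1 * n2) / ns

# Toy data lying exactly on a line, constructed so that rho = 0.3.
n1, n2, ns = 50_000, 40_000, 20_000
true_intercept = 0.3 * ns / math.sqrt(n1 * n2)
ld_scores = [10.0, 50.0, 120.0, 200.0, 350.0]
z_products = [0.004 * l + true_intercept for l in ld_scores]

slope, intercept = ols_slope_intercept(ld_scores, z_products)
rho_hat = phenotypic_cov_from_intercept(intercept, n1, n2, ns)
print(round(rho_hat, 6))
```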
In the special case when $\hat\beta$ and $\hat\gamma$ are obtained for the same trait on the same set of individuals (i.e., $\hat\beta = \hat\gamma$, n1 = n2 = ns, $\rho = 1$), Equation 7 reduces to the local SNP-heritability estimator.19 When ns = 0 (i.e., no shared individuals between the GWASs), the unbiased estimator is simply $\hat\beta^\top V^{-1} \hat\gamma$. An interpretation for this simple formula is that in the absence of sample overlap, the covariance in the noise, ϵ and δ, is 0 and thus does not introduce bias into the estimate of $\rho_g$.
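The two special cases above pin down the shape of the bias correction, which the following sketch implements. It assumes the estimator has the form (n1·n2·Q − ns·q·ρ)/(n1·n2 − ns·q), where Q is the bilinear form of the two marginal effect vectors through the pseudo-inverse of the LD matrix and q is the rank of the LD matrix; this form is consistent with both special cases but should be checked against the original equations. The function name and all numerical inputs are hypothetical.

```python
import numpy as np

def local_genetic_covariance(beta_hat, gamma_hat, V, n1, n2, ns, rho):
    """Bias-corrected local genetic covariance from marginal effects (assumed form).

    beta_hat, gamma_hat : marginal effect estimates for the two traits
    V                   : LD matrix of the region
    n1, n2, ns          : GWAS sample sizes and number of shared individuals
    rho                 : phenotypic covariance between the standardized traits
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    V = np.asarray(V, dtype=float)
    V_pinv = np.linalg.pinv(V)       # pseudo-inverse handles rank-deficient LD
    q = np.linalg.matrix_rank(V)     # replaces the number of SNPs p
    quad = float(beta_hat @ V_pinv @ gamma_hat)
    return (n1 * n2 * quad - ns * q * rho) / (n1 * n2 - ns * q)

V = [[1.0, 0.5], [0.5, 1.0]]
beta_hat = [0.10, 0.08]
gamma_hat = [0.05, 0.02]

# No sample overlap: the correction vanishes and the plain bilinear form remains.
no_overlap = local_genetic_covariance(beta_hat, gamma_hat, V, 50_000, 40_000, 0, 0.3)
# Full overlap: the shared-noise contribution is subtracted.
full_overlap = local_genetic_covariance(beta_hat, gamma_hat, V, 50_000, 50_000, 50_000, 0.3)
# Same trait, same individuals, rho = 1: local SNP-heritability form (n*Q - q)/(n - q).
same_trait = local_genetic_covariance(beta_hat, beta_hat, V, 50_000, 50_000, 50_000, 1.0)
print(no_overlap, full_overlap, same_trait)
```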
Following bilinear form theory,23 we can estimate the variance for as
(Equation 10) |
For rank-deficient LD matrix $V$ with $\mathrm{rank}(V) = q$, one replaces p with q in Equation 10.
Accounting for Statistical Noise in LD Estimates
Limited sample size of external reference panels creates statistical noise in the estimated LD matrix that biases our estimates. Following our previous work,19 we apply truncated-SVD regularization24 to remove noise in external reference LD. We note that , where and are the eigenvalues and eigenvectors of the LD matrix and . We use to denote the counterpart obtained from external reference LD matrix . We show through simulations that the bulk of comes from si where and that for , thus justifying truncated-SVD as an appropriate regularization method when only external reference LD () is available.
Let be the truncated-SVD regularized estimates for , then it can be shown that
(Equation 11) |
Assuming and for , Equation 11 is a biased approximation of , with bias . Correcting for the bias, we arrive at the estimator
(Equation 12) |
which has variance
(Equation 13) |
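The truncated-SVD regularization can be sketched as follows: the bilinear form is expanded in the eigenbasis of the reference LD matrix and only the k leading terms are kept. This is an illustrative sketch rather than the exact estimator of Equations 11–13 (the bias correction is omitted); the function name and inputs are hypothetical.

```python
import numpy as np

def truncated_bilinear_form(beta_hat, gamma_hat, V_ref, k):
    """Sum of the k leading eigen-terms of the bilinear form through V_ref's inverse.

    Each term is (beta_hat . u_i)(gamma_hat . u_i) / w_i, where w_i and u_i are
    an eigenvalue and eigenvector of the reference LD matrix. Dropping terms
    with small eigenvalues removes the components most affected by noise in
    the reference panel.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    w, U = np.linalg.eigh(np.asarray(V_ref, dtype=float))  # ascending eigenvalues
    top = np.argsort(w)[::-1][:k]                          # indices of the k largest
    proj_beta = U[:, top].T @ beta_hat
    proj_gamma = U[:, top].T @ gamma_hat
    return float(np.sum(proj_beta * proj_gamma / w[top]))

V_ref = np.array([[1.0, 0.5], [0.5, 1.0]])
beta_hat = np.array([0.10, 0.08])
gamma_hat = np.array([0.05, 0.02])

full = truncated_bilinear_form(beta_hat, gamma_hat, V_ref, k=2)   # all eigenvectors
top1 = truncated_bilinear_form(beta_hat, gamma_hat, V_ref, k=1)   # leading one only
print(full, top1)
```

With k equal to the number of SNPs and a full-rank LD matrix, the truncated sum coincides with the untruncated bilinear form.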
Extension to Multiple Independent Regions
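For a genome partitioned into m regions, let
$$\phi = \sum_{i=1}^{m} x_i^\top \beta_i + \epsilon, \qquad \psi = \sum_{i=1}^{m} x_i^\top \gamma_i + \delta$$ (Equation 14)
denote the phenotype measurements of two traits in an individual, where we assume that SNPs in different regions are independent, i.e., $\mathrm{Cov}[x_i, x_j] = 0$ for all $i \neq j$, $x_i \in \mathbb{R}^{p_i}$, and $x_j \in \mathbb{R}^{p_j}$, where pi and pj are the number of SNPs in region i and j. Under these assumptions, we decompose the phenotypic covariance, ρ, between ϕ and ψ, into a summation of per-region genetic covariance and environmental covariance
$$\rho = \sum_{i=1}^{m} \beta_i^\top V_i \gamma_i + \rho_e = \sum_{i=1}^{m} \rho_{g,i} + \rho_e$$ (Equation 15)
where $V_i = \mathrm{Var}[x_i]$ is the LD matrix of region i and $\rho_{g,i} = \beta_i^\top V_i \gamma_i$ is the local genetic covariance between the pair of traits attributed to genetic variants at region i. Following strategies outlined in previous sections, we arrive at the estimator for genetic covariance at the ith region,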
(Equation 16) |
which defines a system of linear equations involving m unknown variables and m equations. Following bilinear form theory, we obtain a variance estimate for the local genetic covariance estimate at the ith region as
(Equation 17) |
which also defines a system of linear equations with m equations and m variables. In the special case where there is no sample overlap (ns = 0), reduces to with , i.e., both the local genetic covariance and its variance can be estimated independent of all other windows.
When , i.e., all regions use the same number of eigenvectors in the truncated-SVD regularization, summing over i on both sides of Equation 16 yields
(Equation 18) |
Solving for yields
(Equation 19) |
which has variance
(Equation 20) |
Thus, if k is chosen such that is small (i.e., large), the estimate of total genetic covariance will have large standard error. To reduce standard error in the estimates (at the cost of some bias), we recommend choosing k such that is less than 2. When testing for statistical significance, we assume that the estimates of local and genome-wide genetic covariance and correlation follow a normal distribution.
Standardizing Local Genetic Covariance
We estimate the local genetic correlation for the ith region as
$$\hat r_{g,i} = \frac{\hat\rho_{g,i}}{\sqrt{\hat h^2_{\phi,i}\, \hat h^2_{\psi,i}}}$$ (Equation 21)
where $\hat h^2_{\phi,i}$ and $\hat h^2_{\psi,i}$ denote the local SNP heritability of trait ϕ and ψ at the ith region. In some cases, this estimator of local genetic correlation may yield an estimate with magnitude greater than 1, and we cap the estimate at −1 or 1. In simulations, we show that $\hat r_{g,i}$ is approximately unbiased when both traits are heritable at the ith region. In practice, however, the terms $\hat h^2_{\phi,i}$ and $\hat h^2_{\psi,i}$ can be close to zero, greatly inflating the standard error of $\hat r_{g,i}$. Thus, we recommend estimating local genetic correlation only at regions with significant local SNP heritability. One can also estimate local genetic correlation at a set of regions. For example, to estimate genetic correlation at regions indexed by the index set $I$, one applies the formula
$$\hat r_{g,I} = \frac{\sum_{i \in I} \hat\rho_{g,i}}{\sqrt{\sum_{i \in I} \hat h^2_{\phi,i}\, \sum_{i \in I} \hat h^2_{\psi,i}}}$$ (Equation 22)
We estimate standard error of local genetic correlation at a single region through a parametric bootstrap approach25 and local genetic correlation at a set of regions through jackknife.
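The standardization step, the capping at ±1, and a leave-one-region-out jackknife for a set of regions can be sketched as below. The sketch assumes that the set-level correlation is the summed local covariance divided by the square root of the product of the summed local heritabilities; the function names and the toy estimates are hypothetical, and the parametric bootstrap used for single regions is not shown.

```python
import math

def local_genetic_correlation(cov, h2_phi, h2_psi):
    """Covariance standardized by local SNP heritabilities, capped to [-1, 1].

    Returns None when either heritability estimate is non-positive, since the
    correlation is then undefined.
    """
    if h2_phi <= 0 or h2_psi <= 0:
        return None
    r = cov / math.sqrt(h2_phi * h2_psi)
    return max(-1.0, min(1.0, r))

def set_genetic_correlation(covs, h2_phis, h2_psis):
    """Genetic correlation aggregated over a set of regions (assumed form)."""
    return local_genetic_correlation(sum(covs), sum(h2_phis), sum(h2_psis))

def jackknife_se(covs, h2_phis, h2_psis):
    """Delete-one-region jackknife standard error of the set-level correlation.

    Assumes every leave-one-out estimate is defined (positive heritabilities).
    """
    m = len(covs)
    loo = [
        set_genetic_correlation(
            covs[:i] + covs[i + 1:], h2_phis[:i] + h2_phis[i + 1:], h2_psis[:i] + h2_psis[i + 1:]
        )
        for i in range(m)
    ]
    mean = sum(loo) / m
    return math.sqrt((m - 1) / m * sum((x - mean) ** 2 for x in loo))

# Toy local estimates for three regions (hypothetical values).
covs = [0.0002, 0.0003, 0.0001]
h2_phis = [0.0005, 0.0005, 0.0004]
h2_psis = [0.0005, 0.0005, 0.0006]

r_set = set_genetic_correlation(covs, h2_phis, h2_psis)
se = jackknife_se(covs, h2_phis, h2_psis)
print(r_set, se)
```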
Simulation Framework
Starting from half (202 individuals) of the EUR reference panel from the 1000 Genomes Project,26 we simulated genotype data for 50,000 individuals at HapMap327 SNPs with minor allele frequency (MAF) greater than 5% in 100 randomly selected LD-independent regions defined in Berisa and Pickrell18 on chromosome 1 using HAPGEN2.27 We used the other half of the EUR reference panel (203 individuals) to obtain external reference LD matrices.
We simulated phenotypes from the genotypes according to the linear model and , where is the column-standardized genotype matrix. We drew the effects of causal SNPs (, ) from the distribution
(Equation 23) |
where C is the index set of causal SNPs, and set the effects of all other SNPs to be zero. We then drew (, ) from the distribution
(Equation 24) |
Finally, we simulated GWAS summary statistics using methods outlined in previous sections. For each and drawn from the normal distribution, we simulated 1,000 sets of summary statistics by varying and and applied ρ-HESS to estimate genetic covariance and genetic correlation for each set of the simulated summary statistics.
Empirical Datasets
We obtained GWAS summary data for 36 quantitative complex traits and diseases from 15 GWAS consortia or institutions (see Table 1), all of which are based on individuals of European ancestry and have sample size greater than 20,000. We used approximately independent genomic regions previously defined18 to partition the genome and restricted our analyses to HapMap3 SNPs with minor allele frequency (MAF) greater than 5% in the European population in the 1000 Genomes data.26 We also removed strand-ambiguous SNPs prior to our analyses. We follow the method previously outlined19 to estimate and re-inflate the genomic control factor (λGC) and to choose the number of eigenvectors to include in estimating local genetic covariance and SNP heritability.
Table 1.
Trait Name | Abbreviation | Consortium | # Gen Corr All Consortium | # Gen Corr outside Consortium | Approx. Sample Size |
---|---|---|---|---|---|
Age at menarche28 | AM | REPROGEN | 21 (4) | 21 (4) | 133K |
Body mass index29 | BMI | GIANT | 27 (17) | 23 (14) | 231K |
Height30 | HEIGHT | GIANT | 17 (2) | 13 (1) | 241K |
Hip circumference31 | HIP | GIANT | 23 (14) | 19 (10) | 144K |
Waist circumference31 | WC | GIANT | 26 (18) | 22 (15) | 153K |
Waist-to-hip ratio31 | WHR | GIANT | 27 (19) | 23 (16) | 143K |
Haemoglobin32 | HB | HAEMGEN | 21 (10) | 18 (8) | 51K |
Mean cell haemoglobin32 | MCH | HAEMGEN | 9 (1) | 8 (1) | 44K |
MCH concentration32 | MCHC | HAEMGEN | 6 (4) | 2 (1) | 47K |
Mean cell volume32 | MCV | HAEMGEN | 12 (3) | 10 (1) | 49K |
Packed cell volume32 | PCV | HAEMGEN | 18 (11) | 14 (8) | 45K |
Red blood cell count32 | RBC | HAEMGEN | 20 (10) | 17 (8) | 46K |
Number of platelets33 | PLT | HAEMGEN | 9 (1) | 6 (1) | 67K |
Fasting glucose34 | FG | MAGIC | 19 (9) | 16 (8) | 46K |
Fasting insulin34 | FI | MAGIC | 20 (12) | 18 (12) | 46K |
HBA1C35 | HBA1C | MAGIC | 19 (14) | 18 (13) | 46K |
HOMA-B34 | HOMA-B | MAGIC | 17 (11) | 15 (11) | 46K |
HOMA-IR34 | HOMA-IR | MAGIC | 21 (12) | 21 (12) | 46K |
High-density lipoprotein36 | HDL | GLGC | 23 (12) | 21 (11) | 96K |
Low-density lipoprotein36 | LDL | GLGC | 19 (6) | 17 (4) | 91K |
Total cholesterol36 | TC | GLGC | 18 (3) | 15 (1) | 96K |
Triglycerides36 | TG | GLGC | 26 (14) | 23 (11) | 92K |
Forearm BMD37 | FA | GEFOS | 4 (1) | 2 (0) | 53K |
Femoral neck BMD37 | FN | GEFOS | 4 (2) | 2 (0) | 53K |
Lumbar spine BMD37 | LS | GEFOS | 7 (1) | 5 (0) | 53K |
Education years38 | EY | SSGAC | 26 (5) | 24 (4) | 294K |
Neuroticism39 | NEURO | SSGAC | 5 (2) | 3 (0) | 171K |
Subjective well-being39 | SWB | SSGAC | 4 (1) | 2 (0) | 298K |
Age first birth40 | AFB | BIOS | 23 (5) | 23 (5) | 251K |
Birth weight41 | BW | EGG | 13 (1) | 13 (1) | 68K |
Urinary albumin-to-creatinine ratio42 | UACR | DCCT-EDIC | 11 (1) | 11 (1) | 53K |
Rest heart rate43 | HR | EPPINGA | 14 (0) | 14 (0) | 265K |
Serum urate concentrations44 | URATE | GUGC | 25 (14) | 25 (14) | 107K |
Body fat45 | BF | Lu et al. | 26 (17) | 26 (17) | 58K |
Estimated glomerular filtration rate of creatinine46 | CRN | CKDGEN | 10 (1) | 10 (1) | 133K |
Age at menopause47 | MP | BCAC | 6 (0) | 6 (0) | 70K |
We list the total number of traits with significant non-zero genome-wide genetic correlation (two-tailed p < 0.05/630) and the total number of traits outside the consortium with significant non-zero genome-wide genetic correlation in the fourth and fifth column, respectively. Number of traits for which the magnitude of genetic correlation is both significantly non-zero and greater than 0.2 is shown in parentheses.
Local Genetic Correlation at Regions Ascertained for GWAS Signals
Recent works leverage the difference in correlations of Z-scores at genomic regions ascertained for GWAS signals specific to each trait to prioritize putative causal models between pairs of complex traits.2, 3 We evaluated the local genetic correlation at regions harboring GWAS signals specific to each trait across all 298 pairs of traits exhibiting significant genome-wide genetic correlation. We estimate local genetic correlations only for pairs of traits for which the number of loci harboring GWAS hits specific to each trait is greater than 10. We regard the two ascertained local genetic correlations as differing when their confidence intervals (1.96 times jackknife standard error on each side) do not overlap and one of the confidence intervals overlaps 0 while the other does not.
Results
Local Genetic Correlation Estimation in Simulations
We evaluated the performance of our approach (ρ-HESS) through simulations across a wide range of disease architectures. We included cross-trait LDSC,16 an approach that assumes a random-effect model, in the comparison for completeness. When LD is estimated in-sample, ρ-HESS provides an unbiased estimate of local genetic covariance and nearly unbiased estimates of genetic correlation (i.e., genetic covariance divided by the square root of the product of the local SNP heritabilities, see Material and Methods) (Figure S2). Next, we quantified the performance in the more realistic case when in-sample LD is unavailable and needs to be estimated from external reference panels. Although both cross-trait LDSC and ρ-HESS provide accurate estimates of genetic correlation, we observe superior accuracy with higher precision for ρ-HESS (Figures 3, S4, S6, and S7). We attribute the lower standard error of ρ-HESS to the truncated-SVD regularization of the LD matrix, which effectively reduces the degrees of freedom of the bilinear form in Equation 7 (Figure S10). Different genomic regions vary in their total amount of LD, and we observed that the accuracy of genetic correlation estimation decreases with the total amount of regional LD (Figure S11). This is expected, as high-LD regions lead to high rank deficiencies in the LD matrix and small eigenvalues, thus increasing the level of statistical noise in the estimation. We also evaluated the performance of local genetic correlation estimation in simulations where we varied the number of causal variants in each region. Overall, we observe that our estimator of genetic covariance and correlation is not sensitive to the underlying polygenicity (i.e., number of causal SNPs) (Figures 3, S5, S8, and S9). Finally, we also evaluated the performance of the estimator when causal variants are all drawn from DHS regions48 and observed that the performance is not sensitive to the uneven distribution of causal variants (Figure S3).
Local Genetic Correlation across 36 Quantitative Traits
We analyzed GWAS summary data from 36 complex traits to obtain local genetic correlations at 1,703 approximately LD-independent regions in the genome (∼1.6 Mb in width on average).18 First, as a quality control step, we aggregated the local estimates into genome-wide estimates of genetic correlation (see Material and Methods) and compared to the cross-trait LDSC estimates. Reassuringly, we find a high degree of consistency with genetic correlations estimated by cross-trait LDSC regression (R = 0.77; Figures 4 and S13). Our estimator provides lower standard errors as compared to cross-trait LDSC (likely due to the truncated-SVD regularization procedure) and yields consistently lower estimates for pairs of traits from the same consortium where we conservatively assume full sample overlap (see Discussion). Overall, we identify 298 pairs of traits with significant genome-wide genetic correlation (p < 0.05/630). These include previously reported correlations, e.g., body mass index (BMI) and triglyceride (TG), as well as complex traits that have not been studied before using genetic correlation, e.g., red blood cell count (RBC) and fasting insulin (FI) (Figure 4).
Next, we searched for genomic regions that disproportionately contribute to the genetic correlation of the 36 analyzed traits; we excluded the HLA region due to complex LD patterns. We identify 25 genomic regions that show both significant local genetic correlation (two-tailed p < 0.05/1,703) and significant local SNP heritability (one-tailed p < 0.05/1,703) (see Table 2, Figures S14–S16). For example, the estimate of local genetic correlation between HDL and TG at chr11: 116M–117M is −0.82 (95% CI [−0.95, −0.69]), suggesting highly shared genetic architecture at this region for HDL and TG. Indeed, the region chr11: 116M–117M harbors APOA1 (MIM: 107680), which is known to be associated with multiple lipid traits.36 Interestingly, 4 out of the 25 regions do not contain GWAS-significant SNPs (p < 5 × 10⁻⁸) for either one or both traits and can be viewed as new risk regions for these traits.
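The significance thresholds used here can be illustrated with a short calculation. The sketch below assumes normally distributed estimates and recovers an approximate standard error from a reported symmetric 95% confidence interval, which is a simplification (standard errors of local genetic correlations are obtained by parametric bootstrap in the analysis). The function names are hypothetical; the numerical inputs are the HDL and TG values at chr11: 116M–117M reported in the text and in Table 2.

```python
import math

N_REGIONS = 1703               # approximately LD-independent regions tested
N_PAIRS = math.comb(36, 2)     # pairs among the 36 traits

def two_tailed_p(estimate, se):
    """Two-tailed p value for H0: estimate = 0 under a normal approximation."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))

def one_tailed_p(estimate, se):
    """One-tailed p value for H0: estimate <= 0 under a normal approximation."""
    z = estimate / se
    return 0.5 * math.erfc(z / math.sqrt(2))

local_threshold = 0.05 / N_REGIONS
genome_wide_threshold = 0.05 / N_PAIRS

# HDL and TG at chr11: 116M-117M: correlation -0.82, 95% CI [-0.95, -0.69].
se_corr = (0.95 - 0.69) / (2 * 1.96)       # approximate SE from the CI half-width
p_corr = two_tailed_p(-0.82, se_corr)

# Local SNP heritability of TG at the same region: 1.27 (SE 0.06).
p_h2 = one_tailed_p(1.27, 0.06)

print(N_PAIRS, p_corr < local_threshold, p_h2 < local_threshold)
```

The count of 630 trait pairs matches the denominator of the genome-wide Bonferroni threshold (0.05/630) used for genome-wide genetic correlation.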
Table 2.
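Trait1 | Trait2 | Locus | Local SNP heritability, Trait1 (SE) | Local SNP heritability, Trait2 (SE) | Local genetic correlation (95% CI) |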
---|---|---|---|---|---|
AM | HEIGHT | chr9: 107M–109M | 0.15 (0.02) | 0.05 (0.01) | 0.61 ([0.34,0.87]) |
BMI | HIP | chr16: 53M–55M | 0.22 (0.02) | 0.19 (0.03) | 0.99 ([0.76,1.00]) |
BMI | HIP | chr18: 57M–59M | 0.14 (0.02) | 0.13 (0.02) | 0.99 ([0.71,1.00]) |
BMI | WC | chr16: 53M–55M | 0.22 (0.02) | 0.21 (0.03) | 1.00 ([0.78,1.00]) |
BMI | WC | chr18: 57M–59M | 0.14 (0.02) | 0.13 (0.02) | 1.00 ([0.72,1.00]) |
BW | HEIGHT | chr12: 65M–67M | 0.14 (0.02) | 0.23 (0.02) | 0.93 ([0.70,1.00]) |
HDL | TG | chr2: 21M–23M | 0.16 (0.03) | 0.22 (0.03) | −0.94 ([−1.00, −0.65]) |
HDL | TG | chr8: 19M–20M | 0.65 (0.04) | 0.82 (0.04) | −1.00 ([−1.00, −0.91]) |
HDL | TG | chr11: 116M–117M | 0.40 (0.04) | 1.27 (0.06) | −0.82 ([−0.95, −0.69]) |
HDL | TG | chr15: 58M–59M | 1.18 (0.06) | 0.18 (0.03) | 0.89 ([0.68,1.00]) |
HEIGHT | HIP | chr16: 4M–5M | 0.06 (0.01) | 0.10 (0.02) | 0.73 ([0.41,1.00]) |
HIP | WC | chr16: 53M–55M | 0.19 (0.03) | 0.21 (0.03) | 0.99 ([0.73,1.00]) |
HIP | WC | chr18: 57M–59M | 0.13 (0.02) | 0.13 (0.02) | 1.00 ([0.69,1.00]) |
LDL | TG | chr1: 61M–63M | 0.14 (0.03) | 0.28 (0.03) | 0.98 ([0.67,1.00]) |
LDL | TG | chr2: 21M–23M | 0.84 (0.05) | 0.22 (0.03) | 0.62 ([0.46,0.78]) |
LDL | TG | chr8: 126M–128M | 0.16 (0.03) | 0.32 (0.04) | 0.94 ([0.63,1.00]) |
LDL | TG | chr19: 18M–19M | 0.18 (0.03) | 0.21 (0.03) | 0.99 ([0.72,1.00]) |
PLT | RBC | chr6: 134M–136M | 0.26 (0.05) | 0.66 (0.09) | −0.99 ([−1.00, −0.69]) |
HDL | HEIGHT | chr11: 47M–49M | 0.17 (0.02) | 0.07 (0.01) | 0.61 ([0.42,0.80]) |
HDL | LDL | chr2: 21M–23M | 0.16 (0.03) | 0.84 (0.05) | −0.56 ([−0.74, −0.39]) |
HDL | LDL | chr8: 9M–9M | 0.14 (0.02) | 0.12 (0.02) | 0.99 ([0.70,1.00]) |
MCH | MCV | chr6: 24M–25M | 0.49 (0.07) | 0.37 (0.06) | 0.97 ([0.67,1.00]) |
MCH | MCV | chr6: 134M–136M | 0.86 (0.09) | 0.70 (0.08) | 0.98 ([0.76,1.00]) |
MCH | PLT | chr6: 134M–136M | 0.86 (0.09) | 0.26 (0.05) | 1.00 ([0.72,1.00]) |
MCH | RBC | chr6: 134M–136M | 0.86 (0.09) | 0.66 (0.09) | −0.98 ([−1.00, −0.75]) |
MCV | PLT | chr6: 134M–136M | 0.70 (0.08) | 0.26 (0.05) | 1.00 ([0.72,1.00]) |
MCV | RBC | chr6: 134M–136M | 0.70 (0.08) | 0.66 (0.09) | −0.98 ([−1.00, −0.74]) |
MP | HEIGHT | chr5: 175M–177M | 0.31 (0.04) | 0.10 (0.01) | −0.63 ([−0.82, −0.45]) |
URATE | MCH | chr6: 24M–25M | 0.13 (0.02) | 0.53 (0.07) | 0.56 ([0.33,0.79]) |
URATE | MCV | chr6: 24M–25M | 0.13 (0.02) | 0.41 (0.06) | 0.66 ([0.39,0.92]) |
We list pairs of traits for which the genome-wide genetic correlation is significant (two-tailed p < 0.05/630) and negligible in top and bottom half of this table, respectively. Here, we focus only on the pairs of traits excluding TC (see Table S1 for pairs of traits involving TC). Numbers in parentheses represent standard errors for local SNP heritability estimates and 95% confidence intervals for local genetic correlation estimates.
Since genetic correlation is an aggregation of local genetic covariance, for pairs of traits with highly positive or negative genetic correlation, we expect the distribution of local genetic covariances to be shifted toward the positive or negative side (see Figure S17), whereas for pairs of traits with low genetic correlation, we expect the distribution of local genetic covariances to be centered around zero (see Figures 5 and 6). Indeed, pairs of traits with higher genome-wide genetic correlation tend to harbor more loci with significant local genetic covariance (see Figure S14). For instance, only one region exhibits significant local genetic covariance for the pair of traits age at menarche (AM) and height (rg = 0.13, 95% CI [0.10, 0.13]), whereas four loci show significant local genetic covariance for the pair of traits LDL and TG (rg = 0.45, 95% CI [0.42, 0.49]).
Local Correlations for Pairs of Traits with Negligible Genome-wide Correlation
Several pairs of traits show negligible genome-wide genetic correlation although they share GWAS risk regions. For example, HDL and LDL share several GWAS risk loci36 but the genome-wide genetic correlation is negligible (−0.05, 95% CI [−0.09, −0.01]).16 The absence of significant genome-wide genetic correlation between these pairs of traits can be attributed to a symmetric distribution of local genetic covariance (positive local genetic covariance cancels out negative local genetic covariance, see Figure 1) and/or a lack of power to declare significance for genome-wide genetic correlation. Thus, we hypothesize that at the region-specific level, many loci may manifest significant local genetic covariance even if the genome-wide genetic correlation between a pair of traits is not significant. Indeed, 11 genomic regions show significant local genetic correlation (two-tailed p < 0.05/1,703) for HDL and LDL (see Figure 5). Some of these loci, e.g., chr2: 21M–23M, chr11: 116M–117M, and chr19: 44M–46M, harbor APOB, APOA1, and APOE (MIM: 107741), respectively, which are known to be involved in lipid genetics.36, 49, 50 Across all pairs of traits with non-significant genome-wide correlation, we identify 6 regions across 10 pairs of traits with significant local genetic correlation (two-tailed p < 0.05/1,703) and local SNP heritability (one-tailed p < 0.05/1,703) (see Table 2, Figure S16). For example, the region chr6: 134M–136M harbors HBS1L (MIM: 612450)32, 51 and contributes to local genetic covariance across many blood traits (MCH, MCV, RBC, and PLT).
Genetic Correlation Ascertained for GWAS Risk Loci
Assessing the correlation in the effects at genomic regions ascertained for trait-specific GWAS regions can be used to prioritize putative causal models between complex traits. We utilized a recently proposed approach2 to assign putative causal relation to 55 pairs of traits. Restricting to 40 of the 55 pairs of traits that contain at least 10 regions with trait-specific GWAS signals (see Material and Methods), we quantified the local genetic correlation at genomic regions containing GWAS loci specific to each trait (see Table S2, Figure 7). Overall, the local genetic correlation is highly consistent with the putative causal relationships inferred by correlating the top signals at these loci.2 For example, when considering body mass index (BMI) and triglyceride levels (TG), the correlation at BMI-specific regions is significantly greater than that at TG-specific regions (95% CI [0.37, 0.57] versus 95% CI [−0.14, 0.10]), indicating that loci that increase BMI tend to consistently increase TG, whereas loci that increase TG do not consistently affect BMI, consistent with the putative model that BMI causally increases TG (see Figure 6).2, 3 We also observe correlations consistent with a model in which years of education (EY) consistently decreases hemoglobin level (HB), LDL, and TG (see Table S2), in line with previous conclusions on the effect of education on health.52, 53 However, we note that educational attainment (or other studied traits) may be confounded by other factors such as social status and that one should exercise caution when inferring causality from genetic data.
Finally, we also report pairs of traits in which the genetic correlation approach yields different results from bi-directional regression on the top signals.2 For example, when considering body mass index (BMI) and age at menarche (AM), the local correlation approach does not yield different estimates (95% CI [−0.63, −0.35] versus 95% CI [−0.59, −0.35]), whereas the approach of Pickrell et al.2 suggests a putative causal relation. This discrepancy can be due to different model assumptions, e.g., single causal variant versus allelic heterogeneity, with further investigations needed to assign causality from these data.
Discussion
We have described ρ-HESS, a method to estimate local genetic correlation from GWAS summary association data. Through extensive simulations, we demonstrated that our method is approximately unbiased and provides consistent results irrespective of causal architecture. We analyzed large-scale GWAS summary association data of 36 quantitative traits. Compared with cross-trait LDSC, our method identified considerably more pairs of traits displaying significant genome-wide genetic correlation, likely because of the truncated-SVD regularization of the LD matrix, which decreases the standard error of the estimates. We identify genomic regions that are significantly correlated across pairs of traits regardless of the significance of genome-wide correlation. Finally, we performed bi-directional analyses over the local genetic correlations to identify putative causal relationships, and report local genetic correlations at loci harboring GWAS signal specific to each trait.
We conclude with several limitations highlighting areas for future work. First, our estimator requires the phenotypic correlation between the two traits, as well as the number of shared individuals between the two GWASs. We estimate the phenotypic correlation through cross-trait LDSC, assuming full sample overlap between GWASs within the same consortium and no sample overlap between GWASs across two consortia. Second, we note that our bi-directional analyses over local genetic correlation can be further extrapolated to infer putative causal models between complex traits. We refrain from making conclusive causal inferences from the bi-directional analyses because exact inference of causal relations is largely complicated by unobserved confounders such as socioeconomic status, population stratification, and/or biological pathways. Furthermore, most of the GWAS summary association data are adjusted for covariates such as age and gender to increase statistical power,54 and previous works have shown that adjusting for covariates can potentially lead to false positives.55 Third, in our real data analyses, we made the assumption that the loci are independent of each other. In reality, however, correlations may exist across adjacent loci due to long-range LD and can lead to biased estimates. Nevertheless, we note that previous works have indicated the effect of LD leakage to be minimal,19, 56 and we conjecture that this statement still holds in estimating local genetic correlation. Lastly, we use truncated-SVD to regularize the LD matrix and to reduce the standard error in the estimates of local genetic correlation, at the cost of introducing bias. Currently, we use a fixed number of eigenvectors in the truncated-SVD regularization across all loci. However, this approach may not be optimal for genomic regions with different LD structure, and we leave a principled approach to estimating the number of eigenvectors as future work.
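The truncated-SVD regularization discussed in the last limitation can be sketched as follows. This is a minimal illustration and not the ρ-HESS implementation: the function name `truncated_pinv`, the choice of k, and the toy LD matrix are assumptions made for the example. Because an LD matrix is symmetric, its SVD is computed here through an eigendecomposition.

```python
import numpy as np


def truncated_pinv(ld, k):
    """Pseudo-inverse of a symmetric LD matrix from its top-k eigenpairs.

    Hypothetical helper: keeps the k largest eigenvalues and discards the
    rest, trading a small bias for a much smaller variance in the inverse.
    """
    vals, vecs = np.linalg.eigh(ld)          # ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]       # indices of the k largest
    top_vals, top_vecs = vals[order], vecs[:, order]
    # Scale each retained eigenvector (column) by 1/eigenvalue, then recombine.
    return (top_vecs / top_vals) @ top_vecs.T


# Toy LD matrix: two nearly collinear SNPs plus one independent SNP.
ld = np.array([
    [1.00, 0.99, 0.00],
    [0.99, 1.00, 0.00],
    [0.00, 0.00, 1.00],
])

full_inv = np.linalg.inv(ld)         # dominated by the tiny eigenvalue 0.01
reg_inv = truncated_pinv(ld, k=2)    # drops that eigenvalue

print("largest entry, full inverse:     ", np.abs(full_inv).max())
print("largest entry, truncated inverse:", np.abs(reg_inv).max())
```

With the smallest eigenvalue removed, the entries of the inverse stay bounded, which is what stabilizes the downstream estimates. A fixed k, as in this sketch, ignores differences in LD structure between regions, which is the limitation noted above.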
Acknowledgments
This research was supported by NIH (United States Public Health Service) grants R01-HG009120, R01-GM053275, and U01-CA194393. We are grateful to Gleb Kichaev, Malika Kumar, Suraj Alva, and James Boocock for their helpful discussions that greatly improved the quality of this manuscript. We also thank Dr. Nicole Soranzo for kindly sharing summary data for the platelet traits.
Published: November 2, 2017
Footnotes
Supplemental Data include 17 figures and 2 tables and can be found with this article online at https://doi.org/10.1016/j.ajhg.2017.09.022.
Appendix A
Quantifying Shared Genetics via Covariance of the Causal Effects
An alternative measure of shared genetics is the covariance of the causal effects ( and ) of the two traits. Under the fixed-effect model, we define covariance of the causal effects, , as the dot product between the causal effect size vectors of the two traits,
(Equation A1)
Here, we make the assumption that the average effect size of each SNP is 0.
The definition of covariance of the causal effects in Equation A1 coincides with genetic covariance under the random-effect model. As shown in the supplementary data of Bulik-Sullivan et al.,16 if one assumes that and have zero mean and
(Equation A2)
then it can be shown that the genetic covariance between two traits is
(Equation A3)
The random-effect model makes the implicit assumption that many SNPs are causal, which is appropriate for genome-wide analysis but not for local analysis, where few SNPs are likely to be causal.
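As a hedged sketch of Equations A1 to A3, the definitions above can be written out under assumed notation. The symbols are chosen here for illustration and may differ from those used elsewhere in the article: β and γ are the length-p causal effect vectors of the two traits, x is a standardized genotype vector with LD matrix V = E[xxᵀ] (unit diagonal, so tr(V) = p), h₁² and h₂² are the SNP heritabilities, and ρ_g is the genetic covariance. The per-SNP scaling by 1/p follows the usual cross-trait LD Score regression setup.

```latex
% Requires amsmath. Notation is assumed for illustration.
\begin{align}
\rho_{\beta\gamma} &= \beta^{\top}\gamma = \sum_{j=1}^{p}\beta_j\gamma_j
  && \text{(fixed-effect definition, cf. Equation A1)} \\
\operatorname{Var}[\beta] &= \frac{h_1^2}{p}\,I, \quad
\operatorname{Var}[\gamma] = \frac{h_2^2}{p}\,I, \quad
\operatorname{Cov}[\beta,\gamma] = \frac{\rho_g}{p}\,I
  && \text{(random-effect assumption, cf. Equation A2)} \\
\operatorname{Cov}\!\left[x^{\top}\beta,\; x^{\top}\gamma\right]
  &= \operatorname{tr}\!\left(V\,\frac{\rho_g}{p}\,I\right)
   = \rho_g
   = \operatorname{E}\!\left[\beta^{\top}\gamma\right]
  && \text{(cf. Equation A3)}
\end{align}
```

The last line uses the independence of genotypes and effects together with the zero-mean assumption, so that the expected dot product of the causal effects equals the genetic covariance.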
Estimating Covariance of the Causal Effects from GWAS Summary Data
For completeness, we derive an estimator for . We assume a linear model for the two traits (see Material and Methods). The effect size estimates from GWAS, and , follow and , with , where n1 and n2 are the sample sizes of the two GWASs and ns is the number of shared samples (see Material and Methods).
As the sample sizes, n1 and n2, of the two GWASs go to infinity, we have and , which implies and , suggesting the following estimator for covariance of the causal effects,
(Equation A4)
In reality, however, the finite sample sizes of GWASs result in noise in the estimates of and , creating bias in the estimate of . From bilinear form theory, it can be shown that
(Equation A5)
suggesting the unbiased estimator of ,
(Equation A6)
where the environmental covariance can be estimated through cross-trait LD Score regression.16
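One way to write out the steps of this derivation is sketched below, under assumed notation that may differ from the symbols used elsewhere in the article: β̂ and γ̂ are the GWAS marginal effect estimates, V is the (assumed invertible) LD matrix of the region, σ₁² and σ₂² are the environmental variances of the two traits, and σ₁₂ is their environmental covariance. The expectation step uses the bilinear-form identity E[xᵀAy] = E[x]ᵀA E[y] + tr(A Cov[y, x]).

```latex
% Requires amsmath. Notation is assumed for illustration.
\begin{align}
\hat\beta &\sim N\!\left(V\beta,\ \tfrac{\sigma_1^2}{n_1}V\right), \quad
\hat\gamma \sim N\!\left(V\gamma,\ \tfrac{\sigma_2^2}{n_2}V\right), \quad
\operatorname{Cov}[\hat\beta,\hat\gamma] = \tfrac{n_s\,\sigma_{12}}{n_1 n_2}\,V \\
\hat\rho_{\text{naive}} &= \hat\beta^{\top}V^{-1}V^{-1}\hat\gamma
  && \text{(cf. Equation A4)} \\
\operatorname{E}\!\left[\hat\rho_{\text{naive}}\right]
  &= (V\beta)^{\top}V^{-2}(V\gamma)
   + \operatorname{tr}\!\left(V^{-2}\,\tfrac{n_s\,\sigma_{12}}{n_1 n_2}\,V\right)
   = \beta^{\top}\gamma + \tfrac{n_s\,\sigma_{12}}{n_1 n_2}\operatorname{tr}\!\left(V^{-1}\right)
  && \text{(cf. Equation A5)} \\
\hat\rho_{\beta\gamma} &= \hat\beta^{\top}V^{-2}\hat\gamma
   - \tfrac{n_s\,\sigma_{12}}{n_1 n_2}\operatorname{tr}\!\left(V^{-1}\right)
  && \text{(cf. Equation A6)}
\end{align}
```

The bias term vanishes when the two GWASs share no samples (n_s = 0) or when the environmental covariance is zero, and it shrinks as the sample sizes grow. In practice V is often ill-conditioned, so a regularized inverse would take the place of the exact inverse in this sketch.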
Web Resources
European Genome-phenome Archive (EGA) (accession number EGAS00000000132), https://www.ebi.ac.uk/ega
GEFOS consortium, http://www.gefos.org/?q=content/data-release-2015
GIANT consortium, https://www.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files
GLGC consortium, http://csg.sph.umich.edu//abecasis/public/lipids2013/
HESS and ρ-HESS, http://bogdan.bioinformatics.ucla.edu/software/hess/
OMIM, http://www.omim.org/
Psychiatric Genomics Consortium, https://www.med.unc.edu/pgc/acl_users/credentials_cookie_auth/require_login?came_from=http%3A//www.med.unc.edu/pgc/old-pages/downloads
ReproGen, http://www.reprogen.org/
References
- 1. Giambartolomei C., Vukcevic D., Schadt E.E., Franke L., Hingorani A.D., Wallace C., Plagnol V. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 2014;10:e1004383. doi: 10.1371/journal.pgen.1004383.
- 2. Pickrell J.K., Berisa T., Liu J.Z., Ségurel L., Tung J.Y., Hinds D.A. Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 2016;48:709–717. doi: 10.1038/ng.3570.
- 3. Mancuso N., Shi H., Goddard P., Kichaev G., Gusev A., Pasaniuc B. Integrating gene expression with summary association statistics to identify genes associated with 30 complex traits. Am. J. Hum. Genet. 2017;100:473–487. doi: 10.1016/j.ajhg.2017.01.031.
- 4. Gusev A., Ko A., Shi H., Bhatia G., Chung W., Penninx B.W., Jansen R., de Geus E.J., Boomsma D.I., Wright F.A. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 2016;48:245–252. doi: 10.1038/ng.3506.
- 5. Price A.L., Spencer C.C., Donnelly P. Progress and promise in understanding the genetic basis of common diseases. Proc. Biol. Sci. 2015;282:20151684. doi: 10.1098/rspb.2015.1684.
- 6. Sheehan N.A., Didelez V., Burton P.R., Tobin M.D. Mendelian randomisation and causal inference in observational epidemiology. PLoS Med. 2008;5:e177. doi: 10.1371/journal.pmed.0050177.
- 7. Voight B.F., Peloso G.M., Orho-Melander M., Frikke-Schmidt R., Barbalic M., Jensen M.K., Hindy G., Hólm H., Ding E.L., Johnson T. Plasma HDL cholesterol and risk of myocardial infarction: a mendelian randomisation study. Lancet. 2012;380:572–580. doi: 10.1016/S0140-6736(12)60312-2.
- 8. Lawlor D.A., Harbord R.M., Sterne J.A., Timpson N., Davey Smith G. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat. Med. 2008;27:1133–1163. doi: 10.1002/sim.3034.
- 9. Davey Smith G., Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum. Mol. Genet. 2014;23(R1):R89–R98. doi: 10.1093/hmg/ddu328.
- 10. Smith G.D., Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int. J. Epidemiol. 2003;32:1–22. doi: 10.1093/ije/dyg070.
- 11. Yang J., Benyamin B., McEvoy B.P., Gordon S., Henders A.K., Nyholt D.R., Madden P.A., Heath A.C., Martin N.G., Montgomery G.W. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 2010;42:565–569. doi: 10.1038/ng.608.
- 12. Lee S.H., Yang J., Goddard M.E., Visscher P.M., Wray N.R. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics. 2012;28:2540–2542. doi: 10.1093/bioinformatics/bts474.
- 13. Bulik-Sullivan B.K., Loh P.R., Finucane H.K., Ripke S., Yang J., Patterson N., Daly M.J., Price A.L., Neale B.M., Schizophrenia Working Group of the Psychiatric Genomics Consortium. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 2015;47:291–295. doi: 10.1038/ng.3211.
- 14. Neale M., Cardon L. Methodology for Genetic Studies of Twins and Families. Volume 67. Springer Science & Business Media; 1992.
- 15. Haseman J.K., Elston R.C. The investigation of linkage between a quantitative trait and a marker locus. Behav. Genet. 1972;2:3–19. doi: 10.1007/BF01066731.
- 16. Bulik-Sullivan B., Finucane H.K., Anttila V., Gusev A., Day F.R., Loh P.R., ReproGen Consortium, Psychiatric Genomics Consortium, Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3, Duncan L., Perry J.R. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 2015;47:1236–1241. doi: 10.1038/ng.3406.
- 17. Pasaniuc B., Price A.L. Dissecting the genetics of complex traits using summary association statistics. Nat. Rev. Genet. 2017;18:117–127. doi: 10.1038/nrg.2016.142.
- 18. Berisa T., Pickrell J.K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics. 2016;32:283–285. doi: 10.1093/bioinformatics/btv546.
- 19. Shi H., Kichaev G., Pasaniuc B. Contrasting the genetic architecture of 30 complex traits from summary association data. Am. J. Hum. Genet. 2016;99:139–153. doi: 10.1016/j.ajhg.2016.05.013.
- 20. Hegmann J.P., Possidente B. Estimating genetic correlations from inbred strains. Behav. Genet. 1981;11:103–114. doi: 10.1007/BF01065621.
- 21. Carey G. Inference about genetic correlations. Behav. Genet. 1988;18:329–338. doi: 10.1007/BF01260933.
- 22. Isserlis L. On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables. Biometrika. 1918;12:134–139.
- 23. Searle S.R. Linear Models. John Wiley & Sons, Inc.; 1971. p. 65.
- 24. Hansen P.C. The truncated SVD as a method for regularization. BIT. 1987;27:534–553.
- 25. Efron B. Bayesian inference and the parametric bootstrap. Ann. Appl. Stat. 2012;6:1971–1997. doi: 10.1214/12-AOAS571.
- 26. Abecasis G.R., Auton A., Brooks L.D., DePristo M.A., Durbin R.M., Handsaker R.E., Kang H.M., Marth G.T., McVean G.A., 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632.
- 27. Richard A., International HapMap Consortium. The International HapMap Project. Nature. 2003;426:789–796. doi: 10.1038/nature02168.
- 28. Perry J.R., Day F., Elks C.E., Sulem P., Thompson D.J., Ferreira T., He C., Chasman D.I., Esko T., Thorleifsson G., Australian Ovarian Cancer Study, GENICA Network, kConFab, LifeLines Cohort Study, InterAct Consortium, Early Growth Genetics (EGG) Consortium. Parent-of-origin-specific allelic associations among 106 genomic loci for age at menarche. Nature. 2014;514:92–97. doi: 10.1038/nature13545.
- 29. Locke A.E., Kahali B., Berndt S.I., Justice A.E., Pers T.H., Day F.R., Powell C., Vedantam S., Buchkovich M.L., Yang J., LifeLines Cohort Study, ADIPOGen Consortium, AGEN-BMI Working Group, CARDIOGRAMplusC4D Consortium, CKDGen Consortium, GLGC, ICBP, MAGIC Investigators, MuTHER Consortium, MIGen Consortium, PAGE Consortium, ReproGen Consortium, GENIE Consortium, International Endogene Consortium. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518:197–206. doi: 10.1038/nature14177.
- 30. Wood A.R., Esko T., Yang J., Vedantam S., Pers T.H., Gustafsson S., Chu A.Y., Estrada K., Luan J., Kutalik Z., Electronic Medical Records and Genomics (eMERGE) Consortium, MIGen Consortium, PAGE Consortium, LifeLines Cohort Study. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 2014;46:1173–1186. doi: 10.1038/ng.3097.
- 31. Shungin D., Winkler T.W., Croteau-Chonka D.C., Ferreira T., Locke A.E., Mägi R., Strawbridge R.J., Pers T.H., Fischer K., Justice A.E., ADIPOGen Consortium, CARDIOGRAMplusC4D Consortium, CKDGen Consortium, GEFOS Consortium, GENIE Consortium, GLGC, ICBP, International Endogene Consortium, LifeLines Cohort Study, MAGIC Investigators, MuTHER Consortium, PAGE Consortium, ReproGen Consortium. New genetic loci link adipose and insulin biology to body fat distribution. Nature. 2015;518:187–196. doi: 10.1038/nature14132.
- 32. van der Harst P., Zhang W., Mateo Leach I., Rendon A., Verweij N., Sehmi J., Paul D.S., Elling U., Allayee H., Li X. Seventy-five genetic loci influencing the human red blood cell. Nature. 2012;492:369–375. doi: 10.1038/nature11677.
- 33. Gieger C., Radhakrishnan A., Cvejic A., Tang W., Porcu E., Pistis G., Serbanovic-Canic J., Elling U., Goodall A.H., Labrune Y. New gene functions in megakaryopoiesis and platelet formation. Nature. 2011;480:201–208. doi: 10.1038/nature10659.
- 34. Dupuis J., Langenberg C., Prokopenko I., Saxena R., Soranzo N., Jackson A.U., Wheeler E., Glazer N.L., Bouatia-Naji N., Gloyn A.L., DIAGRAM Consortium, GIANT Consortium, Global BPgen Consortium, Anders Hamsten on behalf of Procardis Consortium, MAGIC investigators. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat. Genet. 2010;42:105–116. doi: 10.1038/ng.520.
- 35. Soranzo N., Sanna S., Wheeler E., Gieger C., Radke D., Dupuis J., Bouatia-Naji N., Langenberg C., Prokopenko I., Stolerman E., WTCCC. Common variants at 10 genomic loci influence hemoglobin A1(C) levels via glycemic and nonglycemic pathways. Diabetes. 2010;59:3229–3239. doi: 10.2337/db10-0502.
- 36. Willer C.J., Schmidt E.M., Sengupta S., Peloso G.M., Gustafsson S., Kanoni S., Ganna A., Chen J., Buchkovich M.L., Mora S., Global Lipids Genetics Consortium. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 2013;45:1274–1283. doi: 10.1038/ng.2797.
- 37. Zheng H.F., Forgetta V., Hsu Y.H., Estrada K., Rosello-Diez A., Leo P.J., Dahia C.L., Park-Min K.H., Tobias J.H., Kooperberg C., AOGC Consortium, UK10K Consortium. Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture. Nature. 2015;526:112–117. doi: 10.1038/nature14878.
- 38. Okbay A., Beauchamp J.P., Fontana M.A., Lee J.J., Pers T.H., Rietveld C.A., Turley P., Chen G.B., Emilsson V., Meddens S.F., LifeLines Cohort Study. Genome-wide association study identifies 74 loci associated with educational attainment. Nature. 2016;533:539–542. doi: 10.1038/nature17671.
- 39. Okbay A., Baselmans B.M., De Neve J.E., Turley P., Nivard M.G., Fontana M.A., Meddens S.F., Linnér R.K., Rietveld C.A., Derringer J. Genetic variants associated with subjective well-being, depressive symptoms, and neuroticism identified through genome-wide analyses. Nat. Genet. 2016;48:624–633. doi: 10.1038/ng.3552.
- 40. Barban N., Jansen R., de Vlaming R., Vaez A., Mandemakers J.J., Tropf F.C., Shen X., Wilson J.F., Chasman D.I., Nolte I.M., BIOS Consortium, LifeLines Cohort Study. Genome-wide analysis identifies 12 loci influencing human reproductive behavior. Nat. Genet. 2016;48:1462–1472. doi: 10.1038/ng.3698.
- 41. Horikoshi M., Beaumont R.N., Day F.R., Warrington N.M., Kooijman M.N., Fernandez-Tajes J., Feenstra B., van Zuydam N.R., Gaulton K.J., Grarup N., CHARGE Consortium Hematology Working Group. Genome-wide associations for birth weight and correlations with adult disease. Nature. 2016;538:248–252. doi: 10.1038/nature19806.
- 42. Teumer A., Tin A., Sorice R., Gorski M., Yeo N.C., Chu A.Y., Li M., Li Y., Mijatovic V., Ko Y.A., DCCT/EDIC. Genome-wide association studies identify genetic loci associated with albuminuria in diabetes. Diabetes. 2016;65:803–817. doi: 10.2337/db15-1313.
- 43. Eppinga R.N., Hagemeijer Y., Burgess S., Hinds D.A., Stefansson K., Gudbjartsson D.F., van Veldhuisen D.J., Munroe P.B., Verweij N., van der Harst P. Identification of genomic loci associated with resting heart rate and shared genetic predictors with all-cause mortality. Nat. Genet. 2016;48:1557–1563. doi: 10.1038/ng.3708.
- 44. Köttgen A., Albrecht E., Teumer A., Vitart V., Krumsiek J., Hundertmark C., Pistis G., Ruggiero D., O’Seaghdha C.M., Haller T., LifeLines Cohort Study, CARDIoGRAM Consortium, DIAGRAM Consortium, ICBP Consortium, MAGIC Consortium. Genome-wide association analyses identify 18 new loci associated with serum urate concentrations. Nat. Genet. 2013;45:145–154. doi: 10.1038/ng.2500.
- 45. Lu Y., Day F.R., Gustafsson S., Buchkovich M.L., Na J., Bataille V., Cousminer D.L., Dastani Z., Drong A.W., Esko T. New loci for body fat percentage reveal link between adiposity and cardiometabolic disease risk. Nat. Commun. 2016;7:10495. doi: 10.1038/ncomms10495.
- 46. Pattaro C., Teumer A., Gorski M., Chu A.Y., Li M., Mijatovic V., Garnaas M., Tin A., Sorice R., Li Y., ICBP Consortium, AGEN Consortium, CARDIOGRAM, CHARGe-Heart Failure Group, ECHOGen Consortium. Genetic associations at 53 loci highlight cell types and biological pathways relevant for kidney function. Nat. Commun. 2016;7:10023. doi: 10.1038/ncomms10023.
- 47. Day F.R., Ruth K.S., Thompson D.J., Lunetta K.L., Pervjakova N., Chasman D.I., Stolk L., Finucane H.K., Sulem P., Bulik-Sullivan B., PRACTICAL consortium, kConFab Investigators, AOCS Investigators, Generation Scotland, EPIC-InterAct Consortium, LifeLines Cohort Study. Large-scale genomic analyses link reproductive aging to hypothalamic signaling, breast cancer susceptibility and BRCA1-mediated DNA repair. Nat. Genet. 2015;47:1294–1303. doi: 10.1038/ng.3412.
- 48. Trynka G., Sandor C., Han B., Xu H., Stranger B.E., Liu X.S., Raychaudhuri S. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 2013;45:124–130. doi: 10.1038/ng.2504.
- 49. Getz G.S., Reardon C.A. Apoprotein E as a lipid transport and signaling protein in the blood, liver, and artery wall. J. Lipid Res. 2009;50(Suppl):S156–S161. doi: 10.1194/jlr.R800058-JLR200.
- 50. Pallaud C., Gueguen R., Sass C., Grow M., Cheng S., Siest G., Visvikis S. Genetic influences on lipid metabolism trait variability within the Stanislas Cohort. J. Lipid Res. 2001;42:1879–1890.
- 51. Soranzo N., Spector T.D., Mangino M., Kühnel B., Rendon A., Teumer A., Willenborg C., Wright B., Chen L., Li M. A genome-wide meta-analysis identifies 22 loci associated with eight hematological parameters in the HaemGen consortium. Nat. Genet. 2009;41:1182–1190. doi: 10.1038/ng.467.
- 52. Silles M.A. The causal effect of education on health: evidence from the United Kingdom. Econ. Educ. Rev. 2009;28:122–128.
- 53. Baker D.P., Leon J., Smith Greenaway E.G., Collins J., Movit M. The education effect on population health: a reassessment. Popul. Dev. Rev. 2011;37:307–332. doi: 10.1111/j.1728-4457.2011.00412.x.
- 54. Mefford J., Witte J.S. The covariate’s dilemma. PLoS Genet. 2012;8:e1003096. doi: 10.1371/journal.pgen.1003096.
- 55. Aschard H., Vilhjálmsson B.J., Joshi A.D., Price A.L., Kraft P. Adjusting for heritable covariates can bias effect estimates in genome-wide association studies. Am. J. Hum. Genet. 2015;96:329–339. doi: 10.1016/j.ajhg.2014.12.021.
- 56. Loh P.R., Bhatia G., Gusev A., Finucane H.K., Bulik-Sullivan B.K., Pollack S.J., Schizophrenia Working Group of Psychiatric Genomics Consortium, de Candia T.R., Lee S.H., Wray N.R., Kendler K.S. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat. Genet. 2015;47:1385–1392. doi: 10.1038/ng.3431.