Abstract
Despite the success of large-scale genome-wide association studies (GWASs) on complex traits, our understanding of their genetic architecture is far from complete. Jointly modeling multiple traits’ genetic profiles has provided insights into the shared genetic basis of many complex traits. However, large-scale inference sets a high bar for both statistical power and biological interpretability. Here we introduce a principled framework to estimate annotation-stratified genetic covariance between traits using GWAS summary statistics. Through theoretical and numerical analyses, we demonstrate that our method provides accurate covariance estimates, thereby enabling researchers to dissect both the shared and distinct genetic architecture across traits to better understand their etiologies. Among 50 complex traits with publicly accessible GWAS summary statistics (Ntotal ≈ 4.5 million), we identified more than 170 pairs with statistically significant genetic covariance. In particular, we found strong genetic covariance between late-onset Alzheimer disease (LOAD) and amyotrophic lateral sclerosis (ALS), two major neurodegenerative diseases, in single-nucleotide polymorphisms (SNPs) with high minor allele frequencies and in SNPs located in the predicted functional genome. Joint analysis of LOAD, ALS, and other traits highlights LOAD’s correlation with cognitive traits and hints at an autoimmune component for ALS.
Keywords: genome-wide association study, genetic covariance, functional annotation, summary statistics, Alzheimer’s disease, amyotrophic lateral sclerosis
Introduction
Genome-wide association studies (GWASs) have been a success in the past 12 years. Despite a simple study design, GWASs have identified tens of thousands of robust associations for a variety of human complex diseases and traits. Based on the GWAS paradigm, linear mixed models, in conjunction with the restricted maximum likelihood (REML) algorithm, have provided great insights into the polygenic genetic architecture of complex traits.1, 2, 3 The cross-trait extension of the linear mixed model has further revealed the shared etiology of many different traits.4 Compared to traditional, family-based approaches, these methods do not require all the traits to be measured on the same cohort and therefore make it possible to study a spectrum of human complex traits using independent samples from existing GWASs.5, 6 Recently, Bulik-Sullivan et al. developed cross-trait LD score regression (LDSC), a computationally efficient method that utilizes GWAS summary statistics to estimate genetic correlation between complex traits.7 LDSC is a major advance. As summary statistics from consortium-based GWASs become increasingly accessible,8 it provides great opportunities for systematically documenting the shared genetic basis of a large number of diseases and traits.9, 10 However, large-scale inference sets a high bar for both estimation accuracy and statistical power. Furthermore, existing methods do not allow explicit modeling of functional genome annotations. As shown in later sections, the estimated genetic correlations in many cases are neither statistically significant nor easy to interpret.
To address these challenges, there is a pressing need for a statistical framework that provides more accurate covariance and correlation estimates and allows integration of biologically meaningful functional genome annotations. The method of moments has recently been shown to outperform LDSC in single-trait heritability estimation.11 Integrative analysis of GWAS summary statistics and context-specific functional annotations has provided novel insights into complex disease etiology through a variety of applications.12, 13, 14 In this paper, we introduce GNOVA (genetic covariance analyzer), a principled framework to estimate annotation-stratified genetic covariance using GWAS summary statistics. Through extensive numerical simulations, integrative analysis of 50 complex traits, and an in-depth case study on late-onset Alzheimer disease (LOAD [MIM: 104300]) and amyotrophic lateral sclerosis (ALS [MIM: 105400]), we demonstrate that GNOVA provides accurate covariance estimates and powerful statistical inference that are robust to linkage disequilibrium (LD) and sample overlap. Furthermore, we show that annotation-stratified analysis enhances the interpretability of genetic covariance and provides novel insights into the shared genetic basis of complex traits.
Material and Methods
Statistical Model
Here we outline the genetic covariance estimation framework. The complete derivation, detailed justification for all approximations, and theoretical proofs are presented in Appendix A. In short, the genetic covariance that we aim to estimate is the covariance between the genetic effects of a group of single nucleotide polymorphisms (SNPs) on two complex traits. When functional genome annotations are present, we allow such covariance to vary in different annotation categories. Specifically, we define K functional annotations S1, S2, ..., SK (e.g., protein-coding genes and non-coding regions), whose union covers the entire genome; assume two studies share the same list of m SNPs; and assume two standardized traits y1 and y2 follow the linear models below:

$$y_1 = \sum_{i=1}^{K} X_i \beta_i + \epsilon, \qquad y_2 = \sum_{i=1}^{K} Z_i \gamma_i + \delta,$$
where Xi and Zi denote the standardized genotype matrices defined through annotation Si. Random effects terms βi and γi denote the corresponding genetic effects for each annotation category. SNPs’ genetic effects on two traits follow an annotation-dependent covariance structure:

$$\mathrm{Cov}(\beta_i, \gamma_i) = \frac{\rho_i}{m_i} I_{m_i}, \qquad i = 1, \ldots, K,$$
where mi and ρi denote the total number of SNPs and the total genetic covariance in annotation category Si, respectively. Random variables ε and δ denote the non-genetic effects. Of note, this notation implicitly assumes the genetic covariance to follow an additive structure in regions where functional annotations overlap.
In practice, two different GWASs often share a subset of samples. Without loss of generality, we assume N1 and N2 to be the sample sizes of two studies and the first Ns samples in each study are shared. To account for the non-genetic correlation introduced by sample overlap, we allow random error terms ε and δ to be correlated:

$$\mathrm{Cov}(\epsilon_s, \delta_t) = \begin{cases} \rho_e, & 1 \le s = t \le N_s, \\ 0, & \text{otherwise.} \end{cases}$$
We note that our model does not require any additional assumption on the heritability structure of either trait.
Estimation of Covariance Parameters via the Method of Moments
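To estimate genetic covariance parameters (i.e., ρ1, …, ρK), we developed an analysis framework based on the method of moments. First, we derive equations that relate the population moments to the parameters of interest. For an arbitrary N1 × N2 matrix A, we study the expectation of $y_1^T A y_2$. It can be shown that

$$\mathrm{E}\left(y_1^T A y_2\right) = \sum_{i=1}^{K} \frac{\rho_i}{m_i} \mathrm{tr}\left(A Z_i X_i^T\right) + \rho_e \sum_{t=1}^{N_s} A_{tt}.$$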
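Here, quantity Att denotes the tth diagonal element of matrix A. Since there are K+1 parameters in total in the model (K genetic covariance parameters and ρe), we build a linear system of K+1 equations by plugging K+1 different matrices A1,...,AK+1 into the equation above. Further, we approximate $\mathrm{E}(y_1^T A_j y_2)$ using the sample moments, i.e., the observed value $y_1^T A_j y_2$, and get the following equation:

$$y_1^T A_j y_2 = \sum_{i=1}^{K} \frac{\rho_i}{m_i} \mathrm{tr}\left(A_j Z_i X_i^T\right) + \rho_e \sum_{t=1}^{N_s} (A_j)_{tt}, \qquad j = 1, \ldots, K+1.$$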
Solving this linear system of K+1 equations yields the method-of-moments estimators of the genetic covariance parameters.
Choices of Matrix A
The method of moments estimation procedure described above works for arbitrary A matrices. However, it is critical and non-trivial to choose A in practice. Since individual-level genotype and phenotype data from consortium-based GWASs are in many cases difficult to access, it is of practical interest to estimate genetic covariance based on summary statistics only. To achieve this goal, we define the first K matrices as:

$$A_j = \frac{1}{m_j} X_j Z_j^T, \qquad j = 1, \ldots, K.$$
Plugging in these matrices, the first K equations become:
The equality is based on the property of trace and the fact that first Ns samples are shared between two studies. These equations can be further approximated by (Appendix A):
Here, denotes the LD between the lth SNP from category Si and the th SNP from category Sj; z1 and z2 denote the z-scores of SNP-level associations from two GWASs; and and represent subsets of z-scores corresponding to the SNPs in annotation category Sj. LD can be estimated using an external reference panel. However, if samples in two studies have different ancestries, and need to be estimated separately using two reference panels. When such reference panels do not exist, individual-level genotype data for a subset of study samples may be needed.
Next, we study the (K+1)th equation. We define:
Dividing both sides of the (K+1)th equation by N1N2, we get:
Since ρ1, …, ρΚ are the parameters of interest, we subtract the (K+1)th equation from the first K equations and remove ρΚ+1 from the linear system. We denote the remaining K equations in matrix form:
When the sample sizes of both GWASs are large and the sample overlap between two studies is moderate, the K equations can be approximated by:
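We define

$$M = \left[ \frac{1}{m_i m_j} \sum_{l \in S_i} \sum_{l' \in S_j} r_{1,ll'}\, r_{2,ll'} \right]_{K \times K}, \qquad v = \frac{1}{\sqrt{N_1 N_2}} \left( \frac{z_{11}^T z_{21}}{m_1}, \ldots, \frac{z_{1K}^T z_{2K}}{m_K} \right)^T,$$

where $r_{1,ll'}$ and $r_{2,ll'}$ denote the LD between the lth SNP from category Si and the l'th SNP from category Sj in the two studies (the summand reduces to $r_{ll'}^2$ when both studies share the same LD structure), and $z_{1j}$ and $z_{2j}$ denote the z-scores of the SNPs in category Sj.
Then, the point estimate of covariance parameters can be denoted as

$$\hat{\rho} = M^{-1} v.$$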
Importantly, M can be estimated using a reference panel (e.g., 1000 Genomes Project15) and v is based only on GWAS summary statistics. Of note, the same estimation framework can be directly applied to ascertained case-control studies as well (Appendix A).
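To make the roles of M and v concrete, the following is a minimal sketch rather than the GNOVA implementation. It assumes both studies share one LD structure, takes M as the average squared LD between each pair of annotation categories, takes v as the per-category average z-score product scaled by 1/sqrt(N1 N2), and ignores sample overlap and any bias correction of the reference-panel LD. All function and argument names are hypothetical.

```python
from math import sqrt


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-15:
            raise ValueError("M is singular; annotations may be redundant")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def stratified_covariance(z1, z2, ld, annotations, n1, n2):
    """Sketch of the estimate rho_hat = M^{-1} v.

    z1, z2      : per-SNP z-scores from the two GWASs (same SNP order)
    ld          : m x m matrix of LD correlations r (assumed from a reference panel)
    annotations : list of K lists of SNP indices (S_1, ..., S_K)
    n1, n2      : GWAS sample sizes
    """
    sizes = [len(s) for s in annotations]
    # M[i][j]: average squared LD over SNP pairs (l in S_i, l' in S_j).
    m_mat = [[sum(ld[a][b] ** 2 for a in s_i for b in s_j) / (m_i * m_j)
              for s_j, m_j in zip(annotations, sizes)]
             for s_i, m_i in zip(annotations, sizes)]
    # v[j]: scaled average product of z-scores within S_j.
    v = [sum(z1[l] * z2[l] for l in s_j) / (m_j * sqrt(n1 * n2))
         for s_j, m_j in zip(annotations, sizes)]
    return solve_linear_system(m_mat, v)


# Toy input: four SNPs without LD, two disjoint annotations.
identity_ld = [[1.0 if a == b else 0.0 for b in range(4)] for a in range(4)]
rho_hat = stratified_covariance(
    z1=[2.0, 1.0, 0.0, 1.0], z2=[1.0, 2.0, 1.0, 0.0],
    ld=identity_ld, annotations=[[0, 1], [2, 3]], n1=100, n2=100)
```

In this toy input the estimate for each category reduces to the sum of z-score products in that category divided by sqrt(N1 N2). With a single annotation covering all SNPs (K = 1), the same function gives the non-stratified estimate.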
Special Cases
Two Independent GWASs
If samples from two GWASs do not overlap, then the non-genetic effects ε and δ are independent and only K equations are needed for estimating covariance parameters. We still define $A_j = \frac{1}{m_j} X_j Z_j^T$ for j = 1,…,K. That gives us the same covariance estimator:

$$\hat{\rho} = M^{-1} v.$$
No Annotation Stratification
If no functional annotation is present, it can be shown that
Here, is the average product of z-scores from two GWASs; is the average LD across all SNP pairs in the study. Under the non-stratified scenario, this estimator can be seen as a two-trait extension of the heritability estimator proposed in Bulik-Sullivan.16
Two GWASs with Substantial Sample Overlap
If the two GWASs have substantial sample overlap, some approximations we have applied in previous sections would fail (Appendix A). The problem reduces to solving the following equations:
Therefore,
where the phenotypic correlation can be either acquired from the literature or estimated using computational methods7, 17, 18 (Appendix A).
Remarks on Overlapping Functional Annotations
When functional annotations overlap, the covariance parameter ρ is not the real quantity of interest. Instead, the total covariance in each annotation category is more biologically meaningful and can be estimated using the weighted estimator

$$W \hat{\rho},$$

where W is a K × K matrix with element

$$W_{ij} = \frac{m_{ij}}{m_j}.$$

Here, $m_{ij}$ denotes the number of SNPs in region $S_i \cap S_j$.
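The additive structure can be illustrated with a small sketch. It assumes each parameter ρj is spread evenly over the SNPs of Sj, so that the share of ρj falling inside Si is |Si ∩ Sj| / |Sj|; the function names and the toy layout (three equal-sized blocks, with annotation 1 covering blocks 1 and 2 and annotation 2 covering blocks 2 and 3) are illustrative only.

```python
def overlap_weight_matrix(annotations):
    """W[i][j] = |S_i & S_j| / |S_j| for annotations given as sets of SNP ids."""
    return [[len(s_i & s_j) / len(s_j) for s_j in annotations]
            for s_i in annotations]


def total_covariance(rho, annotations):
    """Total covariance in each annotation category, i.e. the product W rho."""
    w = overlap_weight_matrix(annotations)
    return [sum(w_ij * rho_j for w_ij, rho_j in zip(row, rho)) for row in w]


# Toy layout: three blocks of 100 SNPs; the middle block is shared.
block = 100
annotation_1 = set(range(0, 2 * block))
annotation_2 = set(range(block, 3 * block))
totals = total_covariance([0.1, 0.2], [annotation_1, annotation_2])
```

With ρ1 = 0.1 and ρ2 = 0.2, half of each parameter falls in the shared block, so the totals are approximately 0.1 + 0.2/2 = 0.2 for annotation 1 and 0.2 + 0.1/2 = 0.25 for annotation 2.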
Theoretical Properties
In this section, we establish the statistical optimality of our estimator by showing that it is “almost” the unbiased estimator with minimum variance. Here we state all the propositions (see Appendix A for detailed proofs). Assume y1 and y2 follow a multivariate normal distribution:
We begin with calculating the variance of the quadratic form-like quantity .
Proposition 1. Let A be an N1 × N2 matrix. Then .
It can be shown that the second part, i.e., , is very small compared to the first term in real GWAS data (Appendix A):
With this in mind, the following claim is approximately true:
Next, we define a matrix and show that minimizes tr(ATH1AH2) under some conditions. Based on the argument above, “almost” minimizes too.
Proposition 2. Assume two GWASs do not share samples. We define the following quantities.
-
(i)
Let be an arbitrarily given K-dimensional vector;
-
(ii)
Let S be a K × K symmetric matrix with element for ;
-
(iii)
Let be a vector such that ;
-
(iv)
Define
Then, we have:
-
(1)
;
-
(2)
Let be a matrix such that . Then, .
Proposition 2 tells us that given arbitrary p = (p1,…pK)T, if such that Sλ = p, then is an unbiased estimator for . Furthermore, among all unbiased estimators with the form , has the minimum value of , hence “almost” the minimum variance . Interestingly, by carefully choosing p and λ, we can let equal the matrix we have been using throughout the paper. Therefore, we have the following corollary.
Corollary 1. We assume:
-
(i)
Two GWASs do not overlap;
-
(ii)
The samples in each study are completely independent;
-
(iii)
True LD in both studies (i.e., ZTZ and XTX) is known.
Consider all matrices A that satisfy
We define
Then, with has the lowest variance.
Similarly, we could extend these results to annotation-stratified scenarios (Appendix A). These results show that although we initially defined for the purpose of simplifying calculation, the derived covariance estimator actually enjoys some good theoretical properties.
Variance Estimation via Block-wise Jackknife
Following previous work,7 we apply a block-wise jackknife approach to estimate the variance. We divide the genome into b (e.g., b = 200) blocks B1, …, Bb. Let
Here, subscript indicates the subset of SNPs in both functional annotation Si and block Bt. Then, Cov(v) is estimated as:
Therefore, we get
If annotations overlap,
Finally, the test statistic for each covariance parameter is
When annotations overlap,
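As a rough illustration of the block-wise jackknife, the sketch below uses the standard delete-one-block form for a scalar estimator and a two-sided normal test. It is not the exact procedure of the paper, which applies the jackknife to the vector v; the per-block (numerator, denominator) representation and all names are assumptions made for the example.

```python
from math import erfc, sqrt


def jackknife_se(leave_one_out):
    """Standard error from delete-one-block estimates."""
    b = len(leave_one_out)
    mean = sum(leave_one_out) / b
    variance = (b - 1) / b * sum((x - mean) ** 2 for x in leave_one_out)
    return sqrt(variance)


def block_jackknife(block_stats, estimator):
    """Return (estimate, standard error, z statistic, two-sided p value).

    block_stats : list with one summary entry per genomic block
    estimator   : function mapping a list of block entries to a scalar
    """
    estimate = estimator(block_stats)
    leave_one_out = [estimator(block_stats[:t] + block_stats[t + 1:])
                     for t in range(len(block_stats))]
    se = jackknife_se(leave_one_out)
    z = estimate / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p value under N(0, 1)
    return estimate, se, z, p_value


def ratio_estimator(blocks):
    """Toy estimator: pooled numerator divided by pooled denominator."""
    numerator = sum(n for n, _ in blocks)
    denominator = sum(d for _, d in blocks)
    return numerator / denominator


# Four hypothetical blocks, each contributing (numerator, denominator).
toy_blocks = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0), (4.0, 1.0)]
estimate, se, z, p_value = block_jackknife(toy_blocks, ratio_estimator)
```

Recomputing the estimator with each block left out keeps nearby SNPs in LD within the same block, which is why blocks rather than single SNPs are deleted.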
Genetic Correlation
We provide genetic correlation estimates for non-stratified analysis:

$$\hat{r}_g = \frac{\hat{\rho}}{\sqrt{\hat{h}_1^2\, \hat{h}_2^2}}.$$
We use the estimator proposed in Bulik-Sullivan16 to estimate heritability for each trait:
When functional annotations are present, the true heritability in each annotation category may be small. Although methods for estimating annotation-stratified heritability have been proposed,11, 12 they may provide unstable, sometimes even negative, heritability estimates, especially when a number of annotation categories are related to the repressed genome. When true heritability is low, variability in the denominator will have great impact on genetic correlation estimates. Therefore, we use genetic covariance as a more robust metric when performing annotation-stratified analysis.
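The sensitivity to the denominator can be seen in a short sketch. It assumes the usual definition of genetic correlation as covariance divided by the geometric mean of the two heritabilities, and it returns None when a heritability estimate is not positive; the function name and the numbers in the usage lines are illustrative.

```python
from math import sqrt


def genetic_correlation(covariance, h2_trait1, h2_trait2):
    """Genetic correlation from covariance and heritability estimates.

    Returns None when either heritability estimate is zero or negative,
    because the ratio is then undefined.
    """
    if h2_trait1 <= 0 or h2_trait2 <= 0:
        return None
    return covariance / sqrt(h2_trait1 * h2_trait2)


# Illustrative values only.
well_defined = genetic_correlation(0.05, 0.25, 0.25)
undefined = genetic_correlation(0.05, -0.01, 0.25)
# A small change in a low heritability estimate moves the ratio a lot.
low_h2_a = genetic_correlation(0.005, 0.010, 0.25)
low_h2_b = genetic_correlation(0.005, 0.005, 0.25)
```

Halving a small heritability estimate (0.010 to 0.005) inflates the correlation by a factor of sqrt(2), while the covariance in the numerator is unchanged.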
Simulation Settings
We simulated quantitative traits using real genotype data from the WTCCC1 cohort. We removed individuals with genetic relatedness coefficient greater than 0.05 and filtered SNPs with missing rate above 1% and/or MAF lower than 5% in samples with European ancestry from the 1000 Genomes Project.15 In addition, we removed all the strand-ambiguous SNPs. After quality control, 15,918 samples and 254,221 SNPs remained in the dataset. Each simulation setting was repeated 100 times.
Setting 1
We equally divided 15,918 samples into two sub-cohorts. We simulated two traits using genetic effects sampled from an infinitesimal model.
Heritability for both traits was set as 0.5. We set the genetic covariance to be 0, 0.05, 0.1, 0.15, 0.2, and 0.25.
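One way to draw effects with a target heritability and genetic covariance is sketched below. It assumes each SNP's pair of effects is bivariate normal with variances h²/m and covariance ρ/m; the genotype step, the non-genetic noise, and the function name are outside what the text specifies and are omitted or hypothetical.

```python
import random
from math import sqrt


def sample_effect_pairs(m, h2_1, h2_2, rho, seed=0):
    """Draw m pairs (beta_l, gamma_l) with Var = h2/m and Cov = rho/m."""
    r = rho / sqrt(h2_1 * h2_2)  # per-SNP correlation between effects
    if abs(r) > 1:
        raise ValueError("rho is too large for the given heritabilities")
    rng = random.Random(seed)
    sd1, sd2 = sqrt(h2_1 / m), sqrt(h2_2 / m)
    pairs = []
    for _ in range(m):
        u, w = rng.gauss(0, 1), rng.gauss(0, 1)
        beta = sd1 * u
        # Mix u with independent noise w so that Corr(beta, gamma) = r.
        gamma = sd2 * (r * u + sqrt(1 - r * r) * w)
        pairs.append((beta, gamma))
    return pairs


pairs = sample_effect_pairs(m=20000, h2_1=0.5, h2_2=0.5, rho=0.1, seed=1)
realized_covariance = sum(b * g for b, g in pairs)
realized_h2_1 = sum(b * b for b, _ in pairs)
```

Summed over SNPs, the products of effects should fall close to the target covariance and the squared effects close to the target heritability. Traits would then be formed by multiplying standardized genotypes by these effects and adding noise with variance 1 − h².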
Setting 2
Instead of fixing the heritability, we assumed only that the heritability for both traits was equal. Genetic correlation was fixed as 0.2. We set the genetic covariance to be 0.05, 0.1, 0.15, and 0.2 and chose heritability value accordingly.
Setting 3
We simulated two traits on the same sub-cohort of 7,959 samples. Heritability was fixed as 0.5 for both traits. We set the genetic covariance to be 0, 0.05, 0.1, 0.15, 0.2, and 0.25. Sample overlap correction was applied to estimate genetic covariance.
Setting 4
We randomly partitioned the genome into two annotation categories of the same size. We set the heritability for both traits to be 0.5, and the heritability structure does not depend on functional annotations. Genetic covariance in the first annotation was set to be 0, 0.05, 0.1, 0.15, and 0.2. Genetic effects for two traits are not correlated in the second annotation category.
Setting 5
We randomly partitioned the genome into three categories of the same size. Define annotation-1 to be the union of the first and the second categories, and let annotation-2 be the union of the second and the third categories. We set the heritability for both traits to be 0.5, and the heritability structure does not depend on functional annotations. Genetic covariance parameter for annotation-1 (i.e., ρ1) is set to be 0.1. We set ρ2 to be −0.2, −0.1, 0, and 0.1. The genetic covariance in regions where two annotations overlap follows an additive structure. For example, when ρ1 = 0.1 and ρ2 = 0.2, the total covariance in annotation-1 is

$$\rho_1 + \frac{\rho_2}{2} = 0.1 + \frac{0.2}{2} = 0.2.$$

Similarly, the total covariance in annotation-2 is

$$\rho_2 + \frac{\rho_1}{2} = 0.2 + \frac{0.1}{2} = 0.25.$$
GWAS Data Analysis
Details of 48 GWASs and the URLs for summary statistics files are summarized in Table S1. For each summary statistics dataset, we applied the same quality-control steps described in Bulik-Sullivan et al.7 using the munge_sumstats.py script in LDSC. In addition, we removed all the strand-ambiguous SNPs from each dataset. For each pair of complex traits, we took the overlapped SNPs between two summary statistics files, matched the effect alleles, and removed SNPs with MAF below 5% in the 1000 Genomes Project phase III samples with European ancestry. SNPs on sex chromosomes were also removed from the analysis. We then applied the GNOVA framework to the remaining SNPs to estimate genetic covariance. Sample overlap correction was applied when two GWASs have a large sample overlap. When calculating genetic correlation between ALS and other traits, we used previously reported 0.085 as the heritability of ALS due to negative heritability estimates.19
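The SNP-level filtering and allele matching can be sketched as follows. This is an illustration only, not the logic of munge_sumstats.py or of GNOVA: the data layout, the function names, and the choice to drop SNPs whose alleles cannot be matched without strand flipping are all assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(allele_1, allele_2):
    """A/T and C/G SNPs look identical on both DNA strands."""
    return COMPLEMENT.get(allele_1) == allele_2


def harmonize(study_1, study_2, maf, min_maf=0.05):
    """Pair z-scores for SNPs shared by two studies.

    study_1, study_2 : dict rsid -> (effect_allele, other_allele, z)
    maf              : dict rsid -> minor allele frequency in a reference panel
    """
    kept = {}
    for rsid in study_1.keys() & study_2.keys():
        a1, a2, z1 = study_1[rsid]
        b1, b2, z2 = study_2[rsid]
        if is_strand_ambiguous(a1, a2) or maf.get(rsid, 0.0) < min_maf:
            continue
        if (a1, a2) == (b1, b2):
            kept[rsid] = (z1, z2)
        elif (a1, a2) == (b2, b1):
            kept[rsid] = (z1, -z2)  # effect allele swapped: flip the sign
        # Any other allele combination is dropped in this sketch.
    return kept


study_1 = {"rs1": ("A", "G", 2.0), "rs2": ("A", "T", 1.0),
           "rs3": ("C", "T", -1.5), "rs4": ("G", "A", 0.5)}
study_2 = {"rs1": ("G", "A", 1.0), "rs2": ("A", "T", 1.0),
           "rs3": ("C", "T", 0.5), "rs4": ("G", "A", 0.3)}
reference_maf = {"rs1": 0.20, "rs2": 0.30, "rs3": 0.10, "rs4": 0.01}
matched = harmonize(study_1, study_2, reference_maf)
```

In the toy input, rs2 is removed as strand-ambiguous, rs4 is removed for low MAF, and the z-score of rs1 in the second study changes sign because its effect allele is swapped.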
Annotation Data
GenoCanyon and GenoSkyline functional annotations, as previously reported,14, 20, 21 integrate various types of transcriptomic and epigenomic data from ENCODE22 and Roadmap Epigenomics Project23 to predict functional DNA regions in the human genome. GenoCanyon utilizes an unsupervised learning framework to identify non-tissue-specific functional regions. GenoSkyline and GenoSkyline-Plus further extended this framework to identify tissue- and cell type-specific functionality in the human genome. We applied GenoSkyline-Plus annotations for seven broadly defined tissue categories (i.e., brain, cardiovascular, epithelium, gastrointestinal, immune, muscle, and other) to stratify genetic covariance by tissue type. When integrating these annotations in GNOVA, we also included the whole genome as an annotation category to guarantee that the union of all annotations covers the genome. The whole genome was not added as an additional annotation track in analyses or simulations when the functional annotations covered all SNPs in the dataset. The MAF quartiles were calculated using the genotype data of phase III samples with European ancestry from the 1000 Genomes Project after filtering SNPs with MAF below 5%.
LD Score Regression Implementation
We implemented cross-trait LD score regression using the LDSC software package. For the purpose of fair comparison, we ran LD score regression on all SNPs in the dataset in the simulation studies. When analyzing real GWAS data, we followed the protocol suggested in Bulik-Sullivan et al.7 and used HAPMAP3 SNPs. LD scores were estimated using phase I samples with European ancestry in the 1000 Genomes Project.
Ethical Statement
Procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation. Proper informed consent was obtained when needed.
Results
Simulations
We simulated two traits using genotype data from the Wellcome Trust Case Control Consortium (WTCCC) while assuming a correlated genetic covariance structure. Detailed simulation settings are described in the Material and Methods. Since LDSC cannot estimate annotation-stratified genetic covariance, we compared GNOVA and LDSC using data simulated from a non-stratified, infinitesimal genetic covariance structure (Figures 1A–1D). Both methods provided unbiased covariance estimates, but GNOVA estimator had consistently lower variance across all simulation settings. The same pattern could be observed for genetic correlation estimates (Figure S1). Neither method showed inflated type I error when the true covariance is 0. When comparing the frequencies of rejecting the null hypothesis, GNOVA is nearly twice as powerful as LDSC when the true genetic covariance is below 0.1. To evaluate GNOVA’s robustness against sample overlap, we simulated two traits using genotype data of the same cohort. After applying sample overlap correction, GNOVA still outperformed LDSC, showing higher estimation accuracy and statistical power (Figure S2).
Next, we investigated GNOVA’s capability to estimate annotation-stratified genetic covariance. We randomly partitioned the genome into two non-overlapping annotation categories and simulated two traits using annotation-dependent genetic covariance (Material and Methods). GNOVA provided unbiased estimates for the genetic covariance in each category across all settings (Figures 1E and 1F). Of note, type I error was well controlled in the annotation category without genetic covariance even when the true covariance in the other annotation category was non-zero, suggesting GNOVA’s robustness under the influence of LD. Furthermore, when functional annotations overlapped, our method still provided accurate covariance estimates and powerful inference (Figures 1G and 1H).
Estimation of Pairwise Genetic Correlation for 48 Human Complex Traits
We applied GNOVA to estimate genetic correlations for 48 complex traits using publicly available GWAS summary statistics (Ntotal ≈ 4.5 million). Trait acronyms and other details of all GWASs are summarized in Table S1. Out of 1,128 pairs of traits in total, we identified 176 pairs with statistically significant genetic correlation after Bonferroni correction (Table S2 and Figure S3). We also applied LDSC to the same datasets and identified only 127 significant pairs (Table S3 and Figure S4). A total of 52 significantly correlated trait pairs were uniquely identified by GNOVA while only 3 trait pairs were uniquely identified using LDSC. Overall, the genetic correlations estimated using GNOVA and LDSC are concordant (Figure 2). Consistent with our simulation results, GNOVA is more powerful when genetic correlation is moderate.
To evaluate model validity, we examined correlations between several traits that are closely related either physiologically or epidemiologically (Table S4). As expected, femoral and lumbar bone mineral density (FNBMD and LSBMD) and depressive symptoms (DEP) and major depressive disorder (MDD [MIM: 608516]) showed strong positive genetic correlations. We also observed negative correlations between subjective well-being (SWB) and neuropsychiatric disorders such as schizophrenia (MIM: 181500), anxiety (MIM: 607834), two depression traits (DEP and MDD), and neuroticism.
We further examined pairwise correlations between 48 traits (Figures 3 and S3). Following hierarchical clustering, broad patterns suggesting disease relatedness emerged. These results are well documented in the literature: neuropsychiatric conditions, metabolic diseases, and gastrointestinal inflammatory disorders clustered together with positive correlations within each individual cluster. We replicated several previous genetic correlation findings,7 including significant correlations of adult height (HGT) with coronary artery disease (CAD [MIM: 608320]) and age at menarche (AM), and of years of education (EDU) with CAD, bipolar disorder (BIP), body-mass index (BMI), triglycerides, and smoking status (SMK). Furthermore, two previous results that passed multiple testing correction at only 1% FDR passed Bonferroni correction in our analysis; namely, we observed a statistically significant negative correlation between AM and CAD and a positive correlation between autism (ASD [MIM: 209850]) and EDU.
We also identified a number of genetic correlations that are consistent with the genetic relationships reported in the previous literature. For example, previous genetic correlation analyses identified a negative correlation between anorexia nervosa (AN [MIM: 606788]) and obesity, a result we also observed.7 In addition, we found negative correlations of AN with glucose and triglyceride levels, as well as a positive correlation with high-density lipoprotein (HDL). These results provide further support for existing hypotheses proposing an underlying neural, rather than metabolic, etiology for metabolic syndrome.12, 21, 24 We see an unsurprising positive correlation between glucose and insulin levels, which is consistent with our understanding of diabetes.25 Positive correlations between multiple sclerosis (MS [MIM: 126200]) and Crohn disease (CD) and more generally, inflammatory bowel disease (IBD [MIM: 266600]), agree with existing reports of shared susceptibility for these diseases.26, 27, 28 We demonstrate a positive correlation between asthma (MIM: 600807) and eczema (MIM: 603165), which share numerous loci identified in previous GWASs.29 We also reproduced recent findings linking bone mineral density with metabolic dysfunction with positive correlations between FNBMD and both glucose and type II diabetes (T2D [MIM: 125853]).30 Interestingly, however, we did not see significant correlations of bone mineral density with cardiovascular diseases. Among neuropsychiatric disorders, we identified positive correlations between BIP and both depression and neuroticism. Associations between neuroticism and depression are well documented. Neuroticism is highly comorbid with MDD,31, 32 and our findings are consistent with previously observed genetic pleiotropy among neuroticism, MDD, BIP, and schizophrenia.33, 34
Especially notable are findings that suggest a genetic basis for associations between traits regarding which the literature is either equivocal or absent, and which provide useful information to guide further study. For example, we observed correlations of serum urate (SU) with AM (−0.12), T2D (0.275), and triglycerides (0.38), and we consistently observed associations of SU and markers of metabolic syndrome. In the literature, the genetic architecture of this association has not been extensively studied.35 Alleles in IRF8 (MIM: 601565), a regulatory factor of type I interferons, are associated with MS and systemic lupus erythematosus (SLE [MIM: 152700]), but with opposite effect; high type I IFN titers are thought to be causal in SLE but are lower in MS relative to healthy controls.36 In this analysis, however, we found a positive correlation between MS and SLE. We also draw attention to the significant negative correlation between MS and ASD. This replicates a previous genetic association between MS and ASD, with more recent evidence suggesting shared biomedical markers, such as increase in concentrations of tumor necrosis factor-alpha (TNF-α) in serum in ASD and in cerebrospinal fluid in MS.37, 38 However, previous treatment of MS with anti-TNF-α led to an increase in the number of demyelinating lesions and a significantly higher relapse rate.39 Furthermore, we observed a positive genetic correlation between ulcerative colitis (UC) and primary biliary cirrhosis (PBC [MIM: 109720]). CD, also an IBD and thus closely related, has been reported to share susceptibility genes with PBC including TNFSF15 (MIM: 604052), ICOSLG (MIM: 605717), and CXCR5 (MIM: 601613).40 Here we show that ulcerative colitis may also be genetically related to PBC.
Stratification of Genetic Covariance by Functional Annotation
In this section, we apply functional annotations to further dissect the shared genetic architecture of 48 complex traits. We have previously developed GenoCanyon, a statistical framework to predict functional DNA elements in the human genome through integration of annotation data.20 We partitioned the genome into two non-overlapping categories (i.e., functional and non-functional) based on GenoCanyon scores (Material and Methods) and estimated genetic covariance within the functional and the non-functional genome for each pair of traits (Table S5). The total genetic covariance estimated using the stratified model is highly concordant with covariance estimated using the non-stratified model (Figure 4A). However, genetic covariance is enriched in the predicted functional genome for most traits (Figure 4B). Based on this approach, we identified one more pair of correlated traits, i.e., low-density lipoprotein (LDL) and total cholesterol (TC), whose genetic covariance largely concentrated in the predicted functional genome and achieved significance (ρfunc = 0.060; p = 1.0 × 10−6) while the overall covariance did not (ρoverall = 0.062; p = 7.7 × 10−5).
Next, we partitioned genetic covariance based on quartiles of SNPs’ minor allele frequencies (MAFs) in subjects with European ancestry from the 1000 Genomes Project (Material and Methods; Table S6). Similar to the previous analysis, we identified high concordance between the total covariance estimated using MAF-stratified model and the covariance estimates based on non-stratified model (Figure 4C). Overall, the estimated genetic covariance in four MAF quartiles was comparable (Figure S5). However, we identified three pairs of traits that are uniquely correlated in the lowest MAF quartile (Figure 4D), namely asthma with chronic kidney disease (CKD; p = 1.8 × 10−5), gout (MIM: 138900) with CKD (p = 4.2 × 10−8), and asthma with gout (p = 4.4 × 10−5). For several trait pairs, covariance in the lowest MAF quartile showed reversed direction compared to other quartiles. Covariance between CKD and gout even showed reversed direction compared to the estimated total covariance, highlighting the distinction in how common and less common variants are involved in the shared genetic architecture between these traits. Our findings also hint at the possible selection pressure on DNA variations contributing to metabolic traits including CKD and gout, as well as immune diseases including asthma.
Finally, we studied tissue specificity of genetic covariance through integration of GenoSkyline-Plus annotations (Material and Methods). GenoSkyline-Plus integrates multiple epigenomic and transcriptomic annotations from the Roadmap Epigenomics Project to identify tissue- and cell type-specific functional regions in the human genome.14 We utilized seven broadly defined tissue and cell types (i.e., brain, cardiovascular, epithelium, gastrointestinal, immune, muscle, and other) to stratify genetic covariance for 1,128 pairs of traits (Table S7). Six tests from four pairs of traits passed Bonferroni correction, i.e., p < 0.05/(1,128 × 7) = 6.3 × 10−6 (Figures 4E and S6). As expected, UC, as an IBD, was significantly and positively correlated with IBD in immune-related functional genome (p = 2.0 × 10−6), and two psychiatric diseases, BIP and schizophrenia, were specifically correlated in the genome predicted to be functional in brain (p = 8.7 × 10−8). In addition, we identified cognitive function (COG) and EDU, and birth weight (BW) and HGT to be significantly correlated in both brain- and immune-related functional genome. Of note, since the sizes of functional annotations are linked to statistical power, p values here should not be interpreted as reflecting the importance of each tissue. Some tissues may be critically involved in the etiology of analyzed traits even if they may have p values that are not statistically significant. For example, IBD and UC were substantially correlated in the gastrointestinal tract (p = 3.7 × 10−4). Many of these tests may become significant in the near future as GWASs with larger sample sizes are published.
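The multiple-testing arithmetic above can be reproduced with a few lines. The tissue-stratified threshold follows the formula given in the text; using 0.05/1,128 as the threshold for the non-stratified pairwise analysis is an assumption about how the Bonferroni correction was applied there.

```python
from math import comb

n_traits = 48
n_tissues = 7
alpha = 0.05

n_pairs = comb(n_traits, 2)  # number of distinct trait pairs: 1128

# Assumed threshold for the non-stratified pairwise tests.
pairwise_threshold = alpha / n_pairs

# Threshold stated in the text for tissue-stratified tests.
tissue_threshold = alpha / (n_pairs * n_tissues)

print(f"{n_pairs} pairs; pairwise {pairwise_threshold:.2e}; "
      f"tissue-stratified {tissue_threshold:.2e}")
```

Because each added annotation multiplies the number of tests, stratified analyses face a stricter per-test threshold than the non-stratified analysis of the same trait pairs.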
Dissection of Shared and Distinct Genetic Architecture between LOAD and ALS
LOAD and ALS are neurodegenerative diseases. Despite success of large-scale GWASs,19, 41 our understanding of their genetic architecture is still far from complete. We applied GNOVA to dissect the genetic covariance between LOAD and ALS using publicly available GWAS summary statistics (NLOAD = 54,162; NALS = 36,052; Table S8).
We identified positive and significant genetic correlation between LOAD and ALS (correlation = 0.175, p = 2.0 × 10−4). LDSC provided similar estimates but failed to achieve significance (Table 1). 82.6% of the total genetic covariance between LOAD and ALS is concentrated in 33% of the genome predicted to be functional by GenoCanyon (p = 8.2 × 10−5). Furthermore, MAF-stratified analysis showed that 54.6% of the covariance could be explained by the SNPs in the highest MAF quartile (p = 0.005). In fact, genetic covariance is lower with lower MAF, and covariance in the lowest MAF quartile is nearly negligible. This is surprising considering that the heritability of ALS is enriched in variants with lower MAF.19 We also performed tissue-stratified analysis using GenoSkyline-Plus annotations (Table S9). No tissue passed the significance threshold after multiple testing correction, but covariance is more concentrated in immune, brain, and cardiovascular functional genome, and showed nominal significance in the immune annotation track (p = 0.014). Whether this will lead to a potential neuroinflammation pathway shared between LOAD and ALS remains to be studied in the future using larger datasets.
Table 1.
Annotation | Category | Covariance | p Value |
---|---|---|---|
Non-stratified | GNOVA | 0.016 (0.004) | ∗2.0 × 10⁻⁴ |
Non-stratified | LDSC | 0.012 (0.007) | 0.075ᵃ |
GenoCanyon | functional | 0.016 (0.004) | ∗8.2 × 10⁻⁵ |
GenoCanyon | non-functional | 0.003 (0.004) | 0.377 |
MAF | Q1 | −0.001 (0.003) | 0.842 |
MAF | Q2 | 0.003 (0.004) | 0.361 |
MAF | Q3 | 0.004 (0.004) | 0.327 |
MAF | Q4 | 0.008 (0.003) | ∗0.005 |
Numbers in parentheses indicate standard errors. Significant p values after adjusting for multiple testing within each section are indicated by an asterisk (∗).
ᵃ The p value for LDSC was calculated from genetic correlation instead of genetic covariance.
Next, we stratified genetic covariance between LOAD and ALS by chromosome. Somewhat surprisingly, we did not observe a linear relationship between per-chromosome genetic covariance and chromosome size (Figure 5A) given that the overall genetic covariance is positive and significant. Since we have observed the concentration of genetic covariance in the functional genome, we further partitioned each chromosome by genome functionality. We identified a clear and positive linear relationship between genetic covariance in the functional genome and the size of predicted functional DNA on each chromosome (Figure 5B). The correlation between per-chromosome genetic covariance in the non-functional genome and the size of non-functional chromosome is negative and significantly smaller than the corresponding quantity in the functional genome (Figure S7; p = 0.044; tested using Fisher transformation). Our findings suggest a polygenic covariance architecture between LOAD and ALS and highlight the importance of stratifying genetic covariance by functional annotation.
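The comparison of the two per-chromosome correlations is described as "tested using Fisher transformation". The article does not spell out the exact procedure, so the following is only a minimal sketch of the textbook Fisher z test for two correlations, under the simplifying assumption that they are independent (here both would be computed across the same chromosomes, so this is an approximation). The function name, the choice of 22 autosomes, and the correlation values are hypothetical and are not taken from the article.

```python
import math

def fisher_z_one_sided(r_a, r_b, n_a, n_b):
    """Test H0: corr_a == corr_b against H1: corr_a > corr_b.

    Textbook Fisher z test for two independent sample correlations.
    """
    z_a = math.atanh(r_a)  # Fisher transformation
    z_b = math.atanh(r_b)
    se = math.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    z = (z_a - z_b) / se
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return z, p

# Hypothetical correlations across 22 autosomes (NOT the article's values).
z_stat, p_value = fisher_z_one_sided(0.5, -0.1, 22, 22)
print(round(z_stat, 2), round(p_value, 3))
```

With only 22 data points per correlation, the standard error of the transformed difference is large, so even a sizable gap between the two correlations yields a modest p value.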
Finally, we jointly analyzed LOAD, ALS, and 48 other complex traits (Table S10). Interestingly, LOAD and ALS showed distinct patterns of genetic correlations with other complex traits (Figure 6). We identified negative and significant correlations between LOAD and cognitive traits including COG and EDU. HGT and age at first birth (AFB), two traits related to hormonal regulation as well as socio-economic status, were also significantly and negatively correlated with LOAD. Consistent with previous reports, we did not identify substantial correlation between LOAD and other neurological and/or psychiatric diseases.7, 9 We identified negative correlations between LOAD and gastrointestinal inflammatory diseases including a significant correlation with PBC. Asthma and eczema were both positively correlated with LOAD, suggesting a complex genetic relationship between LOAD and different immune-related diseases. Although some of these traits had the same correlation direction with ALS, none of them were significant. Instead, ALS was significantly and positively correlated with MS, a neurological disease with a well-established immune component.42 ALS was also positively correlated with several other immune-related diseases including celiac disease (CEL [MIM: 212750]), asthma, PBC, and IBD (including CD and UC), though none of these were statistically significant. The nominal correlations between ALS and neurological and psychiatric diseases including epilepsy, schizophrenia, BIP, AN, and MDD also remain to be validated in the future using studies with larger sample sizes.
Discussion
Although our understanding of complex disease etiology is still far from complete, we have gained valuable knowledge about the genetic architecture of numerous complex traits from large-scale association studies, partly due to advances in statistical genetics. First, a large proportion of trait heritability can be explained by SNPs that do not pass the Bonferroni-corrected significance threshold.1 Therefore, it is often helpful to utilize genome-wide data instead of focusing only on significant SNPs in post-GWAS analyses. Second, sample size is critical for many statistical genetics applications. However, individual-level genotype and phenotype data from consortium-based GWASs are not always easily accessible due to policy and privacy concerns. Thanks to the great efforts from large international collaborations such as the Psychiatric Genomics Consortium in promoting open science and data sharing, it has become a tradition for GWAS consortia to share summary statistics to the broader scientific community. Therefore, it is of practical interest to use GWAS summary statistics as the input of downstream analytical methods.8 Finally, integration of high-throughput transcriptomic and epigenomic annotation data has been shown to improve statistical power as well as interpretability in many recent complex trait studies.12, 13, 14 As large consortia such as ENCODE22 and Roadmap Epigenomics Project23 continue to expand, integrative approaches based on functional genome annotations will become an even greater success. In this paper, we developed a novel method to estimate and partition genetic covariance between complex traits. Our method enjoys all the aforementioned advantages. It requires only genome-wide summary statistics and a reference panel as input and allows stratification of genetic covariance by functional genome annotation, which provides novel insights into the shared genetic basis between complex traits and, in some cases, improves the statistical power.
Numerous studies have hinted at a shared genetic basis among neurodegenerative diseases.43, 44 Due to the convenience and efficiency of LDSC and the wide accessibility of GWAS summary statistics, several attempts have been made to estimate genetic correlation between neurodegenerative diseases.9, 45 To date, these efforts have not been as successful as similar studies on psychiatric diseases and immune-related traits. One reason is that existing methods may not be statistically powerful enough to identify moderate genetic correlation using GWASs with limited sample sizes. In addition, the shared genetics among neurodegenerative diseases may not fit the global, infinitesimal covariance structure that most existing tools are based on. In this study, we applied GNOVA to dissect the genetic covariance between LOAD and ALS, two major neurodegenerative diseases, using summary statistics from the largest available GWASs. Our findings suggest that covariance between LOAD and ALS is concentrated in the predicted functional genome and in very common SNPs. Moreover, after applying functional annotations to stratify the genome, estimated per-chromosome genetic covariance is proportional to chromosome size, suggesting a shared polygenic architecture between LOAD and ALS and also demonstrating the importance of incorporating predicted genetic activity with GenoCanyon. In addition, joint analysis with 50 complex traits also revealed distinctive genetic covariance profiles for LOAD and ALS. LOAD is negatively correlated with multiple traits related to cognitive function and hormonal regulation, while ALS is positively correlated with MS and a few other immune-related traits. Our findings provided novel insights into the shared and distinct genetic architecture between LOAD and ALS and also further demonstrated the benefits of incorporating functional genome annotations into genetic covariance analysis.
Also of note are findings involving serum urate. SU was positively correlated with gout but also with a few metabolic traits. Gout is an arthritic inflammatory process caused by deposition of uric acid crystals in joints, and the role of hyperuricemia in gout is well established. More recently, a role for hyperuricemia in the pathophysiology of metabolic syndrome and CKD has been suggested.46 While associations between hyperuricemia and cardiovascular disease are well described,47 multiple hypotheses exist regarding details of its involvement.48 For example, hyperuricemia may lead to inflammation in the kidney through vascular smooth muscle proliferation, inducing hypertension via pre-glomerular vascular changes.49 It has also been shown to induce oxidative stress in various settings; in adipocytes and islet cells, this may be involved in development of diabetes, and it may also result in impaired endothelin function and activation of the renin-angiotensin-aldosterone system, leading to hypertension.50, 51, 52, 53 Despite this evidence, genetic investigations have not identified a strong relationship between hyperuricemia and metabolic syndrome. Polymorphism in gene SLC22A12 (MIM: 607096) was associated with hyperuricemia but not with metabolic syndrome.54 Mendelian randomization studies showed an association between uric acid and gout but did not find an association with T2D or cardiovascular risk factors such as hypertension, glucose, or CAD.55, 56 Our results suggest that GNOVA successfully isolated a signal of biological and clinical significance that provides important impetus for further inquiry in the etiology of metabolic syndrome.
Dissecting relationships among complex traits is a major goal in human genetics research. Genetic covariance is a useful metric to quantify such relationships, but it has its limitations. First, genetic covariance implicitly imposes a strong assumption on the shared genetic basis between complex traits. Not only may the same set of genetic components affect multiple traits, but their effect sizes on both traits are also assumed to be proportional. In the future, it is of interest to extend our method to estimate more generalized metrics, e.g., consistency in effect directions. Second, genetic covariance analysis does not highlight specific DNA segments with pleiotropic effects. Several SNP-based methods have been developed to identify pleiotropic associations using GWAS summary statistics.57, 58 However, due to the large number of SNPs in the genome, statistical power is a critical issue and large-scale inference remains challenging. In addition, we have demonstrated that integrating functional annotations into genetic covariance analysis could reveal subtle structures in shared genetics between complex traits, but interpretation of genetic covariance remains a challenge. Pickrell et al. recently proposed an approach to distinguishing causal relationships among traits from pleiotropic effects via independent biological pathways.59 Han et al. developed a method to distinguish pleiotropy from phenotypic heterogeneity.60 Although many questions remain unanswered, these recent studies have broadened our view on interpreting complex genetic relationships between human traits. Further, statistical power in genetic covariance analysis will be reduced if the shared genetic components have discordant effect directions on different traits. This problem can be partly addressed by the aforementioned SNP-based methods. Recently, Shi et al. developed a method to estimate local heritability and genetic correlation.61, 62 This approach provides an alternative methodological option for analyzing genetic effects at specific loci. Finally, we note that common SNPs in GWASs do not fully explain phenotypic similarity. For example, the estimated genetic covariance among lipid traits explains only 10%–15% of their phenotypic covariance available on LD Hub.10 Other factors such as rare variants, copy-number variations, and environmental factors may have substantial contributions to the phenotypic covariance among complex traits. Dissection of these complex relationships will be an interesting topic to pursue in the future. Our method, in conjunction with many other tools, provides the most complete picture to date about shared genetics between complex phenotypes.
In summary, we developed GNOVA, a novel statistical framework to perform powerful, annotation-stratified genetic covariance analysis using GWAS summary statistics. Through theoretical proof, we have established GNOVA’s statistical optimality within the framework of method of moments. Compared to LD score regression, GNOVA provides more accurate genetic covariance estimates and powerful statistical inference. Its unique feature of performing annotation-stratified analysis also adds depth to existing analysis strategies. Using GNOVA, we were able to expand the discovery of genetic covariance among a spectrum of common diseases and complex traits. Our findings shed light onto the shared and distinct genetic architecture of complex traits. As the sample sizes in genetic association studies continue to grow, our method has the potential to continue identifying shared genetic components and providing novel insights into the etiology of complex diseases.
Published: December 7, 2017
Footnotes
Supplemental Data include 10 figures, 10 tables, and Supplemental Acknowledgments and can be found with this article online at https://doi.org/10.1016/j.ajhg.2017.11.001.
Appendix A
Model Details
We begin by introducing a general scenario. Assume two standardized traits y1 and y2 follow a linear model:

$$y_1 = X\beta + \epsilon, \qquad y_2 = Z\gamma + \delta.$$

Matrices X and Z denote the standardized genotype information for two GWASs. To simplify the algebra, we assume both the genotypes (X and Z) and phenotypes (y1 and y2) are standardized. We define K possibly overlapping functional annotations S1, S2, …, SK. All together, these annotations cover the entire genome. We assume two studies share the same list of m SNPs. Vectors β and γ are random effect terms that quantify the genetic effects on traits y1 and y2, respectively. Variables ε and δ denote the non-genetic effects. Genetic and non-genetic effects on the same trait are assumed to be independent. A SNP’s genetic effects on two different traits can be correlated. The genetic covariance depends on functional annotations and follows an additive structure in regions where functional annotations overlap. Specifically, we have

$$\mathrm{Cov}(\beta_j, \gamma_j) = \sum_{c:\, j \in S_c} \frac{\rho_c}{m_c},$$

where mc denotes the total number of SNPs in annotation Sc. Notation $j \in S_c$ indicates that the jth SNP is located in functional annotation Sc. If we use Xi and Zi to denote the genotype matrices within annotation Si (some SNPs may be counted multiple times if the functional annotations overlap) and use βi and γi to denote the corresponding genetic effects, the model can be equivalently re-written as follows:

$$y_1 = \sum_{i=1}^{K} X_i \beta_i + \epsilon, \qquad y_2 = \sum_{i=1}^{K} Z_i \gamma_i + \delta, \qquad E(\beta_i \gamma_i^T) = \frac{\rho_i}{m_i} I.$$

In practice, two different GWASs often share a subset of samples. Without loss of generality, we assume N1 and N2 to be the sample sizes of two studies and the first NS samples in each study are shared. Therefore, the first NS rows of matrices Xi and Zi (i = 1, …, K) are identical. To account for the non-genetic correlation introduced by sample overlapping, we allow random error terms ε and δ to be correlated.
To summarize, this framework explicitly models the annotation-stratified genetic covariance in the genome. It also allows functional annotations to overlap, which is important when applied to real-world annotation data. Furthermore, we take the sample overlap between different GWASs into account. Finally, our model does not require any additional assumption on the heritability structure. In following sections, we discuss how to estimate covariance parameters ρ1, …, ρK.
Estimate Covariance Parameters
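First, for an arbitrary N1 × N2 matrix A, we study the expectation of $y_1^T A y_2$.

$$E(y_1^T A y_2) = \sum_{i=1}^{K} \frac{\rho_i}{m_i}\, \mathrm{tr}(X_i^T A Z_i) + \rho_{K+1} \sum_{t=1}^{N_S} A_{tt}$$

Here, quantity Att denotes the tth diagonal element of matrix A, and ρK+1 denotes the covariance between the error terms of a shared sample. To estimate the covariance parameters, we plug in K+1 different matrices $A_1, \ldots, A_{K+1}$ into the equation above. Next, we apply method of moments to approximate $E(y_1^T A_j y_2)$ using the observed value $y_1^T A_j y_2$. After these steps, we get the following equations:

$$\sum_{i=1}^{K} \frac{\rho_i}{m_i}\, \mathrm{tr}(X_i^T A_j Z_i) + \rho_{K+1} \sum_{t=1}^{N_S} (A_j)_{tt} = y_1^T A_j y_2, \qquad j = 1, \ldots, K+1.$$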
Solving this linear system of K+1 equations would get us a set of point estimates for covariance parameters. We discuss the details in the following section.
Choose Matrix A
The estimation approach described in the previous section works for an arbitrary set of A matrices. So how do we properly choose them in practice? We begin by solving a practical issue. Processing large-scale GWASs requires substantial resources for both computation and data storage. Moreover, individual-level genotype and phenotype data from consortium-based GWASs are often inaccessible due to policy concerns. However, sharing the summary statistics has become a common practice in the field of complex disease genetics. Summary data for many GWASs are openly accessible online. Therefore, it is of practical interest to estimate genetic covariance based on summary statistics only. To achieve this goal, we define the first K matrices as:

$$A_j = X_j Z_j^T, \qquad j = 1, \ldots, K.$$
Plugging in these matrices, the first K equations become:
The second equality is based on the property of trace and the fact that first NS samples are shared between two studies. To calculate all the terms in these equations, we note that
Here, we used approximation . This is because genotype data are standardized and the shared sub-cohort is a subset of all individuals. In practice, if two GWASs share samples, the shared sample size is usually greater than several hundred, which is sufficient to make this approximation reasonable.
We approximate the sample linkage disequilibrium (LD) matrices from both studies, i.e., and , using the population LD matrix Dij. In practice, we estimate LD using a reference panel, e.g., samples from the 1000 Genomes Project with European ancestry. In the formula, denotes the LD between the th SNP from category Si and the (l′)th SNP from category Sj.
Here, z1 and z2 denote the z-scores of SNP-level associations from two GWASs; and represent z-scores corresponding to the SNPs in annotation category Sj.
We plug in these quantities and divide N1N2 on both sides of the K equations, then we get:
Next, we study the (K+1)th equation. We define:
We make the following observations.
Again, the approximation is based on the facts that the genotype data are standardized, the first NS rows of matrices Xi and Zi are identical, and the shared sub-cohort is a subset of the complete study with sufficient sample size.
Plugging in these quantities and dividing N1N2 on both sides of the (K+1)th equation, we get:
We denote all K+1 equations in matrix form:
Since ρ1, …, ρK are the parameters of interest, we subtract the (K+1)th equation from the first K equations and remove ρK+1 from the linear system:
When the sample sizes of both GWASs are large and the sample overlap between two studies is moderate, then NS / (N1N2) is a small quantity. We use the approximation . Similarly, we have:
Here, is the phenotypic correlation between two traits among the shared NS samples and is bounded by 1. Therefore the approximation is reasonable. Additional justification on these approximations will be given in the next section.
In summary, the K equations can be approximated by:
We denote
Then, the point estimate of covariance parameters can be denoted as

$$\hat{\rho} = M^{-1} v.$$
Importantly, we emphasize that M can be estimated using a reference panel and v is based only on GWAS summary data. No individual-level genotype or phenotype information from the original GWASs is needed in this framework. Finally, we note that in some rare cases (e.g., very similar annotations are used simultaneously in the analysis), matrix M may not be invertible. In that case, we can acquire the genetic covariance estimator through the following minimization problem.
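To make the role of M and v concrete, the following is a minimal sketch of the method-of-moments solve for the simplest setting, in which the two GWASs share no samples. The scaling used here is an assumption derived from the model rather than copied from the article: row j of M holds, for each annotation i, the sum of squared LD between SNPs in Si and SNPs in Sj divided by mi, and v_j is the inner product of the two z-score vectors within Sj divided by sqrt(N1 N2). The article's own M and v may be scaled differently row by row, which would not change the solution. All function names and the toy data are hypothetical.

```python
import math

def solve_linear(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(M[r]) + [v[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("M is numerically singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def stratified_covariance(z1, z2, ld, annotations, n1, n2):
    """Annotation-stratified covariance estimate, assuming no sample overlap.

    z1, z2      : per-SNP z-scores from the two GWASs
    ld          : m x m matrix of LD correlations from a reference panel
    annotations : list of K lists of SNP indices (lists may overlap)
    """
    k = len(annotations)
    scale = math.sqrt(n1 * n2)
    # M[j][i] = (1 / m_i) * sum of squared LD between S_i and S_j (assumed scaling)
    M = [[sum(ld[a][b] ** 2 for a in annotations[i] for b in annotations[j])
          / len(annotations[i])
          for i in range(k)]
         for j in range(k)]
    # v[j] depends only on summary statistics
    v = [sum(z1[s] * z2[s] for s in annotations[j]) / scale for j in range(k)]
    return solve_linear(M, v)

# Toy data: four SNPs in linkage equilibrium, two disjoint annotations.
ld = [[1.0 if a == b else 0.0 for b in range(4)] for a in range(4)]
annotations = [[0, 1], [2, 3]]
z1 = [2.0, 1.0, 0.0, 1.0]
z2 = [1.0, 2.0, 3.0, -1.0]
rho_hat = stratified_covariance(z1, z2, ld, annotations, 100, 100)
print(rho_hat)
```

When two annotations are identical, M has two equal rows and the solver raises an error, which is the non-invertible situation the paragraph above mentions.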
Remarks on Approximation
Several approximations are critical in the estimation framework described above. In this section, we discuss why these approximations are reasonable.
Approximation 1.
This approximation is based on law of large numbers and two assumptions. (1) The genotype matrix is standardized. (2) If two GWASs share samples, the shared sample size NS needs to be sufficiently large. The first assumption is commonly seen in complex trait genetic models. It is actually not a required condition, but it simplifies the algebra. The second assumption is also most likely going to hold in practice. If two GWASs have a sample overlap, it is often because one or more cohorts were used in both studies. A cohort like this usually has a sample size that ranges from several hundred to a few thousand, which is sufficiently large for the law of large numbers to hold.
Approximation 2.
Since the first term does not depend on GWAS sample size, this approximation holds when N1 and N2 are large and the shared sample size NS is moderate. Notably, this condition does not contradict the condition in approximation 1. In approximation 1, we require the value of NS to exceed several hundred so that the law of large numbers could hold. Here, we require the ratio between NS and the actual GWAS sample size to be small. Since large-scale GWAS meta-analyses published in recent years often have sample sizes on the scale of 10⁴ or 10⁵, this approximation is reasonable. Of note, the term NS / (N1N2) is introduced when we remove parameter ρK+1 from the linear system by subtracting the (K+1)th equation from the first K equations. If the true value of ρK+1, i.e., the non-genetic covariance introduced by sample overlap, is in fact very small compared with the genetic covariance, then this approximation can be omitted. Finally, we note that even if the two GWASs are performed on the identical cohort (i.e., complete sample overlap), NS / (N1N2) = 1/N is still a small quantity as long as the sample size N is big.
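A quick numerical check illustrates how small NS / (N1N2) is in realistic settings. The two study sizes below are the LOAD and ALS sample sizes quoted in the main text; the shared sample size and the fully overlapping cohort size are made-up values chosen only for illustration.

```python
# Sample sizes from the LOAD/ALS analysis in the main text.
n_load, n_als = 54162, 36052
n_shared = 2000  # hypothetical overlap, for illustration only

ratio_partial = n_shared / (n_load * n_als)

# Complete overlap: N1 = N2 = NS = N, so NS / (N1 * N2) = 1 / N.
n_full = 50000  # hypothetical cohort size
ratio_full = n_full / (n_full * n_full)

print(f"{ratio_partial:.2e}", f"{ratio_full:.2e}")
```

Both ratios are far below 1, in line with the argument that approximation 2 tolerates even complete sample overlap when N is large.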
Approximation 3.
The z-scores in GWASs usually do not deviate much from the standard normal distribution. Therefore is close to the true correlation between z1 and z2. Similarly, since we assume the phenotypes are standardized, is the phenotypic correlation between two traits among the shared NS samples. Therefore, as long as
or equivalently,
then it is reasonable to omit the term . Therefore, similar to the condition in approximation 2, if the GWAS sample sizes N1 and N2 are large and the shared sample size NS is moderate, then approximation 3 holds. However, we note that if there is a substantial overlap between two studies (e.g., when analyzing two traits measured on the same cohort), then and we can no longer omit the term from the equation.
Special Cases
(1) Two Independent GWASs
If samples from two GWASs do not overlap, then the non-genetic effects ε and δ are independent and ρe = 0. So only K equations are needed for estimating covariance parameters. We still define $A_j = X_j Z_j^T$ for j = 1,…,K. That gives us:
Therefore, none of the approximations discussed in the previous section is needed in this simple scenario. The covariance estimator remains the same:
(2) No Annotation Stratification
If we do not stratify covariance by functional annotation, then is just a one-dimensional estimator for the overall genetic covariance.
Here, is the average product of z-scores from two GWASs; is the average LD across all SNP pairs in the study, or equivalently, the average LD score across all SNPs in the study. Interestingly, this estimator can be seen as a two-trait extension of the heritability estimator proposed by Bulik-Sullivan.16
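As a rough illustration of the non-stratified case, the sketch below writes the overall estimator as m times the sum of z-score products, divided by sqrt(N1 N2) times the sum of LD scores. This form is derived here from the model under no sample overlap and is an assumption about notation, not a transcription of the article's formula. The simulation is deliberately crude: SNPs are independent (every LD score equals 1), and each z-score is modeled as sqrt(N) times the SNP effect plus standard normal noise. All parameter values are hypothetical.

```python
import math
import random

def overall_covariance(z1, z2, ld_scores, n1, n2):
    """rho_hat = m * sum(z1 * z2) / (sqrt(n1 * n2) * sum(ld_scores))."""
    m = len(z1)
    cross = sum(a * b for a, b in zip(z1, z2))
    return m * cross / (math.sqrt(n1 * n2) * sum(ld_scores))

random.seed(1)
m, n1, n2 = 20000, 50000, 50000   # hypothetical SNP count and sample sizes
h1 = h2 = 0.5                     # hypothetical heritabilities
rho = 0.2                         # true genetic covariance to recover

r = rho / math.sqrt(h1 * h2)      # implied genetic correlation
z1, z2 = [], []
for _ in range(m):
    u, w = random.gauss(0, 1), random.gauss(0, 1)
    beta = math.sqrt(h1 / m) * u
    gamma = math.sqrt(h2 / m) * (r * u + math.sqrt(1 - r * r) * w)
    # Simplified z-score model: signal plus unit-variance noise.
    z1.append(math.sqrt(n1) * beta + random.gauss(0, 1))
    z2.append(math.sqrt(n2) * gamma + random.gauss(0, 1))

rho_hat = overall_covariance(z1, z2, [1.0] * m, n1, n2)
print(round(rho_hat, 3))
```

Under these assumptions the estimate should land close to the simulated covariance of 0.2, with sampling error on the order of 0.01.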
(3) Two Different Traits Measured on the Same Cohort
If the samples completely overlap between two GWASs (i.e., N1 = N2 = NS = N), as we discussed in the previous section, approximations 1 and 2 still hold as long as the sample size is large but approximation 3 would fail. Therefore, after subtracting the (K+1)th equation from the first K equations and removing parameter ρK+1, we get:
In practice, since we do not assume access to the individual-level phenotype data, an estimate of phenotypic correlation needs to be acquired elsewhere (since we assumed phenotypes to be standardized, this is equivalent to phenotypic covariance). Then, we could get the covariance estimate under sample overlap correction:
For some traits, may have been reported in the literature. Otherwise, we need to estimate using GWAS summary statistics. Bulik-Sullivan et al. showed the following LD score regression equation without annotation structure in the genome:7
In this special case when two traits are measured on the same cohort, the formula becomes
Therefore, we could apply LD score regression and use the estimated intercept as .
(4) Binary Traits
In this section, we investigate whether we could analyze ascertained case-control studies using our framework. It has been previously shown that the following formula holds under the liability threshold model:7
where ρobs denotes the covariance on the observed scale, Na,b denotes the number of samples with phenotype a in study 1 and phenotype b in study 2, denotes the total number of samples with phenotype a in study i, and Pi denotes the sample prevalence of trait yi. Using the same approximation we used in method of moments,
it is straightforward to extend it to the following matrix form that allows annotation stratification:
where
We note that when two GWASs do not share any sample. In that case, covariance estimator remains the same:
We just need to interpret it as the covariance on the observed scale.
When two studies have a substantial sample overlap, cannot be ignored in the equations and therefore needs to be estimated. Notably, is in fact the intercept term in cross-trait LD score regression. Therefore, similar to the previous scenario in this section, we could estimate by running LD score regression first and then plug it in the equations to calculate the covariance estimate:
Finally, we note that the argument can be extended to estimate the covariance between a continuous trait and a binary trait. However, the estimated genetic covariance will be on the half-observed scale.7
Remarks on Overlapping Functional Annotations
We have discussed parameter estimation in previous sections. Our framework allows functional annotations to overlap, which is an important feature in real data analysis. However, when functional annotations overlap, the covariance parameter ρ is not the real quantity of interest. Instead, the total covariance in each annotation category is more meaningful biologically. For instance, the total covariance in functional annotation S1 is

$$\sum_{c=1}^{K} \frac{m_{1c}}{m_c}\, \rho_c,$$

where $m_{1c}$ denotes the number of SNPs in region $S_1 \cap S_c$. Of note, this quantity equals ρ1 when S1 does not overlap with any other functional annotation. Therefore, we use the weighted estimator to estimate the total covariance in each category when functional annotations overlap:

$$W\hat{\rho}.$$

Here, W is a K × K matrix with element

$$W_{ic} = \frac{m_{ic}}{m_c}.$$
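The weighting for overlapping annotations can be sketched as follows. The code assumes that element (i, c) of W is the number of SNPs shared by Si and Sc divided by the size of Sc, which is what the additive covariance model implies; the symbol names, function names, and toy annotations are illustrative and not taken from the article.

```python
def overlap_weight_matrix(annotations):
    """W[i][c] = |S_i intersect S_c| / |S_c| (assumed form of the weights)."""
    sets = [set(a) for a in annotations]
    k = len(sets)
    return [[len(sets[i] & sets[c]) / len(sets[c]) for c in range(k)]
            for i in range(k)]

def total_covariance(rho, annotations):
    """Total covariance per annotation: the matrix-vector product W * rho."""
    W = overlap_weight_matrix(annotations)
    k = len(rho)
    return [sum(W[i][c] * rho[c] for c in range(k)) for i in range(k)]

# Toy example: S1 is the whole genome (10 SNPs), S2 is a nested subset (4 SNPs).
annotations = [list(range(10)), [0, 1, 2, 3]]
rho = [0.1, 0.2]  # hypothetical covariance parameters
totals = total_covariance(rho, annotations)
print(totals)
```

In this toy case each SNP in S2 carries 0.1/10 + 0.2/4 = 0.06 of covariance, so S2 totals 0.24, while the remaining six SNPs contribute 0.06 more to give 0.30 for the whole genome.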
Theoretical Properties of Covariance Estimator and Some Numerical Justifications
As discussed in previous sections, matrices have two major properties under the ideal case where two GWASs do not share samples.
-
(1)
Vector can be directly calculated using GWAS summary statistics.
-
(2)
, where terms only depends on LD and therefore can be estimated using a reference panel.
In this section, we investigate whether changing A could get us another covariance estimator, , that is even better than , which is based on . We show that under reasonable conditions, all estimators in our framework are unbiased but “almost” has the minimal variance. The proof is an extension of the MINQUE theory developed in Rao et al.63
To prove the theoretical properties, we need an additional assumption on the distribution of y1 and y2. We assume that y1 and y2 are marginally standardized and follow a multivariate normal distribution:
H1 and H2 denote the variance-covariance matrices of two traits; denotes the covariance elements between two traits. Based on the model we have described throughout the paper:
We begin with calculating the variance of the quadratic form-like quantity .
Proposition 1. Let A be a N1 × N2 matrix. Then .
Proof:
We note that
Therefore,
Since
we have
Matrix is symmetric; therefore is a quadratic form. This gives us
Therefore,
Notably, if y1 = y2 and A is symmetric, then this result becomes the well-known variance formula for quadratic forms.
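The two-part structure of the variance can be checked numerically on a small example. The sketch below assumes zero-mean, jointly normal y1 and y2, writes H12 for the N1 × N2 cross-covariance Cov(y1, y2), and compares a direct sum of Gaussian fourth-moment terms (Isserlis' theorem) against the trace expression tr(H1 A H2 Aᵀ) + tr(A H12ᵀ A H12ᵀ). This notation is an assumption made for the sketch and may not match the article's statement of Proposition 1 symbol for symbol; the matrices are arbitrary toy values.

```python
from itertools import product

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def trace(P):
    return sum(P[i][i] for i in range(len(P)))

def variance_by_moments(A, H1, H2, H12):
    """Var(y1' A y2) from the Gaussian fourth-moment expansion."""
    n1, n2 = len(A), len(A[0])
    total = 0.0
    for i, k in product(range(n1), repeat=2):
        for j, l in product(range(n2), repeat=2):
            total += A[i][j] * A[k][l] * (
                H1[i][k] * H2[j][l] + H12[i][l] * H12[k][j])
    return total

def variance_by_traces(A, H1, H2, H12):
    """Return the two trace terms separately."""
    first = trace(matmul(matmul(matmul(H1, A), H2), transpose(A)))
    cross = matmul(A, transpose(H12))
    second = trace(matmul(cross, cross))
    return first, second

# Toy inputs: near-independent samples, weak cross-trait covariance.
A = [[1.0, -2.0, 0.5], [0.0, 3.0, 1.0]]
H1 = [[1.0, 0.02], [0.02, 1.0]]
H2 = [[1.0, 0.01, 0.0], [0.01, 1.0, 0.03], [0.0, 0.03, 1.0]]
H12 = [[0.02, 0.0, 0.01], [0.0, 0.01, 0.0]]

first, second = variance_by_traces(A, H1, H2, H12)
direct = variance_by_moments(A, H1, H2, H12)
print(first, second, direct)
```

With small off-diagonal and cross-covariance entries, the second term is orders of magnitude smaller than the first, which mirrors the argument made in the text that follows.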
Proposition 1 tells us that the variance of contains two parts. Later we will show that the second part, i.e., , is very small compared to the first term, , when analyzing real GWAS data. This is because the individuals in GWASs are almost independent samples and the elements of matrix are small. On the contrary, we assume the data to be standardized, so the diagonal elements of matrices H1 and H2 are always 1. This leads to
With this in mind, the following claim is approximately true.
In the next proposition, we define a N1 × N2 matrix and show that minimizes under some conditions. Based on the argument above, “almost” minimizes too.
Proposition 2. Assume two GWASs do not share samples. We define the following quantities.
-
(i)
Let be an arbitrarily given K-dimensional vector;
-
(ii)
Let be a symmetric matrix with element for ;
-
(iii)
Let be a vector such that ;
-
(iv)
Define .
Then, we have:
-
(1)
;
-
(2)
Let A be a matrix such that . Then, .
Proof:
(1)
Note that
Therefore, it is equivalent to show
In fact,
(2)
Let , then
First, we show that .
In the first part of this proof, we have shown that
Since
or equivalently,
Therefore,
This gives us
Thus, all that remains is to show that . Since H1 and H2 are positive definite, , such that and . Then,
Hence,
Proposition 2 tells us that given arbitrary , if such that , then is an unbiased estimator for . Furthermore, among all unbiased estimators with the form , has the minimum value of , hence “almost” the minimum variance .
Corollary 1. (without annotation stratification) We assume:
-
(i)
Samples from two GWASs do not overlap;
-
(ii)
The samples in each study are completely independent;
-
(iii)
True LD in both studies (i.e., and ) is known.
Consider all matrices A that satisfy
We define
Then, with has the lowest variance.
Proof:
Let A be a matrix that satisfies . The goal is to show that
Since the samples in each GWAS are completely independent, we have:
Therefore,
Let and . Then, by definition we have
Since we have
by proposition 2, we know that
Therefore,
Of note, is identical to the non-annotation-stratified covariance estimator we developed in previous sections (see section Special Cases). Although we initially defined for the purpose of simplifying calculation, corollary 1 tells us that actually enjoys some good theoretical properties. As we have emphasized before, matrix could greatly simplify the estimation procedure because (1) can be calculated from GWAS summary statistics and (2) depends only on LD. In corollary 1 we showed that if we want to keep the convenient property , then it is impossible to improve the variance of estimator by choosing another matrix . We note, however, additional variability may be introduced when we estimate LD using a reference panel in practice.
Similarly, we have a corollary for annotation-stratified covariance estimator.
Corollary 2. (with annotation stratification) We assume:
-
(i)
Samples from two GWASs do not overlap;
-
(ii)
The samples in each study are completely independent;
-
(iii)
The two LD matrices are known and identical (i.e., );
-
(iv)
SNPs in different functional annotations are not in LD.
Consider all matrix sets that satisfy
We define
Then, with has the lowest variance.
Proof:
Since the samples in each GWAS are completely independent, we have:
Therefore,
Given integer such that , let where
and where
Then, by definition we have
By condition (iii), it is straightforward to check
Therefore, by proposition 2 we know that
Since c is arbitrary, we have
Finally, since SNPs in different functional annotations are not in LD,
Therefore, the variance of the cth estimated covariance component:
It is straightforward to check that is identical to the annotation-stratified covariance estimator we developed in previous sections. In corollary 2, we showed that under some reasonable conditions, the annotation-stratified covariance estimator also has the minimal variance property. However, since we assumed linkage equilibrium for SNPs in different functional annotations, this result does not apply to overlapping annotations.
In this section, we have shown some theoretical properties of our covariance estimator. However, the claim
is critical in our argument. Moreover, each proposition and corollary has its assumptions, which may or may not hold in practice. Therefore, we provide numerical justifications to our claims.
Numerical Study 1. Compare and
Simulation workflow:
Step 1. We simulate a matrix whose elements are independently sampled from standard normal distribution .
Step 2. We simulate the matrix
by fixing the diagonal elements to be 1 and sampling the non-diagonal elements from uniform distribution . Sample pairs with a genetic relatedness coefficient greater than 0.05 are often removed from GWAS analysis. Therefore, the matrix we simulate here closely mimics the phenotypic covariance matrices we see in real studies.
Step 3. We sample 10,000 independent vectors from the distribution
Step 4. We calculate and record , , and the sample variance of .
Step 5. Repeat steps 1–4 100 times.
From the simulations, we can see that closely approximates the sample variance of , while is a negligible term (Figure S9). The median log10 fold change, i.e., , is 3.12. Therefore, is typically around 1,300 times greater than in our simulation, which is consistent with our claim.
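A simulation of this kind can be sketched in a few lines of NumPy. The sketch below is a toy analogue, not the study itself: it assumes the quantity of interest is a bilinear form y1ᵀAy2 with y1 and y2 drawn independently from zero-mean normal distributions with covariance matrices H1 and H2, and it checks the standard identity Var(y1ᵀAy2) = tr(A H2 Aᵀ H1) for that case. The dimension, the number of draws, the uniform bound of 0.02, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30              # toy dimension (assumption; far smaller than a GWAS sample)
n_draws = 100_000   # Monte Carlo replicates (assumption)


def random_covariance(n, upper, rng):
    """Symmetric matrix with unit diagonal and small positive off-diagonals.

    With upper * (n - 1) < 1 the matrix is strictly diagonally dominant,
    hence positive definite, so a Cholesky factor exists.
    """
    off = rng.uniform(0.0, upper, size=(n, n))
    h = np.triu(off, k=1)
    h = h + h.T
    np.fill_diagonal(h, 1.0)
    return h


A = rng.standard_normal((n, n))        # arbitrary matrix with N(0,1) entries
H1 = random_covariance(n, 0.02, rng)   # illustrative covariance of y1
H2 = random_covariance(n, 0.02, rng)   # illustrative covariance of y2

# Draw y1 ~ N(0, H1) and y2 ~ N(0, H2) independently via Cholesky factors.
L1 = np.linalg.cholesky(H1)
L2 = np.linalg.cholesky(H2)
Y1 = rng.standard_normal((n_draws, n)) @ L1.T
Y2 = rng.standard_normal((n_draws, n)) @ L2.T

# Bilinear form y1^T A y2, evaluated for every draw at once.
q = np.einsum("ij,ij->i", Y1 @ A, Y2)

empirical_var = q.var(ddof=1)
theoretical_var = np.trace(A @ H2 @ A.T @ H1)
print(empirical_var, theoretical_var)
```

The two printed values should agree up to Monte Carlo error. Extending the sketch to correlated y1 and y2 would add a cross-covariance term to the variance, which is not modeled here.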
Numerical Study 2. Compare and
Simulation workflow:
Step 1. Randomly divide 15,918 samples from the Wellcome Trust Case Control Consortium (WTCCC) dataset into two subgroups (each with 7,959 samples and m = 254,221 SNPs after quality control). We simulate 100 independent sets of continuous traits y1 and y2 using real genotype data from WTCCC and the following covariance structure on heritability and genetic covariance:
Step 2. We simulate a 7,959 × 7,959 matrix A whose elements are independently sampled from standard normal distribution N(0,1). We also simulate a matrix A′ of the same size by permuting elements of . Then we rescale matrices A and A′ so that
or equivalently,
This makes all three matrices comparable.
Step 3. Calculate and for all 100 independent sets. Record the sample variance for each quantity.
Step 4. Repeat steps 2–3 100 times and get a distribution for and . Compare them with the sample variance of .
The results are consistent with our previous conclusions. Both and are consistently and substantially greater than . In fact, the variances are not on the same scale. Median is 8.4 × 10^7 times greater than and median is also 2.5 × 10^7 times greater (Figure S10). These results suggest that matrix indeed enjoys the minimal variance property when applied to real genetic data.
Estimate Variance via Block-wise Jackknife
In the previous section, we showed that if two traits follow multivariate normal distributions, then . In fact, we could get similar results for covariance, too.
Therefore, the variance-covariance matrix of can be calculated accordingly:
However, it is difficult to calculate . Estimating H1 and H2 would involve additional assumptions on the heritability structure. Even if we could accurately estimate H1 and H2, cannot be calculated using standard GWAS summary statistics. Therefore, following Bulik-Sullivan et al.,7 we apply a block-wise jackknife approach to estimate the variance.
First, we estimate the variance-covariance matrix of
We divide the genome into b (e.g., b = 200) blocks B1, …, Bb. Let
Here, subscript indicates the subset of SNPs in both functional annotation Si and block Bt. Therefore, is the re-calculated vi after removing all SNPs in block Bt from the analysis. Then, Cov(v) is estimated as:
Therefore, we get
If annotations overlap,
Finally, the test statistic for each covariance parameter is
When annotations overlap,
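The block-wise jackknife described in this section can be illustrated with a minimal sketch. It uses the standard delete-one-block jackknife, in which the covariance is (b − 1)/b times the sum of outer products of the centered leave-one-block-out estimates; whether the method applies exactly this scaling is an assumption here. The function name `block_jackknife`, its arguments, and the toy data are hypothetical, and the toy statistic (a mean of per-SNP values) merely stands in for the re-calculated quantities.

```python
import numpy as np


def block_jackknife(full_estimate, delete_block_estimates):
    """Delete-one-block jackknife covariance matrix and z-scores.

    full_estimate: shape (k,), the estimate computed from all SNPs.
    delete_block_estimates: shape (b, k); row t holds the estimate
        re-calculated after removing every SNP in block t.
    """
    full = np.atleast_1d(np.asarray(full_estimate, dtype=float))
    loo = np.asarray(delete_block_estimates, dtype=float)
    if loo.ndim == 1:            # a single parameter was passed as a flat vector
        loo = loo[:, None]
    b = loo.shape[0]
    centered = loo - loo.mean(axis=0)
    cov = (b - 1) / b * (centered.T @ centered)
    se = np.sqrt(np.diag(cov))
    return cov, full / se        # z-score = estimate / jackknife standard error


# Toy data (hypothetical): b = 200 blocks of 50 "SNPs" each.
rng = np.random.default_rng(1)
b, snps_per_block = 200, 50
values = rng.normal(loc=0.02, scale=1.0, size=(b, snps_per_block))

full = values.mean()
block_sums = values.sum(axis=1)
# Leave-one-block-out means: drop each block's SNPs and average the rest.
loo = (values.sum() - block_sums) / (values.size - snps_per_block)

cov, z = block_jackknife(full, loo)
print(cov, z)
```

For a simple mean with equal-sized blocks, the jackknife variance reduces to the sample variance of the block means divided by b, which gives a quick sanity check on the implementation.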
Genetic Correlation
In non-stratified analysis, we could provide the genetic correlation estimate as follows:
We use the estimator proposed in Bulik-Sullivan16 to estimate heritability for each trait.
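As a minimal sketch, the genetic correlation is the genetic covariance divided by the square root of the product of the two heritabilities. The function and argument names below are hypothetical, and returning NaN for a non-positive heritability estimate is a choice made for this sketch rather than a documented behavior of the method.

```python
import math


def genetic_correlation(genetic_cov, h2_trait1, h2_trait2):
    """Genetic correlation from a covariance and two heritability estimates."""
    # A non-positive heritability estimate makes the ratio undefined.
    if h2_trait1 <= 0 or h2_trait2 <= 0:
        return float("nan")
    return genetic_cov / math.sqrt(h2_trait1 * h2_trait2)


# Illustrative inputs only, not values from this study.
print(genetic_correlation(0.1, 0.25, 0.16))
```

Because the denominator relies on two separately estimated heritabilities, noise in either estimate propagates directly into the ratio.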
Compared to genetic covariance, genetic correlation is a more interpretable metric. It is also robust against certain systematic biases that affect both genetic covariance and heritability (e.g., genomic control correction). However, statistical inference based on genetic covariance is equivalent to that based on genetic correlation. Moreover, estimating heritability requires additional model assumptions on the heritability structure and introduces additional variability into the estimation framework. Therefore, although we report the point estimate for genetic correlation, the statistical inference in our method is based entirely on genetic covariance.
In annotation-stratified analysis, the heritability in each annotation category may be small. This is especially true when applying annotations related to the repressed genome. Although methods for estimating annotation-stratified heritability have been proposed,11, 12 they may provide unstable, sometimes even negative, heritability estimates. Therefore, we focus on genetic covariance only when performing annotation-stratified analysis.
Web Resources
Supplemental Data
References
- 1. Yang J., Benyamin B., McEvoy B.P., Gordon S., Henders A.K., Nyholt D.R., Madden P.A., Heath A.C., Martin N.G., Montgomery G.W. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 2010;42:565–569. doi: 10.1038/ng.608.
- 2. Yang J., Lee S.H., Goddard M.E., Visscher P.M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 2011;88:76–82. doi: 10.1016/j.ajhg.2010.11.011.
- 3. Yang J., Manolio T.A., Pasquale L.R., Boerwinkle E., Caporaso N., Cunningham J.M., de Andrade M., Feenstra B., Feingold E., Hayes M.G. Genome partitioning of genetic variation for complex traits using common SNPs. Nat. Genet. 2011;43:519–525. doi: 10.1038/ng.823.
- 4. Lee S.H., Yang J., Goddard M.E., Visscher P.M., Wray N.R. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics. 2012;28:2540–2542. doi: 10.1093/bioinformatics/bts474.
- 5. Vattikuti S., Guo J., Chow C.C. Heritability and genetic correlations explained by common SNPs for metabolic syndrome traits. PLoS Genet. 2012;8:e1002637. doi: 10.1371/journal.pgen.1002637.
- 6. Lee S.H., Ripke S., Neale B.M., Faraone S.V., Purcell S.M., Perlis R.H., Mowry B.J., Thapar A., Goddard M.E., Witte J.S., Cross-Disorder Group of the Psychiatric Genomics Consortium. International Inflammatory Bowel Disease Genetics Consortium (IIBDGC) Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat. Genet. 2013;45:984–994. doi: 10.1038/ng.2711.
- 7. Bulik-Sullivan B., Finucane H.K., Anttila V., Gusev A., Day F.R., Loh P.-R., Duncan L., Perry J.R., Patterson N., Robinson E.B., ReproGen Consortium. Psychiatric Genomics Consortium. Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3 An atlas of genetic correlations across human diseases and traits. Nat. Genet. 2015;47:1236–1241. doi: 10.1038/ng.3406.
- 8. Pasaniuc B., Price A.L. Dissecting the genetics of complex traits using summary association statistics. Nat. Rev. Genet. 2017;18:117–127. doi: 10.1038/nrg.2016.142.
- 9. Anttila V., Bulik-Sullivan B., Finucane H.K., Bras J., Duncan L., Escott-Price V., Falcone G., Gormley P., Malik R., Patsopoulos N. Analysis of shared heritability in common disorders of the brain. bioRxiv. 2016 doi: 10.1126/science.aap8757.
- 10. Zheng J., Erzurumluoglu A.M., Elsworth B.L., Kemp J.P., Howe L., Haycock P.C., Hemani G., Tansey K., Laurin C., Pourcain B.S. LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics. 2016;33:272–279. doi: 10.1093/bioinformatics/btw613.
- 11. Zhou X. A unified framework for variance component estimation with summary statistics in genome-wide association studies. bioRxiv. 2016 doi: 10.1214/17-AOAS1052.
- 12. Finucane H.K., Bulik-Sullivan B., Gusev A., Trynka G., Reshef Y., Loh P.-R., Anttila V., Xu H., Zang C., Farh K., ReproGen Consortium. Schizophrenia Working Group of the Psychiatric Genomics Consortium. RACI Consortium Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 2015;47:1228–1235. doi: 10.1038/ng.3404.
- 13. Gusev A., Lee S.H., Trynka G., Finucane H., Vilhjálmsson B.J., Xu H., Zang C., Ripke S., Bulik-Sullivan B., Stahl E., Schizophrenia Working Group of the Psychiatric Genomics Consortium. SWE-SCZ Consortium Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet. 2014;95:535–552. doi: 10.1016/j.ajhg.2014.10.004.
- 14. Lu Q., Powles R.L., Abdallah S., Ou D., Wang Q., Hu Y., Lu Y., Liu W., Li B., Mukherjee S. Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer’s disease. PLoS Genet. 2017;13:e1006933. doi: 10.1371/journal.pgen.1006933.
- 15. Abecasis G.R., Auton A., Brooks L.D., DePristo M.A., Durbin R.M., Handsaker R.E., Kang H.M., Marth G.T., McVean G.A., 1000 Genomes Project Consortium An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632.
- 16. Bulik-Sullivan B. Relationship between LD Score and Haseman-Elston Regression. bioRxiv. 2015.
- 17. Cichonska A., Rousu J., Marttinen P., Kangas A.J., Soininen P., Lehtimäki T., Raitakari O.T., Järvelin M.R., Salomaa V., Ala-Korpela M. metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis. Bioinformatics. 2016;32:1981–1989. doi: 10.1093/bioinformatics/btw052.
- 18. Zheng J., Richardson T., Millard L., Hemani G., Raistrick C., Vilhjalmsson B., Haycock P., Gaunt T. PhenoSpD: an integrated toolkit for phenotypic correlation estimation and multiple testing correction using GWAS summary statistics. bioRxiv. 2017 doi: 10.1093/gigascience/giy090.
- 19. van Rheenen W., Shatunov A., Dekker A.M., McLaughlin R.L., Diekstra F.P., Pulit S.L., van der Spek R.A., Võsa U., de Jong S., Robinson M.R., PARALS Registry. SLALOM Group. SLAP Registry. FALS Sequencing Consortium. SLAGEN Consortium. NNIPPS Study Group Genome-wide association analyses identify new risk variants and the genetic architecture of amyotrophic lateral sclerosis. Nat. Genet. 2016;48:1043–1048. doi: 10.1038/ng.3622.
- 20. Lu Q., Hu Y., Sun J., Cheng Y., Cheung K.-H., Zhao H. A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data. Sci. Rep. 2015;5:10576. doi: 10.1038/srep10576.
- 21. Lu Q., Powles R.L., Wang Q., He B.J., Zhao H. Integrative tissue-specific functional annotations in the human genome provide novel insights on many complex traits and improve signal prioritization in genome wide association studies. PLoS Genet. 2016;12:e1005947. doi: 10.1371/journal.pgen.1005947.
- 22. Bernstein B.E., Birney E., Dunham I., Green E.D., Gunter C., Snyder M., ENCODE Project Consortium An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
- 23. Kundaje A., Meuleman W., Ernst J., Bilenky M., Yen A., Heravi-Moussavi A., Kheradpour P., Zhang Z., Wang J., Ziller M.J., Roadmap Epigenomics Consortium Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330. doi: 10.1038/nature14248.
- 24. Farooqi I.S. Defining the neural basis of appetite and obesity: from genes to behaviour. Clin. Med. (Lond.) 2014;14:286–289. doi: 10.7861/clinmedicine.14-3-286.
- 25. Manning A.K., Hivert M.-F., Scott R.A., Grimsby J.L., Bouatia-Naji N., Chen H., Rybin D., Liu C.-T., Bielak L.F., Prokopenko I., DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium. Multiple Tissue Human Expression Resource (MUTHER) Consortium A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance. Nat. Genet. 2012;44:659–669. doi: 10.1038/ng.2274.
- 26. Matesanz F., Potenciano V., Fedetz M., Ramos-Mozo P., Abad-Grau Mdel.M., Karaky M., Barrionuevo C., Izquierdo G., Ruiz-Peña J.L., García-Sánchez M.I. A functional variant that affects exon-skipping and protein expression of SP140 as genetic mechanism predisposing to multiple sclerosis. Hum. Mol. Genet. 2015;24:5619–5627. doi: 10.1093/hmg/ddv256.
- 27. Sawcer S., Hellenthal G., Pirinen M., Spencer C.C., Patsopoulos N.A., Moutsianas L., Dilthey A., Su Z., Freeman C., Hunt S.E., International Multiple Sclerosis Genetics Consortium. Wellcome Trust Case Control Consortium 2 Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature. 2011;476:214–219. doi: 10.1038/nature10251.
- 28. Franke A., McGovern D.P., Barrett J.C., Wang K., Radford-Smith G.L., Ahmad T., Lees C.W., Balschun T., Lee J., Roberts R. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn’s disease susceptibility loci. Nat. Genet. 2010;42:1118–1125. doi: 10.1038/ng.717.
- 29. Marenholz I., Esparza-Gordillo J., Lee Y.-A. Shared genetic determinants between eczema and other immune-related diseases. Curr. Opin. Allergy Clin. Immunol. 2013;13:478–486. doi: 10.1097/ACI.0b013e328364e8f7.
- 30. Reppe S., Wang Y., Thompson W.K., McEvoy L.K., Schork A.J., Zuber V., LeBlanc M., Bettella F., Mills I.G., Desikan R.S., GEFOS Consortium Genetic sharing with cardiovascular disease risk factors and diabetes reveals novel bone mineral density loci. PLoS ONE. 2015;10:e0144531. doi: 10.1371/journal.pone.0144531.
- 31. Alnaes R., Torgersen S. Personality and personality disorders predict development and relapses of major depression. Acta Psychiatr. Scand. 1997;95:336–342. doi: 10.1111/j.1600-0447.1997.tb09641.x.
- 32. Kendler K.S., Neale M.C., Kessler R.C., Heath A.C., Eaves L.J. A longitudinal twin study of personality and major depression in women. Arch. Gen. Psychiatry. 1993;50:853–862. doi: 10.1001/archpsyc.1993.01820230023002.
- 33. de Moor M.H., van den Berg S.M., Verweij K.J., Krueger R.F., Luciano M., Arias Vasquez A., Matteson L.K., Derringer J., Esko T., Amin N., Genetics of Personality Consortium Meta-analysis of genome-wide association studies for neuroticism, and the polygenic association with major depressive disorder. JAMA Psychiatry. 2015;72:642–650. doi: 10.1001/jamapsychiatry.2015.0554.
- 34. Gale C.R., Hagenaars S.P., Davies G., Hill W.D., Liewald D.C., Cullen B., Penninx B.W., Boomsma D.I., Pell J., McIntosh A.M., International Consortium for Blood Pressure GWAS, CHARGE Consortium Aging and Longevity Group Pleiotropy between neuroticism and physical and mental health: findings from 108 038 men and women in UK Biobank. Transl. Psychiatry. 2016;6:e791. doi: 10.1038/tp.2016.56.
- 35. Sun H.-L., Pei D., Lue K.-H., Chen Y.-L. Uric acid levels can predict metabolic syndrome and hypertension in adolescents: a 10-year longitudinal study. PLoS ONE. 2015;10:e0143786. doi: 10.1371/journal.pone.0143786.
- 36. Chrabot B.S., Kariuki S.N., Zervou M.I., Feng X., Arrington J., Jolly M., Boumpas D.T., Reder A.T., Goulielmos G.N., Niewold T.B. Genetic variation near IRF8 is associated with serologic and cytokine profiles in systemic lupus erythematosus and multiple sclerosis. Genes Immun. 2013;14:471–478. doi: 10.1038/gene.2013.42.
- 37. Jung J.Y., Kohane I.S., Wall D.P. Identification of autoimmune gene signatures in autism. Transl. Psychiatry. 2011;1:e63. doi: 10.1038/tp.2011.62.
- 38. Guloksuz S.A., Abali O., Aktas Cetin E., Bilgic Gazioglu S., Deniz G., Yildirim A., Kawikova I., Guloksuz S., Leckman J.F. Elevated plasma concentrations of S100 calcium-binding protein B and tumor necrosis factor alpha in children with autism spectrum disorders. Rev. Bras. Psiquiatr. 2017;39:195–200. doi: 10.1590/1516-4446-2015-1843.
- 39. van Oosten B.W., Barkhof F., Truyen L., Boringa J.B., Bertelsmann F.W., von Blomberg B.M., Woody J.N., Hartung H.-P., Polman C.H. Increased MRI activity and immune activation in two multiple sclerosis patients treated with the monoclonal anti-tumor necrosis factor antibody cA2. Neurology. 1996;47:1531–1534. doi: 10.1212/wnl.47.6.1531.
- 40. Aiba Y., Yamazaki K., Nishida N., Kawashima M., Hitomi Y., Nakamura H., Komori A., Fuyuno Y., Takahashi A., Kawaguchi T. Disease susceptibility genes shared by primary biliary cirrhosis and Crohn’s disease in the Japanese population. J. Hum. Genet. 2015;60:525–531. doi: 10.1038/jhg.2015.59.
- 41. Lambert J.-C., Ibrahim-Verbaas C.A., Harold D., Naj A.C., Sims R., Bellenguez C., DeStafano A.L., Bis J.C., Beecham G.W., Grenier-Boley B., European Alzheimer’s Disease Initiative (EADI) Genetic and Environmental Risk in Alzheimer’s Disease. Alzheimer’s Disease Genetic Consortium. Cohorts for Heart and Aging Research in Genomic Epidemiology Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 2013;45:1452–1458. doi: 10.1038/ng.2802.
- 42. Haines J.L., Ter-Minassian M., Bazyk A., Gusella J.F., Kim D.J., Terwedow H., Pericak-Vance M.A., Rimmler J.B., Haynes C.S., Roses A.D., The Multiple Sclerosis Genetics Group A complete genomic screen for multiple sclerosis underscores a role for the major histocompatability complex. Nat. Genet. 1996;13:469–471. doi: 10.1038/ng0896-469.
- 43. Bertram L., Tanzi R.E. The genetic epidemiology of neurodegenerative disease. J. Clin. Invest. 2005;115:1449–1457. doi: 10.1172/JCI24761.
- 44. Nuytemans K., Maldonado L., Ali A., John-Williams K., Beecham G.W., Martin E., Scott W.K., Vance J.M. Overlap between Parkinson disease and Alzheimer disease in ABCA7 functional variants. Neurol. Genet. 2016;2:e44. doi: 10.1212/NXG.0000000000000044.
- 45. Gagliano S.A., Pouget J.G., Hardy J., Knight J., Barnes M.R., Ryten M., Weale M.E. Genomics implicates adaptive and innate immunity in Alzheimer’s and Parkinson’s diseases. Ann. Clin. Transl. Neurol. 2016;3:924–933. doi: 10.1002/acn3.369.
- 46. Reginato A.M., Mount D.B., Yang I., Choi H.K. The genetics of hyperuricaemia and gout. Nat. Rev. Rheumatol. 2012;8:610–621. doi: 10.1038/nrrheum.2012.144.
- 47. Culleton B.F., Larson M.G., Kannel W.B., Levy D. Serum uric acid and risk for cardiovascular disease and death: the Framingham Heart Study. Ann. Intern. Med. 1999;131:7–13. doi: 10.7326/0003-4819-131-1-199907060-00003.
- 48. Kanbay M., Jensen T., Solak Y., Le M., Roncal-Jimenez C., Rivard C., Lanaspa M.A., Nakagawa T., Johnson R.J. Uric acid in metabolic syndrome: From an innocent bystander to a central player. Eur. J. Intern. Med. 2016;29:3–8. doi: 10.1016/j.ejim.2015.11.026.
- 49. Watanabe S., Kang D.-H., Feng L., Nakagawa T., Kanellis J., Lan H., Mazzali M., Johnson R.J. Uric acid, hominoid evolution, and the pathogenesis of salt-sensitivity. Hypertension. 2002;40:355–360. doi: 10.1161/01.hyp.0000028589.66335.aa.
- 50. Rao G.N., Corson M.A., Berk B.C. Uric acid stimulates vascular smooth muscle cell proliferation by increasing platelet-derived growth factor A-chain expression. J. Biol. Chem. 1991;266:8604–8608.
- 51. Yu M.-A., Sánchez-Lozada L.G., Johnson R.J., Kang D.-H. Oxidative stress with an activation of the renin-angiotensin system in human vascular endothelial cells as a novel mechanism of uric acid-induced endothelial dysfunction. J. Hypertens. 2010;28:1234–1242.
- 52. Sautin Y.Y., Nakagawa T., Zharikov S., Johnson R.J. Adverse effects of the classic antioxidant uric acid in adipocytes: NADPH oxidase-mediated oxidative/nitrosative stress. Am. J. Physiol. Cell Physiol. 2007;293:C584–C596. doi: 10.1152/ajpcell.00600.2006.
- 53. Krishnan E., Pandya B.J., Chung L., Hariri A., Dabbous O. Hyperuricemia in young adults and risk of insulin resistance, prediabetes, and diabetes: a 15-year follow-up study. Am. J. Epidemiol. 2012;176:108–116. doi: 10.1093/aje/kws002.
- 54. Jang W.C., Nam Y.H., Ahn Y.C., Park S.M., Yoon I.K., Choe J.-Y., Park S.-H., Her M., Kim S.-K. G109T polymorphism of SLC22A12 gene is associated with serum uric acid level, but not with metabolic syndrome. Rheumatol. Int. 2012;32:2257–2263. doi: 10.1007/s00296-011-1952-5.
- 55. Pfister R., Barnes D., Luben R., Forouhi N.G., Bochud M., Khaw K.-T., Wareham N.J., Langenberg C. No evidence for a causal link between uric acid and type 2 diabetes: a Mendelian randomisation approach. Diabetologia. 2011;54:2561–2569. doi: 10.1007/s00125-011-2235-0.
- 56. Yang Q., Köttgen A., Dehghan A., Smith A.V., Glazer N.L., Chen M.-H., Chasman D.I., Aspelund T., Eiriksdottir G., Harris T.B. Multiple genetic loci influence serum urate levels and their relationship with gout and cardiovascular disease risk factors. Circ Cardiovasc Genet. 2010;3:523–530. doi: 10.1161/CIRCGENETICS.109.934455.
- 57. Bhattacharjee S., Rajaraman P., Jacobs K.B., Wheeler W.A., Melin B.S., Hartge P., Yeager M., Chung C.C., Chanock S.J., Chatterjee N., GliomaScan Consortium A subset-based approach improves power and interpretation for the combined analysis of genetic association studies of heterogeneous traits. Am. J. Hum. Genet. 2012;90:821–835. doi: 10.1016/j.ajhg.2012.03.015.
- 58. Majumdar A., Haldar T., Bhattacharya S., Witte J. An efficient Bayesian meta-analysis approach for studying cross-phenotype genetic associations. bioRxiv. 2017 doi: 10.1371/journal.pgen.1007139.
- 59. Pickrell J.K., Berisa T., Liu J.Z., Ségurel L., Tung J.Y., Hinds D.A. Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 2016;48:709–717. doi: 10.1038/ng.3570.
- 60. Han B., Pouget J.G., Slowikowski K., Stahl E., Lee C.H., Diogo D., Hu X., Park Y.R., Kim E., Gregersen P.K., Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium A method to decipher pleiotropy by detecting underlying heterogeneity driven by hidden subgroups applied to autoimmune and neuropsychiatric diseases. Nat. Genet. 2016;48:803–810. doi: 10.1038/ng.3572.
- 61. Shi H., Kichaev G., Pasaniuc B. Contrasting the genetic architecture of 30 complex traits from summary association data. Am. J. Hum. Genet. 2016;99:139–153. doi: 10.1016/j.ajhg.2016.05.013.
- 62. Shi H., Mancuso N., Spendlove S., Pasaniuc B. Local genetic correlation gives insights into the shared genetic architecture of complex traits. bioRxiv. 2016 doi: 10.1016/j.ajhg.2017.09.022.
- 63. Rao C.R. Estimation of variance and covariance components in linear models. J. Am. Stat. Assoc. 1972;67:112–115.