Skip to main content
PLOS Genetics logoLink to PLOS Genetics
. 2018 Oct 5;14(10):e1007549. doi: 10.1371/journal.pgen.1007549

Heritability informed power optimization (HIPO) leads to enhanced detection of genetic associations across multiple traits

Guanghao Qi 1, Nilanjan Chatterjee 1,2,*
Editor: Michael P Epstein3
PMCID: PMC6192650  PMID: 30289880

Abstract

Genome-wide association studies have shown that pleiotropy is a common phenomenon that can potentially be exploited for enhanced detection of susceptibility loci. We propose heritability informed power optimization (HIPO) for conducting powerful pleiotropic analysis using summary-level association statistics. We find optimal linear combinations of association coefficients across traits that are expected to maximize non-centrality parameter for the underlying test statistics, taking into account estimates of heritability, sample size variations and overlaps across the traits. Simulation studies show that the proposed method has correct type I error, robust to population stratification and leads to desired genome-wide enrichment of association signals. Application of the proposed method to publicly available data for three groups of genetically related traits, lipids (N = 188,577), psychiatric diseases (Ncase = 33,332, Ncontrol = 27,888) and social science traits (N ranging between 161,460 to 298,420 across individual traits) increased the number of genome-wide significant loci by 12%, 200% and 50%, respectively, compared to those found by analysis of individual traits. Evidence of replication is present for many of these loci in subsequent larger studies for individual traits. HIPO can potentially be extended to high-dimensional phenotypes as a way of dimension reduction to maximize power for subsequent genetic association testing.

Author summary

Pleiotropy is a common phenomenon in genetics that one genetic variant has effects on multiple traits. The shared genetic information across correlated traits can potentially be exploited for enhanced detection of susceptibility loci. Most existing multi-trait methods borrow information across phenotypes but not across SNPs, which can be inefficient for traits that have major overlap. We propose a method that borrows information both across traits and across SNPs to conduct powerful association analysis using summary-level data. Simulations show that the method has correct type-I error rate and substantial increase in power. Application to blood lipids, psychiatric diseases and social science traits identified plenty of new loci that cannot be detected by individual trait analysis. Our method can potentially be extended to high-dimensional phenotypes as a dimension reduction technique.

Introduction

Genome-wide association studies of increasingly large sample sizes are continuing to inform genetic basis of complex diseases. These studies have now led to identification of scores of susceptibility SNPs underlying a vast variety of individual complex traits and diseases [13]. Moreover, analyses of heritability and effect-size distributions have shown that each trait is likely to be associated with thousands to tens of thousands of additional susceptibility variants, each of which individually has very small effects, but in combination they can explain substantial fraction of trait variation [415]. GWAS of increasing sample sizes as well as re-analysis of current studies with powerful statistical methods are expected to lead to identification of many of these additional variants.

An approach to increase the power of existing GWAS is to borrow strength across related traits. Comparisons of GWAS discoveries across traits have clearly shown that pleiotropy is a common phenomenon [3, 14, 1619]. Aggregated analysis of multiple related traits have led to identification of novel SNPs that could not be detected through analysis of individual traits alone [2023]. Further, analysis of genetic correlation using genome-wide panel of SNPs have identified groups of traits that are likely to share many underlying genetic variants of small effects [10, 12, 14, 24, 25]. As summary-level association statistics from large GWAS are now increasingly accessible, there is a great opportunity to accelerate discoveries through novel cross-trait analysis of these datasets.

A variety of methods have been developed in the past decade to increase power of GWAS analysis by combining information across multiple traits [2637]. Many of these methods have focused on developing test-statistics that are likely to have optimal power for detecting an individual SNP under certain types of alternatives of its shared effects across multiple traits [26, 30, 31, 35, 38, 39]. These approaches do not borrow information across SNPs and may be inefficient for analysis of traits that are likely to have major overlap in their underlying genetic architecture. For the analysis of psychiatric diseases, for example, it has been shown that borrowing pleiotropic information across SNPs can be used to improve power of detection of individual SNP associations and genetic risk prediction [40, 41].

In this article, we propose a novel method for powerful aggregated association analysis for individual SNPs across groups of multiple, highly related, traits informed by genome-wide estimates of heritability and genetic covariance. We derive optimal test-statistics based on orthogonal linear combinations of association coefficients across traits–the directions are expected to maximize genome-wide averages of the underlying non-centrality parameters in a gradually decreasing order. We exploit recent developments in LD-score regression methodology [14, 42] for estimation of phenotypic and genotypic correlations for implementation of the method using only summary-level results from GWAS. Our method applies to traits measured on different samples with unknown overlap.

We evaluate performance of the proposed method through extensive simulation studies using a novel scheme for directly generating summary-level association statistics for large GWAS for multiple traits with possibly overlapping samples. We use the proposed method to analyze summary-statistics available from consortia of GWAS of lipid traits [43], psychiatric diseases [20] and social science traits [44]. These applications empirically illustrate that HIPO components can be highly enriched with association signals and can identify novel and replicable associations that are not identifiable at comparable level of significance based on analysis of the individual traits.

Material and methods

Model and assumptions

Suppose that the summary level results are available for K traits. For a given SNP j, let β^j and sj denote vectors of length K containing estimates of regression parameters and associated standard errors, respectively, for the K traits. Let M be the total number of SNPs under study. Throughout, we will assume both genotypes and phenotypes are standardized to have mean 0 and variance 1. Let Nk denote the sample size for GWAS for the k-th trait. For binary traits, Nk is the effective sample size Ncase,kNcontrol,kNcase,k+Ncontrol,k. We assume Nk can vary across studies because traits may be measured on distinct, but potentially overlapping, samples (see Section C in S1 Appendix for discussion of the case where sample size varies across SNPs within the same study). We assume that summary-level statistics in GWAS are obtained based on one SNP at a time analysis and that β^j|βj follows a multivariate normal distribution: N(βj,Σβ^j), where βj = (βj1, …, βjK)T is referred to as the “marginal” effect sizes, the coefficients that will be obtained by fitting single-SNP regression models across the individual traits in the underlying population. The variance-covariance matric Σβ^j, which may include non-zero covariance terms when the studies have overlapping samples, will be estimated based on estimates of standard errors of the individual coefficients (sj) and estimate of “phenotypic correlation” that could be obtained based on LD-score regression.

Power optimization

Power has a one-to-one correspondence with the non-centrality parameter (NCP, denoted by δ) of the underlying χ2-statistic. Therefore, we try to find the linear combination cTβ^ that maximizes the average NCP across SNPs (denoted by E[δ]), which is given by

E[δ]=E[(cTβ)2]var(cTβ^). (1)

Here c is the vector of weights for a HIPO component associated with individual traits; we drop the subscript j from β^j and βj and use β^ and β for simple notations. The denominator is easy to simplify: var(cTβ^)=cTΣβ^c, which does not depend on true value of β. We derive an expression of the numerator based on commonly used random effect models that are used to characterize genetic variance-covariances.

Let βj(J)=(βj1(J),,βjK(J))T denote the vector of “joint” effect sizes associated with SNP j that could be obtained by simultaneous analysis of SNPs in multivariate models across the K individual traits. We assume that βj(J) follows a multivariate normal distribution N(0,ΣgM), where Σg is the genetic covariance matrix. It follows that βj, the vector of marginal regression coefficients, is also normally distributed with mean 0 and E[βjTβj|lj]=ljΣgM, where lj=j=1Mrjj2 is the LD score. Here rjj is the correlation of genotypes between SNP j and j′.

Thus, based on the above model, the numerator of (1) can be written as E[(cTβ]2]=cTE[E[ββT|l]]c=E[l]McTΣgc. Therefore, we have

E[δ]=E[l]McTΣgccTΣβ^c.

The matrix Σβ^ needs to take into account the sample size differences and overlaps across studies. When all the phenotypes are measured on the same set of people, Σβ^ is proportional to the phenotypic variance-covariance matrix and E[δ] reduces to maximizing the heritability of the combined traits cTy (MaxH) [34]. Here y = (y1, …, yK) is the vector of phenotypes. But HIPO is more general and can be applied to traits measured on different samples with unknown overlap. The LD-score regression allows estimation of both Σg and Σβ^ based on underlying slope and intercept parameters, respectively, using GWAS summary-level statistics (Section A in S1 Appendix) [14, 42].

The first HIPO component c1 is given by solving the following optimization problem:

maxccTΣg^csubjecttocTΣβ^^c=1.

Subsequent components ck are defined iteratively by solving a slightly different optimization problem

maxccTΣg^csubjecttocTΣβ^^c=1andcTΣβ^^cl=0(l=1,2,,k-1).

The above procedure can be implemented by suitable eigen decomposition, resulting in a total of K HIPO components (Section B in S1 Appendix). We call the first HIPO component HIPO-D1, the second HIPO component HIPO-D2, and so on. Interestingly, it can be shown that the eigenvalues resulting from this procedure are the average NCP for χ2 association-statistics across SNPs along the HIPO directions (up to the same scale constant, Section B in S1 Appendix). Ideally, it is adequate to consider the top HIPO components if a few eigenvalues clearly dominate the others.

For the kth HIPO component, the association for the SNP j is tested using Z-statistics in the form

zj,ck,=ckTβ^jckTΣβ^^ck.

It is easy to see that HIPO z-statistics reduce to the inverse standard error weighted z-scores when all traits have the same heritability, have genetic correlation 1 and, there is no sample overlap across studies. Therefore, HIPO can also be viewed as an extension of standard single-trait meta-analysis.

It can be expected from theory that the performance of HIPO, characterized by the increase of average NCP of HIPO components compared to individual traits, does not directly depend on the overlaps of causal SNPs across traits or the overlap of samples across studies. In fact, E[δ] only depends on the covariance matrices var(β) and var(β^). These two matrices can stay the same under different degrees of causal SNP overlap and sample overlap.

A closely related method is MTAG, which uses the summary level data of multiple traits to estimate single trait effects, based on genetic and phenotypic correlation across traits [36]. MTAG is also based on linear combinations of summary statistics but the weights are different from those obtained by HIPO. More specifically, MTAG solves the moment equation

E[β^j-ωkωkkβj,k]=0,

which gives the solution

β^MTAG,j,k=ωkTωkk(Ω-ωkωkTωkk+Σβ^)-1ωkTωkk(Ω-ωkωkTωkk+Σβ^)-1ωkωkkβ^j.

Here var(βj) = Ω, and ωk is a vector equal to the k-th column of Ω and ωkk is a scalar equal to the k-th diagonal element of Ω. The matrix Σβ^ is the same for all SNPs if both genotypes and phenotypes are standardized to have mean 0 and variance 1. We will compare HIPO with MTAG in simulations and real data analysis.

Simulations

We use a novel simulation method that directly generates summary level data for GWAS of multiple traits preserving realistic genotypic and phenotypic correlation structures. We proposed the single-trait version of this approach in a recent study [15]. We propose to simulate GWAS estimate for marginal effects across K traits, denoted as β^j=(β^j1,,β^jK)T, using a model of the form

β^j=βj+vj+ej,

where two types of errors terms, vj and ej, are introduced to account for variability due to population stratification effects and estimation uncertainty, respectively. We assume the population stratification effects vj s follow independent and identically distributed (i.i.d.) multivariate normal across SNPs. We generate the estimation error terms e~=(e1T,,eMT)T following a multivariate normal distribution that takes into account both phenotypic correlation across traits and linkage disequilibrium across SNPs. Note that there is widespread correlation between error terms, which can exist across different SNPs within the same study or for the same SNP across different studies. Correlation can also exist between different SNPs in different studies in the presence of LD, phenotypic correlation and sample overlap. All the possibilities can be captured by simulating from e~~N(0,RΣe) where the covariance matrix is the Kronecker product of the LD coefficient matrix R={rjj}j,j=1,,M and

Σe={NklNkNlcov(yk,yl)}k,l=1,2,K

where the (k, l) element involves sample sizes, the sample overlap Nkl and the phenotypic covariance between the kth and lth trait (Section D in S1 Appendix). We assume that the sample size is the same for all the SNPs within the same study.

We simulate βj by first randomly selecting ~12K causal SNPs out of a reference panel of ~1.2 million HapMap3 SNPs with MAF >5% in 1000 Genomes European population. This SNP list is downloaded from LD Hub [45]. For selected casual SNPs, we generate i.i.d. joint effect sizes βj(J) from a multivariate normal distribution N(0,Σg12,000), where Σg is the genetic covariance matrix. For simplicity we assume all the traits have the same set of causal SNPs. We calculate the marginal effect sizes βj as the sum of the joint effect size of SNPs in neighborhood Nj weighted by the LD coefficient, i.e. βj=jNjβj(J)rjj. The neighborhood Nj is defined to be set of SNPs that are within 1MB distance and have r2 > 0.01 with respect to SNP j.

For simulation of e~, we observe that in a GWAS study where the phenotypes have no association with any of the markers, the summary-level association statistics is expected to follow the same multivariate distribution as e~. We utilize individual level genotype data available from 489 European samples from the 1000 Genomes Project. For each of the 489 subjects, we simulate a vector of phenotype from a predetermined multivariate normal distribution without any reference to their genotypes. We then conducted standard one SNP at a time GWAS analysis for each trait to compute the association statistics β^j,1000G=(β^j1,1000G,,β^jK,1000G)T for the 1.2 million SNPs. To mimic the incomplete sample overlap between traits, we can calculate β^j1,1000G,…β^jK,1000G based on different subsamples of 1000 Genomes EUR, of size n1, …, nK. Finally, to generate error terms according to sample size specification for our simulation studies, we use the adjustment

ej=(n1N1β^j1,1000G,,nKNKβ^jK,1000G)T,

We show in Section D in S1 Appendix that this e~=(e1T,,eMT)T has the desired distribution.

We conduct simulation studies to validate HIPO-based association tests and investigate expected power gain under varying sample size and heritability. For simplicity, we first consider the scenarios where all traits are measured on the same set of subjects. To make the settings more realistic, we use two sets of genetic and phenotypic covariance matrices estimated from real data:

  1. Blood lipid traits: LDL, HDL, TG, TC (see next section for full name) Σg=hmax2(0.87-0.040.300.85-0.041.00-0.620.180.30-0.620.930.300.850.180.300.95),Σy=(1.00-0.100.210.86-0.101.00-0.360.120.21-0.361.000.320.860.120.321.00)

  2. Psychiatric diseases: ASD, BIP, SCZ (see next section for full name) Σg=hmax2(0.690.020.120.020.880.630.120.631.00),Σy=(1.000.010.000.011.000.010.000.011.00).

We choose only 3 psychiatric diseases instead of all 5 involved in real data analysis to speed up computation. ASD, BIP and SCZ have high heritability and substantial genetic correlation, which should be helpful to illustrate the property of the method. We vary the value of scale factor hmax2=0.1,0.2,0.35,0.5 to control heritability of the traits while preserving the genetic correlation structure. We also vary the sample size: N = 10K, 50K, 100K, 500K. The covariance matrix of vj is set to

7.35×10-8(10.50.50.50.510.50.50.50.510.50.50.50.51)and7.35×10-8(10.50.50.510.50.50.51)

in the first and second settings, respectively. This choice of parameters leads to an average per SNP population stratification that is about 25% of the per SNP heritability when hmax2=0.35. For each setting we repeat the simulation 100 times.

More extensive simulations are conducted to investigate the performance of HIPO under partial sample overlap across studies, and when different traits have different sets of causal SNPs (S1 Table). We also investigate the performance in higher dimensions by simulating 10 traits which are divided into two blocks of 5 traits, with higher correlation within blocks and lower correlation between blocks (S1 Table). In addition, we conduct simulations using UK Biobank data to study the type I error of HIPO under unbalanced case-control design (Section E.1 of S1 Appendix), as well as the relationship between the number of dominant HIPO components and underlying genetic mechanisms (Section E.2 in S1 Appendix).

Summary level data

We analyze publicly available GWAS summary-level results across three groups of traits measured on European ancestry samples using the proposed method. Global Lipids Genetics Consortium (GLGC) provides the GWAS results for levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides (TG) and total cholesterol (TC) [43]. The data consist of 188,577 European-ancestry individuals with ~1.8 million SNPs after implementing the LD Hub quality control procedure (described at the end of this section).

The Psychiatric Genomics Consortium (PGC) cross-disorder study analyzed data for 5 psychiatric disorders: autism spectrum disorder (ASD), attention deficit-hyperactivity disorder (ADHD), bipolar disorder (BIP), major depressive disorder (MDD) and schizophrenia (SCZ) [20, 4649]. Two of the five traits involved trio data: ASD (4788 trio cases, 4788 trio pseudocontrols, 161 cases, 526 controls, equivalent to 4949 cases and 5314 controls) and ADHD (1947 trio cases, 1947 trio pseudocontrols, 840 cases, 688 controls, equivalent to 2787 cases and 2635 controls). The other three studies did not involve trios: BIP (6990 cases, 4820 controls), MDD (9227 cases, 7383 controls) and SCZ (9379 cases, 7736 controls). After applying the same QC procedure, we included ~1.05 million SNPs for HIPO analysis.

The Social Science Genetic Association Consortium (SSGAC) provides summary statistics for depressive symptoms (DS, N = 161,460), neuroticism (NEU, N = 170,911) and subjective well-being (SWB, N = 298,420) [44]. The DS data are the meta-analysis results combining a study by the Psychiatric Genomics Consortium [48], the initial release of UK Biobank (UKB) [50] and the Resource for Genetic Epidemiology Research on Aging cohort (dbGap, phs000674.v1.p1). For neuroticism, the study pooled summary level data sets from UKB and Genetics Personality Consortium (GPC). The SWB data is the meta-analysis result from 59 cohorts [44]. All subjects are of European ancestry. We analyzed ~2.1 million SNPs after QC.

For all three groups of traits, we use the GWAS parameter estimates and standard errors to compute the z-statistics and p-values without making post-meta-analysis correction of genomic control factors. We perform SNP filtering to all three groups of phenotypes based on LD Hub quality control guideline. Markers that meet the following conditions are removed: (1) with extremely large effect size (χ2 > 80) (to avoid the results to be unduly influenced by outliers) (2) within the major histocompatibility complex (MHC) region (26Mb~34Mb on chromosome 6) (3) MAF less than 5% in 1000 Genomes Project Phase 3 European samples (4) sample size less than 0.67 times the 90th percentile of the total sample size (to account for SNP missingness) (5) alleles do not match the 1000 Genomes alleles. We further remove SNPs that are missing for at least one trait. The summary statistics are supplied to LDSC software [14, 42, 45] to fit LD score regression.

We defined a locus to be “novel” if it contains at least one SNP that reach genome-wide significance (p-value < 5×10−8) by the HIPO method and the lead SNP in the region is at least 0.5 Mb away and has r2 < 0.1 from all lead SNPs of genome-wide significance regions identified by individual trait analysis (Section F in S1 Appendix).

Results

Simulations

Simulation results show that all HIPO components maintain the correct type I error rate with or without population stratification, consistently across different sample sizes and values of heritability (Table 1 and S2S5 Tables), even in unbalanced case-control studies (S13 Table). One representative example is the case with covariance structure of blood lipids (Table 1). It can be seen that correct type I errors are maintained under three different significance levels: p<0.05, p<0.01 and p<0.001. Desired type I error rates are also maintained in simulation settings where causal SNPs across traits only overlapped partially and there are incomplete sample overlaps across studies (S3 Table). Similar patterns can be observed for the case with covariance structure of psychiatric diseases (S4 and S5 Tables). In the presence of population stratification, the degree of which is modest according to our simulation scheme, tests based on individual traits show somewhat inflated type I error under large sample size (e.g. 500K) (Table 1 and S4 Table).

Table 1. Type I error rates for HIPO observed in datasets simulated under covariance structure estimated from studies of blood lipids.

See S1 Table 1a and 1b for detailed settings. Summary-level association statistics are simulated for 4 traits using genetic and phenotypic covariance matrices estimated from Global Lipids Genetics Consortium (GLGC) data, with and without population stratification. The results for HIPO-D1 and the most heritable trait are listed. Reported are the average of genome-wide type I error rates across 100 simulations, under significance thresholds p<0.05, p<0.01 and p<0.001.

hmax2 p-value threshold 0.1 0.2 0.35 0.5 0.1 0.2 0.35 0.5
N
HIPO-D1 Without population stratification With population stratification
10K p<0.05 0.051 0.051 0.05 0.05 0.051 0.051 0.05 0.05
p<0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
p<0.001 0.0011 0.001 0.001 0.001 0.0011 0.001 0.001 0.001
50K p<0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05
p<0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
p<0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001
100K p<0.05 0.05 0.05 0.051 0.05 0.05 0.05 0.05 0.051
p<0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
p<0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001
500K p<0.05 0.05 0.051 0.05 0.052 0.05 0.051 0.051 0.052
p<0.01 0.01 0.01 0.01 0.011 0.01 0.01 0.01 0.011
p<0.001 0.001 0.0011 0.001 0.0011 0.001 0.001 0.0011 0.0011
Most heritable trait Without population stratification With population stratification
10K p<0.05 0.051 0.051 0.05 0.051 0.051 0.051 0.05 0.051
p<0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
p<0.001 0.0011 0.0011 0.0011 0.0011 0.0011 0.0011 0.001 0.0011
50K p<0.05 0.05 0.05 0.051 0.051 0.051 0.051 0.051 0.051
p<0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
p<0.001 0.0011 0.0011 0.0011 0.0011 0.0011 0.0011 0.0011 0.0011
100K p<0.05 0.051 0.051 0.051 0.051 0.052 0.051 0.051 0.051
p<0.01 0.01 0.01 0.01 0.01 0.011 0.011 0.011 0.011
p<0.001 0.0011 0.0011 0.0011 0.0011 0.0011 0.0011 0.0011 0.0011
500K p<0.05 0.05 0.051 0.051 0.051 0.055 0.055 0.055 0.055
p<0.01 0.01 0.01 0.01 0.01 0.012 0.012 0.012 0.012
p<0.001 0.0011 0.0011 0.0011 0.0011 0.0013 0.0013 0.0013 0.0013

hmax2 is the largest heritability among the individual traits.

Results also show that association analysis based on HIPO-D1 leads to substantial number of additional true discoveries compared to that based on the most heritable trait (S6 Table). In most settings, the value of average χ2-statistics are larger for HIPO-D1 than those for the individual traits and MTAG estimates (S7 and S8 Tables). After LD-pruning, HIPO components identify a substantial number of novel loci that are not discovered by individual trait analysis (Table 2, S9 and S10 Tables). For the case with covariance structure of blood lipids, the largest power increase can be observed when N = 50K and N = 100K, where HIPO can identify up to 206 new loci (Table 2). Similar patterns are followed in other settings (S9 and S10 Tables). Results also show that QQ plots for HIPO-D1 to be more enriched with signals than those for the most heritable trait (S1S4 Figs).

Table 2. Number of truly associated independent loci discovered by HIPO, MTAG and individual trait analysis observed in datasets simulated under the covariance structure estimated from studies of blood lipids.

See 1a-1d in S1 Table for detailed settings. We report the average number of truly associated loci identified by all the individual traits/HIPO components/MTAG estimates across 100 simulations, under significance threshold p < 5 × 10−8 and LD pruning threshold r2 < 0.1 and different loci required to be >0.5Mb apart.

hmax2 0.1 0.2 0.35 0.5 0.1 0.2 0.35 0.5
N
Same causal SNPs Without population stratification With population stratification
10K Individual traits 0 1 1 3 0 0 1 3
HIPO 0 0 1 2 0 0 1 2
MTAG 0 1 1 4 0 0 1 4
HIPO new 0 0 1 2 0 0 1 1
MTAG new 0 0 1 2 0 0 1 2
50K Individual traits 3 25 134 341 3 25 136 344
HIPO 2 24 129 317 2 23 128 318
MTAG 4 32 165 393 3 32 165 394
HIPO new 2 12 52 98 1 12 51 101
MTAG new 2 13 51 90 1 12 50 90
100K Individual traits 25 193 722 1299 26 201 727 1314
HIPO 23 185 679 1232 24 186 669 1229
MTAG 32 233 799 1382 33 234 791 1381
HIPO new 12 68 167 206 12 66 154 197
MTAG new 12 65 139 157 13 62 130 148
500K Individual traits 1303 2536 3173 3490 1352 2557 3186 3491
HIPO 1240 2464 3132 3471 1204 2425 3119 3454
MTAG 1394 2568 3175 3488 1358 2531 3164 3474
HIPO new 202 144 81 53 171 120 70 46
MTAG new 162 88 33 17 121 63 26 12
Partial causal SNP overlap Partial sample overlap
10K Individual traits 0 0 1 3 0 0 1 1
HIPO 0 0 1 3 0 0 0 1
MTAG 0 0 1 4 0 0 1 1
HIPO new 0 0 1 2 0 0 0 1
MTAG new 0 0 1 2 0 0 0 1
50K Individual traits 3 26 135 348 1 8 49 136
HIPO 2 21 122 312 1 8 50 139
MTAG 4 33 163 403 1 11 67 179
HIPO new 2 13 60 126 1 5 26 61
MTAG new 2 12 49 96 1 6 28 64
100K Individual traits 26 196 730 1336 8 73 334 719
HIPO 23 178 651 1219 8 73 332 715
MTAG 33 239 802 1412 12 98 413 845
HIPO new 14 84 195 256 5 36 122 199
MTAG new 13 68 136 153 6 38 118 184
500K Individual traits 1338 2601 3269 3588 725 1954 2799 3193
HIPO 1225 2523 3243 3583 721 1938 2772 3181
MTAG 1419 2632 3274 3589 854 2078 2846 3219
HIPO new 254 198 116 75 194 231 143 94
MTAG new 155 82 31 15 186 189 91 54

hmax2 is the largest heritability among the individual traits.

We compare HIPO and MTAG in all our simulation settings. Both MTAG and HIPO find many novel loci compared to individual trait analysis and the set of novel loci identified by HIPO and MTAG tend to have some overlaps, but each method identifies some unique ones that is not discovered by the other. Although MTAG tends to find more loci than HIPO in total, HIPO tends to find more novel loci that are not detected by individual trait analysis and MTAG tends to replicate existing findings (Table 2, S9 and S10 Tables). This pattern is more obvious when the sample size is large (N ≥ 100K) and is not sensitive to the LD clumping criterion (S11 Table). HIPO tends to be more powerful in settings of higher dimensions, with the number of novel loci nearly twice of the number identified by MTAG in several cases (S10 Table).

Application to blood lipid data

We applied our method to the Global Lipids Genetics Consortium (GLGC) data [43]. The average NCP decreases from 0.213 for HIPO-D1 to 0.026 for HIPO-D4, with most association signals appears to be associated with the first and second components (S14 Table). This may be due to the fact that LDL and TC are highly correlated with each other and as are HDL and TG, and HIPO-D1 and HIPO-D2 each captures the signal of one pair of traits. HIPO-D1 is positively related to TG, negatively related to HDL and TC and depends weakly on LDL. HIPO-D2 depends mostly on TC. The last component HIPO-D4, which contains very little genetic association signals and hence can be used as a negative control, is positively correlated with TC and negatively with the other three traits. Note that all 4 traits have similar heritability and sample size (S14 Table), hence the difference in weights is likely driven by genetic correlations. The order of λGC and average of empirical χ2 statistic also tracks with the average NCP (Fig 1), suggesting that the observed enrichments are likely due to polygenic effects. We identified twenty novel loci by HIPO-D1 and 4 by HIPO-D2 at genome-wide significant level (p < 5 × 10−8) (Table 3). The number of novel loci changed to 16 for HIPO-D1 and 4 for HIPO-D2 under a more stringent LD-pruning criterion: r2 < 0.1 and lead SNPs of different loci are >1Mb from each other. The pattern of p-values for individual traits show that the proposed method detects novel SNPs that contain moderate degree of association signals across multiple traits. There is very little overlap between new loci found by HIPO-D1 and by HIPO-D2, as expected from genetic orthogonality of the two components (S5 Fig). Among the 24 new loci found by HIPO-D1 and HIPO-D2, 9 are not found by any of the MTAG estimates. MTAG identified 10 novel loci that are not detected by HIPO-D1 or HIPO-D2 (S17 Table). The result is fairly consistent with our simulation studies which shows that MTAG and HIPO tend to find some overlapping and some unique novel loci.

Fig 1. QQ plots for individual traits and underlying HIPO components across blood lipids, psychiatric diseases, social science traits.

Fig 1

Blood lipid traits include HDL, LDL, triglycerides (TG) and total cholesterol (TC). Psychiatric diseases include autism spectrum disorder (ASD), ADHD, bipolar disorder (BIP), major depressive disorder (MDD) and schizophrenia (SCZ). Meta-analysis QQ plot is also included for psychiatric diseases (in green). Social science traits include depressive symptoms (DS), neuroticism (NEU) and subjective well-being (SWB). Genomic control factors and average χ2 statistics are shown in the legend.

Table 3. Novel loci discovered at genome-wide significance level (p < 5 × 10−8) by the first and second HIPO components of blood lipid traits.

SNP CHR MBP Nearest Gene (Distance) pLDL pHDL pTG pTC pHIPO-D1 pHIPO-D2
HIPO-D1
rs4850047 2 3.6 RPS7(+6.244kb) 2.87e-03 (+) 1.15e-04 (+) 8.58e-04 (-) 2.14e-06 (+) 1.13e-09# 7.82e-04
rs2249105 2 65.3 CEP68(0) 8.72e-02 (+) 6.35e-06 (+) 1.89e-06 (-) 4.66e-01 (+) 1.33e-08 7.92e-01
rs2062432 3 123.1 ADCY5(0) 9.78e-01 (+) 3.88e-06 (-) 2.44e-03 (+) 6.75e-02 (-) 4.30e-08 5.52e-01
rs6855363 4 157.7 PDGFC(-12.22kb) 6.44e-01 (-) 3.20e-07 (+) 3.18e-04 (-) 5.89e-01 (+) 2.27e-08 5.44e-01
rs17199964 4 102.7 BANK1(-3.972kb) 3.11e-01 (-) 9.19e-08 (-) 1.27e-01 (+) 9.85e-04 (-) 3.03e-08 2.43e-02
rs10054063 5 173.4 CPEB4(+5.085kb) 7.00e-01 (-) 6.13e-04 (-) 7.72e-07 (+) 2.67e-01 (-) 3.69e-08 7.80e-01
rs11987974 8 36.8 KCNU1(+30.17kb) 7.66e-01 (+) 3.79e-06 (-) 1.85e-06 (+) 7.16e-01 (-) 3.91e-09# 3.32e-01
rs740746 10 115.8 ADRB1(-11.02kb) 4.32e-01 (-) 1.09e-06 (-) 1.61e-03 (+) 1.07e-01 (-) 4.21e-08 5.74e-01
rs10832027 11 13.4 ARNTL(0) 2.53e-02 (+) 1.52e-07 (+) 5.73e-07 (-) 1.87e-02 (+) 3.88e-12# 3.16e-01
rs7938117 11 68.6 CPT1A(0) 3.69e-01 (-) 2.05e-07 (-) 9.49e-06 (+) 4.01e-02 (-) 3.35e-11# 5.30e-01
rs1565228 11 27.6 BDNF-AS(0) 2.46e-01 (-) 3.50e-05 (-) 3.44e-06 (+) 7.19e-02 (-) 2.56e-09# 6.44e-01
rs661171 11 110.0 ZC3H12C(0) 9.06e-03 (+) 9.84e-07 (+) 1.77e-01 (-) 2.70e-06 (+) 2.58e-08 2.23e-04
rs895953 12 122.2 SETD1B(0) 7.23e-01 (-) 1.45e-06 (+) 3.98e-07 (-) 5.43e-01 (+) 2.84e-10# 3.86e-01
rs2384034 12 113.2 RPH3A(-24.86kb) 9.47e-03 (+) 3.61e-07 (+) 3.18e-02 (-) 5.00e-05 (+) 4.39e-09# 2.51e-03
rs11048456 12 26.5 ITPR2(-25.2kb) 8.65e-02 (-) 2.93e-07 (-) 1.59e-03 (+) 5.74e-02 (-) 1.75e-08 3.28e-01
rs721772 15 41.8 RPAP1(0) 5.17e-01 (+) 2.26e-07 (-) 4.30e-05 (+) 7.10e-01 (-) 4.25e-09# 3.75e-01
rs11079810 17 46.2 SKAP1(0) 1.99e-02 (+) 1.71e-07 (+) 2.23e-04 (-) 9.51e-03 (+) 4.30e-10# 1.32e-01
rs4805755 19 32.9 ZNF507(0) 9.34e-01 (-) 5.58e-08 (+) 4.83e-03 (-) 9.85e-02 (+) 7.71e-09# 6.28e-01
rs10408163 19 47.6 ZC3H4(0) 1.47e-01 (-) 9.99e-07 (+) 3.20e-07 (-) 2.70e-01 (-) 8.87e-09 1.65e-02
rs6059932 20 33.2 PIGU(0) 1.30e-01 (+) 5.73e-07 (+) 6.33e-05 (-) 6.33e-02 (+) 1.27e-09# 4.76e-01
HIPO-D2:
rs4683438 3 142.7 LOC100507389(0) 5.70e-05 (-) 6.85e-01 (+) 3.77e-04 (-) 2.87e-07 (-) 5.41e-01 2.99e-08
rs176813 4 69.6 UGT2B15(+63.04kb) 2.62e-05 (+) 1.75e-01 (+) 4.23e-04 (+) 5.68e-08 (+) 5.62e-01 8.10e-09#
rs2268719 6 52.4 TRAM2(0) 7.52e-07 (-) 3.08e-01 (+) 4.14e-02 (-) 6.66e-08 (-) 9.39e-01 2.13e-08
rs7939352 11 78.0 GAB2(0) 2.79e-06 (-) 9.75e-01 (-) 5.47e-03 (-) 2.48e-07 (-) 9.57e-01 3.47e-08

Independent SNPs were identified through LD-pruning with r2 threshold of 0.1 and pruned SNPs were assumed to represent independent loci if they are >0.5Mb apart. Loci are considered novel if they are not identified at genome-wide significance level through analysis of individual traits. For each lead SNP, p-values for association are shown for HIPO components and for individual traits. The directions of association (+/-) are also shown for each of the individual traits. TG: triglycerides; TC: total cholesterol. HIPO-D1 and HIPO-D2: 1st and 2nd HIPO components. The weights for the first and second HIPO components are: β^HIPO-D1=0.147β^LDL-0.618β^HDL+0.591β^TG-0.469β^TC,β^HIPO-D2=0.206β^LDL-0.017β^HDL+0.228β^TG+0.765β^TC.

# in the last two columns indicates that this locus passes the Bonferroni corrected significance threshold p<5×10-86=8.3×10-9 (11 loci by HIPO-D1 and 1 by HIPO-D2).

Application to psychiatric diseases

Applications of HIPO to Psychiatric Genomics Consortium (PGC) cross-disorder data [20] show that most association signals are captured by HIPO-D1 (S15 Table), which has an average NCP twice larger than that of HIPO-D2. The first HIPO component puts the highest weights on BIP and SCZ, which have the largest heritability and relatively large sample sizes. It is noteworthy that for a few of the strongest signals, HIPO is outperformed by standard meta-analysis, which was implemented in PGC cross-disorder analysis as a way for detecting SNPs that may be associated with multiple traits. The QQ plot of HIPO-D1, however, dominates those for the individual traits and for the standard meta-analysis when p > 1 × 10−8 (Fig 1). This suggests that HIPO is superior to standard meta-analysis in detecting moderate effects, while sacrificing some efficiency for the top hits. The value of λGC and average χ2 -statistics are higher for HIPO-D1 than those for individual traits and standard meta-analysis.

HIPO-D1 discovers one new locus, marked by the lead SNP rs13072940 (p = 1.71 × 10−8), that is not identified by either the individual traits or the meta-analysis. The same loci is identified under LD-pruning criterion r2 < 0.1 and lead SNPs of different loci are >1Mb from each other. The marker rs13072940 shows association with bipolar disorder (pBIP = 0.0026) and schizophrenia (pSCZ = 2.55 × 10−6) but no association with autism spectrum disorder (pASD = 0.97), ADHD (pADHD = 0.70) or major depressive disorder (pMDD = 0.11). The meta-analysis signal (pMeta = 7.02 × 10−6) does not reach genome-wide significance and is, in fact, weaker compared to that from schizophrenia alone. This SNP shows stronger association in more recent larger studies of bipolar disorder [47] (pBIP = 0.0003) and schizophrenia [51] (pSCZ = 1.32 × 10−7), clearly indicating that this is likely to be a true signal underlying multiple PGC traits. MTAG also identifies the same new locus (rs13072940) as HIPO-D1 under the same significance and LD pruning threshold.

Application to social science traits

Application of HIPO to Social Science Genetic Association Consortium studies reveals that most of the genetic variation is captured by HIPO-D1 that has an average NCP twice larger than that of HIPO-D2 (S16 Table). The component is negatively associated with DS and NEU and is positively associated with SWB. The tail region of QQ plot of HIPO-D1 lies close to that of neuroticism, but the values of λGC and average χ2 are substantially larger for HIPO-D1 (Fig 1). HIPO-D1 identifies 12 new loci that are not discovered by individual trait analysis of SSGAC data (Table 4), increasing the total number of genome-wide significant loci from 24 to 36 (S7 Fig). The number of new loci changes to 11 under a more stringent LD pruning criterion: r2 < 0.1 and lead SNPs of different loci are >1Mb from each other. MTAG identifies 14 loci that are not identified by individual trait analysis, including the 12 loci found by HIPO-D1 (S18 Table).

Table 4. Novel loci discovered at genome-wide significance level (p < 5 × 10−8) by HIPO-D1 for social science traits.

SNP CHR MBP Nearest Gene (Distance) pDS pNEU pSWB pHIPO−D1
HIPO-D1
rs2874367* 1 21.3 EIF4G3(0) 6.33e-05 (-) 6.33e-05 (-) 6.33e-05 (+) 1.38e-08
rs11100449* 4 141.0 MAML3(0) 1.47e-05 (-) 6.33e-05 (-) 6.33e-05 (+) 8.02e-09#
rs10475748* 5 164.6 NA 1.47e-05 (-) 5.73e-07 (-) 1.96e-02 (+) 1.90e-08
rs6919210 6 70.6 COL19A1(0) 1.15e-03 (-) 5.73e-07 (-) 1.77e-04 (+) 1.88e-09#
rs6569095 6 120.3 NA 8.58e-04 (+) 2.14e-05 (+) 1.47e-05 (-) 5.94e-09#
rs210899* 6 11.7 ADTRP(0) 1.10e-01 (+) 2.03e-06 (+) 6.33e-05 (-) 3.61e-08
rs2396726 7 114.0 FOXP2(0) 1.77e-04 (+) 3.06e-06 (+) 8.58e-04 (-) 1.04e-08#
rs12701427* 7 4.2 SDK1(0) 1.15e-03 (-) 1.47e-05 (-) 6.33e-05 (+) 1.28e-08
rs9584850* 13 99.1 FARP1(0) 6.33e-05 (-) 1.52e-07 (-) 8.58e-04 (+) 6.50e-10#
rs11644362 16 13.0 SHISA9(-1.379kb) 2.70e-03 (-) 2.46e-04 (-) 3.06e-06 (+) 3.73e-08
rs7239568 18 52.0 C18orf54(+56.37kb) 5.96e-03 (-) 2.14e-05 (-) 9.64e-08 (+) 8.17e-10#
rs1261093* 18 52.9 TCF4(0) 8.58e-04 (+) 9.64e-08 (+) 2.70e-03 (-) 3.92e-09#

Independent SNPs are identified through LD-pruning with r2 threshold of 0.1 and pruned SNPs are assumed to represent independent loci if they are >0.5Mb apart. Loci are considered novel if they were not identified at genome-wide significance level through analysis individual traits. For each lead SNP, p-values for association are shown for HIPO components and for individual traits. The directions of association (+/-) are also shown for each of the individual traits. DS: depressive symptoms; NEU: neuroticism; SWB: subjective well-being. HIPO-D1: 1st HIPO component. Weights for HIPO-D1: β^HIPO-D1=-0.247β^DS-0.607β^NEU+0.588β^SWB. NA in the Nearest Gene column means there is no gene within 200kb of the SNP. SNPs marked by * indicates underlying loci show evidence of replication in the larger data set used in the MTAG paper (see Table 5).

# in the last column indicates that this locus passes Bonferroni corrected significance threshold p<5×10-84=1.25×10-8 (7 loci in total).

We examined evidence of replication of the novel loci based on more recent and larger studies of DS and SWB that were incorporated in the MTAG analysis [36]. As this study reported only a list of top SNPs (p < 1 × 10−5) after stringent LD-pruning (r2 < 0.1), we could not look up the exact lead SNPs that we report for the novel regions (Table 4). Instead, we searched for SNPs in the top list reported by MTAG study that could be considered proxy (D’>0.75) for our lead SNPs. We found 7 of the 12 novel have such proxies and these proxy SNPs show stronger level of association in the more recent MTAG study for at least one of DS and SWB (Table 5).

Table 5. Evidence of replication of novel loci identified by HIPO analysis for social science traits in subsequent larger studies of DS and SWB.

Lead SNP in Novel Loci Proxy SNP Reported in MTAG Study D’ Individual Trait p-value in SSGAC Individual Trait p-value in MTAG Study
DS
rs11100449 rs1877075 0.78 2.00e-06 1.10e-06
rs10475748 rs10045971 0.99 4.51e-02 1.17e-09
rs12701427 rs4723416 0.91 1.59e-03 1.17e-06
rs9584850 rs4772087 1.00 2.42e-03 1.04e-06
rs1261093 rs11876620 0.82 1.58e-04 4.45e-08
SWB
rs2874367 rs12125335 1.00 NA 7.09e-08
rs11100449 rs769664 0.79 3.18e-03 4.59e-07
rs210899 rs10947543 0.93 NA 3.10e-08

Reported are P-values for proxy SNPs (D > 0.75) for individual trait associations in SSGAC data and the more recent MTAG study. Novel loci are identified through analysis of SSGAC which include studies of DS and SWB with sample sizes Neff = 161,460 and N = 298,420, respectively. The MTAG study includes an expanded set of sample with Neff = 354,862 and N = 388,538 for DS and SWB, respectively. DS: depressive symptoms; NEU: neuroticism; SWB: subjective well-being. NA indicates that the proxy SNP is not present in the SSGAC data.

Discussion

In this report, we present a novel method for powerful pleiotropic analysis using summary level data across multiple traits, accounting for both heritability and sample size variations. Application of the proposed method to three groups of genetically related trait identifies a variety of novel and replicable loci that were not detectable by analysis of individual traits at comparable level of confidence. We also conduct extensive simulation studies in realistic settings of large GWAS to demonstrate the ability of the method to maintain type-I error, achieve robustness to population stratification and enhance detection of novel loci. The novel method we introduce for directly simulating summary-level GWAS statistics, preserving expected correlation structure across both traits and SNP markers, will allow rapid evaluation of alternative methods for pleiotropic analysis in settings of large complex GWAS more feasible in the future.

Application of the proposed method provides new insight into the genetic architecture of groups of related traits. For blood lipids, which have similar sample sizes, the average NCPs for HIPO-D1 and HIPO-D2 dominate the other two, suggesting that there are perhaps two unrelated mechanisms through which most genetic markers are associated with the individual cholesterol traits. For psychiatric diseases and social science traits, the top HIPO component dominates the others, indicating that there is perhaps one major genetic mechanism underlying each group of traits. These conjectures are supported by a simple simulation (Section E.2, S1 Appendix). However, given that top HIPO component down weights traits with smaller sample sizes, it is possible that there exist other independent genetic mechanisms related to these traits that could not be captured by the top HIPO component. Nevertheless, HIPO, by taking into account both heritability and sample sizes, provides a clear guideline how many independent sets of tests should be performed across the different traits to capture most of the genetic signals.

Throughout the paper we report results based on significance threshold p < 5 × 10−8. We do not recommend adjusting the significance threshold to account for multiple comparison, since HIPO components are not independent of individual-trait tests. Similar issues will also arise about MTAG or other pleiotropic methods. Investigation of setting up proper threshold is beyond the scope of the current paper. However, even if we apply Bonferroni correction, HIPO is still able to find a substantial number of new loci. For blood lipids, since there are 4 traits and 2 HIPO components under consideration, the Bonferroni adjusted threshold is p<5×10-86=8.3×10-9. HIPO-D1 still discovers 11 new loci under the new threshold and HIPO-D2 still discovers 1 (Table 3). For social science traits, since there are 3 traits and 1 HIPO component under consideration, the Bonferroni-corrected threshold is p<5×10-84=1.25×10-8. HIPO-D1 still discovers 7 new loci under the new threshold (Table 4).

Earlier studies have proposed methods for association analysis in GWAS informed by heritability analysis. For analysis of multivariate traits observed on the same set of individuals, the MaxH [34] method was proposed to conduct association analysis along directions that maximizes trait heritability. HIPO allows a generalization of this approach by taking into account sample size differences and overlaps across studies allowing powerful cross-disorder analysis using only summary-level data across distinct studies.

Another closely related method is MTAG [36], which also utilizes summary level data and LD score regression to estimate genotypic and phenotypic variance-covariance matrices. MTAG, however, performs association tests for each individual trait by improving estimation of the underlying association coefficients using cross-trait variance-covariance structure. In contrast, we propose finding optimal linear combination of association coefficients across traits that will maximize the power for detecting underlying common signals. The advantage of MTAG is that it does associate the SNPs to individual traits and thus has appealing interpretation. However, strictly speaking, MTAG, similar to HIPO, is only a valid method for testing the global null hypothesis of no association of a SNP across any of the traits and may identify a SNP to be associated with a null trait while in truth it is only related to another trait in the same group. The advantage of HIPO is that it directly focuses on optimization of power in orthogonal directions for cross-disorder analysis and can provide significant dimension reduction for analysis of higher dimensional traits. Simulation studies as well as analysis of real data shows that both HIPO and MTAG identify substantial number of novel loci compared to analysis of individual traits (Table 2, S9, S10, S17 and S18 Tables). The sets of novel loci identified by the two methods tend to be substantially non-overlapping indicating that it may be useful to implement both methods for cross-trait analysis. Simulations also show that when the number of traits become larger, HIPO tends to find substantially more novel SNPs than MTAG (S10 Table).

There exists a variety of methods for pleiotropic analysis [30, 31, 35, 38, 39] that aim to optimize power for testing associations with respect to individual SNPs without being informed by heritability. The method ASSET [39], for example, searches through different subsets of traits to find the optimal subset that yields the strongest meta-analysis z statistic for each individual SNP. Methods like HIPO and MTAG, which use estimates of heritability based on genome-wide set of markers, are likely to be more powerful when the underlying traits have strong genetic correlation, such as that observed for psychiatric disorders. In contrast, methods such as ASSET may be more powerful for analysis of groups of traits that have more moderate genetic correlation, such as cancers of different sites [13], for detection of loci with unique but insightful pleiotropic patterns of association. There is potential to develop intermediate methods, which borrows information across SNPs but in a more localized manner, for example, based on functional annotation information [52, 53]. Further research is also merited for implementation of HIPO for very high-dimensional pleiotropic analysis and rare variant association studies, two settings in both of which there could be challenges associated with dealing with noises associated with estimation of genetic variance-covariance matrices.

In conclusion, HIPO provides a novel and powerful method for joint association analysis across multiple traits using summary-level statistics. Application of the method to multiple datasets shows that it provides unique insight into genetic architecture of groups of related traits and can identify substantial number of novel loci compared to analysis of individual traits. Further extension of the method is merited for facilitating more interpretable and parsimonious association analysis across groups of high-dimensional correlated traits.

Our R package is available at https://github.com/gqi/hipo.

Supporting information

S1 Table. Summary of simulation settings.

(PDF)

S2 Table. Type I error rates for HIPO-D2 to HIPO-D4 observed in datasets simulated under covariance structure estimated from blood lipids traits.

See S1 Table 1a and 1b for detailed settings and see Table 1 for type I errors of HIPO-D1.

(PDF)

S3 Table. Type I error rates for HIPO observed in datasets simulated under covariance structure modified from estimates of blood lipids traits allowing for partial overlap of causal SNPs and sample set across traits.

See S1 Table 1c and 1d for detailed settings.

(PDF)

S4 Table. Type I error rates for HIPO observed in datasets simulated under covariance structure estimated from studies of psychiatric diseases.

See S1 Table 2a and 2b for detailed settings.

(PDF)

S5 Table. Type I error rates for HIPO observed in datasets simulated under covariance structure modified from estimates of psychiatric diseases allowing for partial overlap of causal SNPs and sample set across traits.

See S1 Table 2c and 2d for detailed settings.

(PDF)

S6 Table. Percentage increase in average number of true discoveries by HIPO compared to the analysis of most heritable trait in simulation studies.

(PDF)

S7 Table. Average χ2 for HIPO-D1 compared to those for individual traits and MTAG observed in simulation studies based on covariance structure of blood lipids.

(PDF)

S8 Table. Average χ2 for HIPO-D1 compared to those for individual traits and MTAG observed in simulation studies based on covariance structure of psychiatric diseases.

(PDF)

S9 Table. Number of truly associated loci discovered by HIPO, MTAG and individual trait analysis observed in datasets simulated under the covariance structure estimated from psychiatric diseases.

See S1 Table 2a-2d for detailed settings.

(PDF)

S10 Table. Number of truly associated loci discovered by HIPO, MTAG and individual trait analysis from simulation studies of 10 correlated traits.

See S1 Table part 3 for detailed settings.

(PDF)

S11 Table. Number of truly associated independent loci discovered by HIPO, MTAG and individual trait analysis observed in datasets simulated under the covariance structure estimated from studies of blood lipids under alternative LD-clumping threshold.

See 1a-1d in S1 Table for detailed settings.

(PDF)

S12 Table. Type I error rates for MTAG observed in datasets simulated under covariance structure estimated from studies of blood lipids with population stratification.

(PDF)

S13 Table. Type I error rates for HIPO observed in simulated unbalanced case-control datasets.

(PDF)

S14 Table. Weights associated with individual blood lipid traits and average non-centrality parameters for each HIPO component.

Numbers in the parentheses are the heritability estimated using LD score regression.

(PDF)

S15 Table. Weights associated with individual psychiatric diseases and average non-centrality parameters for each HIPO component.

Numbers in the parentheses are the heritability estimated using LD score regression.

(PDF)

S16 Table. Weights associated with individual social science traits and average non-centrality parameters for each HIPO component.

Numbers in the parentheses are the heritability estimated using LD score regression.

(PDF)

S17 Table. Novel loci for blood lipids identified by HIPO and MTAG.

Only HIPO-D1 and HIPO-D2 are considered.

(PDF)

S18 Table. Novel loci for social science traits identified by HIPO and MTAG.

Only HIPO-D1 is considered.

(PDF)

S1 Appendix. Details of mathematical derivations.

(PDF)

S1 Fig. QQ plot of HIPO-D1 and the most heritable trait observed in datasets simulated under covariance structure of blood lipids WITHOUT population stratification effects.

(PDF)

S2 Fig. QQ plot of HIPO-D1 and the most heritable trait observed in datasets simulated under covariance structure of blood lipids WITH population stratification effects.

(PDF)

S3 Fig. QQ plot of HIPO-D1 and the most heritable trait observed in datasets simulated under covariance structure of psychiatric diseases WITHOUT population stratification effects.

(PDF)

S4 Fig. QQ plot of HIPO-D1 and the most heritable trait observed in datasets simulated under covariance structure of psychiatric diseases WITH population stratification effects.

(PDF)

S5 Fig. Venn diagram for the number of loci discovered to be associated with blood lipid traits by individual trait and HIPO analysis.

Independent SNPs are identified through LD-pruning with r2 threshold of 0.1 and pruned SNPs were assumed to represent independent loci if they are >0.5Mb apart.

(PDF)

S6 Fig. Venn diagram for the number of associated loci identified by individual trait analysis of psychiatric diseases, HIPO-D1 and meta-analysis.

Independent SNPs are identified through LD-pruning with r2 threshold of 0.1 and pruned SNPs were assumed to represent independent loci if they are >0.5Mb apart.

(PDF)

S7 Fig. Venn diagram for the number of associated loci identified by individual trait analysis of social science traits and HIPO-D1.

Independent SNPs are identified through LD-pruning with r2 threshold of 0.1 and pruned SNPs were assumed to represent independent loci if they are >0.5Mb apart.

(PDF)

Data Availability

All relevant data are within the paper and its Supporting Information files.

Funding Statement

This work was supported by Bloomberg Distinguished Professorship Endowment at the Johns Hopkins University and in part by the NIH for the Environmental influences of Child Health Outcomes (ECHO) Data Analysis Center (U24OD023382). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2013;42:D1001–D1006. 10.1093/nar/gkt1229 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.MacArthur J, Bowler E, Cerezo M, Gil L, Hall P, Hastings E et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 2017;45:D896–D901. 10.1093/nar/gkw1133 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am J Hum Genet. 2017;101:5–22. 10.1016/j.ajhg.2017.06.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Park J-H, Wacholder S, Gail MH, Peters U, Jacobs KB, Chanock SJ et al. Estimation of effect size distribution from genome-wide association studies and implications for future discoveries. Nat Genet. 2010;42:570–575. 10.1038/ng.610 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42:565–569. 10.1038/ng.608 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Lee S, Wray N, Goddard M, Visscher P. Estimating missing heritability for disease from genome-wide association studies. Am J Hum Genet. 2011;88:294–305. 10.1016/j.ajhg.2011.02.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.So H, Gui AHS, Cherny SS, Sham PC. Evaluating the heritability explained by known susceptibility variants: a survey of ten complex diseases. Genet Epidemiol. 2011;35:310–317. 10.1002/gepi.20579 [DOI] [PubMed] [Google Scholar]
  • 8.Vattikuti S, Guo J, Chow CC. Heritability and genetic correlations explained by common SNPs for metabolic syndrome traits. PLoS Genet. 2012;8:e1002637 10.1371/journal.pgen.1002637 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Chatterjee N, Wheeler B, Sampson J, Hartge P, Chanock SJ, Park J-H. Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nat Genet. 2013;45:400–405. 10.1038/ng.2579 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Cross-Disorder Group of the Psychiatric Genomics Consortium. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat Genet. 2013;45:984–994. 10.1038/ng.2711 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Zhou JJ, Cho MH, Castaldi PJ, Hersh CP, Silverman EK, Laird NM. Heritability of chronic obstructive pulmonary disease and related phenotypes in smokers. Am J Respir Crit Care Med. 2013;188:941–947. 10.1164/rccm.201302-0263OC [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Chen G-B, Lee SH, Brion M-JA, Montgomery GW, Wray NR, Radford-Smith GL et al. Estimation and partitioning of (co)heritability of inflammatory bowel disease from GWAS and immunochip data. Hum Mol Genet. 2014;23:4710–4720. 10.1093/hmg/ddu174 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Sampson JN, Wheeler WA, Yeager M, Panagiotou O, Wang Z, Berndt et al. Analysis of heritability and shared heritability based on genome-wide association studies for 13 cancer types. J Natl Cancer Inst. 2015;107:djv279. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh P-R et al. An atlas of genetic correlations across human diseases and traits. Nat Genet. 2015;47:1236–1241. 10.1038/ng.3406 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Zhang Y, Qi G, Park J-H, Chatterjee N. Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits and implications for the future. bioRxiv. 2017 [DOI] [PubMed] [Google Scholar]
  • 16.Lynch M, Walsh B. Genetics and analysis of quantitative traits. Sinauer; Sunderland, MA; 1998 [Google Scholar]
  • 17.Sivakumaran S, Agakov F, Theodoratou E, Prendergast JG, Zgaga L, Manolio T et al. Abundant pleiotropy in human complex diseases and traits. Am J Hum Genet. 2011;89:607–618. 10.1016/j.ajhg.2011.10.004 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Visscher PM, Yang J. A plethora of pleiotropy across complex traits. Nat Genet. 2016;48:707–708. 10.1038/ng.3604 [DOI] [PubMed] [Google Scholar]
  • 19.Pickrell JK, Berisa T, Liu JZ, Ségurel L, Tung JY, Hinds D. Detection and interpretation of shared genetic influences on 42 human traits. Nat Genet. 2016;48:709 10.1038/ng.3570 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Cross-Disorder Group of the Psychiatric Genomics Consortium. Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet. 2013;381:1371–1379. 10.1016/S0140-6736(12)62129-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Wang Z, Zhu B, Zhang M, Parikh H, Jia J, Chow CC et al. Imputation and subset-based association analysis across different cancer types identifies multiple independent risk loci in the TERT-CLPTM1L region on chromosome 5p15.33. Hum Mol Genet. 2014;23:6616–6633. 10.1093/hmg/ddu363 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Li YR, Li J, Zhao SD, Bradfield JP, Mentch FD, Maggadottir SM et al. Meta-analysis of shared genetic architecture across ten pediatric autoimmune diseases. Nat Med. 2015;21:1018–1018. 10.1038/nm.3933 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Ellinghaus D, Jostins L, Spain SL, Cortes A, Bethune J, Han B et al. Analysis of five chronic inflammatory diseases identifies 27 new associations and highlights disease-specific patterns at shared loci. Nat Genet. 2016;48:510–510. 10.1038/ng.3528 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: challenges and strategies. Nat Rev Genet. 2013;14:483–495. 10.1038/nrg3461 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Ji S-G, Juran BD, Mucha S, Folseraas T, Jostins L, Melum E et al. Genome-wide association study of primary sclerosing cholangitis identifies new risk loci and quantifies the genetic relationship with inflammatory bowel disease. Nat Genet. 2016;49:269–269. 10.1038/ng.3745 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Klei L, Luca D, Devlin B, Roeder K. Pleiotropy and principal components of heritability combine to increase power for association analysis. Genet Epidemiol. 2008;32:9–19. 10.1002/gepi.20257 [DOI] [PubMed] [Google Scholar]
  • 27.Ferreira MAR, Purcell SM. A multivariate test of association. Bioinformatics. 2008;25:132–133. 10.1093/bioinformatics/btn563 [DOI] [PubMed] [Google Scholar]
  • 28.Liu J, Pei Y, Papasian, Chris, Deng. Bivariate association analyses for the mixture of continuous and binary traits with the use of extended generalized estimating equations. Genet Epidemiol. 2009;33:217–227. 10.1002/gepi.20372 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Zhang H, Liu C-T, Wang X. An association test for multiple traits based on the generalized Kendall’s tau. J Am Stat Assoc. 2010;105:473–481. 10.1198/jasa.2009.ap08387 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.O’Reilly PF, Hoggart CJ, Pomyen Y, Calboli FCF, Elliott P, Jarvelin M-R et al. MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PLoS ONE. 2012;7:e34861 10.1371/journal.pone.0034861 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.He Q, Avery CL, Lin D-Y. A general framework for association tests with multivariate traits in large-scale genomics studies. Genet Epidemiol. 2013;37:759–767. 10.1002/gepi.21759 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Bolormaa S, Pryce JE, Reverter A, Zhang Y, Barendse W, Kemper K et al. A multi-trait, meta-analysis for detecting pleiotropic polymorphisms for stature, fatness and reproduction in beef cattle. PLoS Genet. 2014;10:e1004198 10.1371/journal.pgen.1004198 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Galesloot TE, Van Steen K, Kiemeney LALM, Janss LL, Vermeulen SH. A comparison of multivariate genome-wide association methods. PLoS ONE. 2014;9:e95923 10.1371/journal.pone.0095923 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Zhou JJ, Cho MH, Lange C, Lutz S, Silverman EK, Laird NM. Integrating multiple correlated phenotypes for genetic association analysis by maximizing heritability. Hum Hered. 2015;79:93–104. 10.1159/000381641 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Ray D, Pankow JS, Basu S. USAT: A unified score‐based association test for multiple phenotype‐genotype analysis. Genet Epidemiol. 2016;40:20–34. 10.1002/gepi.21937 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Turley P, Walters RK, Maghzian O, Okbay A, Lee JJ, Fontana MA et al. MTAG: multi-trait analysis of GWAS. bioRxiv. 2017. 118810. [Google Scholar]
  • 37.Liu J, Wan X, Wang C, Yang C, Zhou X, Yang C. LLR: a latent low-rank approach to colocalizing genetic risk variants in multiple GWAS. Bioinformatics. 2017. btx512. [DOI] [PubMed] [Google Scholar]
  • 38.Yang Q, Wu H, Guo C-Y, Fox CS. Analyze multivariate phenotypes in genetic association studies by combining univariate association tests. Genet Epidemiol. 2010;34:444–454. 10.1002/gepi.20497 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Bhattacharjee S, Rajaraman P, Jacobs KB, Wheeler WA, Melin BS, Hartge P et al. A subset-based approach improves power and interpretation for the combined analysis of genetic association studies of heterogeneous traits. Am J Hum Genet. 2012;90:821–835. 10.1016/j.ajhg.2012.03.015 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Andreassen OA, Thompson WK, Schork AJ, Ripke S, Mattingsdal M, Kelsoe JR et al. Improved detection of common variants associated with schizophrenia and bipolar disorder using pleiotropy-informed conditional false discovery rate. PLoS Genet. 2013;9:e1003455 10.1371/journal.pgen.1003455 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Maier R, Moser G, Chen G-B, Ripke S, Absher D, Agartz I et al. Joint analysis of psychiatric disorders increases accuracy of risk prediction for schizophrenia, bipolar disorder, and major depressive disorder. Am J Hum Genet. 2015;96:283–294. 10.1016/j.ajhg.2014.12.006 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Bulik-Sullivan BK, Loh P-R, Finucane HK, Ripke S, Yang J, Patterson N et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet. 2015;47:291–295. 10.1038/ng.3211 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Global Lipids Genetics Consortium. Discovery and refinement of loci associated with lipid levels. Nat Genet. 2013;45:1274–1283. 10.1038/ng.2797 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Okbay A, Baselmans BML, De Neve J-E, Turley P, Nivard MG, Fontana MA et al. Genetic variants associated with subjective well-being, depressive symptoms, and neuroticism identified through genome-wide analyses. Nat Genet. 2016;48:624–633. 10.1038/ng.3552 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Zheng J, Erzurumluoglu AM, Elsworth BL, Kemp JP, Howe L, Haycock PC et al. LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics. 2017;33:272–279. 10.1093/bioinformatics/btw613 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Neale BM, Medland SE, Ripke S, Asherson P, Franke B, Lesch K-P et al. Meta-analysis of genome-wide association studies of attention-deficit/hyperactivity disorder. J Am Acad Child Adolesc Psychiatry. 2010;49:884–897. 10.1016/j.jaac.2010.06.008 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Psychiatric GWAS Consortium Bipolar Disorder Working Group. Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nat Genet. 2011;43:977–983. 10.1038/ng.943 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Major Depressive Disorder Working Group of the Psychiatric GWAS Consortium. A mega-analysis of genome-wide association studies for major depressive disorder. Mol Psychiatry. 2013;18:497–511. 10.1038/mp.2012.21 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Schizophrenia, Psychiatric Genome-Wide Association Study (GWAS) Consortium. Genome-wide association study identifies five new schizophrenia loci. Nat Genet. 2011;43:969–976. 10.1038/ng.940 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12:e1001779 10.1371/journal.pmed.1001779 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Ripke S, Neale BM, Corvin A, Walters JTR, Farh K-H, Holmans PA et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014;511:421 10.1038/nature13595 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Gusev A, Lee S, Trynka G, Finucane H, Vilhjálmsson B, Xu H et al. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am J Hum Genet. 2014;95:535–552. 10.1016/j.ajhg.2014.10.004 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Lu Q, Li B, Ou D, Erlendsdottir M, Powles RL, Jiang T et al. A powerful approach to estimating annotation-stratified genetic covariance using GWAS summary statistics. bioRxiv. 2017 [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Table. Summary of simulation settings.

(PDF)

S2 Table. Type I error rates for HIPO-D2 to HIPO-D4 observed in datasets simulated under covariance structure estimated from blood lipids traits.

See S1 Table 1a and 1b for detailed settings and see Table 1 for type I errors of HIPO-D1.

(PDF)

S3 Table. Type I error rates for HIPO observed in datasets simulated under covariance structure modified from estimates of blood lipids traits allowing for partial overlap of causal SNPs and sample set across traits.

See S1 Table 1c and 1d for detailed settings.

(PDF)

S4 Table. Type I error rates for HIPO observed in datasets simulated under covariance structure estimated from studies of psychiatric diseases.

See S1 Table 2a and 2b for detailed settings.

(PDF)

S5 Table. Type I error rates for HIPO observed in datasets simulated under covariance structure modified from estimates of psychiatric diseases allowing for partial overlap of causal SNPs and sample set across traits.

See S1 Table 2c and 2d for detailed settings.

(PDF)

S6 Table. Percentage increase in average number of true discoveries by HIPO compared to the analysis of most heritable trait in simulation studies.

(PDF)

S7 Table. Average χ2 for HIPO-D1 compared to those for individual traits and MTAG observed in simulation studies based on covariance structure of blood lipids.

(PDF)

S8 Table. Average χ2 for HIPO-D1 compared to those for individual traits and MTAG observed in simulation studies based on covariance structure of psychiatric diseases.

(PDF)

S9 Table. Number of truly associated loci discovered by HIPO, MTAG and individual trait analysis observed in datasets simulated under the covariance structure estimated from psychiatric diseases.

See S1 Table 2a-2d for detailed settings.

(PDF)

S10 Table. Number of truly associated loci discovered by HIPO, MTAG and individual trait analysis from simulation studies of 10 correlated traits.

See S1 Table part 3 for detailed settings.

(PDF)

S11 Table. Number of truly associated independent loci discovered by HIPO, MTAG and individual trait analysis observed in datasets simulated under the covariance structure estimated from studies of blood lipids under alternative LD-clumping threshold.

See 1a-1d in S1 Table for detailed settings.

(PDF)

S12 Table. Type I error rates for MTAG observed in datasets simulated under covariance structure estimated from studies of blood lipids with population stratification.

(PDF)

S13 Table. Type I error rates for HIPO observed in simulated unbalanced case-control datasets.

(PDF)

S14 Table. Weights associated with individual blood lipid traits and average non-centrality parameters for each HIPO component.

Numbers in the parentheses are the heritability estimated using LD score regression.

(PDF)

S15 Table. Weights associated with individual psychiatric diseases and average non-centrality parameters for each HIPO component.

Numbers in the parentheses are the heritability estimated using LD score regression.

(PDF)

S16 Table. Weights associated with individual social science traits and average non-centrality parameters for each HIPO component.

Numbers in the parentheses are the heritability estimated using LD score regression.

(PDF)

S17 Table. Novel loci for blood lipids identified by HIPO and MTAG.

Only HIPO-D1 and HIPO-D2 are considered.

(PDF)

S18 Table. Novel loci for social science traits identified by HIPO and MTAG.

Only HIPO-D1 is considered.

(PDF)

S1 Appendix. Details of mathematical derivations.

(PDF)

S1 Fig. QQ plot of HIPO-D1 and the most heritable trait observed in datasets simulated under covariance structure of blood lipids WITHOUT population stratification effects.

(PDF)

S2 Fig. QQ plot of HIPO-D1 and the most heritable trait observed in datasets simulated under covariance structure of blood lipids WITH population stratification effects.

(PDF)

S3 Fig. QQ plot of HIPO-D1 and the most heritable trait observed in datasets simulated under covariance structure of psychiatric diseases WITHOUT population stratification effects.

(PDF)

S4 Fig. QQ plot of HIPO-D1 and the most heritable trait observed in datasets simulated under covariance structure of psychiatric diseases WITH population stratification effects.

(PDF)

S5 Fig. Venn diagram for the number of loci discovered to be associated with blood lipid traits by individual trait and HIPO analysis.

Independent SNPs are identified through LD-pruning with r2 threshold of 0.1 and pruned SNPs were assumed to represent independent loci if they are >0.5Mb apart.

(PDF)

S6 Fig. Venn diagram for the number of associated loci identified by individual trait analysis of psychiatric diseases, HIPO-D1 and meta-analysis.

Independent SNPs are identified through LD-pruning with r2 threshold of 0.1 and pruned SNPs were assumed to represent independent loci if they are >0.5Mb apart.

(PDF)

S7 Fig. Venn diagram for the number of associated loci identified by individual trait analysis of social science traits and HIPO-D1.

Independent SNPs are identified through LD-pruning with r2 threshold of 0.1 and pruned SNPs were assumed to represent independent loci if they are >0.5Mb apart.

(PDF)

Data Availability Statement

All relevant data are within the paper and its Supporting Information files.


Articles from PLoS Genetics are provided here courtesy of PLOS

RESOURCES