Summary
Genome-wide association studies (GWAS) for complex diseases have focused primarily on single trait analyses for disease status and disease-related quantitative traits. For example, GWAS on risk factors for coronary artery disease analyze genetic associations of plasma lipids such as total cholesterol, LDL-cholesterol, HDL-cholesterol, and triglycerides separately. However, traits are often correlated and a joint analysis may yield increased statistical power for association over multiple univariate analyses. Recently several multivariate methods have been proposed which require individual-level data. Here, we develop metaUSAT, a novel unified association test of a single genetic variant with multiple traits that uses only summary statistics from existing GWAS. While the existing methods either perform well when most correlated traits are affected by the genetic variant in the same direction or are powerful when only a few of the correlated traits are associated, metaUSAT is designed to be robust to the association structure of correlated traits. metaUSAT does not require individual-level data and can test genetic associations of categorical and/or continuous traits. One can also use metaUSAT to analyze a single trait over multiple studies, appropriately accounting for overlapping samples, if any. metaUSAT provides an approximate asymptotic p-value for association and is computationally efficient for implementation at a genome-wide level. Simulation experiments show that metaUSAT maintains proper type-I error at low error levels. It has similar and sometimes greater power to detect association across a wide array of scenarios compared to existing methods, which are usually powerful for some specific association scenarios only. When applied to plasma lipids summary data from the METSIM and the T2D-GENES studies, metaUSAT detected genome-wide significant loci beyond the ones identified by univariate analyses. 
Evidence from larger studies suggests that the variants additionally detected by our test are, indeed, associated with lipid levels in humans. In summary, metaUSAT can provide novel insights into the genetic architecture of a common disease or traits.
Keywords: Cross-phenotype association, GWAS, Joint modeling, Meta-analysis, METSIM, Multiple traits, Multivariate analysis, Overlapping samples, PheWAS, Pleiotropy, Score test, Summary statistics, T2D-GENES
Introduction
Meta-analysis of multiple independent studies is routinely performed to test genetic association of traits by aggregating information on a large number of individuals. Individual data are often not available due to restrictions on data sharing, and hence analysis using summary statistics proves useful. Combining association results from multiple samples of individuals increases statistical power to detect subtle genetic effects. For example, Willer et al. (2013) meta-analyzed lipid traits from 188,577 individuals in 60 studies and detected 62 genome-wide significant loci that were not previously associated with lipid levels in humans.
While statistical approaches for analysis of individual-level data have moved from the single-trait-single-marker paradigm (e.g., Kang et al., 2010) to multiple markers (e.g., Wu et al., 2011; Ray et al., 2015), multiple traits (e.g., Ferreira and Purcell, 2009; Ray et al., 2016), and multiple markers and traits (e.g., Basu et al., 2013; Wu and Pankow, 2016), standard approaches for meta-analysis have focused on the analysis of a single trait and a single marker. Many complex-disease-related traits are correlated. Joint analysis of traits borrows information across all traits and may increase power to detect genetic associations by increasing effective sample size (Diggle et al., 2002). For individual-level data, many articles have developed and advocated statistical methods for jointly analyzing correlated traits (see Zhou and Stephens, 2014; Majumdar et al., 2015; Ray and Basu, 2017). Porter and O’Reilly (2017) performed a comprehensive comparison of some of these multi-trait methods.
It is only recently that joint meta-analysis of multiple traits using summary statistics has received attention. Stephens (2013) proposed a unified framework for multiple-traits-single-marker analysis using Bayesian model comparison and model averaging for multivariate regression. This framework allows for approximate testing and explaining genetic associations by using summary statistics. Zhu et al. (2015) proposed a general framework for integrating association evidence using GWAS summary statistics. Their framework can accommodate statistics of multiple continuous or categorical traits, correlated or independent, from a single study or multiple studies. Zhu et al. proposed two tests: SHom (which assumes equal genetic effect across all traits and studies) and SHet (which allows for trait heterogeneity). Kim et al. (2015) proposed an adaptive sum of powered score (aSPU) test, which lacks a closed form null distribution and depends on Monte Carlo simulations to evaluate p-values. Cichonska et al. (2016) proposed metaCCA that tests association of multiple traits with multiple markers using canonical correlation analysis (CCA) (Ferreira and Purcell, 2009) framework.
Here, we propose the novel multivariate meta-analysis approach metaUSAT, a unified score-based association test for the meta-analysis of multiple traits with a single marker using GWAS summary statistics. Current multivariate meta-analysis methods are powerful under certain association patterns (such as sparsity of signals, or homogeneity of signals), and there is a need for a robust association test. metaUSAT is based on the theoretical and empirical findings of Ray et al. (2016) regarding complementary power performances of CCA/MANOVA (multivariate analysis of variance) and sum of squared score (SSU) tests (Pan, 2009) for individual-level data. Ray et al. (2016) demonstrated that MANOVA may lose significant power when the genetic marker is associated with all the traits, and any test statistic, such as SSU, that does not include the trait correlation structure can be more powerful in such a situation. On the other hand, MANOVA is usually more powerful than other tests when a subset of the correlated traits is associated. The true underlying association scenario (which varies from one genetic marker to another) is not known, and a fixed choice of association test may not be powerful enough. metaUSAT seeks to maximize power by adaptively combining the MANOVA and the SSU tests based solely on the univariate summary statistics. Although both metaMANOVA (the MANOVA test based on summary statistics) and the SSU tests are chi-squared distributed, metaUSAT does not have a closed form null distribution. However, it does not require computationally intensive permutations to evaluate p-values; instead, we calculate an approximate p-value using a fast one-dimensional numerical integral. metaUSAT retains the flavor of Zhu et al.’s statistics by accommodating summary statistics for continuous and/or binary traits, correlated and/or independent, from one or more studies, which may include overlapping samples.
Using metaUSAT, one may perform meta-analysis of a single trait over multiple studies, or multiple traits over one or more studies.
Material & Methods
Model and Notation
Consider a single GWAS with data on n individuals, genotyped on p genetic variants, and measured for K traits. Let Yk be the n × 1 vector of values for the k-th trait and Y be the n × K matrix of all traits for all individuals. For a given SNP, let Xi = 0, 1 or 2 be the number of copies of the minor allele for individual i and X be the n × 1 vector of genotypes for all individuals. For simplicity, we assume there is no other covariate (note that this assumption can be relaxed easily). For the time being, we are interested in testing association between the SNP and the K correlated traits from a single study.
The usual approach is to test for association of each trait separately and report the summary statistics and the p-values for each trait based on the marginal/univariate model
$$Y_k = \beta_{0k}\mathbf{1} + X\beta_k + \varepsilon_k \qquad \text{(Equation 1)}$$
for continuous traits, or marginal model
$$\operatorname{logit} P(Y_{ik} = 1 \mid X_i) = \beta_{0k} + X_i\beta_k \qquad \text{(Equation 2)}$$
for binary traits. For the k-th trait, βk is the genetic effect and our null hypothesis is H0,k : βk = 0. The Wald test statistic for H0,k is Zk = β̂k/se(β̂k), where β̂k is the maximum likelihood estimate (MLE) of βk and se(β̂k) is its standard error. Under H0,k, Zk has an asymptotic N(0, 1) distribution. However, for the k-th and l-th traits, Zk and Zl are not independently distributed if the trait correlation is non-zero. In fact, one can show that corr(Zk, Zl) ≈ corr(Yk, Yl) when the variability in the estimators of βk and βl is ignored (Zhu et al., 2015; Kim et al., 2015).
To test the global null hypothesis of no association with any trait H0 : β1 = ... = βK = 0, one can use the summary statistics Z = (Z1, ...,ZK)′. Under H0, Z has an asymptotic multivariate normal distribution with mean 0 and covariance matrix R, where R is the K × K correlation matrix of the original traits. Details on estimating R are provided in a later subsection.
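As a small illustration of the univariate summary statistic described above, the following sketch computes a Wald statistic Z = β̂/se(β̂) for one continuous trait with a single SNP and no covariates, together with its two-sided p-value from the asymptotic N(0, 1) null. The function names `wald_z` and `two_sided_p` are hypothetical and are not taken from any published software; in practice these statistics come from standard GWAS tools.

```python
import math

def wald_z(x, y):
    """Wald statistic beta_hat / se(beta_hat) for the model y = b0 + b1 * x + error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx                      # least-squares slope (the MLE under normal errors)
    intercept = my - beta * mx
    rss = sum((yi - intercept - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)   # residual variance estimated with n - 2 df
    return beta / se

def two_sided_p(z):
    """Two-sided p-value of z under the standard normal null distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

Stacking the K trait-specific values returned by `wald_z` for one SNP gives the vector Z used by all the multivariate tests discussed in this section.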
Existing Methods
Here we describe how summary statistics of the K traits for a given SNP can be used to test H0. Later, we describe how these methods can be used to conduct meta-analysis using summary statistics from multiple GWAS.
minP
The minimum p-value (minP) approach selects the most significant result among the K single trait association tests using the test statistic
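$$T_{\text{minP}} = \min_{1 \le k \le K} p_k, \qquad \text{(Equation 3)}$$
where pk is the two-sided p-value of Zk. Its asymptotic p-value accounting for correlated Z statistics (Conneely and Boehnke, 2007) is given by
$$p_{\text{minP}} = 1 - \int_{-q}^{q} \cdots \int_{-q}^{q} f_Z(z)\, dz, \qquad q = \Phi^{-1}\left(1 - t_{\text{minP}}/2\right),$$
where fZ(.) is the multivariate NK(0, R̂) density of Z, R̂ is the estimate of R, tminP is the observed minP statistic and Φ is the standard normal distribution function. Computation of pminP requires numerical integration, which can be implemented in R using pmvnorm() in the mvtnorm package (Genz et al., 2016).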
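The minP p-value is a rectangle probability under a correlated multivariate normal distribution. The sketch below approximates it by Monte Carlo simulation rather than by the numerical integration used with pmvnorm(), which is a substitution made here purely for illustration. It relies on the fact that the smallest of the K two-sided p-values corresponds to the largest |Zk|. The names `cholesky` and `minp_pvalue_mc` are hypothetical.

```python
import math
import random

def cholesky(R):
    """Lower-triangular L with L L' = R, for a positive-definite matrix R."""
    K = len(R)
    L = [[0.0] * K for _ in range(K)]
    for i in range(K):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(R[i][i] - s)
            else:
                L[i][j] = (R[i][j] - s) / L[j][j]
    return L

def minp_pvalue_mc(z, R, n_sim=20000, seed=1):
    """Monte Carlo estimate of P(max_k |Z_k| >= max_k |z_k|) when Z ~ N(0, R)."""
    rng = random.Random(seed)
    L = cholesky(R)
    K = len(z)
    t_obs = max(abs(v) for v in z)
    hits = 0
    for _ in range(n_sim):
        e = [rng.gauss(0, 1) for _ in range(K)]
        sim = [sum(L[i][j] * e[j] for j in range(i + 1)) for i in range(K)]  # Z = L e
        if max(abs(v) for v in sim) >= t_obs:
            hits += 1
    return (hits + 1) / (n_sim + 1)   # add-one correction avoids a p-value of exactly 0
```

Monte Carlo is far too slow for genome-wide thresholds such as 5 × 10⁻⁸, which is one reason an analytic or numerically integrated p-value is preferred in practice.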
metaMANOVA
An alternative is to carry out a joint analysis of all the Z statistics using a test similar to the multivariate score:
$$T_{\text{metaMANOVA}} = Z' \hat{R}^{-1} Z \;\sim\; \chi^2_K \ \text{ under } H_0. \qquad \text{(Equation 4)}$$
We will call this test metaMANOVA because of its similarity to the MANOVA statistic in the context of testing multiple trait association with a SNP using individual-level data (Ray et al., 2016). Although multiple authors (Bolormaa et al., 2014; Pausch et al., 2016; He et al., 2016) employed this approach, metaMANOVA’s type I error and power have not been explored previously for stringent significance levels.
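A minimal sketch of this test, assuming the statistic takes the usual quadratic form Z′R̂⁻¹Z with an asymptotic chi-squared null distribution on K degrees of freedom. The helper names `solve`, `chi2_sf` and `meta_manova` are hypothetical, and the chi-squared tail is computed for integer degrees of freedom only.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def chi2_sf(x, df):
    """P(chi2_df >= x) for integer df, using Q(a+1, y) = Q(a, y) + y^a e^-y / Gamma(a+1)."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def meta_manova(z, R):
    """Return (Z' R^{-1} Z, asymptotic p-value on len(z) degrees of freedom)."""
    u = solve(R, z)                           # u = R^{-1} z
    t = sum(zi * ui for zi, ui in zip(z, u))
    return t, chi2_sf(t, len(z))
```

With two positively correlated traits (correlation 0.5), Z = (2, −2) gives a statistic of 16 while Z = (2, 2) gives only 16/3, which illustrates why a test that uses the correlation structure can lose power when all traits are affected in the same direction.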
SHom and SHet
Zhu et al. (2015) proposed a meta-analysis test SHom (similar to O’Brien (1984)’s test for individual-level data):
(Equation 5) |
where W is a diagonal matrix of weights for the Z-statistics. Zhu et al. (2015) took sample sizes for the weights. SHom achieves maximum power when the genetic effects for all traits are equal and in the same direction. Zhu et al. proposed a second statistic Sτ, which seeks to include only Z statistics corresponding to traits with non-zero genetic effects: , where, for a given τ > 0, Zτ is the sub-vector of Z satisfying |Zk| > τ and the sub-matrices Wτ, R̂τ, 1τ are defined similarly. For large enough τ, it is possible to have all |Zk| < τ. In this scenario, set Sτ = 0. Zhu et al. define the test statistic
$$S_{\text{Het}} = \max_{\tau > 0} S_\tau. \qquad \text{(Equation 6)}$$
The null distribution for SHet can be approximated by a gamma distribution and p-value estimated using simulations in Zhu et al.’s R program CPASSOC.
aSPU and SSU
Kim et al. (2015) defined the Sum of Powered Score (SPU) test as SPU(γ) = Σk Zk^γ (summing over the K traits), where γ is a positive integer. They constructed multiple SPU(γ) tests, with γ values 1, 2, ..., 8 or ∞, that put more weight on traits with larger Z statistics as γ increases. Kim et al. showed that SPU(1) = SHom. SPU(2), also known as the Sum of Squared Score (SSU) statistic, is approximately distributed as aχ²d + b under H0, where a, b, d can be estimated from R̂ (Pan, 2009). The aSPU test adaptively selects the SPU test with minimum p-value. The SPU(γ) statistics for γ > 2, and hence the aSPU statistic, do not have closed form null distributions and require Monte Carlo simulations to estimate p-values.
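To make the role of γ concrete, the sketch below evaluates the SPU statistics for one vector of Z statistics, assuming SPU(γ) is the sum of the γ-th powers of the Zk and taking SPU(∞) as the largest |Zk|. The function name `spu` and the example values are hypothetical; p-values are not computed here because, for γ > 2, they would require Monte Carlo simulation.

```python
import math

def spu(z, gamma):
    """Sum of powered scores; gamma = math.inf gives the largest absolute Z statistic."""
    if gamma == math.inf:
        return max(abs(v) for v in z)
    return sum(v ** gamma for v in z)

# Illustrative Z statistics for K = 4 traits (made-up values).
z_example = [2.5, -0.3, 1.1, 0.4]
spu_values = {g: spu(z_example, g) for g in (1, 2, 3, 4, math.inf)}
ssu = spu_values[2]   # SPU(2) is the SSU statistic Z'Z
```

Odd γ keeps the signs of the Zk, so effects in opposite directions cancel, whereas even γ and γ = ∞ depend only on magnitudes.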
Proposed Method: metaUSAT
In the presence of individual-level data, Ray et al. (2016) proposed a unified score-based association test (USAT) to analyze association of multiple traits with a single SNP. USAT seeks to maximize power by adaptively combining SSU (well suited to scenarios when most or all traits have non-zero genetic effects) and MANOVA (well suited to most scenarios unless most or all traits are associated). Here, we propose metaUSAT, a meta-analysis version of USAT, that can be calculated using univariate summary statistics. We consider the weighted statistic Tω = ωTmetaMANOVA + (1 − ω)TSSU, ω ∈ [0, 1], where TSSU = Z′Z is the SSU test statistic. Since TmetaMANOVA and TSSU have asymptotic chi-square distributions under H0, for a given weight ω, Tω is approximately distributed as a linear combination of (potentially dependent) chi-squared variables. The p-value pω of Tω can be calculated using many algorithms (e.g., Davies, 1980; Liu et al., 2009). We define metaUSAT as the weighted combination with the most significant p-value:
$$T_{\text{metaUSAT}} = \min_{0 \le \omega \le 1} p_\omega. \qquad \text{(Equation 7)}$$
We consider a grid of 11 equi-spaced values of ω from 0 to 1, and approximate the corresponding p-value using a fast one-dimensional numerical integral (see Supplementary S1).
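The adaptive construction can be sketched as follows. This is not the authors' implementation: the analytic p-value of Tω and the one-dimensional numerical integral of Supplementary S1 are replaced here by plain Monte Carlo simulation under Z ~ N(0, R̂), which is only practical for illustration at moderate significance levels. The names `meta_usat_mc`, `upper_tail`, `forward_solve` and `cholesky` are hypothetical.

```python
import bisect
import math
import random

OMEGAS = [i / 10 for i in range(11)]   # 11 equally spaced weights from 0 to 1

def cholesky(R):
    """Lower-triangular L with L L' = R."""
    K = len(R)
    L = [[0.0] * K for _ in range(K)]
    for i in range(K):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(R[i][i] - s) if i == j else (R[i][j] - s) / L[j][j]
    return L

def forward_solve(L, z):
    """Solve L x = z for lower-triangular L, so that x'x = z' R^{-1} z."""
    x = []
    for i in range(len(z)):
        x.append((z[i] - sum(L[i][j] * x[j] for j in range(i))) / L[i][i])
    return x

def upper_tail(sorted_vals, t):
    """Fraction of values in sorted_vals that are >= t."""
    return (len(sorted_vals) - bisect.bisect_left(sorted_vals, t)) / len(sorted_vals)

def meta_usat_mc(z, R, n_sim=5000, seed=1):
    """Return (min_omega p_omega, minimizing omega, Monte Carlo p-value of that minimum)."""
    rng = random.Random(seed)
    L = cholesky(R)
    K = len(z)
    manova_obs = sum(v * v for v in forward_solve(L, z))   # Z' R^{-1} Z
    ssu_obs = sum(v * v for v in z)                         # Z' Z
    null = []                                               # (metaMANOVA, SSU) pairs under H0
    for _ in range(n_sim):
        e = [rng.gauss(0, 1) for _ in range(K)]
        zs = [sum(L[i][j] * e[j] for j in range(i + 1)) for i in range(K)]
        null.append((sum(v * v for v in e), sum(v * v for v in zs)))
    p_obs, null_min_p = [], [1.0] * n_sim
    for w in OMEGAS:
        t_null = [w * m + (1 - w) * s for m, s in null]
        t_sorted = sorted(t_null)
        p_obs.append(upper_tail(t_sorted, w * manova_obs + (1 - w) * ssu_obs))
        # Track, for each null draw, its own smallest p-value across the grid.
        null_min_p = [min(q, upper_tail(t_sorted, t)) for q, t in zip(null_min_p, t_null)]
    t_usat = min(p_obs)
    p_usat = (sum(q <= t_usat for q in null_min_p) + 1) / (n_sim + 1)
    return t_usat, OMEGAS[p_obs.index(t_usat)], p_usat
```

Because the minimum is taken over eleven correlated p-values, the final p-value is calibrated against the null distribution of that minimum rather than read off directly; ω = 0 recovers SSU and ω = 1 recovers metaMANOVA.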
Estimation of R and its Effect on metaUSAT
To estimate the trait correlation matrix R, we use the Z-statistics of the SNPs which are not associated with any of the K traits (i.e., SNPs with p-values greater than a pre-defined significance threshold, say 10⁻⁵, for any trait). Zhu et al. (2015) showed that under the null hypothesis of no association, the correlation matrix of the univariate summary statistics (obtained by calculating the sample correlation matrix R̂ of the Z’s over a large number of null SNPs) is the same as the trait correlation matrix. This result holds even in the presence of covariates in Equation 1 or Equation 2 (Liu and Lin, 2017).
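The estimation step can be sketched as below. The sketch assumes that a SNP is treated as null only if its two-sided p-value exceeds the threshold for every one of the K traits, which is one reading of the rule above. The function name `estimate_R` and the simulated input are hypothetical.

```python
import math
import random

def estimate_R(z_rows, p_threshold=1e-5):
    """Sample correlation matrix of Z statistics over SNPs that look null for all traits."""
    null_rows = [row for row in z_rows
                 if all(math.erfc(abs(z) / math.sqrt(2)) > p_threshold for z in row)]
    n, K = len(null_rows), len(null_rows[0])
    means = [sum(r[k] for r in null_rows) / n for k in range(K)]
    cov = [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in null_rows) / (n - 1)
            for b in range(K)] for a in range(K)]
    return [[cov[a][b] / math.sqrt(cov[a][a] * cov[b][b]) for b in range(K)]
            for a in range(K)]

# Simulated null Z statistics for K = 2 traits with true correlation 0.5.
rng = random.Random(0)
rows = []
for _ in range(5000):
    e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append([e1, 0.5 * e1 + math.sqrt(0.75) * e2])
rows.append([10.0, 10.0])   # a strongly associated SNP, dropped by the filter
R_hat = estimate_R(rows)
```

In a real analysis the rows would be the genome-wide Z statistics for each trait, ideally thinned to reduce the influence of linkage disequilibrium; that thinning is a suggestion made here, not a step stated in the text.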
It is noteworthy that the performance of metaUSAT and the other afore-mentioned summary statistic based tests depends on the estimation of R. In a GWAS, we expect most SNPs to be not associated with any trait, and these null SNPs can be conveniently used to estimate R. However, as pointed out by one reviewer, recent evidence from heavily studied complex traits such as height and schizophrenia seems to suggest that these traits are highly polygenic. Consequently, a large portion of the genome in linkage disequilibrium (LD) with the causal variants is also associated with the traits. For the joint analysis of such highly polygenic complex traits using summary statistics, the relation corr(Zk, Zl) ≈ corr(Yk, Yl) may not be valid and the estimate of R will be affected. The extent to which this misspecified R affects the validity of the tests depends on the strength of association (of the non-null SNPs used to estimate R) as well as on the structure of the test statistic. Our simulation experiments (see Supplementary S4) show that if non-null SNPs with low to moderate strengths of association are used to estimate R, the type I error estimates for metaUSAT and minP are largely unaffected while SHom, SHet and metaMANOVA may be heavily affected. It seems to us that test statistics that directly incorporate R (e.g., metaMANOVA) are heavily affected by its misspecification while test statistics incorporating R indirectly only through its null distribution (e.g., minP) are mostly unaffected. The validity of metaUSAT (a data-adaptive minimum p-value approach) is largely unaffected by misspecified estimate of R arising due to polygenicity of traits. It is important to mention that our conclusion is based on a limited simulation experiment. It is beyond the scope of this paper to explore this aspect in more detail.
Extension to Meta-analysis of Multiple GWAS
Consider summary statistics Zjk for association with a given SNP for trait k (k = 1, 2, ...,K) from study j (j = 1, 2, ..., J). Some or all J studies may or may not have overlapping samples. Let Zj be the vector of K summary statistics for study j, Z be the JK×1 vector of summary statistics from all traits across all studies, and β be the corresponding JK×1 vector of effect sizes. We wish to test H0 : β = 0 against the two-sided alternative that at least one of the traits has non-zero genetic effect in at least one of the studies.
For the k-th and l-th traits from two studies j and j′, Lin and Sullivan (2009) showed that corr(Zjk, Zj′l) ≈ (njj′,kl / √(njk nj′l)) corr(Yk, Yl), where njj′,kl is the number of overlapping samples, and njk and nj′l are the sample sizes in the two studies. When the studies are independent (njj′,kl = 0), summary statistics from the two studies are uncorrelated.
For the perfect overlap scenario (njj′,kl = njk = nj′l), the correlation of summary statistics is approximately the same as the correlation of the traits (same as that of a single study with multiple traits). We estimate the JK × JK correlation matrix R from the JK Z-statistics for the SNPs that do not exceed a pre-defined significance threshold (say, p-value = 10⁻⁵) for any trait. The formulation of the Z statistic and the estimation of its correlation in this fashion addresses cryptic relatedness arising from overlapping samples in the studies (Zhu et al., 2015; Kim et al., 2015). Once Z and R are defined, we can use any of the existing methods and metaUSAT.
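The effect of sample overlap on the correlation of summary statistics can be illustrated with the sketch below. It assumes the relation takes the form corr(Zjk, Zj′l) ≈ njj′,kl / √(njk nj′l) × corr(Yk, Yl), and, for simplicity, that each study has the same sample size for all its traits. The function names `z_correlation` and `stacked_correlation` are hypothetical; the paper estimates the JK × JK matrix empirically from null SNPs rather than from this formula.

```python
import math

def z_correlation(n_overlap, n_jk, n_jl, trait_corr):
    """Approximate correlation of two Z statistics that share n_overlap samples."""
    return n_overlap / math.sqrt(n_jk * n_jl) * trait_corr

def stacked_correlation(trait_R, sizes, overlap):
    """JK x JK correlation matrix of Z statistics for J studies and K traits.

    sizes[j] is the sample size of study j; overlap[j][h] is the number of
    individuals shared by studies j and h, with overlap[j][j] == sizes[j].
    Row and column index j * K + k refers to trait k in study j.
    """
    J, K = len(sizes), len(trait_R)
    full = [[0.0] * (J * K) for _ in range(J * K)]
    for j in range(J):
        for h in range(J):
            for k in range(K):
                for l in range(K):
                    full[j * K + k][h * K + l] = z_correlation(
                        overlap[j][h], sizes[j], sizes[h], trait_R[k][l])
    return full

# Two studies of 1,000 individuals sharing 200, two traits with correlation 0.5.
example = stacked_correlation([[1.0, 0.5], [0.5, 1.0]], [1000, 1000],
                              [[1000, 200], [200, 1000]])
```

Within a study the block reduces to the trait correlation matrix, and with no shared individuals the off-diagonal blocks are zero, matching the two limiting cases described in the text.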
When meta-analyzing across studies, different studies may have varying sample sizes. Since sample sizes may vary widely across traits and/or studies, we suggest weighting the univariate summary statistics by the corresponding sample sizes. If njk is the sample size for trait k in study j, we use weighted statistics to put more weights on statistics coming from larger studies. Note that this weighting scheme is incorporated in SHom (Equation 5) and SHet (Equation 6) statistics.
Simulation Experiments
We conduct simulation experiments to assess type I error and compare power of metaUSAT and the existing methods. For type I error simulations, we consider significance levels α = 10⁻², 10⁻³, ..., 10⁻⁶, 5 × 10⁻⁷. For power simulations, we report empirical powers, based on corrected critical values, at level α = 10⁻⁴. All analyses used the estimated R based on summary statistics across null replicates.
Simulation 1: A single study
We generated a single study of n = 1,000 unrelated individuals, each measured for K = 5 or 10 traits and a bi-allelic SNP X with MAF 0.1 at Hardy-Weinberg equilibrium. For each individual, we simulate K phenotypes using a multivariate normal linear model: Y = β0 1 + Xβ + ε, where Y, 1, β and ε are K × 1 vectors, β0 = 1 and the error ε is simulated from NK(0, σ²R(ρ)). We took R(ρ) as an exchangeable correlation matrix with pair-wise correlation ρ ∈ {0.2, 0.4, 0.6}. For type I error simulations, the genetic effects β are 0 for all K traits. For power simulations, we choose the genetic effect βk for an associated trait so that the SNP explains 0.5% of the trait variance (k = 1, 2, ..., K). This, along with the MAF of the SNP, determines the genetic effect sizes (see Basu et al., 2013, ‘Simulations’). We took positive direction of the effect size for all associated traits. The total variance of an associated trait is fixed at 10, which ensures that the variance due to SNP is 0.05 while the residual variance is σ² = 9.95. We wish to test H0 : β = 0.
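The effect size implied by this design can be worked out directly: under Hardy-Weinberg equilibrium Var(X) = 2 × MAF × (1 − MAF) = 0.18, so β² × 0.18 = 0.05 gives β ≈ 0.527. The sketch below encodes that arithmetic and one possible way to draw a single individual; generating the exchangeable errors through a shared normal factor is a choice made here for brevity, and the function names are hypothetical.

```python
import math
import random

def effect_size(maf=0.1, total_var=10.0, frac_explained=0.005):
    """Genetic effect such that the SNP explains frac_explained of the trait variance."""
    snp_var = 2 * maf * (1 - maf)                    # Var(X) under Hardy-Weinberg equilibrium
    return math.sqrt(frac_explained * total_var / snp_var)

def simulate_individual(rng, beta, maf=0.1, rho=0.4, sigma2=9.95, beta0=1.0):
    """Draw one genotype and one trait vector with exchangeable error correlation rho."""
    x = sum(rng.random() < maf for _ in range(2))    # minor-allele count: 0, 1 or 2
    shared = rng.gauss(0, 1)                         # factor shared by all traits
    y = [beta0 + x * b
         + math.sqrt(sigma2) * (math.sqrt(rho) * shared
                                + math.sqrt(1 - rho) * rng.gauss(0, 1))
         for b in beta]
    return x, y

beta_assoc = effect_size()                           # about 0.527
rng = random.Random(1)
x, y = simulate_individual(rng, [beta_assoc, beta_assoc, 0.0, 0.0, 0.0])
```

Setting an entry of `beta` to zero makes the corresponding trait null, so the same function covers both the type I error and the power settings.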
Based on 10⁸ null datasets, we estimate type I error of SHom, SHet, minP, metaMANOVA and metaUSAT as the proportion of null datasets that give p-value ≤ α. Our literature search did not yield any article where type I errors of all these summary-statistic-based multivariate methods are studied at a level as low as 5 × 10⁻⁷. We do not consider aSPU for type I error analysis because it requires Monte Carlo simulations, making calculations for 10⁸ datasets computationally undesirable. For comparing statistical powers of all methods (including aSPU), we simulate 10⁴ non-null datasets assuming 20% to 100% of the traits are positively associated with the SNP. To avoid clutter, we are not including SSU (a special case of aSPU) in any of these comparisons.
Simulation 2: Two independent studies
We consider two independent studies of 1,000 independent individuals, each with measurements on a single SNP with MAF 0.1 and 4 traits inspired by the METSIM lipids data on total cholesterol (TC), high-density lipoprotein (HDL), low-density lipoprotein (LDL) and triglycerides (TG). We use the trait correlation matrix Rmetsim (Figure S1(a)) to simulate the 4 traits using the model described in Simulation 1. We consider 5 association scenarios: (i) only TC is associated, (ii) TC and LDL are associated, (iii) TC, LDL and TG are associated, (iv) all 4 traits are associated, and (v) none of the traits is associated. As before, the SNP explains 0.5% of the trait variance when associated. We assume TC, LDL and TG have negative genetic effects while HDL has positive effect when associated. We simulate two study types: “homogeneous” and “heterogeneous”. For “homogeneous” studies, the association pattern of the traits is the same across both studies. For “heterogeneous” studies, we assume association scenarios (i)-(iv) in the first study while the traits are not associated (scenario (v)) in the second study. Figure S1(c) shows the estimated correlation matrix. For type I error analysis, we assume scenario (v) for both studies and simulate 10⁷ null datasets.
Simulation 3: Two studies with overlapping samples
We keep everything the same as in Simulation 2 except that the two studies now have 200 overlapping individuals. For “homogeneous” studies, we assume the association pattern is the same across the two studies. For “heterogeneous” studies, excluding the overlap, we assume the association scenarios (i)-(iv) in one study while the traits are not associated (scenario (v)) in the other study. For individuals common to both studies, we assume scenario (v). Figure S1(d) shows the estimated correlation matrix, which is similar to the correlation structure of lipid traits from the METSIM and T2D-GENES studies (Figure S1(b)).
Application to Lipids Data
METSIM Study
The METSIM Study is a single-site, longitudinal study of 10,197 men (aged 45-73 years) randomly selected from the population of Kuopio, Finland (Stančáková et al., 2009). Participants were genotyped with the Illumina OmniExpress GWAS chip and the Illumina exome chip. Here we focus on the association statistics of four lipid traits from the first visit: TC, HDL, LDL, TG. Before obtaining the summary statistics, individuals on lipid-lowering medication are removed and TG is log-transformed. The traits are then regressed on age and age², and residuals are inverse-normalized. We focus on 622,950 autosomal SNPs with MAF ≥ 1%. We used a kinship matrix in the mixed model framework of EMMAX (Kang et al., 2010) to account for within-ancestry population structure and relatedness.
T2D-GENES Study
The T2D-GENES consortium carried out exome sequencing on 6,504 T2D cases and 6,436 controls from five ancestry groups (Fuchsberger et al., 2016). Here, we consider the 4,541 individuals of European origin, 983 of whom are part of the METSIM study sample. As before, we focus on the four lipid traits. Exclusions, transformations and analysis parallel those for the METSIM lipid traits. Here, we also adjusted for sex as a covariate.
Results
Simulation 1: A single study
The type I error estimates of metaUSAT and other methods are presented in Table 1. Regardless of the number of traits and the strength of trait correlations, all methods control type I error for moderate levels (α ≥ 10⁻⁴). For more stringent levels, we observe slightly inflated type I errors for all methods except SHom. The inflation seems to increase with the number of traits. We note that the type I error of metaUSAT is worst at α = 5 × 10⁻⁷; in what follows we correct for this by computing power using an empirical threshold. The empirical threshold is based on 10⁵ null replicates.
Table 1.
Level α | Method | K = 5, ρ = 0.2 | K = 5, ρ = 0.4 | K = 5, ρ = 0.6 | K = 10, ρ = 0.2 | K = 10, ρ = 0.4 | K = 10, ρ = 0.6 |
---|---|---|---|---|---|---|---|
10⁻² | SHom | 0.95 [0.95, 0.95] | 1.00 [1.00, 1.00] | 1.03 [1.03, 1.03] | 1.00 [1.00, 1.00] | 1.07 [1.07, 1.07] | 1.01 [1.01, 1.01] |
10⁻² | SHet | 1.05 [1.05, 1.05] | 1.05 [1.04, 1.05] | 0.98 [0.98, 0.98] | 1.01 [1.01, 1.01] | 1.06 [1.06, 1.06] | 1.02 [1.01, 1.02] |
10⁻² | minP | 1.03 [1.03, 1.03] | 1.03 [1.03, 1.03] | 1.02 [1.02, 1.02] | 1.04 [1.04, 1.04] | 1.03 [1.03, 1.03] | 1.03 [1.03, 1.03] |
10⁻² | metaMANOVA | 1.05 [1.05, 1.05] | 1.05 [1.04, 1.05] | 0.98 [0.98, 0.98] | 1.04 [1.04, 1.04] | 0.97 [0.97, 0.97] | 1.06 [1.06, 1.06] |
10⁻² | metaUSAT | 0.82 [0.82, 0.82] | 0.92 [0.92, 0.92] | 0.93 [0.93, 0.93] | 0.93 [0.93, 0.93] | 1.01 [1.01, 1.01] | 1.06 [1.06, 1.06] |
10⁻³ | SHom | 0.93 [0.93, 0.93] | 1.00 [1.00, 1.00] | 1.03 [1.03, 1.03] | 1.00 [1.00, 1.00] | 1.11 [1.11, 1.12] | 1.02 [1.02, 1.03] |
10⁻³ | SHet | 1.11 [1.10, 1.11] | 1.12 [1.11, 1.12] | 1.02 [1.02, 1.02] | 1.01 [1.01, 1.02] | 1.10 [1.09, 1.10] | 1.02 [1.02, 1.03] |
10⁻³ | minP | 1.05 [1.05, 1.06] | 1.05 [1.05, 1.06] | 1.05 [1.05, 1.06] | 1.07 [1.07, 1.08] | 1.06 [1.06, 1.06] | 1.07 [1.07, 1.07] |
10⁻³ | metaMANOVA | 1.09 [1.09, 1.10] | 1.08 [1.08, 1.08] | 0.99 [0.99, 1.00] | 1.08 [1.07, 1.08] | 0.98 [0.98, 0.98] | 1.11 [1.11, 1.12] |
10⁻³ | metaUSAT | 0.90 [0.90, 0.90] | 1.00 [1.00, 1.00] | 1.02 [1.01, 1.02] | 1.05 [1.05, 1.05] | 1.13 [1.12, 1.13] | 1.19 [1.18, 1.19] |
10⁻⁴ | SHom | 0.90 [0.89, 0.91] | 1.00 [1.00, 1.00] | 1.09 [1.08, 1.10] | 1.01 [1.00, 1.02] | 1.17 [1.16, 1.18] | 1.04 [1.03, 1.05] |
10⁻⁴ | SHet | 1.18 [1.17, 1.19] | 1.22 [1.21, 1.23] | 1.09 [1.08, 1.10] | 1.04 [1.03, 1.05] | 1.14 [1.13, 1.15] | 1.05 [1.04, 1.06] |
10⁻⁴ | minP | 1.10 [1.09, 1.11] | 1.10 [1.09, 1.11] | 1.13 [1.12, 1.14] | 1.13 [1.12, 1.14] | 1.11 [1.10, 1.12] | 1.13 [1.12, 1.14] |
10⁻⁴ | metaMANOVA | 1.14 [1.13, 1.15] | 1.13 [1.12, 1.14] | 1.03 [1.02, 1.04] | 1.11 [1.10, 1.12] | 0.99 [0.98, 1.00] | 1.17 [1.16, 1.18] |
10⁻⁴ | metaUSAT | 1.21 [1.20, 1.22] | 1.29 [1.28, 1.30] | 1.18 [1.17, 1.19] | 1.59 [1.58, 1.61] | 1.49 [1.48, 1.51] | 1.39 [1.38, 1.40] |
10⁻⁵ | SHom | 0.88 [0.85, 0.91] | 1.00 [0.97, 1.03] | 1.12 [1.09, 1.15] | 1.07 [1.03, 1.10] | 1.28 [1.25, 1.32] | 1.11 [1.08, 1.14] |
10⁻⁵ | SHet | 1.31 [1.27, 1.34] | 1.36 [1.32, 1.40] | 1.17 [1.14, 1.21] | 1.12 [1.09, 1.16] | 1.27 [1.23, 1.31] | 1.12 [1.09, 1.16] |
10⁻⁵ | minP | 1.14 [1.10, 1.17] | 1.16 [1.12, 1.19] | 1.30 [1.27, 1.34] | 1.23 [1.20, 1.27] | 1.13 [1.09, 1.16] | 1.25 [1.21, 1.28] |
10⁻⁵ | metaMANOVA | 1.17 [1.14, 1.20] | 1.16 [1.12, 1.19] | 1.03 [1.00, 1.06] | 1.19 [1.16, 1.23] | 1.03 [1.00, 1.06] | 1.31 [1.27, 1.34] |
10⁻⁵ | metaUSAT | 1.38 [1.35, 1.42] | 1.45 [1.41, 1.49] | 1.28 [1.25, 1.45] | 1.92 [1.88, 1.97] | 1.74 [1.70, 1.79] | 1.58 [1.54, 1.62] |
10⁻⁶ | SHom | 0.80 [0.71, 0.89] | 0.92 [0.82, 1.02] | 1.11 [1.00, 1.22] | 1.12 [1.01, 1.23] | 1.49 [1.37, 1.61] | 1.26 [1.15, 1.38] |
10⁻⁶ | SHet | 1.44 [1.32, 1.56] | 1.43 [1.31, 1.55] | 1.26 [1.15, 1.37] | 1.21 [1.10, 1.32] | 1.47 [1.35, 1.59] | 1.26 [1.15, 1.38] |
10⁻⁶ | minP | 1.31 [1.20, 1.42] | 1.27 [1.16, 1.38] | 1.64 [1.51, 1.77] | 1.30 [1.19, 1.41] | 1.16 [1.05, 1.27] | 1.45 [1.33, 1.57] |
10⁻⁶ | metaMANOVA | 1.19 [1.08, 1.30] | 1.10 [0.99, 1.21] | 1.00 [0.90, 1.10] | 1.32 [1.21, 1.43] | 1.23 [1.12, 1.34] | 1.54 [1.42, 1.66] |
10⁻⁶ | metaUSAT | 1.46 [1.34, 1.58] | 1.48 [1.36, 1.60] | 1.33 [1.22, 1.45] | 2.38 [2.23, 2.53] | 2.21 [2.06, 2.36] | 2.08 [1.94, 2.22] |
5 × 10⁻⁷ | SHom | 0.72 [0.60, 0.84] | 0.82 [0.69, 0.95] | 1.00 [0.86, 1.14] | 1.20 [1.05, 1.35] | 1.50 [1.33, 1.67] | 1.25 [1.09, 1.41] |
5 × 10⁻⁷ | SHet | 1.30 [1.14, 1.46] | 1.42 [1.25, 1.59] | 1.32 [1.16, 1.48] | 1.30 [1.14, 1.46] | 1.38 [1.21, 1.55] | 1.17 [1.02, 1.32] |
5 × 10⁻⁷ | minP | 1.46 [1.29, 1.63] | 1.30 [1.14, 1.46] | 1.90 [1.71, 2.10] | 1.42 [1.25, 1.59] | 1.08 [0.93, 1.23] | 1.64 [1.46, 1.82] |
5 × 10⁻⁷ | metaMANOVA | 1.20 [1.05, 1.36] | 1.25 [1.09, 1.41] | 1.15 [1.00, 1.30] | 1.32 [1.16, 1.48] | 1.20 [1.05, 1.35] | 1.62 [1.44, 1.80] |
5 × 10⁻⁷ | metaUSAT | 1.54 [1.36, 1.72] | 1.74 [1.55, 1.92] | 1.54 [1.36, 1.71] | 2.52 [2.30, 2.74] | 2.42 [2.20, 2.64] | 2.18 [1.97, 2.39] |
Figure 1 summarizes the empirical powers (based on corrected critical values) of all methods. We observe that as correlation becomes stronger and the number of associated traits increases, SHom, minP and aSPU lose power in most association scenarios. SHet is dominated by metaMANOVA, which is usually most powerful. However, metaMANOVA loses power considerably as the proportion of associated traits increases. This loss of power is the same phenomenon that Ray et al. (2016) observed, and explained, for MANOVA when analyzing individual-level data. When most or all of the traits are associated, aSPU and SHom are quite powerful. Irrespective of the number of associated traits and the strength of correlation, metaUSAT, being data-adaptive, has near optimal power to detect association in all scenarios. Results for a marker with MAF 0.5 (not shown) are qualitatively similar. Apart from exchangeable correlation, we also consider an AR1(ρ) correlation structure (auto-regressive correlation matrix of order 1 with parameter ρ) and, as before, we find metaUSAT’s power to be robust across association scenarios (Figure S2).
Simulation 2: Two independent studies
The estimated correlation matrix, based on 5,000 null summary statistics, is given in Figure S1(c). Figures S1(a) and S1(c) show that trait correlations can be approximated by the correlations of summary statistics. Type I error estimates (Table S1) indicate all methods control type I error for low error levels. Table 2 suggests SHet, metaMANOVA, and metaUSAT are usually most powerful. metaMANOVA and metaUSAT have similar powers. SHom and minP are least powerful in most cases. aSPU is least powerful when a small proportion of traits is associated. Results for MAF 0.5 (not shown) are qualitatively similar. We also conducted this power comparison for binary traits and found metaUSAT to be robust across association scenarios (Table S4).
Table 2.
Study type | No. of traits associated | SHom | SHet | minP | aSPU | metaMANOVA | metaUSAT |
---|---|---|---|---|---|---|---|
Homogeneous | 1 | 0.999 | 0.923 | 0.034 | 0.009 | 1.000 | 1.000 |
Homogeneous | 2 | 0.000 | 0.111 | 0.046 | 0.082 | 0.151 | 0.133 |
Homogeneous | 3 | 0.306 | 0.254 | 0.078 | 0.364 | 0.250 | 0.285 |
Homogeneous | 4 | 0.009 | 0.725 | 0.100 | 0.387 | 0.665 | 0.632 |
Heterogeneous | 1 | 0.357 | 0.661 | 0.019 | 0.004 | 0.995 | 0.992 |
Heterogeneous | 2 | 0.000 | 0.016 | 0.024 | 0.019 | 0.023 | 0.020 |
Heterogeneous | 3 | 0.017 | 0.036 | 0.035 | 0.044 | 0.038 | 0.045 |
Heterogeneous | 4 | 0.001 | 0.194 | 0.050 | 0.068 | 0.159 | 0.141 |
Simulation 3: Two studies with overlapping samples
Type I error estimates (Table S2) are as expected from the earlier type I error analyses. Empirical powers (Table 3) of the methods in the presence of overlapping samples are similar to the simulation without shared individuals (Table 2). We observed similar conclusions when this power comparison is conducted for binary traits (Table S5).
Table 3.
Study type | No. of traits associated | SHom | SHet | minP | aSPU | metaMANOVA | metaUSAT |
---|---|---|---|---|---|---|---|
Homogeneous | 1 | 0.984 | 0.852 | 0.034 | 0.018 | 1.000 | 1.000 |
Homogeneous | 2 | 0.000 | 0.048 | 0.051 | 0.099 | 0.083 | 0.093 |
Homogeneous | 3 | 0.244 | 0.127 | 0.077 | 0.300 | 0.146 | 0.216 |
Homogeneous | 4 | 0.005 | 0.485 | 0.103 | 0.435 | 0.456 | 0.472 |
Heterogeneous | 1 | 0.101 | 0.495 | 0.006 | 0.001 | 0.865 | 0.807 |
Heterogeneous | 2 | 0.000 | 0.004 | 0.009 | 0.006 | 0.007 | 0.006 |
Heterogeneous | 3 | 0.007 | 0.009 | 0.011 | 0.012 | 0.012 | 0.012 |
Heterogeneous | 4 | 0.001 | 0.038 | 0.018 | 0.026 | 0.041 | 0.034 |
METSIM Study: Joint analysis of lipid traits
Single-trait analysis identified 118 associated variants at the 4-trait Bonferroni corrected threshold of 1.25 × 10⁻⁸ (Figure 2(a)). metaMANOVA and metaUSAT respectively identified 159 and 158 associated variants at threshold 5 × 10⁻⁸. To identify independent association signals, we grouped significant variants (with pairwise distance < 500 kb) into loci using LD r² > 0.1. Both metaMANOVA and metaUSAT identified 28 such independent loci, 27 of which (except rs3093032, a 3′-UTR variant in ICAM1 gene) are known to be associated with lipids from published literature (Table S6). Additionally, we jointly analyzed individual-level data on these lipid traits using USAT. Figure 2(b) shows concordance of p-values based on individual-level data and p-values based on summary statistics.
METSIM + T2D-GENES Studies: Meta-analysis of a single trait from studies with overlapping samples
We tested genetic associations of TC with 31,897 variants (MAF ≥ 1%) using summary statistics from the METSIM and T2D-GENES studies. metaUSAT, metaMANOVA and single-trait analyses found 12, 12 and 9 significant SNPs, respectively (Figure 3(a)). Published literature indicates that the signals identified by metaUSAT (or metaMANOVA) are known to be associated with cholesterol levels (Table S7). Figure 4(a) plots the metaUSAT p-values when overlap is present against the metaUSAT p-values when the overlapping individuals are excluded from the METSIM sample. Concordance of the p-values suggests that metaUSAT appropriately accounted for overlapping samples.
METSIM + T2D-GENES Studies: Joint meta-analysis of lipid traits from studies with overlapping samples
metaUSAT, metaMANOVA and single-trait analysis found 26, 22 and 19 significant SNPs, respectively (Figure 3(b)). metaMANOVA and metaUSAT detected more signals by borrowing information from correlated traits across studies. All of the signals found by both metaMANOVA and metaUSAT are known from previous studies to be associated with lipid levels in humans (Table S8). All the SNPs detected by metaMANOVA and by independent analysis of each trait were also identified by metaUSAT. Further, metaUSAT exclusively reports 4 significant SNPs (of which 3 are independent) that metaMANOVA fails to find (Table 4). For these SNPs, we also report the empirical p-values (calculated using 8.5 × 10⁹ Monte Carlo simulations) to ensure that these are not false associations arising from the slightly inflated type I error of metaUSAT at stringent error levels. Details of this empirical p-value calculation for metaUSAT are provided in Supplementary S2. Finally, in Figure 4(b), we again observe concordance of metaUSAT p-values with and without shared individuals.
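The general idea of a Monte Carlo empirical p-value can be sketched as below. The actual metaUSAT procedure is described in Supplementary S2 and is not reproduced here. This sketch assumes that null Z-statistics are drawn from a multivariate normal distribution with a given correlation matrix, and it uses a sum of squared Z-scores as a placeholder statistic. All function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence


def cholesky(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def draw_null_z(lower: List[List[float]], rng: random.Random) -> List[float]:
    """One draw of Z ~ N(0, R), where R = lower * lower^T."""
    e = [rng.gauss(0.0, 1.0) for _ in range(len(lower))]
    return [sum(lower[i][k] * e[k] for k in range(i + 1)) for i in range(len(lower))]


def sum_of_squares(z: Sequence[float]) -> float:
    """Placeholder test statistic (assumption): sum of squared Z-scores."""
    return sum(v * v for v in z)


def empirical_pvalue(
    observed: float,
    statistic: Callable[[Sequence[float]], float],
    corr: Sequence[Sequence[float]],
    n_sim: int,
    seed: int = 1,
) -> float:
    """Add-one Monte Carlo estimate: (exceedances + 1) / (n_sim + 1)."""
    rng = random.Random(seed)
    lower = cholesky(corr)
    exceed = sum(
        1 for _ in range(n_sim) if statistic(draw_null_z(lower, rng)) >= observed
    )
    return (exceed + 1) / (n_sim + 1)
```

With this estimator the smallest attainable p-value is 1 / (n_sim + 1), which is why resolving p-values near 10⁻⁹ calls for billions of simulations and why such a calculation is practical only for a handful of variants.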
Table 4.
rsID | chr | position | p-value (metaUSAT) | p-value (metaMANOVA) | empirical p-value (metaUSAT) | Known association result |
---|---|---|---|---|---|---|
rs2483205 | 1 | 55518316 | 2.5 × 10⁻⁸ | 1.1 × 10⁻⁷ | 3.2 × 10⁻⁸ | Lipids, Lipoprotein fractions¹ |
rs1367117† | 2 | 21263900 | 1.5 × 10⁻⁹ | 2.3 × 10⁻⁷ | 2.8 × 10⁻⁹ | Lipids, Lipoprotein fractions² |
rs2304130 | 19 | 19789528 | 1.5 × 10⁻⁹ | 1.8 × 10⁻⁶ | 3.3 × 10⁻⁹ | Lipids, Lipoprotein fractions, T2D³ |
¹ Near many known GWAS hits for lipids (Surakka et al., 2015), lipoprotein fractions (Kettunen et al., 2012) and cardiovascular endpoints (Kathiresan et al., 2009).
† Illumina OmniExpress Exome Chip ID is exm176096.
² Known GWAS hit for lipids (Teslovich et al., 2010; Willer et al., 2013; Surakka et al., 2015).
³ Known GWAS hit for lipids (Kristiansson et al., 2012; Willer et al., 2013).
Discussion
Most GWAS have focused on testing genetic association with single traits. Several recent articles have advocated the joint analysis of multiple traits to improve statistical power to detect associated genetic variants. In this article, we propose a new method for multivariate meta-analysis, metaUSAT, an extension of our multivariate association test USAT (Ray et al., 2016). For a given genetic variant, metaUSAT tests the association of multiple traits from one or more studies using univariate summary statistics. Importantly, it bypasses the need for individual-level data, which are often unavailable or difficult to obtain.
Our simulation experiments and real data analyses establish that metaUSAT is often more powerful than any of the existing tests for multivariate meta-analysis. It can be especially advantageous in detecting highly pleiotropic variants that simultaneously influence multiple traits. Apart from proposing the new method metaUSAT, we also study the power and type I error performance of metaMANOVA and other summary-statistic-based multi-trait methods at stringent error levels. metaUSAT and metaMANOVA can accurately control type I error for moderate α levels, but produce slightly inflated type I error rates at very small α levels (like the other methods). We found that metaMANOVA has a serious drawback: it may fail to detect association when most or all traits are associated (this behavior is explored in detail by Ray et al. (2016)). The joint analysis of all lipid traits using the METSIM and T2D-GENES studies further confirmed this. The power of metaMANOVA (and other multivariate tests) depends on a complex interplay of the number of truly associated traits, their correlation structure and the directions of the signals. The underlying association scenario changes from one variant to another, and is not known a priori for any real dataset. There is no uniformly most powerful multivariate test, and a particular choice of association test may not be powerful enough to detect true signals. metaUSAT, being data-adaptive in nature, is less affected by the true (unknown) association scenario, and proves to be a robust yet computationally efficient choice for investigators.
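The dependence of a MANOVA-type test on signal direction and trait correlation can be illustrated numerically. The sketch below assumes a Wald-type statistic Zᵀ R⁻¹ Z that is referred to a chi-square distribution with as many degrees of freedom as traits. That form is an assumption made for illustration, since the statistic is not restated in this section. The equicorrelated matrix, the Z-scores and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def solve(matrix: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def wald_statistic(z: Sequence[float], corr: Sequence[Sequence[float]]) -> float:
    """Assumed MANOVA-type statistic: Z^T R^{-1} Z."""
    return sum(zi * wi for zi, wi in zip(z, solve(corr, z)))


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def equicorrelated(k: int, rho: float) -> List[List[float]]:
    """k x k correlation matrix with a common off-diagonal value rho."""
    return [[1.0 if i == j else rho for j in range(k)] for i in range(k)]


corr = equicorrelated(4, 0.7)      # hypothetical trait correlation
z_all = [3.0, 3.0, 3.0, 3.0]       # all four traits associated, same direction
z_one = [3.0, 0.0, 0.0, 0.0]       # only one trait associated
for label, z in (("all traits", z_all), ("one trait", z_one)):
    stat = wald_statistic(z, corr)
    print(label, round(stat, 3), chi2_sf_even_df(stat, 4))
```

In this hypothetical setting the Wald-type statistic is smaller when all four positively correlated traits carry concordant signals than when only one trait does, even though the sum of squared Z-scores is four times larger in the former case. A data-adaptive combination of the two kinds of statistic is less sensitive to which scenario holds.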
The assumption of equal genetic effects across traits and across studies is hardly tenable, making SHom unlikely to be powerful, especially when there is a moderate to large number of traits. aSPU relies on a computationally intensive p-value calculation, which is not feasible when analyzing large GWAS data. SHet is usually dominated by metaMANOVA. On the other hand, metaUSAT is at least as powerful as metaMANOVA, and its fast p-value calculation makes it suitable for testing genetic associations across multiple traits from multiple large-scale genome-wide studies. The power of metaUSAT is robust to the proportion of associated traits. To alleviate any concern about inflated association signals from metaUSAT at stringent levels, we can calculate empirical metaUSAT p-values (as described in Supplementary S2). This need not be done for all variants; instead, we can focus only on the handful of variants whose metaUSAT p-values just cross the chosen threshold.
metaUSAT can be used in a few different ways. We can test association of one or more traits from a single study or from multiple studies, which may or may not be independent. metaUSAT does not assume homogeneity of trait effects across studies. If the studies are nearly independent and the trait effects are believed to be homogeneous across studies, we can use meta-analyzed summary statistics for each trait (e.g., Z-statistic output from METAL (Willer et al., 2010)) to perform a joint meta-analysis of multiple traits. metaUSAT also does not require independence of samples. When samples are related (e.g., in family-based GWAS), metaUSAT can use summary statistics from EMMAX (or another univariate mixed-model framework) to appropriately test for genetic associations.
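As a rough illustration of the per-trait meta-analysis step mentioned above, the sketch below combines study-level Z-statistics with weights proportional to the square root of each study's sample size, which is the sample-size-based scheme that METAL offers. It assumes independent studies, and the function name and all numbers are hypothetical. The resulting per-trait Z-statistics are what a summary-statistic multi-trait test would take as input.

```python
import math
from typing import Dict, Sequence


def combine_z(z_scores: Sequence[float], sample_sizes: Sequence[int]) -> float:
    """Sample-size-weighted Z: sum(w_i * z_i) / sqrt(sum(w_i^2)), w_i = sqrt(N_i).
    Valid under the assumption that the studies share no individuals."""
    if len(z_scores) != len(sample_sizes):
        raise ValueError("need one sample size per Z-score")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


# Hypothetical Z-statistics for one variant: two studies, four lipid traits.
study_sizes = [8000, 9000]
per_study_z: Dict[str, Sequence[float]] = {
    "TC": [2.1, 2.6],
    "LDL": [1.8, 2.9],
    "HDL": [-0.4, 0.3],
    "TG": [1.1, 0.7],
}
meta_z = {trait: combine_z(z, study_sizes) for trait, z in per_study_z.items()}
print(meta_z)
```

When the studies share individuals this weighting is no longer appropriate, and the study-level statistics should instead be analyzed jointly with the overlap accounted for.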
A potentially important contribution of metaUSAT can be in the emerging field of phenome-wide association studies (PheWAS) based on epidemiological cohorts. PheWAS systematically analyze the impact of a genetic variant on a wide variety of human traits. Restrictions on data sharing necessitate the use of meta-analysis for PheWAS (Bush et al., 2016). In this age of using publicly available data to increase power and decrease sequencing costs, overlapping samples may be a concern when it comes to meta-analysis. Furthermore, the current single-trait meta-analysis approach for PheWAS is burdened by multiple comparison testing at both the variant level and the trait level (Hebbring, 2014). We recommend using metaUSAT to overcome these challenges.
Acknowledgments
The authors thank Anne Jackson for her help in obtaining summary statistics for the METSIM study. This research was supported by NIH grants HG000376 and DK062370 (MB).
Footnotes
Supplemental Data include additional figures and tables, and can be found with this article online.
Web Resources
We implemented metaUSAT in R. The software is available on GitHub (https://github.com/RayDebashree/metaUSAT).
References
- Basu S, Zhang Y, Ray D, Miller M, Iacono W, McGue M. A rapid gene-based genome-wide association test with multivariate traits. Hum Hered. 2013;76:53–63. doi: 10.1159/000356016.
- Bolormaa S, Pryce J, Reverter A, Zhang Y, Barendse W, Kemper K, Tier B, Savin K, Hayes B, Goddard M. A multi-trait, meta-analysis for detecting pleiotropic polymorphisms for stature, fatness and reproduction in beef cattle. PLoS Genet. 2014;10:e1004198. doi: 10.1371/journal.pgen.1004198.
- Bush WS, Oetjens MT, Crawford DC. Unravelling the human genome-phenome relationship using phenome-wide association studies. Nat Rev Genet. 2016;17:129–145. doi: 10.1038/nrg.2015.36.
- Cichonska A, Rousu J, Marttinen P, Kangas AJ, Soininen P, Lehtimäki T, Raitakari OT, Järvelin MR, Salomaa V, Ala-Korpela M, et al. metaCCA: Summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis. Bioinformatics. 2016;32:1981–1989. doi: 10.1093/bioinformatics/btw052.
- Conneely KN, Boehnke M. So many correlated tests, so little time! Rapid adjustment of p values for multiple correlated tests. Am J Hum Genet. 2007;81:1158–1168. doi: 10.1086/522036.
- Davies R. Algorithm AS 155: The distribution of a linear combination of chi-square random variables. J R Stat Soc Series C Appl Stat. 1980;29:323–333.
- Diggle P, Heagerty P, Liang K-Y, Zeger SL. Analysis of longitudinal data. Oxford University Press; 2002.
- Ferreira M, Purcell S. A multivariate test of association. Bioinformatics. 2009;25:132–133. doi: 10.1093/bioinformatics/btn563.
- Fuchsberger C, Flannick J, Teslovich TM, Mahajan A, Agarwala V, Gaulton KJ, Ma C, Fontanillas P, Moutsianas L, McCarthy DJ, et al. The genetic architecture of type 2 diabetes. Nature. 2016;536:41–47. doi: 10.1038/nature18642.
- Genz A, Bretz F, Miwa T, Mi X, Leisch F, Scheipl F, Hothorn T. mvtnorm: Multivariate Normal and t Distributions. R package version 1.0-5. 2016.
- He L, Kernogitski Y, Kulminskaya I, Loika Y, Arbeev K, Loiko E, Bagley O, Duan M, Yashkin A, Ukraintseva S, et al. Pleiotropic meta-analyses of longitudinal studies discover novel genetic variants associated with age-related diseases. Front Genet. 2016;7:179. doi: 10.3389/fgene.2016.00179.
- Hebbring SJ. The challenges, advantages and future of phenome-wide association studies. Immunology. 2014;141:157–165. doi: 10.1111/imm.12195.
- Kang HM, Sul JH, Service SK, Zaitlen NA, Kong S-y, Freimer NB, Sabatti C, Eskin E, et al. Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 2010;42:348–354. doi: 10.1038/ng.548.
- Kathiresan S, Voight BF, Purcell S, Musunuru K, Ardissino D, Mannucci PM, Anand S, Engert JC, Samani NJ, Schunkert H, et al. Genome-wide association of early-onset myocardial infarction with single nucleotide polymorphisms and copy number variants. Nat Genet. 2009;41:334–341. doi: 10.1038/ng.327.
- Kettunen J, Tukiainen T, Sarin AP, Ortega-Alonso A, Tikkanen E, Lyytikäinen LP, Kangas AJ, Soininen P, Würtz P, Silander K, et al. Genome-wide association study identifies multiple loci influencing human serum metabolite levels. Nat Genet. 2012;44:269–276. doi: 10.1038/ng.1073.
- Kim J, Bai Y, Pan W. An adaptive association test for multiple phenotypes with GWAS summary statistics. Genet Epidemiol. 2015;39:651–663. doi: 10.1002/gepi.21931.
- Kristiansson K, Perola M, Tikkanen E, Kettunen J, Surakka I, Havulinna AS, Stančáková A, Barnes C, Widen E, Kajantie E, et al. Genome-wide screen for metabolic syndrome susceptibility loci reveals strong lipid gene contribution but no evidence for common genetic basis for clustering of metabolic syndrome traits. Circ Cardiovasc Genet. 2012;5:242–249. doi: 10.1161/CIRCGENETICS.111.961482.
- Lin DY, Sullivan P. Meta-analysis of genome-wide association studies with overlapping subjects. Am J Hum Genet. 2009;85:862–872. doi: 10.1016/j.ajhg.2009.11.001.
- Liu H, Tang Y, Zhang H. A new chi-square approximation to the distribution of non-negative definite quadratic forms in non-central normal variables. Comput Stat Data Anal. 2009;53:853–856.
- Liu Z, Lin X. Multiple phenotype association tests using summary statistics in genome-wide association studies. Biometrics. 2017. doi: 10.1111/biom.12735.
- Majumdar A, Witte JS, Ghosh S. Semiparametric allelic tests for mapping multiple phenotypes: Binomial regression and Mahalanobis distance. Genet Epidemiol. 2015;39:635–650. doi: 10.1002/gepi.21930.
- O’Brien P. Procedures for comparing samples with multiple endpoints. Biometrics. 1984;40:1079–1087.
- Pan W. Asymptotic tests of association with multiple SNPs in linkage disequilibrium. Genet Epidemiol. 2009;33:497–507. doi: 10.1002/gepi.20402.
- Pausch H, Emmerling R, Schwarzenbacher H, Fries R. A multi-trait meta-analysis with imputed sequence variants reveals twelve QTL for mammary gland morphology in Fleckvieh cattle. Genet Sel Evol. 2016;48:1. doi: 10.1186/s12711-016-0190-4.
- Porter H, O’Reilly P. Multivariate simulation framework reveals performance of multi-trait GWAS methods. Sci Rep. 2017;7. doi: 10.1038/srep38837.
- Ray D, Basu S. A novel association test for multiple secondary phenotypes from a case-control GWAS. Genet Epidemiol. 2017;41. doi: 10.1002/gepi.22045.
- Ray D, Li X, Pan W, Pankow JS, Basu S. A Bayesian partitioning model for detection of multilocus effects in case-control studies. Hum Hered. 2015;79:69–79. doi: 10.1159/000369858.
- Ray D, Pankow JS, Basu S. USAT: a unified score-based association test for multiple phenotype-genotype analysis. Genet Epidemiol. 2016;40:20–34. doi: 10.1002/gepi.21937.
- Stančáková A, Javorský M, Kuulasmaa T, Haffner S, Kuusisto J, Laakso M. Changes in insulin sensitivity and insulin release in relation to glycemia and glucose tolerance in 6416 Finnish men. Diabetes. 2009;58:1212–1221. doi: 10.2337/db08-1607.
- Stephens M. A unified framework for association analysis with multiple related phenotypes. PLoS One. 2013;8:e65245. doi: 10.1371/journal.pone.0065245.
- Surakka I, Horikoshi M, Mägi R, Sarin AP, Mahajan A, Lagou V, Marullo L, Ferreira T, Miraglio B, Timonen S, et al. The impact of low-frequency and rare variants on lipid levels. Nat Genet. 2015;47:589–597. doi: 10.1038/ng.3300.
- Teslovich TM, Musunuru K, Smith AV, Edmondson AC, Stylianou IM, Koseki M, Pirruccello JP, Ripatti S, Chasman DI, Willer CJ, et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature. 2010;466:707–713. doi: 10.1038/nature09270.
- Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, Klemm A, Flicek P, Manolio T, Hindorff L, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014;42:D1001–D1006. doi: 10.1093/nar/gkt1229.
- Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26:2190–2191. doi: 10.1093/bioinformatics/btq340.
- Willer CJ, Schmidt EM, Sengupta S, Peloso GM, Gustafsson S, Kanoni S, Ganna A, Chen J, Buchkovich ML, et al. Discovery and refinement of loci associated with lipid levels. Nat Genet. 2013;45:1274–1283. doi: 10.1038/ng.2797.
- Wu B, Pankow JS. On sample size and power calculation for variant set-based association tests. Ann Hum Genet. 2016;80:136–143. doi: 10.1111/ahg.12147.
- Wu M, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011;89:82–93. doi: 10.1016/j.ajhg.2011.05.029.
- Zhou X, Stephens M. Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat Methods. 2014;11:407–409. doi: 10.1038/nmeth.2848.
- Zhu X, Feng T, Tayo BO, Liang J, Young JH, Franceschini N, Smith JA, Yanek LR, Sun YV, Edwards TL, et al. Meta-analysis of correlated traits via summary statistics from GWASs with an application in hypertension. Am J Hum Genet. 2015;96:21–36. doi: 10.1016/j.ajhg.2014.11.011.