Joint analysis of multiple traits in rare variant association studies

Zhenchuan Wang; Xuexia Wang; Qiuying Sha; Shuanglin Zhang

doi:10.1111/ahg.12149

Author manuscript; available in PMC: 2017 May 1.

Published in final edited form as: Ann Hum Genet. 2016 Mar 16;80(3):162–171. doi: 10.1111/ahg.12149

PMCID: PMC4836983 NIHMSID: NIHMS756704 PMID: 26990300

SUMMARY

The joint analysis of multiple traits has recently become popular since it can increase statistical power to detect genetic variants and there is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases. Currently, the majority of existing methods for the joint analysis of multiple traits test association between one common variant and multiple traits. However, the variant-by-variant methods for common variant association studies may not be optimal for rare variant association studies due to the allelic heterogeneity as well as the extreme rarity of individual variants. Current statistical methods for rare variant association studies are for one single trait only. In this paper, we propose an Adaptive Weighting Reverse Regression (AWRR) method to test association between multiple traits and rare variants in a genomic region. AWRR is robust to the directions of effects of causal variants and is also robust to the directions of association of traits. Using extensive simulation studies, we compare the performance of AWRR with canonical correlation analysis (CCA), Single-TOW, and the Weighted Sum Reverse Regression (WSRR). Our results show that, in all of the simulation scenarios, AWRR is consistently more powerful than CCA. In most scenarios, AWRR is more powerful than Single-TOW and WSRR.

Keywords: rare variants, multiple traits, association studies, multiple correlated phenotypes, pleiotropy

Introduction

There is increasing evidence showing that pleiotropy, the effect of one variant on multiple traits, is a widespread phenomenon in complex diseases (Sivakumaran et al., 2011). Furthermore, in genetic association studies of complex diseases, multiple related traits are usually measured. For example, hypertension is evaluated using systolic and diastolic blood pressures, the metabolic syndrome is based on observing three of five criteria (Sattar et al., 2008), and neuropsychiatric disorders depend on a range of overlapping clinical characteristics (O'Reilly et al., 2012). Although most published genome-wide association studies (GWAS) analyze each of the related traits separately, the joint analysis of multiple traits not only can increase statistical power to detect genetic variants (Solovieff et al., 2013; Stephens, 2013; Yang & Wang, 2012; Zhou & Stephens, 2014), but can also be crucial for understanding the genetic architecture of the disease of interest (Aschard et al., 2014). Thus, the joint analysis of multiple traits has recently become popular. Several statistical methods for the joint analysis of multiple traits have been developed. These methods can be roughly divided into three groups: regression methods, methods that combine test statistics from univariate analyses, and dimension reduction methods. Regression methods include mixed effects models (Korte et al., 2012; Zhou & Stephens, 2014) and reverse regression models (O'Reilly et al., 2012; Yan et al., 2013). By modeling the covariance structure of correlated traits and the dependence structure between individuals, mixed effects models not only can incorporate multiple correlated traits, but also can be robust to population stratification. Reverse regression models treat genotypes as the response variable and all the traits as independent variables; therefore, they do not require knowledge of the complex distributions of the traits and can be applied to a large number of traits of mixed types.
To combine the test statistics from univariate analyses, one first obtains univariate test statistics by performing association tests for each trait individually and then combines the univariate test statistics by linear combinations (O'Brien, 1984; van der Sluis et al., 2013; Yang et al., 2010). The dimension reduction methods include canonical correlation analysis (CCA) (Tang & Ferreira, 2012), principal components of traits (PCT) (Aschard et al., 2014), and principal components of heritability (PCH) (Klei et al., 2008; Lange et al., 2004; Ott & Rabinowitz, 1999). CCA finds a linear combination of traits and a linear combination of genotypes at multiple variants such that the correlation between the two linear combinations reaches its maximum. PCT applies principal component analysis to the traits. The PCT methods are usually based on the first PC or the first few PCs of the traits (Feng et al., 2007; Klei et al., 2008). Aschard et al. (2014) showed that, contrary to the widespread practice, tests based on only the first few PCs often have low power, whereas combining signals across all PCs can have greater power. PCH finds a linear combination of multiple traits such that this linear combination has the maximum heritability.

Almost all of the aforementioned methods test association between one common variant and multiple traits. However, the variant-by-variant methods for common variant association studies may not be optimal for rare variant association studies due to the allelic heterogeneity as well as the extreme rarity of individual variants (Li & Leal, 2008). Recent studies show that complex diseases are caused by both common and rare variants (Bodmer & Bonilla, 2008; Kang et al., 2010; Pritchard, 2001; Pritchard & Cox, 2002; Stratton & Rahman, 2008; Teer & Mullikin, 2010; Walsh & King, 2007). Next-generation sequencing technology allows sequencing of the whole genome of large groups of individuals, and thus makes rare variant association studies feasible (Andres et al., 2007; Metzker, 2010). Recently, statistical methods for rare variant association studies with a single trait have been developed by summarizing genotype information from multiple variants. These methods include burden tests (Li & Leal, 2008; Madsen & Browning, 2009; Morgenthaler & Thilly, 2007; Price et al., 2010; Zawistowski et al., 2010), quadratic tests (Neale et al., 2011; Sha et al., 2012; Wu et al., 2011), and combined tests (Derkach et al., 2013; Lee et al., 2012; Sha & Zhang, 2014). Burden tests collapse rare variants in a genomic region into a single burden variable and then regress the trait on the burden variable to test for the cumulative effects of rare variants in the region. These tests implicitly assume that all rare variants are causal and that the directions of the effects are all the same. Quadratic tests include tests whose statistics are quadratic forms of the score vector, as well as adaptive weighting methods. These tests are robust to the directions of the effects of causal variants and are less affected by neutral variants than burden tests.
Burden tests can only outperform quadratic tests when most of the rare variants are causal and the directions of the effects of causal variants are all the same. Combined tests combine information from burden tests, quadratic tests, and possibly other tests, with the aim of retaining the advantages of multiple tests.

In this article, we propose an adaptive weighting reverse regression (AWRR) method to test association between multiple traits and rare variants in a genomic region. In AWRR, we first propose adaptive weights to collapse genotypes. Then, we use the score test to test association based on the reverse regression, in which the collapsed genotype is treated as the response variable and multiple traits are treated as independent variables. Using extensive simulation studies, we compare the performance of AWRR with CCA, Single-TOW, and the Weighted Sum Reverse Regression (WSRR). In Single-TOW, we first calculate the TOW statistic (Sha et al., 2012) to test the association between each trait and variants in a genomic region; the statistic of Single-TOW is then the largest of these TOW statistics. In WSRR, we first calculate the weighted sum (Madsen & Browning, 2009) of genotypes at variants in a genomic region; the statistic of WSRR is then the score test statistic under the reverse regression model, in which the weighted sum of genotypes is the response variable and traits are predictor variables. Our results show that, in all of the simulation scenarios, AWRR is consistently more powerful than CCA. In most scenarios, AWRR is more powerful than Single-TOW and WSRR.

Method

We consider a sample with n unrelated individuals. Each individual has K (potentially correlated) traits and has been genotyped at M variants in a genomic region. Let y_ik denote the k^th trait value of the i^th individual. Let x_im denote the genotype score of the m^th variant of the i^th individual, where x_im is the number of minor alleles that the i^th individual carries at the m^th variant. We denote Y_i = (y_i1,…,y_iK)^T as the K traits for the i^th individual. We propose an adaptive weighting reverse regression (AWRR) method to test the null hypothesis H₀: none of the K traits are associated with the M variants in the genomic region. To construct the test statistic of AWRR, we first collapse the M-dimensional genotype (x_i1,…,x_iM) into a single number $x_{i} = \sum_{m = 1}^{M} w_{m} x_{i m}$, where w_m is the adaptive weight for the m^th variant. The adaptive weight w_m should be large if the m^th variant has strong association with the K traits, and it should have different signs for risk and protective variants. Then, the statistic of AWRR is the score test statistic under the reverse regression model $x_{i} = β_{0} + \sum_{k = 1}^{K} β_{k} y_{i k} + ε_{i}$. In detail, the AWRR method has the following steps.

Step 1. We define a weight W_m for the m^th variant such that W_m is large if the m^th variant has strong association with the K traits and is also large if the m^th variant is a rare variant. For these purposes, we propose $W_{m} = \frac{1}{\sqrt{p_{m} (1 - p_{m})}} T_{m}$, where p_m is the minor allele frequency of the m^th variant and T_m is the score statistic to test the null hypothesis H₀: β₁ = ⋯ = β_K = 0 under the reverse regression model $\log \frac{p_{i m}}{1 - p_{i m}} = β_{0} + β_{1} y_{i 1} + \dots + β_{K} y_{i K}$, where we assume a dominant model p_im = Pr(x_im=1) = Pr(x_im=2). In fact, for rare variants, x_im is essentially 0 or 1. The score statistic is given by $T_{m} = U_{m}^{T} V_{m}^{- 1} U_{m}$, where $U_{m} = \sum_{i = 1}^{n} Y_{i} (x_{i m} - \bar{x}_{m})$, $V_{m} = \frac{1}{n} \sum_{i = 1}^{n} (x_{i m} - \bar{x}_{m})^{2} \sum_{i = 1}^{n} (Y_{i} - \bar{Y}) (Y_{i} - \bar{Y})^{T}$, $\bar{x}_{m} = \frac{1}{n} \sum_{i = 1}^{n} x_{i m}$, and $\bar{Y} = \frac{1}{n} \sum_{i = 1}^{n} Y_{i}$. Under the null hypothesis, T_m follows a χ² distribution with K degrees of freedom. However, W_m does not consider the direction of the effects of causal variants.
Step 2. In this step, we define a direction for W_m. We first select a trait (denoted as the k̃^th trait). We use sign(ρ(y_k̃,x_m)) to denote the direction of the association of the m^th variant, where y_k̃ = (y_1k̃,…,y_nk̃)^T and x_m = (x_1m,…,x_nm)^T. If the k̃^th trait has no association with the M variants, the directions of the association will be random. To avoid random directions, we propose to choose the trait that has the strongest association with the M variants. Let $T_{TOW}^{k}$ denote the statistic of TOW (Sha et al., 2012) to test association between the k^th trait and the M variants. $T_{TOW}^{k}$ is defined as $T_{TOW}^{k} = \sum_{i = 1}^{n} (y_{i k} - \bar{y}_{k}) (x_{i}^{o} - \bar{x}^{o})$, where $x_{i}^{o} = \sum_{m = 1}^{M} w_{m}^{o} x_{i m}$ and $w_{m}^{o} = \sum_{i = 1}^{n} (y_{i k} - \bar{y}_{k}) (x_{i m} - \bar{x}_{m}) / \sum_{i = 1}^{n} (x_{i m} - \bar{x}_{m})^{2}$. We choose the k̃^th trait such that $T_{TOW}^{\tilde{k}} = \max_{1 \leq k \leq K} T_{TOW}^{k}$.
Step 3. The final weight for the m^th variant is given by w_m = sign(ρ(y_k̃,x_m))W_m. Let $x_{i} = \sum_{m = 1}^{M} w_{m} x_{i m}$. Then, we consider the reverse regression model $x_{i} = β_{0} + \sum_{k = 1}^{K} β_{k} y_{i k} + ε_{i}$. We apply a score test to test the null hypothesis H₀: β₁ = ⋯ = β_K = 0. The score statistic is given by T_AWRR = u^TV⁻¹u, where $u = \sum_{i = 1}^{n} (x_{i} - \bar{x}) Y_{i}$, $V = \frac{1}{n} \sum_{i = 1}^{n} (x_{i} - \bar{x})^{2} \sum_{i = 1}^{n} (Y_{i} - \bar{Y}) (Y_{i} - \bar{Y})^{T}$, $\bar{x} = \frac{1}{n} \sum_{i = 1}^{n} x_{i}$, and $\bar{Y} = \frac{1}{n} \sum_{i = 1}^{n} Y_{i}$.
Step 4. In this step, we evaluate the p-value of T_AWRR. Since w_m depends on the trait values and the genotype scores, T_AWRR does not follow a χ² distribution with K degrees of freedom. We use a permutation procedure to evaluate the p-value of T_AWRR. In each permutation, we randomly shuffle Y₁,…,Y_n and keep the genotypes of each individual unchanged. We repeat step 1 to step 3 on each permuted data set. Let $T_{AWRR}^{0}$ denote the value of T_AWRR based on the original data and $T_{AWRR}^{per}$ denote the value based on the permuted data. Then the p-value of T_AWRR is the proportion of permutations with $T_{AWRR}^{per} \geq T_{AWRR}^{0}$.
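
The four steps can be sketched in plain Python as follows. This is only an illustrative sketch, not the authors' implementation: the function names (`solve`, `score_stat`, `awrr_stat`, `awrr_pvalue`) are hypothetical, p_m is assumed to be estimated from the sample as the allele count divided by 2n, ρ is taken to be the sample correlation (so its sign equals the sign of the sample covariance), and monomorphic variants are given weight 0. The sketch also uses the identity $T_{TOW}^{k} = \sum_{m} c_{k m}^{2} / s_{m}$, with $c_{k m} = \sum_{i} (y_{i k} - \bar{y}_{k}) (x_{i m} - \bar{x}_{m})$ and $s_{m} = \sum_{i} (x_{i m} - \bar{x}_{m})^{2}$, which follows directly from the definitions of $x_{i}^{o}$ and $w_{m}^{o}$.

```python
import random

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * d for a, d in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

def score_stat(x, Y):
    """Score statistic u^T V^{-1} u for the reverse regression of x on the K traits."""
    n, K = len(x), len(Y[0])
    xbar = sum(x) / n
    xc = [xi - xbar for xi in x]
    sxx = sum(v * v for v in xc) / n
    if sxx == 0:  # constant x carries no information
        return 0.0
    ybar = [sum(row[k] for row in Y) / n for k in range(K)]
    u = [sum(xc[i] * Y[i][k] for i in range(n)) for k in range(K)]
    V = [[sxx * sum((Y[i][a] - ybar[a]) * (Y[i][b] - ybar[b]) for i in range(n))
          for b in range(K)] for a in range(K)]
    z = solve(V, u)
    return sum(ui * zi for ui, zi in zip(u, z))

def awrr_stat(X, Y):
    """T_AWRR for an n x M genotype matrix X and an n x K trait matrix Y."""
    n, M, K = len(X), len(X[0]), len(Y[0])
    ybar = [sum(row[k] for row in Y) / n for k in range(K)]
    W, c, s = [], [], []
    for m in range(M):
        col = [X[i][m] for i in range(n)]
        xbar = sum(col) / n
        p = xbar / 2  # sample minor allele frequency (assumption)
        s_m = sum((v - xbar) ** 2 for v in col)
        c.append([sum((Y[i][k] - ybar[k]) * (col[i] - xbar) for i in range(n))
                  for k in range(K)])
        s.append(s_m)
        # Step 1: W_m = T_m / sqrt(p_m (1 - p_m)); 0 for monomorphic variants
        W.append(score_stat(col, Y) / (p * (1 - p)) ** 0.5 if s_m > 0 else 0.0)
    # Step 2: pick the trait with the largest TOW statistic
    tow = [sum(c[m][k] ** 2 / s[m] for m in range(M) if s[m] > 0) for k in range(K)]
    kt = max(range(K), key=tow.__getitem__)
    # Step 3: signed weights, collapsed genotype, score test
    w = [(1 if c[m][kt] >= 0 else -1) * W[m] for m in range(M)]
    x = [sum(w[m] * X[i][m] for m in range(M)) for i in range(n)]
    return score_stat(x, Y)

def awrr_pvalue(X, Y, n_perm=1000, seed=0):
    """Step 4: permutation p-value, shuffling trait vectors across individuals."""
    rng = random.Random(seed)
    t0 = awrr_stat(X, Y)
    Yp = [row[:] for row in Y]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(Yp)
        if awrr_stat(X, Yp) >= t0:
            hits += 1
    return t0, hits / n_perm
```

Because the weights are recomputed inside `awrr_stat` for every shuffled data set, the permutation distribution accounts for the data-dependent weighting, which is the reason the χ² reference distribution cannot be used in step 4.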

Comparison of Tests

We compare the performance of the proposed test AWRR with those of the canonical correlation analysis (CCA) (Tang & Ferreira, 2012), the Single-TOW method (Sha et al., 2012), and the Weighted Sum Reverse Regression (WSRR) method (Madsen & Browning, 2009).

CCA method

Although the asymptotic distribution of the CCA statistic works well for common variants, it is very conservative for rare variants. Thus, we propose to use a permutation procedure to evaluate the p-value of the CCA statistic.

Single-TOW method

Let $T_{TOW}^{k}$ denote the statistic of TOW (Sha et al., 2012) to test association between the k^th trait and the M variants. The statistic of Single-TOW is given by $T_{Single-TOW} = \max_{1 \leq k \leq K} T_{TOW}^{k}$. The p-value of T_Single-TOW is evaluated by a permutation procedure.

WSRR method

Let $X_{i} = \sum_{m = 1}^{M} w_{m} x_{i m}$, where $w_{m} = 1 / \sqrt{p_{m} (1 - p_{m})}$ and p_m is the minor allele frequency of the m^th variant. We consider the reverse regression model $X_{i} = β_{0} + \sum_{k = 1}^{K} β_{k} y_{i k} + ε_{i}$. The statistic of WSRR, T_WSRR, is the score test statistic to test the null hypothesis H₀: β₁ = ⋯ = β_K = 0. Under the null hypothesis, T_WSRR follows a χ² distribution with K degrees of freedom.
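
As a small illustration of the WSRR collapsing step, the sketch below computes the weights $1 / \sqrt{p_{m} (1 - p_{m})}$ and the weighted sum X_i. The function name `wsrr_collapse` is hypothetical, and p_m is assumed to be estimated from the sample as the allele count divided by 2n; monomorphic variants are given weight 0 to avoid division by zero.

```python
def wsrr_collapse(X):
    """Return (weights, collapsed genotypes) for an n x M genotype matrix X."""
    n, M = len(X), len(X[0])
    # sample minor allele frequency of each variant (assumption: estimated from X)
    maf = [sum(X[i][m] for i in range(n)) / (2 * n) for m in range(M)]
    # rarer variants receive larger weights; monomorphic variants get weight 0
    w = [1 / (p * (1 - p)) ** 0.5 if 0 < p < 1 else 0.0 for p in maf]
    collapsed = [sum(w[m] * X[i][m] for m in range(M)) for i in range(n)]
    return w, collapsed
```

Unlike the AWRR weights, these weights are all non-negative and do not depend on the traits, so risk and protective variants are added with the same sign.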

Simulation

Our simulations follow those of Sha et al. (2012). In detail, the empirical Mini-Exome genotype data provided by the Genetic Analysis Workshop 17 (GAW17) are used for simulation studies. This dataset contains genotypes of 697 unrelated individuals on 3205 genes. We conduct two sets of simulations. In the first set of simulations, we choose four genes: ELAVL4, MSH4, PDE4B, and ADAMTS4 with 10, 20, 30, and 40 variants, respectively. We merge the four genes to form a super gene (Sgene1) with 100 variants (Sha et al., 2012). In the second set of simulations, we choose ten genes: ELAVL4, FAM73A, PSMB4, FSHR, GMCL1, HNMT, GALNT13, NEUROD1, MYEOV2, and TWF2 with 10 variants in each of them. We merge the ten genes to form a super gene (Sgene2) with 100 variants. In our simulation studies, we generate genotypes based on the genotypes of the 697 individuals in Sgene1 and Sgene2. To generate a qualitative disease affection status, we use a liability threshold model based on a quantitative trait. For a qualitative trait, an individual is defined as affected if the individual’s corresponding quantitative trait is at least one standard deviation larger than the phenotypic mean. This yields a prevalence of 16% for the simulated disease in the general population. In the following, we describe how to generate a quantitative trait.
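
As a quick check of the 16% prevalence quoted above, the sketch below evaluates the upper tail of a standard normal distribution beyond one standard deviation and applies the thresholding rule to a list of trait values. It assumes the quantitative trait is approximately normal and uses the sample mean and (population-form) standard deviation as the "phenotypic" mean and standard deviation; the function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def expected_prevalence(threshold_sd=1.0):
    """Pr(trait >= mean + threshold_sd * sd) for a normally distributed trait."""
    return 1.0 - NormalDist().cdf(threshold_sd)

def affection_status(quantitative):
    """1 = affected if the trait is at least one sd above the mean, else 0."""
    cutoff = mean(quantitative) + pstdev(quantitative)
    return [1 if q >= cutoff else 0 for q in quantitative]
```

Under normality, `expected_prevalence()` is about 0.159, in line with the 16% stated in the text.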

To evaluate the type I error, we generate K traits of an individual independent of the genotypes by using

$$Y = (y_{1}, \dots, y_{K})^{T} = \sqrt{ρ} B u + \sqrt{1 - ρ} ε,$$

where u = (u₁,…,u_{n_u})^T ~ MVN(0,I) is a vector of n_u independent standard normal latent variables, ε = (ε₁,…,ε_K)^T ~ MVN(0,I) is a vector of errors, and B is a K×n_u loading matrix. The values of n_u and B are based on two variance models: (1) n_u = 1, B = (1,…,1)^T and (2) n_u = 2, $B = \begin{bmatrix} e_{[K / 2]} & 0 \\ 0 & e_{K - [K / 2]} \end{bmatrix}$. Thus, Y ~ MVN(0,Σ), where Σ = ρBB^T + (1−ρ)I.
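
The null trait model can be sketched as follows. The sketch assumes that e_j denotes a column vector of j ones and that [K/2] is the integer part of K/2; neither is spelled out in the text, so both are labeled assumptions. The function names are hypothetical.

```python
import random

def loading_matrix(K, model):
    """K x n_u loading matrix B for variance model 1 or 2."""
    if model == 1:
        return [[1.0] for _ in range(K)]
    half = K // 2  # assumption: [K/2] is the integer part of K/2
    return [[1.0, 0.0] if k < half else [0.0, 1.0] for k in range(K)]

def trait_covariance(K, rho, model):
    """Sigma = rho * B B^T + (1 - rho) * I."""
    B = loading_matrix(K, model)
    return [[rho * sum(a * b for a, b in zip(B[i], B[j])) + (1 - rho) * (i == j)
             for j in range(K)] for i in range(K)]

def simulate_null_traits(n, K, rho, model, seed=0):
    """Draw n independent trait vectors Y = sqrt(rho) B u + sqrt(1 - rho) eps."""
    rng = random.Random(seed)
    B = loading_matrix(K, model)
    n_u = len(B[0])
    Y = []
    for _ in range(n):
        u = [rng.gauss(0, 1) for _ in range(n_u)]
        Y.append([rho ** 0.5 * sum(b * ui for b, ui in zip(B[k], u))
                  + (1 - rho) ** 0.5 * rng.gauss(0, 1) for k in range(K)])
    return Y
```

Under model 1 every pair of traits has correlation ρ, whereas under model 2 (with the assumption above) traits are correlated with ρ within each of the two blocks and uncorrelated across blocks; every trait has unit variance in both models.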

To evaluate power, we consider that all causal variants are rare (MAF<0.01). We randomly choose n_c rare variants as causal variants, where n_c is determined by the percentage of causal variants among rare variants. Denote n_r and n_p as the number of risk rare variants and protective rare variants, respectively, where n_r+n_p = n_c. Let $x_{q i}^{r}$ and $x_{j i}^{p}$ denote the genotypic scores of the q^th risk rare variant and the j^th protective rare variant for the i^th individual, respectively. Suppose that causal variants have impact on L traits among the K traits. Among the L traits, there are L_p traits positively correlated with risk variants and there are L_n traits negatively correlated with risk variants. Let h denote the heritability of all the n_c rare causal variants on each of the L traits. Generate n_c random numbers r₁,…,r_{n_c} from a uniform distribution between 0 and 1. The heritability of the i^th causal variant is given by $h_{i} = \frac{h r_{i}}{\sum_{j = 1}^{n_{c}} r_{j}}$ . Under this assumption, we simulated K traits by

$$y_{i k} = \begin{cases} \sum_{q = 1}^{n_{r}} β_{k q}^{r} x_{q i}^{r} - \sum_{j = 1}^{n_{p}} β_{k j}^{p} x_{j i}^{p} + ε_{i k}, & 1 \leq k \leq L_{p}, \\ - \left( \sum_{q = 1}^{n_{r}} β_{k q}^{r} x_{q i}^{r} - \sum_{j = 1}^{n_{p}} β_{k j}^{p} x_{j i}^{p} \right) + ε_{i k}, & L_{p} + 1 \leq k \leq L, \\ ε_{i k}, & L < k \leq K, \end{cases}$$

where ε_i = (ε_i1,…,ε_iK)^T is generated in the same way as the traits used for evaluating type I error, and $β_{k q}^{r}$ and $β_{k j}^{p}$ are constants whose values depend on the heritability.
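
The allocation of the total heritability h among the n_c causal variants, $h_{i} = h r_{i} / \sum_{j} r_{j}$, can be sketched as below. The function name `split_heritability` is hypothetical, and the sketch stops at the per-variant heritabilities: the mapping from h_i to the effect sizes β is not given in the text and is therefore not reproduced here.

```python
import random

def split_heritability(h, n_c, seed=0):
    """Split total heritability h among n_c causal variants using uniform draws."""
    rng = random.Random(seed)
    r = [rng.random() for _ in range(n_c)]  # r_1, ..., r_{n_c} ~ Uniform(0, 1)
    total = sum(r)
    return [h * r_i / total for r_i in r]
```

By construction the per-variant heritabilities are non-negative and sum to h, so the total heritability is the same in every replicate while the share of each causal variant varies at random.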

Results

For type I error evaluation, we only consider the first set of simulations, but consider different sample sizes, different significance levels, different variance models and different types of traits. In each simulation scenario, the p-values of AWRR, Single-TOW and CCA are estimated by 10,000 permutations (the p-values of WSRR are estimated by a χ² distribution) and the type I error rates of all of the four tests are evaluated using 10,000 replicated samples. For 10,000 replicated samples, the 95% confidence intervals (CIs) for the type I error rates at the nominal levels 0.05 and 0.01 are (0.046, 0.054) and (0.008, 0.012), respectively. The estimated type I error rates of the four tests are summarized in Tables 1, 2, S1 and S2. From these tables, we can see that only two estimated type I error rates of CCA are not within the CIs and these two type I error rates (one is 0.0126 for nominal level 0.01 in Table 1, and the other one is 0.05505 for nominal level 0.05 in Table S1) are very close to the upper bounds of the corresponding CIs, which indicates that the four tests are all valid.
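
The confidence intervals quoted above are consistent with a normal approximation to the binomial proportion, α ± 1.96·sqrt(α(1−α)/N) with N replicates. The text does not state how the intervals were computed, so the formula in this sketch is an assumption that happens to reproduce the reported bounds; the function name is hypothetical.

```python
import math

def type1_error_ci(alpha, n_rep, z=1.96):
    """Normal-approximation 95% CI for an estimated type I error rate."""
    half_width = z * math.sqrt(alpha * (1 - alpha) / n_rep)
    return alpha - half_width, alpha + half_width

for level in (0.05, 0.01):
    lo, hi = type1_error_ci(level, 10_000)
    print(level, round(lo, 3), round(hi, 3))
```

With 10,000 replicates this gives approximately (0.046, 0.054) at the 0.05 level and (0.008, 0.012) at the 0.01 level.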

Table 1.

The estimated type I error rates of four methods for quantitative traits under variance model 1. 10,000 replicates are used. This set of simulations is based on Sgene1.

α	Method	n = 500	n = 1000	n = 2000
0.05	CCA	0.0518	0.0519	0.04645
0.05	Single-TOW	0.04995	0.05255	0.0506
0.05	WSRR	0.0464	0.0506	0.0496
0.05	AWRR	0.0519	0.0527	0.0531
0.01	CCA	0.012	0.00845	0.0126
0.01	Single-TOW	0.0117	0.00965	0.012
0.01	WSRR	0.0081	0.009	0.0097
0.01	AWRR	0.01075	0.0097	0.01135


Table 2.

The estimated type I error rates of four methods for qualitative traits under variance model 1. 10,000 replicates are used. This set of simulations is based on Sgene1.

α	Method	n = 500	n = 1000	n = 2000
0.05	CCA	0.052	0.0527	0.04985
0.05	Single-TOW	0.0519	0.0534	0.05
0.05	WSRR	0.0502	0.0493	0.0487
0.05	AWRR	0.054	0.0505	0.05265
0.01	CCA	0.0101	0.01115	0.00955
0.01	Single-TOW	0.0109	0.01165	0.01115
0.01	WSRR	0.0106	0.0092	0.0106
0.01	AWRR	0.00955	0.00965	0.012


For power comparisons, we consider 10 traits and we assume that all causal variants are rare. For each type of traits and each variance model, we consider different values of heritability, different percentages of protective variants, and different percentages of causal variants. In each of the simulation scenarios, the p-values of AWRR, Single-TOW and CCA are estimated using 1,000 permutations (the p-values of WSRR are estimated by a χ² distribution) and the power of all of the four tests is evaluated using 500 replicated samples at a significance level of 0.05.

We first consider the first set of simulations for quantitative traits under variance model 1. Figure 1 shows the power of the four tests (WSRR, AWRR, CCA and Single-TOW) as a function of heritability. This figure shows that WSRR is the least powerful test and AWRR is the most powerful. Comparing the power of Single-TOW with that of CCA is a little more complicated. When genotypes affect only one trait, Single-TOW is more powerful than CCA; otherwise, CCA is more powerful than Single-TOW. Because Single-TOW depends only on the trait that has the strongest association with genotypes, it is favored when genotypes affect fewer traits. Figure 2 shows the power of the four tests as a function of the percentage of protective variants. This figure shows that, with an increasing percentage of protective variants, the power of WSRR decreases while the power of the other three methods does not change. Other patterns of the power comparisons are similar to those shown in Figure 1. Figure 3 shows the power of the four tests as a function of the percentage of causal variants. As shown in this figure, with an increasing percentage of causal variants, the power of WSRR increases while the power of the other three methods does not change. WSRR is the least powerful method when the percentage of causal variants is small (≤ 0.15), while WSRR is the most powerful test when the percentage of causal variants is large (≥ 0.3). The patterns of the power comparisons of CCA, AWRR and Single-TOW are similar to those shown in Figure 1.

Figure 1.

Power comparisons of the four tests (WSRR, AWRR, CCA and Single-TOW) as a function of heritability for quantitative traits under variance model 1. The sample size is 1000 and ρ = 0.5. The percentage of causal variants is 0.1. All causal variants are risk variants. The total number of traits is 10. This set of simulations is based on Sgene1.

Figure 2.

Power comparisons of the four tests (WSRR, AWRR, CCA and Single-TOW) as a function of the percentage of protective variants for quantitative traits under variance model 1. The sample size is 1000, the percentage of causal variants is 0.2, the total heritability is 0.03, and ρ = 0.5. The total number of traits is 10. This set of simulations is based on Sgene1.

Figure 3.

Power comparisons of the four tests (WSRR, AWRR, CCA and Single-TOW) as a function of the percentage of causal variants for quantitative traits under variance model 1. The sample size is 1000, ρ = 0.5, and the total heritability is 0.03. All causal variants are risk variants. The total number of traits is 10. This set of simulations is based on Sgene1.

Under the first set of simulations, we also compare the powers of the four methods for quantitative traits under variance model 2 and for qualitative traits under variance models 1 and 2. These results are given in Figures S1–S9. For each type of trait, the patterns of the power comparisons are similar under variance models 1 and 2. For qualitative traits, CCA is consistently less powerful than Single-TOW and AWRR because CCA is designed for quantitative traits. For qualitative traits, the powers of AWRR, Single-TOW, and CCA decrease as the percentage of protective variants increases, although not as fast as the power of WSRR. As pointed out by Wu et al. (2011) and Sha et al. (2012), the decrease in the powers of AWRR, Single-TOW, and CCA in the presence of both risk and protective variants is due to the fact that protective variants lower MAFs in cases and thus make observing rare variants in the cases more difficult. The larger decrease in the power of WSRR is additionally driven by its sensitivity to the direction of the effect, which results from aggregating genotypes.

Under the second set of simulations, we compare the powers of the four methods for quantitative traits under variance model 1. Results are given in Figures S10–S12. Comparing Figures S10–S12 with Figures 1–3, the patterns of the power comparisons under the second set of simulations are very similar to those under the first set of simulations. Under the second set of simulations, we also compare the powers of the four methods for quantitative traits under variance model 2 and for qualitative traits under variance models 1 and 2 (results are not shown). Results also show that the patterns of the power comparisons under the second set of simulations are very similar to those under the first set of simulations.

In summary, for all simulation scenarios, AWRR is consistently more powerful than CCA, and the power of WSRR increases with an increasing percentage of causal variants or with a decreasing percentage of protective variants. For quantitative traits, the powers of AWRR, CCA and Single-TOW are robust to the percentage of protective variants and to the percentage of causal variants, while for qualitative traits, the powers of AWRR, CCA and Single-TOW decrease as the percentage of protective variants increases and are relatively robust to the percentage of causal variants.

Discussion

In this article, we proposed the AWRR method to perform joint analysis of multiple traits in rare variant association studies for the following reasons: (1) the development of next-generation sequencing technology has made directly testing all rare variants feasible and (2) there is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases and multiple related traits are usually measured in genetic association studies of complex diseases. We used extensive simulation studies to compare the performance of AWRR with CCA, WSRR and Single-TOW. Our results showed that AWRR has correct type I error rates, is robust to the directions of the association of causal variants for quantitative traits, and is robust to the percentage of causal variants. AWRR is consistently more powerful than CCA. AWRR is more powerful than Single-TOW and WSRR in most simulation scenarios.

Our simulation studies showed that the performance of each of AWRR, WSRR and Single-TOW depends strongly upon the number of traits impacted by genetic variants, the percentage of protective variants, and the percentage of causal variants, and that no method demonstrates consistently good power. To increase the robustness of the test, we can combine AWRR, WSRR and Single-TOW with the aim of retaining the advantages of all three methods. Let p_AWRR, p_WSRR and p_Single-TOW denote the p-values of AWRR, WSRR, and Single-TOW, respectively. The combined test statistic can be defined as T_combined = min{p_AWRR, p_WSRR, p_Single-TOW}. However, the performance of the combined test needs further investigation.

In association studies based on unrelated individuals, it has been long recognized that population stratification can seriously confound association results (Knowler et al., 1988; Lander & Schork, 1994). Several methods have been developed to control for population stratification for association studies based on unrelated individuals. These methods include the GC approach (Devlin & Roeder, 1999; Devlin et al., 2001; Reich & Goldstein, 2001), the PC approach (Bauchet et al., 2007; Chen et al., 2003; Price et al., 2006; Zhang et al., 2003; Zhu et al., 2002), and the MLM approach (Kang et al., 2010; Zhang et al., 2010). Like most association tests based on unrelated individuals, AWRR is subject to bias due to population stratification. To make AWRR robust to population stratification, we can use the PC approach. Let P_i = (p_i1,…,p_iL)^T denote the first L PCs of the genotypes at a set of genomic markers for the i^th individual. In step 3 of AWRR, we can use the residuals of the regression x_i = α + β^TP_i + ε_i to replace x_i and use the residuals of the regression $y_{i k} = α_{k} + β_{k}^{T} P_{i} + ε_{i k}$ to replace y_ik. The performance of using the PC approach to control for population stratification in AWRR also needs further investigation.
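
The residualization described above can be sketched with an ordinary least-squares projection. The sketch below is illustrative only: the function names are hypothetical, the PCs are passed as a list of columns (one list of n values per PC), and the projection is carried out with a modified Gram-Schmidt pass so that no linear algebra library is required.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def residualize(v, pcs):
    """Residuals of v after least-squares regression on an intercept and the PCs.

    v   : list of n values (the collapsed genotype x_i or one trait y_ik)
    pcs : list of L columns, each a list of n PC scores
    """
    basis = []
    for col in [[1.0] * len(v)] + [list(c) for c in pcs]:
        r = col[:]
        for b in basis:  # orthogonalize the covariate against the earlier ones
            coef = dot(r, b) / dot(b, b)
            r = [ri - coef * bi for ri, bi in zip(r, b)]
        if dot(r, r) > 1e-12:  # skip covariates that are numerically redundant
            basis.append(r)
    res = [float(x) for x in v]
    for b in basis:  # project v off each orthogonal direction
        coef = dot(res, b) / dot(b, b)
        res = [ri - coef * bi for ri, bi in zip(res, b)]
    return res
```

In step 3, `residualize` would be applied once to the collapsed genotype and once to each of the K traits, and the score statistic would then be computed from the residuals. As the text notes, how well this controls population stratification in AWRR has not been evaluated.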

The computation time required for running AWRR depends on the sample size, the number of variants in the genomic region, the number of traits, and the number of permutations. The running time of AWRR with 1000 permutations on a data set with 1000 individuals, 10 traits, and 100 variants in the genomic region, on a laptop with four Intel cores at 2.00 GHz and 4 GB of memory, is no more than 0.5 s. To perform genome-wide studies, we can first select genomic regions that show evidence of association based on a small number of permutations (e.g. 1000), and then a large number of permutations can be used to test the selected regions.
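
The two-stage strategy can be sketched as a thin wrapper around any permutation test. Everything specific in this sketch is an assumption made for illustration: `perm_pvalue` is a hypothetical callable that takes a number of permutations and returns a p-value, and the screening level and the size of the second stage are placeholder values that the text does not specify.

```python
def two_stage_pvalue(perm_pvalue, n_small=1000, n_large=100_000, screen_level=0.05):
    """Screen a region with few permutations; refine only if it looks promising.

    Returns (p_value, number_of_permutations_used).
    """
    p = perm_pvalue(n_small)
    if p > screen_level:  # no evidence of association: stop early
        return p, n_small
    return perm_pvalue(n_large), n_large  # promising region: refine the p-value
```

Most regions in a genome-wide scan would stop after the first stage, so the total cost is dominated by the small number of regions that pass the screen.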

Supplementary Material

Supp Info

NIHMS756704-supplement-Supp_Info.docx (524.3 KB, docx)

Acknowledgments

The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. Preparation of the Genetic Analysis Workshop 17 Simulated Exome Data Set was supported in part by NIH R01 MH059490 and used sequencing data from the 1000 Genomes Project (www.1000genomes.org).

Footnotes

Conflict of Interest

The authors declare no conflict of interest.

References

Andres AM, Clark AG, Shimmin L, Boerwinkle E, Sing CF, Hixson JE. Understanding the accuracy of statistical haplotype inference with sequence data of known phase. Genet Epidemiol. 2007;31:659–671. doi: 10.1002/gepi.20185.
Aschard H, Vilhjalmsson BJ, Greliche N, Morange PE, Tregouet DA, Kraft P. Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies. Am J Hum Genet. 2014;94:662–676. doi: 10.1016/j.ajhg.2014.03.016.
Bauchet M, McEvoy B, Pearson LN, Quillen EE, Sarkisian T, Hovhannesyan K, Deka R, Bradley DG, Shriver MD. Measuring European population stratification with microarray genotype data. Am J Hum Genet. 2007;80:948–956. doi: 10.1086/513477.
Bodmer W, Bonilla C. Common and rare variants in multifactorial susceptibility to common diseases. Nat Genet. 2008;40:695–701. doi: 10.1038/ng.f.136.
Chen HS, Zhu X, Zhao H, Zhang S. Qualitative semi-parametric test for genetic associations in case-control designs under structured populations. Ann Hum Genet. 2003;67:250–264. doi: 10.1046/j.1469-1809.2003.00036.x.
Derkach A, Lawless JF, Sun L. Robust and powerful tests for rare variants using Fisher's method to combine evidence of association from two or more complementary tests. Genet Epidemiol. 2013;37:110–121. doi: 10.1002/gepi.21689.
Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55:997–1004. doi: 10.1111/j.0006-341x.1999.00997.x.
Devlin B, Roeder K, Wasserman L. Genomic control, a new approach to genetic-based association studies. Theor Popul Biol. 2001;60:155–166. doi: 10.1006/tpbi.2001.1542.
Feng T, Zhang S, Sha Q. A method dealing with a large number of correlated traits in a linkage genome scan. BMC Proc. 2007;1(Suppl 1):S84. doi: 10.1186/1753-6561-1-s1-s84.
Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB, Sabatti C, Eskin E. Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 2010;42:348–354. doi: 10.1038/ng.548.
Klei L, Luca D, Devlin B, Roeder K. Pleiotropy and principal components of heritability combine to increase power for association analysis. Genet Epidemiol. 2008;32:9–19. doi: 10.1002/gepi.20257.
Knowler WC, Williams RC, Pettitt DJ, Steinberg AG. Gm3;5,13,14 and type 2 diabetes mellitus: an association in American Indians with genetic admixture. Am J Hum Genet. 1988;43:520–526.
Korte A, Vilhjalmsson BJ, Segura V, Platt A, Long Q, Nordborg M. A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nat Genet. 2012;44:1066–1071. doi: 10.1038/ng.2376.
Lander ES, Schork NJ. Genetic dissection of complex traits. Science. 1994;265:2037–2048. doi: 10.1126/science.8091226.
Lange C, Van Steen K, Andrew T, Lyon H, DeMeo DL, Raby B, Murphy A, Silverman EK, MacGregor A, Weiss ST, Laird NM. A family-based association test for repeatedly measured quantitative traits adjusting for unknown environmental and/or polygenic effects. Stat Appl Genet Mol Biol. 2004;3:Article17. doi: 10.2202/1544-6115.1067.
Lee S, Emond MJ, Bamshad MJ, Barnes KC, Rieder MJ, Nickerson DA, NHLBI GO Exome Sequencing Project-ESP Lung Project Team, Christiani DC, Wurfel MM, Lin X. Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am J Hum Genet. 2012;91:224–237. doi: 10.1016/j.ajhg.2012.06.007.
Li B, Leal SM. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet. 2008;83:311–321. doi: 10.1016/j.ajhg.2008.06.024.
Madsen BE, Browning SR. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 2009;5:e1000384. doi: 10.1371/journal.pgen.1000384.
Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010;11:31–46. doi: 10.1038/nrg2626.
Morgenthaler S, Thilly WG. A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST). Mutat Res. 2007;615:28–56. doi: 10.1016/j.mrfmmm.2006.09.003.
Neale BM, Rivas MA, Voight BF, Altshuler D, Devlin B, Orho-Melander M, Kathiresan S, Purcell SM, Roeder K, Daly MJ. Testing for an unusual distribution of rare variants. PLoS Genet. 2011;7:e1001322. doi: 10.1371/journal.pgen.1001322.
O'Brien PC. Procedures for comparing samples with multiple endpoints. Biometrics. 1984;40:1079–1087.
O'Reilly PF, Hoggart CJ, Pomyen Y, Calboli FC, Elliott P, Jarvelin MR, Coin LJ. MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PLoS One. 2012;7:e34861. doi: 10.1371/journal.pone.0034861.
Ott J, Rabinowitz D. A principal-components approach based on heritability for combining phenotype information. Hum Hered. 1999;49:106–111. doi: 10.1159/000022854.
Price AL, Kryukov GV, De Bakker PI, Purcell SM, Staples J, Wei LJ, Sunyaev SR. Pooled association tests for rare variants in exon-resequencing studies. Am J Hum Genet. 2010;86:832–838. doi: 10.1016/j.ajhg.2010.04.005.
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38:904–909. doi: 10.1038/ng1847.
Pritchard JK. Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet. 2001;69:124–137. doi: 10.1086/321272.
Pritchard JK, Cox NJ. The allelic architecture of human disease genes: common disease-common variant…or not? Hum Mol Genet. 2002;11:2417–2423. doi: 10.1093/hmg/11.20.2417.
Reich DE, Goldstein DB. Detecting association in a case-control study while correcting for population stratification. Genet Epidemiol. 2001;20:4–16. doi: 10.1002/1098-2272(200101)20:1<4::AID-GEPI2>3.0.CO;2-T.
Sattar N, McConnachie A, Shaper AG, Blauw GJ, Buckley BM, De Craen AJ, Ford I, Forouhi NG, Freeman DJ, Jukema JW, Lennon L, Macfarlane PW, Murphy MB, Packard CJ, Stott DJ, Westendorp RG, Whincup PH, Shepherd J, Wannamethee SG. Can metabolic syndrome usefully predict cardiovascular disease and diabetes? Outcome data from two prospective studies. Lancet. 2008;371:1927–1935. doi: 10.1016/S0140-6736(08)60602-9.
Sha Q, Wang X, Wang X, Zhang S. Detecting association of rare and common variants by testing an optimally weighted combination of variants. Genet Epidemiol. 2012;36:561–571. doi: 10.1002/gepi.21649.
Sha Q, Zhang S. A rare variant association test based on combinations of single-variant tests. Genet Epidemiol. 2014;38:494–501. doi: 10.1002/gepi.21834.
Sivakumaran S, Agakov F, Theodoratou E, Prendergast JG, Zgaga L, Manolio T, Rudan I, McKeigue P, Wilson JF, Campbell H. Abundant pleiotropy in human complex diseases and traits. Am J Hum Genet. 2011;89:607–618. doi: 10.1016/j.ajhg.2011.10.004.
Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: challenges and strategies. Nat Rev Genet. 2013;14:483–495. doi: 10.1038/nrg3461.
Stephens M. A unified framework for association analysis with multiple related phenotypes. PLoS One. 2013;8:e65245. doi: 10.1371/journal.pone.0065245.
Stratton MR, Rahman N. The emerging landscape of breast cancer susceptibility. Nat Genet. 2008;40:17–22. doi: 10.1038/ng.2007.53.
Tang CS, Ferreira MA. A gene-based test of association using canonical correlation analysis. Bioinformatics. 2012;28:845–850. doi: 10.1093/bioinformatics/bts051.
Teer JK, Mullikin JC. Exome sequencing: the sweet spot before whole genomes. Hum Mol Genet. 2010;19:R145–R151. doi: 10.1093/hmg/ddq333.
Van Der Sluis S, Posthuma D, Dolan CV. TATES: efficient multivariate genotype-phenotype analysis for genome-wide association studies. PLoS Genet. 2013;9:e1003235. doi: 10.1371/journal.pgen.1003235.
Walsh T, King MC. Ten genes for inherited breast cancer. Cancer Cell. 2007;11:103–105. doi: 10.1016/j.ccr.2007.01.010.
Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011;89:82–93. doi: 10.1016/j.ajhg.2011.05.029.
Yan T, Li Q, Li Y, Li Z, Zheng G. Genetic association with multiple traits in the presence of population stratification. Genet Epidemiol. 2013;37:571–580. doi: 10.1002/gepi.21738.
Yang Q, Wang Y. Methods for Analyzing Multivariate Phenotypes in Genetic Association Studies. J Probab Stat. 2012;2012:652569. doi: 10.1155/2012/652569.
Yang Q, Wu H, Guo CY, Fox CS. Analyze multivariate phenotypes in genetic association studies by combining univariate association tests. Genet Epidemiol. 2010;34:444–454. doi: 10.1002/gepi.20497.
Zawistowski M, Gopalakrishnan S, Ding J, Li Y, Grimm S, Zollner S. Extending rare-variant testing strategies: analysis of noncoding sequence and imputed genotypes. Am J Hum Genet. 2010;87:604–617. doi: 10.1016/j.ajhg.2010.10.012.
Zhang S, Zhu X, Zhao H. On a semiparametric test to detect associations between quantitative traits and candidate genes using unrelated individuals. Genet Epidemiol. 2003;24:44–56. doi: 10.1002/gepi.10196.
Zhang Z, Ersoz E, Lai CQ, Todhunter RJ, Tiwari HK, Gore MA, Bradbury PJ, Yu J, Arnett DK, Ordovas JM, Buckler ES. Mixed linear model approach adapted for genome-wide association studies. Nat Genet. 2010;42:355–360. doi: 10.1038/ng.546.
Zhou X, Stephens M. Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat Methods. 2014;11:407–409. doi: 10.1038/nmeth.2848.
Zhu X, Zhang S, Zhao H, Cooper RS. Association mapping, using a mixture model for complex traits. Genet Epidemiol. 2002;23:181–196. doi: 10.1002/gepi.210.

Associated Data

Supplementary Materials

Supp Info

NIHMS756704-supplement-Supp_Info.docx (524.3 KB, docx)
