Multivariate generalized linear model for genetic pleiotropy

Daniel J Schaid; Xingwei Tong; Anthony Batzler; Jason P Sinnwell; Jiang Qing; Joanna M Biernacka

doi:10.1093/biostatistics/kxx067

. 2017 Dec 16;20(1):111–128. doi: 10.1093/biostatistics/kxx067

Multivariate generalized linear model for genetic pleiotropy

Daniel J Schaid ^1,^✉, Xingwei Tong ², Anthony Batzler ³, Jason P Sinnwell ³, Jiang Qing ², Joanna M Biernacka ³

PMCID: PMC6296339 PMID: 29267957

Summary

When a single gene influences more than one trait, known as pleiotropy, it is important to detect pleiotropy to improve the biological understanding of a gene. This can lead to improved screening, diagnosis, and treatment of diseases. Yet, most current multivariate methods to evaluate pleiotropy test the null hypothesis that none of the traits are associated with a variant; departures from the null could be driven by just one associated trait. A formal test of pleiotropy should assume a null hypothesis that one or fewer traits are associated with a genetic variant. We recently developed statistical methods to analyze pleiotropy for quantitative traits having a multivariate normal distribution. We now extend this approach to traits that can be modeled by generalized linear models, such as analysis of binary, ordinal, or quantitative traits, or a mixture of these types of traits. Based on methods from estimating equations, we developed a new test for pleiotropy. We then extended the testing framework to a sequential approach to test the null hypothesis that Inline graphic traits are associated, given that the null of associated traits was rejected. This provides a testing framework to determine the number of traits associated with a genetic variant, as well as which traits, while accounting for correlations among the traits. By simulations, we illustrate the Type-I error rate and power of our new methods, describe how they are influenced by sample size, the number of traits, and the trait correlations, and apply the new methods to a genome-wide association study of multivariate traits measuring symptoms of major depression. Our new approach provides a quantitative assessment of pleiotropy, enhancing current analytic practice.

Keywords: Constrained model, Estimating equations, Multivariate analysis, Sequential testing

1. Introduction

Genetic pleiotropy—a single gene influencing more than one trait—has been of keen interest to geneticists, and rightly so. Understanding pleiotropy can lead to greater biological insights. However, discovering pleiotropy can be challenging, because a gene can be associated with more than one trait for multiple reasons (Solovieff and others, 2013). Statistical methods to evaluate pleiotropy have been developed from different angles, ranging from comparison of univariate marginal associations of a genetic variant with multiple traits, to multivariate analyses with simultaneous regression of all traits on a genetic variant, to reverse regression of a genetic variant on all traits. Surveys of statistical methods for pleiotropy are provided elsewhere (Schaid and others, 2016; Schriner, 2012; Solovieff and others, 2013; Yang and Wang, 2012; Zhang and others, 2014), so we summarize the main approaches here.

Univariate analyses are often based on comparison of variant-specific p-values across multiple traits. Although simple and feasible for meta-analyses, this approach ignores correlation among the traits and is based on post-hoc analyses. More formal meta-analysis methods aggregate p-values to test whether any traits are associated with a variant, yet a significant association could be driven by just one trait. A slightly more sophisticated approach, also based on summary p-values, tests whether the distribution of p-values differs from the null distribution of no associations beyond those already detected (Cotsapas and others, 2011). Description of additional univariate methods are given elsewhere (Solovieff and others, 2013).

Multivariate methods have been popular for quantitative traits. Although different statistical methods have been proposed, some of them result in the same statistical tests. The following three approaches to analyze quantitative traits result in the same F-statistic to test whether any of the traits are associated with a genetic variant: (i) simultaneous regression of all traits on a single variant; (ii) regression of the minor allele dose on all traits; and (iii) canonical correlation of the matrix of traits with the minor allele dose [using plink.multivariate (Ferreira and Purcell, 2009)]. The regression of the dose of the minor allele on all traits is a convenient approach, particularly if some of the traits are binary. A slightly different approach is to account for the categorical nature of the dose of the minor allele: instead of using linear regression, use ordinal logistic regression of the dose on the traits [R MultiPhen package, (O’Reilly and others, 2012)]. An advantage of this approach is that it allows for binary traits, unlike most methods that assume traits are quantitative with a multivariate normal distribution. Recently, score tests for generalized linear models, based on estimating equations, have been developed as a way to simultaneously test multiple traits, some of which could be binary (Xu and Pan, 2015). Alternatively, joint analysis of a single quantitative trait and a single categorial trait has been developed, based on conditioning on the categorical trait (Wu and others, 2013). An approach somewhat between univariate and multivariate is based on reducing the dimension of the multiple traits by principal components (PC) and using a reduced set of PCs as either the dependent or independent variables in regression. A comparison of univariate and multivariate approaches found that multivariate methods based on multivariate normality { e.g., canonical correlation, linear regression of traits on minor allele dose, reverse regression, MultiPhen, Bayes methods [BIMBAM (Stephens, 2013), and SNPTEST (Marchini and others, 2007)]} all had similar power and were generally more powerful than univariate methods (Galesloot and others, 2014).

The power advantage of multivariate over univariate methods occurs when the direction of the residual correlation is opposite from that of the genetic correlation induced by the causal variant (Galesloot and others, 2014; Liu and others, 2009). In addition to the methods discussed above, a few new approaches have been proposed, but have not yet been compared with others. An interesting approach is to scale the different traits by their standard deviation, and then assume that the effect of a SNP is constant across all traits to in order to construct a test of association with 1 degree of freedom—so-called “scaled marginal models” (Roy and others, 2003; Schifano and others, 2013). Finally, an approach based on kernel machine regression extended the sequential kernel association test (Wu and others, 2010) to multiple traits, providing a simultaneous test of multiple traits with multiple genetic variants in a genomic region (Maity and others, 2012).

A limitation of all current approaches is that they test whether any traits are associated with a genetic variant, and small p-values could be driven by the association of the genetic variant with a single trait. Hence, post-hoc analyses are required to interpret the possibility of pleiotropy. This can be quite challenging when scaling up to a large number of genetic variants.

Because current analytic methods for pleiotropy limit biological understanding, we recently developed new statistical methods to evaluate pleiotropy, based on a likelihood ratio test for quantitative traits (Schaid and others, 2016). This method was developed as a sequential testing framework, such that one could test the null hypothesis that Inline graphic traits are associated, given that the null of associated traits was rejected. This approach allows one to determine the number of traits associated with a genetic variant, and which traits are associated, while accounting for correlations among the traits. A limitation of this past work is that it was based on the assumption of a multivariate normal distribution. Although robust to distributions with thicker tails than a normal distribution, this approach is too restrictive to other traits, such as binary or ordinal traits, or perhaps a mixture of different types of traits. For this reason, we extended our methods to account for a wide variety of types of traits, based on generalized linear models (to account for alternate types of traits) and generalized estimating equations (GEE, to account for correlation among the traits).

2. Methods

An outline of our approach follows. We first define a general linear model for each trait, and in turn define regression parameters for both the genotype of interest and adjusting covariates. These trait models allow us to compute covariances among the residuals to account for correlated traits. We describe how to estimate the parameters of the models, with special consideration for ordinal traits that require a few extra details. We then describe how to set up and test hypotheses regarding the number of regression coefficients constrained to zero, providing a way to consider combinatorial sets of constrained parameters. This framework provides a formal test of the null hypothesis of no pleiotropy (i.e., no more than one trait associated with a genotype), as well as an extension to perform sequential tests of the number of traits associated with a genotype.

2.1 Model and notation

Suppose that for each subject Inline graphic , a vector of traits is measured, where the traits could be a mixture of different types (e.g., some quantitive, some binomial, and some ordinal). Also assume a -dimensional vector of covariates for each subject and each trait, . For each trait , suppose that follows a generalized linear model

where the mean is modeled as a function of the covariates, Inline graphic with the regression parameters to be estimated. Here, is a known link function which has an inverse function ; the variance function is a known function of ; is the corresponding scale parameter that needs to be estimated or is known. For simplicity, we assume that since the statistical inferences below are easily extended to the general case Inline graphic .

We assume that the covariance matrix of Inline graphic has the form

where Inline graphic and is the correlation matrix that could be assumed to have a special structure (Diggle and others, 2009).

The regression parameters are organized as a vector for intercepts, Inline graphic , a vector for covariates, , and a collection of both, , which has dimension . Denote the true value of as , and let denote the vector of nuisance scale parameters. In the following sections, we will discuss the estimation of and hypothesis tests about .

2.2 Parameter estimation

To estimate Inline graphic and , we take advantage of standard procedures with standard software. Initial estimates of can be obtained by two different approaches. If all the traits are of the same type (e.g., all binomial), then standard software for GEE could be used, such as gee or glmgee packages in R software. A limitation of standard software for GEE, however, is that they do not allow for mixtures of different types of traits (e.g., quantitative mixed with binomial). To allow for a mixture of trait types, initial estimates could be obtained by performing separate regresssions for each trait. For example, using linear regression for quantitative traits and logistic regression for binomial traits. From these initial estimates, standardized residuals can be computed according to Inline graphic . From these residuals, one can estimate scale parameters by

and an empirical estimate of the correlation by

Using the intial estimates Inline graphic , , and , it is then possible to iteratively update these estimates to solve the generalized estimation equations

where Inline graphic , and In practice, we find that allowing an unspecified correlation structure can make it difficult to achieve convergence for the above iterative procedure, a limitation found in many GEE software packages. For this reason, we propose to obtain initial estimates of , , and and use these in the subsequent hypothesis testing framework.

2.3 Modification for ordinal traits

The above statistical methods are easily extended to the proportional odds model for ordinal traits, with a minor modification to allow for more than one intercept for each ordinal trait. For an ordinal trait with Inline graphic ordered categories and a single covariate , the proportional odds model is

Inline graphic for .

So, for our proposed methods, the vector of intercepts for all traits needs to be expanded to account for Inline graphic intercepts for each ordinal trait. Further alterations are needed to compute residuals, , because ordinal regression is modeling . Recall that for unordered categorical data with categories, indicator variables can be used to record an observed category for a subject, such as if category Inline graphic is observed, and 0 otherwise. Extending this to ordered categories, we can use the cumulative sum to record ordered categories, for ( 1 is ignored, because in this case always equals 1). This means that we expand from a single ordinal trait having values to a vector of traits , where each of the Inline graphic have values of 0 or 1. This type of coding corresponds with , as desired for ordinal regression. So, in summary, we expand each ordinal trait from to a vector , and then compute residuals , where the variance function is for the binary random variable . With these modifications, we can follow the above procedures to compute the residuals across all traits, the scale parameters Inline graphic , and the empirical estimate of the correlation . A bit more book keeping is required when computing the derivative matrix, , but the statistical methods follow through for analyzing ordinal traits. In general, we allow for any mixture of types of traits, such as binary, ordinal, or quantiative.

2.4 Hypothesis testing

We now consider testing the null hypothesis of no pleiotropy, which is formally stated as the following hypothesis test:

Inline graphic : Of the parmeters there exists at most one that is non-zero.

This null hypothesis is equivalent to testing whether one of the following Inline graphic sub-hypotheses holds:

Note that Inline graphic represents all , while for , allows while all other . To represent these hypotheses, we use . Let be a matrix such that tests whether all . This is the usual multivariate test. In this case, is created for binary and quantitative traits by taking the identity matrix of dimension Inline graphic , and then removing rows 1 through (to ignore the intercepts . To construct (, create an identity matrix of dimension , then remove rows 1 through and row (to ignore . This results in . Then, the null hypothesis of no pleiotropy is equivalent to : there exists one of , for . To account for ordinal traits, one must account for the additional intercepts for each ordinal trait.

Because we do not assume a specific multivariate distribution of the traits, we cannot work within the framework of likelihood ratio testing, as we had done for traits following a multivariate normal distribution (Schaid and others, 2016). Instead, we define the mean square error (MSE) of a model by Inline graphic , where

We use the MSE of models constrained under a null hypothesis to construct a test statistic. As shown in the supplementary material available at Biostatistics online, to test a sub-hypothesis Inline graphic , the statistic can be used, where denotes the difference between the unconstrained and constrained estimators , computed as , where is the unconstrained estimator, and where

When Inline graphic is true, , where is the rank of the idempotent matrix (Corollary 1 of supplementary material available at Biostatistics online). The statistics provide a way to test each sub-hypothesis . In the next sections, we propose a way to test the null hypothesis of no pleiotropy, and a sequential testing procedure to evaluate the number of traits associated with a genetic variant.

2.5 Testing pleiotropy

The usual multivariate null hypothesis of no associated traits, Inline graphic : , can be tested by the statistic , so reject if , where is the quantile of a distribution with degrees of freedom (df). Alternatively, the null hypothesis of no pleiotropy can be tested by simultaneously testing (i.e., no associated traits) and testing the null hypotheses that only one Inline graphic holds for (i.e, only one associated trait). This is achieved by the statistic As shown Corollary 3 of supplementary material available at Biostatistics online, if only one of the ’s is non-zero, which means that one holds true , , so reject the null hypothesis of no pleiotropy if Inline graphic . An advantage of this pleiotropy statistic is that it is simple to compute on a large number of genetic markers. However, if no traits are associated, the statistic is the minimum of correlated statistics, each with a distribution, so the distribution of under the null hypothesis of no associated traits is not well defined. To aid interpretation of how many traits, and which traits, are associated with a genetic marker, we propose a sequential test procedure.

2.6 Sequential testing

As we previously proposed (Schaid and others, 2016), sequential testing provides a rigorous procedure to evaluate the number of traits associated with a genetic variant. At the first stage, test the usual multivariate null hypothesis of no associated traits, Inline graphic : , using the statistic . If cannot be rejected, then the cannot be rejected. In contrast, if is rejected, we turn to the second stage to test the null hypothesis that one sub-hypothesis holds for . We denote this second stage hypothesis , with subscript 1 indicating the number of associated traits under the null hypothesis. For this, we ignore Inline graphic and use the test statisitc

As shown in Corollary 3 of supplementary material available at Biostatistics online, if only one of the Inline graphic ’s is non-zero, which means that one holds true , , so reject the null hypothesis if . In general, for stages , test : only components of are not zero. The statistic for stage is , where the indices are chosen from all possible ways of choosing unconstrained ’s among traits [ Inline graphic ]. That is, the indices correspond to the sub-hypothesis . The contrast matrix is contructed by constituting an identity matrix whose dimension is the number of estimated parameters, then deleting the rows that correspond to intercepts (to exclude intercepts), and then deleting rows with indices Inline graphic for ’s not constrained to equal zero. Reject if . If reject , continue this sequential testing by incrementing by 1. If fail to reject , stop testing and conclude there are traits associated with the genetic variant. The details of the statistical procedures of this general sequential testing procedure are provided in the supplementary material available at Biostatistics online as well as a proof that the Type-I error is controlled. This general sequential procedure provides a formal way to determine the number of traits associated with a genetic variant, and which traits are associated. To determine which traits are associated, the indices Inline graphic that generate the minimum for the statistic correspond to the traits that are associated, because these indices correspond to unconstrained ’s that provide the best model fit, while all other traits are inferred to be non-associated because their ’s are contrained to zero.

2.7 Simulations

To evaluate the adequacy of our developed methods, we performed simulations. For the null hypothesis of no pleiotropy, we performed two sets of simulations. To simulate traits with a specified covariance structure, we simulated random errors from a multivariate normal distribution with a covariance assumed to be a constant Inline graphic for all pairs of traits (i.e., exchangeable correlation structure) and variance of 1 for each trait. For binary and ordinal traits, we used thresholds to convert from a normally distributed trait to a binary or ordinal trait, with thresholds chosen to achieve the desired frequency for the levels of these categorical traits. By this approach, we were also able to simulate mixtures of quantitative and categorical traits that were correlated. For null hypotheses of no pleiotropy, we assumed either (i) all Inline graphic , the usual null for multivariate data or (ii) and all other (. The value of was chosen to achieve 80% power for the marginal effect of the first trait at a nominal Type-I error rate of 0.05. We assumed two different sample sizes, , and two different values of the number of traits, Inline graphic . For all simulations, a single SNP was simulated, assuming a minor allele frequency of 0.2. All simulations were repeated 1000 times.

2.8 Data application

Our multivariate methods were applied to a study of major depression, combining data from 725 Mayo Clinic patients who participated in two prior studies. All patients had severe depression, and recruitment of subjects and genotype assays are described in the original reports (Biernacka and others, 2015; Ji and others, 2013). For demonstration of our methods, the primary traits of interest were based on the baseline Hamilton Rating Scale for Depression (HAM-D). The HAM-D is a multiple item questionnaire used to provide an indication of depression. A patient is rated by a clinician according to 17 items, each item on a three- or five-point scale, with higher responses indicating more severe symptoms. The total of all 17 scores is used to measure depression, and a total score of 0–7 is considered to be normal, while scores of 20 or higher indicate moderate or worse depression. In our study, the majority of subjects answered the item “Loss of Insight” with the same response (acknowledges being depressed and ill), and because there was little information in this item, it was not used in the analyses. The remaining 16 items were analyzed as multivariate traits in a genome-wide association study (GWAS). The goal was to understand the genetic basis of the different symptoms of depression. The genetic data were based on the approximately 6.9 million measured and imputed single nucleotide polymorphisms (SNPs), having used the software Beagle to impute SNPs with the 1000 Genome cosmopolitan reference sample.

3. Results

3.1 Type-I error of

The statistic Inline graphic , which tests the usual multivariate null hypothesis of no associated traits, had well controlled Type-I error rates when all traits were binary (Table 1), or there was a 50:50 mixture of binary and quantitative traits (Table S1 of supplementary material available at Biostatistics online). When all traits are quantitative, our proposed methods reduce to the methods we presented elsewhere (Schaid and others, 2016), so simulations for purely quantitative traits were not performed here. In contrast, for ordinal traits, the Inline graphic statistic sometimes had slightly inflated Type-I error rates (Table S2 and S3 of supplementary material available at Biostatistics online). There are likely several reasons for this inflated error rate. First, for levels of an ordinal trait, one must estimate intercept parameters. Second, to compute the residual correlation matrix, an ordinal trait with Inline graphic levels contributes indicator variables, increasing the number of correlation parameters to estimate for each ordinal trait. It is likely that the additional variability of estimating multiple intercepts and multiple correlations slows down the rate of convergence of to a distribution. As an alternative, we evaluated the benefit of treating ordinal traits as Gaussian and using the identity link function. Tables S2 and S3 of Supplementary material available at Biostatistics online show that treating ordinal traits as Gaussian had better control of the Type-I error rates. To further explore the impact of sample size, we simulated sample sizes of 500, 1000, and 5000, for 10 ordinal traits with equal correlations of 0.2, each trait with 5 equal probable categories, based on 10 000 simulations. Results in Figure 1 show the empirical versus the expected quantiles of p-values for Inline graphic . Figure 1 illustrates that the ordinal link function results in a closer fit to a distribution as sample size increases, and that treating ordinal traits as Gaussian gives a closer fit to a distribution than an ordinal link when sample sizes are not large.

Table 1.

Type-I error rates for Inline graphic and for binary traits, all

				Type-I error rate, nominal Type-I
No. subjects	No. traits	Fraction cases	Trait correlation	0.05		0.01
No. subjects	No. traits	Fraction cases	Trait correlation
500	4	0.2	0.2	0.046	0.005	0.011	0
			0.5	0.059	0.006	0.011	0.001
			0.8	0.052	0.006	0.007	0
		0.5	0.2	0.044	0.001	0.008	0
			0.5	0.051	0.006	0.009	0
			0.8	0.050	0.003	0.009	0.001
	10	0.2	0.2	0.050	0.006	0.011	0
			0.5	0.050	0.004	0.009	0
			0.8	0.052	0.006	0.014	0.001
		0.5	0.2	0.050	0.007	0.007	0.001
			0.5	0.052	0.003	0.008	0
			0.8	0.050	0.008	0.016	0.001
1000	4	0.2	0.2	0.044	0.003	0.006	0
			0.5	0.044	0.002	0.009	0
			0.8	0.034	0.006	0.005	0.002
		0.5	0.2	0.048	0.002	0.010	0.001
			0.5	0.050	0.002	0.010	0
			0.8	0.046	0.005	0.007	0
	10	0.2	0.2	0.054	0.004	0.007	0.001
			0.5	0.049	0.005	0.009	0
			0.8	0.045	0.006	0.012	0.001
		0.5	0.2	0.050	0.006	0.006	0
			0.5	0.049	0.003	0.010	0
			0.8	0.060	0.008	0.017 ^†	0.002

Open in a new tab

^†

Exceeds 95% upper confidence interval.

Fig. 1. — Observed versus expected quantiles of log(p-value) for statistic when there are no associated traits for 10 ordinal traits with equal correlations of 0.2, each trait with 5 equal probable categories, based on 10 000 simulations. Ordinal links on left panels and Gaussian analyses on right panels.

3.2 Type-I error of

When all Inline graphic ’s equal zero, tends to be very conservative (Table 1). This is not suprising, because under this null hypothesis, is based on the minimum of correlated statistics, and the asymptotic distribution of is unknown. In contrast, simulation results for testing pleiotropy with when only a single Inline graphic is non-zero are presented in Table 2 for when all traits are binary, in Table S4 of supplementary material available at Biostatistics online for a 50:50 mixture of binary and quantitative traits for when a single binary trait has a non-zero , and in Table S5 of supplementary material available at Biostatistics online for a 50:50 mixture of binary and quantitative traits for when a single quantitative trait has a non-zero Inline graphic . The size of the was chosen such that the marginal power of a single trait was 0.80 for a nominal Type-I error rate of 0.05. For all these simulations, the Type-I error rate for is close to the nominal rate, illustrating that closely follows a distribution. The results in Tables 2 and Tables S4 and S5 of supplementary material available at Biostatistics online also show the power of testing the usual multivariate null hypothesis with Inline graphic when a single is non-zero. It is not surprising to see that power increased with increasing trait correlation and with fewer number of tested traits.

Table 2.

Simulation results for Inline graphic and for binary traits when

					Power and Type-I error rate, nominal type-I
No. subjects	No. traits	Fraction cases		Trait correlation	0.05		0.01
No. subjects	No. traits	Fraction cases		Trait correlation	power	Type-I	power	Type-I
500	4	0.2	0.51	0.2	0.521	0.041	0.317	0.005
				0.5	0.630	0.042	0.407	0.007
				0.8	0.803	0.045	0.614	0.013
		0.5	0.42	0.2	0.534	0.033	0.305	0.002
				0.5	0.631	0.040	0.374	0.009
				0.8	0.829	0.042	0.654	0.010
	10	0.2	0.51	0.2	0.418	0.026	0.188	0.002
				0.5	0.511	0.041	0.277	0.003
				0.8	0.677	0.037	0.469	0.013
		0.5	0.42	0.2	0.395	0.040	0.180	0.005
				0.5	0.493	0.045	0.257	0.007
				0.8	0.742	0.060	0.507	0.011
1000	4	0.2	0.37	0.2	0.599	0.033	0.374	0.005
				0.5	0.638	0.033	0.411	0.007
				0.8	0.840	0.058	0.662	0.011
		0.5	0.30	0.2	0.584	0.039	0.307	0.008
				0.5	0.648	0.039	0.396	0.007
				0.8	0.862	0.050	0.672	0.009
	10	0.2	0.37	0.2	0.453	0.039	0.244	0.008
				0.5	0.510	0.053	0.288	0.009
				0.8	0.763	0.053	0.548	0.012
		0.5	0.30	0.2	0.427	0.037	0.191	0.003
				0.5	0.538	0.042	0.298	0.007
				0.8	0.762	0.050	0.558	0.009

Open in a new tab

To evaluate the Type-I error rate of Inline graphic for ordinal traits, we simulated 5 equally correlated traits, with each trait having either 3 or 5 ordered categories. We allowed a single trait to have a non-zero , where the was the effect size per allele and chosen such that the marginal power for a single trait was 0.80 for a nominal Type-I error rate of 0.05. For the analyses, we used both an ordinal logistic link and treated ordinal traits as Gaussian. The results in Table 3 show that both procedures have well-controlled Type-I error rates and comparable power to test the null hypothesis of no association using the Inline graphic statistic.

Table 3.

Simulations for 5 ordinal traits with 3 or 5 categories each, when Inline graphic

				Ordinal link, nominal Type-I				Gaussian analysis, nominal Type-I
				0.05		0.01		0.05		0.01
No. categories	No. subjects	Trait correlation		power	Type-I	power	Type-I	power	Type-I	power	Type-I
3	500	0.2	0.41	0.585	0.049	0.340	0.005	0.593	0.047	0.360	0.005
		0.5		0.722	0.040	0.482	0.002	0.728	0.042	0.493	0.002
		0.8		0.966	0.059	0.880	0.004	0.967	0.057	0.879	0.004
	1000	0.2	0.29	0.595	0.038	0.363	0.009	0.593	0.035	0.366	0.009
		0.5		0.737	0.038	0.522	0.005	0.737	0.037	0.523	0.006
		0.8		0.955	0.054	0.861	0.005	0.956	0.051	0.853	0.005
5	500	0.2	0.39	0.535	0.033	0.309	0.007	0.518	0.029	0.295	0.007
		0.5		0.716	0.065 ^†	0.490	0.008	0.706	0.060	0.472	0.007
		0.8		0.962	0.058	0.899	0.012	0.960	0.049	0.888	0.008
	1000	0.2	0.28	0.648	0.048	0.413	0.010	0.647	0.046	0.410	0.008
		0.5		0.794	0.044	0.568	0.009	0.786	0.043	0.567	0.007
		0.8		0.991	0.051	0.964	0.014	0.989	0.051	0.957	0.012

Open in a new tab

^†

Exceeds 95% upper confidence interval.

3.3 Power for pleiotropy

Simulation results for power to detect pleiotropy are presented in Table 4 for five binary traits, with effect size Inline graphic chosen such that there is 80% power to detect the marginal effect of a single trait (Type-I error rate of 0.05). The results show that power for increases as either the trait correlation increases or the number of associated traits increases. Similar findings for ordinal traits are presented in Tables S6 and S7 of supplementary material available at Biostatistics online for five traits with three or five ordered categories each, respectively. The results for ordinal traits show similar power when analyzing ordinal traits with either an ordinal logistic link or treating them as Gaussian.

Table 4.

Power for 5 binary traits of which either 2 or 3 are associated with a SNP

					Power, nominal Type-I
No. associated traits	No. subjects	Fraction cases		Trait correlation	0.05		0.01
No. associated traits	No. subjects	Fraction cases		Trait correlation
2	500	0.2	0.51	0.2	0.869	0.384	0.679	0.157
				0.5	0.896	0.522	0.751	0.24
				0.8	0.965	0.762	0.911	0.509
		0.5	0.42	0.2	0.837	0.343	0.623	0.148
				0.5	0.890	0.485	0.727	0.227
				0.8	0.978	0.778	0.921	0.505
	1000	0.2	0.37	0.2	0.862	0.400	0.683	0.159
				0.5	0.869	0.464	0.701	0.209
				0.8	0.969	0.721	0.897	0.461
		0.5	0.30	0.2	0.826	0.331	0.611	0.132
				0.5	0.874	0.468	0.698	0.212
				0.8	0.958	0.737	0.895	0.462
3	500	0.2	0.51	0.2	0.955	0.783	0.894	0.566
				0.5	0.967	0.828	0.886	0.610
				0.8	0.986	0.910	0.957	0.761
		0.5	0.42	0.2	0.953	0.760	0.867	0.516
				0.5	0.940	0.754	0.828	0.507
				0.8	0.982	0.908	0.948	0.751
	1000	0.2	0.37	0.2	0.926	0.689	0.822	0.44
				0.5	0.922	0.746	0.813	0.498
				0.8	0.979	0.876	0.922	0.680
		0.5	0.30	0.2	0.909	0.654	0.781	0.398
				0.5	0.912	0.686	0.775	0.448
				0.8	0.969	0.841	0.905	0.654

Open in a new tab

Power for Inline graphic tests the null hypothesis that no traits are associated, and power for tests the null hypothesis that no more than 1 trait is associated.

3.4 Properties of sequential test

Following the simulation parameters in Table 4, we performed simulations to evaluate the properties of the sequential test. We performed simulations for five binary traits, allowing either two or three associated traits with effect size 0.3, which had 50% marginal power for a single trait association test with a sample size of 500, or 80% power for a sample size of 1000. The results in Table 5 illustrate the following main properties of the sequential test: (i) there was an increased chance of selecting at least one true Inline graphic as sample size increased, as the number of associated traits increased, or as the trait correlation increased; (ii) when the trait correlation increased, there was in increased chance of selecting both true and false ’s; (iii) the chance of selecting the exact true model depends on the number of associated traits, the trait correlation, the effect size, and the sample size; (iv) there is a trade-off for selecting true and false Inline graphic ’s that depends on the trait correlation. For three of five associated traits with high correlation ( there was a 56% chance of correctly selecting all three true ’s, although there was a 23% chance of selecting at least one false . For , there was a 27% chance of selecting all three true betas, but a lower 6% chance of selecting at least one false Inline graphic ; (v) the best scenario for correct model selection is a large sample size (determined by effect size), and moderately correlated traits (for increased power to select true betas, but correlation not too large to avoid also selecting false ’s).

Table 5.

Properties of sequential testing for five binary traits


No. associated traits	Trait correlation	No. selected coefficients	True ’s selected	False ’s selected	True ’s selected	False ’s selected
2	0.2	0	0.613	0.924	0.209	0.928
		1	0.329	0.074	0.499	0.065
		2	0.058	0.002	0.292	0.007
	0.5	0	0.528	0.902	0.181	0.900
		1	0.353	0.089	0.431	0.090
		2	0.119	0.005	0.388	0.010
		3	0.000	0.004	0.000	0.000
	0.8	0	0.410	0.868	0.069	0.917
		1	0.370	0.116	0.249	0.067
		2	0.220	0.015	0.682	0.009
		3	0.000	0.001	0.000	0.007
3	0.2	0	0.380	0.923	0.072	0.940
		1	0.372	0.070	0.238	0.057
		2	0.192	0.007	0.419	0.003
		3	0.056	0.000	0.271	0.000
	0.5	0	0.455	0.861	0.097	0.879
		1	0.264	0.128	0.176	0.109
		2	0.185	0.011	0.353	0.012
		3	0.096	0.000	0.374	0.000
	0.8	0	0.457	0.716	0.189	0.774
		1	0.215	0.180	0.079	0.086
		2	0.187	0.104	0.176	0.140
		3	0.141	0.000	0.556	0.000

Open in a new tab

Proportions out of 1000 simulations for selecting Inline graphic ’s according to true ( and false ( coefficients. Marginal power was 50% to detect a single trait for 500, and marginal power of 80% for 1000. Sequential testing was at .

3.5 Comparison with other methods

Other methods to evaluate pleiotropy focus on testing the null hypothesis that no traits are associated with a genetic variant, in contrast to our methods that formally test the null hypothesis of one or fewer associated traits, or that provide a sequential test for which traits are associated with a genetic variant. Furthermore, few methods allow for multiple binary traits or a mixture of types of traits. To compare our methods with existing methods, we focused on testing the null hypothesis that no traits are associated with a genetic variant, and compared our proposed Inline graphic statistic with two other methods: (i) MultiPhen (O’Reilly and others, 2012), which reverses the roles of traits and genetic variant by using ordinal logistic regression to regress the three categories of a genotype minor allele dosage on traits; (ii) a joint regression with outcome stratified samples proposed by Wu and others (2013). The Wu approach considers a quantitative trait and a categorical trait and creates a joint likelihood of the two traits by first modeling the probability of the categorical trait, conditional on the genetic variant, and then stratifying on the categorical trait and using linear regression within each strata to model the association of the quantitative trait with the genetic variant. To compare our Inline graphic statistic with the other methods, we simulated a single quantitative trait and a single binary trait, using the simulation methods described above. In this situation, our statistic and the MultiPhen ordinal regression statistic both have an asymptotic chi-square distribution with 2 df. In contrast, the stratified Wu approach has 3 df: one for the logistic regression of the binary trait on the genetic variant, and one for each of the two binary trait strata, for the regression of the quantitative trait on the genetic variant. This reveals the main difference or our approach versus Wu’s stratified approach. The stratified approach allows the effect of the genetic variant on the quantitative trait to vary across the binary trait strata.

The power of the three methods are illustrated in Table 6 for when the effect of a genetic variant on a quantitative trait is constant over the binary trait strata. In this scenario, our Inline graphic statistic and the MultiPhen approach had similar power, albeit slightly greater power for . Both of these methods had greater power than Wu’s stratified approach. The weaker power of Wu’s approach results from an extra degree of freedom when stratifying on the binary trait. For categorical traits with more than two levels, power is likely to diminish because of the extra parameters to test. In contrast, when the effect of a genetic variant on a quantitative trait varied over the binary trait strata, Table 7 illustrates that power of Wu’s method was greater than our Inline graphic statistic and the MultiPhen approach, with the power advantage depending on the magnitude of the heterogeneity across strata.

Table 6.

Power of methods when effect of genetic variant on quantitative trait is constant over binary trait strata

A. Genetic effect on only binary trait
No. subjects	Fraction cases	Trait correlation	Binary	Quantitative		MultiPhen	Wu stratified
No. subjects	Fraction cases	Trait correlation	OR			MultiPhen	Wu stratified
500	0.2	0.2	1.67	0	0.719	0.689	0.666
		0.5		0	0.743	0.707	0.688
		0.8		0	0.876	0.855	0.852
	0.5	0.2	1.52	0	0.644	0.652	0.596
		0.5		0	0.737	0.721	0.683
		0.8		0	0.868	0.855	0.828
1000	0.2	0.2	1.45	0	0.679	0.665	0.621
		0.5		0	0.747	0.723	0.698
		0.8		0	0.856	0.835	0.828
	0.5	0.2	1.35	0	0.651	0.628	0.593
		0.5		0	0.751	0.731	0.693
		0.8		0	0.873	0.859	0.840

B. Genetic effect on both binary and quantitative traits
500	0.2	0.2	1.1	0.18	0.514	0.490	0.442
		0.5	1.1	0.18	0.586	0.571	0.518
		0.8	1.1	0.18	0.750	0.735	0.695
	0.5	0.2	1.1	0.18	0.528	0.509	0.477
		0.5	1.1	0.18	0.636	0.629	0.563
		0.8	1.1	0.18	0.851	0.838	0.804
1000	0.2	0.2	1.1	0.18	0.873	0.865	0.834
		0.5	1.1	0.18	0.948	0.928	0.913
		0.8	1.1	0.18	0.989	0.984	0.983
	0.5	0.2	1.1	0.18	0.904	0.888	0.870
		0.5	1.1	0.18	0.957	0.950	0.945
		0.8	1.1	0.18	0.996	0.996	0.992

Open in a new tab

Table 7.

Power of methods when effect of genetic variant on quantitative trait varies over binary trait strata

A. OR 1.1 for binary trait. Quantitative trait effects are for binary stratum 0 and for binary stratum 1
No. subjects	Fraction cases	Trait correlation		MultiPhen	Wu stratified
500	0.2	0.2	0.098	0.093	0.121
		0.5	0.100	0.093	0.165
		0.8	0.143	0.125	0.276
	0.5	0.2	0.206	0.198	0.249
		0.5	0.264	0.246	0.351
		0.8	0.411	0.399	0.513
1000	0.2	0.2	0.131	0.125	0.249
		0.5	0.164	0.162	0.312
		0.8	0.201	0.203	0.495
	0.5	0.2	0.390	0.365	0.501
		0.5	0.512	0.500	0.643
		0.8	0.722	0.713	0.850

B. OR 1.1 for binary trait. Quantitative trait effects are for binary stratum 0 and for binary stratum 1
500	0.2	0.2	0.182	0.171	0.504
		0.5	0.225	0.212	0.618
		0.8	0.293	0.274	0.852
	0.5	0.2	0.695	0.677	0.899
		0.5	0.802	0.774	0.953
		0.8	0.924	0.922	0.993
1000	0.2	0.2	0.349	0.336	0.828
		0.5	0.428	0.393	0.912
		0.8	0.545	0.519	0.993
	0.5	0.2	0.945	0.940	0.994
		0.5	0.976	0.968	0.999
		0.8	1.000	0.999	1.000

Open in a new tab

3.6 Application to GWAS symptoms of depression

When we applied the multivariate Inline graphic statistic to the GWAS symptoms of depression, treating the 16 response items as ordinal response traits, and the dose of the minor allele as a predictor variable, we found the quantile–quantile plot of the GWAS results to suggest an excess of small p-values, consistent with our simulation results. For this reason, we treated the 16 response items as Gaussian random variables and found the resulting quantile–quantile plot to be more acceptable. Although none of the individual Inline graphic statistics for the SNPs achieved genome-wide significance (i.e., p-value < 5E8), the smallest p-value of 4.0E7 for the imputed SNP rs11635365 is intriguing, because this common variant (minor allele frequency of 0.38) is within an intron of a gene. In contrast to our multivariate analysis, the most common way to analyze the HAM-D assessment tool data is to sum all response items to create a single score. We illustrate the univariate associations of rs11635365 with each of the 16 response items in Table 8, along with the association of the sum of all 16 response items. This table illustrates that the multivariate Inline graphic statistic results in a much smaller p-value than that for the sum of 16 responses, as well as for any of the individual HAM-D items. Some causes of this difference are likely the different signs of the regression coefficients as well as the correlation structure among the response items. The 120 correlations for the HAM-D data ranged Inline graphic 0.08 to 0.55, with the majority of correlations between 0 and 0.2.

Table 8.

Univariate regression results for top-hit SNP rs11635365

Item	HAM-D item	Coefficient	Standard error	p-value
1	Initial insomnia		0.05	0.0270
2	Middle insomnia		0.04	0.2426
3	Delayed (late) insomnia		0.05	0.6954
4	Depressed mood		0.04	0.2541
5	Psychic anxiety		0.04	0.5307
6	Appetite		0.05	0.5137
7	Weight loss	0.08	0.04	0.0748
8	Guilt feelings and delusions		0.05	0.3907
9	Suicide		0.05	0.3725
10	Work and activities (interests)		0.04	0.0144
11	Somatic symptoms, general (energy)	0.06	0.03	0.0297
12	Libido		0.04	0.6366
13	Retardation		0.05	0.1021
14	Agitation		0.04	0.0004
15	Anxiety (somatic)		0.04	0.0166
16	Hypochondriasis		0.04	0.0005
Total	Sum of responses		0.27	0.0016

Open in a new tab

To illustrate use of the sequential test procedure, we illustrate results in Table 9 if one were to use a nominal Type-I error rate of 0.05 to decide when to stop sequential testing. Based on this threshold, the results in Table 9 show that HAM-D response items 7 (weight loss), 11 (somatic symptoms), 14 (agitation), and 16 (hypochondriasis) were the main drivers of the statistical association. It is interesting that item 11 was selected at the first step, despite this item not having the smallest marginal p-value. This is likely due to the signs of the regression coefficients and their correlation structure.

Table 9.

Results from sequential test of HAM-D response items with SNP rs11635365

Sequential step: null no. associated traits	p-value	HAM-D item indices that generate the minimum for the statistic ; the traits that provide best model fit under constrained hypothesis
0	4.0E7
1	4.9E5	11
2	1.5E3	11,14
3	2.0E2	11,14,16
4	1.7E1	7,11,14,16 (Stop, p-value > 0.05)

Open in a new tab

4. Discussion

Understanding genetic pleiotropy can provide insights to complex biological mechanisms of genes and aid development of pharmacologic targets. Yet, most statistical methods to assess pleiotropy have resorted to ad hoc comparison of univariate statistical tests or multivariate methods that test the null hypothesis of no trait associations. We recently developed a formal statistical test of pleiotropy based on likelihood ratio tests for multivariate normally distributed traits (Schaid and others, 2016), and now extend our approach in this current work to traits that can be modeled by generalized linear models. Our test of pleiotropy, and its corresponding sequential testing framework, allows for a variety of traits commonly encountered in human genetic studies, such as quantitative, binary, or ordinal traits as well as a mixture of different types of traits. Our simulations show that the proposed methods work well for binary and quantitative traits, or a mixture of these types of traits. In contrast, our methods can have inflated Type-I error rates for ordinal traits if the sample size is not large. To correct this, our simulation results show that using treating ordinal traits as Gaussian is a reasonable strategy, providing controlled Type-I error rates while having reasonable power.

We proposed a sequential testing procedure, where the null hypothesis of no associated traits could be tested first (using standard multivariate regression methods), and if significant, followed by sequentially testing the null of Inline graphic associated traits (, until the sequential test fails to reject the null hypothesis. This approach provides a way to assess the number of traits associated with a genetic variant, accounting for the correlations among the traits.

As we emphasized elsewhere (Schaid and others, 2016), it is possible to extend our approach to genetically related subjects by using an Inline graphic matrix to account for relationships. For pedigree data, contains diagonal elements where is the inbreeding coefficient for subject , and off-diagonal elements . The parameter is the kinship coefficient between individuals and , the probability that a randomly chosen allele at a given locus from individual Inline graphic is identical by descent to a randomly chosen allele from individual , conditional on their ancestral relationship. For data with population structure, is an estimate of genetic relationships (Schaid and others, 2013). Then, the covariance between subjects and , for traits and , is Inline graphic .

By using simulations to compare our Inline graphic statistic with MultiPhen ordinal regression and Wu’s stratified method, we found similar power between and MultiPhen. The main advantage of our approach is that it allows the multiple traits to depend on different sets of adjusting covariates, in contrast to MultiPhen which would require a common set of covariates, along with the traits, to serve as predictor variables in ordinal regression. The relative power of Wu’s stratified approach depended on whether the effect of the genetic variant on a quantitative trait varied over strata, with weaker power in the absence of heterogeneity, but greater power as heterogeneity increased. A limitation of the stratified approach would be extending it to multiple binary traits, resulting in highly stratified samples. In contrast, our approach easily handles multiple binary traits by testing the marginal effects of a genetic variant on the multiple traits.

When we applied our methods to a GWAS of symptoms of major depression, no SNP associations achieved genome wide significance, although the SNP with the smallest p-value of 4.0E Inline graphic 7 was interesting because this SNP is within an intron of the gene SH3GL3. This is a protein-coding gene on chromosome 4 that is preferentially expressed in brain and testis and interacts with the Huntington’s disease gene, HTT (Sittler and others, 1998). Huntington disease is inherited in an autosomal dominant manner, due primarily to genetic mutations in the HTT gene on chromosome 4. Although our subjects do not have Huntington disease, it is important to understand that some symptoms of Huntington disease include various emotional and psychiatric problems, of which the most common is depression; depression is not a reaction to having Huntington’s disease diagnosis, but rather appears to occur because of brain injury and brain changes from the disease. The characteristics of the interaction between SH3GL3 and HTT suggest that SH3GL3 could be involved in the selective neuronal cell death in patients with Huntington’s disease (Sittler and others, 1998). This raises the question of whether our results suggest that SH3GL3 also plays a role in major depression, independent of Huntington’s disease. Further studies, preferably functional studies, are required to evaluate whether genetic variants in SH3GL3 are associated with major depression, and in particular whether this gene is associated with certain features of depression, as suggested by our sequential analyses. To demonstrate how the sequential testing procedure can be used, we used a p-value threshold of 0.05 to stop the sequential testing. For analysis of GWAS data, this threshold might be too liberal, depending on how many SNPs are found to be associated with traits at prior steps in the sequential testing framework. In practice, it seems sensible to account for the number of tests at each sequential step by using a Bonferroni correction for the number of tests performed at each step.

Software: The software package “pleio” is available as an R package from the CRAN web site (https://cran.r-project.org/web/packages/pleio/index.html). This package contains our original pleio.q functions for quantitative traits and pleio.glm functions for the methods discussed in this current work.

Reproducible Research: The data and scripts used for analysis of the depression study are available at Github: https://github.com/sinnweja/rpleio/releases/tag/v1.0.

Supplementary Material

Supplementary Data

Click here for additional data file.^{(299.9KB, docx)}

Supplementary Table

Click here for additional data file.^{(90.4KB, docx)}

Acknowledgments

This research was supported by the U.S. Public Health Service, National Institutes of Health, contract grants numbers GM065450, GM28157, and GM61388. We gratefully acknowledge use of the depression data from the Mayo Clinic Pharmacogenomic Research Network Antidepressant Medication Pharmacogenomic Study. Finally, we would like to thank the patients who participated in the PGRN-AMPS SSRI trial as well as the Mayo Clinic psychiatrists who cared for them. Conflict of Interest: None declared.

Funding

Funding for this work was provided by the U.S. National Institutes of Health, contract grant number GM065450.

References

Biernacka J. M., Sangkuhl K., Jenkins G., Whaley R. M., Barman P., Batzler A., Altman R. B., Arolt V., Brockmoller J., Chen C. H. and others. (2015). The International SSRI Pharmacogenomics Consortium (ISPC): a genome-wide association study of antidepressant treatment response. Translational psychiatry 5, e553. [DOI] [PMC free article] [PubMed] [Google Scholar]
Cotsapas C., Voight B. F., Rossin E., Lage K., Neale B. M., Wallace C., Abecasis G. R., Barrett J. C., Behrens T., Cho J. and others. (2011). Pervasive sharing of genetic effects in autoimmune disease. PLoS genetics 7(8), e1002254. [DOI] [PMC free article] [PubMed] [Google Scholar]
Diggle P., Heagerty P., Liang K-Y., Zeger S. (2009). Analysis of Longitudinal Data. Oxford: Oxford University Press. [Google Scholar]
Ferreira M. A., Purcell S. M. (2009). A multivariate test of association. Bioinformatics 25(1), 132–133. [DOI] [PubMed] [Google Scholar]
Galesloot T. E., van Steen K., Kiemeney L. A., Janss L. L., Vermeulen S. H. (2014). A comparison of multivariate genome-wide association methods. PloS one 9(4), e95923. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ji Y., Biernacka J. M., Hebbring S., Chai Y., Jenkins G. D., Batzler A., Snyder K. A., Drews M. S., Desta Z., Flockhart D. and others. (2013). Pharmacogenomics of selective serotonin reuptake inhibitor treatment for major depressive disorder: genome-wide associations and functional genomics. The pharmacogenomics journal 13(5), 456–463. [DOI] [PMC free article] [PubMed] [Google Scholar]
Liu J., Pei Y., Papasian C. J., Deng H. W. (2009). Bivariate association analyses for the mixture of continuous and binary traits with the use of extended generalized estimating equations. Genetic epidemiology 33(3), 217–227. [DOI] [PMC free article] [PubMed] [Google Scholar]
Maity A., Sullivan P. F., Tzeng J. Y. (2012). Multivariate phenotype association analysis by marker-set kernel machine regression. Genetic epidemiology 36(7), 686–695. [DOI] [PMC free article] [PubMed] [Google Scholar]
Marchini J., Howie B., Myers S., McVean G., Donnelly P. (2007). A new multipoint method for genome-wide association studies by imputation of genotypes. Nature Genetics 39(7), 906–913. [DOI] [PubMed] [Google Scholar]
O’Reilly P. F., Hoggart C. J., Pomyen Y., Calboli F. C., Elliott P., Jarvelin M. R., Coin L. J. (2012). MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PloS one 7(5), e34861. [DOI] [PMC free article] [PubMed] [Google Scholar]
Roy J., Lin X., Ryan L. M. (2003). Scaled marginal models for multiple continuous outcomes. Biostatistics 4(3), 371–383. [DOI] [PubMed] [Google Scholar]
Schaid D. J., McDonnell S. K., Sinnwell J. P., Thibodeau S. N. (2013). Multiple genetic variant association testing by collapsing and kernel methods with pedigree or population structured data. Genetic epidemiology 37, 409–418. [DOI] [PMC free article] [PubMed] [Google Scholar]
Schaid D., Tong X., Larrabee B., Kennedy R., Poland G., Sinnwell J. (2016). Statistical methods for testing genetic pleiotropy. Genetics 204, 483–497. [DOI] [PMC free article] [PubMed] [Google Scholar]
Schifano E. D., Li L., Christiani D. C., Lin X. (2013). Genome-wide association analysis for multiple continuous secondary phenotypes. American journal of human genetics 92, 744–759. [DOI] [PMC free article] [PubMed] [Google Scholar]
Schriner D. (2012). Moving toward system genetics through multiple trait analysis in genome-wide association studies. Frontiers in Genetics 16, 1–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sittler A., Walter S., Wedemeyer N, Hasenbank R, Scherzinger E., Eickhoff H., Bates G. P., Lehrach H., Wanker E. E. (1998). SH3GL3 associates with the Huntingtin exon 1 protein and promotes the formation of polygln-containing protein aggregates. Molecular cell 2, 427–436. [DOI] [PubMed] [Google Scholar]
Solovieff N., Cotsapas C., Lee P. H., Purcell S. M , Smoller J. W. (2013). Pleiotropy in complex traits: challenges and strategies. Nature reviews. Genetics 14, 483–495. [DOI] [PMC free article] [PubMed] [Google Scholar]
Stephens M. (2013). A unified framework for association analysis with multiple related phenotypes. PloS one 8, e65245. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wu M. C., Kraft P., Epstein M. P., Taylor D. M., Chanock S. J., Hunter D. J., Lin X. (2010). Powerful SNP-set analysis for case-control genome-wide association studies. American Journal of Human Genetics 86, 929–942. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wu C. O., Zheng G., Kwak M. (2013). A joint regression analysis for genetic association studies with outcome stratified samples. Biometrics 69, 417–426. [DOI] [PubMed] [Google Scholar]
Xu Z., Pan W. (2015). Approximate score-based testing with application to multivariate trait association analysis. Genetic epidemiology 39, 469–479. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yang Q., Wang Y. (2012). Methods for analyzing multivariate phenotypes in genetic association studies. Journal of probability and statistics 2012, 652569. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhang Y., Xu Z., Shen X., Pan W. (2014). Testing for association with multiple traits in generalized estimation equations, with application to neuroimaging data. NeuroImage 96, 309–325. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Data

Click here for additional data file.^{(299.9KB, docx)}

Supplementary Table

Click here for additional data file.^{(90.4KB, docx)}

[B1] Biernacka J. M., Sangkuhl K., Jenkins G., Whaley R. M., Barman P., Batzler A., Altman R. B., Arolt V., Brockmoller J., Chen C. H. and others. (2015). The International SSRI Pharmacogenomics Consortium (ISPC): a genome-wide association study of antidepressant treatment response. Translational psychiatry 5, e553. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B2] Cotsapas C., Voight B. F., Rossin E., Lage K., Neale B. M., Wallace C., Abecasis G. R., Barrett J. C., Behrens T., Cho J. and others. (2011). Pervasive sharing of genetic effects in autoimmune disease. PLoS genetics 7(8), e1002254. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B3] Diggle P., Heagerty P., Liang K-Y., Zeger S. (2009). Analysis of Longitudinal Data. Oxford: Oxford University Press. [Google Scholar]

[B4] Ferreira M. A., Purcell S. M. (2009). A multivariate test of association. Bioinformatics 25(1), 132–133. [DOI] [PubMed] [Google Scholar]

[B5] Galesloot T. E., van Steen K., Kiemeney L. A., Janss L. L., Vermeulen S. H. (2014). A comparison of multivariate genome-wide association methods. PloS one 9(4), e95923. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B6] Ji Y., Biernacka J. M., Hebbring S., Chai Y., Jenkins G. D., Batzler A., Snyder K. A., Drews M. S., Desta Z., Flockhart D. and others. (2013). Pharmacogenomics of selective serotonin reuptake inhibitor treatment for major depressive disorder: genome-wide associations and functional genomics. The pharmacogenomics journal 13(5), 456–463. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B7] Liu J., Pei Y., Papasian C. J., Deng H. W. (2009). Bivariate association analyses for the mixture of continuous and binary traits with the use of extended generalized estimating equations. Genetic epidemiology 33(3), 217–227. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B8] Maity A., Sullivan P. F., Tzeng J. Y. (2012). Multivariate phenotype association analysis by marker-set kernel machine regression. Genetic epidemiology 36(7), 686–695. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B9] Marchini J., Howie B., Myers S., McVean G., Donnelly P. (2007). A new multipoint method for genome-wide association studies by imputation of genotypes. Nature Genetics 39(7), 906–913. [DOI] [PubMed] [Google Scholar]

[B10] O’Reilly P. F., Hoggart C. J., Pomyen Y., Calboli F. C., Elliott P., Jarvelin M. R., Coin L. J. (2012). MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PloS one 7(5), e34861. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B11] Roy J., Lin X., Ryan L. M. (2003). Scaled marginal models for multiple continuous outcomes. Biostatistics 4(3), 371–383. [DOI] [PubMed] [Google Scholar]

[B12] Schaid D. J., McDonnell S. K., Sinnwell J. P., Thibodeau S. N. (2013). Multiple genetic variant association testing by collapsing and kernel methods with pedigree or population structured data. Genetic epidemiology 37, 409–418. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B13] Schaid D., Tong X., Larrabee B., Kennedy R., Poland G., Sinnwell J. (2016). Statistical methods for testing genetic pleiotropy. Genetics 204, 483–497. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B14] Schifano E. D., Li L., Christiani D. C., Lin X. (2013). Genome-wide association analysis for multiple continuous secondary phenotypes. American journal of human genetics 92, 744–759. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B15] Schriner D. (2012). Moving toward system genetics through multiple trait analysis in genome-wide association studies. Frontiers in Genetics 16, 1–7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B16] Sittler A., Walter S., Wedemeyer N, Hasenbank R, Scherzinger E., Eickhoff H., Bates G. P., Lehrach H., Wanker E. E. (1998). SH3GL3 associates with the Huntingtin exon 1 protein and promotes the formation of polygln-containing protein aggregates. Molecular cell 2, 427–436. [DOI] [PubMed] [Google Scholar]

[B17] Solovieff N., Cotsapas C., Lee P. H., Purcell S. M , Smoller J. W. (2013). Pleiotropy in complex traits: challenges and strategies. Nature reviews. Genetics 14, 483–495. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B18] Stephens M. (2013). A unified framework for association analysis with multiple related phenotypes. PloS one 8, e65245. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B19] Wu M. C., Kraft P., Epstein M. P., Taylor D. M., Chanock S. J., Hunter D. J., Lin X. (2010). Powerful SNP-set analysis for case-control genome-wide association studies. American Journal of Human Genetics 86, 929–942. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B20] Wu C. O., Zheng G., Kwak M. (2013). A joint regression analysis for genetic association studies with outcome stratified samples. Biometrics 69, 417–426. [DOI] [PubMed] [Google Scholar]

[B21] Xu Z., Pan W. (2015). Approximate score-based testing with application to multivariate trait association analysis. Genetic epidemiology 39, 469–479. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B22] Yang Q., Wang Y. (2012). Methods for analyzing multivariate phenotypes in genetic association studies. Journal of probability and statistics 2012, 652569. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B23] Zhang Y., Xu Z., Shen X., Pan W. (2014). Testing for association with multiple traits in generalized estimation equations, with application to neuroimaging data. NeuroImage 96, 309–325. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Multivariate generalized linear model for genetic pleiotropy

Daniel J Schaid

Xingwei Tong

Anthony Batzler

Jason P Sinnwell

Jiang Qing

Joanna M Biernacka

Summary

1. Introduction

2. Methods

2.1 Model and notation

2.2 Parameter estimation

2.3 Modification for ordinal traits

2.4 Hypothesis testing

2.5 Testing pleiotropy

2.6 Sequential testing

2.7 Simulations

2.8 Data application

3. Results

3.1 Type-I error of

Table 1.

Fig. 1.

3.2 Type-I error of

Table 2.

Table 3.

3.3 Power for pleiotropy

Table 4.

3.4 Properties of sequential test

Table 5.

3.5 Comparison with other methods

Table 6.

Table 7.

3.6 Application to GWAS symptoms of depression

Table 8.

Table 9.

4. Discussion

Supplementary Material

Acknowledgments

Funding

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases