Estimating and Testing Pleiotropy of Single Genetic Variant for Two Quantitative Traits

Qunyuan Zhang; Mary Feitosa; Ingrid B Borecki

doi:10.1002/gepi.21837

Author manuscript; available in PMC: 2015 Sep 1.

Published in final edited form as: Genet Epidemiol. 2014 Jul 12;38(6):523–530. doi: 10.1002/gepi.21837

PMCID: PMC4169079 NIHMSID: NIHMS615100 PMID: 25044106

Abstract

With the accumulation of data on genetic variants and biomedical phenotypes in the genome era, statistical identification of pleiotropy is of growing interest for dissecting and understanding genetic correlations between complex traits. We propose a novel method for estimating and testing the pleiotropic effect of a genetic variant on two quantitative traits. Based on a covariance decomposition and estimation, our method quantifies pleiotropy as the portion of between-trait correlation explained by the same genetic variant. Unlike most multiple-trait methods, which assess potential pleiotropy (i.e., whether a variant contributes to at least one trait), our method formulates a statistic that tests exact pleiotropy (i.e., whether a variant contributes to both traits). We developed two approaches (a regression approach and a bootstrapping approach) for this test and investigated their statistical properties in comparison with other methods that test potential pleiotropy. Our simulation shows that the regression approach produces correct p-values under both the complete null (i.e., a variant has no effect on either trait) and the incomplete null (i.e., a variant has an effect on only one of the two traits) but requires large sample sizes to achieve good power, whereas the bootstrapping approach has better power and produces conservative p-values under the complete null. We demonstrate our method for detecting exact pleiotropy using a real GWAS dataset. Our method provides an easy-to-implement tool for measuring, testing and understanding the pleiotropic effect of a single variant on the correlation architecture of two complex traits.

Keywords: pleiotropy, covariance decomposition, genetic correlation, regression, bootstrap

Introduction

Pleiotropy is a biological phenomenon in which a single genetic variant affects two or more phenotypic traits. The term was introduced into the literature a century ago and has since had an important influence on the fields of evolutionary biology, physiology and genetics [Stearns 2010]. In recent years, increasing numbers of polymorphic variants across the human genome have been associated with many complex traits, and there is growing interest in the identification of pleiotropic effects, especially for understanding the genetic and molecular basis of the correlated architecture among complex traits.

Genome-wide association studies (GWAS) have produced very rich data on both genotypes and phenotypes, providing unprecedented opportunities for the investigation of pleiotropy. However, statistical methods for identifying and characterizing pleiotropy are still insufficient and limited. A variety of multi-trait analysis methods have been proposed and can be used for an initial screen of potential pleiotropy, such as principal-components-based methods [Bensen, et al. 2003; Klei, et al. 2008], FBAT-GEE [Lange, et al. 2003], EGEE [Liu, et al. 2009], canonical correlation analysis (CCA) [Ferreira and Purcell 2009], combined multivariate (CMV) analysis [Medland and Neale 2010], the univariate-statistic combined test [Yang, et al. 2010], PRIMe [Huang, et al. 2011], the parameterized multi-trait mixed model (MTMM) [Korte, et al. 2012], and correlated meta-analysis [Province and Borecki 2013]. Such methods, however, are not strictly designed for testing exact pleiotropy. The presence of pleiotropy may increase the power of these methods in detecting overall association, but a significant test does not necessarily indicate pleiotropy, because most of these methods are based upon the null hypothesis that a variant affects none of the traits and do not include the incomplete null, in which the variant affects only one of the traits. The null is therefore not correctly specified for a pleiotropy test. A proper pleiotropy test should answer the question of whether a variant contributes to two or more traits. Another limitation of most existing methods is the lack of a well-defined parameter and estimator for the pleiotropic effect; they therefore help very little in assessing the magnitude of pleiotropy and in understanding how pleiotropy influences the relationship between traits.

In this paper we propose a novel, easy-to-implement approach for estimating and testing exact pleiotropy of a variant on two correlated traits. The pleiotropic effect of a variant is estimated as the portion of between-trait correlation that can be explained by the variant, and is then tested under the proper null hypothesis of no pleiotropy (i.e., the variant does not contribute to both traits) against the proper alternative hypothesis of pleiotropy (i.e., the variant contributes to both traits). We investigate the statistical properties of our method through simulation and demonstrate its application using real data. Referred to as Pleiotropy Estimation and Test (PET), the proposed method provides a novel tool for clearly characterizing and properly testing pleiotropy between two traits. We compare its performance to other possible approaches and assess power and sample size requirements.

Methods

Definitions and Models

In order to fit our method to a clearly defined scenario, we first define pleiotropy as independent effects of the same variant on two traits. Here, independent effect is a purely statistical definition. Biologically, it may include effects that propagate (or are passed) from a variant to one of the two traits through different paths without involving the other trait. This definition can be described by two separate linear models:

Y_{1} = α_{1} + β_{1} X + ε_{1}

(1)

Y_{2} = α_{2} + β_{2} X + ε_{2}

(2)

where X is genotype data of a variant (usually coded as 0,1,2), Y₁ and Y₂ are observations of two normally distributed quantitative traits; α_i, β_i and ε_i are intercept, effect (of X on Y_i) and residual, respectively, with subscripts 1 and 2 indicating two traits. The residuals in the models include independent, random errors, as well as other unobserved genetic and environmental effects that may cause a covariance between ε₁ and ε₂ (denoted by $σ_{12}^{ε}$ ). Assuming the covariances between X and ε₁ and between X and ε₂ are 0, the covariance between Y₁ and Y₂ (denoted by $σ_{12}^{Y}$ ) can be decomposed into two components (Appendix A):

σ_{12}^{Y} = σ_{12}^{ε} + β_{1} β_{2} σ_{X}^{2}

(3)

The first component, $σ_{12}^{ε}$ , is the covariance between residuals of the two models; the second, $β_{1} β_{2} σ_{X}^{2}$ , is the covariance caused by pleiotropy. Here $σ_{X}^{2}$ is the variance of X.

From equation (3) we can see that pleiotropy is a source of covariation between two traits. Based on this notion, we define ρ, the portion of correlation between Y₁ and Y₂ that can be explained by the genetic effects (i.e., β₁ and β₂) of X, as a metric of pleiotropy, termed the pleiotropy correlation coefficient (PCC). It is the standardized between-trait covariance explained by β₁ and β₂, which can be expressed as a function of β₁ and β₂:

ρ = \frac{β_{1} β_{2} σ_{X}^{2}}{σ_{1} σ_{2}}

(4)

where σ₁ and σ₂ are the standard deviations of Y₁ and Y₂, respectively.
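Equations (3) and (4) can be checked numerically. The following is a minimal sketch, assuming simulated data with arbitrary illustrative values (sample size, seed, MAF, effects and residual correlation are not taken from the analyses in this paper); the helper name `pcc` is hypothetical.

```python
import numpy as np

def pcc(beta1, beta2, var_x, sd1, sd2):
    """Equation (4): rho = beta1 * beta2 * var(X) / (sd(Y1) * sd(Y2))."""
    return beta1 * beta2 * var_x / (sd1 * sd2)

# Illustrative values only (assumptions, not the paper's configuration).
rng = np.random.default_rng(0)
n, maf, b1, b2, r = 200_000, 0.5, 0.15, 0.15, 0.5

x = rng.binomial(2, maf, size=n).astype(float)           # genotype coded 0/1/2
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, r], [r, 1.0]], size=n)
y1 = b1 * x + eps[:, 0]                                   # model (1), alpha_1 = 0
y2 = b2 * x + eps[:, 1]                                   # model (2), alpha_2 = 0

var_x = x.var(ddof=1)
lhs = np.cov(y1, y2)[0, 1]                                # sigma_12^Y
rhs = np.cov(eps[:, 0], eps[:, 1])[0, 1] + b1 * b2 * var_x  # equation (3)
rho = pcc(b1, b2, var_x, y1.std(ddof=1), y2.std(ddof=1))

print(f"cov(Y1,Y2) = {lhs:.4f}, residual cov + pleiotropy cov = {rhs:.4f}")
print(f"PCC = {rho:.4f}")
```

The two sides of equation (3) agree only up to sampling error, because the sample covariances between X and the residuals are not exactly zero.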

To estimate and test ρ, we have developed two approaches, a regression approach and a bootstrap approach, described below.

A Regression Approach

Given the sample data of X, Y₁ and Y₂, ρ can be estimated by simply replacing parameters in equation (4) with corresponding statistics.

\hat{ρ} = \frac{{\hat{β}}_{1} {\hat{β}}_{2} {\hat{σ}}_{X}^{2}}{{\hat{σ}}_{1} {\hat{σ}}_{2}}

(5)

While σ̂_X, σ̂₁ and σ̂₂ can be estimated from the sample data, the estimation of β₁ and β₂ may be biased if they are simply obtained by separate regressions of Y₁ and Y₂ on X, especially when Y₁ and Y₂ are strongly correlated. For example, even if β₁ =0, a simple regression coefficient of Y₁ on X will not be 0 if β₂ ≠0 and $σ_{12}^{ε} \neq 0$ , because of the indirect effect produced by β₂ and passed to Y₁ from Y₂ through the correlation between Y₁ and Y₂.

Instead of estimating β₁ and β₂ separately, we propose to estimate β₁ β₂ as one parameter in a composite model of the product of Y₁ and Y₂.

Y^{'} = α^{'} + γ^{'} X + β^{'} X^{2} + ε^{'}

(6)

This model is obtained through a multiplication of models (1) and (2), where Y′ = Y₁ Y₂, α′ = α₁ α₂, β′ = β₁ β₂, γ′ = α₁ β₂ + α₂β₁ and ε′ is the composite residual (Appendix B). The composite parameter β′ can be estimated by a regression of Y₁Y₂ on X and X².

Significance of PCC can be determined by testing the null hypothesis, H₀: ρ = 0, versus the alternative hypothesis, H_A: ρ ≠0. According to the definition of ρ, H_A is equivalent to the hypothesis of β′ ≠0 (i.e., β₁ ≠0 and β₂ ≠0) and H₀ is equivalent to β′ =0 (i.e., β₁ =0 and/or β₂ =0). Here H₀ is a compound null involving two types of possible null hypotheses, the complete null $H_{0}^{0} : β_{1} = β_{2} = 0$ , and the incomplete nulls $H_{0}^{′}$ (including $H_{0}^{1} : β_{1} = 0$ , β₂ ≠0 and $H_{0}^{2} : β_{1} \neq 0$ , β₂ = 0). Instead of testing these nulls separately, we propose to perform a universal test for all the nulls ( $H_{0}^{0}$ , $H_{0}^{1}$ and $H_{0}^{2}$ ) against the alternative (β₁ ≠0 and β₂ ≠0) through a test of β′ =0 vs. β′ ≠0 based on model (6).

Since some features of the composite model (6) may not strictly satisfy the assumptions of regular regression (such as normality and homogeneity of ε′ and independence between parameters), we chose a robust regression to estimate and test β′, which is based on Huber’s method[Huber 1981] and implemented in the R package MASS.
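The estimation step of the regression approach can be sketched as follows. This is a minimal illustration, not the implementation used in this paper: the analyses here rely on the robust regression in the R package MASS, whereas the sketch substitutes a hand-written Huber M-estimator fitted by iteratively reweighted least squares, with an assumed tuning constant k = 1.345 and a MAD-based scale. The function names and the simulated data are hypothetical, and only point estimates are produced (no standard errors or p-values).

```python
import numpy as np

def huber_irls(design, y, k=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimation via iteratively reweighted least squares (sketch)."""
    coef = np.linalg.lstsq(design, y, rcond=None)[0]       # ordinary LS start
    for _ in range(n_iter):
        resid = y - design @ coef
        scale = np.median(np.abs(resid)) / 0.6745           # MAD-type scale
        if scale == 0:
            break
        u = np.abs(resid) / scale
        weights = k / np.maximum(u, k)                      # 1 if u <= k, else k/u
        sw = np.sqrt(weights)
        new_coef = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef

def pet_r_estimate(x, y1, y2):
    """Fit model (6), Y1*Y2 ~ 1 + X + X^2, and plug beta' into equation (5)."""
    design = np.column_stack([np.ones_like(x), x, x ** 2])
    _, _, beta_prime = huber_irls(design, y1 * y2)
    rho_hat = beta_prime * x.var(ddof=1) / (y1.std(ddof=1) * y2.std(ddof=1))
    return beta_prime, rho_hat

# Illustrative data (assumed values).
rng = np.random.default_rng(0)
n, maf, b1, b2, r = 5000, 0.5, 0.15, 0.15, 0.5
x = rng.binomial(2, maf, size=n).astype(float)
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, r], [r, 1.0]], size=n)
y1, y2 = b1 * x + eps[:, 0], b2 * x + eps[:, 1]

beta_prime, rho_hat = pet_r_estimate(x, y1, y2)
print(beta_prime, rho_hat)
```

Because the composite residual ε′ is skewed and depends on X, the estimate of β′ from a single data set can be far from β₁β₂, which is in line with the large sample sizes this approach requires.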

A Bootstrap Approach

Alternatively, instead of estimating β₁β₂, we propose to estimate $β_{1} β_{2} σ_{X}^{2}$ (denoted by δ) as $\hat{δ} = {\hat{σ}}_{12}^{Y} - {\hat{σ}}_{12}^{ε}$ and then ρ as $\hat{ρ} = \hat{δ} / ({\hat{σ}}_{1} {\hat{σ}}_{2})$ , where ${\hat{σ}}_{12}^{Y}$ , ${\hat{σ}}_{12}^{ε}$ , σ̂₁ and σ̂₂ are estimated covariance components in equations (3) and (5). While ${\hat{σ}}_{12}^{Y}$ is calculated as the sample covariance between the two traits, ${\hat{σ}}_{12}^{ε}$ can be obtained through a bivariate model (Appendix C).

Similar to the test of β′, significance of PCC can be determined by testing the null hypothesis, H₀:δ =0, versus the alternative hypothesis, H_A:δ ≠0. According to the composition of δ (i.e., $δ = β_{1} β_{2} σ_{X}^{2}$ ), H_A is equivalent to the hypothesis of β₁ ≠0 and β₂ ≠0, whereas H₀ includes both the complete null $H_{0}^{0}$ and the incomplete null $H_{0}^{′}$ . Since the analytical distribution of δ̂ under the compound null is unknown and the commonly used permutation test is not applicable, we propose to calculate the p-value via bootstrapping. Given the data of X, Y₁ and Y₂, a two-tailed p-value is defined as 2 times the minimum of P(D_b >0) and P(D_b <0), where D_b is a set of δ̂ values obtained by bootstrapping and P denotes the observed proportion.
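The bootstrap test can be sketched as follows. This is a minimal illustration under a stated simplification: the residual covariance is taken from the residuals of separate least-squares regressions of each trait on X, which stands in for the bivariate model of Appendix C and may give different results. Under this simplification δ̂ equals the product of the two fitted slopes and the sample variance of X. The function names, the number of bootstrap samples and the simulated values are assumptions chosen for illustration.

```python
import numpy as np

def delta_hat(x, y1, y2):
    """delta_hat = sample cov(Y1, Y2) minus the residual covariance."""
    design = np.column_stack([np.ones_like(x), x])
    res1 = y1 - design @ np.linalg.lstsq(design, y1, rcond=None)[0]
    res2 = y2 - design @ np.linalg.lstsq(design, y2, rcond=None)[0]
    return np.cov(y1, y2)[0, 1] - np.cov(res1, res2)[0, 1]

def pet_b(x, y1, y2, n_boot=500, seed=0):
    """Return (delta_hat, rho_hat, two-tailed bootstrap p-value)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    est = delta_hat(x, y1, y2)
    boots = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample subjects with replacement
        boots[b] = delta_hat(x[idx], y1[idx], y2[idx])
    # p = 2 * min(P(D_b > 0), P(D_b < 0)), capped at 1
    p_value = min(1.0, 2 * min(np.mean(boots > 0), np.mean(boots < 0)))
    rho = est / (y1.std(ddof=1) * y2.std(ddof=1))
    return est, rho, p_value

# Illustrative data under H_A (assumed values).
rng = np.random.default_rng(1)
n, maf, b1, b2, r = 2000, 0.5, 0.3, 0.3, 0.5
x = rng.binomial(2, maf, size=n).astype(float)
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, r], [r, 1.0]], size=n)
y1, y2 = b1 * x + eps[:, 0], b2 * x + eps[:, 1]

est, rho_hat, p_value = pet_b(x, y1, y2)
print(est, rho_hat, p_value)
```

With a finite number of bootstrap samples the smallest attainable non-zero p-value is 2/n_boot, so a larger n_boot is needed when small significance thresholds are used.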

For the convenience of discussion, we refer to the proposed method as Pleiotropy Estimation and Test (PET), and refer to the regression version as PET-R and the bootstrap version as PET-B.

Simulation

To investigate statistical properties of the PET method, we simulated data under a variety of parameter configurations based on the models (1) and (2). For a given sample size N, we first simulated genotype data (X) of N subjects for a variant under Hardy-Weinberg equilibrium with a fixed or random minor allele frequency (MAF), and then generated two quantitative traits based on the models (1) and (2) with α_i =0 and different combinations of β₁ and β₂ for different hypotheses (i.e. $H_{0}^{0}$ , $H_{0}^{′}$ and H_A). To simulate the correlation between Y₁ and Y₂, ε₁ and ε₂ were sampled from a bivariate normal distribution with a mean vector of (0,0) and a 2×2 variance-covariance matrix, in which the variance components were set to 1 and the covariance components (r) were set to non-zero (fixed or random) values.
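The data-generating procedure described above can be sketched as follows. This is a minimal illustration assuming a fixed MAF and a fixed residual correlation; the function name `simulate_traits`, the seed and the specific parameter values are hypothetical.

```python
import numpy as np

def simulate_traits(n, maf, beta1, beta2, r, rng):
    """Simulate one variant and two traits from models (1) and (2), alpha_i = 0."""
    # Under Hardy-Weinberg equilibrium the minor-allele count is Binomial(2, MAF).
    x = rng.binomial(2, maf, size=n).astype(float)
    # Residuals: bivariate normal, unit variances, covariance r.
    cov = np.array([[1.0, r], [r, 1.0]])
    eps = rng.multivariate_normal(np.zeros(2), cov, size=n)
    y1 = beta1 * x + eps[:, 0]
    y2 = beta2 * x + eps[:, 1]
    return x, y1, y2

rng = np.random.default_rng(0)
# Complete null: beta1 = beta2 = 0.
x0, y1_0, y2_0 = simulate_traits(5000, 0.3, 0.0, 0.0, 0.5, rng)
# Incomplete null: the variant affects only the second trait.
x1, y1_1, y2_1 = simulate_traits(5000, 0.3, 0.0, 0.2, 0.5, rng)
# Alternative (exact pleiotropy): both effects are non-zero.
xa, y1_a, y2_a = simulate_traits(5000, 0.3, 0.15, 0.15, 0.5, rng)

print(np.corrcoef(y1_0, y2_0)[0, 1])   # close to r under the complete null
```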

Methods for Comparison

For comparison, we included in this paper two other multiple-trait analysis methods, canonical correlation analysis (CCA) and correlated meta-analysis (CMA). CCA is a multivariate approach for analyzing correlation between two groups of variables. We used it to test the overall association between a variant and two traits, by calculating Wilks' statistic through an eigenanalysis of raw data and obtaining a p-value based on a simplified F-approximation [Ferreira and Purcell 2009]. CMA is a meta-analysis approach that takes between-trait correlation into account, combines statistics from individual traits into a summarized statistic, and tests its significance through a correlated multivariate normal distribution [Province and Borecki 2013]. Although CCA and CMA test the same statistical hypothesis of potential pleiotropy (i.e., $H_{0}^{0}$ vs $H_{0}^{′}$ and H_A), they require different data as input. Whereas CCA requires individual subjects' data and is performed for each single variant with no use of data from other variants, CMA uses summarized data (p-values) from each trait and requires a large number of variants to estimate the between-trait correlation.

In addition, to understand the difference in power between single-trait and multiple-trait analyses, we also include in the paper two simple tests based on single-trait regression analysis: one testing the association between a variant and a single trait (based on one p-value, denoted by P1), the other testing the association between a variant and both traits (based on two separate p-values, denoted by P2). The P2 test is a commonly used, simple approach for detecting pleiotropy. However, a significant P2 test (i.e., both p-values are less than a given cutoff) may not indicate exact pleiotropy, because when both p-values are significant, one of them can be caused by an indirect effect of the variant through between-trait correlation rather than by pleiotropy. Strictly, P2 can be considered an exact pleiotropy test only when there is no residual correlation between the two traits (i.e., r=0).

Since the statistical hypotheses tested by CCA, CMA, P1 and P2 are different from those tested by PET, we do not intend a competitive comparison between them. Our purpose in including these methods is to help in understanding some important features of pleiotropy testing through contrast.

Results

Type-I Error

We applied CCA, CMA, PET-R and PET-B to the data simulated under both the complete null ( $H_{0}^{0}$ ) and the incomplete null ( $H_{0}^{′}$ ), and investigated their type-I error characteristics using Q-Q plots that compare the observed p-values with uniformly distributed p-values. The two existing methods, CCA and CMA, produce uniformly distributed p-values under $H_{0}^{0}$ (Fig.1a,1c) but are significantly inflated when testing exact pleiotropy under $H_{0}^{′}$ (Fig.1b,1d). This is because, as mentioned earlier for most existing multi-phenotype analysis methods, they were originally proposed to test potential pleiotropy (i.e., $H_{0}^{0}$ against both $H_{0}^{′}$ and H_A), not exact pleiotropy (i.e., $H_{0}^{0}$ and $H_{0}^{′}$ against H_A). The PET-R method produces expected, non-inflated p-values under both $H_{0}^{0}$ (Fig.1e) and $H_{0}^{′}$ (Fig.1f), whereas PET-B produces expected p-values under $H_{0}^{′}$ (Fig.1h) and conservative p-values under $H_{0}^{0}$ (Fig.1g). The results indicate that PET has good control of false positives and is more appropriate for testing exact pleiotropy.

Figure 1. In each plot, the x-axis represents expected p-values and the y-axis represents observed p-values, both on the negative log10 scale. The p-values are obtained by applying CCA, CMA, PET-R and PET-B to 2000 replications of data simulated for a sample size of N=5000, under the complete null $H_{0}^{0}$ and the incomplete null $H_{0}^{'}$ , separately. In each replication of simulation, MAF is randomly drawn between 0.2~0.5 and r between 0.2~0.8; while β₁ =0 is used for all simulations, β₂ is set to 0 (for $H_{0}^{0}$ ) or randomly drawn between -0.3 and 0.3 (for $H_{0}^{'}$ ). The results presented here include: a) CCA under $H_{0}^{0}$ ; b) CCA under $H_{0}^{'}$ ; c) CMA under $H_{0}^{0}$ ; d) CMA under $H_{0}^{'}$ ; e) PET-R under $H_{0}^{0}$ ; f) PET-R under $H_{0}^{'}$ ; g) PET-B under $H_{0}^{0}$ ; h) PET-B under $H_{0}^{'}$ .

Power

We estimated the statistical power of the CCA, CMA, PET-R, PET-B, P1 and P2 tests through simulation under the alternative hypothesis (H_A : β₁ ≠0 and β₂ ≠0) of exact pleiotropy. The estimates (Table 1) show that when β₁ ≠0 and β₂ ≠0, in terms of detecting association, two-trait based testing methods (CCA and CMA) have higher power than single-trait based methods (P1 and P2), and that the potential pleiotropy tests (CCA, CMA and P2) and the single-trait association test (P1) have higher power than the exact pleiotropy tests (PET-R and PET-B). These results indicate that detecting exact pleiotropy is more difficult and requires a larger sample size than detecting potential pleiotropy or single-trait association. This property is due to the nature of PET, which, unlike the other methods, needs to distinguish $H_{0}^{0}$ , $H_{0}^{1}$ and $H_{0}^{2}$ from H_A.

Table 1. Estimated power of CCA, CMA, P1, P2, PET-R and PET-B

α	N	CCA	CMA	P1	P2^*	PET-R	PET-B
0.05	500	0.684	0.780	0.655	0.503	0.079	0.472
	1000	0.946	0.974	0.919	0.846	0.083	0.844
	5000	>0.99	>0.99	>0.99	>0.99	0.111	>0.99
	10000	>0.99	>0.99	>0.99	>0.99	0.208	>0.99
	50000	>0.99	>0.99	>0.99	>0.99	0.444	>0.99

0.01	500	0.445	0.565	0.422	0.244	0.019	0.215
	1000	0.826	0.907	0.775	0.640	0.029	0.618
	5000	>0.99	>0.99	>0.99	>0.99	0.058	>0.99
	10000	>0.99	>0.99	>0.99	>0.99	0.130	>0.99
	50000	>0.99	>0.99	>0.99	>0.99	0.220	>0.99

Power for each method at different sample sizes and α levels is estimated through 2000 replications of simulation for a locus with MAF=0.5 and two traits with r=0.5, under an exact pleiotropy hypothesis H_A of β₁ = β₂ =0.15 (resulting in an approximate heritability of 0.01 for each trait).

* Significance is declared when both single-trait test p-values are equal to or less than the α level.

Comparing the two PET approaches, PET-R's power is much lower, so it requires a very large sample size, which makes its application to real data impractical; PET-B has better power, which is acceptable in practice. For example, when the sample size is N=1000, a pleiotropic effect resulting in an approximate heritability of 1% for each trait (i.e., β₁ = β₂ =0.15) can be detected by PET-B with a power of 0.844 or 0.618 at a significance level of 0.05 or 0.01, respectively (Table 1).
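The heritability of about 1% quoted for β₁ = β₂ = 0.15 follows from the simulation settings, as the short calculation below illustrates. It is a sketch that assumes Hardy-Weinberg equilibrium, so that the genotype variance is 2·MAF·(1 − MAF), together with the unit residual variance used in the simulations; the variable names are illustrative.

```python
maf, beta, r = 0.5, 0.15, 0.5

var_x = 2 * maf * (1 - maf)        # genotype variance under HWE
var_g = beta ** 2 * var_x          # trait variance explained by the variant
var_y = var_g + 1.0                # total trait variance (residual variance = 1)
h2 = var_g / var_y                 # per-trait heritability due to the variant

# PCC from equation (4); both traits have the same variance here,
# so sigma_1 * sigma_2 equals var_y.
rho = beta * beta * var_x / var_y
total_corr = (r + var_g) / var_y   # correlation of Y1 and Y2, via equation (3)

print(f"h2 = {h2:.4f}, PCC = {rho:.4f}, trait correlation = {total_corr:.4f}")
```

In this symmetric configuration the PCC coincides with the per-trait heritability, and the variant accounts for only a small share of the total between-trait correlation.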

Besides sample size (N), many other factors can affect the power of PET. Among these factors, the correlation (r) between traits, the MAF of the variant, the variant effects on the traits (β₁ and β₂) and the difference between the variant effects (|β₁ - β₂|) are four major ones. We investigated the power of PET-B through simulations with varied factors and observed that the power of PET increases with increasing r, MAF, β₁ and β₂ (Fig. 2a,2b,2c); however, when the product of β₁ and β₂ is fixed, the power of PET increases as |β₁ − β₂| decreases (Fig.2d) and reaches its maximum when β₁ = β₂, indicating that it is relatively easier to detect pleiotropy when a variant has similar effects on the two traits.

Figure 2. Power for each scenario is estimated from 10000 replications of simulation with N=1000, and a) β₁ = β₂ =0.15, MAF=0.5, r=0.1~0.9; b) β₁ = β₂ =0.15, MAF=0.01~0.5, r=0.5; c) β₁ =0.15, β₂ =0.01~0.3, MAF=0.5, r =0.5; d) β₁β₂ =0.15², | β₁ - β₂ | =0~0.5, MAF=0.5, r =0.5.

Estimation of PCC

To investigate the performance of the estimation of PCC, we used simulation to compare the expected PCC values (calculated by equation (4) from the known parameters used in data simulation) with the estimated PCC values (estimated from simulated data using the PET-R and PET-B methods, respectively). We observed that PET-B produces considerably more accurate estimates of PCC than PET-R does. In our simulation, the PCC values estimated by PET-B have a strong correlation (r=0.991) with their expected values, whereas the correlation is much lower (r=0.879) when PCC is estimated by PET-R (Fig.3). Overall, the PCC is underestimated by PET-R, probably because some features of model (6) violate the assumptions usually required in regression analysis. For example, in model (6), γ′, β′ and ε′ all include the components β₁ and β₂ (see Appendix B), which may cause the under-estimation of β′. These results, combined with the power analysis (Table 1), suggest that PET-B should be a better choice than PET-R in practice.

Figure 3. The PCC values are based on 1000 replications of simulation with N=5000, β₁ and β₂ randomly drawn between 0~0.5, MAF between 0.2~0.5 and r between 0.1~0.9. For each replication, the expected PCC is calculated using the theoretical equation (4) in METHODS with the known β₁ and β₂ used for simulation; the estimated PCCs are obtained by applying a) PET-R and b) PET-B to the simulated data.

Application

To demonstrate how to detect pleiotropy in practice using the proposed method, we applied CMA and PET-B to a set of real GWAS data from the Family Heart Study (FHS) [Higgins, et al. 1996]. The data set contains 2705 subjects and about 2.5 million typed and imputed SNPs. We focus on two quantitative traits: waist circumference (WC) and the homeostatic model assessment (HOMA), which is an indicator of insulin resistance. The correlation coefficient for these traits is 0.542 (P<10^-16).

Since bootstrapping in the PET-B test is computationally intensive, we did not perform the PET-B analysis on all SNPs. Instead, we first computed p-values for all SNPs using the CMA method and then calculated the false discovery rates (FDRs) using the Benjamini-Hochberg procedure [Benjamini and Hochberg 1995]. Applying a cutoff of FDR<0.05, we identified 76 significant SNPs with potential pleiotropy. Because CMA only tests $H_{0}^{0}$ against $H_{0}^{′}$ and H_A, most of the 76 SNPs are expected to be associated with at least one of the two traits. To further distinguish exact pleiotropy from potential pleiotropy (i.e., distinguish the SNPs contributing to only one trait from those contributing to both traits), we performed the PET-B test (with 10,000 bootstrap samples) on the 76 SNPs and identified 43 significant ones at a cutoff of p-value < 0.01 (Suppl Table 1). Since PET is appropriate for testing $H_{0}^{′}$ against H_A, most of the 43 SNPs are expected to have pleiotropic effects on the two traits. We investigated the single-trait test p-values of these SNPs and found that most of the 43 SNPs significant in the PET-B test have similar p-values in the separate tests of WC and HOMA (Fig. 4), suggesting that a significant PET test requires a SNP to have similar effects on the two traits. A large effect paired with a small (or zero) effect may result in very different single-trait test p-values and is less likely to be identified by PET, as the smaller effect is more likely to be explained as an indirect effect of the larger one passed through the trait correlation.

Figure 4. Scatter plot of p-values (on the −log10 scale) from separate association tests of two traits, WC and HOMA. The plot includes a total of 76 SNPs with potential pleiotropy identified by CMA, among which 43 SNPs are identified by PET-B as pleiotropic and 33 as non-pleiotropic.

The estimated PCC values of the 43 SNPs vary in the range of 0.183%~0.28%, indicating that individual SNPs have small pleiotropic effects on WC and HOMA. These SNPs are located on chromosomes 6, 11, 12, 15 and 18, falling in five genes (GMDS, APIP, PDHX, CACNB3 and FAM174B) and an upstream region near the CDH19 gene (see Suppl Table 1 for more details). Some of these genes have an obvious connection to both WC and HOMA. For example, GMDS is involved in the glucose-metabolism pathway and is expressed in a variety of tissues (including adipocyte and skeletal muscle). It has been reported to be associated with obesity-related traits testosterone [Derese 2011] and echocardiography [Imai, et al. 2011]. There is also evidence that the SNP rs9503038 (P=0.0086 in the PET test) in GMDS is an expression quantitative trait locus (eQTL) of the gene EXOC4 (according to the SCAN annotation [Levy, et al. 2011]). EXOC4 is involved in insulin-stimulated glucose transport and is a candidate for association with T2D and fasting glucose levels [de Heus 2012]. These facts suggest a plausible pleiotropic effect of GMDS on WC and HOMA. Although these results still need validation from further studies and more data, this application demonstrates that the PET analysis can provide more detailed and clearer information on pleiotropy.

Discussion

We have developed a novel method, PET, for estimating and testing the pleiotropic effect of a variant on two complex traits. Compared with most existing multiple-trait analysis methods, the PET method has two unique features. The key feature is that, unlike most multiple-trait analysis methods, which test potential pleiotropy (i.e., the association between a variant and at least one trait), PET directly tests exact pleiotropy (i.e., a variant has effects on both traits). Therefore, while most multiple-trait analysis methods may provide an initial screen for potential pleiotropy, PET provides a more detailed test for distinguishing exact pleiotropy from potential pleiotropy, which is clearer and more helpful in answering the question of whether a variant contributes to both traits.

One of the most challenging issues in testing pleiotropy is the compound nature of the null hypothesis. The pleiotropy test for a variant and two traits involves one alternative hypothesis of exact pleiotropy (H_A: β₁ ≠0 and β₂ ≠0), one complete null ( $H_{0}^{0} : β_{1} = β_{2} = 0$ ), and two incomplete nulls ( $H_{0}^{1} : β_{1} = 0$ , β₂ ≠0 and $H_{0}^{2} : β_{1} \neq 0$ , β₂ =0). Since the complete null can be tested by most existing multiple-trait analysis methods, there is more interest in testing the incomplete nulls vs. the alternative hypothesis. In practice there are no clear rules for choosing a particular null from $H_{0}^{0}$ , $H_{0}^{1}$ and $H_{0}^{2}$ , making it hard to construct a test for H_A. Although it is possible to construct likelihood ratio or generalized least squares (GLS) based procedures for separate tests of $H_{0}^{0}$ vs. H_A, $H_{0}^{1}$ vs. H_A and $H_{0}^{2}$ vs. H_A, doing so introduces extra ambiguity when a decision must be made from multiple p-values. An example of this ambiguity can be seen in the recently developed MTMM method [Korte, et al. 2012], in which the common effect and the interaction effects of a variant on multiple traits are tested separately and can be used to infer pleiotropy; however, when the common effect is not significant but the interaction is, no clear conclusion can be drawn about pleiotropy. Similarly, Flutre et al. recently developed a Bayesian statistical framework for joint eQTL analysis in multiple tissues [Flutre, et al. 2013], which features both joint and separate hypothesis testing and can be used for the identification of pleiotropy. However, when such a separate testing strategy is used to detect pleiotropy, screening all alternative hypotheses (e.g., H_A, $H_{0}^{1}$ and $H_{0}^{2}$ ) against the same null ( $H_{0}^{0}$ ) introduces a similar multiple-testing and ambiguity issue, sometimes making inference on pleiotropy difficult (especially when more than one p-value is significant and the results conflict; for example, both tests of $H_{0}^{0}$ vs H_A and $H_{0}^{0}$ vs $H_{0}^{1}$ are significant, but a further test of H_A vs $H_{0}^{1}$ is not). To avoid such issues, we have proposed the PET test, which allows a universal test of the alternative hypothesis (H_A) against multiple nulls ( $H_{0}^{0}$ , $H_{0}^{1}$ and $H_{0}^{2}$ ) and thus provides a single p-value for a clearer test of pleiotropy.

Another unique feature of PET is that it provides an estimate of the size of pleiotropy, which is clearly defined as the portion of between-trait correlation that can be explained by a variant and is measured by PCC. The estimate of PCC is very useful for understanding how and to what extent a pleiotropic variant affects the correlation structure of complex traits. The value of PCC can be positive or negative. When two traits are positively correlated, a positive PCC indicates that the variant's effects on the two traits are in the same direction and a negative PCC indicates opposite directions (the interpretation is reversed for negatively correlated traits).

It should be noted that there are many different biological types of pleiotropy [Hodgkin 1998; Solovieff, et al. 2013]; statistical definitions of pleiotropy can therefore differ, and different definitions may result in very different statistical methods. For example, pleiotropy can be defined by variant, gene or region. A pleiotropic variant is a single variant that contributes to two (or more) traits, whereas a pleiotropic gene (or region) may contribute to multiple traits through different variants, none of which is individually pleiotropic. The PET method is developed only for testing pleiotropic variants. Detecting a pleiotropic gene (or region) will need a different method. In addition, the PET method itself may need modification to answer some more complicated and delicate questions regarding pleiotropic variants. For example, if two variants are in strong linkage disequilibrium (LD) and contribute to two different traits separately (i.e., both are non-pleiotropic variants), direct application of PET to either variant may yield significant evidence of pleiotropy because of the LD between them. Another challenging issue is how to test pleiotropy when the number of traits increases (to far more than two). From these facts and questions, we can see that our current version of PET is still limited. When a pleiotropy test involves more variants and more traits, the method needs to be improved through modification or by combining it with other techniques. A recently published Bayesian method for testing colocalisation between pairs of genetic associations has touched on such questions [Giambartolomei, et al. 2013]. This method uses summary statistics to assess different association hypotheses when pairs of traits and variants are involved (e.g., hypotheses of no association with either trait, association with trait 1 but not with trait 2, association with two traits via two independent variants, or via one shared variant, etc.). Of course, while this type of method has great potential to be extended to more complicated pleiotropy analyses, some challenging issues remain to be addressed, for instance, how to reduce false positives and increase the power of detecting a true hypothesis when an inference is made from multiple possible hypotheses, and how to improve a Bayes-factor-based test when the posterior probabilities of multiple hypotheses are not independent (due to the correlation between traits and/or LD between variants).

Finally, we want to point out that the PET analysis is different from two other widely used analyses, the mediation model (MM) [MacKinnon, et al. 2007; Richiardi, et al. 2013] and Mendelian randomization (MR) [Smith and Ebrahim 2003; Thomas and Conti 2004]. Although all three are about modeling and interpreting the relationship between three (or more) variables (for convenience of comparison, here we refer to three variables as A, B and C), they have different application goals. Whereas PET is developed for detecting pleiotropy (i.e., whether A has direct effects on both B and C), MM tests the mediation effect of A between B and C (i.e., whether B has a causal effect on A and A in turn has a causal effect on C), and MR investigates the causal effect of B on C by introducing an instrumental variable A (i.e., whether B has a causal effect on C, given extra association information from A). In terms of the model, a major difference is that both MM and MR require the direction of causality (from B to C, or from C to B) to be defined before analysis, but PET does not, because the focus of PET is to assess how much of the correlation between B and C can be explained by A, regardless of the direction of causality between B and C. Since they are all based on linear models, some of their statistical features are related and/or interact. For example, in the presence of pleiotropic effects (of A on both B and C), the MR causality inference (on B and C) will be biased [Solovieff, et al. 2013]. Such interaction may suggest using a combination of these methods in practice. For example, before an MR analysis, we may need to perform a PET test to make sure there is no pleiotropy, so that the MR result is more likely to be unbiased. Of course, how these models and methods are connected to and interact with each other is still an open question and requires more theoretical work.

Supplementary Material

Supp FigureS1

NIHMS615100-supplement-Supp_FigureS1.docx (46KB, docx)

Acknowledgments

This work was supported by the National Institutes of Health (NIH) grants 1R01DK8925601 (to I.B.B.) and 5R01DK075681 (to I.B.B.). We thank Dr. Lihua Wang for assistance in SNP and gene annotation and interpretation.

Appendix A: Decomposition of Covariance of Two Traits

Given the two models: Y₁ = α₁ + β₁X + ε₁ and Y₂ = α₂ + β₂X + ε₂, the covariance between Y₁ and Y₂, Cov(Y₁, Y₂), can be decomposed as

\begin{array}{l} Cov (Y_{1}, Y_{2}) = Cov (α_{1} + β_{1} X + ε_{1}, α_{2} + β_{2} X + ε_{2}) \\ = Cov (α_{1}, α_{2}) + Cov (α_{1}, β_{2} X) + Cov (α_{1}, ε_{2}) \\ + Cov (β_{1} X, α_{2}) + Cov (β_{1} X, β_{2} X) + Cov (β_{1} X, ε_{2}) \\ + Cov (ε_{1}, α_{2}) + Cov (ε_{1}, β_{2} X) + Cov (ε_{1}, ε_{2}) \end{array}

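Since α₁, α₂, β₁ and β₂ are constants, every term involving α₁ or α₂ is zero; and since Cov(X, ε₁) = 0 and Cov(X, ε₂) = 0, the cross terms Cov(β₁X, ε₂) and Cov(ε₁, β₂X) also vanish, leaving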

\begin{array}{l} Cov (Y_{1}, Y_{2}) = Cov (ε_{1}, ε_{2}) + Cov (β_{1} X, β_{2} X) \\ = Cov (ε_{1}, ε_{2}) + β_{1} β_{2} Var (X) \end{array}
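The decomposition can be checked numerically. The following is a minimal sketch, assuming each trait is fitted by ordinary least squares with an intercept; the function name `decompose_covariance` and all simulation settings are hypothetical. Because OLS residuals are uncorrelated with X in the sample, the two parts add up to the sample covariance exactly (up to floating-point error).

```python
import numpy as np


def decompose_covariance(x, y1, y2):
    """Split Cov(Y1, Y2) into b1*b2*Var(X) and Cov(e1, e2) using per-trait OLS."""
    design = np.column_stack([np.ones(x.size), x])
    coef1, *_ = np.linalg.lstsq(design, y1, rcond=None)
    coef2, *_ = np.linalg.lstsq(design, y2, rcond=None)
    res1 = y1 - design @ coef1
    res2 = y2 - design @ coef2
    total = np.cov(y1, y2)[0, 1]
    variant_part = coef1[1] * coef2[1] * np.var(x, ddof=1)
    residual_part = np.cov(res1, res2)[0, 1]
    return total, variant_part, residual_part


# Hypothetical simulation settings (assumptions, not taken from the article).
rng = np.random.default_rng(0)
n = 5000
x = rng.binomial(2, 0.3, n).astype(float)
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.4], [0.4, 1.0]], n)
y1 = 1.0 + 0.5 * x + eps[:, 0]
y2 = 2.0 + 0.3 * x + eps[:, 1]

total, variant_part, residual_part = decompose_covariance(x, y1, y2)
print(f"Cov(Y1, Y2)        = {total:.4f}")
print(f"b1*b2*Var(X)       = {variant_part:.4f}")
print(f"Cov(e1, e2)        = {residual_part:.4f}")
print(f"sum of two parts   = {variant_part + residual_part:.4f}")
```

The ratio of the variant part to the total is one way to express the share of the between-trait covariance attributable to the variant.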

Appendix B: Multiplication of Two Regression Models

Through the multiplication of the two models, Y₁ = α₁ + β₁X + ε₁ and Y₂ = α₂ + β₂X + ε₂, we have

\begin{array}{l} Y_{1} Y_{2} = (α_{1} + β_{1} X + ε_{1}) (α_{2} + β_{2} X + ε_{2}) \\ = α_{1} α_{2} + (α_{1} β_{2} + α_{2} β_{1}) X + β_{1} β_{2} X^{2} + (ε_{1} β_{2} X + ε_{2} β_{1} X + ε_{1} α_{2} + ε_{2} α_{1} + ε_{1} ε_{2}) \end{array}

Letting Y′ = Y₁Y₂, α′ = α₁α₂, γ′ = α₁β₂ + α₂β₁, β′ = β₁β₂, and letting the sum of all components carrying ε₁ or ε₂ be a composite residual ε′ = ε₁β₂X + ε₂β₁X + ε₁α₂ + ε₂α₁ + ε₁ε₂, the model of Y₁Y₂ can be rewritten as

Y^{'} = α^{'} + γ^{'} X + β^{'} X^{2} + ε^{'}
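As an illustration of the product model, the sketch below regresses Y₁Y₂ on X and X² by ordinary least squares and compares the X² coefficient with β₁β₂. It is a sketch under stated assumptions, not the authors' implementation; the function name `fit_product_model` and the simulated parameter values are hypothetical. Note that the composite residual ε′ contains ε₁ε₂, whose mean is Cov(ε₁, ε₂), so when the residuals are correlated the fitted intercept estimates α₁α₂ plus that covariance rather than α₁α₂ alone.

```python
import numpy as np


def fit_product_model(x, y1, y2):
    """OLS fit of Y1*Y2 on (1, X, X^2); returns (intercept, gamma', beta')."""
    design = np.column_stack([np.ones(x.size), x, x ** 2])
    coef, *_ = np.linalg.lstsq(design, y1 * y2, rcond=None)
    return coef


# Hypothetical simulation settings (assumptions, not taken from the article).
rng = np.random.default_rng(1)
n = 200_000
a1, b1, a2, b2, s12 = 1.0, 0.5, 2.0, 0.3, 0.3
x = rng.binomial(2, 0.3, n).astype(float)
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, s12], [s12, 1.0]], n)
y1 = a1 + b1 * x + eps[:, 0]
y2 = a2 + b2 * x + eps[:, 1]

intercept_hat, gamma_hat, beta_hat = fit_product_model(x, y1, y2)
print(f"intercept: {intercept_hat:.3f}  (a1*a2 + Cov(e1, e2) = {a1 * a2 + s12:.3f})")
print(f"gamma':    {gamma_hat:.3f}  (a1*b2 + a2*b1 = {a1 * b2 + a2 * b1:.3f})")
print(f"beta':     {beta_hat:.3f}  (b1*b2 = {b1 * b2:.3f})")
```

A large sample is used here because the composite residual is heteroscedastic and comparatively noisy, which is in line with the regression approach needing large samples for good power.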

Appendix C: Estimation of Residual Covariance of Two Traits

To estimate the covariance parameter ( $σ_{12}^{ε}$ ) in equation (4), Y₁ and Y₂ are simultaneously fitted in a bivariate model

Y = \begin{pmatrix} Y_{1} \\ Y_{2} \end{pmatrix} = μ + \begin{pmatrix} X \\ X \end{pmatrix} β_{X} + T β_{T} + I_{XT} β_{XT} + \begin{pmatrix} ε_{1} \\ ε_{2} \end{pmatrix}

In the model, X, Y₁ and Y₂ have the same meanings as in models (1) and (2). T is the design vector indicating which trait an observation belongs to. I_XT is the trait-by-variant interaction design vector, constructed by combining X and T. μ, β_X, β_T, β_XT and ε_i are the grand mean, the main effect of X, the trait effect, the interaction effect between X and trait, and the residuals, respectively. Since Y₁ and Y₂ are two correlated traits, the residual ε is a random, two-segment vector with a mean of (0,0)^T and a 2×2 covariance matrix of

\begin{pmatrix} σ^{2} & σ_{12}^{ε} \\ σ_{12}^{ε} & σ^{2} \end{pmatrix}

The covariance component $σ_{12}^{ε}$ can be estimated using a maximum likelihood method (or other methods such as GLS and simplified regression). In this article, we chose the maximum likelihood method implemented in the R lme function.
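The structure of the bivariate model can be sketched in code. The example below does not reproduce the maximum likelihood fit from the R lme function; instead it assumes a plain least-squares fit of the stacked (long-format) design, in the spirit of the simplified regression alternative mentioned above, and then takes the average cross-product of the two residual segments. The function name `residual_covariance_stacked` and the simulation settings are hypothetical.

```python
import numpy as np


def residual_covariance_stacked(x, y1, y2):
    """Estimate Cov(e1, e2) from an OLS fit of the stacked two-trait design."""
    n = x.size
    y = np.concatenate([y1, y2])                   # two-segment response vector
    x_long = np.concatenate([x, x])                # X repeated for each trait
    t = np.concatenate([np.zeros(n), np.ones(n)])  # trait indicator T
    design = np.column_stack([np.ones(2 * n), x_long, t, x_long * t])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    e1, e2 = resid[:n], resid[n:]                  # residual segments per trait
    return float(np.mean(e1 * e2))                 # divisor n, as in an ML-type estimate


# Hypothetical simulation settings (assumptions, not taken from the article).
rng = np.random.default_rng(3)
n = 5000
x = rng.binomial(2, 0.3, n).astype(float)
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.4], [0.4, 1.0]], n)
y1 = 1.0 + 0.5 * x + eps[:, 0]
y2 = 2.0 + 0.3 * x + eps[:, 1]

sigma12_hat = residual_covariance_stacked(x, y1, y2)
print(f"estimated residual covariance: {sigma12_hat:.3f} (simulated value 0.4)")
```

Because the design includes both T and the X-by-T interaction, the stacked fit gives each trait its own intercept and slope, so each residual segment has mean zero and the cross-product average is a covariance estimate.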

References

Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B. 1995;57:289–300.
Bensen JT, Lange LA, Langefeld CD, Chang BL, Bleecker ER, Meyers DA, Xu J. Exploring pleiotropy using principal components. BMC Genet. 2003;4(Suppl 1):S53. doi: 10.1186/1471-2156-4-S1-S53.
de Heus P. R squared effect-size measures and overlap between direct and indirect effect in mediation analysis. Behav Res Methods. 2012;44(1):213–21. doi: 10.3758/s13428-011-0141-5.
Derese MN. Mediation in the Belgian health care sector: analysis of a particular issue--the material scope of application of mediation. Med Law. 2011;30(2):225–37.
Ferreira MA, Purcell SM. A multivariate test of association. Bioinformatics. 2009;25(1):132–3. doi: 10.1093/bioinformatics/btn563.
Flutre T, Wen X, Pritchard J, Stephens M. A statistical framework for joint eQTL analysis in multiple tissues. PLoS Genet. 2013;9(5):e1003486. doi: 10.1371/journal.pgen.1003486.
Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, Plagnol V. Bayesian Test for Colocalisation Between Pairs of Genetic Association Studies Using Summary Statistics. 2013 doi: 10.1371/journal.pgen.1004383. arXiv:1350.4022.
Higgins M, Province M, Heiss G, Eckfeldt J, Ellison RC, Folsom AR, Rao DC, Sprafka JM, Williams R. NHLBI Family Heart Study: objectives and design. Am J Epidemiol. 1996;143(12):1219–28. doi: 10.1093/oxfordjournals.aje.a008709.
Hodgkin J. Seven types of pleiotropy. Int J Dev Biol. 1998;42(3):501–5.
Huang J, Johnson AD, O’Donnell CJ. PRIMe: a method for characterization and evaluation of pleiotropic regions from multiple genome-wide association studies. Bioinformatics. 2011;27(9):1201–6. doi: 10.1093/bioinformatics/btr116.
Huber PJ. Robust Statistics. New York: Wiley; 1981.
Imai K, Jo B, Stuart EA. Using Potential Outcomes to Understand Causal Mediation Analysis: Comment on. Multivariate Behav Res. 2011;46(5):861–873. doi: 10.1080/00273171.2011.606743.
Klei L, Luca D, Devlin B, Roeder K. Pleiotropy and principal components of heritability combine to increase power for association analysis. Genet Epidemiol. 2008;32(1):9–19. doi: 10.1002/gepi.20257.
Korte A, Vilhjalmsson BJ, Segura V, Platt A, Long Q, Nordborg M. A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nat Genet. 2012;44(9):1066–71. doi: 10.1038/ng.2376.
Lange C, Silverman EK, Xu X, Weiss ST, Laird NM. A multivariate family-based association test using generalized estimating equations: FBAT-GEE. Biostatistics. 2003;4(2):195–206. doi: 10.1093/biostatistics/4.2.195.
Levy JA, Landerman LR, Davis LL. Advances in mediation analysis can facilitate nursing research. Nurs Res. 2011;60(5):333–9. doi: 10.1097/NNR.0b013e318227efca.
Liu J, Pei Y, Papasian CJ, Deng HW. Bivariate association analyses for the mixture of continuous and binary traits with the use of extended generalized estimating equations. Genet Epidemiol. 2009;33(3):217–27. doi: 10.1002/gepi.20372.
MacKinnon DP, Fairchild AJ, Fritz MS. Mediation analysis. Annu Rev Psychol. 2007;58:593–614. doi: 10.1146/annurev.psych.58.110405.085542.
Medland SE, Neale MC. An integrated phenomic approach to multivariate allelic association. Eur J Hum Genet. 2010;18(2):233–9. doi: 10.1038/ejhg.2009.133.
Province MA, Borecki IB. A correlated meta-analysis strategy for data mining “omic” scans. Pac Symp Biocomput. 2013:236–46.
Richiardi L, Bellocco R, Zugna D. Mediation analysis in epidemiology: methods, interpretation and bias. Int J Epidemiol. 2013;42(5):1511–9. doi: 10.1093/ije/dyt127.
Smith GD, Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol. 2003;32(1):1–22. doi: 10.1093/ije/dyg070.
Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: challenges and strategies. Nat Rev Genet. 2013;14(7):483–95. doi: 10.1038/nrg3461.
Stearns FW. One hundred years of pleiotropy: a retrospective. Genetics. 2010;186(3):767–73. doi: 10.1534/genetics.110.122549.
Thomas DC, Conti DV. Commentary: the concept of ‘Mendelian Randomization’. Int J Epidemiol. 2004;33(1):21–5. doi: 10.1093/ije/dyh048.
Yang Q, Wu H, Guo CY, Fox CS. Analyze multivariate phenotypes in genetic association studies by combining univariate association tests. Genet Epidemiol. 2010;34(5):444–54. doi: 10.1002/gepi.20497.


