Author manuscript; available in PMC: 2015 Sep 1.
Published in final edited form as: Genet Epidemiol. 2014 Jul 12;38(6):523–530. doi: 10.1002/gepi.21837

Estimating and Testing Pleiotropy of Single Genetic Variant for Two Quantitative Traits

Qunyuan Zhang 1,*, Mary Feitosa 1, Ingrid B Borecki 1
PMCID: PMC4169079  NIHMSID: NIHMS615100  PMID: 25044106

Abstract

With the accumulation of data on genetic variants and biomedical phenotypes in the genome era, statistical identification of pleiotropy is of growing interest for dissecting and understanding genetic correlations between complex traits. We propose a novel method for estimating and testing the pleiotropic effect of a genetic variant on two quantitative traits. Based on a covariance decomposition and estimation, our method quantifies pleiotropy as the portion of between-trait correlation explained by the same genetic variant. Unlike most multiple-trait methods, which assess potential pleiotropy (i.e., whether a variant contributes to at least one trait), our method formulates a statistic that tests exact pleiotropy (i.e., whether a variant contributes to both traits). We developed two approaches (a regression approach and a bootstrapping approach) for this test and investigated their statistical properties in comparison with other potential pleiotropy test methods. Our simulation shows that the regression approach produces correct p-values under both the complete null (i.e., a variant has no effect on either trait) and the incomplete null (i.e., a variant has an effect on only one of the two traits) but requires large sample sizes to achieve good power, whereas the bootstrapping approach has better power and produces conservative p-values under the complete null. We demonstrate our method for detecting exact pleiotropy using a real GWAS dataset. Our method provides an easy-to-implement tool for measuring, testing and understanding the pleiotropic effect of a single variant on the correlation architecture of two complex traits.

Keywords: pleiotropy, covariance decomposition, genetic correlation, regression, bootstrap

Introduction

Pleiotropy is a biological phenomenon in which a single genetic variant affects two or more phenotypic traits. The term was introduced into the literature a century ago and has since had an important influence on the fields of evolutionary biology, physiology and genetics [Stearns 2010]. In recent years, increasing numbers of polymorphic variants across the human genome have been associated with many complex traits, and there is growing interest in the identification of pleiotropic effects, especially for understanding the genetic and molecular basis of the correlated architecture among complex traits.

Genome-wide association studies (GWAS) have produced very rich data on both genotypes and phenotypes, providing unprecedented opportunities for the investigation of pleiotropy. However, statistical methods for identifying and characterizing pleiotropy are still insufficient and limited. A variety of multi-trait analysis methods have been proposed and can be used for an initial screen of potential pleiotropy, such as principal components based methods [Bensen, et al. 2003; Klei, et al. 2008], FBAT-GEE [Lange, et al. 2003], EGEE [Liu, et al. 2009], canonical correlation analysis (CCA) [Ferreira and Purcell 2009], combined multivariate (CMV) analysis [Medland and Neale 2010], the univariate-statistic combined test [Yang, et al. 2010], PRIMe [Huang, et al. 2011], the parameterized multi-trait mixed model (MTMM) [Korte, et al. 2012], and correlated meta-analysis [Province and Borecki 2013]. Such methods, however, are not strictly designed for testing exact pleiotropy. The presence of pleiotropy may increase the power of these methods in detecting overall association, but a significant test does not necessarily indicate pleiotropy, because most of these methods are based upon the null hypothesis that a variant affects none of the traits and do not include the incomplete null, in which the variant affects only one of the traits. The null is therefore not correctly specified for a pleiotropy test. A proper pleiotropy test should answer the question of whether a variant contributes to two or more traits. Another limitation of most existing methods is the lack of a well-defined parameter and estimator for the pleiotropic effect; they therefore help very little in assessing the magnitude of pleiotropy and in understanding how pleiotropy influences the relationship between traits.

We propose in this paper a novel, easy-to-implement approach for estimating and testing exact pleiotropy of a variant on two correlated traits. The pleiotropic effect of a variant is estimated as the portion of between-trait correlation that can be explained by the variant, and is then tested under the proper null hypothesis of no pleiotropy (i.e., the variant does not contribute to both traits) against the proper alternative hypothesis of pleiotropy (i.e., the variant contributes to both traits). We investigate the statistical properties of our method through simulation and demonstrate its application using real data. Referred to as Pleiotropy Estimation and Test (PET), the proposed method provides a novel tool for clearly characterizing and properly testing pleiotropy between two traits. We compare its performance to other possible approaches and assess power and sample size requirements.

Methods

Definitions and Models

In order to fit our method to a clearly defined scenario, we first define pleiotropy as independent effects of the same variant on two traits. Here, "independent effect" is purely a statistical definition. Biologically, it may include effects that propagate (or are passed) from a variant to one of the two traits through different paths without involving the other trait. This definition can be described by two separate linear models:

$$Y_1 = \alpha_1 + \beta_1 X + \varepsilon_1 \tag{1}$$
$$Y_2 = \alpha_2 + \beta_2 X + \varepsilon_2 \tag{2}$$

where X is the genotype data of a variant (usually coded as 0, 1, 2); Y1 and Y2 are observations of two normally distributed quantitative traits; αi, βi and εi are the intercept, the effect (of X on Yi) and the residual, respectively, with subscripts 1 and 2 indicating the two traits. The residuals in the models include independent, random errors, as well as other unobserved genetic and environmental effects that may cause a covariance between ε1 and ε2 (denoted by $\sigma_{12}^{\varepsilon}$). Assuming the covariances between X and ε1 and between X and ε2 are 0, the covariance between Y1 and Y2 (denoted by $\sigma_{12}^{Y}$) can be decomposed into two components (Appendix A):

$$\sigma_{12}^{Y} = \sigma_{12}^{\varepsilon} + \beta_1 \beta_2 \sigma_X^2 \tag{3}$$

The first component, $\sigma_{12}^{\varepsilon}$, is the covariance between the residuals of the two models; the second, $\beta_1 \beta_2 \sigma_X^2$, is the covariance caused by pleiotropy. Here $\sigma_X^2$ is the variance of X.
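The decomposition can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes unit-variance residuals with covariance r, a genotype drawn under Hardy-Weinberg equilibrium, and illustrative parameter values; the function names `covariance` and `decomposition_check` are hypothetical.

```python
import random

def covariance(a, b):
    """Sample covariance with the n - 1 denominator."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)

def decomposition_check(n=20000, maf=0.3, beta1=0.4, beta2=0.3, r=0.5, seed=1):
    """Return both sides of Cov(Y1,Y2) = Cov(e1,e2) + beta1*beta2*Var(X)."""
    rng = random.Random(seed)
    # Genotype = number of minor alleles in two independent draws (assumes HWE).
    x = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    # Residuals with unit variances and covariance r (illustrative assumption).
    z1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    e1 = z1
    e2 = [r * a + (1.0 - r * r) ** 0.5 * b for a, b in zip(z1, z2)]
    y1 = [beta1 * g + e for g, e in zip(x, e1)]
    y2 = [beta2 * g + e for g, e in zip(x, e2)]
    lhs = covariance(y1, y2)
    rhs = covariance(e1, e2) + beta1 * beta2 * covariance(x, x)
    return lhs, rhs

lhs, rhs = decomposition_check()
print(round(lhs, 3), round(rhs, 3))
```

The two printed values differ only by the sample covariances between X and the residuals, which the model assumes to be zero in the population.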

From equation (3) we can see that pleiotropy is a source of covariation between two traits. Based on this notion, we define ρ, the portion of the correlation between Y1 and Y2 that can be explained by the genetic effects (i.e., β1 and β2) of X, as a metric of pleiotropy, termed the pleiotropy correlation coefficient (PCC). It is the standardized between-trait covariance explained by β1 and β2, which can be expressed as a function of β1 and β2:

$$\rho = \frac{\beta_1 \beta_2 \sigma_X^2}{\sigma_1 \sigma_2} \tag{4}$$

where σ1 and σ2 are the standard deviations of Y1 and Y2, respectively.
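As an illustration of equation (4), the sketch below computes the PCC from known parameters. It rests on assumptions that are not stated at this point in the text: the genotype is coded 0/1/2 under Hardy-Weinberg equilibrium, so that the genotype variance is 2p(1-p) for minor allele frequency p, and each trait variance is the sum of the variant's contribution and a residual variance. The function name `expected_pcc` and its arguments are hypothetical.

```python
def expected_pcc(beta1, beta2, maf, resid_var1=1.0, resid_var2=1.0):
    """PCC from equation (4), assuming HWE and additive 0/1/2 genotype coding."""
    var_x = 2.0 * maf * (1.0 - maf)                 # genotype variance under HWE
    sd1 = (beta1 ** 2 * var_x + resid_var1) ** 0.5  # SD of trait 1
    sd2 = (beta2 ** 2 * var_x + resid_var2) ** 0.5  # SD of trait 2
    return beta1 * beta2 * var_x / (sd1 * sd2)

# Effects in the same direction give a positive PCC, opposite directions a negative one.
print(expected_pcc(0.15, 0.30, 0.5))
print(expected_pcc(0.15, -0.30, 0.5))
```

If either effect is zero, the function returns zero, which matches the null hypothesis of no exact pleiotropy.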

To estimate and test ρ, we have developed two approaches, a regression approach and a bootstrap approach, described below.

A Regression Approach

Given the sample data of X, Y1 and Y2, ρ can be estimated by simply replacing parameters in equation (4) with corresponding statistics.

$$\hat{\rho} = \frac{\hat{\beta}_1 \hat{\beta}_2 \hat{\sigma}_X^2}{\hat{\sigma}_1 \hat{\sigma}_2} \tag{5}$$

While σ̂X, σ̂1 and σ̂2 can be estimated from the sample data, the estimates of β1 and β2 may be biased if they are simply obtained by separate regressions of Y1 and Y2 on X, especially when Y1 and Y2 are strongly correlated. For example, even if β1 = 0, a simple regression coefficient of Y1 on X will not be 0 if β2 ≠ 0 and $\sigma_{12}^{\varepsilon} \neq 0$, because of the indirect effect produced by β2 and passed to Y1 from Y2 through the correlation between Y1 and Y2.

Instead of estimating β1 and β2 separately, we propose to estimate β1β2 as one parameter in a composite model of the product of Y1 and Y2:

$$Y' = \alpha' + \gamma' X + \beta' X^2 + \varepsilon' \tag{6}$$

This model is obtained through a multiplication of models (1) and (2), where Y′ = Y1Y2, α′ = α1α2, β′ = β1β2, γ′ = α1β2 + α2β1 and ε′ is the composite residual (Appendix B). The composite parameter β′ can be estimated by a regression of Y1Y2 on X and X².

Significance of the PCC can be determined by testing the null hypothesis, H0: ρ = 0, versus the alternative hypothesis, HA: ρ ≠ 0. According to the definition of ρ, HA is equivalent to the hypothesis β′ ≠ 0 (i.e., β1 ≠ 0 and β2 ≠ 0) and H0 is equivalent to β′ = 0 (i.e., β1 = 0 and/or β2 = 0). Here H0 is a compound null involving two types of possible null hypotheses: the complete null H00: β1 = β2 = 0, and the incomplete nulls (H01: β1 = 0, β2 ≠ 0 and H02: β1 ≠ 0, β2 = 0). Instead of testing these nulls separately, we propose to perform a universal test for all the nulls (H00, H01 and H02) against the alternative (β1 ≠ 0 and β2 ≠ 0) through a test of β′ = 0 vs. β′ ≠ 0 based on model (6).

Since some features of the composite model (6) may not strictly satisfy the assumptions of regular regression (such as normality and homogeneity of ε′ and independence between parameters), we chose a robust regression to estimate and test β′, which is based on Huber’s method[Huber 1981] and implemented in the R package MASS.
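The composite regression of model (6) can be sketched as follows. This is only an illustration of the model structure: it deliberately swaps in ordinary least squares for the Huber robust regression used in the paper, so it provides point estimates but not the paper's test. The helper names `solve_3x3` and `fit_composite` and the toy data are hypothetical.

```python
def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]
    sol = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):
        tail = sum(m[row][k] * sol[k] for k in range(row + 1, 3))
        sol[row] = (m[row][3] - tail) / m[row][row]
    return sol

def fit_composite(x, y1, y2):
    """Least-squares fit of Y1*Y2 on 1, X and X^2; returns (alpha', gamma', beta')."""
    design = [[1.0, float(g), float(g) ** 2] for g in x]
    product = [u * v for u, v in zip(y1, y2)]
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * p for r, p in zip(design, product)) for i in range(3)]
    return solve_3x3(xtx, xty)

# Noise-free toy data: alpha1 = 1, beta1 = 0.5, alpha2 = 2, beta2 = -0.2.
x = [0, 1, 2, 0, 1, 2, 1, 1]
y1 = [1.0 + 0.5 * g for g in x]
y2 = [2.0 - 0.2 * g for g in x]
alpha_c, gamma_c, beta_c = fit_composite(x, y1, y2)
print(alpha_c, gamma_c, beta_c)
```

With noise-free data the fitted coefficient on X² recovers β′ = β1β2, and the coefficient on X recovers γ′ = α1β2 + α2β1; with real data the composite residual is heteroscedastic, which is the motivation the text gives for a robust fit.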

A Bootstrap Approach

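Alternatively, instead of estimating β1β2, we propose to estimate $\beta_1 \beta_2 \sigma_X^2$ (denoted by δ) as $\hat{\delta} = \hat{\sigma}_{12}^{Y} - \hat{\sigma}_{12}^{\varepsilon}$ and then ρ as $\hat{\rho} = \hat{\delta} / (\hat{\sigma}_1 \hat{\sigma}_2)$, where $\hat{\sigma}_{12}^{Y}$, $\hat{\sigma}_{12}^{\varepsilon}$, $\hat{\sigma}_1$ and $\hat{\sigma}_2$ are the estimated covariance components in equations (3) and (5). While $\hat{\sigma}_{12}^{Y}$ is calculated as the sample covariance between the two traits, $\hat{\sigma}_{12}^{\varepsilon}$ can be obtained through a bivariate model (Appendix C).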

Similar to the test of β′, significance of the PCC can be determined by testing the null hypothesis, H0: δ = 0, versus the alternative hypothesis, HA: δ ≠ 0. According to the composition of δ (i.e., $\delta = \beta_1 \beta_2 \sigma_X^2$), HA is equivalent to the hypothesis of β1 ≠ 0 and β2 ≠ 0, whereas H0 includes both the complete null H00 and the incomplete nulls (H01 and H02). Since the analytical distribution of δ̂ under the compound null is unknown and the commonly used permutation test is not applicable, we propose to calculate the p-value via bootstrapping. Given the data of X, Y1 and Y2, a two-tailed p-value is defined as 2 times the minimum of P(Db > 0) and P(Db < 0), where Db is the set of δ̂ values obtained by bootstrapping and P denotes the observed proportion.
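The bootstrap p-value defined above can be sketched as follows. This is a simplified illustration under stated assumptions: the residual covariance is taken from two separate least-squares fits rather than from the maximum likelihood bivariate model of Appendix C, subjects are resampled with replacement, and the function names, the number of bootstrap draws and the toy data are all hypothetical.

```python
import random

def covariance(a, b):
    """Sample covariance with the n - 1 denominator."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)

def delta_hat(x, y1, y2):
    """delta-hat = Cov(Y1, Y2) - Cov(residual 1, residual 2).

    Simplification: residuals come from separate simple regressions on X,
    not from the bivariate mixed model used in the paper.
    """
    var_x = covariance(x, x)
    if var_x == 0.0:          # degenerate resample with a single genotype
        return 0.0
    b1 = covariance(x, y1) / var_x
    b2 = covariance(x, y2) / var_x
    res1 = [v - b1 * g for g, v in zip(x, y1)]
    res2 = [v - b2 * g for g, v in zip(x, y2)]
    return covariance(y1, y2) - covariance(res1, res2)

def bootstrap_p_value(x, y1, y2, n_boot=200, seed=0):
    """Two-tailed p-value: 2 * min(P(Db > 0), P(Db < 0))."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample subjects
        draws.append(delta_hat([x[i] for i in idx],
                               [y1[i] for i in idx],
                               [y2[i] for i in idx]))
    frac_pos = sum(d > 0 for d in draws) / n_boot
    frac_neg = sum(d < 0 for d in draws) / n_boot
    return 2.0 * min(frac_pos, frac_neg)

# Toy data with a strong pleiotropic effect (beta1 = beta2 = 1, r = 0.5).
rng = random.Random(42)
n = 300
x = [(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n)]
z1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
z2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
y1 = [1.0 * g + a for g, a in zip(x, z1)]
y2 = [1.0 * g + 0.5 * a + 0.75 ** 0.5 * b for g, a, b in zip(x, z1, z2)]
p_value = bootstrap_p_value(x, y1, y2)
print(p_value)
```

Because the bootstrap distribution is centered on δ̂ rather than on zero, the p-value is small only when nearly all resampled δ̂ values share one sign. In practice the number of bootstrap draws would need to be far larger than in this toy run to resolve small p-values.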

For the convenience of discussion, we refer to the proposed method as Pleiotropy Estimation and Test (PET), and refer to the regression version as PET-R and the bootstrap version as PET-B.

Simulation

To investigate the statistical properties of the PET method, we simulated data under a variety of parameter configurations based on models (1) and (2). For a given sample size N, we first simulated genotype data (X) of N subjects for a variant under Hardy-Weinberg equilibrium with a fixed or random minor allele frequency (MAF), and then generated two quantitative traits based on models (1) and (2) with αi = 0 and different combinations of β1 and β2 for the different hypotheses (i.e., the complete null H00, the incomplete nulls and HA). To simulate the correlation between Y1 and Y2, ε1 and ε2 were sampled from a bivariate normal distribution with a mean vector of (0,0) and a 2×2 variance-covariance matrix, in which the variance components were set to 1 and the covariance component (r) was set to a non-zero (fixed or random) value.
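A data generator following this description might look like the sketch below. It is an assumption-laden illustration rather than the authors' code: the genotype is the sum of two independent allele draws (which yields Hardy-Weinberg proportions), the correlated residuals are built from two independent standard normals, and the function name `simulate_traits` is hypothetical.

```python
import random

def simulate_traits(n, maf, beta1, beta2, r, seed=None):
    """Simulate genotype X and traits Y1, Y2 under models (1) and (2) with alpha_i = 0."""
    rng = random.Random(seed)
    x, y1, y2 = [], [], []
    for _ in range(n):
        # Two independent allele draws give HWE genotype frequencies.
        g = (rng.random() < maf) + (rng.random() < maf)
        # Bivariate normal residuals: unit variances, covariance r.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        e1 = z1
        e2 = r * z1 + (1.0 - r * r) ** 0.5 * z2
        x.append(g)
        y1.append(beta1 * g + e1)
        y2.append(beta2 * g + e2)
    return x, y1, y2

# Complete null, incomplete null and exact pleiotropy, respectively.
null_data = simulate_traits(5000, 0.3, 0.0, 0.0, 0.5, seed=1)
incomplete_null_data = simulate_traits(5000, 0.3, 0.0, 0.2, 0.5, seed=2)
pleiotropy_data = simulate_traits(5000, 0.3, 0.15, 0.15, 0.5, seed=3)
```

Each scenario differs only in the pair (β1, β2), which makes it straightforward to compare a test's behavior under the complete null, an incomplete null and the alternative.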

Methods for Comparison

For comparison, we included in this paper two other multiple-trait analysis methods, canonical correlation analysis (CCA) and correlated meta-analysis (CMA). CCA is a multivariate approach for analyzing the correlation between two groups of variables. We used it to test the overall association between a variant and two traits, by calculating Wilks' statistic through an eigenanalysis of the raw data and obtaining a p-value based on a simplified F-approximation [Ferreira and Purcell 2009]. CMA is a meta-analysis approach that takes between-trait correlation into account, combines statistics from individual traits into a summarized statistic and tests its significance through a correlated multivariate normal distribution [Province and Borecki 2013]. Although CCA and CMA test the same statistical hypothesis of potential pleiotropy (i.e., H00 vs. the incomplete nulls and HA), they require different data as input. Whereas CCA requires individual subjects' data and is performed for each single variant with no use of data from other variants, CMA uses summarized data (p-values) from each trait and requires a large number of variants to estimate the between-trait correlation.

In addition, to understand the difference in power between single-trait and multiple-trait analyses, we also include two simple tests based on single-trait regression analysis: one testing the association between a variant and a single trait (based on one p-value, denoted by P1), the other testing the association between a variant and both traits (based on two separate p-values, denoted by P2). The P2 test is a commonly used, simple approach for detecting pleiotropy. However, a significant P2 test (i.e., both p-values are less than a given cutoff) may not indicate exact pleiotropy, because when both p-values are significant, one of them can be caused by an indirect effect of the variant through between-trait correlation rather than by pleiotropy. Strictly, P2 can be considered an exact pleiotropy test only when there is no residual correlation between the two traits (i.e., r = 0).

Since the statistical hypotheses tested by CCA, CMA, P1 and P2 are different from that of PET, we do not intend to make a competitive comparison between them. Our purpose in including these methods is to help in understanding some important features of pleiotropy testing through contrast.

Results

Type-I Error

We applied CCA, CMA, PET-R and PET-B to data simulated under both the complete null (H00) and the incomplete null, and investigated their type-I error characteristics using Q-Q plots that compare the observed p-values with uniformly distributed p-values. The two existing methods, CCA and CMA, produce uniformly distributed p-values under H00 (Fig. 1a, 1c) but are significantly inflated when testing exact pleiotropy under the incomplete null (Fig. 1b, 1d). This is because, as mentioned earlier for most existing multi-phenotype analysis methods, they were originally proposed to test potential pleiotropy (i.e., H00 against both the incomplete nulls and HA), not exact pleiotropy (i.e., H00 and the incomplete nulls against HA). The PET-R method produces expected, non-inflated p-values under both H00 (Fig. 1e) and the incomplete null (Fig. 1f), whereas PET-B produces expected p-values under the incomplete null (Fig. 1h) and conservative p-values under H00 (Fig. 1g). The results indicate that PET has good control of false positives and is more appropriate for testing exact pleiotropy.

Figure 1. Q-Q plots of p-values under the null hypotheses obtained by different methods.

In each plot, the x-axis represents expected p-values and the y-axis represents observed p-values, both on the negative log10 scale. The p-values are obtained by applying CCA, CMA, PET-R and PET-B to 2000 replications of data simulated for a sample size of N=5000, under the complete null H00 and the incomplete null, separately. In each replication, MAF is randomly drawn between 0.2 and 0.5, and r between 0.2 and 0.8; β1 = 0 is used for all simulations, while β2 is set to 0 (for H00) or randomly drawn between -0.3 and 0.3 (for the incomplete null). The results presented here include: a) CCA under H00; b) CCA under the incomplete null; c) CMA under H00; d) CMA under the incomplete null; e) PET-R under H00; f) PET-R under the incomplete null; g) PET-B under H00; h) PET-B under the incomplete null.

Power

We estimated the statistical power of the CCA, CMA, PET-R, PET-B, P1 and P2 tests through simulation under the alternative hypothesis of exact pleiotropy (HA: β1 ≠ 0 and β2 ≠ 0). The estimates (Table 1) show that when β1 ≠ 0 and β2 ≠ 0, in terms of detecting association, two-trait testing methods (CCA and CMA) have higher power than single-trait methods (P1 and P2), and that the potential pleiotropy tests (CCA, CMA and P2) and the single-trait association test (P1) have higher power than the exact pleiotropy tests (PET-R and PET-B). These results indicate that detecting exact pleiotropy is more difficult and requires a larger sample size than detecting potential pleiotropy or single-trait association. This property is due to the nature of PET: unlike the other methods, it needs to distinguish H00, H01 and H02 from HA.

Table 1.

Estimated power of CCA, CMA, P1, P2, PET-R and PET-B

α      N       CCA     CMA     P1      P2*     PET-R   PET-B
0.05   500     0.684   0.780   0.655   0.503   0.079   0.472
0.05   1000    0.946   0.974   0.919   0.846   0.083   0.844
0.05   5000    >0.99   >0.99   >0.99   >0.99   0.111   >0.99
0.05   10000   >0.99   >0.99   >0.99   >0.99   0.208   >0.99
0.05   50000   >0.99   >0.99   >0.99   >0.99   0.444   >0.99
0.01   500     0.445   0.565   0.422   0.244   0.019   0.215
0.01   1000    0.826   0.907   0.775   0.640   0.029   0.618
0.01   5000    >0.99   >0.99   >0.99   >0.99   0.058   >0.99
0.01   10000   >0.99   >0.99   >0.99   >0.99   0.130   >0.99
0.01   50000   >0.99   >0.99   >0.99   >0.99   0.220   >0.99

Power for each method at different sample sizes and α levels is estimated through 2000 replications of simulation for a locus with MAF=0.5 and two traits with r=0.5, under an exact pleiotropy hypothesis HA of β1 = β2 = 0.15 (resulting in an approximate heritability of 0.01 for each trait).

* Significance is declared when both single-trait test p-values are equal to or less than the α level.

Comparing the two PET approaches, PET-R has significantly lower power and thus requires a very large sample size, making its application to real data impractical; PET-B has better power, which is acceptable in practice. For example, when the sample size is N=1000, a pleiotropic effect resulting in an approximate heritability of 1% for each trait (i.e., β1 = β2 = 0.15) can be detected by PET-B with a power of 0.844 or 0.618 at a significance level of 0.05 or 0.01, respectively (Table 1).
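The approximate heritability of 1% quoted for β1 = β2 = 0.15 can be reproduced with a short calculation. This is a worked sketch under two assumptions drawn from the simulation setup: the genotype variance under Hardy-Weinberg equilibrium is 2p(1-p) for minor allele frequency p, and the residual variance of each trait is 1. The symbol h² (variance explained by the variant) is introduced here for illustration.

$$\sigma_X^2 = 2p(1-p) = 2 \times 0.5 \times 0.5 = 0.5, \qquad h^2 = \frac{\beta^2 \sigma_X^2}{\beta^2 \sigma_X^2 + 1} = \frac{0.15^2 \times 0.5}{0.15^2 \times 0.5 + 1} = \frac{0.01125}{1.01125} \approx 0.011$$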

Besides sample size (N), many other factors can affect the power of PET. Among these, the correlation (r) between traits, the MAF of the variant, the variant effects on the traits (β1 and β2) and the difference between the variant effects (|β1 − β2|) are four major ones. We investigated the power of PET-B through simulations with varied factors and observed that the power of PET increases with r, MAF, β1 and β2 (Fig. 2a, 2b, 2c); however, when the product of β1 and β2 is fixed, the power of PET increases as |β1 − β2| decreases (Fig. 2d) and reaches its maximum when β1 = β2, indicating that it is relatively easier to detect pleiotropy when a variant has similar effects on the two traits.

Figure 2. Power curves of PET-B at the 0.01 and 0.05 significance levels under different scenarios.

Power for each scenario is estimated from 10000 replications of simulation with N=1000, and a) β1 = β2 = 0.15, MAF=0.5, r=0.1~0.9; b) β1 = β2 = 0.15, MAF=0.01~0.5, r=0.5; c) β1 = 0.15, β2 = 0.01~0.3, MAF=0.5, r=0.5; d) β1β2 = 0.15², |β1 − β2| = 0~0.5, MAF=0.5, r=0.5.

Estimation of PCC

To investigate the performance of the estimation of PCC, we used simulation to compare the expected PCC values (calculated by equation (4) based on known parameters used in data simulation) and the estimated PCC values (estimated from simulated data using the PET-R and PET-B methods, respectively). We observed that PET-B produces significantly more accurate estimation of PCC than PET-R does. In our simulation, the PCC values estimated by PET-B have a strong correlation (r=0.991) with their expected values, and the correlation is much lower (r=0.879) when PCC is estimated by PET-R (Fig.3). Overall, the PCC is underestimated by PET-R, probably due to some features in model (6) that violate the assumptions usually required in regression analysis. For example, in model (6), γ′, β′ and ε′ all include the components β1 and β2 (see Appendix B), which may cause the under-estimation of β′. These results, combined with the power analysis (Table 1), suggest that PET-B should be a better choice over PET-R in practice.

Figure 3. Scatter plots of expected PCC and the PCCs estimated by PET-R and PET-B.

The PCC values are based on 1000 replications of simulation with N=5000, β1 and β2 randomly drawn between 0~0.5, MAF between 0.2~0.5 and r between 0.1~0.9. For each replication, the expected PCC is calculated using the theoretical equation (4) in METHODS with known β1 and β2 (they are used for simulation); the estimated PCCs are obtained by applying a) PET-R and b) PET-B to the simulated data.

Application

To demonstrate how to detect pleiotropy in practice using the proposed method, we applied CMA and PET-B to a set of real GWAS data from the Family Heart Study (FHS) [Higgins, et al. 1996]. The data set contains 2705 subjects and about 2.5 million typed and imputed SNPs. We focus on two quantitative traits: waist circumference (WC) and the homeostatic model assessment (HOMA), which is an indicator of insulin resistance. The correlation coefficient between these traits is 0.542 ($P < 10^{-16}$).

Since bootstrapping in the PET-B test is computationally intensive, we did not perform the PET-B analysis on all SNPs. Instead, we first computed p-values for all SNPs using the CMA method and then calculated the false discovery rates (FDRs) using the Benjamini-Hochberg procedure [Benjamini and Hochberg 1995]. Applying a cutoff of FDR<0.05, we identified 76 significant SNPs with potential pleiotropy. Because CMA only tests H00 against the incomplete nulls and HA, most of the 76 SNPs are expected to be associated with at least one of the two traits. To further distinguish exact pleiotropy from potential pleiotropy (i.e., to distinguish the SNPs contributing to only one trait from those contributing to both traits), we performed the PET-B test (with 10,000 bootstrap replicates) on the 76 SNPs and identified 43 significant ones at a cutoff of p-value < 0.01 (Suppl Table 1). Since PET is appropriate for testing H0 against HA, most of the 43 SNPs are expected to have pleiotropic effects on the two traits. We investigated the single-trait test p-values of these SNPs and found that most of the 43 SNPs significant in the PET-B test have similar p-values in the separate tests of WC and HOMA (Fig. 4), suggesting that a significant PET test requires that a SNP have similar effects on the two traits. A large effect paired with a small (or zero) effect may result in very different single-trait test p-values and is less likely to be identified by PET, as the smaller effect is more likely to be explained as the indirect effect of the larger effect passed through trait correlation.
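The screening step relies on the Benjamini-Hochberg procedure, which can be sketched as below. This is a generic illustration of the standard adjusted p-value calculation, not the code used in the study; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical screening p-values for four SNPs.
screen_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(screen_p)
# SNPs passing the FDR < 0.05 screen would then go on to the bootstrap test.
selected = [i for i, q in enumerate(fdr) if q < 0.05]
print(fdr, selected)
```

Restricting the computationally expensive bootstrap test to the SNPs that pass this screen is what keeps the two-stage analysis tractable at genome-wide scale.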

Figure 4. Single-trait association test p-values for 43 pleiotropic and 33 non-pleiotropic SNPs.

Scatter plot of p-values (on the −log10 scale) from separate association tests of two traits, WC and HOMA. The plot includes a total of 76 SNPs with potential pleiotropy identified by CMA, among which 43 SNPs are identified by PET-B as pleiotropic and 33 as non-pleiotropic.

The estimated PCC values of the 43 SNPs range from 0.183% to 0.28%, indicating that individual SNPs have small pleiotropic effects on WC and HOMA. These SNPs are located on chromosomes 6, 11, 12, 15 and 18, falling in five genes (GMDS, APIP, PDHX, CACNB3 and FAM174B) and an upstream region near the CDH19 gene (see Suppl Table 1 for more details). Some of these genes have an obvious connection to both WC and HOMA. For example, GMDS is involved in the glucose-metabolism pathway and is expressed in a variety of tissues (including adipocytes and skeletal muscle). It has been reported to be associated with the obesity-related traits testosterone [Derese 2011] and echocardiography [Imai, et al. 2011]. There is also evidence that the SNP rs9503038 (P=0.0086 in the PET test) in GMDS is an expression quantitative trait locus (eQTL) of the gene EXOC4 (according to the SCAN annotation [Levy, et al. 2011]). EXOC4 is involved in insulin-stimulated glucose transport and is a candidate for association with T2D and fasting glucose levels [de Heus 2012]. These facts suggest a possible pleiotropic effect of GMDS on WC and HOMA. Although these results still need validation from further studies and more data, this application demonstrates that the PET analysis can provide more detailed and clearer information on pleiotropy.

Discussion

We have developed a novel method, PET, for estimating and testing the pleiotropic effect of a variant on two complex traits. Compared with most existing multiple-trait analysis methods, the PET method has two unique features. A key feature is that, unlike most multiple-trait analysis methods, which test potential pleiotropy (i.e., the association between a variant and at least one trait), PET directly tests exact pleiotropy (i.e., a variant has effects on both traits). Therefore, while most multiple-trait analysis methods may provide an initial screen for potential pleiotropy, PET provides a more detailed test for distinguishing exact pleiotropy from potential pleiotropy, which is clearer and more helpful in answering the question of whether a variant contributes to both traits.

One of the most challenging issues in testing pleiotropy is the compound nature of the null hypothesis. The pleiotropy test for a variant and two traits involves one alternative hypothesis of exact pleiotropy (HA: β1 ≠ 0 and β2 ≠ 0), one complete null (H00: β1 = β2 = 0), and two incomplete nulls (H01: β1 = 0, β2 ≠ 0 and H02: β1 ≠ 0, β2 = 0). Since the complete null can be tested by most existing multiple-trait analysis methods, there is more interest in testing the incomplete nulls vs. the alternative hypothesis. In practice there are no clear rules for choosing a particular null from H00, H01 and H02, making it hard to construct a test for HA. Although it is possible to construct likelihood ratio or generalized least squares (GLS) based procedures for separate tests of H00 vs. HA, H01 vs. HA and H02 vs. HA, doing so introduces extra ambiguity when a decision must be made based on multiple p-values. An example of the ambiguity can be seen in the recently developed MTMM method [Korte, et al. 2012], in which the common effect and the interaction effects of a variant on multiple traits are tested separately and can be used to infer pleiotropy; however, when the common effect is not significant but the interaction is significant, no clear conclusion can be drawn on pleiotropy. Similarly, Flutre et al. recently developed a Bayesian statistical framework for joint eQTL analysis in multiple tissues [Flutre, et al. 2013], which features both joint and separate hypothesis testing and can be used for the identification of pleiotropy.
However, when such a separate testing strategy is used to detect pleiotropy, screening all alternative hypotheses (e.g., HA, H01 and H02) against the same null (H00) introduces a similar multiple-testing and ambiguity issue, sometimes making inference on pleiotropy difficult (especially when more than one p-value is significant and they conflict; for example, both tests of H00 vs. HA and H00 vs. H01 are significant, but a further test of HA vs. H01 is not). To avoid such issues, we have proposed the PET test, which allows a universal test of an alternative hypothesis (HA) against multiple nulls (H00, H01 and H02) and thus provides a single p-value for clearer testing of pleiotropy.

Another unique feature of PET is that it provides an estimate of the size of pleiotropy, which is clearly defined as the portion of between-trait correlation that can be explained by a variant and is measured by the PCC. The estimate of the PCC is very useful for understanding how and to what extent a pleiotropic variant affects the correlation structure of complex traits. The value of the PCC can be positive or negative. When two traits are positively correlated, a positive PCC indicates that the variant's effects on the two traits are in the same direction and a negative PCC indicates different directions (the interpretation is reversed for negatively correlated traits).

It should be noted that there are many different biological types of pleiotropy [Hodgkin 1998; Solovieff, et al. 2013]; statistical definitions of pleiotropy can therefore differ, and different definitions may result in very different statistical methods. For example, pleiotropy can be defined by variant, gene or region. A pleiotropic variant is a single variant that contributes to two (or more) traits, whereas a pleiotropic gene (or region) may contribute to multiple traits through different variants, none of which need be pleiotropic individually. The PET method is developed only for testing pleiotropic variants. Detecting a pleiotropic gene (or region) will need a different method. In addition, the PET method itself may need modification to answer some more complicated and delicate questions regarding pleiotropic variants. For example, if two variants are in strong linkage disequilibrium (LD) and contribute to two different traits separately (i.e., both are non-pleiotropic variants), direct application of PET to either variant may lead to a significant pleiotropy result due to the LD between them. Another challenging issue is how to test pleiotropy when the number of traits increases (to far more than two). From these facts and questions, we can see that our current version of PET is still limited. When a pleiotropy test involves more variants and more traits, it needs to be improved through modification or incorporation of other techniques. A recently published Bayesian method for testing colocalisation between pairs of genetic associations has touched on such questions [Giambartolomei, et al. 2013]. This method uses summary statistics to assess different association hypotheses when pairs of traits and variants are involved (e.g., hypotheses of no association with either trait, association with trait 1 but not with trait 2, association with two traits via two independent variants, or via one shared variant, etc.).
Of course, while this type of method has great potential to be extended to more complicated pleiotropy analyses, some challenging issues need to be addressed, for instance, how to reduce false positives and increase the power of detecting a true hypothesis when an inference is made based on multiple possible hypotheses, and how to improve a Bayes factor based test when the posterior probabilities of multiple hypotheses are not independent (due to the correlation between traits and/or LD between variants).

Finally, we want to point out that the PET analysis is different from two other widely used analyses, the mediation model (MM) [MacKinnon, et al. 2007; Richiardi, et al. 2013] and Mendelian randomization (MR) [Smith and Ebrahim 2003; Thomas and Conti 2004]. Although all three concern modeling and interpreting the relationship between three (or more) variables (for convenience of comparison, here we refer to the three variables as A, B and C), they have different application goals. Whereas PET is developed for detecting pleiotropy (i.e., whether A has direct effects on both B and C), MM tests the mediation effect of A between B and C (i.e., whether B has a causal effect on A and then A has a causal effect on C), and MR investigates the causal effect of B on C by introducing an instrumental variable A (i.e., whether B has a causal effect on C, given extra association information from A). In terms of the model, a major difference is that both MM and MR require the direction of causality (from B to C, or from C to B) to be defined before analysis, but PET does not, because the focus of PET is to assess how much of the correlation between B and C can be explained by A, regardless of the direction of causality between B and C. Since all three are based on linear models, some of their statistical features are related and may interact. For example, in the presence of pleiotropic effects (of A on both B and C), the MR causality inference (on B and C) will be biased [Solovieff, et al. 2013]. Such interaction may suggest using a combination of these methods in practice. For example, before an MR analysis, we may need to perform a PET test to check for pleiotropy, so that the MR result is more likely to be unbiased. Of course, how these models and methods are connected to and interact with each other is still an open question and requires more theoretical work.

Supplementary Material

Supp FigureS1

Acknowledgments

This work was supported by the National Institutes of Health (NIH) grants 1R01DK8925601 (to I.B.B.) and 5R01DK075681 (to I.B.B.). We thank Dr. Lihua Wang for assistance in SNP and gene annotation and interpretation.

Appendix A: Decomposition of Covariance of Two Traits

Given the two models: Y1 = α1 + β1X + ε1 and Y2 = α2 + β2X + ε2, the covariance between Y1 and Y2, Cov(Y1, Y2), can be decomposed as

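$$\begin{aligned}
\mathrm{Cov}(Y_1,Y_2) &= \mathrm{Cov}(\alpha_1+\beta_1X+\varepsilon_1,\ \alpha_2+\beta_2X+\varepsilon_2)\\
&= \mathrm{Cov}(\alpha_1,\alpha_2)+\mathrm{Cov}(\alpha_1,\beta_2X)+\mathrm{Cov}(\alpha_1,\varepsilon_2)\\
&\quad+\mathrm{Cov}(\beta_1X,\alpha_2)+\mathrm{Cov}(\beta_1X,\beta_2X)+\mathrm{Cov}(\beta_1X,\varepsilon_2)\\
&\quad+\mathrm{Cov}(\varepsilon_1,\alpha_2)+\mathrm{Cov}(\varepsilon_1,\beta_2X)+\mathrm{Cov}(\varepsilon_1,\varepsilon_2)
\end{aligned}$$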

Since α1, α2, β1 and β2 are constants and Cov(X, ε1) =0 and Cov(X, ε2) =0,

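$$\mathrm{Cov}(Y_1,Y_2)=\mathrm{Cov}(\varepsilon_1,\varepsilon_2)+\mathrm{Cov}(\beta_1X,\beta_2X)=\mathrm{Cov}(\varepsilon_1,\varepsilon_2)+\beta_1\beta_2\mathrm{Var}(X)$$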
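The decomposition can be checked numerically. The following Python sketch is an illustration only, not the implementation used in this article. The simulated data, parameter values, and function names are assumptions. With least-squares slopes and residuals in place of the true parameters, the identity holds exactly in the sample, because least-squares residuals are uncorrelated with X.

```python
import random

def cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)

def simulate(n=5000, maf=0.3, b1=0.5, b2=-0.4, rho=0.3, seed=1):
    """Hypothetical data: genotype X in {0, 1, 2} and two traits whose
    residuals have correlation rho (all values are assumptions)."""
    rng = random.Random(seed)
    X, Y1, Y2 = [], [], []
    for _ in range(n):
        x = int(rng.random() < maf) + int(rng.random() < maf)
        e1 = rng.gauss(0, 1)
        e2 = rho * e1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        X.append(x)
        Y1.append(1.0 + b1 * x + e1)
        Y2.append(2.0 + b2 * x + e2)
    return X, Y1, Y2

def decompose(X, Y1, Y2):
    """Return (Cov(Y1, Y2), b1*b2*Var(X), Cov(residual 1, residual 2))."""
    var_x = cov(X, X)
    b1 = cov(X, Y1) / var_x   # least-squares slope of Y1 on X
    b2 = cov(X, Y2) / var_x   # least-squares slope of Y2 on X
    # Residuals up to a constant; the covariance ignores the intercepts.
    r1 = [y - b1 * x for x, y in zip(X, Y1)]
    r2 = [y - b2 * x for x, y in zip(X, Y2)]
    return cov(Y1, Y2), b1 * b2 * var_x, cov(r1, r2)

if __name__ == "__main__":
    total, genetic, residual = decompose(*simulate())
    print("Cov(Y1, Y2)             :", total)
    print("b1*b2*Var(X) + residual :", genetic + residual)
```

The second returned value is the part of the between-trait covariance attributable to the variant, and the third is the residual part.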

Appendix B: Multiplication of Two Regression Models

Through the multiplication of the two models, Y1 = α1 + β1X + ε1 and Y2 = α2 + β2X + ε2, we have

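$$\begin{aligned}
Y_1Y_2 &= (\alpha_1+\beta_1X+\varepsilon_1)(\alpha_2+\beta_2X+\varepsilon_2)\\
&= \alpha_1\alpha_2+(\alpha_1\beta_2+\alpha_2\beta_1)X+\beta_1\beta_2X^2+(\varepsilon_1\beta_2X+\varepsilon_2\beta_1X+\varepsilon_1\alpha_2+\varepsilon_2\alpha_1+\varepsilon_1\varepsilon_2)
\end{aligned}$$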

Letting Y′ = Y1Y2, α′ = α1α2, γ′ = α1β2 + α2β1, β′ = β1β2, and the sum of all components carrying ε1 or ε2 be a composite residual ε′ = ε1β2X + ε2β1X + ε1α2 + ε2α1 + ε1ε2, the model of Y1Y2 can be rewritten as

$$Y' = \alpha' + \gamma' X + \beta' X^2 + \varepsilon'$$
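As an illustration of the product model, the Python sketch below simulates two traits, regresses the product Y1Y2 on X and X², and compares the coefficient of X² with β1β2. It is a sketch under stated assumptions, not the test procedure of this article: the genotype is coded 0/1/2, all parameter values and function names are hypothetical, and the fit exploits the fact that with three genotype classes the quadratic model is saturated, so the least-squares coefficients can be obtained from the three class means.

```python
import random

def simulate(n=100000, maf=0.5, a1=0.3, b1=0.5, a2=0.2, b2=0.4,
             rho=0.3, seed=2):
    """Hypothetical data from Y1 = a1 + b1*X + e1 and Y2 = a2 + b2*X + e2,
    with Corr(e1, e2) = rho and unit residual variances."""
    rng = random.Random(seed)
    X, Y1, Y2 = [], [], []
    for _ in range(n):
        x = int(rng.random() < maf) + int(rng.random() < maf)
        e1 = rng.gauss(0, 1)
        e2 = rho * e1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        X.append(x)
        Y1.append(a1 + b1 * x + e1)
        Y2.append(a2 + b2 * x + e2)
    return X, Y1, Y2

def product_model(X, Y1, Y2):
    """Least-squares fit of Y1*Y2 = alpha + gamma*X + beta*X^2 + error.

    Requires X coded 0/1/2 with all three classes present. The fitted
    values of the saturated model equal the class means m0, m1, m2:
        m0 = alpha, m1 = alpha + gamma + beta, m2 = alpha + 2*gamma + 4*beta
    """
    sums = {0: 0.0, 1: 0.0, 2: 0.0}
    counts = {0: 0, 1: 0, 2: 0}
    for x, y1, y2 in zip(X, Y1, Y2):
        sums[x] += y1 * y2
        counts[x] += 1
    m0, m1, m2 = (sums[g] / counts[g] for g in (0, 1, 2))
    beta = (m2 - 2 * m1 + m0) / 2
    gamma = m1 - m0 - beta
    alpha = m0
    return alpha, gamma, beta

if __name__ == "__main__":
    alpha, gamma, beta = product_model(*simulate())
    print("coefficient of X^2:", beta, "(b1*b2 = 0.2 in the simulation)")
```

Because the composite residual contains ε1ε2, whose mean equals Cov(ε1, ε2), the fitted intercept estimates α1α2 + Cov(ε1, ε2) rather than α1α2 alone. The coefficient of X² is unaffected by this shift.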

Appendix C: Estimation of Residual Covariance of Two Traits

To estimate the covariance parameter ($\sigma_{12\varepsilon}$) in equation (4), Y1 and Y2 are simultaneously fitted in a bivariate model

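$$Y=\begin{pmatrix}Y_1\\ Y_2\end{pmatrix}=\mu+\begin{pmatrix}X\\ X\end{pmatrix}\beta_X+T\beta_T+I_{XT}\beta_{XT}+\begin{pmatrix}\varepsilon_1\\ \varepsilon_2\end{pmatrix}$$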

In the model, X, Y1 and Y2 have the same meanings as in models (1) and (2). $T$ is the design vector indicating which trait an observation belongs to. $I_{XT}$ is the trait-by-variant interaction design vector, constructed by combining X and T. $\mu$, $\beta_X$, $\beta_T$, $\beta_{XT}$ and $\varepsilon_i$ are the grand mean, the main effect of X, the main effect of trait, the interaction effect between X and trait, and the residuals, respectively. Since Y1 and Y2 are two correlated traits, the residual $\varepsilon$ is a random, two-segment vector with a mean of $(0,0)^{T}$ and a 2×2 covariance matrix of

$$\begin{pmatrix}\sigma^2 & \sigma_{12\varepsilon}\\ \sigma_{12\varepsilon} & \sigma^2\end{pmatrix}$$

The covariance component $\sigma_{12\varepsilon}$ can be estimated using a maximum likelihood method (or other methods such as GLS and simplified regression). In this article, we chose the maximum likelihood method implemented in the R lme function.
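The stacked design of the bivariate model can be sketched as follows. This Python example is only an illustration in the spirit of the simplified regression alternative; it is not the lme-based maximum likelihood fit used in this article. The trait indicator is coded 0 for trait 1 and 1 for trait 2, the interaction column is the product of X and T, and all function names and simulated values are assumptions. Because T and the interaction give each trait its own intercept and slope, the least-squares fit of the stacked model coincides with two per-trait simple regressions, and the residual covariance is then taken as the average product of the paired residuals.

```python
import random

def simple_ols(x, y):
    """Intercept and slope of a one-predictor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def stack(X, Y1, Y2):
    """Long-format rows (subject, y, x, t, x*t); t = 0 for trait 1, 1 for trait 2."""
    rows = [(i, y, x, 0, 0) for i, (x, y) in enumerate(zip(X, Y1))]
    rows += [(i, y, x, 1, x) for i, (x, y) in enumerate(zip(X, Y2))]
    return rows

def fit_stacked(rows):
    """Least-squares fit of y = mu + bX*x + bT*t + bXT*(x*t), obtained from
    the two per-trait regressions and reparameterized."""
    a1, b1 = simple_ols([r[2] for r in rows if r[3] == 0],
                        [r[1] for r in rows if r[3] == 0])
    a2, b2 = simple_ols([r[2] for r in rows if r[3] == 1],
                        [r[1] for r in rows if r[3] == 1])
    return {"mu": a1, "bX": b1, "bT": a2 - a1, "bXT": b2 - b1}

def residual_covariance(rows, coef):
    """Average product of the two residuals of each subject."""
    res = {0: {}, 1: {}}
    for i, y, x, t, xt in rows:
        fitted = coef["mu"] + coef["bX"] * x + coef["bT"] * t + coef["bXT"] * xt
        res[t][i] = y - fitted
    return sum(res[0][i] * res[1][i] for i in res[0]) / len(res[0])

def simulate(n=20000, maf=0.3, b1=0.5, b2=0.2, rho=0.4, seed=3):
    """Hypothetical genotypes and two traits with residual correlation rho."""
    rng = random.Random(seed)
    X, Y1, Y2 = [], [], []
    for _ in range(n):
        x = int(rng.random() < maf) + int(rng.random() < maf)
        e1 = rng.gauss(0, 1)
        e2 = rho * e1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        X.append(x)
        Y1.append(1.0 + b1 * x + e1)
        Y2.append(2.0 + b2 * x + e2)
    return X, Y1, Y2

if __name__ == "__main__":
    rows = stack(*simulate())
    coef = fit_stacked(rows)
    print(coef, residual_covariance(rows, coef))
```

In this parameterization the slope for trait 2 is bX + bXT, so a nonzero bXT indicates that the variant effect differs between the two traits.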

References

  1. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B. 1995;57:289–300.
  2. Bensen JT, Lange LA, Langefeld CD, Chang BL, Bleecker ER, Meyers DA, Xu J. Exploring pleiotropy using principal components. BMC Genet. 2003;4(Suppl 1):S53. doi: 10.1186/1471-2156-4-S1-S53.
  3. de Heus P. R squared effect-size measures and overlap between direct and indirect effect in mediation analysis. Behav Res Methods. 2012;44(1):213–21. doi: 10.3758/s13428-011-0141-5.
  4. Derese MN. Mediation in the Belgian health care sector: analysis of a particular issue--the material scope of application of mediation. Med Law. 2011;30(2):225–37.
  5. Ferreira MA, Purcell SM. A multivariate test of association. Bioinformatics. 2009;25(1):132–3. doi: 10.1093/bioinformatics/btn563.
  6. Flutre T, Wen X, Pritchard J, Stephens M. A statistical framework for joint eQTL analysis in multiple tissues. PLoS Genet. 2013;9(5):e1003486. doi: 10.1371/journal.pgen.1003486.
  7. Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, Plagnol V. Bayesian Test for Colocalisation Between Pairs of Genetic Association Studies Using Summary Statistics. 2013 doi: 10.1371/journal.pgen.1004383. arXiv:1350.4022.
  8. Higgins M, Province M, Heiss G, Eckfeldt J, Ellison RC, Folsom AR, Rao DC, Sprafka JM, Williams R. NHLBI Family Heart Study: objectives and design. Am J Epidemiol. 1996;143(12):1219–28. doi: 10.1093/oxfordjournals.aje.a008709.
  9. Hodgkin J. Seven types of pleiotropy. Int J Dev Biol. 1998;42(3):501–5.
  10. Huang J, Johnson AD, O’Donnell CJ. PRIMe: a method for characterization and evaluation of pleiotropic regions from multiple genome-wide association studies. Bioinformatics. 2011;27(9):1201–6. doi: 10.1093/bioinformatics/btr116.
  11. Huber PJ. Robust Statistics. New York: Wiley; 1981.
  12. Imai K, Jo B, Stuart EA. Using Potential Outcomes to Understand Causal Mediation Analysis: Comment on. Multivariate Behav Res. 2011;46(5):861–873. doi: 10.1080/00273171.2011.606743.
  13. Klei L, Luca D, Devlin B, Roeder K. Pleiotropy and principal components of heritability combine to increase power for association analysis. Genet Epidemiol. 2008;32(1):9–19. doi: 10.1002/gepi.20257.
  14. Korte A, Vilhjalmsson BJ, Segura V, Platt A, Long Q, Nordborg M. A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nat Genet. 2012;44(9):1066–71. doi: 10.1038/ng.2376.
  15. Lange C, Silverman EK, Xu X, Weiss ST, Laird NM. A multivariate family-based association test using generalized estimating equations: FBAT-GEE. Biostatistics. 2003;4(2):195–206. doi: 10.1093/biostatistics/4.2.195.
  16. Levy JA, Landerman LR, Davis LL. Advances in mediation analysis can facilitate nursing research. Nurs Res. 2011;60(5):333–9. doi: 10.1097/NNR.0b013e318227efca.
  17. Liu J, Pei Y, Papasian CJ, Deng HW. Bivariate association analyses for the mixture of continuous and binary traits with the use of extended generalized estimating equations. Genet Epidemiol. 2009;33(3):217–27. doi: 10.1002/gepi.20372.
  18. MacKinnon DP, Fairchild AJ, Fritz MS. Mediation analysis. Annu Rev Psychol. 2007;58:593–614. doi: 10.1146/annurev.psych.58.110405.085542.
  19. Medland SE, Neale MC. An integrated phenomic approach to multivariate allelic association. Eur J Hum Genet. 2010;18(2):233–9. doi: 10.1038/ejhg.2009.133.
  20. Province MA, Borecki IB. A correlated meta-analysis strategy for data mining “omic” scans. Pac Symp Biocomput. 2013:236–46.
  21. Richiardi L, Bellocco R, Zugna D. Mediation analysis in epidemiology: methods, interpretation and bias. Int J Epidemiol. 2013;42(5):1511–9. doi: 10.1093/ije/dyt127.
  22. Smith GD, Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol. 2003;32(1):1–22. doi: 10.1093/ije/dyg070.
  23. Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: challenges and strategies. Nat Rev Genet. 2013;14(7):483–95. doi: 10.1038/nrg3461.
  24. Stearns FW. One hundred years of pleiotropy: a retrospective. Genetics. 2010;186(3):767–73. doi: 10.1534/genetics.110.122549.
  25. Thomas DC, Conti DV. Commentary: the concept of ‘Mendelian Randomization’. Int J Epidemiol. 2004;33(1):21–5. doi: 10.1093/ije/dyh048.
  26. Yang Q, Wu H, Guo CY, Fox CS. Analyze multivariate phenotypes in genetic association studies by combining univariate association tests. Genet Epidemiol. 2010;34(5):444–54. doi: 10.1002/gepi.20497.
