Abstract
Our motivation here is to calculate the power of three statistical tests used when genetic traits operate under a pleiotropic mode of inheritance and when qualitative phenotypes are defined by applying thresholds to multiple quantitative phenotypes. Specifically, we formulate a multivariate function that provides the probability that an individual has a vector of specific quantitative trait values conditional on having a risk locus genotype. We then apply thresholds to define qualitative phenotypes (affected, unaffected), and compute penetrances and conditional genotype frequencies based on the multivariate function. We extend analytic power and minimum-sample-size-necessary (MSSN) formulas for two categorical-based tests (Genotype, Linear Trend Test (LTT)) of genetic association to the pleiotropic model. We further compare MSSN of the Genotype and LTT tests with that of a MANOVA test (Pillai). We approximate MSSN for statistics by linear models using factorial design and ANOVA. With the ANOVA decomposition, we determine what factors most significantly change power/MSSN for all statistics. Finally, we determine what test statistics have the smallest MSSN. In this work, MSSN calculations are for two traits (bivariate distributions) only (for illustrative purposes). We note that the calculations may be extended to address any number of traits.
Our key findings are that the Genotype test usually has smaller MSSN requirements than the LTT. More inclusive thresholds (top/bottom 25% versus top/bottom 10%) have higher sample size requirements. The Pillai test has much larger MSSN than both the Genotype and Trend tests as a result of sample selection.
With these formulas, researchers can specify how many subjects they must collect to localize genes for pleiotropic phenotypes.
Keywords: pleiotropy, multiple phenotypes, GWAS, non-centrality parameter, noncentrality parameter, statistics, method
Introduction
In his review article on one hundred years of pleiotropy, Stearns credits the Swiss geneticist Ludwig Plate as being the first to use the term in 1910 [1]. Stearns’ definition was “Pleiotropy refers to the phenomenon in which a single locus affects two or more apparently unrelated phenotypic traits and is often identified as a single mutation that affects two or more wild-type traits.” [1] We translate this definition into a mathematical model in the Methods section.
As of this writing, searching on the term ‘pleiotropy’ under “Topic” in the ISI Web of Science database yields over 11,000 publications. This number suggests that pleiotropy is both a common phenomenon and one that has been well-studied. A significant number of these publications (over 1,300, according to ISI Web of Science) deal with mouse, fly, plants, dogs, chickens, and other animals/organisms. There are a host of statistically powerful techniques available for gene mapping in these model organisms (see, e.g., [2] for mouse).
In humans, there are numerous examples of pleiotropic effects that are correlated with traits and/or diseases. Some examples include: colorectal cancer [3,4], Crohn’s Disease [3,5–10], Alzheimer’s Disease [11–19], and Marfan syndrome [20–22]. Papers by Baumgartner et al. [22] and Solovieff et al. [23] highlight some challenges regarding the study of pleiotropic traits in humans. One challenge is the computation of statistical power and/or minimum sample size necessary (MSSN) for genetic association, a critically important component of any gene mapping work. With these values, researchers may obtain a realistic estimate either of MSSN to establish genetic association, or of the probability of detecting genetic association for a collected sample. Power and MSSN calculations for single-phenotype tests of genetic association have been derived by Mitra [24] for the chi-square test of independence on alleles/genotypes and by several authors [25–28] for the linear trend test. From this point forward, we refer to the first and second tests as the Genotype test (since the data collected are genotypes on individuals) and LTT (for Linear Trend Test), respectively.
There have been a number of publications documenting ways to detect and analyze pleiotropic data, most recently for genome-wide association studies [23,29–58], and also reporting methods to determine power and/or MSSN for association mapping [43,45,46,59–66]. If one broadens the search to allow for multiple phenotypes that may not be pleiotropic, the list of published methods increases [34,67–85]. Studying these methods, we note that the majority deal with data analysis. We comment that a number of authors who document power for their method do so by simulation (e.g., [45,47]) or for a specific data set (e.g., [6,60,86]).
The purpose of this work is the development of an analytic approach to computing statistical power for a fixed sample size and given significance level, or MSSN (in terms of affected and unaffected individuals) to achieve fixed power at a given significance level for a number of different statistical tests. Our method is threshold-based, in the sense that we transform individuals with quantitative phenotype-vector-values into either affected or unaffected individuals using thresholds. From this point forward, we will use the abbreviations QT for Quantitative Trait/Quantitative Phenotype, and Quantitative Trait Value (QTV) to refer to an individual's quantitative phenotype-vector-values.
Our method is a natural extension of the univariate threshold-selected quantitative trait association power and MSSN calculator (e.g., [87,88]) in that, when the number of phenotypes is one, our method reduces to the univariate method. Some suggested benefits of our method are that: (i) it is based on classical quantitative-genetics mapping methods for selected-sampling; (ii) the mathematics used is well-established and straightforward to implement.
We use a threshold approach because a number of pleiotropic diseases are defined this way. For example, Marfan syndrome and Tourette syndrome are comprised of multiple traits, each of which may be caused by a single gene on the chromosome [2]. The phenotypes caused by these disorders are also quantitative or continuously distributed. That is, individuals may exhibit these traits in varying degrees (e.g., mild to severe). We note that each trait may be defined by thresholds for different QTs. Thresholds are provided below. For all of the syndromes listed below, each of the conditions listed is necessary.
- For a person to be diagnosed with Tourette syndrome [91], he or she must:
- Have ≥ 2 motor tics (for example, blinking or shrugging the shoulders);
- Have ≥ 1 vocal tic (for example, humming, clearing the throat, or yelling out a word or phrase), although they might not always happen at the same time;
- Have these tics ((a) and (b)) for ≥ 1 year. The tics can occur many times a day (usually in bouts) nearly every day, or off and on;
- Have tics that start at ≤18 years of age;
- Have symptoms that are not due to taking medicine or other drugs or due to having another medical condition (for example, seizures, Huntington’s disease, or post-viral encephalitis).
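As an illustration of how several thresholds jointly define a qualitative phenotype, the criteria listed above can be sketched in code. This is a minimal sketch only: the record type `TicProfile`, its field names, and the function `is_affected` are hypothetical names introduced for illustration, and the rule simply combines the listed conditions with a logical AND because each condition is stated to be necessary.

```python
from dataclasses import dataclass


@dataclass
class TicProfile:
    """Hypothetical record of the measures named in the listed criteria."""
    motor_tics: int        # number of distinct motor tics
    vocal_tics: int        # number of distinct vocal tics
    duration_years: float  # how long the tics have been present
    onset_age: float       # age (years) at which the tics started
    other_cause: bool      # True if explained by medication or another condition


def is_affected(p: TicProfile) -> bool:
    """Every listed condition is necessary, so all thresholds must hold at once."""
    return (
        p.motor_tics >= 2
        and p.vocal_tics >= 1
        and p.duration_years >= 1
        and p.onset_age <= 18
        and not p.other_cause
    )
```

For example, `is_affected(TicProfile(2, 1, 1.5, 10, False))` evaluates to `True`, while lowering `motor_tics` to 1 makes it `False`.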
Additionally, we make the distinction between pleiotropy and locus heterogeneity. In Tourette’s syndrome, there is documented evidence of locus heterogeneity [92,93]. Hence, in a particular family, it may be that these traits are “caused by a single gene” with a high penetrance. However, this situation is not what we mean by pleiotropy. For pleiotropy, it must be the same gene causing changes in the multiple phenotypes across families/individuals.
We include a section on derivation of the power/MSSN for Multivariate ANalysis Of VAriance (MANOVA) using the Pillai trace statistic applied to the quantitative measures directly. Our reasons are that: (i) several published methods consider power and/or MSSN for pleiotropic phenotypes using quantitative measures [31,32,36,40,42,45,94–101]; (ii) while there is no uniformly most powerful test for MANOVA using equality of means as the null hypothesis, the Pillai trace statistic has high power under a number of different settings; and (iii) it is robust to several violations of assumptions in the MANOVA model [102,103]. We perform a comparison of the MSSN for the Pillai statistic and our statistics using specified genetic model parameter settings.
Finally, we develop software that performs power and/or MSSN calculations for detecting genetic association with: (i) LTT and Genotype test for threshold-defined phenotypes; and (ii) Pillai's trace statistic for the original phenotypes. We note that this software is an extension of software programs designed to compute power and/or MSSN considering a single locus and a single phenotype. In this work, MSSN calculations are for two traits (bivariate distributions) only. Our calculations may be extended to address any number of traits.
Methods
Note: Notation for much of this section may be found in the Appendix.
Test statistic for one-way Multi-variate ANalysis Of Variance (MANOVA)
Here, we present the test statistic used to test our multiple null hypotheses when the data are quantitative. Several multivariate mean vectors in a one-way MANOVA may be compared statistically using Wilks’ lambda, Pillai’s trace, Roy’s largest root, or Hotelling-Lawley’s tests [102,103]. Though none of these tests is uniformly most powerful, Pillai’s trace statistic is reported to have good power in many scenarios and is robust to deviations from assumptions specified in MANOVA [102]. As an indication of its popularity, Pillai's trace test is the default test in the manova function of the R statistical software package [106]. Wilks’ lambda is equivalent to the likelihood ratio test and it has similar power to Pillai’s statistic in many alternative settings [102,103].
Notation for Pillai statistic
Here, we define the null hypotheses for Pillai’s statistic, and the statistic itself.
g: The number of groups considered for each phenotype; here, g is the number of genotypes at a SNP locus, so that g = 3.
p: The number of phenotypes (response variables).
Definition of Pillai statistic
Here, we present the Pillai trace statistic. It is used to test our multiple null hypotheses when the data are quantitative.
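To begin, let $Y = (Y_1', \ldots, Y_g')'$ be the $N \times p$ data matrix, where $Y_i$ is an $n_i \times p$ data matrix, and $y_{ijk}$ is the jth observation of the kth phenotype in the ith genotype group, the total number of observations being denoted by N = n1 + ⋯ + ng. Note that 1 ≤ i ≤ g, 1 ≤ j ≤ ni for the ith genotype group, and 1 ≤ k ≤ p. Also, ni is the number of individuals with the ith genotype.
Let X denote the N × g design matrix given by the block-diagonal matrix $X = \operatorname{diag}(\mathbf{1}_{n_1}, \ldots, \mathbf{1}_{n_g})$, where the matrices $\mathbf{1}_{n_i}$, 1 ≤ i ≤ g, are of size ni × 1 and are defined as $\mathbf{1}_{n_i} = (1, \ldots, 1)'$. Also, let $X'X$ and $(X'X)^{-1}$ be the diagonal g × g matrices given by $X'X = \operatorname{diag}(n_1, \ldots, n_g)$ and $(X'X)^{-1} = \operatorname{diag}(1/n_1, \ldots, 1/n_g)$, respectively.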
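The Pillai trace test statistic is defined as $V = \operatorname{tr}\left[H(H+E)^{-1}\right] = \sum_{i=1}^{s} \phi_i/(1+\phi_i)$, and is based on the s = min(g − 1, p) eigenvalues {ϕ1 ≥ ⋯ ≥ ϕs} of $E^{-1}H$, where:
$$H = (C\hat{B}A)'\left[C(X'X)^{-1}C'\right]^{-1}(C\hat{B}A), \qquad E = A'(Y - X\hat{B})'(Y - X\hat{B})A, \qquad \hat{B} = (X'X)^{-1}X'Y.$$
Note that the matrix B̂ is the matrix B with parameters estimated from the data. The matrices C and A are stated below. The estimate of each µij is given by the sample mean of the corresponding phenotype within the ith genotype group. The Pillai statistic has an F distribution with df1 = rCrA and df2 = s(N − rX + s − rA) degrees of freedom under the null hypothesis. Note that rC, rA, and rX are the ranks of the matrices C, A, and X, respectively.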
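The two quantities described above, the trace statistic and its degrees of freedom, are simple to compute once the eigenvalues are known. The sketch below assumes the standard form of Pillai's trace, V = Σ ϕi/(1 + ϕi), and the degrees-of-freedom expressions quoted in the text; the function names `pillai_trace` and `pillai_df` are hypothetical and are not part of the authors' software.

```python
def pillai_trace(eigenvalues):
    """Pillai's trace V = sum_i phi_i / (1 + phi_i) over the eigenvalues of E^{-1}H."""
    return sum(phi / (1.0 + phi) for phi in eigenvalues)


def pillai_df(N, g, p, rank_C, rank_A, rank_X):
    """Return (s, df1, df2) using the expressions quoted in the text."""
    s = min(g - 1, p)                      # number of eigenvalues used
    df1 = rank_C * rank_A                  # numerator degrees of freedom
    df2 = s * (N - rank_X + s - rank_A)    # denominator degrees of freedom
    return s, df1, df2
```

With three genotype groups and two phenotypes (g = 3, p = 2, all ranks as in the bivariate example), `pillai_df(100, 3, 2, 2, 2, 3)` gives s = 2, df1 = 4, and df2 = 194, which matches 2(N − 3) for N = 100.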
A. Null hypothesis
We can write a linear hypothesis in a one-way MANOVA as:
$$H_0: CBA = 0,$$
where $B = (\mu_{ij})$ is a g × p matrix for the p mean vectors. The matrices C and A are determined from a linear null hypothesis.
B. Power and sample size calculations
O’Brien and Shieh [107] summarize the calculation of the power for global effects in one-way MANOVA. The Pillai trace statistic under the alternative hypothesis has a non-central F distribution with df1 and df2 degrees of freedom and the non-centrality parameter (NCP) , where and is the ith largest eigenvalue of:
where or the limit of the ratio as N → ∞. We specify that the phenotype vectors in all groups have the common covariance matrix Σ. This common covariance matrix specification is necessary to derive the NCP. Note that for the threshold-based phenotypes, we need not make such an assumption.
C. Example NCP calculation for two phenotypes
Consider our null hypothesis H0: µ01 = µ11 = µ21, µ02 = µ12 = µ22 for three genotype groups (i = 0,1,2) with the bivariate phenotypes (j = 1,2), that is, p = 2 and g = 3. Thus, s = min(g − 1, p) = 2. These means are determined using the information in section above (Methods - Notation for quantitative trait).
Let A be the 2 × 2 identity matrix and let . Let , and let the covariance matrix of the bivariate phenotypes be denoted by with the correlation coefficient ρ. These matrices are specified so that we may test the null hypothesis H0: µ01 = µ11 = µ21, µ02 = µ12 = µ22 stated above. We can calculate the 2 × 2 matrix Φ* as:
The matrix Φ* is used to compute the eigenvalues , which in turn are used to compute the Pillai statistic and the NCP.
Let us define the terms Sij as:
where . We can simplify the matrix Φ* to be:
Note that:
and
Therefore, the NCP λ can be written as:
| (1) |
The power of Pillai’s trace test is obtained by:
$$\text{Power} = \Pr\left[F(df_1, df_2, \lambda) \ge f_{\alpha, df_1, df_2}\right],$$
where $f_{\alpha, df_1, df_2}$ is the (1 − α) quantile of a central F distribution with df1 and df2 degrees of freedom, and F(df1, df2, λ) is a noncentral F random variable with NCP λ and degrees of freedom df1 and df2. For our example, df1 = rCrA = 4 and df2 = s(N − rX + s − rA) = 2(N − 3).
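The power calculation described above (critical value from the central F distribution, tail probability from the noncentral F distribution) can be sketched without external libraries. This is an illustrative sketch, not the authors' software: all function names are hypothetical, the NCP λ is taken as an input rather than computed from Equation (1), and the noncentral F tail is evaluated with the standard Poisson-mixture representation and a continued-fraction incomplete beta function. In practice a statistical library's noncentral F routine could be substituted.

```python
import math


def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2, ncp=0.0):
    """Upper tail P(F > f); a Poisson(ncp/2) mixture of central terms when ncp > 0."""
    if f <= 0.0:
        return 1.0
    y = df2 / (df2 + df1 * f)
    half = ncp / 2.0
    total, cum, j, log_w = 0.0, 0.0, 0, -half
    while True:
        w = math.exp(log_w)                       # Poisson weight for index j
        total += w * reg_inc_beta(df2 / 2.0, df1 / 2.0 + j, y)
        cum += w
        j += 1
        if half == 0.0 or (1.0 - cum < 1e-12 and j > half) or j > 10000:
            return total
        log_w += math.log(half) - math.log(j)


def f_upper_quantile(alpha, df1, df2):
    """Critical value f with P(F > f) = alpha under the central F, by bisection."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, df1, df2) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_sf(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pillai_power(ncp, N, alpha=5e-8, df1=4):
    """Power for the bivariate, three-genotype example: df1 = 4, df2 = 2(N - 3)."""
    df2 = 2 * (N - 3)
    crit = f_upper_quantile(alpha, df1, df2)
    return f_sf(crit, df1, df2, ncp)
```

A useful sanity check on a sketch like this is that the power at λ = 0 equals the significance level, and that power increases with λ for fixed N and α.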
Bivariate example
For the remainder of this work (excluding the Discussion), we focus on the bivariate distribution; that is, on pleiotropic diseases with two quantitative traits. We do this because results are more easily interpreted, and because we can present graphs of functions such as the cdf.
MSSN calculations using factorial design
We ask the following question: what factors most substantially alter the calculated MSSN when testing for genetic association with a pleiotropic gene affecting two phenotypes?
To answer this question, we use a 2⁴ × 3³ factorial design (see [108]) on a total of seven design variables (factors) to approximate the calculated MSSN with functions of the design variables. These factors are listed in Table 1. Note that we obtain 2⁴ × 3³ = 432 vectors of factor settings and therefore 432 MSSN calculations. One benefit of the factorial design is that we can look at multiple factors jointly over a broad range of settings and assess the factors that most change the outcome variable. For all MSSN calculations, we specify a fixed power of 0.80 and a significance level of 5 × 10⁻⁸.
Table 1.
Factors (and their settings) used in the MSSN calculations for genetic tests of association.
| Factor | Number of settings | Setting-values |
|---|---|---|
| pd | 2 | 0.05, 0.330 |
| QTL variance (phenotype 1) | 2 | 0.05, 0.10 |
| τ1 | 3 | −0.50, 0.00, 0.50 |
| QTL variance (phenotype 2) | 2 | 0.025, 0.05 |
| τ2 | 3 | −0.50, 0.00, 0.50 |
| ρ | 3 | 0.00, 0.33, 0.67 |
| Percent-Affected and Percent-Unaffected | 2 | 10%, 25% |
- pd = Disease allele frequency (DAF);
- QTL variance (phenotype 1) = The variance for the first phenotype’s quantitative trait distribution;
- τ1 = The dominance-additivity ratio for the first phenotype;
- QTL variance (phenotype 2) = The variance for the second phenotype’s quantitative trait distribution;
- τ2 = The dominance-additivity ratio for the second phenotype;
- ρ = The correlation among the two phenotypes, or ρ12. While we can consider negative correlations, for bivariate distributions, two phenotypes may always be parameterized so that the correlation is non-negative.
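The full factorial design can be enumerated directly from the settings in Table 1. The sketch below is illustrative only; the dictionary keys (for example `qtl_var_1` and `percent_selected`) are hypothetical labels chosen here for the seven factors.

```python
from itertools import product

# Factor settings as listed in Table 1 (key names are illustrative labels).
factors = {
    "pd": (0.05, 0.33),
    "qtl_var_1": (0.05, 0.10),
    "tau_1": (-0.50, 0.00, 0.50),
    "qtl_var_2": (0.025, 0.05),
    "tau_2": (-0.50, 0.00, 0.50),
    "rho": (0.00, 0.33, 0.67),
    "percent_selected": (10, 25),
}

# One dictionary per vector of factor settings (Genotype test and LTT).
design = [dict(zip(factors, combo)) for combo in product(*factors.values())]

# The Pillai test does not use the selection percentage, so drop that factor.
pillai_design = {
    tuple(v for k, v in row.items() if k != "percent_selected") for row in design
}

print(len(design), len(pillai_design))
```

The two counts are 432 and 216, matching the number of MSSN calculations reported for the threshold-based tests and for the Pillai test, respectively.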
Approximation of calculated MSSN
After we compute all 216 MSSN for the Pillai test, and all 432 MSSN (we compute the number of affected individuals needed, and set the number of unaffected individuals to be equal to the number of affected individuals; that is, r = 1) for the Genotype and LTT tests, we perform a linear model analysis (i.e., ANOVA) on the seven main factors (Table 1) and all two-way interactions. The ANOVA calculations are performed using the methods developed for the R statistical software package [106].
Our rationale for performing the ANOVA with the factorial design is as follows: Equation (1) above and Equations (A8.1) and (A9.1) in the Appendix are closed-form equations that specify the NCPs (from which the MSSN may be calculated). Here, the MSSN is given by n = n(r, wk, gik), where i = affection status, k = genotype. Although they are analytic, it is difficult to identify the variables that are most important. Consequently, we approximate the exact function by a linear model (including all two-way interactions) n̂(r, wk, gik) = µ + µr + ⋯. We use 432 settings for our linear model approximation (216 for the Pillai statistic, since it is not dependent upon Percent-Affected and Percent-Unaffected settings) and report the factors that most explain the MSSN.
We note here and in the Results section that we do not attempt to make statistical inferences from our applications of the factorial design and ANOVA. Rather, we use them as explanatory tools, specifically documenting the factors (main and interaction) that appear to have the most substantial effect on altering MSSN (i.e., those with the largest F-statistics), and then documenting quantitatively whether the results appear to be true. We can do this by computing MSSN considering different settings of the aforementioned factors and checking whether the different settings produce substantially different MSSN estimates.
Results
Factors that most significantly alter genetic association test MSSN
Genotype test
In Table 2, we report the results of our ANOVA for the Genotype test. Overall, this statistic had the smallest MSSN requirements on average for any set of factor settings in Table 1. This result is notable, since the Genotype test has 2 degrees of freedom (df), so one might expect the LTT to have lower MSSN values. Also, the Genotype test is applied to categorical data, and it is generally true that for quantitative data, quantitative-based tests such as Pillai's will require smaller MSSN than tests on categorical data. We discuss this point further in the Discussion.
Table 2.
Results of Analysis of Variance (ANOVA) for main effects and all two-way interactions - Genotype test.
| Factor | Df | SSQFactor | F Statistic | φ2 |
|---|---|---|---|---|
| Percent-Affected | 1 | 1560721 | 33815.121 | 0.314 |
| ρ | 2 | 1723434 | 18670.263 | 0.347 |
|  | 1 | 612234 | 13264.884 | 0.123 |
|  | 1 | 308685 | 6688.076 | 0.062 |
| pd | 1 | 303127 | 6567.645 | 0.061 |
| ρ × Percent-Affected | 2 | 103543 | 1121.697 | 0.021 |
|  | 1 | 46967 | 1017.613 | 0.009 |
| pd × Percent-Affected | 1 | 40969 | 887.657 | 0.008 |
|  | 2 | 63336 | 686.134 | 0.013 |
|  | 1 | 25551 | 553.597 | 0.005 |
| τ1 | 2 | 46357 | 502.194 | 0.009 |
|  | 1 | 22923 | 496.648 | 0.005 |
|  | 2 | 31059 | 336.47 | 0.006 |
| pd × ρ | 2 | 23991 | 259.896 | 0.005 |
| τ2 | 2 | 16522 | 178.984 | 0.003 |
| τ1 × Percent-Affected | 2 | 5191 | 56.235 | 0.001 |
|  | 1 | 2162 | 46.851 | 0 |
| τ2 × Percent-Affected | 2 | 2892 | 31.331 | 0.001 |
| τ1 × ρ | 4 | 4434 | 24.018 | 0.001 |
|  | 2 | 1723 | 18.67 | 0 |
| τ1 × τ2 | 4 | 3101 | 16.796 | 0.001 |
|  | 2 | 1041 | 11.28 | 0 |
| τ2 × ρ | 4 | 1606 | 8.7 | 0 |
|  | 1 | 282 | 6.11 | 0 |
|  | 2 | 97 | 1.051 | 0 |
|  | 2 | 74 | 0.799 | 0 |
| pd × τ1 | 2 | 70 | 0.753 | 0 |
| pd × τ2 | 2 | 2 | 0.02 | 0 |
| Residuals | 379 | 17493 |  |  |
| Total |  | 4964426 |  |  |
The values in the column labeled "Factor" are defined in Table 1. The column SSQFactor is the sum of squares for the given factor. The column labeled "φ2" lists each factor's proportion of the overall Sum of Squares. That is, $\varphi^2 = SSQ_{Factor}/SSQ_{Total}$. All values with the exception of the last column are computed using methods developed for the R statistical software package [106].
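The last two columns of Table 2 can be checked from the sums of squares alone. The sketch below assumes the usual ANOVA definitions: the proportion of the total sum of squares for φ2, and the factor mean square divided by the residual mean square for the F statistic. The function names are hypothetical, and only a few labeled rows of Table 2 are used as inputs.

```python
SSQ_TOTAL = 4964426          # "Total" row of Table 2
RESID_DF, RESID_SSQ = 379, 17493  # "Residuals" row of Table 2

# (degrees of freedom, sum of squares) for three labeled rows of Table 2.
rows = {
    "Percent-Affected": (1, 1560721),
    "rho": (2, 1723434),
    "pd": (1, 303127),
}


def phi_squared(ssq_factor, ssq_total=SSQ_TOTAL):
    """Proportion of the overall sum of squares attributed to one factor."""
    return ssq_factor / ssq_total


def f_statistic(df, ssq_factor):
    """Factor mean square divided by the residual mean square."""
    return (ssq_factor / df) / (RESID_SSQ / RESID_DF)


for name, (df, ssq) in rows.items():
    print(name, round(phi_squared(ssq), 3), round(f_statistic(df, ssq), 1))
```

Because the tabulated sums of squares are rounded to integers, the recomputed F statistics agree with Table 2 only to within about one unit.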
In Table 2, the factors are sorted from largest F-statistic to least. Also, we report the value φ2, the respective factor's proportion of the overall Sum of Squares. Specifically, $\varphi^2 = SSQ_{Factor}/SSQ_{Total}$ (values provided in Table 2). Based on F-statistics and the φ2 values, we may infer that there are five main factors that most substantially affect the number of affected individuals needed to detect association. These are, in order of F-statistic (rounded to nearest integer from Table 2), Percent-Affected (F-statistic = 33,815), ρ (Correlation) (F-statistic = 18,670), the two QTL variances (F-statistics = 13,265 and 6,688), and pd (F-statistic = 6,568). Along with their two-way interaction terms (a total of ten), these five factors account for 98% of the total Sum of Squares (SSQTotal) (Table 2). The dominance-additivity ratios, τ1 and τ2, had relatively small impact on the calculated MSSN. This result suggests that the Genotype test is equally powerful when the QTL operate under either an additive or non-additive mode of inheritance.
That is, researchers need not focus on whether their traits of interest deviate from an additive mode of inheritance when performing MSSN calculations.
Given these results, we performed a regression analysis in which we used the five main-effect terms and their two-way interactions. Results of the regression analysis are provided in Table 3. As may be seen in Table 3 and Equation (2) below, there are actually six "main" effects terms, since there are three settings for the correlation factor ρ, and hence we need two separate variables. Our goal is to compute the coefficients of the fitted sample size equation:
| (2) |
Table 3.
Coefficients for linear regression model using five most significant main factors - Genotype test.
| Factor and Setting | Coefficient Estimate | Standard Error | t-Statistic |
|---|---|---|---|
| (Intercept) | 154.718 | 3.449 | 44.853 |
| Percent-Affected = 25 | 139.272 | 3.688 | 37.767 |
| ρ = 0.33 | 81.701 | 4.123 | 19.816 |
| ρ = 0.67 | 185.045 | 4.123 | 44.882 |
| QTL variance (phenotype 1) = 0.10 | −43.27 | 3.688 | −11.734 |
| QTL variance (phenotype 2) = 0.05 | −38.942 | 3.688 | −10.56 |
| pd = 0.33 | −21.689 | 3.688 | −5.882 |
| Percent-Affected = 25, ρ = 0.33 | 31.973 | 3.688 | 8.67 |
| Percent-Affected = 25, ρ = 0.67 | 75.548 | 3.688 | 20.487 |
| Percent-Affected = 25, QTL variance (phenotype 1) = 0.10 | −41.708 | 3.011 | −13.852 |
| Percent-Affected = 25, QTL variance (phenotype 2) = 0.05 | −29.137 | 3.011 | −9.677 |
| Percent-Affected = 25, pd = 0.33 | −38.954 | 3.011 | −12.937 |
| ρ = 0.33, QTL variance (phenotype 1) = 0.10 | −25.374 | 3.688 | −6.881 |
| ρ = 0.33, QTL variance (phenotype 2) = 0.05 | −18.002 | 3.688 | −4.882 |
| ρ = 0.33, pd = 0.33 | −17.221 | 3.688 | −4.67 |
| ρ = 0.67, QTL variance (phenotype 1) = 0.10 | −59.121 | 3.688 | −16.032 |
| ρ = 0.67, QTL variance (phenotype 2) = 0.05 | −41.421 | 3.688 | −11.233 |
| ρ = 0.67, pd = 0.33 | −36.489 | 3.688 | −9.895 |
| QTL variance (phenotype 1) = 0.10, QTL variance (phenotype 2) = 0.05 | 30.763 | 3.011 | 10.217 |
| QTL variance (phenotype 1) = 0.10, pd = 0.33 | 3.232 | 3.011 | 1.073 |
| QTL variance (phenotype 2) = 0.05, pd = 0.33 | 8.949 | 3.011 | 2.972 |
Here, we present results of a linear regression using the five most significant factors from Table 2. We include all two-way interactions of these factors. An example description of the factors is as follows: 'ρ = 0.33' means, if the correlation setting is 0.33, use the coefficient 81.701 (second column) when computing the fitted value. Otherwise, use 0. We compute coefficients for the other main factors in the same manner. For two-way interactions, consider the example 'Percent-Affected = 25, pd = 0.33'. Here, if the disease allele frequency setting is 0.33 and the Percent-Affected setting is 25, then the coefficient used for the fitted values is −38.954, and 0 otherwise. All values are computed using methods (specifically, the lm and summary commands) developed for the R statistical software package [106]. All values in the last three columns are rounded to three decimal places.
Here, D is the number of factors (five in this case), and ψz is the number of df for the zth factor, 1 ≤ z ≤ D. Also, 1 ≤ d < f ≤ D, and βiβj = 0 if i, j are settings for the same factor. This form of the fitted equation is used for all test statistics (Genotype, LTT, Pillai).
From Table 3, we compute the fitted function as:
| (3) |
where:
Reviewing the coefficients in Equation (3), we observe that increasing the Percent-Affected from 10 to 25 percent produces a substantial increase in MSSN (approximately 139 individuals; coefficient for variable x1). The next largest coefficient is for the correlation term ρ in the variance-covariance matrix Σ. Increasing the correlation from 0 (uncorrelated phenotypes) to 0.33 produces an MSSN increase of approximately 82 individuals (coefficient for variable x2) and increasing the correlation from 0 to 0.67 produces an MSSN increase of 185. This coefficient is the single largest coefficient in the fitted Equation (3). Coefficients for the other main effects are smaller, but significantly non-zero.
For the interaction terms, the largest coefficient in Equation (3) in absolute value is for the pair (Percent-Affected, ρ). When Percent-Affected equals 25 and ρ equals 0.67, the increase in MSSN is approximately 76. With the exception of the two pairs that combine a QTL variance with pd, coefficients for all other interaction terms are greater than 15 in absolute value (Equation (3), Table 3). These results are consistent with the F-statistic values in Table 2.
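The way main-effect and interaction coefficients combine into a fitted MSSN can be sketched as follows. This is a minimal illustration using only the Table 3 coefficients for Percent-Affected, ρ, and pd; it assumes both QTL variances are held at their baseline settings, so that every variance term contributes 0. The function name `fitted_mssn` and the dictionary layout are hypothetical.

```python
INTERCEPT = 154.718

# Main-effect coefficients from Table 3; a term applies only at the listed setting.
MAIN = {
    ("percent_affected", 25): 139.272,
    ("rho", 0.33): 81.701,
    ("rho", 0.67): 185.045,
    ("pd", 0.33): -21.689,
}

# Two-way interaction coefficients from Table 3; both settings must be active.
INTERACTION = {
    frozenset({("percent_affected", 25), ("rho", 0.33)}): 31.973,
    frozenset({("percent_affected", 25), ("rho", 0.67)}): 75.548,
    frozenset({("percent_affected", 25), ("pd", 0.33)}): -38.954,
    frozenset({("rho", 0.33), ("pd", 0.33)}): -17.221,
    frozenset({("rho", 0.67), ("pd", 0.33)}): -36.489,
}


def fitted_mssn(percent_affected=10, rho=0.0, pd=0.05):
    """Fitted MSSN with both QTL variances at baseline (assumption of this sketch)."""
    active = {("percent_affected", percent_affected), ("rho", rho), ("pd", pd)}
    total = INTERCEPT
    total += sum(coef for key, coef in MAIN.items() if key in active)
    total += sum(coef for pair, coef in INTERACTION.items() if pair <= active)
    return total


print(round(fitted_mssn(25, 0.67), 3))
```

At the baseline settings the fitted value is the intercept, 154.718. With Percent-Affected = 25 and ρ = 0.67 the sum is 154.718 + 139.272 + 185.045 + 75.548 = 554.583.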
Finally, a review of the results in Table 3 suggests that MSSN is decreased the most when the first phenotype's QTL variance is 0.10, since every coefficient that contains this setting (with the exception of coefficients for the third-to-last and second-to-last rows of Table 3) is negative. This result is consistent with the fact that increasing the QTL variance increases the separation among the component MVN distributions, thereby making it easier to determine genotype from QT values.
In Figure 1, we present a plot of the fitted values (using Equation (3)) versus the analytic MSSN (n = nA + nU) determined using the NCP (Appendix; Equations (A8.1) and (A8.2)). The coefficients of the trend line, computed using the method in Excel, are consistent with the finding that the analytic MSSN are accurately approximated by a linear combination of the six variables x1, …, x6 (Table 3) and their two-way interactions. We base this conclusion on the fact that the trend line intercept is 0.0005 (close to 0) and the slope is exactly 1. From this we may conclude that, for the parameter settings considered in Table 1, only five of the seven factors are needed to approximate the analytic MSSN, and that among them, Percent-Affected/Unaffected and the correlation ρ make the greatest change. Since Percent-Affected/Unaffected is the only variable that researchers can control, to decrease MSSN requirements, one should decrease the Percent-Affected value to a 10% threshold (set x1 to 0 in Equation (3)). Doing so will decrease the fitted MSSN by approximately 139 individuals (coefficient of x1 in Equation (3)). In the Appendix, we compute analytic MSSN over a range of Percent-Affected/Unaffected values for the Genotype test and the LTT, and document that, as the Percent-Affected/Unaffected setting approaches 0%, so does the MSSN (Appendix; Figure A4).
Figure 1.
LTT
The results of the LTT test are very similar to those of the Genotype test, although the MSSN requirements are generally higher. We place results of our analyses in the Appendix (Table A2). Also, see the Discussion section.
Pillai test
We provide results of our ANOVA for the Pillai test in Table 4. Overall, this statistic had the largest MSSN requirements for any set of factor-settings in Table 1. Note that the factor "Percent-Affected/Unaffected" is not used when computing MSSN requirements for the Pillai statistic, because we use QTVs on all individuals, not just those whose values are above/below a threshold. Hence, we compute the ANOVA for a total of 432/2 = 216 vectors of settings from Table 1.
Table 4.
Results of Analysis of Variance (ANOVA) for main effects and all two-way interactions - Pillai test.
| Factor | Df | SSQFactor | F Statistic | φ2 |
|---|---|---|---|---|
|  | 1 | 4626159 | 5804.04 | 0.679 |
|  | 1 | 502017 | 629.836 | 0.074 |
| ρ | 2 | 891480 | 559.231 | 0.131 |
|  | 1 | 237057 | 297.415 | 0.035 |
|  | 2 | 247063 | 154.984 | 0.036 |
| τ1 × τ2 | 4 | 83801 | 26.284 | 0.012 |
| pd | 1 | 16801 | 21.078 | 0.002 |
|  | 2 | 22746 | 14.269 | 0.003 |
| pd × ρ | 2 | 12947 | 8.122 | 0.002 |
| τ1 | 2 | 9030 | 5.665 | 0.001 |
| τ2 | 2 | 9030 | 5.665 | 0.001 |
|  | 1 | 2308 | 2.896 | 0 |
| τ1 × ρ | 4 | 6988 | 2.192 | 0.001 |
| τ2 × ρ | 4 | 6988 | 2.192 | 0.001 |
| pd × τ1 | 2 | 1403 | 0.88 | 0 |
| pd × τ2 | 2 | 1403 | 0.88 | 0 |
|  | 2 | 1243 | 0.78 | 0 |
|  | 2 | 1243 | 0.78 | 0 |
|  | 1 | 28 | 0.035 | 0 |
|  | 2 | 16 | 0.01 | 0 |
|  | 2 | 16 | 0.01 | 0 |
| Residuals | 173 | 137891 |  |  |
| Total |  | 6817658 |  |  |
The legend for this table is virtually identical to the legend for Table 2, with the exception that the "Percent-Affected" factor is not considered, because the Pillai statistic is computed on all individuals. All values with the exception of the last column are computed using methods developed for the R statistical software package [106].
As in Table 2, the factors considered in our ANOVA are sorted from largest F-statistic to least, and we report the φ2 values (listed in Table 4). Considering F-statistics and the φ2 values, we infer that there are three main terms that most substantially affect the MSSN to detect association. These are, in order of F-statistic (rounded to nearest integer), the two QTL variances (F-statistics = 5,804 and 630), and ρ (F-statistic = 559). The three two-way interactions of these terms are the interaction between the two QTL variances and the interaction of each QTL variance with ρ. These six main and interaction factors account for approximately 96% of the total Sum of Squares (SSQTotal) (Table 4, last column). These results suggest that a linear function of the top five factors (like Equation (3) for the Genotype test) provides a very close approximation to the actual MSSN for all 216 vectors of settings from Table 1.
Using the results in Table 4, we perform a regression analysis in which we select the three main-effect terms (a total of four variables, given the two settings of correlation) and their two-way interactions. We present results in Table 5.
Table 5.
Coefficients for linear regression model using three most significant main factors and all interactions - Pillai test.
| Factor | Coefficient Estimate | Standard Error | t-Statistic |
|---|---|---|---|
| (Intercept) | 651.081 | 8.089 | 80.491 |
| QTL variance (phenotype 1) = 0.10 | −277.541 | 10.232 | −27.126 |
| QTL variance (phenotype 2) = 0.05 | −173.81 | 10.232 | −16.987 |
| ρ = 0.33 | 150.831 | 10.852 | 13.898 |
| ρ = 0.67 | 215.882 | 10.852 | 19.893 |
| QTL variance (phenotype 1) = 0.10, QTL variance (phenotype 2) = 0.05 | 132.513 | 10.232 | 12.951 |
| QTL variance (phenotype 1) = 0.10, ρ = 0.33 | −78.614 | 12.531 | −6.273 |
| QTL variance (phenotype 1) = 0.10, ρ = 0.67 | −165.614 | 12.531 | −13.216 |
| QTL variance (phenotype 2) = 0.05, ρ = 0.33 | −6.512 | 12.531 | −0.52 |
| QTL variance (phenotype 2) = 0.05, ρ = 0.67 | 39.915 | 12.531 | 3.185 |
In this table, we report linear regression analysis coefficients for the three most significant factors from Table 4. Also, we include all two-way interaction terms. Similar to Table 3, we have the following factor descriptions: 'QTL variance (phenotype 1) = 0.10' means, if the setting of the first phenotype's QTL variance is 0.10, use the coefficient −277.541 (second column) when computing the fitted value. Otherwise, use 0. We compute coefficients for the other main factors in the same manner. Computation for the interaction factors is described in the legend for Table 3. All values are computed using methods (specifically, the lm and summary commands) developed for the R statistical software package [106].
From Table 5, we compute the fitted function as:
| (4) |
where:
Studying Equation (4), we note that changes in main factors result in changes of at least 174 individuals. For example, increasing from 0.05 to 0.10 reduces the MSSN by 278 individuals in Equation (3). Similarly, increasing the correlation ρ from 0 to 0.33 increases the MSSN by 151. For the interaction terms, the largest change is −166, occurring when is 0.10 and ρ is 0.67. The smallest change in MSSN occurs when is 0.05 and ρ is 0.33.
In Figure 2, we plot the fitted values (using Equation (4)) versus the analytic MSSN (n = nA + nU) determined using the Pillai NCP (Appendix). As with Figure 1, the coefficients of the trend line, computed in Excel, are consistent with the finding that the analytic MSSN are accurately represented by a linear combination of all terms in Equation (4) (trend line intercept is 0.0004, slope is 1.0). In contrast to the Genotype test results, for the Pillai test, we require only three of the six factors to approximate the analytic MSSN (Table 6 and Figure 3). Also, MSSN requirements are decreased most substantially by increasing the two QTL variances and decreasing the correlation ρ.
Figure 2.
Table 6.
Percentiles for MSSN ratios with different test statistics.
| Percentile | LTT/Genotype | Pillai/Genotype(10%) | Pillai/Genotype(25%) | Pillai/LTT(10%) | Pillai/LTT(25%) |
|---|---|---|---|---|---|
| Minimum | 0.95 | 1.59 | 0.94 | 1.20 | 0.74 |
| Median | 1.35 | 3.41 | 1.95 | 2.62 | 1.65 |
| Mean | 1.26 | 3.37 | 1.98 | 2.66 | 1.64 |
| Maximum | 1.64 | 5.28 | 3.14 | 4.18 | 2.45 |
In this table, we use the abbreviations "LTT(x%)" and "Genotype(x%)" to mean the MSSNs for the LTT and Genotype tests, respectively, when the Percent-affected/unaffected settings are x (x = 10% or 25%). Also, each column's pair of tests corresponds to the same numbered column in Figure 3. For example, the first pair of tests is LTT and the Genotype test. The same pair is considered in the first column of Figure 3.
Figure 3.
What method produces the smallest MSSN requirements?
So far, we have answered the questions of what factors most substantially alter MSSN requirements and by how much for the Genotype test, Pillai test, and LTT test (Appendix) for the factor settings in Table 1. An equally important question is, what statistic produces the smallest analytic MSSN requirements for any vector of factor settings in Table 1? To answer this question, we compute the five sets of differences:
(i) LTT(pd, QTL variance of y1, τ1, …, ρ, Percent-Aff) − Genotype(pd, QTL variance of y1, τ1, …, ρ, Percent-Aff);
(ii) Genotype(pd, QTL variance of y1, τ1, …, ρ, Percent-Aff = 10) − Pillai(pd, QTL variance of y1, τ1, …, ρ, Percent-Aff = 10);
(iii) Genotype(pd, QTL variance of y1, τ1, …, ρ, Percent-Aff = 25) − Pillai(pd, QTL variance of y1, τ1, …, ρ, Percent-Aff = 25);
(iv) LTT(pd, QTL variance of y1, τ1, …, ρ, Percent-Aff = 10) − Pillai(pd, QTL variance of y1, τ1, …, ρ, Percent-Aff = 10);
(v) LTT(pd, QTL variance of y1, τ1, …, ρ, Percent-Aff = 25) − Pillai(pd, QTL variance of y1, τ1, …, ρ, Percent-Aff = 25).
Each of the differences in MSSN is computed as a function of the parameter settings. For example, if pd = 0.33, QTL variance of y1 = 0.10, τ1 = 0.0, QTL variance of y2 = 0.05, τ2 = 0.5, ρ = 0.0, Percent-Aff = 25, then difference (i) is:
Analytic MSSN for LTT for vector (0.33, 0.10, 0.0, 0.05, 0.5, 0.0, 25) − Analytic MSSN for Genotype test for vector (0.33, 0.10, 0.0, 0.05, 0.5, 0.0, 25).
The differences (ii) – (v) are computed with a fixed value for the last parameter (Percent − Aff). The reason is that the Pillai test is a function of only six parameters in Table 1; as noted previously, it is not a function of the Percent-Affected parameter. For each of the differences (i) – (v), we present the empirical distributions of the results in the form of box plots. These box plots may be found in Figure 3.
Note that the difference (i) is computed over 432 vectors, while differences (ii) – (v) are computed over 216 vectors.
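The counts of 432 and 216 vectors follow from a full factorial design over the factor settings. The sketch below illustrates this. The level sets are assumptions pieced together from values quoted in this text (Table 1 remains the authoritative source), and the variable names are hypothetical.

```python
from itertools import product

# Assumed factor levels, inferred from settings quoted in the text.
# Table 1 of the paper is the authoritative list.
LEVELS = {
    "pd": (0.05, 0.33),           # disease allele frequency
    "qtl_var1": (0.05, 0.10),     # QTL variance, phenotype 1
    "tau1": (-0.50, 0.0, 0.50),   # dominance-additive ratio, phenotype 1
    "qtl_var2": (0.05, 0.10),     # QTL variance, phenotype 2
    "tau2": (-0.50, 0.0, 0.50),   # dominance-additive ratio, phenotype 2
    "rho": (0.0, 0.33, 0.67),     # correlation between the two phenotypes
    "percent_aff": (10, 25),      # Percent-Affected/Unaffected setting
}

# All seven factors: used for difference (i), LTT versus Genotype.
full_design = list(product(*LEVELS.values()))

# Six factors only: the Pillai test does not depend on Percent-Affected,
# so differences (ii) to (v) fix that setting at 10 or 25.
pillai_design = list(
    product(*(levels for name, levels in LEVELS.items() if name != "percent_aff"))
)

if __name__ == "__main__":
    print(len(full_design), len(pillai_design))
```

Under these assumed levels the seven-factor design has 2 × 2 × 3 × 2 × 3 × 3 × 2 = 432 vectors, and dropping the two-level Percent-Affected factor halves that to 216.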
Some of the key findings resulting from a study of this figure are that the Genotype test usually has the smallest sample size (previously mentioned), and that the Genotype test and the LTT almost always require smaller analytic MSSN than the Pillai test. In fact, viewing the four right-most box plots, the greatest difference between the Pillai test and any of the other test statistics, where Pillai requires a smaller sample size, is for the (LTT(25%) - Pillai test) box plot (the right-most one in Figure 3). The difference is 124 (Outlier for LTT(25%) - Pillai, Figure 3). In results not shown, this difference occurs for the vector of settings pd = 0.33, , τ1 = −0.50, , τ2 = 0.50, ρ = 0.67, Percent-Aff = 25. For this vector, the LTT analytic MSSN is 477, and the Pillai test analytic MSSN is 353.
In Table 6, we present the differences in Figure 3 as ratios. Lehmann and Romano [109], among others, define these ratios as asymptotic relative efficiencies. We report the minimum, median, mean, and maximum ratios for all pairs of test statistics. In this way, we can compare results across columns. The smallest median and mean values, 1.35 and 1.26 respectively, are for the LTT/Genotype MSSN ratio. This result suggests that the MSSNs for these two test statistics are most similar. The largest median and mean values, 3.41 and 3.37 respectively, are for the Pillai/Genotype(10%) MSSN ratio. This result is consistent with the fact that the Genotype(10%) - Pillai MSSN box plot has the lowest Range of Differences (vertical axis) in Figure 3.
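As a small worked example of how a difference in Figure 3 relates to a ratio in Table 6, the sketch below uses the two analytic MSSN values quoted above for the outlier vector (477 for the LTT at 25%, 353 for the Pillai test). The function names are hypothetical.

```python
def mssn_difference(mssn_a, mssn_b):
    """Difference in MSSN between two tests (as plotted in the box plots)."""
    return mssn_a - mssn_b


def mssn_ratio(mssn_numerator, mssn_denominator):
    """Ratio of MSSNs for two tests (as tabulated in Table 6)."""
    return mssn_numerator / mssn_denominator


if __name__ == "__main__":
    ltt_25 = 477   # analytic MSSN for the LTT, Percent-Aff = 25 (from the text)
    pillai = 353   # analytic MSSN for the Pillai test (from the text)
    print(mssn_difference(ltt_25, pillai))
    print(round(mssn_ratio(pillai, ltt_25), 2))
```

For this vector the difference is 124 and the Pillai/LTT(25%) ratio rounds to 0.74, which agrees with the minimum reported in the last column of Table 6.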
For all ratios below the median ratio of 1.35 for the LTT/Genotype MSSN ratio, every vector has the DAF setting pd = 0.05. This result suggests that LTT and Genotype test MSSNs are most similar for smaller disease allele frequencies.
Finally, we note that we have developed software to perform these calculations. This software will be made available online in the near future. Researchers who want stand-alone copies of the software may contact the first author.
Discussion
In this work, we present a method for computing asymptotic power and MSSN for the Genotype test of genetic association with pleiotropic traits. In our design, affection status is defined through thresholds. We also include power and MSSN computations for MANOVA by applying Pillai’s statistic.
The first observation we make is that we can specify a multivariate function to compute probabilities for pleiotropic phenotypes (Formulas (A1) and (A2) in the Appendix). Also, we derive categorical data from the QTVs and apply the Genotype test and LTT to the categorical data (Equation A4 in the Appendix). Furthermore, we compute analytic power and MSSN formulas for the Genotype test and LTT (Formulas (A8.1) and (A9.1)), as well as analytic power and MSSN formulas for the Pillai MANOVA test applied to all QTVs.
Our ANOVA results for the factorial designs indicate that, for the Genotype test, the factors that most substantially alter MSSN are the correlation between the two QTs (ρ) and the Percent-Affected/Unaffected settings. From the results in Table 3 and Equation (3), we see that MSSN decreases with a decrease in the correlation and a change of the Percent-Affected/Unaffected setting from 25% to 10%. Changes in these two factors reduce MSSN for the LTT as well (results not shown). We comment that we use the ANOVA to provide a numerical approximation (with linear and two-way interaction terms) to the analytic formulas for MSSN. The factors we consider in the approximation are those with the largest F-statistic values.
For the Pillai test, analytic MSSN is described accurately by settings in three factors and their interactions: the two QTL variances and ρ (Table 5 and Equation (4)). Increases in QTL variances reduce the MSSN, while a decrease in the correlation ρ produces a decrease in the MSSN.
When comparing all of the MSSNs for all tests, we see that the Genotype test usually requires the smallest MSSN to achieve 80% power at the 5E-08 significance level for the vector of settings in Table 1. We draw this conclusion by studying the box plots of MSSN differences for all pairs of test statistics. The only test statistic that has a smaller MSSN than the Genotype test for any significant portion of vector-settings is the LTT. In fact, for 110/432 (25%) of the vectors, the LTT has as small or smaller MSSN than the Genotype test. However, the maximum difference is 14 individuals, and the relative efficiency is never less than 95% (Table 6).
While this work focuses on sample size calculations, through use of the NCP, we can just as easily perform power calculations for a fixed sample size. The conclusions we draw about the three statistics are the same (e.g., Genotype test has largest power on average for different vectors of Factor settings, followed by LTT, etc) (data not shown).
What if the SNP we are studying is in linkage disequilibrium (LD) with the disease gene but not the gene itself [23]? In such circumstances, we use the method implemented by others (e.g., [87,88]) to perform power and MSSN calculations of threshold-selected quantitative trait loci that are in LD with a disease locus.
A final and very important issue to address is the fact that the Pillai test, applied to quantitative data for all individuals, has larger MSSN values than either the Genotype test or the LTT. Our explanation for this result is that our design focuses on MSSN calculations before any data are collected. Also, our focus is on gene mapping, not on tests of linearity. If one were conducting a population-based study, where phenotype and genotype values were collected on all individuals, and all three test statistics were applied to all individuals, then the Pillai statistic would typically have the smallest sample size requirement. Consider the following example of vector settings: pd = 0.05, , τ1 = 0.0, , τ2 = 0.50, ρ = 0.0, Percent-Affected-Phenotype 01 = (Top) 100%, Percent-Affected-Phenotype 02 = (Top) 50%, Percent-Unaffected-Phenotype 01 = (Lower) 100%, Percent-Unaffected-Phenotype 02 = (Lower) 50%. The parameter settings (with the exception of the Percent-Affected and Percent-Unaffected) are taken from Table 1.
Regarding the affection thresholds, imagine a square. If we draw a horizontal line through the square, cutting it in half, affected individuals are those subjects whose pair of QTVs are in the upper half of the square, and unaffected individuals are those subjects whose pair of QTVs are in the lower half. With these thresholds, we use all of the individuals for the Genotype test and LTT, as well as the Pillai test.
Applying our formulas, we compute that MSSNs are: 1471 for the Genotype test, 1387 for the LTT, and 326 for the Pillai test for 5E-08 significance level. The Pillai MSSN is much lower than for either of the categorical data-based tests.
Similarly, if we define affection by using a vertical line rather than a horizontal line, our MSSNs are: 836 for the Genotype test, 785 for the LTT, and 326 for the Pillai test (the Pillai statistic is not dependent upon threshold settings). That is, the Pillai MSSN is less than half of either of the categorical data-based tests.
Another practical issue regarding the lower values for ‘Percent-Affected’ (like 10%) is that, for small or moderate MSSN, one may not observe individuals with phenotypes in these regions. For small and moderate MSSNs, the thresholds may be theoretically desirable, but impractical. In such circumstances, one might have no choice but to increase the Percent-Affected threshold.
Supplementary Material
Acknowledgments
This study was supported by a grant from the National Institute of Mental Health (R01MH092293 to GAH) and the New Jersey Center for Tourette Syndrome and Associated Disorders (to GAH). The authors gratefully acknowledge the Associate Editor and two anonymous reviewers whose comments substantially improved the quality of our manuscript.
Appendix
Notation for quantitative trait model
y: (y1, y2, …, yp) = a set of p random quantitative-trait (QT) phenotype values; note that this means there are p phenotypes. From this point forward, we shall use the term phenotype to mean a continuous random variable, represented by the notation yi.
nA: Number of affected individuals.
nU: Number of unaffected individuals.
Note: We use the term affected throughout this work. We could also use the term case. We make the same statement for unaffected and control.
r: Ratio of nU to nA = nU/nA;
Indices
1 ≤ i ≤ p: Index for phenotype (see above);
0 ≤ k ≤ 2: Index for genotype at SNP locus; this value is the number of disease or increaser alleles in the SNP genotype.
Genetic model parameters
Quantitative Trait Locus (QTL) variance of the phenotype yi, 1 ≤ i ≤ p: the contribution to the variance of the population's ith quantitative trait from the QTL. Note that this quantity is the genetic component of the population phenotype variance (the population phenotype distribution is specified in this work as N(0,1)).
Error variance of the phenotype yi, 1 ≤ i ≤ p: using Fisher’s partitioning [104], the QTL variance and the error variance of yi sum to the total phenotype variance, which is 1 in this work. Note that the error variance is the common (phenotype-specific) variance for each of the normal components that make up the ith mixture distribution.
δi, 1 ≤ i ≤ p: Dominance of the disease allele for the phenotype yi;
In this work, we restrict δi to the range −1 ≤ δi ≤ 1, although in theory the dominance may range between −∞ and ∞ [105].
pd: Frequency of the disease (“increaser”) allele at the SNP locus of interest;
p+: Frequency of the wild-type (“null”) allele at the SNP locus of interest; note that pd + p+ = 1.
NOTE: The parameters pd and p+ should not be confused with the number of phenotypes p.
ai, 1 ≤ i ≤ p: Additive term for the phenotype yi;
τi, 1 ≤ i ≤ p: Dominance-additive ratio for the phenotype yi;
mi, 1 ≤ i ≤ p: Mean term for the phenotype yi;
ρij: Correlation between the variables yi and yj.
wk, 0 ≤ k ≤ 2: Weight of kth (coded) genotype in the LTT;
From Fisher’s work [104,105], we can compute the means µik from the dominance δi and the disease allele frequency pd. Fisher shows:
δi = τiai,
mi = (−1)[(pd)²ai + 2pdp+δi − (p+)²ai],
µi0 = mi − ai,
µi1 = mi + δi,
µi2 = mi + ai.
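The relations above can be checked numerically. The sketch below is a hedged illustration rather than the authors' code: it computes the three genotype means for one phenotype and confirms that the mixture mean is zero. It assumes Hardy-Weinberg genotype frequencies for the mixing proportions, which is implied by the form of mi but is stated here as an assumption, and all function names are hypothetical.

```python
def hwe_genotype_frequencies(p_d):
    """Genotype frequencies for 0, 1, 2 disease alleles (Hardy-Weinberg assumed)."""
    p_plus = 1.0 - p_d
    return (p_plus ** 2, 2.0 * p_d * p_plus, p_d ** 2)


def genotype_means(a, tau, p_d):
    """Means (mu_0, mu_1, mu_2) of one phenotype for 0, 1, 2 disease alleles.

    a   : additive term for the phenotype
    tau : dominance-additive ratio, so that delta = tau * a
    p_d : disease allele frequency
    """
    p_plus = 1.0 - p_d
    delta = tau * a
    # Mean term chosen so that the population mean of the phenotype is zero.
    m = -((p_d ** 2) * a + 2.0 * p_d * p_plus * delta - (p_plus ** 2) * a)
    return (m - a, m + delta, m + a)


if __name__ == "__main__":
    # Illustrative values only; they are not settings taken from Table 1.
    a, tau, p_d = 0.4, 0.5, 0.33
    freqs = hwe_genotype_frequencies(p_d)
    means = genotype_means(a, tau, p_d)
    population_mean = sum(f * mu for f, mu in zip(freqs, means))
    print(means, population_mean)
```

The mean term mi is exactly the shift that centers the mixture, so the frequency-weighted average of the three genotype means is zero up to floating-point error.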
πk, 0 ≤ k ≤ 2: Mixing proportion for the kth component distribution, determined by the genotype frequencies at the trait locus; because we are studying pleiotropy, the mixing proportions are independent of the phenotype index i. Note that the kth component distribution for the phenotype yi is a univariate normal distribution with mean µik and variance equal to the error variance of yi.
Furthermore, as documented by Lynch and Walsh [105] (among others), the genetic variance may be decomposed into the sum of an additive variance component ($\sigma^2_A$) and a dominance variance component ($\sigma^2_D$). As Lynch and Walsh report:
$\sigma^2_A = 2\,p_d\,p_+\,\alpha^2$, where α = [ai + δi(p+ − pd)];
$\sigma^2_D = (2\,p_d\,p_+\,\delta_i)^2$.
From these equations, it is straightforward to see that the genetic variance for the ith phenotype is a function of the ai, the additive term for the phenotype yi, the disease allele frequency pd, and the dominance δi.
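One practical use of this dependence is to recover the additive term ai that yields a chosen QTL variance. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It uses the standard single-locus decomposition (additive variance 2·pd·p+·α² and dominance variance (2·pd·p+·δi)²) and assumes Hardy-Weinberg genotype frequencies. All function names are hypothetical.

```python
import math


def variance_components(a, tau, p_d):
    """Return (additive variance, dominance variance) for one phenotype."""
    p_plus = 1.0 - p_d
    delta = tau * a
    alpha = a + delta * (p_plus - p_d)
    var_additive = 2.0 * p_d * p_plus * alpha ** 2
    var_dominance = (2.0 * p_d * p_plus * delta) ** 2
    return var_additive, var_dominance


def genetic_variance_direct(a, tau, p_d):
    """Variance of the genotype values (-a, tau*a, a) under assumed HWE frequencies."""
    p_plus = 1.0 - p_d
    freqs = (p_plus ** 2, 2.0 * p_d * p_plus, p_d ** 2)
    values = (-a, tau * a, a)
    mean = sum(f * v for f, v in zip(freqs, values))
    return sum(f * (v - mean) ** 2 for f, v in zip(freqs, values))


def additive_term_for_qtl_variance(qtl_var, tau, p_d):
    """Solve for a > 0 such that additive + dominance variance equals qtl_var.

    Both components scale with a**2, so evaluate them at a = 1 and rescale.
    """
    unit_additive, unit_dominance = variance_components(1.0, tau, p_d)
    unit_total = unit_additive + unit_dominance
    if unit_total <= 0.0:
        raise ValueError("genetic variance is zero for these settings")
    return math.sqrt(qtl_var / unit_total)


if __name__ == "__main__":
    # Illustrative settings only.
    a = additive_term_for_qtl_variance(0.10, 0.5, 0.33)
    print(a, sum(variance_components(a, 0.5, 0.33)))
```

The direct variance of the three genotype values agrees with the sum of the two components, which serves as a quick numerical check on the decomposition.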
References
- 1. Stearns FW. One hundred years of pleiotropy: a retrospective. Genetics. 2010;186:767–773. doi: 10.1534/genetics.110.122549.
- 2. Didion JP, de Villena FPM. Deconstructing Mus gemischus: advances in understanding ancestry, structure, and variation in the genome of the laboratory mouse. Mamm Genome. 2013;24:1–20. doi: 10.1007/s00335-012-9441-z.
- 3. Khalili H, Gong J, Brenner H, Austin TR, Hutter CM, et al. Identification of a common variant with potential pleiotropic effect on risk of inflammatory bowel disease and colorectal cancer. Carcinogenesis. 2015 doi: 10.1093/carcin/bgv086.
- 4. Cheng I, Kocarnik JM, Dumitrescu L, Lindor NM, Chang-Claude J, et al. Pleiotropic effects of genetic risk variants for other cancers on colorectal cancer risk: PAGE, GECCO and CCFR consortia. Gut. 2014;63:800–807. doi: 10.1136/gutjnl-2013-305189.
- 5. Trbojevic Akmacic I, Ventham NT, Theodoratou E, Vuckovic F, Kennedy NA, et al. Inflammatory bowel disease associates with proinflammatory potential of the immunoglobulin G glycome. Inflamm Bowel Dis. 2015;21:1237–1247. doi: 10.1097/MIB.0000000000000372.
- 6. Andreassen OA, Desikan RS, Wang Y, Thompson WK, Schork AJ, et al. Abundant genetic overlap between blood lipids and immune-mediated diseases indicates shared molecular genetic mechanisms. PLoS One. 2015;10:e0123057. doi: 10.1371/journal.pone.0123057.
- 7. Chang D, Gao F, Slavney A, Ma L, Waldman YY, et al. Accounting for eXentricities: analysis of the X chromosome in GWAS reveals X-linked genes implicated in autoimmune diseases. PLoS One. 2014;9:e113684. doi: 10.1371/journal.pone.0113684.
- 8. Li C, Yang C, Gelernter J, Zhao H. Improving genetic risk prediction by leveraging pleiotropy. Hum Genet. 2014;133:639–650. doi: 10.1007/s00439-013-1401-5.
- 9. Lauc G, Huffman JE, Pucic M, Zgaga L, Adamczyk B, et al. Loci associated with N-glycosylation of human immunoglobulin G show pleiotropy with autoimmune diseases and haematological cancers. PLoS Genet. 2013;9:e1003225. doi: 10.1371/journal.pgen.1003225.
- 10. Ramos PS, Criswell LA, Moser KL, Comeau ME, Williams AH, et al. A comprehensive analysis of shared loci between systemic lupus erythematosus (SLE) and sixteen autoimmune diseases reveals limited genetic overlap. PLoS Genet. 2011;7:e1002406. doi: 10.1371/journal.pgen.1002406.
- 11. Proitsi P, Lupton MK, Velayudhan L, Newhouse S, Fogh I, et al. Genetic predisposition to increased blood cholesterol and triglyceride lipid levels and risk of Alzheimer disease: a Mendelian randomization analysis. PLoS Med. 2014;11:e1001713. doi: 10.1371/journal.pmed.1001713.
- 12. Proitsi P, Lupton MK, Velayudhan L, Hunter G, Newhouse S, et al. Alleles that increase risk for type 2 diabetes mellitus are not associated with increased risk for Alzheimer's disease. Neurobiol Aging. 2014;35:2883 e2883–2883 e2810. doi: 10.1016/j.neurobiolaging.2014.07.023.
- 13. Evans S, Dowell NG, Tabet N, Tofts PS, King SL, Rusted JM. Cognitive and neural signatures of the APOE E4 allele in mid-aged adults. Neurobiol Aging. 2014;35:1615–1623. doi: 10.1016/j.neurobiolaging.2014.01.145.
- 14. Adeosun SO, Hou X, Zheng B, Stockmeier C, Ou X, Paul I, Mosley T, Weisgraber K, Wang JM. Cognitive deficits and disruption of neurogenesis in a mouse model of apolipoprotein E4 domain interaction. J Biol Chem. 2014;289:2946–2959. doi: 10.1074/jbc.M113.497909.
- 15. Douet V, Chang L, Cloak C, Ernst T. Genetic influences on brain developmental trajectories on neuroimaging studies: from infancy to young adulthood. Brain Imaging Behav. 2014;8:234–250. doi: 10.1007/s11682-013-9260-1.
- 16. van Blitterswijk M, Baker MC, DeJesus-Hernandez M, Ghidoni R, Benussi L, et al. C9ORF72 repeat expansions in cases with previously identified pathogenic mutations. Neurology. 2013;81:1332–1341. doi: 10.1212/WNL.0b013e3182a8250c.
- 17. Bufill E, Blesa R, Augusti J. Alzheimer's disease: an evolutionary approach. J Anthropol Sci. 2013;91:135–157. doi: 10.4436/jass.91001.
- 18. Jin SC, Pastor P, Cooper B, Cervantes S, Benitez BA, Razquin C, Goate A, Cruchaga C. Pooled-DNA sequencing identifies novel causative variants in PSEN1, GRN and MAPT in a clinical early-onset and familial Alzheimer's disease Ibero-American cohort. Alzheimers Res Ther. 2012;4:34. doi: 10.1186/alzrt137.
- 19. Albin RL. Antagonistic pleiotropy, mutation accumulation, and human genetic disease. Genetica. 1993;91:279–286. doi: 10.1007/BF01436004.
- 20. Sun QB, Zhang KZ, Cheng TO, Li SL, Lu BX, Zhang ZB, Wang W. Marfan syndrome in China: a collective review of 564 cases among 98 families. Am Heart J. 1990;120:934–948. doi: 10.1016/0002-8703(90)90213-h.
- 21. Pyeritz RE. Pleiotropy revisited: molecular explanations of a classic concept. Am J Med Genet. 1989;34:124–134. doi: 10.1002/ajmg.1320340120.
- 22. Baumgartner C, Matyas G, Steinmann B, Eberle M, Stein JI, Baumgartner D. A bioinformatics framework for genotype-phenotype correlation in humans with Marfan syndrome caused by FBN1 gene mutations. J Biomed Inform. 2006;39:171–183. doi: 10.1016/j.jbi.2005.06.001.
- 23. Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: challenges and strategies. Nat Rev Genet. 2013;14:483–495. doi: 10.1038/nrg3461.
- 24. Mitra SK. On the limiting power function of the frequency chi-square test. Ann Math Stat. 1958;29:1221–1233.
- 25. Slager SL, Schaid DJ. Case-Control Studies of Genetic Markers: Power and Sample Size Approximations for Armitage’s Test for Trend. Hum Hered. 2001;52:149–153. doi: 10.1159/000053370.
- 26. Chapman DG, Nam JM. Asymptotic Power of Chi Square Tests for Linear Trends in Proportions. Biometrics. 1968;24:315–327.
- 27. Freidlin B, Zheng G, Li Z, Gastwirth JL. Trend tests for case-control studies of genetic markers: power, sample size and robustness. Hum Hered. 2002;53:146–152. doi: 10.1159/000064976.
- 28. Menashe I, Rosenberg PS, Chen BE. PGA: power calculator for case-control genetic association analyses. BMC Genet. 2008;9:36. doi: 10.1186/1471-2156-9-36.
- 29. Barrenas F, Chavali S, Alves AC, Coin L, Jarvelin MR, et al. Highly interconnected genes in disease-specific networks are enriched for disease-associated polymorphisms. Genome Biology. 2012;13:R46. doi: 10.1186/gb-2012-13-6-r46.
- 30. Chung D, Yang C, Li C, Gelernter J, Zhao H. GPA: a statistical approach to prioritizing GWAS results by integrating pleiotropy and annotation. PLoS Genet. 2014;10:e1004787. doi: 10.1371/journal.pgen.1004787.
- 31. Darabos C, Harmon SH, Moore JH. Using the bipartite human phenotype network to reveal pleiotropy and epistasis beyond the gene. Pac Symp Biocomput. 2014:188–199.
- 32. Darabos C, Moore JH. Genome-wide epistasis and pleiotropy characterized by the bipartite human phenotype network. Methods Mol Biol. 2015;1253:269–283. doi: 10.1007/978-1-4939-2155-3_14.
- 33. Hartley SW, Sebastiani P. PleioGRiP: genetic risk prediction with pleiotropy. Bioinformatics. 2013;29:1086–1088. doi: 10.1093/bioinformatics/btt081.
- 34. He Q, Avery CL, Lin DY. A general framework for association tests with multivariate traits in large-scale genomics studies. Genet Epidemiol. 2013;37:759–767. doi: 10.1002/gepi.21759.
- 35. Huang J, Johnson AD, O'Donnell CJ. PRIMe: a method for characterization and evaluation of pleiotropic regions from multiple genome-wide association studies. Bioinformatics. 2011;27:1201–1206. doi: 10.1093/bioinformatics/btr116.
- 36. Lee SH, Yang J, Goddard ME, Visscher PM, Wray NR. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics. 2012;28:2540–2542. doi: 10.1093/bioinformatics/bts474.
- 37. Li Q, Hu J, Ding J, Zheng G. Fisher's method of combining dependent statistics using generalizations of the gamma distribution with applications to genetic pleiotropic associations. Biostatistics. 2014;15:284–295. doi: 10.1093/biostatistics/kxt045.
- 38. Liley J, Wallace C. A pleiotropy-informed Bayesian false discovery rate adapted to a shared control design finds new disease associations from GWAS summary statistics. PLoS Genet. 2015;11:e1004926. doi: 10.1371/journal.pgen.1004926.
- 39. Matise TC, Ambite JL, Buyske S, Carlson CS, Cole SA, et al. The Next PAGE in understanding complex traits: design for the analysis of Population Architecture Using Genetics and Epidemiology (PAGE) Study. Am J Epidemiol. 2011;174:849–859. doi: 10.1093/aje/kwr160.
- 40. Park SH, Lee JY, Kim S. A methodology for multivariate phenotype-based genome-wide association studies to mine pleiotropic genes. Bmc Systems Biology. 2011;5(Suppl 2):S13. doi: 10.1186/1752-0509-5-S2-S13.
- 41. Seoane JA, Campbell C, Day IN, Casas JP, Gaunt TR. Canonical correlation analysis for gene-based pleiotropy discovery. PLoS Comput Biol. 2014;10:e1003876. doi: 10.1371/journal.pcbi.1003876.
- 42. Sivakumaran S, Agakov F, Theodoratou E, Prendergast JG, Zgaga L, et al. Abundant pleiotropy in human complex diseases and traits. Am J Hum Genet. 2011;89:607–618. doi: 10.1016/j.ajhg.2011.10.004.
- 43. Wu B, Pankow JS. Statistical Methods for Association Tests of Multiple Continuous Traits in Genome-Wide Association Studies. Ann Hum Genet. 2015;79:282–293. doi: 10.1111/ahg.12110.
- 44. Yan T, Li Q, Li Y, Li Z, Zheng G. Genetic association with multiple traits in the presence of population stratification. Genet Epidemiol. 2013;37:571–580. doi: 10.1002/gepi.21738.
- 45. Zhang Q, Feitosa M, Borecki IB. Estimating and testing pleiotropy of single genetic variant for two quantitative traits. Genet Epidemiol. 2014;38:523–530. doi: 10.1002/gepi.21837.
- 46. Pendergrass SA, Verma A, Okula A, Hall MA, Crawford DC, Ritchie MD. Phenome-Wide Association Studies: Embracing Complexity for Discovery. Hum Hered. 2015;79:111–123. doi: 10.1159/000381851.
- 47. Schifano ED, Li L, Christiani DC, Lin X. Genome-wide association analysis for multiple continuous secondary phenotypes. Am J Hum Genet. 2013;92:744–759. doi: 10.1016/j.ajhg.2013.04.004.
- 48. Peterson CB, Bogomolov M, Benjamini Y, Sabatti C. Many Phenotypes Without Many False Discoveries: Error Controlling Strategies for Multitrait Association Studies. Genet Epidemiol. 2016;40:45–56. doi: 10.1002/gepi.21942.
- 49. Ray D, Pankow JS, Basu S. USAT: A Unified Score-Based Association Test for Multiple Phenotype-Genotype Analysis. Genet Epidemiol. 2016;40:20–34. doi: 10.1002/gepi.21937.
- 50. Vsevolozhskaya OA, Zaykin DV, Barondess DA, Tong X, Jadhav S, Lu Q. Uncovering Local Trends in Genetic Effects of Multiple Phenotypes via Functional Linear Models. Genet Epidemiol. 2016;40:210–221. doi: 10.1002/gepi.21955.
- 51. Majumdar A, Haldar T, Witte JS. Determining Which Phenotypes Underlie a Pleiotropic Signal. Genet Epidemiol. 2016;40:366–381. doi: 10.1002/gepi.21973.
- 52. Baurecht H, Hotze M, Rodriguez E, Manz J, Weidinger S, Cordell HJ, Augustin T, Strauch K. Compare and Contrast Meta Analysis (CCMA): A Method for Identification of Pleiotropic Loci in Genome-Wide Association Studies. PLoS One. 2016;11:e0154872. doi: 10.1371/journal.pone.0154872.
- 53. Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent Estimation in Mendelian Randomization with Some Invalid Instruments Using a Weighted Median Estimator. Genet Epidemiol. 2016;40:304–314. doi: 10.1002/gepi.21965.
- 54. Denny JC, Bastarache L, Roden DM. Phenome-Wide Association Studies as a Tool to Advance Precision Medicine. Annu Rev Genomics Hum Genet. 2016;17:353–373. doi: 10.1146/annurev-genom-090314-024956.
- 55. Hall MA, Moore JH, Ritchie MD. Embracing Complex Associations in Common Traits: Critical Considerations for Precision Medicine. Trends Genet. 2016;32:470–484. doi: 10.1016/j.tig.2016.06.001.
- 56. Liang X, Wang Z, Sha Q, Zhang S. An Adaptive Fisher's Combination Method for Joint Analysis of Multiple Phenotypes in Association Studies. Sci Rep. 2016;6:34323. doi: 10.1038/srep34323.
- 57. Park H, Li X, Song YE, He KY, Zhu X. Multivariate Analysis of Anthropometric Traits Using Summary Statistics of Genome-Wide Association Studies from GIANT Consortium. PLoS One. 2016;11:e0163912. doi: 10.1371/journal.pone.0163912.
- 58. Verma A, Leader JB, Verma SS, Frase A, Wallace J, et al. INTEGRATING CLINICAL LABORATORY MEASURES AND ICD-9 CODE DIAGNOSES IN PHENOME-WIDE ASSOCIATION STUDIES. Pac Symp Biocomput. 2016;21:168–179.
- 59. Wang X, Byars SG, Stearns SC. Genetic links between post-reproductive lifespan and family size in Framingham. Evol Med Public Health. 2013;2013:241–253. doi: 10.1093/emph/eot013.
- 60. Knowles EE, McKay DR, Kent JW, Jr, Sprooten E, Carless MA, et al. Pleiotropic locus for emotion recognition and amygdala volume identified using univariate and bivariate linkage. Am J Psychiatry. 2015;172:190–199. doi: 10.1176/appi.ajp.2014.14030311.
- 61.Schifano ED, Li L, Christiani DC, Lin X. Genome-wide Association Analysis for Multiple Continuous Secondary Phenotypes. Am J Hum Genet. 2013;92:744–759. doi: 10.1016/j.ajhg.2013.04.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Curran JE, McKay DR, Winkler AM, Olvera RL, Carless MA, et al. Identification of pleiotropic genetic effects on obesity and brain anatomy. Hum Hered. 2013;75:136–143. doi: 10.1159/000353953. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Hokanson JE, Langefeld CD, Mitchell BD, Lange LA, Goff DC, Jr, Haffner SM, Saad MF, Rotter JI. Pleiotropy and heterogeneity in the expression of atherogenic lipoproteins: the IRAS Family Study. Hum Hered. 2003;55:46–50. doi: 10.1159/000071809. [DOI] [PubMed] [Google Scholar]
- 64.Miscimarra L, Stein C, Millard C, Kluge A, Cartier K, et al. Further evidence of pleiotropy influencing speech and language: analysis of the DYX8 region. Hum Hered. 2007;63:47–58. doi: 10.1159/000098727. [DOI] [PubMed] [Google Scholar]
- 65.Morton NE, Lalouel JM. Resolution of linkage for irregular phenotype systems. Hum Hered. 1981;31:3–7. doi: 10.1159/000153168. [DOI] [PubMed] [Google Scholar]
- 66.Njajou OT, Alizadeh BZ, Aulchenko Y, Zillikens MC, Pols HA, Oostra BA, Swinkels DW, van Duijn CM. Heritability of serum iron, ferritin and transferrin saturation in a genetically isolated population, the Erasmus Rucphen Family (ERF) Study. Hum Hered. 2006;61:222–228. doi: 10.1159/000094777. [DOI] [PubMed] [Google Scholar]
- 67.Li Z, Mottonen J, Sillanpaa MJ. A robust multiple-locus method for quantitative trait locus analysis of non normally distributed multiple traits. Heredity (Edinb) 2015 doi: 10.1038/hdy.2015.61. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Lee D, Williamson VS, Bigdeli TB, Riley BP, Fanous AH, Vladimirov VI, Bacanu SA. JEPEG: a summary statistics based tool for gene-level joint testing of functional variants. Bioinformatics. 2015;31:1176–1182. doi: 10.1093/bioinformatics/btu816.
- 69.Yuan Z, Zhang X, Li F, Zhao J, Xue F. Comparing partial least square approaches in a gene- or region-based association study for multiple quantitative phenotypes. Hum Biol. 2014;86:51–58. doi: 10.3378/027.086.0106.
- 70.Fu G, Saunders G, Stevens J. Holm multiple correction for large-scale gene-shape association mapping. BMC Genet. 2014;15(Suppl 1):S5. doi: 10.1186/1471-2156-15-S1-S5.
- 71.Yoo YJ, Sun L, Bull SB. Gene-based multiple regression association testing for combined examination of common and low frequency variants in quantitative trait analysis. Front Genet. 2013;4:233. doi: 10.3389/fgene.2013.00233.
- 72.Ma L, Clark AG, Keinan A. Gene-based testing of interactions in association studies of quantitative traits. PLoS Genet. 2013;9:e1003321. doi: 10.1371/journal.pgen.1003321.
- 73.Fan R, Lo SH. A robust model-free approach for rare variants association studies incorporating gene-gene and gene-environmental interactions. PLoS One. 2013;8:e83057. doi: 10.1371/journal.pone.0083057.
- 74.Clarke GM, Rivas MA, Morris AP. A flexible approach for the analysis of rare variants allowing for a mixture of effects on binary or quantitative traits. PLoS Genet. 2013;9:e1003694. doi: 10.1371/journal.pgen.1003694.
- 75.Zhang F, Guo X, Wu S, Han J, Liu Y, Shen H, Deng HW. Genome-wide pathway association studies of multiple correlated quantitative phenotypes using principle component analyses. PLoS One. 2012;7:e53320. doi: 10.1371/journal.pone.0053320.
- 76.Korte A, Vilhjalmsson BJ, Segura V, Platt A, Long Q, Nordborg M. A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nat Genet. 2012;44:1066–1071. doi: 10.1038/ng.2376.
- 77.Li M, Ye C, Fu W, Elston RC, Lu Q. Detecting genetic interactions for quantitative traits with U-statistics. Genet Epidemiol. 2011;35:457–468. doi: 10.1002/gepi.20594.
- 78.Yang F, Tang Z, Deng H. Bivariate association analysis for quantitative traits using generalized estimation equation. J Genet Genomics. 2009;36:733–743. doi: 10.1016/S1673-8527(08)60166-6.
- 79.Kent JW Jr. Analysis of multiple phenotypes. Genet Epidemiol. 2009;33(Suppl 1):S33–39. doi: 10.1002/gepi.20470.
- 80.Hu Y, Jason S, Wang Q, Pan Y, Zhang X, Zhao H, Li C, Sun L. Regression-based approach for testing the association between multi-region haplotype configuration and complex trait. BMC Genet. 2009;10:56. doi: 10.1186/1471-2156-10-56.
- 81.Fang M, Liu S, Jiang D. Bayesian composite model space approach for mapping quantitative trait Loci in variance component model. Behav Genet. 2009;39:337–346. doi: 10.1007/s10519-009-9259-y.
- 82.Wei Z, Li M, Rebbeck T, Li H. U-statistics-based tests for multiple genes in genetic association studies. Ann Hum Genet. 2008;72:821–833. doi: 10.1111/j.1469-1809.2008.00473.x.
- 83.Servin B, Stephens M. Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet. 2007;3:e114. doi: 10.1371/journal.pgen.0030114.
- 84.Fan R, Jung J, Jin L. High-resolution association mapping of quantitative trait loci: a population-based approach. Genetics. 2006;172:663–686. doi: 10.1534/genetics.105.046417.
- 85.Lange C, DeMeo DL, Laird NM. Power and design considerations for a general class of family-based association tests: quantitative traits. Am J Hum Genet. 2002;71:1330–1341. doi: 10.1086/344696.
- 86.Tyler AL, McGarr TC, Beyer BJ, Frankel WN, Carter GW. A genetic interaction network model of a complex neurological disease. Genes Brain Behav. 2014;13:831–840. doi: 10.1111/gbb.12178.
- 87.Purcell S, Cherny SS, Sham PC. Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics. 2003;19:149–150. doi: 10.1093/bioinformatics/19.1.149.
- 88.Gordon D, Haynes C, Blumenfeld J, Finch SJ. PAWE-3D: visualizing power for association with error in case-control genetic studies of complex traits. Bioinformatics. 2005;21:3935–3937. doi: 10.1093/bioinformatics/bti643.
- 89.The Marfan Foundation. 2016 URL http://www.marfan.org/dx/rules.
- 90.Loeys BL, Dietz HC, Braverman AC, Callewaert BL, De Backer J, et al. The revised Ghent nosology for the Marfan syndrome. J Med Genet. 2010;47:476–485. doi: 10.1136/jmg.2009.072785.
- 91.American Psychiatric Association. DSM-IV-TR; Diagnostic and Statistical Manual of Mental Disorders. Fourth ed. Washington, DC: 2000. Text Revision.
- 92.Boghosian-Sell L, Comings DE, Overhauser J. Tourette syndrome in a pedigree with a 7;18 translocation: identification of a YAC spanning the translocation breakpoint at 18q22.3. Am J Hum Genet. 1996;59:999–1005.
- 93.Diaz-Anzaldua A, Riviere JB, Dube MP, Joober R, Saint-Onge J, et al. Chromosome 11-q24 region in Tourette syndrome: association and linkage disequilibrium study in the French Canadian population. Am J Med Genet A. 2005;138A:225–228. doi: 10.1002/ajmg.a.30928.
- 94.Saidou AA, Thuillet AC, Couderc M, Mariac C, Vigouroux Y. Association studies including genotype by environment interactions: prospects and limits. BMC Genet. 2014;15:3. doi: 10.1186/1471-2156-15-3.
- 95.Xiao J, Wang X, Hu Z, Tang Z, Xu C. Multivariate segregation analysis for quantitative traits in line crosses. Heredity (Edinb) 2007;98:427–435. doi: 10.1038/sj.hdy.6800960.
- 96.Liu J, Liu Y, Liu X, Deng HW. Bayesian mapping of quantitative trait loci for multiple complex traits with the use of variance components. Am J Hum Genet. 2007;81:304–320. doi: 10.1086/519495.
- 97.Kraft P, de Andrade M. Group 6: Pleiotropy and multivariate analysis. Genet Epidemiol. 2003;25(Suppl 1):S50–56. doi: 10.1002/gepi.10284.
- 98.Bensen JT, Lange LA, Langefeld CD, Chang BL, Bleecker ER, Meyers DA, Xu J. Exploring pleiotropy using principal components. BMC Genet. 2003;4(Suppl 1):S53. doi: 10.1186/1471-2156-4-S1-S53.
- 99.Lebreton CM, Visscher PM, Haley CS, Semikhodskii A, Quarrie SA. A nonparametric bootstrap method for testing close linkage vs. pleiotropy of coincident quantitative trait loci. Genetics. 1998;150:931–943. doi: 10.1093/genetics/150.2.931.
- 100.Almasy L, Dyer TD, Blangero J. Bivariate quantitative trait linkage analysis: pleiotropy versus co-incident linkages. Genet Epidemiol. 1997;14:953–958. doi: 10.1002/(SICI)1098-2272(1997)14:6<953::AID-GEPI65>3.0.CO;2-K.
- 101.Jiang C, Zeng ZB. Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics. 1995;140:1111–1127. doi: 10.1093/genetics/140.3.1111.
- 102.Warne RT. A Primer on Multivariate Analysis of Variance (MANOVA) for Behavioral Scientists. Pract Assessment Res and Eval. 2014;19:1–10.
- 103.Olson CL. On choosing a test statistic in multivariate analysis of variance. Psychol Bull. 1976;83:579–586.
- 104.Fisher RA. The correlation between relatives on the supposition of Mendelian inheritance. Trans R Soc Edinb. 1918;52:399–433.
- 105.Lynch M, Walsh B. Genetics and Analysis of Quantitative Traits. Sunderland, MA: Sinauer Associates; 1998.
- 106.R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2012. URL http://www.R-project.org/
- 107.O'Brien RG, Shieh G. Pragmatic, Unifying Algorithm Gives Power Probabilities for Common F Tests of the Multivariate General Linear Hypothesis (Technical Report) 1999 URL http://www.bio.ri.ccf.org/UnifyPow.
- 108.Box GEP, Hunter GS, Hunter WG. Statistics for Experimenters: Design, Discovery, and Innovation. Second ed. Hoboken, New Jersey, USA: J. Wiley and Sons; 2005.
- 109.Lehmann EL, Romano JP. Testing Statistical Hypotheses. Third ed. New York: Springer; 2010.
- 110.Ott J. Analysis of Human Genetic Linkage. Third ed. Baltimore, MD: The Johns Hopkins University Press; 1999.
- 111.Chen HY, Li M. Improving power and robustness for detecting genetic association with extreme-value sampling design. Genet Epidemiol. 2011;35:823–830. doi: 10.1002/gepi.20631.
- 112.Balaam LN. Fundamentals of Biometry. London, England: George Allen & Unwin; 1972.
- 113.Finney DJ. An Experimental Study of Certain Screening Processes. J Roy Stat Soc B. 1966;28:88–109.
- 114.Curnow RN. Optimal Programmes for Varietal Selection. J Roy Stat Soc B. 1961;23:282–318.
- 115.Cochran WG. Improvement by Means of Selection. In: Proc Second Berkeley Symp on Math Stat and Prob. Berkeley, CA: University of California Press; 1951. pp. 449–470.
- 116.Genz AC, Malik AA. An adaptive algorithm for numeric integration over an N-dimensional rectangular region. J Comput Appl Math. 1980;6:295–302.
- 117.Berntsen J, Espelid TO, Genz A. An adaptive algorithm for the approximate calculation of multiple integrals. ACM Trans Math Soft. 1991;17:437–451.
- 118.Cochran WG. Some Methods for Strengthening the Common χ² Tests. Biometrics. 1954;10:417–451.
- 119.Armitage P. Tests for Linear Trends in Proportions and Frequencies. Biometrics. 1955;11:375–386.
- 120.Ahn K, Haynes C, Kim W, Fleur RS, Gordon D, Finch SJ. The effects of SNP genotyping errors on the power of the Cochran-Armitage linear trend test for case/control association studies. Ann Hum Genet. 2007;71:249–261. doi: 10.1111/j.1469-1809.2006.00318.x.
- 121.Buonaccorsi JP, Laake P, Veierod MB. On the power of the Cochran-Armitage test for trend in the presence of misclassification. Stat Methods Med Res. 2011;23:218–243. doi: 10.1177/0962280211406424.
- 122.Marquard V, Beckmann L, Heid IM, Lamina C, Chang-Claude J. Impact of genotyping errors on the type I error rate and the power of haplotype-based association methods. BMC Genet. 2009;10:3. doi: 10.1186/1471-2156-10-3.
- 123.Zheng G, Tian X. The impact of diagnostic error on testing genetic association in case-control studies. Stat Med. 2005;24:869–882. doi: 10.1002/sim.1976.
- 124.Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55:997–1004. doi: 10.1111/j.0006-341x.1999.00997.x.
- 125.Sasieni PD. From genotypes to genes: doubling the sample size. Biometrics. 1997;53:1253–1261.
- 126.Gordon D, Haynes C, Yang Y, Kramer PL, Finch SJ. Linear trend tests for case-control genetic association that incorporate random phenotype and genotype misclassification error. Genet Epidemiol. 2007;31:853–870. doi: 10.1002/gepi.20246.
- 127.Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–575. doi: 10.1086/519795.
- 128.Czika W, Weir BS. Properties of the Multiallelic Trend Test. Biometrics. 2004;60:69–74. doi: 10.1111/j.0006-341X.2004.00166.x.
- 129.Faul F, Erdfelder E, Buchner A, Lang AG. Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses. Behav Res Methods. 2009;41:1149–1160. doi: 10.3758/BRM.41.4.1149.