Skip to main content
PLOS ONE logoLink to PLOS ONE
. 2012 Feb 21;7(2):e31242. doi: 10.1371/journal.pone.0031242

Nonparametric Evaluation of Quantitative Traits in Population-Based Association Studies when the Genetic Model is Unknown

Frank Konietschke 1, Ondrej Libiger 2, Ludwig A Hothorn 2,3,*
Editor: Frank Emmert-Streib4
PMCID: PMC3283745  PMID: 22363593

Abstract

Statistical association between a single nucleotide polymorphism (SNP) genotype and a quantitative trait in genome-wide association studies is usually assessed using a linear regression model, or, in the case of non-normally distributed trait values, using the Kruskal-Wallis test. While linear regression models assume an additive mode of inheritance via equi-distant genotype scores, Kruskal-Wallis test merely tests global differences in trait values associated with the three genotype groups. Both approaches thus exhibit suboptimal power when the underlying inheritance mode is dominant or recessive. Furthermore, these tests do not perform well in the common situations when only a few trait values are available in a rare genotype category (disbalance), or when the values associated with the three genotype categories exhibit unequal variance (variance heterogeneity). We propose a maximum test based on Marcus-type multiple contrast test for relative effect sizes. This test allows model-specific testing of either dominant, additive or recessive mode of inheritance, and it is robust against variance heterogeneity. We show how to obtain mode-specific simultaneous confidence intervals for the relative effect sizes to aid in interpreting the biological relevance of the results. Further, we discuss the use of a related all-pairwise comparisons contrast test with range preserving confidence intervals as an alternative to Kruskal-Wallis heterogeneity test. We applied the proposed maximum test to the Bogalusa Heart Study dataset, and gained a remarkable increase in the power to detect association, particularly for rare genotypes. Our simulation study also demonstrated that the proposed non-parametric tests control family-wise error rate in the presence of non-normality and variance heterogeneity contrary to the standard parametric approaches. We provide a publicly available R library nparcomp that can be used to estimate simultaneous confidence intervals or compatible multiplicity-adjusted p-values associated with the proposed maximum test.

Introduction

Genome-wide association studies involving large population-based samples have become a common strategy employed for the identification of common variants that affect a particular trait or play a role in disease. A majority of these studies involve comparing allele frequencies of di-allelic markers (e.g., SNPs) in cases and controls (see e.g., [1]). Formally, this is often accomplished via the Cochran-Armitage trend test [2] as implemented in the publicly available software PLINK [3]. Since the mode of inheritance at a given locus is often unknown, a maximum-test based on three mode-specific standardized Cochran Armitage trend tests was proposed [4].

Alternatively, continuous endpoints (i.e., quantitative traits), such as uromodulin [1] or TNFa protein [5] are commonly analyzed via a linear regression model using genotype scores Inline graphic adjusted for covariates [6]. In order to maintain statistical validity, this approach requires that three important assumptions be met: 1. additive mode of inheritance; 2. normally distributed errors, and; 3. homogeneous variances. However, in reality one or more of these assumptions are often violated. The underlying mode of inheritance is often unknown. In addition, the assumption of normality is violated in studies that involve pQTL data [5], continuous endpoints with outliers (e.g., [6]), ordered categorical data (e.g., [7]), or phenotypes with values below the detection limit (e.g., [5]). Although transformation of the endpoints into an approximate normal distributed variable allows the use of standard approaches in the generalized linear model, the transformation is data-dependent, i.e. the choice of log-, log+constant, Box-Cox-transformation for pQTLs [6] might result in different conclusions. In particular, the re-transformation on the original scale is not unique. Nonparametric regression models, e.g. quantile regression [6], are an interesting option, however, up to now, only available for an additive mode of inheritance.

Nonparametric approaches do not require normality. However, the often used nonparametric Kruskal-Wallis test [1],[8],[9] achieves suboptimal power when the locus is governed by a specific mode of inheritance. This occurs because it is a global test of heterogeneity in the endpoint values among the three genotype groups. It is also not robust against variance heterogeneity. Jonckheere-Tepstra test [10], an analog of the Kruskal-Wallis test for near-to-linear ordered restricted alternatives, shares many characteristics with the Kruskal-Wallis test, while being particularly sensitive to an additive mode of inheritance. Furthermore, Kruskal-Wallis and Jonckheere-Tepstra nonparametric procedures are global testing procedures based on global ranks whose distribution is only available under the global null hypothesis. Therefore, it is not possible to compute confidence intervals for the genetic effects of interest using these approaches. In summary, none of these classic nonparametric approaches i) allow the identification of the most likely mode of inheritance via estimation of related simultaneous confidence intervals, ii) are sensitive not only to an additive mode of inheritance, or iii) are robust against variance heterogeneity. Our proposed testing procedure, on the other hand, can be extended to provide this crucial information for interpreting the biological relevance of the association results.

Recommendations given in the ‘Strengthening the Reporting of Genetic Association studies’ report [11] include providing estimators of an adequate effect size and their confidence intervals. For example, reporting odds ratios for additive, recessive and dominant models and their marginal confidence limits (as in e.g., [12]) provides a percentage measure of clinical relevance (distance from the lower/upper confidence limit to one, the value associated with the null hypothesis). While traditional significance testing usually deals with differences between population means, there is an increasing focus in medicine on the probability of one treatment being more successful than another on a per-individual basis [13]. The relative effect size [14]

graphic file with name pone.0031242.e002.jpg (1)

represents a measure of how often a randomly chosen subject receiving treatment X will outperform a randomly chosen subject receiving treatment Y [13], i.e. the probability that a randomly selected subject in the control reveals a smaller response value than a randomly selected subject in the treatment group. In case of ordered categorical data, Inline graphic is also called ordinal effect size measure [15].

We describe a Behrens-Fisher version of multiple contrast test for relative effects [16],[14] based on the maximum test principle [4]. This is a purely nonparametric testing procedure that is valid when the three assumptions mentioned in the previous paragraph are not met. Furthermore, our proposed approach simultaneously tests association under the assumption of the three basic modes of inheritance, additive (add), recessive (rec) and dominant (dom) for both continuous and discontinuous distributions. We generalize the relative effect p for an adequate formulation of genetic effects, and provide multiple contrast tests and simultaneous confidence intervals, which allow the simultaneous testing of the three genetic models of inheritance.

Methods

Motivating example

A real data example with the right skewed distributed phenotype total cholesterol was selected from the Bogalusa Heart Study (BHS) [17]. This longitudinal study included genotype information on 525 unrelated individuals of European descent at 545,821 SNPs where twelve clinically-relevant quantitative traits were measured for each study participant. We applied the nonparametric multiple contrast test to a one-way layout for SNP rs7738656 in the gene C6orf170/GJA1 and the phenotype total cholesterol, which was published in Table 1 of the original paper as significant for an unimputed SNP [18]. The jittered boxplots in Figure 1 show an unbalanced design, variance heterogeneity and a rather skewed distribution with some extreme values, particularly for the homozygote minor allele genotype group. Therefore, the question arises whether the parametric analysis in the original publication using a linear regression model assuming an additive mode of inheritance and normally distributed errors with homogeneous variances is appropriate.

Table 1. 95-%Simultaneous confidence intervals for Inline graphic and Inline graphic for the SNP rs7738656.

Model Effect-Estimator 95%-Simultaneous Intervals Adjusted p-Value
Dominant −0.240 [−0.378; −0.082] 0.0058
Additive −0.250 [−0.387; −0.092] 0.0043
Recessive −0.053 [−0.119; 0.015] 0.13

Figure 1. Boxplot: Total cholesterol vs. genotype rs7738656.

Figure 1

Nonparametric model and genetic effects

Let Inline graphic, Inline graphic and Inline graphic denote the genotypes, where Inline graphic is the high risk allele and Inline graphic is any of the other alleles. For convenience, abbreviate the genotypes with Inline graphic, Inline graphic, and Inline graphic. The related data are given by Inline graphic, where Inline graphic and Inline graphic denotes the subject within genotype level Inline graphic, Inline graphic. The data Inline graphic are assumed to be independent. The total sample size is Inline graphic. We assume that the phenotypes Inline graphic follow an arbitrary distribution Inline graphic, i.e.

graphic file with name pone.0031242.e023.jpg (2)

This general model (2) does not contain any parameters that could be used to describe a difference between the distributions. Therefore, the distribution functions Inline graphic are used to define purely nonparametric treatment effects on an individual basis for each genotype level by

graphic file with name pone.0031242.e025.jpg (3)

These effects are also called unweighted relative effects [19], [20]. If Inline graphic, then the values from Inline graphic tend to be smaller than those from Inline graphic. In case of Inline graphic, none of the observations tend to be smaller or larger. Therefore, these effects can be as easily interpreted as the usual means in parametric models. Let Inline graphic denote the vector of the unweighted relative effects.

For the formulation of nonparametric genetic effects, let

graphic file with name pone.0031242.e031.jpg

denote the so-called Marcus-type contrast matrix [21]. In case of a balanced design, Inline graphic reduces to

graphic file with name pone.0031242.e033.jpg

Each row vector Inline graphic of Inline graphic corresponds to one of the three genetic models. In case of a dominant mode of inheritance, the distributions Inline graphic and Inline graphic are identical, therefore a relative genetic effect for this mode can be expressed by

graphic file with name pone.0031242.e038.jpg

which denotes the difference between the pooled effect Inline graphic among the samples Inline graphic and Inline graphic and Inline graphic. Thus, in case of “no dominant effect”, Inline graphic, or equivalently, Inline graphic. Analogously, in case of a recessive mode of inheritance, the distributions Inline graphic and Inline graphic are identical, thus, a relative recessive effect can be expressed by

graphic file with name pone.0031242.e047.jpg

which denotes the difference between the pooled effect Inline graphic and Inline graphic. “No recessive mode of inheritance” means Inline graphic, or, equivalently, Inline graphic. In addition, the relative genetic effect for an additive mode of inheritance can be expressed by

graphic file with name pone.0031242.e052.jpg

Thus, the case of no global effect is characterized by Inline graphic, or, equivalently, Inline graphic.

A multiple contrast test approach for the three genetic models

To test the individual hypothesis Inline graphic, where Inline graphic, define a test statistic Inline graphic, which denotes, as usual, a studentized estimator Inline graphic of Inline graphic with its estimated standard error (details see Technical details). The three test statistics Inline graphic and Inline graphic are collected in the vector

graphic file with name pone.0031242.e062.jpg

The multiple contrast test and the simultaneous confidence intervals for Inline graphic are based on the asymptotic multivariate normality of Inline graphic, i.e. the correlation among the three test statistics Inline graphic is accounted for. Instead of using critical values coming from a standard normal distribution (or t-distribution), we use critical values from the multivariate normal distribution Inline graphic, where Inline graphic denotes the estimated correlation matrix. This means, the individual hypothesis Inline graphic is rejected at multiple level Inline graphic of significance, if

graphic file with name pone.0031242.e070.jpg (4)

where Inline graphic denotes the Inline graphic-equicoordinate quantile from Inline graphic. Simultaneous confidence intervals for the three genetic effects Inline graphic and Inline graphic are given by

graphic file with name pone.0031242.e076.jpg (5)

i.e. point estimator Inline graphic quantile Inline graphic estimated standard error. Note that the individual test decisions and the simultaneous confidence intervals are compatible, i.e. it can not occur that an individual hypothesis has been rejected, but the corresponding simultaneous confidence interval includes the value from the null hypothesis. These confidence intervals, however, may not be range preserving, i.e. the lower bounds may be smaller than Inline graphic and the upper bounds can be larger than 1. Range preserving confidence intervals can be easily constructed by using the delta method [16], [22], with the Fisher transformation. The global hypothesis Inline graphic will be rejected, if

graphic file with name pone.0031242.e081.jpg (6)

i.e. if any of the three individual hypotheses have been rejected. For small sample sizes, the quantiles from the multivariate normal distribution are replaced by quantiles coming from a multivariate t-distribution (details see Technical details).

Evaluation of score phenotypes

Particularly in psychiatric epidemiology, different mental scores are often used as phenotypes, see e.g. [7]. Some mental scores are based on only few categories, e.g. Inline graphic, others represent sums of sub-scores with a wider range of count values. The definition of the unweighted relative effect Inline graphic defined in (3) includes ordered categorical data. For an arbitrary monotone transformation Inline graphic of the data, it can be seen that

graphic file with name pone.0031242.e085.jpg
graphic file with name pone.0031242.e086.jpg

Thus, the effect measure is invariant under monotone transformations of the data. On the other hand, if the data are transformed by a monotonic decreasing function Inline graphic

graphic file with name pone.0031242.e088.jpg
graphic file with name pone.0031242.e089.jpg

This means that the effect measure Inline graphic is reflected at Inline graphic in case of a monotonic decreasing transformation of the data. Therefore, Inline graphic is an adequate measure for ordered categorical data, because the information is independent from the chosen scale of the scores.

Evaluation of phenotypes with values below a detection limit

Sometimes phenotypes with values below a detection limit occur, see e.g. [5]. Since the ordinal effect size is appropriate for tied values, all data below the detection limit should be fixed on a particular value. Note that this approach is only exact when a unique detection limit exists, i.e. the problem is more complex when different centers in a meta-analysis framework have different detection limits. However, due to the ranking of the groups, the problem of choosing the “best value” will not occur.

Results

Evaluation of the example

The new nonparametric multiple contrast test Inline graphic was used for the statistical analysis of the motivating example above. Rank estimators for the three unweighted relative effects are given by Inline graphic, Inline graphic and Inline graphic, respectively. Assuming AA is the risk allele, a decrease to AG and GG occur. Table 1 summarizes the results of simultaneous Marcus-type comparisons.

The upper confidence limit of the additive model is most distant to Inline graphic, or, compatible to that, reveals the smallest p-value of Inline graphic. The 95%-simultaneous confidence intervals indicate a positive association with the high risk allele Inline graphic for the phenotype total cholesterol. The related parametric approach results a much smaller p-value of Inline graphic for the additive mode of inheritance (Inline graphic in the original publication [17] with an adjustment against covariates). This example illustrates the impact of the underlying assumptions being violated, in particular the assumption of normally distributed errors with homogeneous variances. The global rank Kruskal-Wallis test on heterogeneity reveals a p-value of only Inline graphic.

In summary, using the multiple contrast tests yields specific information regarding the genetic mode of inheritance as well as simultaneous confidence intervals.

Simulations

We evaluated the empirical type-I error rates and the powers of nonparametric multiple contrast tests via extensive simulation studies. All simulations were performed using the publicly available software R (version 2.12.1; www.r-project.org). Every simulation step was repeated 10,000 times.

The trait genotypes for Inline graphic subjects were randomly drawn from a multinomial distribution with cell probabilities given by allele frequencies at trait locus Inline graphic, allele frequencies at trait marker Inline graphic and linkage disequilibrium delta Inline graphic. Phenotypic values for the quantitative traits were generated from normal and log-normal distributions, choosing Inline graphic for the residual variance, and varying the percentage of variance explained by the quantitative trait Inline graphic for an additive, dominant, or recessive mode of inheritance. Log-normal phenotypes were generated by first drawing normal phenotypic values Inline graphic and then by applying the transformation method Inline graphic, where Inline graphic denotes the quantile function of the log-normal distribution, and Inline graphic denotes the standard normal distribution function. If Inline graphic, no variance is explained by the quantitative trait, thus, Inline graphic for all parameter settings. Low values of allele frequencies at trait marker (Inline graphic) result in strongly unbalanced designs. In addition, different values of Inline graphic, Inline graphic, and Inline graphic form specific multimodal distributions under the alternative. Figure 2 displays examples of simulated normal and log-normal data for different values of Inline graphic, Inline graphic, Inline graphic and Inline graphic.

Figure 2. Simulated normal (left) and log-normal (right) data for different values of variance explained by the quantitative trait .

Figure 2

Inline graphic , Inline graphic , Inline graphic , Inline graphic and an additive, dominant and recessive mode of inheritance.

Since the expectation of a multimodal distribution is the weighted sum of the single expectations, the parameter settings on Inline graphic and Inline graphic are an important issue in the investigation of power analyses.

Results

We simulated the nonparametric multiple contrast tests Inline graphic as defined in (6) as well as its transformed approach by using the Fisher-transformation (Fisher). Two different types of contrasts will be examined throughout the simulation studies: (i) all-pairwise comparisons by using the contrast matrix

graphic file with name pone.0031242.e130.jpg

to be sensitive against any heterogeneity (All-Pairs) and (ii) the Marcus-type contrast matrix Inline graphic to be sensitive against exactly the three basic genetic modes of inheritance (Marcus). For each kind of contrast, the nonparametric multiple contrast tests are compared with the parametric multiple contrast tests for homoscedastic normal samples proposed by [23] as well as for heteroscedastic normal samples by [24] (denoted by Bretz and Hasler). For all-pairs comparisons, these four multiple contrast test procedures are compared with the nonparametric multiple test procedures by Steel [25] (Steel), the permutative Nemenyi-test [26] for all-pairs comparisons (Nemenyi), the Kruskal-Wallis test (KW) and the usual ANOVA-F-test (ANOVA). For Marcus-type comparisons, the four multiple contrast tests are compared with the nonparametric permutative Nemenyi-test for Marcus-type comparisons and the usual linear regression analysis (Reg). Figure 3 displays the type-I error simulation results (i.e. Inline graphic; Inline graphic) for different values of Inline graphic, an additive, dominant, and recessive mode of inheritance, and linkage disequilibrium delta Inline graphic for both normal and log-normal distributed phenotypes (Inline graphic).

Figure 3. Type-I error (.

Figure 3

Inline graphic ) simulation results for all-pairs (left) and Marcus-type (right) comparisons using normal (upper row) and log-normal (lower row) distributions ( Inline graphic ).

It follows from Figure 3, that, under normality, all considered procedures control the type-I error at level Inline graphic for both all-pairs and Marcus-type comparisons. In the case of extremely unbalanced designs (Inline graphic), however, the new multiple contrast tests Inline graphic and Fisher tend to be quite liberal. This is due to the fact that these procedures do not use a pooled variance estimator. We observed this for both normally and log-normally distributed data. In the case of larger sample sizes (Inline graphic), this effect disappears. The parametric multiple contrast test by Hasler tends to be very liberal when the normality assumption is violated.

To investigate the power of the different procedures mentioned above, different parameter settings on the variance explained by the quantitative trait (Inline graphic) and different values on Inline graphic and Inline graphic were examined. Figure 4 displays the simulation results for Inline graphic.

Figure 4. Power-simulation results (.

Figure 4

Inline graphic ) for all-pairs (left) and Marcus-type (right) comparisons using normal (upper row) and log-normal (lower row) distributions ( Inline graphic ). The variance explained by the quantitative trait was set to Inline graphic.

Figure 4 shows that the power of all investigated procedures depends on the parameter settings of Inline graphic and Inline graphic. The combination of these parameters leads to specific values of weights in the multimodal distributions of the phenotypes as displayed in Figure 2. We observe that for a given Inline graphic and Inline graphic, the power of the tests is smaller for larger Inline graphic, although the data are almost balanced in such settings. This occurs, because the weighting parameters of the multimodal distributions are likely in case of smaller allele frequencies at trait marker. In case of Inline graphic, the bimodal distributions consist of a dominated and a dominating part, which results in a smaller expectation of the phenotypic values in all considered cases. In case of extremely unbalanced designs (Inline graphic), the power of the new procedures is quite low; in general, their power is not estimable due to their liberality in such settings. For normal distributions, the powers of all the parametric and nonparametric procedures are nearly identical in case of Inline graphic. When the normality assumption is violated, the nonparametric procedures have a considerably higher power than the parametric procedures. The power of the new multiple contrast tests are likely to be identical to the power of the Kruskal-Wallis test. The Kruskal-Wallis test, however, can only be used for testing the global null hypothesis, and cannot provide any information regarding genetic association. Further, comparing the results of the all-pairs and Marcus-type comparisons, we observe that all the four multiple contrast tests exhibit higher power when using the Marcus-type contrast matrix compared to using the Tukey-type contrast matrix Inline graphic. Simple linear regression analysis should not be used, because (i) the genotypic values are not metric numbers and thus the results depend on the chosen numbers for the three genotype scores and (ii), in all simulations the regression does not provide a considerably higher power than the multiple contrast test procedures. The same conclusions can be drawn for the simulation results obtained by Inline graphic, which are displayed in Figure 5.

Figure 5. Power-simulation results (.

Figure 5

Inline graphic ) for all-pairs (left) and Marcus-type (right) comparisons using normal (upper row) and log-normal (lower row) distributions ( Inline graphic ). The variance explained by the quantitative trait was set to Inline graphic.

Software

For a convenient application of the developed procedures, the R-software package nparcomp was developed and is available from CRAN. It contains various functions for the analysis of two independent samples (npar.t.test), as well as functions for the computation of nonparametric multiple contrast tests and simultaneous confidence intervals based on global ranks and pairwise ranks. For example, the function nparcomp computes simultaneous confidence intervals and adjusted p-values for relative effects in arbitrary contrast settings based on pairwise ranks. Moreover, one-sided and two-sided confidence intervals and adjusted p-values are computed using multivariate normal-approximation, multivariate Inline graphic-approximation, Logit-approximation and Probit-approximation described in [16].

Discussion

A nonparametric approach to evaluate the association between a di-allelic marker and a non-normal distributed quantitative trait is proposed for simple population-based studies. Using a Marcus-type multiple contrast test for relative effects allows model-specific testing of either dominant, additive or recessive mode of inheritance. Furthermore, an all-pairwise comparisons contrast test is proposed as an alternative to the Kruskal-Wallis heterogeneity test. Procedures for obtaining related simultaneous confidence intervals or multiplicity-adjusted p-values are provided. The advantage of obtaining confidence intervals is their interpretability in terms of stochastic order for studies with individuals according to [13]. Although related software is freely available using the R library nparcomp, the routine analysis of hundreds of thousands of SNPs can not be recommended. The computing time would be excessive and the amount of detailed information difficult to manage. For some selected candidate SNPs this approach can be easily performed for a number of phenotypes. If still an analysis on a genome-wide level is intended, an appropriate multiplicity adjustment of the simultaneous confidence is recommended, such as the false coverage statement rate [27].

Adjustment against multiple covariates is an important issue in unbiased testing association. The adjustment against population stratification, e.g. by principle components [28], or subject-specific baseline values, e.g. age, are relevant. For example [6] adjusted the relationship between an eQTL and the genotype scores against the covariates age, kind of tissue (kidney cortex or medulla), ancestry (CEU or not) and gender (males or females). Nonparametric analysis of covariance is challenging [29], particularly to adjust against covariates due to possible population stratification. This is a topic of future work.

Technical details

To estimate the unknown relative effect Inline graphic defined in (3), let

graphic file with name pone.0031242.e165.jpg

denote the empirical distribution function, where Inline graphic according to Inline graphic, respectively. An unbiased estimator of Inline graphic as used in (3) is obtained by replacing the unknown distributions Inline graphic and Inline graphic by their empirical counterparts Inline graphic and Inline graphic. The estimators

graphic file with name pone.0031242.e173.jpg (7)

can be easily computed with the ranks Inline graphic of the observations Inline graphic. Here, Inline graphic denotes the rank of Inline graphic among all Inline graphic observations in the combined sample Inline graphic. Thus, an estimator of Inline graphic is given by

graphic file with name pone.0031242.e181.jpg

The ranks used for the estimation of Inline graphic are also called pseudo-ranks in the literature [20]. In case of a balanced design (Inline graphic), the pseudo-ranks are identical to the usual global ranks. Let Inline graphic denote the vector of the three estimators. Thus, rank estimators of the three relative genetic effects Inline graphic, Inline graphic, and Inline graphic are given by Inline graphic, Inline graphic, and Inline graphic, respectively.

It was shown that Inline graphic is asymptotically multivariate normal with mean Inline graphic and covariance matrix Inline graphic [16], [22]. Due to the quite involved structure of Inline graphic, let the estimator of Inline graphic be denoted by Inline graphic [16], [22]. To test each individual hypothesis Inline graphic on no genetic association, where Inline graphic, let Inline graphic denote the variance estimator of Inline graphic and define the test statistic

graphic file with name pone.0031242.e201.jpg

The three test statistics Inline graphic are collected in the vector

graphic file with name pone.0031242.e203.jpg

The distribution of Inline graphic can be approximated by a multivariate Inline graphic distribution, where Inline graphic denotes a Welch-Satterthwaite degree of freedom, non-centrality vector Inline graphic, and estimated correlation matrix Inline graphic [16]. The individual hypothesis Inline graphic will be rejected at multiple level Inline graphic, if

graphic file with name pone.0031242.e211.jpg

where Inline graphic denotes the Inline graphic equicoordinate quantile of Inline graphic. Approximate Inline graphic simultaneous confidence intervals for the three genetic effects Inline graphic, Inline graphic, and Inline graphic are obtained from

graphic file with name pone.0031242.e219.jpg (8)

The global hypothesis Inline graphic will be rejected, if

graphic file with name pone.0031242.e221.jpg

Range preserving confidence intervals are given by

graphic file with name pone.0031242.e222.jpg (9)

where

graphic file with name pone.0031242.e223.jpg (10)

Alternatively, a pairwise rankings version is available which can be easily derived from two-sample tests. They behave similarly to the global rankings approach, but they can lead to paradoxical results.

Pairwise rankings version

As mentioned in the previous section, the three genetic effects Inline graphic, Inline graphic, denote generalized two-sample relative effects, which were estimated with global ranks of the data Inline graphic. Thus, the effects can be modified such that pairwise ranks are used for estimation. Let

graphic file with name pone.0031242.e227.jpg (11)

denote the two-sample relative effect between the genotype levels Inline graphic and Inline graphic. If Inline graphic, then the values from Inline graphic tend to be larger than those from Inline graphic. In case of Inline graphic, none of the observations tend to be smaller or larger. Thus, the case of no association can be expressed by Inline graphic. The relative dominant genetic effect on association describes the difference between the distribution Inline graphic and the combined sample Inline graphic. Thus, a two-sample relative dominant effect can be described by

graphic file with name pone.0031242.e237.jpg

and denotes a linear combination of Inline graphic. The relative recessive effect describes the difference between the combined sample Inline graphic and Inline graphic. Thus, a relative effect on a recessive mode of inheritance is given by

graphic file with name pone.0031242.e241.jpg

Finally, the relative two-sample effect on an additive mode of inheritance can be expressed by

graphic file with name pone.0031242.e242.jpg

The effects Inline graphic can be estimated by using the pairwise rank estimators Inline graphic defined in (7) by

graphic file with name pone.0031242.e245.jpg
graphic file with name pone.0031242.e246.jpg
graphic file with name pone.0031242.e247.jpg

Multiple contrast tests for the hypotheses Inline graphic and simultaneous confidence intervals for the effects Inline graphic, where Inline graphic, can be derived in the same way as described in the previous section. We note that the effects Inline graphic may be intransitive, i.e. it may occur that Inline graphic resulting in paradoxical results [30], [31]. Therefore, we recommend using the global ranking version.

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: The work was supported by DFG Br 655/16-1 (Frank Konietschke) and DFG HO-1687/9 (Ludwig A. Hothorn). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Padmanabhan S, Melander O, Johnson T, Di Blasio AM, Lee WK, et al. Genome-wide association study of blood pressure extremes identifies variant near umod associated with hypertension. Plos Genetics. 2010;6:e1001177. doi: 10.1371/journal.pgen.1001177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Sasieni P. From genotypes to genes: Doubling the sample size. Biometrics. 1997;53:1253–1261. [PubMed] [Google Scholar]
  • 3.Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, et al. Plink: A tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics. 2007;81:559–575. doi: 10.1086/519795. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Freidlin B, Zheng G, Li Z, Gastwirth J. Trend tests for case-control studies of genetic markers: Power, sample size and robustness. Human Heredity. 2002;53:146–152. doi: 10.1159/000064976. [DOI] [PubMed] [Google Scholar]
  • 5.Melzer D, Perry JRB, Hernandez D, Corsi AM, Stevens K, et al. A genome-wide association study identifies protein quantitative trait loci (pqtls). Plos Genetics. 2008;4:e1000072. doi: 10.1371/journal.pgen.1000072. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Wheeler HE, Metter EJ, Tanaka T, Absher D. Sequential use of transcriptional profiling, expression quantitative trait mapping, and gene association implicates mmp20 in human kidney aging. Plos Genetics. 2009;5:e1000685. doi: 10.1371/journal.pgen.1000685. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Fanous AH, Chen XN, Wang X, Amdur R, O'Neill FA, et al. Genetic variation in the serotonin 2a receptor and suicidal ideation in a sample of 270 irish high-density schizophrenia families. American Journal of Medical Genetics Part B-neuropsychiatric Genetics. 2009;150B:411–417. doi: 10.1002/ajmg.b.30833. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Kruglyak L, Lander ES. A nonparametric approach for mapping quantitative trait loci. Genetics. 1995;139:1421–1428. doi: 10.1093/genetics/139.3.1421. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Kruskal WH, Wallis WA. Use of ranks in one-criterion variance analysis. J Amer Stat Ass. 1952;47:583–621. [Google Scholar]
  • 10.Jonckheere A. A Distribution-free k-sample Test Against Ordered Alternatives. Biometrika. 1954;41:133–145. [Google Scholar]
  • 11.Little J, Higgins JPT, Ioannidis JPA, Moher D, Gagnon F, et al. Strengthening the reporting of genetic association studies (strega): an extension of the strobe statement. Human Genetics. 2009;125:131–151. doi: 10.1007/s00439-008-0592-7. [DOI] [PubMed] [Google Scholar]
  • 12.Gold B, Kirchhoff T, Stefanov S, Lautenberger J, Viale A, et al. Genome-wide association study provides evidence for a breast cancer risk locus at 6q22–33. Proceedings of the National Academy of Sciences of the United States of America. 2008;105:4340–4345. doi: 10.1073/pnas.0800441105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Browne RH. The t-test p value and its relationship to the effect size and p(x<y). American Statistician. 2010;64:30–33. [Google Scholar]
  • 14.Brunner E, Munzel U. The nonparametric behrens-fisher problem: Asymptotic theory and a small-sample approximation. Biometrical Journal. 2000;1:17–21. [Google Scholar]
  • 15.Ryu E. Simultaneous confidence intervals using ordinal effect measures for ordered categorical outcomes. Statistics in Medicine. 2008;28:3179–3188. doi: 10.1002/sim.3700. [DOI] [PubMed] [Google Scholar]
  • 16.Konietschke F. Simultane Konfidenzintervalle für nichtparametrische relative Kontrasteffekte. 2009. Ph.D. thesis, Georg-August Universität Göttingen.
  • 17.Smith EN, Chen W, Kahonen Mea. Longitudinal genome-wide association of cardiovascular disease risk factors in the bogalusa heart study. Plos Genetics. 2010;6:e1001094. doi: 10.1371/journal.pgen.1001094. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Smith EN, Bloss CS, Badner JA, Barrett T, Belmonte PL, et al. Genome-wide association study of bipolar disorder in european american and african american individuals. Molecular Psychiatry. 2009;14:755–763. doi: 10.1038/mp.2009.43. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Brunner E, Puri ML. Nonparametric methods in factorial designs. Statistical Papers. 2001;42:1–52. [Google Scholar]
  • 20.Gao XY, Stamier J, Martin ER. A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms. Genetic Epidemiology. 2008;32:361–369. doi: 10.1002/gepi.20310. [DOI] [PubMed] [Google Scholar]
  • 21.Marcus R. Some results on simultaneous confidence-intervals for monotone contrasts in oneway anova model. Communications In Statistics Part A-Theory And Methods. 1982;11:615–622. [Google Scholar]
  • 22.Pfeiffer S. Simultane Konfidenzintervalle für nichtparametrische Effekte in faktoriellen Modellen. 2010. Master's thesis, University of Göttingen.
  • 23.Bretz F, Genz A, Hothorn LA. On the numerical availability of multiple comparison procedures. Biometrical Journal. 2001;43:645–656. [Google Scholar]
  • 24.Hasler M, Hothorn LA. Multiple contrast tests in the presence of heteroscedasticity. Biometrical Journal. 2008;50:793–800. doi: 10.1002/bimj.200710466. [DOI] [PubMed] [Google Scholar]
  • 25.Steel RDG. A rank sum test for comparing all pairs of treatments. Technometrics. 1960;2:197–207. [Google Scholar]
  • 26.Hothorn T, Hornik K, van de Wiel MA, Zeileis A. coin: Conditional Inference Procedures in a Permutation Test Framework. 2007 URL http://CRAN.R-project.org/. R package version 0.6-2. [Google Scholar]
  • 27.Benjamini Y, Yekutieli D. False discovery rate-adjusted multiple confidence intervals for selected parameters. Journal of the American Statistical Association. 2005;100:71–81. [Google Scholar]
  • 28.Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, et al. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics. 2006;38:904–909. doi: 10.1038/ng1847. [DOI] [PubMed] [Google Scholar]
  • 29.Bathke AC, Brunner E. 2003. pp. 109–120. Recent advances and trends in nonparametric statistics, Elsevier, chapter A Nonparametric Alternative to Analysis of Covariance.
  • 30.Brown BM, Hettmansperger TP. Kruskal-wallis, multiple comparisons and efron dice. Australian and New Zealand Journal of Statistics. 2002;44:427–438. [Google Scholar]
  • 31.Thangavelu K, Brunner E. Wilcoxon mann-whitney test and efrons paradox dice. Journal of Statistical Planning and Inference. 2006;137:720–737. [Google Scholar]

Articles from PLoS ONE are provided here courtesy of PLOS

RESOURCES