Author manuscript; available in PMC: 2015 Nov 5.
Published in final edited form as: Genet Epidemiol. 2015 Mar 10;39(4):249–258. doi: 10.1002/gepi.21893

Permutation Testing in the Presence of Polygenic Variation

Mark Abney
PMCID: PMC4634896  NIHMSID: NIHMS734091  PMID: 25758362

Abstract

This article discusses problems with and solutions to performing valid permutation tests for quantitative trait loci in the presence of polygenic effects. Although permutation testing is a popular approach for determining statistical significance of a test statistic with an unknown distribution—for instance, the maximum of multiple correlated statistics or some omnibus test statistic for a gene, gene-set, or pathway—naive application of permutations may result in an invalid test. The risk of performing an invalid permutation test is particularly acute in complex trait mapping where polygenicity may combine with a structured population resulting from the presence of families, cryptic relatedness, admixture, or population stratification. I give both analytical derivations and a conceptual understanding of why typical permutation procedures fail and suggest an alternative permutation-based algorithm, MVNpermute, that succeeds. In particular, I examine the case where a linear mixed model is used to analyze a quantitative trait and show that both phenotype and genotype permutations may result in an invalid permutation test. I provide a formula that predicts the amount of inflation of the type 1 error rate depending on the degree of misspecification of the covariance structure of the polygenic effect and the heritability of the trait. I validate this formula by doing simulations, showing that the permutation distribution matches the theoretical expectation, and that my suggested permutation-based test obtains the correct null distribution. Finally, I discuss situations where naive permutations of the phenotype or genotype are valid and the applicability of the results to other test statistics.

Keywords: permutation test, polygenic effect, family studies, population structure, type I error rate, QTL

Introduction

In the search for genetic determinants of complex traits, we may be faced with the difficulty of determining the statistical significance of a given test statistic that does not necessarily follow any known probability distribution. This arises when correcting for the multiple comparisons of many correlated tests—for example, to determine genome-wide significance [Abney et al., 2002; Cheng and Palmer, 2010]—or in methods where multiple variants (e.g., rare variants) are aggregated into an omnibus test. Methods that use weights that vary depending on the phenotype data, for instance, typically do not have a known asymptotic distribution and require resampling methods to estimate significance [Fang et al., 2012; Sha et al., 2012]. Even when an asymptotic distribution is known, realities of genetic data, such as population structure or linkage disequilibrium, may result in an inflated false-positive rate [Epstein et al., 2012; Liu et al., 2013; Tintle et al., 2011]. Family-based methods, though often robust to population stratification, can also have false-positive rates above the nominal level [Greco et al., 2014; Kazma and Bailey, 2011]. Permutation tests can be a solution in such cases [Basu and Pan, 2011; Lin and Tang, 2011], but they rely on the assumption that the subjects are independent. This assumption is violated, for instance, in the presence of population stratification [Epstein et al., 2012; Liu et al., 2013] or familial relatedness [e.g., Abney et al., 2002; Bourgain and Genin, 2005; Kazma and Bailey, 2011], preventing the valid application of a permutation test.

At the heart of the invalidity of a permutation test in the presence of population stratification or relatedness is the presence of polygenic effects and their confounding with genotypes. As I discuss below, this may result in a lack of exchangeability between subjects, a fundamental requirement of a permutation test. It is worth noting that relatedness is not always a barrier to a valid permutation test. For instance, in some model organism breeding designs, exchangeability exists, allowing a valid permutation test [Churchill and Doerge, 1994], and more-complicated breeding designs can also, with careful thought, lead to valid permutation tests [Cheng et al., 2013; Cheng and Palmer, 2010; Churchill and Doerge, 2008; Peirce et al., 2008]. Similarly, given specific restrictions on the types of relatedness that are present among the subjects (e.g., only siblings), it may be possible to formulate a valid permutation test [e.g., Allison et al., 1999; Fang et al., 2012]. However, many forms of population structure, including familial relatedness, can cause confounding that can invalidate a permutation test. In a simple population stratification scenario, where a limited number of principal components can adjust for the background genetic confounding, it is possible to formulate a valid permutation test [Epstein et al., 2012]. In the more-complicated scenarios that often exist in human studies, where close or distant relatedness, cryptic or otherwise, may combine with other forms of population structure, a clear statistical framework to help researchers determine the applicability of a permutation test, or how precisely to do such a test, has been lacking.

Here, I consider possible permutation approaches for quantitative traits where arbitrary forms of population structure may exist in the sample. The presence of both population structure and polygenicity leads me to use the linear mixed model (LMM) for multivariate normal data as a foundation on which to build, as this is a standard model used in the genetic analysis of quantitative traits. Although the approaches used here may be applicable to nonnormal types of data, I do not consider this issue. Permutation tests in LMMs have been considered in the past [e.g., Anderson and Robinson, 2001; Anderson and Ter Braak, 2003]; however, these studies consider the case where either the “treatment” (e.g., genotypes) is assigned randomly by the researcher or the stochastic components of the model (i.e., the random effect plus the error terms) are independent. Neither of these situations generally holds true in the genetic analysis of a complex trait, where the researcher is not at liberty to assign genotypes at random and the polygenic effect generally results in nonindependence of the random effect. In addition to defining the LMM, I show how misspecification of the covariance matrix leads to an altered asymptotic distribution of the standard test statistic, and how different permutation approaches can be modeled through different forms of misspecification of the covariance matrix. I discuss this issue further below. Finally, I also discuss what precisely should be permuted (phenotypes, residuals, or genotypes) and provide simulation results supporting the analytical findings.

Statistical Model

Here I define the statistical model used in the remainder of this article and the resultant likelihood. Given this model, I propose a standard test statistic that, under the right set of conditions, asymptotically follows a central chi-squared distribution with 1 degree of freedom ($\chi^2_1$). Although it is not really necessary to use a permutation test when the distribution of the statistic is known, its analytical tractability allows for insights that also apply to more general cases. Given the model, I then define exchangeability and the conditions that are needed to ensure that exchangeability holds.

Given $n$ subjects with phenotype data $y = (y_1, \ldots, y_n)^t$, where the superscript $()^t$ indicates transpose; the $n \times p$ covariate data matrix $X$, which includes the intercept term; and the predictor of interest (e.g., genotypes) $g = (g_1, \ldots, g_n)^t$, the LMM is

$$y = X\beta + g\gamma + e, \tag{1}$$

where $\beta$ is the vector of parameters for the covariates, $\gamma$ is the scalar parameter for the predictor of interest, and $e \sim \mathrm{MVN}(0, \Sigma\sigma^2)$ is an error term. The error term encompasses both a random effect $u$ and a residual error $e^*$, so that $e = u + e^*$. The residual errors are distributed as independent normals with variance $\sigma_e^2$. In the genetic context the random effect $u$ will typically be the polygenic effect, and if we further assume that it is the sum of a large number of independent additive genetic effects in an outbred sample, the central limit theorem dictates that $u$ is multivariate normally distributed with correlation matrix $K$ [Lange, 1978]; the result is extended to the case of inbreeding and dominance variance in Abney et al. [2000]. Although I do not assume a particular structure for $\Sigma$, a common parameterization in a genetic LMM is $\Sigma = Kh^2 + I(1 - h^2)$, where $K$ is an additive genetic relationship matrix (GRM), $I$ is the identity matrix, and $h^2$ is the narrow-sense heritability. The matrix $K$ may be estimated from available genotype data or determined from a pedigree, in which case it is equal to $2\Phi$, where $\Phi$ is the matrix of kinship coefficients. The log likelihood of this model is

$$l = -\frac{n}{2}\log 2\pi - \frac{1}{2}\log|\Sigma| - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta - g\gamma)^t\Sigma^{-1}(y - X\beta - g\gamma). \tag{2}$$
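As a small illustration of the parameterization $\Sigma = Kh^2 + I(1 - h^2)$ with $K = 2\Phi$, the following Python sketch builds the trait correlation matrix for a hypothetical nuclear family (two unrelated parents and two full siblings, no inbreeding). The function name `trait_correlation`, the kinship values, and the choice $h^2 = 0.8$ are illustrative assumptions and do not come from the article's data.

```python
def trait_correlation(kinship, h2):
    """Return Sigma = K*h2 + I*(1 - h2), taking K = 2*Phi (pedigree-based GRM)."""
    n = len(kinship)
    sigma = []
    for i in range(n):
        row = []
        for j in range(n):
            k_ij = 2.0 * kinship[i][j]            # additive relationship coefficient
            ident = 1.0 if i == j else 0.0
            row.append(k_ij * h2 + ident * (1.0 - h2))
        sigma.append(row)
    return sigma

# Hypothetical kinship coefficients: father, mother, sibling 1, sibling 2.
phi = [
    [0.50, 0.00, 0.25, 0.25],
    [0.00, 0.50, 0.25, 0.25],
    [0.25, 0.25, 0.50, 0.25],
    [0.25, 0.25, 0.25, 0.50],
]
sigma = trait_correlation(phi, h2=0.8)
for row in sigma:
    print(row)
```

Under these assumptions the diagonal of `sigma` is 1 and the off-diagonal entry for a pair of relatives is their relationship coefficient scaled by the heritability.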

The quantity of interest is the parameter $\gamma$, and under the null hypothesis $\gamma = 0$. To test against the alternative $\gamma \neq 0$, the statistic $T = \hat\gamma^2 / \mathrm{Var}(\hat\gamma)$, where $\hat\gamma$ is the best linear unbiased estimator (BLUE, equivalently the maximum-likelihood estimator) of $\gamma$, has a $\chi^2_1$ distribution under the null hypothesis when $\sigma^2$ is known. In practice, we use

$$\hat T = \frac{\hat\gamma^2}{\widehat{\mathrm{Var}}(\hat\gamma)}, \tag{3}$$

where the estimated variance $\widehat{\mathrm{Var}}(\hat\gamma)$ uses an estimator $S^2$ in place of the true variance $\sigma^2$. In an LMM approach $S^2 = \frac{1}{n - p - 1}(y - \hat y)^t\Sigma^{-1}(y - \hat y)$ and is an unbiased estimator of $\sigma^2$. This results in $\hat T$ being asymptotically $\chi^2_1$ distributed. However, in a genetic analysis $\Sigma$ is not always known, leading to the question of what the distribution of $\hat T$ is when $\Sigma$ is misspecified.
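For reference, the standard generalized least squares expressions behind this statistic can be written out explicitly. The stacked design matrix $M = (X\;\, g)$ and the subscript $\gamma\gamma$ (the diagonal element corresponding to $\gamma$) are notational assumptions made for this sketch.

```latex
\begin{aligned}
\begin{pmatrix}\hat\beta \\ \hat\gamma\end{pmatrix}
  &= (M^t\Sigma^{-1}M)^{-1}M^t\Sigma^{-1}y, \qquad M = (X\;\, g), \\
\widehat{\mathrm{Var}}(\hat\gamma)
  &= S^2\,\bigl[(M^t\Sigma^{-1}M)^{-1}\bigr]_{\gamma\gamma}, \\
\hat T &= \frac{\hat\gamma^2}{\widehat{\mathrm{Var}}(\hat\gamma)}.
\end{aligned}
```

With no covariates and with $y$ and $g$ centered, these reduce to $\hat\gamma = g^t\Sigma^{-1}y / g^t\Sigma^{-1}g$ and $\widehat{\mathrm{Var}}(\hat\gamma) = S^2 / g^t\Sigma^{-1}g$.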

In genetic analyses of complex traits, using an LMM with a misspecified covariance matrix is likely a common occurrence. Perhaps the simplest example of this is when unrelated individuals are unknowingly sampled from two populations, with different allele frequencies at the tested marker, but are assumed to be from a single population. If the trait is associated with population membership, we see an inflated false-positive rate. This sort of confounding is easily corrected by including either a covariate with an indicator of population membership or a block structured correlation matrix with elements equal to 1 when a pair is from the same population or 0 when they are not. At the other end of the population structure scale, misspecification may occur in family studies with a known pedigree when the pedigree is wrong or incomplete. In fact, even if the pedigree is known without error, misspecification exists when the kinship matrix (as computed from the pedigree) is used as the additive GRM because under a polygenic model the kinship coefficients give only the expected identity by descent (IBD) sharing across the genome whereas the correlation in phenotype values will be the result of the realized IBD sharing. In spite of this last form of misfit, the successful use of the kinship coefficient in the GRM over many decades of pedigree studies in both humans and animals suggests a degree of robustness to the use of the expected covariance in place of the realized covariance.

In order to quantify the effects of covariance matrix misspecification on hypothesis testing in the LMM, in the Appendix I derive the distribution of the test statistic in the case where the incorrect matrix $\Psi$ is used instead of $\Sigma$. I find that $\hat T$ is asymptotically distributed as a scaled chi-squared distribution, $\hat T/\eta \sim \chi^2_1$, where $\eta$ is a constant. In the case of no covariates and assuming that both $y$ and $g$ have been centered by their mean values, $\eta$ takes the form

$$\eta = \frac{n\, g^t\Psi^{-1}\Sigma\Psi^{-1}g}{g^t\Psi^{-1}g\;\mathrm{Tr}(\Psi^{-1}\Sigma)}.$$

The scalar $\eta$ is, in essence, the genomic control [Bacanu et al., 2002; Devlin and Roeder, 1999] parameter, and its general analytical form in the asymptotic limit of very large sample sizes is in the Appendix. In any real dataset we do not know $\Sigma$ and, hence, cannot know $\eta$, but having an analytical form will allow us to determine the degree of miscalibration in $\hat T$ for hypothesized circumstances, as we will see below.
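To make the role of $\eta$ concrete, the sketch below evaluates the no-covariate expression $\eta = n\,g^t\Psi^{-1}\Sigma\Psi^{-1}g / \bigl(g^t\Psi^{-1}g\,\mathrm{Tr}(\Psi^{-1}\Sigma)\bigr)$ for a toy example. The helper names, the two-sib-pair correlation matrix, and the centered predictor vector are hypothetical choices for illustration, and the small Gauss-Jordan solver is only meant for tiny matrices.

```python
def solve(a, b):
    """Solve a x = b for a small dense matrix by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def inflation_factor(g, psi, sigma):
    """eta for centered g and no covariates; psi is the assumed, sigma the true matrix."""
    n = len(g)
    pinv_g = solve(psi, g)                                  # Psi^{-1} g
    numer = n * dot(pinv_g, matvec(sigma, pinv_g))          # n g' Psi^-1 Sigma Psi^-1 g
    # Tr(Psi^{-1} Sigma): solve against each column of Sigma and sum the diagonal.
    trace = sum(solve(psi, [sigma[r][c] for r in range(n)])[c] for c in range(n))
    return numer / (dot(g, pinv_g) * trace)

# Two unrelated sib pairs with h2 = 0.8, so sibling trait correlation is 0.4.
sigma = [
    [1.0, 0.4, 0.0, 0.0],
    [0.4, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.4],
    [0.0, 0.0, 0.4, 1.0],
]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
g = [1.0, 1.0, -1.0, -1.0]          # centered predictor, concordant within sib pairs

eta_naive = inflation_factor(g, identity, sigma)    # relatedness ignored
eta_correct = inflation_factor(g, sigma, sigma)     # correctly specified
print(eta_naive, eta_correct)
```

When the assumed matrix equals the true one the factor is 1, whereas ignoring the relatedness in this toy configuration gives a factor above 1.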

Confounding, Exchangeability, and Permutations

If we expect to perform a permutation test given purely observational data, we should also be concerned with the possibility of confounding. Consider the linear model y = μ + x1β1 + x2β2 + e, where e is an independent error and we wish to test the null hypothesis β2 = 0. In a designed experiment we can ensure that x2 has no confounders by random assignment of its values to each subject, and we can safely permute the labels of x2 to obtain a valid test. With purely observational data, however, x2 may be confounded with x1 due to unknown structure in the data, and permutations of the x2 subject labels would result in an invalid test. Note that joint permutation of x1 and x2 would be valid, but this strategy fails when x1 is not observed. This situation arises in genetic studies when there is population structure in the sample. In this case x2 would be genotype and x1 is a predictor that also depends on the population structure, for example, a polygenic effect or an indicator of population membership (there would be P − 1 such indicators for P populations in the sample) with each population having a distinct effect on the outcome y. If genotype x2 is dependent on the population structure, we would not want to naively permute all the subject labels of x2, as this would give an incorrect type 1 error rate. In this case, even if population membership is not recorded, it can often be inferred if there is sufficient genetic data.

Less well appreciated is that, given structured data, confounding can occur even when the predictor x2 is sampled independently from the unobserved structured-population predictor x1. Note that this independence is conditional given the covariance matrices (representing population structure) for x1 and x2 in the sense that they are vectors drawn independently from two distinct multivariate distributions each with a given covariance matrix. To understand this, consider an example where the subjects are connected by some pedigree with x2 being their genotypes at a marker that has no genetic effect and is not in linkage disequilibrium with any causal locus, and with x1 representing the polygenic effect. The genotype x2 and polygenic effect x1 necessarily have the same correlation matrix K induced by the pedigree. Thus, when we sample x1 and x2 independently from their distributions, having the same correlation matrix K, it is actually equivalent to x1 and x2 being conditionally independent given the population or pedigree structure. Marginally (i.e., unconditional on the underlying structure) x1 and x2 are correlated with each other. That is, similar genotype values will tend to match up with similar polygenic (and, hence, trait) values simply because these vectors have a similar correlation structure. It is this correlation between x1 and x2 that leads to confounding when ignoring x1 in the model. Note that this argument does not depend on whether K is the result of population structure resulting from a pedigree or the block diagonal form, with constant off-diagonals in each block, that results from assuming a population-specific genetic effect. Every genetic trait will depend on genotypes with some population structure correlation K, resulting in confounding when testing a genetic marker that also has correlation K, thus altering the type 1 error away from the expected amount unless the confounding is corrected for in the test [Newman et al., 2001]. 
Conversely, if the elements of either x2 or x1 are unconditionally independent—meaning either one has the identity as the covariance matrix—there will be no confounding of x2 with y. For instance, if the x2 genotypes were independent binomials as would be the case in an unstructured population, there would neither be inflation of the test statistic nor any problems with permuting the values of x2. Unfortunately, with observational data verifying the absence of confounding, and the permissibility of a permutation test, may not be possible.
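The argument above can be explored with a small simulation. The sketch below is an illustration under simplifying assumptions, not a reproduction of the article's analysis: the trait and the predictor are drawn independently of each other as unit-variance normal vectors with a shared family structure (50 hypothetical families of size 4, within-family correlation 0.5), a continuous normal predictor stands in for genotypes, and the statistic is the ordinary least squares statistic that ignores the structure. All function names and parameter values are invented for the example.

```python
import random

def ols_stat(y, g):
    """OLS 1-df statistic for the slope of y on g, with an intercept."""
    n = len(y)
    ybar, gbar = sum(y) / n, sum(g) / n
    yc = [v - ybar for v in y]
    gc = [v - gbar for v in g]
    sgg = sum(v * v for v in gc)
    gamma = sum(a * b for a, b in zip(gc, yc)) / sgg
    rss = sum((a - gamma * b) ** 2 for a, b in zip(yc, gc))
    return gamma * gamma * sgg / (rss / (n - 2))

def clustered(n_fam, size, rho, rng):
    """Unit-variance normal vector with within-family correlation rho."""
    out = []
    for _ in range(n_fam):
        shared = rng.gauss(0.0, 1.0)                 # family-level component
        out.extend(rho ** 0.5 * shared + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
                   for _ in range(size))
    return out

def mean_stat(rho_y, rho_g, reps=1000, seed=1):
    """Average null statistic over independent draws of trait and predictor."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = clustered(50, 4, rho_y, rng)   # stand-in for a polygenic trait
        g = clustered(50, 4, rho_g, rng)   # drawn independently of y
        total += ols_stat(y, g)
    return total / reps

structured = mean_stat(rho_y=0.5, rho_g=0.5)     # both share the family structure
unstructured = mean_stat(rho_y=0.5, rho_g=0.0)   # predictor elements are i.i.d.
print(structured, unstructured)
```

A correctly calibrated 1-df statistic has a null mean close to 1, so a clearly larger average in the structured case and an average near 1 in the unstructured case would be consistent with the two situations described above.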

Developing a permutation test for observational data requires assessing whether the permuted quantities are exchangeable. Quantities are exchangeable if, upon permutation of the labels of those quantities, their distribution function is unchanged [Bernardo and Smith, 2000, Sec. 4.2]. In particular, because we want to know the distribution of the test statistic under the null hypothesis, we require exchangeability when $\gamma = 0$. In an LMM, the natural quantities to permute are the residuals $e = y - X\beta$. In the Appendix I show that the residuals are exchangeable only in the special case where $\Sigma_{ii} = a$ and $\Sigma_{ij} = b$, $i \neq j$, for some scalar constants $a$ and $b$, where $\Sigma_{ij}$ is the $(i, j)$-th element of $\Sigma$. Note that because we do not in general know $\beta$ or $\sigma^2$ but must instead estimate them, even when $\Sigma$ has an exchangeable structure, permuting the residuals technically provides only an approximate permutation test, though the approximation tends to be very accurate [Anderson and Robinson, 2001].
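The exchangeability condition on the correlation matrix (a constant diagonal and a constant off-diagonal) is easy to check numerically. The following sketch shows one way to do so; the function name, tolerance, and example matrices are assumptions made for illustration.

```python
def is_exchangeable(sigma, tol=1e-12):
    """True if all diagonal entries are equal and all off-diagonal entries are equal."""
    n = len(sigma)
    diag = [sigma[i][i] for i in range(n)]
    off = [sigma[i][j] for i in range(n) for j in range(n) if i != j]

    def all_same(values):
        return all(abs(v - values[0]) <= tol for v in values)

    return all_same(diag) and (not off or all_same(off))

# Every pair equally correlated: exchangeable.
compound_symmetric = [
    [1.0, 0.3, 0.3],
    [0.3, 1.0, 0.3],
    [0.3, 0.3, 1.0],
]
# Two unrelated sib pairs: sibling pairs differ from unrelated pairs.
sib_pairs = [
    [1.0, 0.4, 0.0, 0.0],
    [0.4, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.4],
    [0.0, 0.0, 0.4, 1.0],
]
print(is_exchangeable(compound_symmetric), is_exchangeable(sib_pairs))
```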

In general, when using an LMM to model polygenic variation the matrix $\Sigma$ will not have an exchangeable structure. Nevertheless, we might undertake a permutation test where the residuals are permuted under the assumption that the phenotype has an exchangeable correlation matrix $\Psi$ rather than the true correlation matrix $\Sigma$. The fundamental question is whether these permutations give an unbiased estimate of the threshold for rejecting the null hypothesis at some specified false-positive rate. To address this question exactly, we would need to understand the properties of the order statistics $T_{(k)}$, $k = 1, \ldots, n!$, under permutations of the residuals. Instead, I address this with an approximate, but more intuitive, approach by treating the statistics $T_{(k)}$ as samples from a distribution with a covariance matrix that has an exchangeable structure. In the simulation results below, we will see that the empiric distribution we get by doing permutations closely matches the distribution obtained from assuming $\Psi = I$.

Simulations

The simulations are done in a sample of 1,415 Hutterite individuals with a known 13-generation pedigree [Abney et al., 2000]. Phenotypes for the sample are generated under the null model to have a mean of 3.0 and covariance matrix $\Sigma\sigma^2$ with $\Sigma = 2\Phi h^2 + I(1 - h^2)$, with $\Phi$ the kinship coefficient matrix as computed from the pedigree and $h^2$ the narrow-sense heritability. Genotypes are simulated by randomly assigning the founders of the pedigree a genotype from a biallelic marker with minor allele frequency of 0.3 and using Mendelian segregation to randomly determine the genotypes of all the other pedigree members.
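The genotype simulation described above (founder genotypes drawn at a given minor allele frequency, then Mendelian segregation to descendants) can be sketched as follows. This is a minimal illustration on a hypothetical trio, not the code used for the Hutterite pedigree; the function name `gene_drop` and the pedigree encoding are assumptions.

```python
import random

def gene_drop(pedigree, maf, rng):
    """Simulate one biallelic marker through a pedigree.

    pedigree: list of (id, father_id, mother_id) with parents listed before
    their children; founders have father_id = mother_id = None.
    Returns a dict mapping id to minor-allele count (0, 1, or 2).
    """
    alleles = {}
    for ind, father, mother in pedigree:
        if father is None and mother is None:
            # Founder: two independent draws at the minor allele frequency.
            alleles[ind] = (int(rng.random() < maf), int(rng.random() < maf))
        else:
            # Non-founder: one randomly chosen allele from each parent.
            alleles[ind] = (rng.choice(alleles[father]), rng.choice(alleles[mother]))
    return {ind: a + b for ind, (a, b) in alleles.items()}

trio = [
    ("father", None, None),
    ("mother", None, None),
    ("child", "father", "mother"),
]
genotypes = gene_drop(trio, maf=0.3, rng=random.Random(1))
print(genotypes)
```

Repeating the call with fresh random draws gives independent realizations of the marker that respect the pedigree structure.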

First, I address the question of what happens when “naive” permutations are done. That is, the residuals under the null model are permuted regardless of whether the correlation matrix is exchangeable and the new phenotype (i.e., the covariate effects plus the permuted residuals) is put through the same LMM analysis as the original data. More precisely, we assume the null model $y = X\beta + e$ where $e \sim \mathrm{MVN}(0, \sigma^2\Sigma)$. That is, analyses done under the null model use exactly the same model as that used to generate the data. Using Equation (2) with $\gamma = 0$, I first fit the null model and obtain maximum-likelihood estimates for the parameters, $\hat\beta_0$, $\hat h_0^2$, $\hat\sigma_0^2$. Using generalized least squares (GLS), I test the null hypothesis $\gamma = 0$ against the alternative $\gamma \neq 0$ using the test statistic $\hat T$ (Eq. (3)) computed under the alternative model,

$$y = X\beta + g\gamma + e, \quad \text{where } e \sim \mathrm{MVN}(0, \sigma^2\hat\Sigma_0) \text{ and } \hat\Sigma_0 = 2\Phi\hat h_0^2 + I(1 - \hat h_0^2). \tag{4}$$

Note that $\hat T$ is necessarily asymptotically distributed as a $\chi^2_1$ because $\hat\Sigma_0$ asymptotically converges to the true correlation matrix $\Sigma$. Also note that $\sigma^2$ in Equation (4) is estimated by the sample variance when computing $\hat T$. I want to compare this asymptotic distribution with the empirical distribution one obtains by doing naive permutations. To do this, I first obtain the estimated residuals under the null, $\hat e = y - X\hat\beta_0$. I permute these residuals to obtain $\hat e_{\pi_1}$ and a new phenotype vector $y_{\pi_1} = X\hat\beta_0 + \hat e_{\pi_1}$. Under the alternative model of Equation (4) but with $y_{\pi_1}$ in place of $y$, I obtain a test statistic $\hat T^v_{\pi_1}$, where the $v$ superscript indicates the use of naive permutations. I repeat this process $L = 10^4$ times to obtain $\hat T^v_{\pi_1}, \ldots, \hat T^v_{\pi_L}$. If doing naive permutations were to provide the correct empiric distribution for our original test statistic $\hat T$, then the samples $\hat T^v_{\pi_1}, \ldots, \hat T^v_{\pi_L}$ should follow a $\chi^2_1$ distribution.
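The naive procedure just described amounts to shuffling the null-model residuals and adding them back to the fitted values. A minimal sketch of that step is given below; the function name and the toy fitted values and residuals are hypothetical, and the refitting of the LMM to each permuted phenotype is not shown.

```python
import random

def naive_permutations(fitted, resid, n_perm, rng):
    """Yield phenotype vectors built from shuffled null-model residuals."""
    for _ in range(n_perm):
        shuffled = list(resid)
        rng.shuffle(shuffled)            # ignores any correlation between subjects
        yield [f + e for f, e in zip(fitted, shuffled)]

fitted = [3.0, 3.0, 3.0, 3.0]            # X beta-hat under the null (toy values)
resid = [-1.5, 0.5, 0.25, 0.75]          # y - X beta-hat (toy values)
permuted = list(naive_permutations(fitted, resid, n_perm=5, rng=random.Random(0)))
for y_perm in permuted:
    print(y_perm)
```

Each permuted phenotype would then be analyzed with the same model as the original data to produce one naive permutation statistic.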

As shown in Figure 1, the empiric distribution clearly fails to follow the desired asymptotic distribution. The reason for this is that the permutations fail to maintain the correlation structure of the original phenotype data. As discussed above, this form of permutation would give an accurate distribution only when the true correlation matrix of the estimated residuals has an exchangeable structure. We can use the methods in the Appendix to quantify the inaccuracy of the empiric permutation distribution. We can model the statistics $\hat T^v_{\pi_1}, \ldots, \hat T^v_{\pi_L}$ as coming from the distribution $\eta\chi^2_1$ that results from assuming the incorrect correlation matrix $\Psi = I$ rather than the correlation matrix $\hat\Sigma_0$. Using the theoretically computed value of $\eta$, as given in the Appendix, and plotting $\hat T^v_{\pi_1}/\eta, \ldots, \hat T^v_{\pi_L}/\eta$ against a $\chi^2_1$ in Figure 1B, we see that the distributions match well. From this, we see that computing the significance of $\hat T$ from $\{\hat T^v_{\pi_i}\}$ would lead to an anticonservative estimated level of significance. For instance, to get nominal levels of significance of $10^{-4}$ and $10^{-5}$, the permuted distribution would select threshold levels of 12.7 and 16.3, respectively. Because the actual distribution of the test statistic is $\chi^2_1$, the observed type 1 error rates would be $3.7 \times 10^{-4}$ and $5.4 \times 10^{-5}$, respectively, a substantial inflation.
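The quoted error rates follow directly from the upper tail of a 1-df chi-squared distribution, which equals $\mathrm{erfc}(\sqrt{x/2})$. The short check below uses only the two thresholds given above; the function name is an assumption for this sketch.

```python
import math

def chi2_1df_sf(x):
    """Upper-tail probability of a chi-squared variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

for nominal, threshold in [(1e-4, 12.7), (1e-5, 16.3)]:
    actual = chi2_1df_sf(threshold)
    print(f"nominal {nominal:.0e}: actual {actual:.2e}, ratio {actual / nominal:.1f}")
```

Up to rounding, the tail probabilities at 12.7 and 16.3 should agree with the rates stated in the text.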

Figure 1.

QQ plots under naive phenotype residual permutations. In both plots the expected quantiles are for a $\chi^2_1$ distribution and the shaded area is the 95% confidence region. (A) The observed quantiles are the values of the test statistic under permutations of the trait values. (B) The observed quantiles are the values in (A) divided by the theoretical inflation factor.

Though it is not possible to do an exact permutation test when the residuals have a nonexchangeable correlation matrix, it is possible to do a valid permutation-based test. The approach (referred to here as MVNpermute) is described in Abney et al. [2002] and the Appendix, and it relies on the fact that there exists a linear transformation of the residuals that results in a vector (i.e., the transformed residuals) whose covariance matrix is proportional to the identity matrix, and is therefore exchangeable. Because MVNpermute is based on permutations of an invertible transformation of the phenotype residuals, all structure in the genotype data (e.g., linkage disequilibrium, allele frequencies) is preserved. Inverting the transformation following permutations then results in new simulated datasets that maintain the structure in the entire original data (i.e., phenotype correlations and genotype structure).
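The idea of transforming, permuting, and transforming back can be sketched in a few lines. The code below is an illustration only and is not the interface of the MVNpermute R package: it assumes the correlation matrix is known, uses a Cholesky factor as one possible decorrelating transformation (the Appendix may use a different one), and ignores refinements needed for estimated fixed effects. All names and toy values are hypothetical.

```python
import math
import random

def cholesky(a):
    """Lower-triangular C with C C^t = a, for a symmetric positive-definite a."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            c[i][j] = math.sqrt(s) if i == j else s / c[j][j]
    return c

def forward_solve(c, b):
    """Solve C x = b for lower-triangular C."""
    x = []
    for i, row in enumerate(c):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x

def decorrelated_permutation(resid, sigma, perm):
    """Whiten the residuals, reorder them by perm, and restore the correlation."""
    c = cholesky(sigma)
    white = forward_solve(c, resid)          # covariance proportional to identity
    shuffled = [white[k] for k in perm]
    return [sum(c[i][k] * shuffled[k] for k in range(i + 1)) for i in range(len(c))]

sigma = [
    [1.0, 0.4, 0.0, 0.0],
    [0.4, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.4],
    [0.0, 0.0, 0.4, 1.0],
]
fitted = [3.0, 3.0, 3.0, 3.0]                # X beta-hat under the null (toy values)
resid = [0.9, 0.4, -1.1, -0.2]               # y - X beta-hat (toy values)
rng = random.Random(2015)
perm = list(range(len(resid)))
rng.shuffle(perm)
new_resid = decorrelated_permutation(resid, sigma, perm)
y_perm = [f + e for f, e in zip(fitted, new_resid)]
print(y_perm)
```

Because only the phenotype residuals are transformed and permuted, the genotype data are left untouched, which is the property emphasized above.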

I repeated the above simulations but with the permutations generated using MVNpermute. This gave statistics $\hat T^M_{\pi_1}, \ldots, \hat T^M_{\pi_L}$, where the $M$ superscript indicates the use of MVNpermute. As shown in Figure 2, this results in statistics that follow the expected distribution. That is, by first decorrelating the residuals (ensuring exchangeability for a normally distributed trait), permutations allow us to estimate the proper threshold for a given false-positive rate. In practice, MVNpermute is not necessary to determine the P-value at a single SNP, but obtaining $L$ permutation-based datasets $\{y_{\pi_i}\}$ allows us to do an empiric multiple testing correction to determine genome-wide significance, for instance [Abney et al., 2002].

Figure 2.

QQ plot of the MVNpermute method. The observed quantiles are the values of the test statistic from 10,000 MVNpermute permutations, whereas the theoretical quantiles are those from a $\chi^2_1$ distribution. The shaded region is the 95% confidence bounds.

It is not unusual to realize that permuting the phenotypes (or rather the residuals) does not result in a valid permutation test when individuals are related. A possible alternative is to permute the genotypes instead of the phenotypes. A rationale is that the LMM inference is based on the conditional distribution of the phenotypes given the genotypes. As the phenotypes remain fixed while new genotypes get assigned to individuals via permutation, the correlation structure in the phenotypes is preserved, resulting in a valid permutation test. In fact, many statistics used in complex trait mapping assume a distribution that is conditional on the genotype data, even if the form of the test statistic distribution is not known. Permuting the genotype data, then, to estimate this distribution seems a natural approach.

To understand the consequences of a genotype permutation procedure on an arbitrary test statistic, let us first consider the standard ordinary least squares (OLS) statistic,

$$\tilde T = \frac{\tilde\gamma^2}{\widetilde{\mathrm{Var}}(\tilde\gamma)} \tag{5}$$
$$\begin{pmatrix}\tilde\beta \\ \tilde\gamma\end{pmatrix} = (M^tM)^{-1}M^ty \tag{6}$$
$$M = (X\;\, g), \tag{7}$$

where $\tilde{\cdot}$ indicates estimation with a scaled covariance matrix $\Psi$ rather than $\Sigma$ (in this case $\Psi = I$). That is, the statistic we use does not explicitly account for the polygenic effect. We may be aware that relatedness in our sample will result in $\tilde T$ not being $\chi^2_1$ distributed and, thus, perform genotype permutations to obtain the empiric distribution of $\tilde T$. To see if genotype permutations recover the correct distribution, I performed $L = 10^4$ permutations of the genotype data while keeping the phenotype data constant to obtain $\{\tilde T_{\pi_1}, \ldots, \tilde T_{\pi_L}\}$, where $\pi_1, \ldots, \pi_L$ index the permutations. I then compare this to a sample from the true null distribution that I obtain by performing $L$ gene dropping simulations, $\tilde T_1, \ldots, \tilde T_L$. The results in Figure 3 show that the distribution obtained by genotype permutations is highly deflated relative to the distribution from gene dropping. Using this approach to obtain an empiric threshold of significance would result in a highly inflated false-positive rate.

Figure 3.

QQ plot of the empiric null distribution for the OLS statistic against the expected null distribution. The expected null distribution is a sample obtained by doing gene dropping. The solid line is the y = x line.

The source of the problem with genotype permutations can be understood by returning to the notion of confounding. Because genotypes were not randomly assigned by the researcher to subjects, the absence of confounding is not guaranteed. In fact, the subject genotypes are correlated as a consequence of Mendelian segregation and all markers in the genome, whether causative or not, share the same pedigree for a given set of individuals. Hence, the covariance of the marker being tested is equal (up to a scalar constant) to the covariance of the polygenic effect. If genotypes are permuted, the similarity of the covariance structures of phenotype and genotype will not be preserved. Thus, I expect that the null distribution of an arbitrary test statistic, not just the OLS statistic, will not be correctly estimated by genotype permutations, in general.

Although the null distribution of an arbitrary test statistic cannot be inferred from genotype permutations, the null distribution of the GLS statistic can be. That is, if instead of using $\tilde T$ we use $\hat T$ as defined in Equation (3) and do the genotype permutation procedure as described above, we find that the permutation distribution of $\hat T$ matches the distribution obtained by gene dropping (data not shown). We can understand this result by looking at the definition of the GLS statistic $\hat T$. We can view the GLS statistic as the OLS statistic computed on the data following a decorrelation step. That is, if we define $\Sigma^{1/2}$ as the symmetric square root matrix of $\Sigma$ and $z = \Sigma^{-1/2}y$, $W = \Sigma^{-1/2}X$, $f = \Sigma^{-1/2}g$, $\varepsilon = \Sigma^{-1/2}e$, then we obtain the linear model $z = W\beta + f\gamma + \varepsilon$, with $\varepsilon \sim \mathrm{MVN}(0, I\sigma^2)$. The GLS statistic on the original data is equivalent to the OLS statistic on the decorrelated data $z$. Because our new trait data $z$ are normally distributed and uncorrelated, they are independent and can no longer be confounded with the genotypes under the null hypothesis. In the absence of confounding, then, permuting the genotypes recovers the true null distribution of the test statistic $\hat T$.
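The equivalence between the GLS statistic and the OLS statistic on decorrelated data can be verified in a couple of lines. The derivation below is a sketch for the no-covariate case with centered data, using only the symmetry of $\Sigma^{-1/2}$; the subscripts OLS and GLS are labels introduced here.

```latex
\begin{aligned}
\mathrm{Cov}(\varepsilon)
  &= \Sigma^{-1/2}\,(\Sigma\sigma^2)\,\Sigma^{-1/2} = I\sigma^2, \\
\hat\gamma_{\mathrm{OLS}}(z, f)
  &= \frac{f^t z}{f^t f}
   = \frac{g^t\Sigma^{-1/2}\Sigma^{-1/2}y}{g^t\Sigma^{-1/2}\Sigma^{-1/2}g}
   = \frac{g^t\Sigma^{-1}y}{g^t\Sigma^{-1}g}
   = \hat\gamma_{\mathrm{GLS}}(y, g).
\end{aligned}
```

The same substitution applied to the residual sum of squares gives $(z - f\hat\gamma)^t(z - f\hat\gamma) = (y - g\hat\gamma)^t\Sigma^{-1}(y - g\hat\gamma)$, so the variance estimates, and hence the statistics, coincide as well.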

Software

The MVNpermute algorithm is implemented in the R programming language and is available for download from the Comprehensive R Archive Network (http://cran.r-project.org) as the “MVNpermute” package.

Discussion

The fundamental challenge with performing a permutation test is ensuring exchangeability in the permuted quantities. In a genetic association test, it is generally not possible to do an exact permutation test when the trait under study has a polygenic component. The reason is that confounding due to population structure exists between the genotype being tested and the unknown polygenic effect, both of which have similar covariance structures. Only when all individuals are equally related, as in an F2 cross [Churchill and Doerge, 1994], will a naive permutation approach obtain the correct type 1 error rate. Nevertheless, with an accurate estimate of the trait covariance structure, it may be possible to remove the confounding and perform a valid permutation test. I have described an approach we have previously proposed [Abney et al., 2002] for removing the correlation in the phenotype residuals and shown that it generates the correct null distribution. Strictly, the method is valid when the phenotype data are multivariate normally distributed, where removing the correlation is sufficient to ensure exchangeability. Another permutation approach was proposed by Aulchenko et al. [2007]. They estimate the polygenic effect and obtain estimates of the residual error term. Although, under multivariate normality, the residual errors in Equation (1) are exchangeable, the estimated residual errors, in general, will not be. Nevertheless, this may be a case of “close enough,” allowing for a reasonably accurate estimate of significance thresholds, though I have not investigated this question.

Other resampling strategies are possible, though they also have limitations. Gene dropping is one such approach. In this strategy one simulates the Mendelian segregation of the founder genotypes through all descendants. Because Mendelian segregation is random and independent of the phenotype, it provides a valid distribution of the test statistic under the null hypothesis. The primary difficulties with gene dropping are the need for a complete pedigree and knowledge of the founder genotypes. If the pedigree is known, but the founder genotypes are not, it may be possible to reconstruct, or simply guess, them from the available data. Doing so, however, runs the risk of introducing unknown biases as the observed genotypes may be confounded with the phenotypes. On the other hand, if the pedigree is not known, gene dropping is simply not feasible.

Instead of gene dropping we might try to permute genotypes, leaving the covariance structure of the phenotype intact. As discussed above, for an arbitrary test statistic this does not necessarily result in a valid test, as permutations of the genotypes will not preserve their covariance structure. In addition, applying a “decorrelating” transformation to the genotypes is not sufficient to ensure their exchangeability because, unlike the multivariate normal distribution of the phenotype, the joint distribution of the genotypes has higher-order dependencies. Nevertheless, in the case of a multivariate normal phenotype being analyzed with a linear mixed model, the standard test statistic naturally transforms the trait data to being independent. Once the trait data are independent, any set of dependent or independent genotypes, including permutations of the original ones, can be used to recover the correct null distribution of the test statistic. This approach has been used in mouse cross data to obtain proper genome-wide significance levels [Cheng et al., 2010; Cheng and Palmer, 2013], and more recently in humans [Zhang et al., 2014]. It seems likely that any test statistic that removes the correlation in the phenotype data and does not depend on Mendelian segregation under the null hypothesis would allow genotype permutations to be valid, though I have not investigated this further. An additional caveat arises, however, with genotype permutations when there are other covariates. In particular, if any of the covariates are associated with the genotype, genotype permutations may estimate the incorrect null distribution. That is, they will give the null distribution for when the covariate and genotype are not associated rather than for when they are. This may arise, for instance, when a covariate is itself a genetic trait, or when it is a principal component vector obtained from a population structure analysis. It may also arise when testing effects such as gene-by-environment interaction. In this situation the null model has a nonzero genetic main effect. In general, an approximate permutation test of interaction effects is done by computing and permuting outcome residuals [Anderson and Ter Braak, 2003], as done by MVNpermute. Additional work is needed to understand the effectiveness and validity of MVNpermute and genotype permutations in the presence of genetic interaction terms, and of genotype permutations when other genetic predictors are in the null model.

Another resampling strategy is the parametric bootstrap. In this approach one assumes the phenotypes follow a particular parametric distribution with parameter values equal to those estimated from the observed data under the null hypothesis. Samples are then drawn from this distribution and a test statistic is computed for each sample, thus obtaining an empirical null distribution. For instance, one might assume the phenotype follows a multivariate normal distribution with fixed effect parameters and variance components estimated by maximum likelihood under the null model. Drawing many phenotypes from this distribution and testing the genotype at an SNP against each randomly drawn phenotype provides a null distribution for the test statistic. This approach relies on the parametric distribution accurately representing the observed data. Insofar as the data deviate from the assumed distribution, biases in the estimated significance threshold may ensue. A true permutation test has the advantage of not needing to make such parametric assumptions. The MVNpermute method also relies on a distributional assumption, namely that exchangeability under the null is determined by the structure of the correlation matrix. Intuitively, this assumption appears weaker than those used in a parametric bootstrap, suggesting that there may be greater robustness to the permutation-based approach, though I have not investigated this question.
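
The parametric bootstrap described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an implementation from any published package: the names `parametric_bootstrap_null`, `beta0_hat`, and `omega_hat` are hypothetical, the null-model estimates are supplied by the caller rather than refit, and the covariance matrix is treated as known when forming the Wald statistic.

```python
import numpy as np

def parametric_bootstrap_null(X, g, beta0_hat, omega_hat, n_draws=1000, seed=0):
    """Empirical null distribution of a Wald statistic for genotype g.

    Assumptions (illustrative only): beta0_hat and omega_hat are null-model
    estimates supplied by the caller, and omega_hat is treated as the known
    covariance matrix of the phenotype.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    L = np.linalg.cholesky(omega_hat)              # omega_hat = L @ L.T
    M = np.column_stack([X, g])                    # design: covariates, then genotype
    Mw = np.linalg.solve(L, M)                     # whitened design
    var_gamma = np.linalg.inv(Mw.T @ Mw)[-1, -1]   # variance of the genotype effect
    mean = X @ beta0_hat
    stats = np.empty(n_draws)
    for b in range(n_draws):
        y_star = mean + L @ rng.standard_normal(n)     # draw from N(X beta0, omega)
        z = np.linalg.solve(L, y_star)                 # whitened phenotype
        coef = np.linalg.lstsq(Mw, z, rcond=None)[0]   # GLS fit via whitened OLS
        stats[b] = coef[-1] ** 2 / var_gamma
    return stats

# Small synthetic example; every value here is made up for illustration.
rng = np.random.default_rng(1)
n = 60
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
g = rng.binomial(2, 0.3, size=n).astype(float)
omega_hat = 0.9 * np.eye(n) + 0.1 * np.ones((n, n))
null_stats = parametric_bootstrap_null(X, g, np.array([1.0, 0.5]), omega_hat)
threshold = np.quantile(null_stats, 0.95)   # empirical 5% critical value
```

Because the draws come from the assumed multivariate normal model itself, the resulting threshold is only as trustworthy as that model, which is the limitation noted above.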

The analyses I performed here were based on using a statistic known to asymptotically follow a $\chi^2_1$ distribution. This allowed me to easily show the invalidity of particular permutation procedures. In practice, one would not need a permutation test for such a statistic, but the lessons extend to other statistics as well. For instance, we might want to determine statistical significance after correcting for multiple correlated tests, as when doing a genome-wide scan or a scan over a smaller region. In this case the statistic of interest would be the maximum over all statistics in the scan. Similarly, statistics that jointly combine information across SNPs or use phenotype-dependent weights, for which there may not be a clear generative model, may not have a known distribution under the null hypothesis. Situations such as these would benefit from a permutation test, if one exists. The presence of polygenic variation may make a true permutation test difficult or impossible, but a permutation-based test may be achievable by carefully considering the sources of correlation, or nonexchangeability, in the data. Hopefully, the examples and discussion I provided here will help bring insight into the development of such tests.
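
For the maximum-statistic case mentioned above, one common way to convert permutation-based statistics into multiplicity-adjusted p-values can be sketched as follows. The function name `max_stat_adjusted_pvalues` and the add-one correction in the p-value are illustrative assumptions, and the sketch presumes that each row of `perm_stats` was computed from a validly generated permutation-based dataset.

```python
import numpy as np

def max_stat_adjusted_pvalues(obs_stats, perm_stats):
    """Adjusted p-values from the permutation distribution of the maximum.

    obs_stats  : length-m sequence of observed statistics, one per SNP.
    perm_stats : (B, m) array-like; row b holds the m statistics computed
                 on the b-th permutation-based dataset.
    """
    obs_stats = np.asarray(obs_stats, dtype=float)
    perm_stats = np.asarray(perm_stats, dtype=float)
    max_null = perm_stats.max(axis=1)    # one maximum per permuted dataset
    # Count permuted maxima that are at least as large as each observed statistic.
    exceed = (max_null[None, :] >= obs_stats[:, None]).sum(axis=1)
    return (1 + exceed) / (1 + max_null.size)

obs = [1.0, 5.0, 10.0]
perm = [[2.0, 3.0, 1.0],
        [4.0, 1.0, 0.0],
        [6.0, 2.0, 1.0],
        [0.5, 0.2, 0.1]]
adjusted = max_stat_adjusted_pvalues(obs, perm)   # array([0.8, 0.4, 0.2])
```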

Acknowledgments

I would like to thank Abraham Palmer, Riyan Cheng, Peter Carbonetto, and Lei Sun for repeatedly raising this topic with me and motivating me to write this manuscript. Peter Carbonetto also provided helpful comments on the MVNpermute R code. This work was supported by NIH grant HG002899.

Appendix A

Distribution with a Misspecified Covariance Matrix

The LMM of the main text is $y = X\beta + g\gamma + e$, where $e \sim N(0, \Sigma\sigma^2)$. The asymptotic distribution of the test statistic $\hat{T} = \hat{\gamma}^2/\widehat{\mathrm{Var}}(\hat{\gamma})$ is $\chi^2_1$ under the null hypothesis, but only if we use the correct scaled covariance matrix $\Sigma$ in our estimate $\hat{\gamma}$ and its variance. If we misspecify this matrix, the distribution of the test statistic becomes a scaled chi-squared distribution $\eta\cdot\chi^2_1$, with the scalar $\eta$ depending on the amount of misspecification. To determine $\eta$ we can derive the test statistic and its distribution assuming that we have used the incorrect covariance matrix $\Psi\sigma^2$ in place of the correct $\Sigma\sigma^2$. First, we obtain the BLUE for $\gamma$. If we let $\Psi^{1/2}$ be the symmetric positive definite square root matrix of $\Psi$ and define

$$z = \Psi^{-1/2}y, \quad W = \Psi^{-1/2}X, \quad f = \Psi^{-1/2}g, \quad \varepsilon = \Psi^{-1/2}e, \quad H_W = W(W^tW)^{-1}W^t, \tag{A.1}$$

then the BLUE for γ is

$$\tilde{\gamma} = [f^t(I-H_W)f]^{-1}f^t(I-H_W)z,$$

where ·̃ indicates an estimate using Ψ rather than Σ. The variance of this estimator is

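$$\mathrm{Var}(\tilde{\gamma}) = [f^t(I-H_W)f]^{-1}f^t(I-H_W)\Theta(I-H_W)f\,[f^t(I-H_W)f]^{-1}\sigma^2, \tag{A.2}$$

where $\mathrm{Var}(z) = \Theta\sigma^2 = \Psi^{-1/2}\Sigma\Psi^{-1/2}\sigma^2$. The statistic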

$$T = \frac{\tilde{\gamma}^2}{\mathrm{Var}(\tilde{\gamma})} = \frac{(f^t(I-H_W)z)^2}{f^t(I-H_W)\Theta(I-H_W)f\,\sigma^2} \tag{A.3}$$

is $\chi^2_1$ because $z$ is multivariate normal. The statistic $T$, however, is not an adequate test statistic because it depends on the unknown matrix $\Sigma$ and the unknown parameter $\sigma^2$. In practice we use the test statistic

$$\tilde{T} = \frac{\tilde{\gamma}^2}{\widehat{\mathrm{Var}}(\tilde{\gamma})} = \frac{(f^t(I-H_W)z)^2}{f^t(I-H_W)f\,S^2}, \tag{A.4}$$

where in place of $\sigma^2$ we have the sample variance

$$S^2 = \frac{1}{n-p-1}(z-\hat{z})^t(z-\hat{z}),$$

where $\hat{z} = H_Az = A(A^tA)^{-1}A^tz$ and $A = (W\ f) = \Psi^{-1/2}M$, $M = (X\ g)$. It is straightforward to show that if $\Psi = \Sigma$ then the expectation $E(S^2) = \sigma^2$. In general, however, we have

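$$E(S^2) = \frac{1}{n-p-1}\mathrm{Tr}\big((I-H_A)\Theta\big)\sigma^2 = \frac{\sigma^2}{n-p-1}\Big[\mathrm{Tr}(\Sigma\Psi^{-1}) - \mathrm{Tr}\big([M^t\Psi^{-1}M]^{-1}M^t\Psi^{-1}\Sigma\Psi^{-1}M\big)\Big]. \tag{A.5}$$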

The result of misspecifying the covariance matrix is given by the following lemma.

Lemma 1

Let $y \sim N(X\beta + g\gamma, \Sigma\sigma^2)$ with $\Sigma$ nonnegative definite and let $\Psi$ be some symmetric nonnegative definite matrix. Define $\Theta = \Psi^{-1/2}\Sigma\Psi^{-1/2}$ and $M = (X\ g)$, with $z$, $f$, $H_W$ as in Equation (A.1) and $\tilde{T}$ as in Equation (A.4). If $\lambda_1(\Sigma\Psi^{-1}) = o(n^{1/2})$, where $\lambda_1(\Sigma\Psi^{-1})$ is the largest eigenvalue of the matrix $\Sigma\Psi^{-1}$ (equivalently the largest eigenvalue of $\Theta$), then as $n \to \infty$

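$$\tilde{T} \to \eta\chi^2_1 \ \text{in distribution, where} \quad \eta = \frac{(n-p-1)\,f^t(I-H_W)\Theta(I-H_W)f}{f^t(I-H_W)f\,\Big[\mathrm{Tr}(\Sigma\Psi^{-1}) - \mathrm{Tr}\big([M^t\Psi^{-1}M]^{-1}M^t\Psi^{-1}\Sigma\Psi^{-1}M\big)\Big]}. \tag{A.6}$$

Proof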

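We can write $\tilde{T} = \big(\tilde{\gamma}^2/\mathrm{Var}(\tilde{\gamma})\big) \times \big(\mathrm{Var}(\tilde{\gamma})/\widehat{\mathrm{Var}}(\tilde{\gamma})\big)$, which is the product of a $\chi^2_1$ random variable, as given in Equation (A.3), and the ratio of the true to estimated variance of $\tilde{\gamma}$. The true variance is given by Equation (A.2), but the estimated variance is

$$\widehat{\mathrm{Var}}(\tilde{\gamma}) = [f^t(I-H_W)f]^{-1}S^2, \quad \text{where} \quad S^2 = \frac{1}{n-p-1}z^t(I-H_A)z.$$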

Thus, the test statistic is

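$$\tilde{T} = \frac{\tilde{\gamma}^2}{\mathrm{Var}(\tilde{\gamma})} \times \frac{f^t(I-H_W)\Theta(I-H_W)f}{f^t(I-H_W)f} \times \frac{\sigma^2}{S^2},$$

the product of a $\chi^2_1$ random variable and the quantity

$$\eta = \frac{f^t(I-H_W)\Theta(I-H_W)f}{f^t(I-H_W)f} \times \frac{\sigma^2}{S^2}.$$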

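If $\mathrm{Var}(S^2) \to 0$ as $n \to \infty$, then $\sigma^2/S^2 \to \sigma^2/E(S^2)$ in probability, where $E(S^2)$ is given by Equation (A.5). Thus, we obtain Equation (A.6) of Lemma 1 when $\mathrm{Var}(S^2) \to 0$. We can obtain a sufficient condition for $\mathrm{Var}(S^2) \to 0$ by considering

$$\mathrm{Var}(S^2) = \frac{1}{(n-p-1)^2}\mathrm{Var}\big(z^t(I-H_A)z\big) = \frac{2\sigma^4}{(n-p-1)^2}\mathrm{Tr}\big(\Theta(I-H_A)\Theta(I-H_A)\big).$$

Thus, $\mathrm{Var}(S^2) \to 0$ when $\mathrm{Tr}\big(\Theta(I-H_A)\Theta(I-H_A)\big) = o(n^2)$.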

Now, consider the following eigenvalue result [Zhang, 2011, Theorem 8.12, p. 274]. Let $\lambda_i(P)$ be the $i$th eigenvalue of some $n \times n$ matrix $P$, ordered such that $\lambda_1(P) \geq \lambda_2(P) \geq \dots \geq \lambda_n(P)$. Then, for any $n \times n$ nonnegative definite Hermitian matrices $P$, $Q$,

$$\lambda_i(P)\lambda_n(Q) \leq \lambda_i(PQ) \leq \lambda_i(P)\lambda_1(Q).$$

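It follows that $\lambda_i(\Theta[I-H_A]) \leq \lambda_i(\Theta)$ because $\Theta$ is symmetric, nonnegative definite and $I-H_A$ is symmetric and idempotent with all eigenvalues equal to 0 or 1. Furthermore, $\Theta[I-H_A]$ is nonnegative definite because $\lambda_i(\Theta[I-H_A]) \geq \lambda_i(\Theta)\lambda_n(I-H_A) \geq 0$, and thus $\lambda_i(\Theta[I-H_A])^2 \leq \lambda_i(\Theta)^2$. Recalling that the trace of a matrix is the sum of the eigenvalues, we have $\mathrm{Tr}[\Theta(I-H_A)\Theta(I-H_A)] \leq \mathrm{Tr}[\Theta^2]$. If we define $B = \Sigma\Psi^{-1}$, then $\mathrm{Tr}(\Theta^2) = \mathrm{Tr}(B^2)$. Hence,

$$\mathrm{Var}(S^2) = \frac{2\sigma^4}{(n-p-1)^2}\mathrm{Tr}\big(\Theta(I-H_A)\Theta(I-H_A)\big) \leq \frac{2\sigma^4}{(n-p-1)^2}\mathrm{Tr}(B^2) \leq \frac{2\sigma^4}{(n-p-1)^2}\,n[\lambda_1(B)]^2.$$

Thus, a sufficient condition for $\mathrm{Var}(S^2) \to 0$, and hence $\sigma^2/S^2 \to \sigma^2/E(S^2)$ in probability, is $\lambda_1(\Sigma\Psi^{-1}) = o(n^{1/2})$.

Note that a possibly tighter sufficient condition is $\mathrm{Tr}(B^2) = \sum_{i=1}^{n}\lambda_i(B)^2 = o(n^2)$.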
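
The scalar η of Equation (A.6) can be evaluated numerically for a given design, true covariance, and assumed covariance. The sketch below is one way to do so with NumPy; the function names (`inv_sqrt_sym`, `residual_projector`, `inflation_factor`) and the synthetic sib-pair-like covariance are hypothetical choices made for illustration, and the denominator uses the trace form Tr((I − H_A)Θ) from Equation (A.5).

```python
import numpy as np

def inv_sqrt_sym(psi):
    """Symmetric inverse square root Psi^{-1/2} of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(psi)
    return (vecs / np.sqrt(vals)) @ vecs.T

def residual_projector(B):
    """I - B (B^t B)^{-1} B^t for a matrix B with full column rank."""
    return np.eye(B.shape[0]) - B @ np.linalg.solve(B.T @ B, B.T)

def inflation_factor(X, g, sigma, psi):
    """Evaluate eta for true covariance sigma and assumed covariance psi."""
    n, p = X.shape
    psi_ih = inv_sqrt_sym(psi)
    W, f = psi_ih @ X, psi_ih @ g
    theta = psi_ih @ sigma @ psi_ih
    r = residual_projector(W) @ f                         # (I - H_W) f
    A = np.column_stack([W, f])
    trace_term = np.trace(residual_projector(A) @ theta)  # Tr((I - H_A) Theta)
    # f @ r equals f^t (I - H_W) f because the projector is idempotent.
    return (n - p - 1) * (r @ theta @ r) / ((f @ r) * trace_term)

# Synthetic example: 20 sib-pair-like blocks, intercept-only covariates.
rng = np.random.default_rng(0)
n = 40
X = np.ones((n, 1))
g = rng.binomial(2, 0.4, size=n).astype(float)
kin = np.kron(np.eye(n // 2), np.array([[1.0, 0.5], [0.5, 1.0]]))
sigma = 0.6 * kin + 0.4 * np.eye(n)
eta_correct = inflation_factor(X, g, sigma, sigma)      # 1 up to rounding
eta_ignored = inflation_factor(X, g, sigma, np.eye(n))  # polygenic covariance ignored
```

When Ψ equals Σ, or is proportional to it, Θ is a multiple of the identity and the expression reduces to 1, which gives a quick sanity check on any implementation.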

Exchangeability of the Multivariate Normal Distribution

Given a random vector y = (y1, …, yn) distributed as a multivariate normal f(y) = N(μ, Σ), under what conditions are the elements of y exchangeable? If we let P be a permutation matrix so that Py is a permutation of the elements of y, then y is exchangeable when f(y) = f(Py) for every permutation matrix P [Bernardo and Smith, 2000, Sec. 4.2]. Taking the log of both sides, this reduces to,

$$(y-\mu)^t\Sigma^{-1}(y-\mu) = (Py-\mu)^t\Sigma^{-1}(Py-\mu),$$

which implies the condition,

$$y^t(P^t\Sigma^{-1}P - \Sigma^{-1})y - 2\mu^t\Sigma^{-1}(Py-y) = 0. \tag{A.7}$$

In order for Equation (A.7) to hold for every $y$ and $P$, each of the two terms must be zero. If the first term is to be zero for all $y$ and $P$, then $P^t\Sigma^{-1}P = \Sigma^{-1}$. This condition is met if and only if all the diagonal elements $\Sigma_{ii} = v$, for some constant $v$, and all the off-diagonals $\Sigma_{ij} = v\rho$, for some constant $\rho$. To set the second term to zero we first note that for some vector $w$, the condition $w^t(Py-y) = 0$ for all $y$ and $P$ implies $w = \alpha\mathbf{1}$, for some constant $\alpha$ and where $\mathbf{1} = (1, \dots, 1)^t$. Hence, given the structure of $\Sigma$ we already determined, the second term is zero whenever $\mu = (\mu, \dots, \mu)^t$ for some constant $\mu$.
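
The condition on the covariance matrix can be checked numerically. The sketch below tests whether the precision matrix is unchanged by every adjacent transposition, which is enough because adjacent transpositions generate all permutations. The function name `precision_is_permutation_invariant` and the two example matrices are hypothetical and chosen only for illustration.

```python
import numpy as np

def precision_is_permutation_invariant(sigma, tol=1e-10):
    """Check P^t Sigma^{-1} P == Sigma^{-1} for all adjacent transpositions P."""
    n = sigma.shape[0]
    prec = np.linalg.inv(sigma)
    for i in range(n - 1):
        idx = np.arange(n)
        idx[[i, i + 1]] = idx[[i + 1, i]]   # swap positions i and i + 1
        # Reordering rows and columns by idx gives P^t Sigma^{-1} P.
        if not np.allclose(prec[np.ix_(idx, idx)], prec, atol=tol):
            return False
    return True

# Constant diagonal and constant off-diagonal: exchangeable structure.
n, v, rho = 5, 2.0, 0.3
compound = v * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))

# One correlated pair among otherwise independent individuals: not exchangeable.
sib_pair = np.eye(4)
sib_pair[0, 1] = sib_pair[1, 0] = 0.5

print(precision_is_permutation_invariant(compound))  # True
print(precision_is_permutation_invariant(sib_pair))  # False
```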

In the text, the null model corresponds to $y \sim N(X\beta, \Sigma)$. Let us assume that $\Sigma$ has an exchangeable structure. In general, however, the vector $X\beta \neq \mu\mathbf{1}$ for any fixed $\mu$, and $y$ is still not exchangeable. The vector $e = y - X\beta$, though, does satisfy the requirements for exchangeability, and a permutation test can be based on permutations of the residuals. In practice, the vector $\beta$ is unknown and must be estimated, resulting in estimated residuals $\hat{e}$. Permuting the estimated residuals, then, results in only an asymptotically exact permutation test when $\Sigma$ has an exchangeable structure.

MVNpermute Algorithm

The permutation-based algorithm was originally presented in Abney et al. [2002, pp. 926–927] and I review it here for completeness. Assume the outcome $y$ follows the model as given in Equation (1) in the main text, and let the errors $e$ have known covariance matrix $\Omega = \Sigma\sigma^2$. In practice, this matrix may not be known, in which case a consistent estimator will maintain the asymptotic properties of the permutation-based procedure. For instance, a maximum-likelihood estimate under the null model ($\gamma = 0$) could be used. The residuals under the null model $\hat{e}_0 = y - X\hat{\beta}_0$, where $\hat{\beta}_0 = (X^t\Omega^{-1}X)^{-1}X^t\Omega^{-1}y$, have covariance matrix $V^* = \Omega - X(X^t\Omega^{-1}X)^{-1}X^t$.

The goal, then, is to transform the residuals, which are not exchangeable, to a new vector whose elements are exchangeable. We can accomplish this by premultiplying Equation (1) by $C^{-t}$, where $C$ is given by the Cholesky decomposition $\Omega = C^tC$. The resulting model under the null hypothesis $\gamma = 0$ is $z = W\beta + \varepsilon$, where $z = C^{-t}y$, $W = C^{-t}X$, and $\varepsilon = C^{-t}e$. The covariance matrix of the residuals $\hat{\varepsilon} = z - W\hat{\beta}_0$ is $V = I - W(W^tW)^{-1}W^t$. Note that $V$ is symmetric and idempotent (i.e., $V^2 = V$) and if $X$ is of rank $p$ then $V$ has rank $n - p$. By the spectral theorem we can make the decomposition $V = U\Lambda U^t$, where $\Lambda$ is a diagonal matrix with the first $n - p$ elements equal to the eigenvalue 1 and the last $p$ elements equal to the eigenvalue 0, and $U$ is the matrix whose columns are eigenvectors. Let $U = (U_1\ U_0)$, where $U_1$ is the matrix whose $n - p$ columns are the eigenvectors associated with eigenvalue 1. Then, we have $V = U_1U_1^t$ and $U_1^tU_1 = I_{n-p}$. The vector $\xi = U_1^t\hat{\varepsilon}$ has covariance matrix $U_1^tVU_1 = I_{n-p}$ and its elements, under the assumption of multivariate normality of the residuals, are exchangeable. The elements of $\xi$ are now permuted to obtain $\xi^{\pi} = P\xi$, where $P$ is a permutation matrix, and then transformed by $U_1$ to get $\hat{\varepsilon}_{\pi} = U_1\xi^{\pi}$. Note that I use the convention that “π” used as a superscript denotes that the variable is permuted, whereas “π” used as a subscript denotes that the variable is derived from permuted and nonpermuted quantities. A new shuffled dataset obtained from the permutation is

$$y_{\pi} = X\hat{\beta}_0 + C^t\hat{\varepsilon}_{\pi} = X\hat{\beta}_0 + C^tU_1PU_1^tC^{-t}\hat{e}_0.$$

The MVNpermute algorithm is coded as an R function that takes as input the outcome vector y, matrix of covariates X, assumed covariance matrix Ω, and the desired number of permutations. The output is a matrix with columns being the permutation-based outcome vectors. The MVNpermute function is available as a download from CRAN.
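
The steps of the algorithm can be sketched in Python with NumPy as follows. This is a minimal illustration, not the R implementation distributed on CRAN: the function name `mvn_permute`, its arguments, and the synthetic data are assumptions made for the example. NumPy's Cholesky factor is lower triangular ($\Omega = LL^t$), so $L$ plays the role of $C^t$, and the sketch obtains $U_1$ from a complete QR factorization of $W$ rather than from an eigendecomposition of $V$; any orthonormal basis of the orthogonal complement of the column space of $W$ serves the same purpose.

```python
import numpy as np

def mvn_permute(y, X, omega, n_perm, seed=0):
    """Return an (n, n_perm) array of permutation-based outcome vectors.

    Illustrative sketch: omega is the assumed covariance of the errors and
    X must have full column rank.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    L = np.linalg.cholesky(omega)                   # omega = L @ L.T, so L = C^t
    z = np.linalg.solve(L, y)                       # z = C^{-t} y
    W = np.linalg.solve(L, X)                       # W = C^{-t} X
    beta0 = np.linalg.lstsq(W, z, rcond=None)[0]    # null-model GLS estimate
    eps_hat = z - W @ beta0                         # decorrelated residuals
    Q, _ = np.linalg.qr(W, mode="complete")
    U1 = Q[:, p:]                                   # n - p columns orthogonal to W
    xi = U1.T @ eps_hat                             # exchangeable under normality
    fitted = X @ beta0
    out = np.empty((n, n_perm))
    for k in range(n_perm):
        xi_perm = rng.permutation(xi)               # xi^pi = P xi
        out[:, k] = fitted + L @ (U1 @ xi_perm)     # X beta0 + C^t U1 xi^pi
    return out

# Synthetic example; all values are made up for illustration.
rng = np.random.default_rng(1)
n = 30
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
A = rng.standard_normal((n, n))
omega = A @ A.T / n + np.eye(n)
y = X @ np.array([1.0, 0.5]) + np.linalg.cholesky(omega) @ rng.standard_normal(n)
y_perm = mvn_permute(y, X, omega, n_perm=5, seed=2)   # shape (30, 5)
```

A useful check on such a sketch is that refitting the null model to any column of the output returns the same estimate of β and the same residual sum of squares in the decorrelated space as the original data, since the permutation only rearranges coordinates within the residual subspace.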

Footnotes

The author has no conflicts of interest to declare.

References

  1. Abney M, McPeek MS, Ober C. Estimation of variance components of quantitative traits in inbred populations. Am J Hum Genet. 2000;66:629–650. doi: 10.1086/302759.
  2. Abney M, Ober C, McPeek MS. Quantitative-trait homozygosity and association mapping and empirical genomewide significance in large, complex pedigrees: fasting serum-insulin level in the Hutterites. Am J Hum Genet. 2002;70:920–934. doi: 10.1086/339705.
  3. Allison DB, Heo M, Kaplan N, Martin ER. Sibling-based tests of linkage and association for quantitative traits. Am J Hum Genet. 1999;64:1754–1763. doi: 10.1086/302404.
  4. Anderson M, Robinson J. Permutation tests for linear models. Aust N Z J Stat. 2001;43:75–88.
  5. Anderson MJ, Ter Braak CJF. Permutation tests for multi-factorial analysis of variance. J Stat Comput Simul. 2003;73:85–113.
  6. Aulchenko YS, de Koning DJ, Haley C. Genomewide rapid association using mixed model and regression: a fast and simple method for genomewide pedigree-based quantitative trait loci association analysis. Genetics. 2007;177:577–585. doi: 10.1534/genetics.107.075614.
  7. Bacanu SA, Devlin B, Roeder K. Association studies for quantitative traits in structured populations. Genet Epidemiol. 2002;22:78–93. doi: 10.1002/gepi.1045.
  8. Basu S, Pan W. Comparison of statistical tests for disease association with rare variants. Genet Epidemiol. 2011;35:606–619. doi: 10.1002/gepi.20609.
  9. Bernardo JM, Smith AFM. Bayesian Theory. New York: Wiley; 2000.
  10. Bourgain C, Genin E. Complex trait mapping in isolated populations: are specific statistical methods required? Eur J Hum Genet. 2005;13:698–706. doi: 10.1038/sj.ejhg.5201400.
  11. Cheng R, Palmer AA. A simulation study of permutation, bootstrap and gene dropping for assessing statistical significance in the case of unequal relatedness. Genetics. 2013;193:1015–1018. doi: 10.1534/genetics.112.146332.
  12. Cheng R, Lim JE, Samocha KE, Sokoloff G, Abney M, Skol AD, Palmer AA. Genome-wide association studies and the problem of relatedness among advanced intercross lines and other highly recombinant populations. Genetics. 2010;185:1033–1044. doi: 10.1534/genetics.110.116863.
  13. Churchill GA, Doerge RW. Empirical threshold values for quantitative trait mapping. Genetics. 1994;138:963–971. doi: 10.1093/genetics/138.3.963.
  14. Churchill GA, Doerge RW. Naive application of permutation testing leads to inflated type I error rates. Genetics. 2008;178:609–610. doi: 10.1534/genetics.107.074609.
  15. Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55:997–1004. doi: 10.1111/j.0006-341x.1999.00997.x.
  16. Epstein MP, Duncan R, Jiang Y, Conneely KN, Allen AS, Satten GA. A permutation procedure to correct for confounders in case-control studies, including tests of rare variation. Am J Hum Genet. 2012;91:215–223. doi: 10.1016/j.ajhg.2012.06.004.
  17. Fang S, Sha Q, Zhang S. Two adaptive weighting methods to test for rare variant associations in family-based designs. Genet Epidemiol. 2012;36:499–507. doi: 10.1002/gepi.21646.
  18. Greco B, Luedtke A, Hainline A, Alvarez C, Beck A, Tintle NL. Application of family-based tests of association for rare variants to pathways. BMC Proc. 2014;8(Suppl 1):S105. doi: 10.1186/1753-6561-8-S1-S105.
  19. Kazma R, Bailey JN. Population-based and family-based designs to analyze rare variants in complex diseases. Genet Epidemiol. 2011;35(Suppl 1):S41–S47. doi: 10.1002/gepi.20648.
  20. Lange K. Central limit theorems for pedigrees. J Math Biol. 1978;6:59–66.
  21. Lin DY, Tang ZZ. A general framework for detecting disease associations with rare variants in sequencing studies. Am J Hum Genet. 2011;89:354–367. doi: 10.1016/j.ajhg.2011.07.015.
  22. Liu Q, Nicolae DL, Chen LS. Marbled inflation from population structure in gene-based association studies with rare variants. Genet Epidemiol. 2013;37:286–292. doi: 10.1002/gepi.21714.
  23. Newman DL, Abney M, McPeek MS, Ober C, Cox NJ. The importance of genealogy in determining genetic associations with complex traits. Am J Hum Genet. 2001;69:1146–1148. doi: 10.1086/323659.
  24. Peirce JL, Broman KW, Lu L, Chesler EJ, Zhou G, Airey DC, Birmingham AE, Williams RW. Genome reshuffling for advanced intercross permutation (GRAIP): simulation and permutation for advanced intercross population analysis. PLoS One. 2008;3:e1977. doi: 10.1371/journal.pone.0001977.
  25. Sha Q, Wang X, Wang X, Zhang S. Detecting association of rare and common variants by testing an optimally weighted combination of variants. Genet Epidemiol. 2012;36:561–571. doi: 10.1002/gepi.21649.
  26. Tintle N, Aschard H, Hu I, Nock N, Wang H, Pugh E. Inflated type I error rates when using aggregation methods to analyze rare variants in the 1000 genomes project exon sequencing data in unrelated individuals: summary results from Group 7 at Genetic Analysis Workshop 17. Genet Epidemiol. 2011;35(Suppl 1):S56–S60. doi: 10.1002/gepi.20650.
  27. Zhang F. Matrix Theory: Basic Results and Techniques. 2nd ed. New York: Springer; 2011.
  28. Zhang Q, Wang L, Koboldt D, Boreki IB, Province MA. Adjusting family relatedness in data-driven burden test of rare variants. Genet Epidemiol. 2014;38:722–727. doi: 10.1002/gepi.21848.
