Author manuscript; available in PMC: 2015 Jan 1.
Published in final edited form as: Genet Epidemiol. 2014 Jan;38(1):51–59. doi: 10.1002/gepi.21778

A Versatile Omnibus Test for Detecting Mean and Variance Heterogeneity

Ying Cao 1,2,#, Peng Wei 1,2,#, Matthew Bailey 3, John S K Kauwe 3, Taylor J Maxwell 1,*, for the Alzheimer’s Disease Neuroimaging Initiative
PMCID: PMC4019404  NIHMSID: NIHMS580943  PMID: 24482837

Abstract

Recent research has revealed loci that display variance heterogeneity through various mechanisms, such as biological disruption, linkage disequilibrium (LD), gene-by-gene (GxG) interaction, or gene-by-environment (GxE) interaction. We propose a versatile likelihood ratio test that allows joint testing for mean and variance heterogeneity (LRTMV), or for either effect alone (LRTM or LRTV), in the presence of covariates. Using extensive simulations of our method and others, we found that all parametric tests were sensitive to non-normality regardless of any trait transformation. Coupling our test with the parametric bootstrap solves this issue. Using simulations and empirical data from a known mean-only functional variant, we demonstrate how LD can produce variance-heterogeneity loci (vQTL) in a predictable fashion based on differential allele frequencies, high D’, and relatively low r² values. We propose that a joint test for mean and variance heterogeneity is more powerful than a variance-only test for detecting vQTL: it takes advantage of loci that also have mean effects without sacrificing much power to detect variance-only effects. We discuss using vQTL as an approach to detect gene-by-gene interactions, how vQTL are related to relationship loci (rQTL), how each can generate prior hypotheses for the other, and how both can reveal the relationships between traits and possibly between components of a composite trait.

Keywords: Linkage Disequilibrium, vQTL, rQTL, GxG, GxE, GWAS

INTRODUCTION

Most statistical tests for single-locus association with quantitative traits look for mean differences between genotypes or alleles (via average excess in additive models [Álvarez-Castro and Yang 2012]). Most of these linear models assume equal variances across factor levels. There are a few scenarios in which deviations from this assumption not only violate the model but can also be biologically meaningful and lead to the identification of important loci. Typical genome-wide association studies (GWAS) rely on linkage disequilibrium (LD) to identify physical regions associated with a trait, assuming that a marker associated with the trait owes its association to LD with one or more functional variants in close proximity. This approach generally assumes that the association will be detected through mean differences; however, even if the functional variant has no variance heterogeneity, a locus in LD with it will likely show inflated variance within its genotypes, because each marker genotype is a mixture of genotypes at the functional variant [Balding 2009]. Under certain circumstances, this can lead to variance heterogeneity.
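The variance inflation follows from the law of total variance. Writing M for the marker genotype and F for the functional genotype (notation introduced here for illustration only), with trait mean $\mu_f$ and variance $\sigma_f^2$ for functional genotype f:

$$\operatorname{Var}(y \mid M = m) = \sum_f P(F = f \mid M = m)\,\sigma_f^2 + \sum_f P(F = f \mid M = m)\,(\mu_f - \bar\mu_m)^2, \qquad \bar\mu_m = \sum_f P(F = f \mid M = m)\,\mu_f$$

Even when $\sigma_f^2 = \sigma^2$ for every f, the second term is positive whenever marker genotype m mixes functional genotypes with different means, and it generally differs across m because the mixing proportions do.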

A functional locus may also have different variances across genotypes, whether or not there are differences in genotypic means. A number of papers have focused on variance heterogeneity arising from gene-by-gene (GxG) or gene-by-environment (GxE) interactions [Deng and Pare 2011; Paré et al., 2010; Struchalin et al., 2010]. Loci with variance heterogeneity are referred to as variance-heterogeneity quantitative trait loci (vQTL) [Rönnegård and Valdar 2012]. If the genotypic effect at one locus depends on the genotype at another locus, the trait values within a genotype of the first locus are drawn from a mixture of distributions with different means; even if those component distributions have the same variance, the variance for that genotype is inflated. These papers suggest that vQTL offer an avenue to identifying loci involved in GxG. Loci with heterogeneous variance may also arise from GxE interactions, with consequences similar to those of the GxG case.
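As a minimal sketch of this argument (a hypothetical two-locus model, not one analyzed in this paper), the Python code below lets locus A shift the trait mean only in carriers of a minor allele at locus B. The residual variance is the same for every subject, yet the variance within A genotypes grows with the number of A minor alleles. The function name simulate_gxg, the allele frequencies, the Hardy-Weinberg genotype draws, and the effect size are all illustrative assumptions.

```python
import numpy as np

def simulate_gxg(n=200_000, maf_a=0.4, maf_b=0.4, effect=1.0, sd=1.0, seed=1):
    """Hypothetical GxG model: each minor allele at locus A shifts the trait
    mean by `effect`, but only in carriers of a minor allele at locus B."""
    rng = np.random.default_rng(seed)
    g_a = rng.binomial(2, maf_a, size=n)      # genotypes at A (HWE assumed)
    g_b = rng.binomial(2, maf_b, size=n)      # genotypes at B, independent of A
    carrier_b = (g_b > 0).astype(float)
    y = effect * g_a * carrier_b + rng.normal(0.0, sd, size=n)
    return g_a, y

g_a, y = simulate_gxg()
for g in (0, 1, 2):
    # Same residual SD for everyone, yet the variance within A genotype g grows
    # with g, because each A genotype mixes B carriers (shifted) and non-carriers.
    print(g, round(float(y[g_a == g].var()), 3))
```

Under these assumptions the variance for A genotype g is 1 + g²·P(B carrier)·(1 − P(B carrier)); with a B minor allele frequency of 0.4, P(B carrier) = 0.64, giving variances of about 1, 1.23 and 1.92.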

Here we present a method that tests for differences in genotypic means and variances simultaneously while allowing adjustment for covariates. We describe the method, present analytical results and simulations based on some of these scenarios, and compare it with other basic tests for differences in means alone, variances alone, and both. Using data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), we present a real-world example of inflated variance due to LD with a known functional variant. This case study uses a non-synonymous variant in the matrix metallopeptidase 3 (MMP3) gene and surrounding markers in LD with it that are very strongly associated with MMP3 protein levels in cerebrospinal fluid (CSF).

MATERIALS AND METHODS

A new omnibus likelihood ratio test for both mean and variance heterogeneity

Assume n subjects are genotyped. For subject i = 1, …, n, let yi denote the quantitative phenotype value and let Gi = 0, 1, or 2 denote the genotype for the SNP of interest, corresponding to major allele homozygous, heterozygous, and minor allele homozygous, respectively. We further define dummy variables Xi1 = I(Gi = 1) and Xi2 = I(Gi = 2). Let Zi = (Zi1, …, Zim) denote the m covariates, such as sex, age, and principal components capturing population substructure. To allow both mean and variance differences across genotype groups, we use the following regression model:

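$$y_i = \alpha_0 + Z_i\alpha + X_{i1}\beta_1 + X_{i2}\beta_2 + \varepsilon_i, \qquad \varepsilon_i \sim N\!\left(0, \sigma^2_{G_i}\right) \qquad \text{(M1)}$$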
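The design matrix of model M1, and a draw from the model, can be sketched in Python with NumPy as follows. This is an illustrative assumption of how one might code the model, not the authors' R implementation; the function names design_matrix and simulate_m1, the two covariates, and the parameter values are hypothetical (the genotype SDs reuse values from the simulation study described later).

```python
import numpy as np

def design_matrix(g, z):
    """Rows (1, Z_i, X_i1, X_i2) with X_i1 = I(G_i = 1) and X_i2 = I(G_i = 2)."""
    g = np.asarray(g)
    z = np.asarray(z, dtype=float).reshape(len(g), -1)   # n x m covariate matrix
    return np.column_stack([np.ones(len(g)), z, g == 1, g == 2]).astype(float)

def simulate_m1(g, z, alpha0, alpha, beta, sigma, seed=0):
    """Draw y from M1; sigma = (sigma_0, sigma_1, sigma_2) are genotype-specific SDs."""
    rng = np.random.default_rng(seed)
    gamma = np.concatenate([[alpha0], np.atleast_1d(alpha), beta])
    sd = np.asarray(sigma, dtype=float)[np.asarray(g)]   # sigma_{G_i} for each subject
    return design_matrix(g, z) @ gamma + rng.normal(0.0, sd)

rng = np.random.default_rng(42)
g = rng.binomial(2, 0.4, size=1000)     # hypothetical SNP with MAF 0.4
z = rng.normal(size=(1000, 2))          # two hypothetical covariates
y = simulate_m1(g, z, alpha0=1.0, alpha=[0.2, -0.1], beta=[-0.03, -0.08],
                sigma=[0.23, 0.25, 0.29])
print(design_matrix(g, z).shape)        # (1000, 5)
```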

Simultaneously testing mean and variance differences is equivalent to testing the null hypothesis $H_0$: β1 = β2 = 0 and $\sigma_0^2 = \sigma_1^2 = \sigma_2^2$ against the alternative hypothesis $H_A$: at least one of these equalities does not hold. We propose to use the likelihood ratio test (LRT) to test model M1 against the null model:

$$y_i = \alpha_0 + Z_i\alpha + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \qquad \text{(M2)}$$

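In matrix notation, model M1 is $y = C\gamma + \varepsilon$, where $y = (y_1, \ldots, y_n)'$, the ith row of the design matrix $C$ is $(1, Z_i, X_{i1}, X_{i2})$, $\gamma = (\alpha_0, \alpha', \beta_1, \beta_2)'$, and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)' \sim N(0, V)$ with $V = \operatorname{diag}(\sigma^2_{G_1}, \ldots, \sigma^2_{G_n})$. Therefore, $y \sim N(C\gamma, V)$. The log-likelihood of y under M1 is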

$$l(\gamma, V) = -\frac{1}{2}\left\{ n\log(2\pi) + \log|V| + (y - C\gamma)'V^{-1}(y - C\gamma) \right\} \qquad (1)$$
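For a diagonal V, the log-likelihood (1) reduces to sums over subjects: $\log|V|$ is the sum of the log variances and the quadratic form is a weighted residual sum of squares. A minimal NumPy sketch, assuming V is supplied as the vector of its diagonal entries (the function name loglik is hypothetical):

```python
import numpy as np

def loglik(y, C, gamma, v_diag):
    """Log-likelihood (1) for y ~ N(C gamma, V) with V = diag(v_diag).

    Because V is diagonal, log|V| = sum(log v_i) and the quadratic form is
    sum(r_i^2 / v_i), so no n x n matrix is ever formed.
    """
    y = np.asarray(y, dtype=float)
    v = np.asarray(v_diag, dtype=float)
    resid = y - C @ gamma
    return -0.5 * (len(y) * np.log(2 * np.pi) + np.sum(np.log(v)) + np.sum(resid**2 / v))

# Usage with hypothetical numbers: three subjects, intercept-only design.
print(loglik([0.1, -0.2, 0.3], np.ones((3, 1)), np.array([0.0]), [1.0, 1.0, 2.0]))
```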

Taking the partial derivative of l(γ, V) with respect to γ, we have

$$\frac{\partial l}{\partial \gamma} = -C'V^{-1}C\gamma + C'V^{-1}y.$$

Setting this derivative to 0 shows that, for any fixed V, (1) is maximized over γ at $\tilde\gamma = (C'V^{-1}C)^{-1}C'V^{-1}y$, which is also the generalized least squares (GLS) estimator of γ. The maximum likelihood estimate (MLE) of V can be found by maximizing the profile log-likelihood for V, obtained by plugging $\tilde\gamma$ into (1):

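$$l_P(V) = -\frac{1}{2}\left\{ n\log(2\pi) + \log|V| + (y - C\tilde\gamma)'V^{-1}(y - C\tilde\gamma) \right\} = -\frac{1}{2}\left\{ \log|V| + y'V^{-1}\left[I - C(C'V^{-1}C)^{-1}C'V^{-1}\right]y \right\} - \frac{n}{2}\log(2\pi)$$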
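Because V is diagonal, both $\tilde\gamma$ and $l_P(V)$ can be computed without forming $V^{-1}$: GLS is ordinary least squares on rows rescaled by $1/\sigma_{G_i}$. A minimal NumPy sketch; the function names gls and profile_loglik are hypothetical, and this is not the authors' R code.

```python
import numpy as np

def gls(y, C, v_diag):
    """GLS estimate (C'V^{-1}C)^{-1} C'V^{-1} y for V = diag(v_diag).

    Rescaling row i of C and y by 1/sqrt(v_i) turns GLS into ordinary least
    squares, which lstsq solves without forming or inverting an n x n matrix.
    """
    w = 1.0 / np.sqrt(np.asarray(v_diag, dtype=float))
    gamma, *_ = np.linalg.lstsq(C * w[:, None], np.asarray(y, dtype=float) * w, rcond=None)
    return gamma

def profile_loglik(y, C, v_diag):
    """l_P(V): the log-likelihood (1) evaluated at gamma-tilde = gls(y, C, V)."""
    v = np.asarray(v_diag, dtype=float)
    resid = np.asarray(y, dtype=float) - C @ gls(y, C, v)
    return -0.5 * (len(v) * np.log(2 * np.pi) + np.sum(np.log(v)) + np.sum(resid**2 / v))
```

Maximizing profile_loglik over the free variance parameters then gives $\hat V$, and gls evaluated at $\hat V$ gives $\hat\gamma$.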

The MLE of γ is then $\hat\gamma = (C'\hat V^{-1}C)^{-1}C'\hat V^{-1}y$, where $\hat V$ is the MLE of V. The LRT for both mean and variance differences is $\mathrm{LRT}_{MV} = -2\{l(\hat\gamma_{M2}, \hat V_{M2}) - l(\hat\gamma_{M1}, \hat V_{M1})\}$, where the MLEs of γ and V are obtained under the null model M2 and the full model M1, respectively. For large sample size n, LRTMV approximately follows a χ²(4) distribution, because M1 has four more free parameters than M2 (β1, β2, and two additional variances). Similarly, for variance differences only or mean differences only, we test model M1 against

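$$y_i = \alpha_0 + Z_i\alpha + X_{i1}\beta_1 + X_{i2}\beta_2 + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \qquad \text{(M3)}$$
or
$$y_i = \alpha_0 + Z_i\alpha + \varepsilon_i, \qquad \varepsilon_i \sim N\!\left(0, \sigma^2_{G_i}\right) \qquad \text{(M4)}$$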

respectively. The LRTs are $\mathrm{LRT}_V = -2\{l(\hat\gamma_{M3}, \hat V_{M3}) - l(\hat\gamma_{M1}, \hat V_{M1})\}$ and $\mathrm{LRT}_M = -2\{l(\hat\gamma_{M4}, \hat V_{M4}) - l(\hat\gamma_{M1}, \hat V_{M1})\}$, both of which approximately follow χ²(2) for large n. If we are willing to assume an additive mean effect model, a one-degree-of-freedom LRTM test can be performed. The above models are described in terms of discrete genotypes; however, they can easily be modified to accommodate additive models for both means and variances, as well as dominance for the mean. Specifically, if we assume that in the full model M1 the variance depends on the genotype as a linear function of the number of minor alleles (or the imputed dosage), i.e., an additive model for the variance, we have $y \sim N(C\gamma, V)$, where $V = \sigma_0^2 \operatorname{diag}(1, 1, \ldots, 1) + \delta \operatorname{diag}(G_1, G_2, \ldots, G_n)$ and $G_i$ is the number of minor alleles the ith subject carries. It follows that the variance of $y_i$ is $\sigma_0^2$, $\sigma_0^2 + \delta$, or $\sigma_0^2 + 2\delta$ for major allele homozygotes, heterozygotes, or minor allele homozygotes, respectively. δ can be either positive or negative; however, we impose the constraint $\delta > -\sigma_0^2/2$ so that all variances of $y_i$ are positive. The corresponding LRTV tests against the null model $y \sim N(C\gamma, V)$ with $V = \sigma_0^2 \operatorname{diag}(1, 1, \ldots, 1)$, leading to a χ²(1) test because the full and null models differ by one free parameter, δ. Should an additive model for the variances be desired, an additive model for the means should also be assumed, with the mean model modified accordingly. We have implemented the above tests in R code, available via the link at the end of the article, along with a file of examples implementing the functions.
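The tests above can be sketched in Python as follows; this is an illustrative reimplementation under stated assumptions, not the authors' R code. For the discrete-genotype models, the sketch maximizes the likelihood by alternating the GLS step for γ with the closed-form update $\hat\sigma_k^2$ = mean squared residual within genotype group k (each step can only increase the likelihood), and assumes all three genotypes are observed. For the additive-variance LRTV, it numerically maximizes the profile log-likelihood over the log variances of the two homozygote groups, $v_0 = \sigma_0^2$ and $v_2 = \sigma_0^2 + 2\delta$; this reparameterization (our choice) enforces $\delta > -\sigma_0^2/2$ automatically. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def _loglik(r, v):
    """Log-likelihood (1) for V = diag(v), given residuals r = y - C gamma."""
    return -0.5 * (len(r) * np.log(2 * np.pi) + np.sum(np.log(v)) + np.sum(r**2 / v))

def fit_loglik(y, C, g, hetero, max_iter=500, tol=1e-10):
    """Maximized log-likelihood of y ~ N(C gamma, V).

    hetero=True : V = diag(sigma^2_{G_i}) as in M1/M4 (alternating GLS and
                  per-genotype variance updates).
    hetero=False: V = sigma^2 I as in M2/M3 (converges to OLS with RSS / n).
    """
    v = np.full(len(y), y.var())
    ll_old = -np.inf
    for _ in range(max_iter):
        w = 1.0 / np.sqrt(v)
        gamma, *_ = np.linalg.lstsq(C * w[:, None], y * w, rcond=None)   # GLS given V
        r = y - C @ gamma
        if hetero:
            v = np.array([np.mean(r[g == k] ** 2) for k in range(3)])[g]
        else:
            v = np.full(len(y), np.mean(r**2))
        ll = _loglik(r, v)
        if ll - ll_old < tol:
            return ll
        ll_old = ll
    return ll_old

def lrt_discrete(y, g, z=None):
    """LRT_MV (M2 vs M1, 4 df), LRT_V (M3 vs M1, 2 df), LRT_M (M4 vs M1, 2 df)."""
    y, g = np.asarray(y, dtype=float), np.asarray(g)
    n = len(y)
    base = np.ones((n, 1)) if z is None else np.column_stack([np.ones(n), z])
    full = np.column_stack([base, g == 1, g == 2]).astype(float)
    l_m1 = fit_loglik(y, full, g, hetero=True)
    out = {}
    for name, C, hetero, df in (("MV", base, False, 4), ("V", full, False, 2),
                                ("M", base, True, 2)):
        stat = -2.0 * (fit_loglik(y, C, g, hetero) - l_m1)
        out[name] = (stat, chi2.sf(stat, df))
    return out

def lrt_v_additive(y, g, z=None):
    """1-df LRT_V for V = sigma0^2 I + delta diag(G) with an additive mean model."""
    y, g = np.asarray(y, dtype=float), np.asarray(g, dtype=float)
    n = len(y)
    C = np.column_stack([np.ones(n)] + ([] if z is None else [z]) + [g])

    def profile(v):                            # l_P(V): plug in the GLS estimate
        w = 1.0 / np.sqrt(v)
        gamma, *_ = np.linalg.lstsq(C * w[:, None], y * w, rcond=None)
        return _loglik(y - C @ gamma, v)

    s2 = np.mean((y - C @ np.linalg.lstsq(C, y, rcond=None)[0]) ** 2)
    l_null = profile(np.full(n, s2))
    # v_i = v0 + (v2 - v0) / 2 * G_i, i.e. sigma0^2 = v0 and delta = (v2 - v0) / 2.
    neg = lambda t: -profile(np.exp(t[0]) + 0.5 * (np.exp(t[1]) - np.exp(t[0])) * g)
    res = minimize(neg, np.log([s2, s2]), method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-10, "maxiter": 2000})
    stat = max(0.0, -2.0 * (l_null + res.fun))   # start is the null fit, so stat >= 0
    v0, v2 = np.exp(res.x)
    return stat, chi2.sf(stat, df=1), 0.5 * (v2 - v0)
```

Without covariates the discrete fits have closed forms (group means and within-group ML variances), which is a convenient check on the alternating algorithm; with covariates the iteration is needed.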

Parametric bootstrap LRT

LRTV is closely related to Bartlett’s test for equality of variances [Bartlett 1954], which is well known to lack robustness to violations of the normality assumption, even subtle deviations from the normal distribution [Conover et al., 1981; Struchalin et al., 2010]. We also observed in our simulation study that for non-normal quantitative traits, LRTMV and LRTV can have inflated Type I error, while LRTM still controls Type I error satisfactorily. Given the superior performance of the proposed LRTs when the normality assumption holds, we propose the following parametric bootstrap-based LRT procedure for non-normal traits. The parametric bootstrap is widely used in genetics when the null distribution of the test statistic is unknown and covariates are present [Bůžková et al., 2011; Davison and Hinkley 1997]. We carry out the parametric bootstrap-based LRT as follows:

  1. Obtain parameter estimates $\hat\gamma_{M2} = (\hat\alpha_0, \hat\alpha')'$ under the null model M2 from the original data.

  2. Calculate the residuals $r_i = y_i - C_i\hat\gamma_{M2}$ for i = 1, …, n, where $C_i = (1, Z_i)$ is the ith row of the null-model design matrix.

  3. Permute the $r_i$’s to generate the $r_i^*$’s and create new trait values $y_i^* = C_i\hat\gamma_{M2} + r_i^*$ for i = 1, …, n.

  4. Replace the $y_i$’s by the $y_i^*$’s and recalculate the test statistic, $\mathrm{LRT}^*_{MV}$.

  5. Repeat steps (3) and (4) B times.

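The parametric bootstrap p-value is $\left(\#\{\mathrm{LRT}^{*(b)}_{MV} \ge \mathrm{LRT}_{MV},\ b = 1, \ldots, B\} + 1\right)/(B + 1)$, where $\mathrm{LRT}_{MV}$ is the test statistic computed from the original data and $\mathrm{LRT}^{*(b)}_{MV}$ is the statistic computed from the bth resampled data set. A parametric bootstrap-based LRTV can be performed similarly by fitting the null model M3 in step (1) and calculating the resampled test statistic $\mathrm{LRT}^*_V$ in step (4).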
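The five resampling steps and the p-value formula can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not the authors' R code: M1 is fitted by alternating GLS and per-genotype variance updates, the $n\log(2\pi)$ constant is dropped because it cancels in the LRT, all three genotypes are assumed to be observed, and the function names and default B are hypothetical.

```python
import numpy as np

def lrt_mv_stat(y, g, base, max_iter=200, tol=1e-10):
    """LRT_MV = -2{l(M2) - l(M1)}; the n*log(2*pi) terms cancel and are omitted."""
    n = len(y)
    full = np.column_stack([base, g == 1, g == 2]).astype(float)
    v = np.full(n, y.var())
    ll_full = -np.inf
    for _ in range(max_iter):
        w = 1.0 / np.sqrt(v)
        gamma, *_ = np.linalg.lstsq(full * w[:, None], y * w, rcond=None)   # GLS step
        r = y - full @ gamma
        v = np.array([np.mean(r[g == k] ** 2) for k in range(3)])[g]      # variance step
        ll = -0.5 * (np.sum(np.log(v)) + np.sum(r**2 / v))
        done = ll - ll_full < tol
        ll_full = ll
        if done:
            break
    gamma0, *_ = np.linalg.lstsq(base, y, rcond=None)                     # null model M2
    s2 = np.mean((y - base @ gamma0) ** 2)
    ll_null = -0.5 * n * (np.log(s2) + 1.0)
    return -2.0 * (ll_null - ll_full)

def parametric_bootstrap_pvalue(y, g, z=None, B=999, seed=0):
    """Residual-permutation bootstrap p-value for LRT_MV, following steps 1-5."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    g = np.asarray(g)
    n = len(y)
    base = np.ones((n, 1)) if z is None else np.column_stack([np.ones(n), z])
    observed = lrt_mv_stat(y, g, base)
    # Steps 1-2: fit the null model M2 to the original data and form residuals.
    gamma0, *_ = np.linalg.lstsq(base, y, rcond=None)
    fitted = base @ gamma0
    resid = y - fitted
    exceed = 0
    for _ in range(B):                                       # step 5: repeat B times
        y_star = fitted + rng.permutation(resid)             # step 3
        exceed += lrt_mv_stat(y_star, g, base) >= observed   # step 4
    return observed, (exceed + 1) / (B + 1)
```

Because the observed statistic is counted in the "+1" terms, the smallest attainable p-value is 1/(B + 1), which is why many resamples (10,000 in the simulations below) are needed for small p-values.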

Comparison with other methods for testing variance heterogeneity

Double GLM (DGLM)

Rönnegård and Valdar [2011] proposed employing double generalized linear models (DGLMs; [Smyth 1989]) to detect mean and variance differences simultaneously. Specifically, both the mean and the variance of the quantitative trait depend on the genetic factor via $y_i = \alpha_0 + Z_i\alpha + X_{i1}\beta_1 + X_{i2}\beta_2 + \varepsilon_i$, $\varepsilon_i \sim N(0, \sigma_i^2)$, and $\log(\sigma_i^2) = \log(\sigma^2) + X_{i1}\theta_1 + X_{i2}\theta_2$. To test equality of both means and variances, we test the null hypothesis H0: β1 = β2 = 0 and θ1 = θ2 = 0. The DGLM is implemented in the R package “dglm”. We can see that the DGLM is equivalent to model M1. As a result, inflation of Type I error is expected for traits deviating from the normal distribution, as confirmed by our simulation study. Rönnegård and Valdar [2011] proposed applying a Box-Cox transformation to traits that appear to deviate from the normal distribution. Here we demonstrate using simulations that, because the variance test is very sensitive to even subtle deviations from the normal distribution, a Box-Cox transformation prior to testing does not guarantee that Type I error is controlled. This is true even when the residuals are simulated from the distribution of total cholesterol, which is generally considered normally distributed.
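To see why the DGLM variance model matches M1, note that with a single categorical genotype factor the log-linear variance model is a one-to-one reparameterization of the three genotype variances (a short illustration of our own):

$$\sigma_i^2 = \sigma^2 \exp(X_{i1}\theta_1 + X_{i2}\theta_2) \;\Longrightarrow\; \sigma_0^2 = \sigma^2,\quad \sigma_1^2 = \sigma^2 e^{\theta_1},\quad \sigma_2^2 = \sigma^2 e^{\theta_2}, \qquad \theta_k = \log\frac{\sigma_k^2}{\sigma_0^2}$$

Because the map $(\sigma^2, \theta_1, \theta_2) \mapsto (\sigma_0^2, \sigma_1^2, \sigma_2^2)$ is a bijection onto the positive reals, both parameterizations attain the same maximized likelihood, and $\theta_1 = \theta_2 = 0$ holds exactly when $\sigma_0^2 = \sigma_1^2 = \sigma_2^2$. Maximum-likelihood statistics therefore coincide; a particular fitting routine could give slightly different numbers if it uses a different (for example, restricted) likelihood.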

Levene’s Test

Levene’s test has been shown to be a powerful and robust test for equality of variances [Conover et al., 1981; Gastwirth et al., 2009; Levene 1960] and has been used for detecting vQTL [Paré et al., 2010; Shen et al., 2012; Struchalin et al., 2010]. For ease of exposition, we rewrite the n trait values according to genotype group: $y_{kj}$ for k = 0, 1, 2 and j = 1, …, $n_k$, with $\sum_{k=0}^{2} n_k = n$. The test statistic is the ANOVA F-test applied to the absolute differences between each observation and the mean of its group, $d_{kj} = |y_{kj} - \bar y_{k\cdot}|$. The resulting F statistic is $F = \dfrac{(n - 3)\sum_{k=0}^{2} n_k(\bar d_{k\cdot} - \bar d_{\cdot\cdot})^2}{(3 - 1)\sum_{k=0}^{2}\sum_{j=1}^{n_k}(d_{kj} - \bar d_{k\cdot})^2}$, which, because the $d_{kj}$ are not normally distributed, only approximately follows an F(2, n − 3) distribution. When n is large, 2F is well approximated by χ²(2). In addition, Brown and Forsythe [1974] proposed using the group median $\tilde y_k$ instead of the group mean in defining the individual deviations $d_{kj}$ for more robust results, and this version of Levene’s test is more commonly used [Paré et al., 2010; Shen et al., 2012]. Levene’s test is implemented in the R function “levene.test” in the “lawstat” package. Potential limitations of Levene’s test are that covariates cannot be included and that only equality of variances, not means, can be tested.
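A direct transcription of this statistic into Python, checked against scipy.stats.levene (whose center argument switches between the mean-based Levene and the median-based Brown-Forsythe versions), might look as follows. The function name levene_by_genotype and the simulated data are hypothetical; the F reference distribution follows the text.

```python
import numpy as np
from scipy import stats

def levene_by_genotype(y, g, center="mean"):
    """F statistic on d_kj = |y_kj - center_k| (Levene: mean; Brown-Forsythe: median)."""
    y, g = np.asarray(y, dtype=float), np.asarray(g)
    groups = [y[g == k] for k in (0, 1, 2) if np.any(g == k)]
    ctr = np.median if center == "median" else np.mean
    d = [np.abs(grp - ctr(grp)) for grp in groups]
    n, K = sum(len(dk) for dk in d), len(d)
    d_bar = np.concatenate(d).mean()                          # grand mean of d
    between = sum(len(dk) * (dk.mean() - d_bar) ** 2 for dk in d)
    within = sum(np.sum((dk - dk.mean()) ** 2) for dk in d)
    F = (n - K) * between / ((K - 1) * within)
    return F, stats.f.sf(F, K - 1, n - K)

rng = np.random.default_rng(0)
g = rng.binomial(2, 0.4, size=500)
y = rng.normal(0.0, np.array([1.0, 1.1, 1.3])[g])
print(levene_by_genotype(y, g, center="median"))
print(stats.levene(*(y[g == k] for k in range(3)), center="median"))
```

As the text notes, this statistic has no slot for covariates; they can only be handled by a separate residualization step.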

Lepage Test

The Lepage test is a rank-based non-parametric test for differences in either location or dispersion [Hollander and Wolfe 1999; Lepage 1971]. For two-sample comparisons, it combines the Wilcoxon rank sum test statistic for location (median) and the Ansari-Bradley test statistic for dispersion [Ansari and Bradley 1960]. Hothorn et al. [2006] extended the Lepage test to K-sample problems (K ≥ 2) by combining the Kruskal-Wallis (KW) test statistic for location and the Fligner-Killeen (FK) test statistic for dispersion, and implemented it in the R package “coin”. Note that the Wilcoxon rank sum test is the two-sample special case of the KW test for location, while the FK test was found to perform as well as Levene’s test in a previous comparative study [Conover et al., 1981]. These nonparametric tests cannot adjust for covariate effects.
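SciPy provides both K-sample components, scipy.stats.kruskal and scipy.stats.fligner. The sketch below reports them separately and, as a rough heuristic only, also sums them and refers the sum to χ² with 2(K − 1) degrees of freedom. That combination assumes the two statistics are approximately independent under the null; it is our simplification, not the statistic computed by the coin package, so its p-values will not match coin exactly. The function name lepage_type_test is hypothetical.

```python
import numpy as np
from scipy import stats

def lepage_type_test(y, g):
    """KW (location) and FK (dispersion) across genotype groups, plus a heuristic sum."""
    y, g = np.asarray(y, dtype=float), np.asarray(g)
    groups = [y[g == k] for k in (0, 1, 2) if np.any(g == k)]
    kw = stats.kruskal(*groups)          # Kruskal-Wallis location test
    fk = stats.fligner(*groups)          # Fligner-Killeen, median-centred by default
    combined = kw.statistic + fk.statistic
    # Heuristic: treat KW and FK as roughly independent chi^2(K - 1) statistics under H0.
    df = 2 * (len(groups) - 1)
    return kw, fk, combined, stats.chi2.sf(combined, df)

rng = np.random.default_rng(7)
g = rng.binomial(2, 0.4, size=900)
y = rng.normal(0.5 * g, 1.0)             # hypothetical location shift only
kw, fk, combined, p = lepage_type_test(y, g)
print(kw.pvalue, fk.pvalue, p)
```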

Simulation studies

In order to compare the power of different tests and their robustness to violations of the normality assumption, we simulated a common SNP with a minor allele frequency (MAF) of 0.4, mimicking the functional SNP in the real data example, and two sets of quantitative traits, one normally distributed and the other non-normally distributed. For each set, we considered four scenarios: 1. genotypes have no effect on the quantitative trait; 2. genotypes affect the trait mean; 3. genotypes affect the trait variance; and 4. genotypes affect both the mean and the variance. The quantitative traits (yi) were generated using the model yi = Xi1β1 + Xi2β2 + εi, where Xi1 is an indicator variable for the heterozygous genotype and Xi2 is an indicator variable for the minor allele homozygous genotype. Without mean effects, β1 = β2 = 0; when genotypes affect means, β1 = −0.03 and β2 = −0.08. For normally distributed quantitative traits, we simulated εi from N(0, 0.23²) for scenarios without variance effects. For scenarios with variance effects, εi was generated from N(0, 0.23²), N(0, 0.25²), and N(0, 0.29²) for major allele homozygotes, heterozygotes, and minor allele homozygotes, respectively. The mean and variance effect sizes and the MAF were chosen to mimic those observed in the real data analysis described below.

For non-normally distributed quantitative traits, εi was simulated from a t-distribution (df = 5) scaled by a factor of 0.19 for scenarios without variance effects, so that the variance of εi was comparable to that of the normally distributed εi. For scenarios with variance effects, we simulated εi from t-distributions with df = 10, 5, and 3 for major allele homozygotes, heterozygotes, and minor allele homozygotes, respectively, and scaled all εi by 0.19. For each scenario, we simulated 1000 replicates with a sample size of 1000 in each replicate. Empirical power or Type I error was calculated as the proportion of replicates with statistically significant effects at the 0.05 significance level. For the parametric bootstrap LRT, 10,000 resamples were drawn for each replicate.
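The trait-generating scenarios and the empirical rejection rate can be sketched as follows. The Hardy-Weinberg binomial genotype draws, the function names, and the use of the Brown-Forsythe test as the example test are assumptions for illustration; any function returning a p-value could be passed to empirical_rate.

```python
import numpy as np
from scipy import stats

def simulate_trait(n=1000, mean_effect=False, var_effect=False, normal=True,
                   maf=0.4, seed=0):
    """One replicate of y_i = X_i1*beta1 + X_i2*beta2 + eps_i."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, maf, size=n)         # assumption: genotypes drawn under HWE
    beta = np.array([0.0, -0.03, -0.08]) if mean_effect else np.zeros(3)
    if normal:
        sd = np.array([0.23, 0.25, 0.29]) if var_effect else np.full(3, 0.23)
        eps = rng.normal(0.0, sd[g])
    else:
        df = np.array([10, 5, 3]) if var_effect else np.full(3, 5)
        eps = 0.19 * rng.standard_t(df[g])   # t errors scaled by 0.19
    return g, beta[g] + eps

def brown_forsythe_p(y, g):
    """Example test: median-centred Levene (Brown-Forsythe) p-value."""
    return stats.levene(*(y[g == k] for k in range(3)), center="median").pvalue

def empirical_rate(test, n_rep=1000, alpha=0.05, **scenario):
    """Proportion of replicates with p < alpha (power, or Type I error under H0)."""
    hits = 0
    for r in range(n_rep):
        g, y = simulate_trait(seed=r, **scenario)
        hits += test(y, g) < alpha
    return hits / n_rep

print(empirical_rate(brown_forsythe_p, n_rep=200, var_effect=True, normal=False))
```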

To further compare different tests when there are covariates to adjust for, we followed the simulation setup of Demissie and Cupples [2011] and generated quantitative traits (yi) using the model yi = Ziα + Xi1β1 + Xi2β2 + εi, where Zi is a covariate and Xi1 and Xi2 are as defined above. We considered two cases: 1. Zi is independent of Xi1 and Xi2; 2. Zi is correlated with Xi1 and Xi2, so that Zi is a confounder. For case 1, $(X_i, Z_i)'$ were generated from $N(\mu, \Sigma)$ with $\mu = (0, 0.5)'$ and $\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. For case 2, $(X_i, Z_i)'$ were generated from $N(\mu, \Sigma)$ with $\mu = (0, 0.5)'$ and $\Sigma = \begin{pmatrix} 1 & 0.6 \\ 0.6 & 1 \end{pmatrix}$. Xi was then categorized into dummy variables Xi1 and Xi2 corresponding to a SNP with an MAF of 0.4, while Zi was kept as a continuous variable. We used α = 0.2 and considered the same four scenarios as described previously. QTL effects were simulated as in the previous simulation study for normally distributed quantitative traits. In each scenario, 1000 replicates were generated with a sample size of 1000 in each replicate. For Levene’s test and all the non-parametric tests, the covariate was handled with a two-stage approach: the first stage fitted a covariate-only model to obtain $\hat y_i = Z_i\hat\alpha$, and the second stage tested the association between the residual $y_i - \hat y_i$ and the SNP. For the LRTs and linear regression (LR), QTL effects and the covariate effect were estimated simultaneously. For the parametric bootstrap LRTMV, 10,000 resamples were drawn for each replicate.
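The covariate scenarios can be sketched as below. How the latent $X_i$ was cut into genotypes is not specified in the text; this sketch assumes thresholds at the standard-normal quantiles that give Hardy-Weinberg frequencies $(1-p)^2$, $2p(1-p)$, $p^2$ for MAF p = 0.4, with larger latent values mapped to more minor alleles. The stage-1 regression also includes an intercept, which is our assumption. Function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def simulate_with_covariate(n=1000, rho=0.6, alpha=0.2, beta=(-0.03, -0.08),
                            sd=(0.23, 0.25, 0.29), maf=0.4, seed=0):
    """y_i = Z_i*alpha + X_i1*beta1 + X_i2*beta2 + eps_i with (X_i, Z_i) bivariate normal."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    latent, z = rng.multivariate_normal([0.0, 0.5], cov, size=n).T
    # Assumed discretization: cut the N(0, 1) latent X at HWE genotype-frequency quantiles.
    p0, p1 = (1 - maf) ** 2, 2 * maf * (1 - maf)
    g = np.digitize(latent, norm.ppf([p0, p0 + p1]))     # 0, 1 or 2 minor alleles
    means = np.array([0.0, beta[0], beta[1]])
    y = alpha * z + means[g] + rng.normal(0.0, np.asarray(sd)[g])
    return g, z, y

def two_stage_residuals(y, z):
    """Stage 1: regress y on the covariate only; stage 2 tests y - y_hat against the SNP."""
    Z = np.column_stack([np.ones(len(y)), z])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ coef

g, z, y = simulate_with_covariate(rho=0.6, seed=1)      # case 2: confounded covariate
resid = two_stage_residuals(y, z)                       # input to Levene/Lepage/KW/FK
```

For scenarios without variance effects, pass sd=(0.23, 0.23, 0.23); for case 1, pass rho=0.0.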

Real data analysis: application to the Alzheimer’s Disease Neuroimaging Initiative (ADNI)

Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). ADNI is the result of efforts of many co-investigators from a broad range of academic institutions and private corporations, and subjects have been recruited from over 50 sites across the U.S. and Canada [Trojanowski et al., 2010; Weiner et al., 2010]. For up-to-date information, see www.adni-info.org. We obtained CSF MMP3 protein levels and genetic data for 293 ADNI participants. Using these data, we tested SNPs within 100 kilobases of the known functional SNP rs679620 for association with CSF MMP3 protein levels using LRTMV, LRTM, LRTV, and LR. All tests adjusted for age, gender, and principal components from population stratification analyses.

RESULTS

Empirical Type I error and power of the different tests in the four simulated scenarios are summarized in Table I. For normally distributed quantitative traits (Table I A), the false positive rates were all close to 0.05 when there was no effect; in other words, Type I error was well controlled for all the tests. Among the joint tests, which test for heterogeneity in means and variances simultaneously, the non-parametric Lepage test had slightly higher power than LRTMV in all scenarios, while LRTMV and parametric bootstrap LRTMV performed almost identically. The different mean tests were comparable in all scenarios. LRTV stood out as the most powerful variance test. In addition, the simulation results confirmed that LRTM is equivalent to the mean test of DGLM (DGLMM), and LRTV is equivalent to the variance test of DGLM (DGLMV). When there was only a mean effect, the joint tests lost power relative to the mean tests. Similarly, the joint tests were not as powerful as the variance tests when only variance heterogeneity existed. However, the joint tests lost little power in the mean-only and variance-only scenarios and were substantially more powerful when there was both mean and variance heterogeneity.

Table I.

Comparison of empirical Type I error/power of different tests in four simulated scenarios. PB: parametric bootstrap; LR: linear regression; KW: Kruskal-Wallis; FK: Fligner-Killeen.

A. Simulated normally distributed quantitative traits.

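Joint tests: LRTMV, LRTMV(PB), Lepage. Mean tests: LRTM, LR, KW, DGLMM. Variance tests: LRTV, Levene, FK, DGLMV.

| Simulated effects | LRTMV | LRTMV(PB) | Lepage | LRTM | LR | KW | DGLMM | LRTV | Levene | FK | DGLMV |
|---|---|---|---|---|---|---|---|---|---|---|---|
| No effect | 0.044 | 0.040 | 0.049 | 0.045 | 0.045 | 0.042 | 0.045 | 0.052 | 0.060 | 0.060 | 0.052 |
| Mean | 0.800 | 0.801 | 0.862 | 0.893 | 0.891 | 0.862 | 0.893 | 0.065 | 0.063 | 0.060 | 0.065 |
| Variance | 0.764 | 0.760 | 0.774 | 0.046 | 0.063 | 0.054 | 0.046 | 0.831 | 0.786 | 0.779 | 0.831 |
| Mean & var | 0.978 | 0.975 | 0.983 | 0.828 | 0.855 | 0.820 | 0.828 | 0.842 | 0.796 | 0.784 | 0.842 |
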
B. Simulated non-normally distributed quantitative traits.

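Joint tests: LRTMV(PB), Lepage. Mean tests: LRTM, LR, KW. Variance tests: LRTV(PB), Levene, FK.

| Simulated effects | LRTMV(PB) | Lepage | LRTM | LR | KW | LRTV(PB) | Levene | FK |
|---|---|---|---|---|---|---|---|---|
| No effect | 0.038 | 0.039 | 0.037 | 0.038 | 0.037 | 0.044 | 0.040 | 0.041 |
| Mean | 0.513 | 0.930 | 0.869 | 0.876 | 0.933 | 0.056 | 0.045 | 0.041 |
| Variance | 0.795 | 0.449 | 0.060 | 0.100 | 0.050 | 0.810 | 0.676 | 0.437 |
| Mean & var | 0.967 | 0.985 | 0.795 | 0.848 | 0.927 | 0.853 | 0.700 | 0.459 |
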
C. Tests that cannot control Type I error for non-normally distributed quantitative traits.

| Simulated effects | LRTMV | LRTV | DGLMV | DGLMV (after Box-Cox transformation) |
|---|---|---|---|---|
| No effect | 0.246 | 0.325 | 0.325 | 0.172 |

It is well known that variance tests are sensitive to violations of the normality assumption [Conover et al., 1981; Struchalin et al., 2010]. When quantitative traits were simulated from t-distributions, we observed substantial Type I error inflation for LRTMV, LRTV and DGLMV (Table I C). Rönnegård and Valdar [2011] suggested a Box-Cox transformation to correct for inflated Type I error. In our simulations, the Box-Cox transformation did reduce the inflated Type I error of DGLMV (from 0.325 to 0.172), but it remained well above 0.05. We did not include tests with inflated Type I error in the power comparison, since comparing the power of tests whose Type I error is not controlled would not be meaningful.

Table I B demonstrates that all the mean tests, the non-parametric tests and Levene’s test were robust to violations of the normality assumption. In addition, parametric bootstrap LRTMV and parametric bootstrap LRTV corrected the Type I error inflation. When quantitative traits were non-normally distributed, the Lepage test was more powerful than parametric bootstrap LRTMV when only mean effects existed, while parametric bootstrap LRTMV had higher power when there were only variance effects. With both mean and variance effects, the two tests performed comparably. Among the mean tests, the KW test performed best. We also noticed that linear regression (LR) had inflated Type I error (0.100) when non-normally distributed quantitative traits had only variance heterogeneity. Parametric bootstrap LRTV had the highest power among the variance tests. The relative performance of the tests did not change when the significance level α was set at 0.01 (Supplemental Table 1) or when the tested SNP had a lower MAF (Supplemental Tables 2 and 3). An additional simulation study showed that the proposed omnibus test for mean and variance heterogeneity can also model the mean and variance using the additive genetic model (Supplemental Table 4).

In genome-wide association studies, there are often confounders or covariates to adjust for. However, Levene’s test and all the nonparametric tests cannot incorporate covariates without resorting to a two-step approach. Demissie and Cupples [2011] and Che et al. [2012] showed that a two-stage residual-outcome regression analysis can introduce bias and cause loss of power. Table II summarizes the simulation studies comparing one-step and two-step tests when there was a covariate to adjust for. When the simulated covariate was independent of the QTL, we did not observe inflated Type I error or loss of power for the two-step tests (Table II B). However, when the QTL was correlated with the covariate, which was then a confounder, both the Lepage test and the KW test lost power (Table II A). This simulation study included only one moderately correlated confounder (see the Methods section for details). We would expect a more substantial loss of power for both the Lepage test and the KW test when there are highly correlated confounders or multiple confounders to adjust for, which is often the case in genome-wide association studies [Che et al., 2012].

Table II.

Comparison of empirical Type I error/power between one-step and two-step tests in four simulated scenarios with adjustment for a covariate. PB: parametric bootstrap; LR: linear regression; KW: Kruskal-Wallis; FK: Fligner-Killeen.

A. QTL is correlated with the covariate (confounder).

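One-step tests: LRTMV, LRTMV(PB), LRTM, LRTV, LR. Two-step tests: Lepage, KW, FK, Levene.

| Simulated effects | LRTMV | LRTMV(PB) | LRTM | LRTV | LR | Lepage | KW | FK | Levene |
|---|---|---|---|---|---|---|---|---|---|
| No effect | 0.047 | 0.049 | 0.053 | 0.061 | 0.049 | 0.026 | 0.029 | 0.058 | 0.055 |
| Mean | 0.624 | 0.619 | 0.730 | 0.048 | 0.736 | 0.551 | 0.578 | 0.040 | 0.040 |
| Variance | 0.736 | 0.726 | 0.051 | 0.822 | 0.055 | 0.749 | 0.031 | 0.756 | 0.764 |
| Mean & var | 0.951 | 0.948 | 0.707 | 0.829 | 0.747 | 0.948 | 0.560 | 0.775 | 0.782 |
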
B. QTL is independent of the covariate.

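One-step tests: LRTMV, LRTMV(PB), LRTM, LRTV, LR. Two-step tests: Lepage, KW, FK, Levene.

| Simulated effects | LRTMV | LRTMV(PB) | LRTM | LRTV | LR | Lepage | KW | FK | Levene |
|---|---|---|---|---|---|---|---|---|---|
| No effect | 0.057 | 0.053 | 0.048 | 0.045 | 0.051 | 0.044 | 0.048 | 0.046 | 0.043 |
| Mean | 0.779 | 0.780 | 0.866 | 0.046 | 0.869 | 0.844 | 0.857 | 0.044 | 0.047 |
| Variance | 0.722 | 0.718 | 0.040 | 0.823 | 0.059 | 0.751 | 0.047 | 0.762 | 0.767 |
| Mean & var | 0.978 | 0.979 | 0.811 | 0.845 | 0.853 | 0.983 | 0.833 | 0.786 | 0.795 |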

To investigate whether variance heterogeneity can arise from LD with a functional SNP that has mean heterogeneity, we performed association analysis of MMP3 SNPs with MMP3 protein levels in CSF. SNP rs679620 in the MMP3 gene showed an extremely strong association with MMP3 protein levels in CSF (p = 6.36E-26) and showed mean heterogeneity but no variance heterogeneity. SNP rs679620 is a nonsynonymous variant in the matrix metallopeptidase 3 (MMP3) gene that results in a change from lysine to glutamic acid at amino acid position 45 of the MMP3 protein and has been implicated in several human disease processes [Niu and Qi 2012]. The association analysis of SNPs surrounding rs679620 with MMP3 protein levels in CSF (Figure 1) illustrated that SNPs in LD with the functional SNP showed both mean and variance heterogeneity. Mean heterogeneity fades as LD (|r|) with the functional SNP decreases. However, variance heterogeneity rises and peaks in a short interval where LD (|r|) with the functional SNP is less than 0.5 (r² < 0.25) (Figure 1A). If detected variance heterogeneity is due to LD with a functional variant with mean heterogeneity, rather than to true functional variance heterogeneity, we would expect a strong signal of mean heterogeneity among SNPs in LD with the SNP showing variance heterogeneity (Figure 1B). Using the real MMP3 genetic data, we also performed association analyses with a quantitative trait simulated on a common variant (supplemental Figure 2) and with a quantitative trait simulated on an uncommon variant (supplemental Figure 3). The simulations reproduced the same patterns of mean and variance heterogeneity as a function of LD (|r|).

Figure 1.

Test statistics (−log10(P values)) for association between SNPs within 100 kb of the functional SNP rs679620 and MMP3 protein levels in cerebrospinal fluid. A. Lowess curves of test statistics (−log10(P values)) against LD (|r|) with the functional SNP rs679620, fitted separately for the 5′ and 3′ sides. B. Lowess curves of test statistics (−log10(P values)) against LD (|r|) with the SNP having the smallest LRTV p-value, fitted separately for the 5′ and 3′ sides. (See supplemental Figure 1 for plots without smoothing.) LR: linear regression; LRTMV: likelihood ratio test for both mean and variance heterogeneity; LRTM: likelihood ratio test for mean heterogeneity; LRTV: likelihood ratio test for variance heterogeneity.

In addition to the LD measure |r|, we also explored the relationship between variance heterogeneity and the LD measure D’ (supplemental Figures 2 and 3). The relationship between variance heterogeneity and D’ was distinct from that between variance heterogeneity and |r|. If a functional variant is common and has only a mean effect, any relatively uncommon variant in high D’ with it is likely to show a variance heterogeneity peak, but the peak occurs at relatively low |r| values (roughly 0.5 > |r| > 0.1, i.e., 0.25 > r² > 0.01, according to our limited data and simulations). Conversely, if a relatively uncommon functional variant has only a mean effect, any common variant in high D’ with it is likely to show variance heterogeneity in about the same range (|r| < 0.5, r² < 0.25) as in the previous case. In addition to the simulation and real data analysis results, we demonstrated analytically why variance heterogeneity can arise from LD with a functional locus that has only a mean effect (see supplemental text).
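The mechanism can be checked with a small exact calculation. The Python sketch below assumes random mating (diplotype probability equal to the product of haplotype frequencies), an additive mean effect at the functional SNP and no variance effect, and computes the trait mean and variance within each marker genotype via the law of total variance. The haplotype frequencies and function names are hypothetical, chosen so that a common functional SNP (MAF 0.4) is in complete LD (D′ = 1) with an uncommon marker (MAF 0.1) while |r| ≈ 0.41 falls in the range described above.

```python
import itertools
import numpy as np

def marker_conditional_moments(hap_freq, beta=1.0, sigma2=1.0):
    """Trait mean and variance within each marker genotype m = 0, 1, 2.

    hap_freq maps (f, m) -> haplotype frequency, where f = 1 (m = 1) marks the
    minor allele at the functional SNP (marker SNP). The trait is assumed to be
    beta * (functional minor-allele count) + N(0, sigma2) noise.
    """
    mean, var = {}, {}
    for m in (0, 1, 2):
        probs, mus = [], []
        # Ordered haplotype pairs; random mating gives P(h1, h2) = p(h1) * p(h2).
        for (h1, p1), (h2, p2) in itertools.product(hap_freq.items(), repeat=2):
            if h1[1] + h2[1] == m:
                probs.append(p1 * p2)
                mus.append(beta * (h1[0] + h2[0]))
        probs = np.array(probs) / np.sum(probs)    # P(functional genotype | M = m)
        mus = np.array(mus)
        mean[m] = float(np.sum(probs * mus))
        # Law of total variance: within-genotype variance + variance of the means.
        var[m] = sigma2 + float(np.sum(probs * (mus - mean[m]) ** 2))
    return mean, var

def ld_stats(hap_freq):
    """|D'| and r between the functional SNP and the marker."""
    pf = hap_freq[(1, 0)] + hap_freq[(1, 1)]
    pm = hap_freq[(0, 1)] + hap_freq[(1, 1)]
    d = hap_freq[(1, 1)] - pf * pm
    d_max = min(pf * (1 - pm), (1 - pf) * pm) if d > 0 else min(pf * pm, (1 - pf) * (1 - pm))
    return abs(d) / d_max, d / np.sqrt(pf * (1 - pf) * pm * (1 - pm))

# Hypothetical haplotypes: functional MAF 0.4, marker MAF 0.1, and the marker
# minor allele always sits on a functional-minor haplotype (D' = 1).
haps = {(0, 0): 0.6, (1, 0): 0.3, (0, 1): 0.0, (1, 1): 0.1}
print(ld_stats(haps))
print(marker_conditional_moments(haps))
```

For these frequencies (with β = 1 and σ² = 1) the marker genotype variances are 1 + 4/9, 1 + 2/9 and 1: marker minor homozygotes are functionally homogeneous, while marker major homozygotes mix all three functional genotypes, so variance heterogeneity appears at the marker even though the functional SNP has none.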

DISCUSSION

We have demonstrated that our method can find loci that affect trait means, variances, or both without a great sacrifice in power. This provides a way to identify classes of loci that may be missed by most traditional single-locus tests, without sacrificing power to detect traditional loci that affect only means. In addition, as other papers have noted, loci that affect variances (vQTL) automatically become a priori hypotheses for GxG interactions. This greatly increases power and reduces computation relative to standard GxG analyses, because the number of tests is reduced to the number of loci (as in a standard GWAS) instead of the number of all possible pairs of loci.
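For m loci (m is our notation here), a single-locus vQTL scan requires m tests, whereas an exhaustive pairwise GxG scan requires all pairs; with a hypothetical m = 10⁶ markers:

$$\binom{m}{2} = \frac{m(m-1)}{2}, \qquad \binom{10^6}{2} = \frac{10^6\,(10^6 - 1)}{2} = 499{,}999{,}500{,}000 \approx 5 \times 10^{11}$$

That is roughly (m − 1)/2 ≈ 500,000 times as many tests as the 10⁶ single-locus tests, with a correspondingly stricter multiple-testing threshold.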

Approach to detecting vQTL

Detecting variance heterogeneity is inherently less powerful (regardless of test type) than detecting mean differences, primarily because means are first moments whereas variances are second moments. We propose that studies interested in detecting vQTL use the LRTMV test. With a Bonferroni correction for multiple independent tests, the LRTMV test controls Type I error for normally distributed traits, and for non-normally distributed traits when combined with the parametric bootstrap. Under the global null, the Type I error of the LRTV test is controlled if LRTV is performed only for globally significant LRTMV tests. If there is an underlying mean effect and no variance effect, we find a slight Type I error inflation of the LRTV test when using a Bonferroni correction based on the number of globally significant LRTMV tests (see Section 7 in the supplemental text). Nevertheless, the Bonferroni correction for the variance test appeared to work well in both scenarios. Further investigation is warranted to study more comprehensively the properties of the multiple testing procedure entailed by the two-stage tests proposed here.
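The two-stage procedure described here can be sketched as follows, assuming vectors of per-SNP LRTMV and LRTV p-values are already available. The function name two_stage_vqtl, the default α = 0.05, and the example p-values are illustrative; as noted above, the LRTV stage of this sequential Bonferroni scheme may be mildly anti-conservative when mean effects are present.

```python
import numpy as np

def two_stage_vqtl(p_mv, p_v, alpha=0.05):
    """Stage 1: Bonferroni over all LRT_MV tests.
    Stage 2: LRT_V only at stage-1 hits, Bonferroni over the number of hits."""
    p_mv = np.asarray(p_mv, dtype=float)
    p_v = np.asarray(p_v, dtype=float)
    stage1 = p_mv < alpha / len(p_mv)             # globally significant LRT_MV
    n_hits = int(stage1.sum())
    stage2 = np.zeros_like(stage1)
    if n_hits:
        stage2 = stage1 & (p_v < alpha / n_hits)  # candidate vQTL
    return stage1, stage2

# Hypothetical p-values for four SNPs.
s1, s2 = two_stage_vqtl([1e-9, 1e-3, 0.5, 1e-12], [0.01, 1e-6, 1e-6, 0.04])
print(s1, s2)
```

Here three SNPs pass stage 1 at 0.05/4, and two of them pass the stage-2 threshold of 0.05/3; the third SNP's small LRTV p-value is never examined because its LRTMV test was not significant.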

Our proposed LRTV test is as powerful as the commonly used Levene’s test for variance heterogeneity, whereas the latter does not allow adjustment for covariates, which may be undesirable in GWAS of complex traits. Another interesting finding of our study is that the nonparametric Lepage test is a powerful and robust alternative to LRTMV for simultaneous detection of mean and variance heterogeneity, its only disadvantage being that it cannot accommodate covariates. LRTMV is only slightly less powerful than LRTV when there are variance-only effects, yet dramatically more powerful when there are both mean and variance effects. While we may not be able to detect variance effects at the border of genome-wide significance with an LRTV test, with LRTMV we would detect any that would nominally reach genome-wide significance with LRTV. We would also be able to detect loci with real variance effects that are not strong enough to reach genome-wide significance alone, provided they are coupled with a mean effect. Shen et al. [2012] suggested that many loci may show both mean and variance effects, some of which would not be strong enough to detect on their own. The recently discovered vQTL in the FTO gene for BMI [Yang et al., 2012] is a locus with very strong mean effects and moderately strong variance heterogeneity effects. While it was difficult to reach genome-wide significance for the vQTL alone at this locus, it would be very easy with the LRTMV test and a subsequent LRTV subtest. Although the LRTV subtest may have mildly inflated Type I error, the p-value from any of their separate datasets (see Table 2 of Yang et al. 2012) would still be highly significant after correcting for the inflation.

Although DGLM produces mean-only and variance-only test statistics identical to those of our LRTM and LRTV tests, it is computationally slower and is not amenable to the parametric bootstrap, which we have shown is essential for maintaining appropriate Type I error. Moreover, DGLM has so far not led to an omnibus test like our LRTMV, either in the DGLM paper or in its R package. Typically, vQTL studies perform one set of genome-wide vQTL tests and a separate set of genome-wide traditional mean tests. The LRTMV test explicitly avoids the usually unacknowledged issue of doing all of the tests twice, while still having power to detect each scenario: mean effects, variance effects, and both. In fact, in the long run it probably has the best chance of identifying vQTL as a byproduct of a locus with both mean and variance effects.
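To make the omnibus structure concrete, the sketch below computes LRTMV and LRTV in the simplest case with no covariates, where the normal maximum-likelihood estimates have closed forms. It is an illustrative sketch under stated assumptions, not the nlme-based implementation described in this paper: the function names are hypothetical, exactly three genotype groups are assumed, and LRTM is omitted because it requires iterative fitting when variances differ. Here LRTMV compares group-specific means and variances against one common mean and variance (2(k − 1) degrees of freedom), and LRTV compares group-specific variances against one common variance, with group-specific means in both models (k − 1 degrees of freedom).

```python
import math


def _mle_variance(values):
    """Maximum-likelihood (divide-by-n) variance."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def _chi2_sf_even_df(x, df):
    """Chi-square upper tail for even df: exp(-x/2) * sum_{j < df/2} (x/2)**j / j!."""
    half = x / 2
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def lrt_mean_variance(groups):
    """LRTMV and LRTV for three genotype groups under normality, no covariates."""
    if len(groups) != 3:
        raise ValueError("this sketch assumes exactly three genotype groups")
    n = sum(len(g) for g in groups)
    s2_groups = [_mle_variance(g) for g in groups]
    if min(s2_groups) <= 0:
        raise ValueError("each group needs a positive variance")
    # Full model: group-specific means and variances.
    full = sum(len(g) * math.log(s2) for g, s2 in zip(groups, s2_groups))
    # Null for LRTMV: one common mean and one common variance.
    s2_null = _mle_variance([v for g in groups for v in g])
    # Null for LRTV: group-specific means, one common (within-group) variance.
    s2_within = sum(len(g) * s2 for g, s2 in zip(groups, s2_groups)) / n
    lrtmv = n * math.log(s2_null) - full
    lrtv = n * math.log(s2_within) - full
    k = len(groups)
    return {
        "LRTMV": (lrtmv, _chi2_sf_even_df(lrtmv, 2 * (k - 1))),
        "LRTV": (lrtv, _chi2_sf_even_df(lrtv, k - 1)),
    }
```

Without covariates the LRTV reduces to Bartlett's statistic without its small-sample correction, and LRTMV is always at least as large as LRTV because the total variance is never smaller than the pooled within-group variance.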

The LRTMV test also has a general advantage over traditional linear regression, which may have inflated Type I error in the presence of variance heterogeneity. As demonstrated here, variance heterogeneity is common at loci in LD with a functional variant that has mean effects. Although this Type I error inflation tends to push a means test toward an association in exactly the scenario where we would want one (i.e. the bias points towards situations we are interested in finding), it is not very satisfying to use a test that gives the “right” answer for a partially wrong reason. The parametric bootstrap version of LRTMV does not have this inflation and, through its LRTM and LRTV subtests, can help reveal to some extent what is contributing to the association. After identifying vQTL via LRTMV and subsequent subtests, a descriptive plot of the LRTV and LRTM results may reveal the patterns we identified if the vQTL is due to LD with a functional variant with mean effects (Figure 1).
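The inflation of an ordinary regression test can be illustrated with a small simulation. The sketch below is a hypothetical example, not a result from this paper: two genotype groups have equal means but unequal variances, with the rarer group more variable, as can happen at a marker in high D’ but low r2 with a functional variant. Regressing the trait on a group indicator by ordinary least squares is equivalent to the pooled-variance test computed here, and the normal critical value 1.96 is used because the degrees of freedom are large.

```python
import math
import random


def pooled_t_rejection_rate(n1=900, n2=100, sd1=1.0, sd2=3.0,
                            reps=1000, seed=1):
    """Fraction of null replicates (equal means) rejected at about the 5% level
    by the pooled-variance test, i.e. OLS of the trait on a group indicator."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(0.0, sd1) for _ in range(n1)]  # common genotype group
        b = [rng.gauss(0.0, sd2) for _ in range(n2)]  # rarer, more variable group
        ma, mb = sum(a) / n1, sum(b) / n2
        ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
        s2 = ss / (n1 + n2 - 2)                       # pooled residual variance
        se = math.sqrt(s2 * (1 / n1 + 1 / n2))
        if abs(ma - mb) / se > 1.96:                  # large-df normal critical value
            rejections += 1
    return rejections / reps
```

With these settings the rejection rate lands far above the nominal 5% because the pooled variance is dominated by the large, less variable group and so understates the sampling variance of the rare group's mean; if instead the larger group were the more variable one, the same pooled test would become conservative.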

Parametric Bootstrap and Computation

Unfortunately, we must generally recommend the parametric bootstrap to control Type I error, because tests for variance heterogeneity are much more sensitive to deviations from normality than tests for mean differences. We tried many different transformations and found that none of them adequately controlled Type I error. The inverse normal transformation was used in the meta-analysis that identified SNPs associated with height or BMI variability [Yang et al., 2012]. However, Struchalin et al. [2010] showed that the inflated Type I error of a variance test caused by deviation from normality cannot be controlled, even after inverse normal transformation, if the SNP also causes mean heterogeneity [Struchalin et al., 2010, Figure 2B]. Beyond Type I error, Beasley et al. [2009] demonstrated that the inverse normal transformation can reduce statistical power in some circumstances.

For computational efficiency we suggest a series of options, ordered from least to most computationally intensive. The first is to perform the standard parametric test; this is sufficient if there is no inflation in the quantile-quantile (Q-Q) plots. Type I error inflation can also be detected by permuting the phenotype with respect to a random SNP to create an empirical null distribution and comparing it with the theoretical asymptotic null distribution. If there is inflation, a single set of parametric bootstrap replicates can be generated to create a null distribution for all tests of that phenotype. We found that this distribution is valid for all the SNPs in the sample for a given phenotype despite their varying minor allele frequencies (see supplemental text and Supplemental Figure 4). Beyond these options, the most intensive are to run bootstrap replicates for each preliminarily significant SNP or for every SNP separately. The full set of bootstrap replicates for a given SNP can be discontinued if the p-value exceeds 0.1 after 100 bootstrap replicates.
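The two bootstrap strategies above can be sketched as follows. `empirical_p` scores any SNP against one shared, sorted set of null statistics generated for the phenotype, and `sequential_bootstrap_p` applies the per-SNP stopping rule from the text (stop after 100 replicates if the p-value exceeds 0.1). Both names are hypothetical, the add-one p-value convention follows Davison and Hinkley [1997], and `draw_null_stat` is an assumed user-supplied function that would simulate one trait from the fitted null model, refit, and return the test statistic; in the usage lines it is replaced by chi-square (2 df) draws purely for illustration.

```python
import bisect
import random


def empirical_p(stat, null_sorted):
    """Add-one Monte Carlo p-value of `stat` against a shared, ascending null list."""
    b = len(null_sorted)
    exceed = b - bisect.bisect_left(null_sorted, stat)  # null values >= stat
    return (exceed + 1) / (b + 1)


def sequential_bootstrap_p(stat, draw_null_stat, max_reps=10_000,
                           check_at=100, stop_above=0.1):
    """Per-SNP bootstrap p-value, abandoned early when clearly non-significant.

    Returns (p_value, replicates_used).
    """
    exceed = 0
    for b in range(1, max_reps + 1):
        if draw_null_stat() >= stat:
            exceed += 1
        if b == check_at and (exceed + 1) / (b + 1) > stop_above:
            return (exceed + 1) / (b + 1), b  # early stop: running p > stop_above
    return (exceed + 1) / (max_reps + 1), max_reps


if __name__ == "__main__":
    rng = random.Random(0)

    def draw():
        # Stand-in null: chi-square with 2 df is exponential with rate 1/2.
        return rng.expovariate(0.5)

    shared_null = sorted(draw() for _ in range(1000))
    print(empirical_p(9.2, shared_null))
    print(sequential_bootstrap_p(0.5, draw))
```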

vQTL or LD

As we have shown, a locus associated with variance heterogeneity of a trait could be a true vQTL or could be due to LD with a functional variant. Variance heterogeneity due to LD appears only when r2 is relatively low and D’ is high, which requires the two variants to have very different allele frequencies. Any two loci in high D’ with similar allele frequencies necessarily have high r2; in that case, variance heterogeneity (due to LD) across genotypes is not possible because the alleles AND genotypes are highly correlated. If instead the variants have disparate allele frequencies, D’ can be high while the overall correlation is low, and the minor allele of the rarer variant is seen on only one of the allelic backgrounds, creating variance heterogeneity across genotypes. Wray [2005] quantified the maximum r2 as a function of the two loci’s minor allele frequencies and their difference. To determine whether a putative vQTL is due to LD: if it is common, we should look for relatively low-frequency variants with large mean effects that are in high D’ but relatively low r2 with it (see Results); if it has a relatively low frequency, we should look for common variants with mean effects that are in high D’ but relatively low r2 with it. Unless all variants in the region are available, it will be harder to rule out LD as the cause of a putative vQTL when it is common, because we may not have sampled all of the low-frequency variants in the region.
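These LD conditions can be checked numerically. The hypothetical helpers below compute D, D’ and r2 from the A-B haplotype frequency and the two allele frequencies, and the maximum attainable r2 for two minor allele frequencies (both at most 0.5), which for p_A ≤ p_B is p_A(1 − p_B)/[(1 − p_A)p_B] [Wray, 2005].

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r2) for alleles A and B given the A-B haplotype frequency."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


def max_r2(p_a, p_b):
    """Largest possible r2 for two minor allele frequencies (both <= 0.5)."""
    lo, hi = sorted((p_a, p_b))
    return lo * (1 - hi) / ((1 - lo) * hi)


# A rare variant (0.05) carried only on haplotypes with the minor allele
# of a common variant (0.4): complete LD by D', weak correlation by r2.
d, d_prime, r2 = ld_stats(p_ab=0.05, p_a=0.05, p_b=0.4)
print(round(d_prime, 3), round(r2, 3), round(max_r2(0.05, 0.4), 3))
```

In this toy case D’ = 1 while r2 is only about 0.079, the combination the text identifies as able to produce variance heterogeneity at the common marker.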

Scale

The scale of measurement of a trait can determine what pattern of association a locus has with the trait (i.e. mean and/or variance) [Rönnegård and Valdar, 2012]. This pattern can also be influenced by trait transformations. Biologically, if a locus affects the mean of an unmeasured trait that in turn has an exponential (or other nonlinear) effect on the measured trait, the locus may display variance heterogeneity with respect to the measured trait. While interpreting the locus as a vQTL may not fully describe the underlying relationship, it is still a vQTL for the trait of interest at that scale and gives us a link to the system through which we may uncover the true biology. The test itself allows us to identify loci related to the trait and get a foot in the door. Knowing the various ways this pattern can arise reminds us that multiple inferences are possible, and that while we pursue our suspected or favorite inference we should also take it with a grain of salt. The same can be said of almost any pattern identified by a statistical test. With this knowledge in hand, we can develop and test hypotheses about the range of known possibilities using further biological knowledge and/or trait measurements.
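The exponential case can be worked out explicitly. Assume, for illustration only, that an unmeasured trait Z is normal within genotype g with a genotype-specific mean and a common variance, and that the measured trait is Y = exp(Z). The standard lognormal moments then give

```latex
\begin{aligned}
Z \mid g &\sim \mathcal{N}(\mu_g, \sigma^2), \qquad Y = e^{Z},\\
\mathrm{E}(Y \mid g) &= e^{\mu_g + \sigma^2/2},\\
\mathrm{Var}(Y \mid g) &= \left(e^{\sigma^2} - 1\right) e^{2\mu_g + \sigma^2},\\
\frac{\mathrm{Var}(Y \mid g)}{\mathrm{Var}(Y \mid g')} &= e^{2(\mu_g - \mu_{g'})}.
\end{aligned}
```

A pure mean effect on the latent scale therefore appears as variance heterogeneity on the measured scale, and taking logarithms of Y would turn it back into a mean-only effect.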

vQTL and rQTL

Once a locus is found to be significant, one can determine whether the effect is due primarily to means, variances, or a combination of both. If variance heterogeneity plays a role, this vQTL could then be used in a GxG analysis. vQTL are theoretically connected to work by James Cheverud, who developed the concept of a relationship locus (rQTL) [Pavlicev et al., 2008; Pavlicev et al., 2011]. An rQTL is a locus where the relationship between two traits varies by genotype (i.e. the within-genotype beta coefficients for the bivariate regression of the two traits differ). Theoretically and empirically, rQTL have been shown to be involved in GxG or GxE interactions. An rQTL can be due to covariance and/or variance (i.e. vQTL) differences across genotypes. For an rQTL to exist, one of the interacting loci must have a pleiotropic effect on the two traits, resulting in some form of relationship between them. The other locus disrupts this pleiotropic relationship either by interacting with the pleiotropic locus for only one of the traits or through opposing interaction effects on each of the traits (i.e. differential epistasis). Finding a vQTL automatically makes it a candidate rQTL for the current trait and some other trait. Many quantitative traits studied in relation to human disease are risk factors. An rQTL for a disease endpoint and a risk factor acts to modulate (enhance or reduce) the risk. Any vQTL found for a quantitative risk factor is potentially an rQTL and could therefore modulate the relationship between that risk factor and disease. The vQTL and/or any loci interacting with it for that risk factor establish a priori hypotheses for these rQTL relationships.
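As a descriptive check of the rQTL definition, the hypothetical helper below estimates the least-squares slope of one trait on another separately within each genotype class; clearly different slopes are the pattern an rQTL produces. A formal test would instead fit a genotype-by-trait interaction in a regression model, which is not shown.

```python
def within_genotype_slopes(genotypes, trait1, trait2):
    """OLS slope of trait2 on trait1 within each genotype class."""
    groups = {}
    for g, x, y in zip(genotypes, trait1, trait2):
        groups.setdefault(g, []).append((x, y))
    slopes = {}
    for g, pairs in sorted(groups.items()):
        n = len(pairs)
        mx = sum(x for x, _ in pairs) / n
        my = sum(y for _, y in pairs) / n
        sxy = sum((x - mx) * (y - my) for x, y in pairs)
        sxx = sum((x - mx) ** 2 for x, _ in pairs)
        slopes[g] = sxy / sxx  # requires trait1 to vary within the class
    return slopes


# Made-up data: genotype 0 has slope 2, genotype 1 slope 1, genotype 2 slope 0.
geno = [0, 0, 0, 1, 1, 1, 2, 2, 2]
t1 = [1.0, 2.0, 3.0] * 3
t2 = [2.0, 4.0, 6.0, 1.0, 2.0, 3.0, 5.0, 5.0, 5.0]
print(within_genotype_slopes(geno, t1, t2))
```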

Another important relationship between rQTL and vQTL is that they are theoretically interchangeable through linear combinations of traits. A locus that is an rQTL for two traits and affects only their covariance is a vQTL for any linear combination with nonzero weights on both traits, such as one of the principal components of the traits or a composite trait such as the sum of traits 1 and 2. Fundamentally, this means that a vQTL for a trait may suggest that the trait itself is a composite of multiple traits for which the locus is an rQTL. Most biological traits that we measure are composites of multiple factors owing to the highly modular organization of biological systems. For example, total cholesterol is a composite of many different types of lipoproteins, such as low-density lipoprotein, high-density lipoprotein and very low-density lipoprotein, which are themselves composites of multiple types. The same vQTL for the composite trait may also be an rQTL for the composite trait and some other trait.
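The interchangeability follows from Var(w′Y) = w′Σw; for two traits, Var(w1Y1 + w2Y2) = w1²Var(Y1) + w2²Var(Y2) + 2w1w2Cov(Y1, Y2), so genotype differences in covariance alone change the variance of any combination with both weights nonzero. The sketch below uses made-up covariance matrices for two hypothetical genotypes that differ only in covariance.

```python
def combo_variance(cov, w):
    """Variance of the linear combination w'Y given the covariance matrix of Y."""
    k = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(k) for j in range(k))


# Hypothetical genotype-specific covariance matrices: equal variances (1.0),
# different covariances (+0.5 versus -0.5).
cov_by_genotype = {
    "AA": [[1.0, 0.5], [0.5, 1.0]],
    "BB": [[1.0, -0.5], [-0.5, 1.0]],
}

for genotype, cov in cov_by_genotype.items():
    print(genotype,
          combo_variance(cov, [1, 0]),   # trait 1 alone: no variance difference
          combo_variance(cov, [1, 1]))   # sum of traits: variance differs
```

The locus shows no variance effect on either trait alone, yet the summed trait has variance 3.0 in one genotype and 1.0 in the other, so it would appear as a vQTL for the composite.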

Variance heterogeneity gives us another window into gene-by-gene interactions and can also be a tool for identifying loci in LD with functional loci in a region. Our method allows us to leverage both mean and variance heterogeneity to identify important loci and also to shed light on how they may be related to our traits of interest. Just before submitting this manuscript, we discovered a paper published ahead of print in Genetic Epidemiology [Aschard et al., 2013] that describes a nonparametric method to test for different overall distributions across genotypes which would be another way to test for mean and variance effects simultaneously.

The LRTMV, LRTM and LRTV tests are implemented in R using the “nlme” package; the code is posted on our website at https://sites.google.com/site/utpengwei/

Supplementary Material

Supplemental data

Acknowledgements

This work was supported by NIH grant R01HL105502 to TJM. PW was partially supported by NIH grants R01HL116720 and R01CA169122. The authors declare that there are no conflicts of interest. The authors thank the reviewers for helpful and constructive comments.

LITERATURE CITED

  1. Álvarez-Castro JM, Yang RC. Clarifying the relationship between average excesses and average effects of allele substitutions. Frontiers in Genetics. 2012;3:30. doi: 10.3389/fgene.2012.00030.
  2. Ansari AR, Bradley RA. Rank-sum tests for dispersions. The Annals of Mathematical Statistics. 1960;31:1174–1189.
  3. Aschard H, Zaitlen N, Tamimi RM, Lindström S, Kraft P. A nonparametric test to detect quantitative trait loci where the phenotypic distribution differs by genotypes. Genetic Epidemiology. 2013;37:323–333. doi: 10.1002/gepi.21716.
  4. Balding DJ. The advance of Bayesian methods for genetic association analysis. Presented at the 59th Annual Meeting of The American Society of Human Genetics; Honolulu, HI; November 22, 2009.
  5. Bartlett M. A note on the multiplying factors for various χ2 approximations. Journal of the Royal Statistical Society. Series B. 1954;16:296–298.
  6. Beasley TM, Erickson S, Allison DB. Rank-based inverse normal transformations are increasingly used, but are they merited? Behavior Genetics. 2009;39(5):580–595. doi: 10.1007/s10519-009-9281-0.
  7. Brown MB, Forsythe AB. Robust tests for the equality of variances. Journal of the American Statistical Association. 1974;69(346):364–367.
  8. Bůžková P, Lumley T, Rice K. Permutation and parametric bootstrap tests for gene-gene and gene-environment interactions. Annals of Human Genetics. 2011;75(1):36–45. doi: 10.1111/j.1469-1809.2010.00572.x.
  9. Che R, Motsinger-Reif AA, Brown CC. Loss of power in two-stage residual-outcome regression analysis in genetic association studies. Genetic Epidemiology. 2012;36:890–894. doi: 10.1002/gepi.21671.
  10. Conover WJ, Johnson ME, Johnson MM. A comparative study of tests for homogeneity of variances, with applications to the outer continental shelf bidding data. Technometrics. 1981;23(4):351–361.
  11. Davison AC, Hinkley DV. Bootstrap Methods and Their Application. Cambridge University Press; 1997.
  12. Demissie S, Cupples LA. Bias due to two-stage residual-outcome regression analysis in genetic association studies. Genetic Epidemiology. 2011;35(7):592–596. doi: 10.1002/gepi.20607.
  13. Deng WQ, Paré G. A fast algorithm to optimize SNP prioritization for gene-gene and gene-environment interactions. Genetic Epidemiology. 2011;35:729–738. doi: 10.1002/gepi.20624.
  14. Gastwirth JL, Gel YR, Miao W. The impact of Levene’s test of equality of variances on statistical theory and practice. Statistical Science. 2009;24(3):343–360.
  15. Hollander M, Wolfe DA. Nonparametric Statistical Methods. New York: John Wiley & Sons; 1999.
  16. Hothorn T, Hornik K, Van De Wiel M, Zeileis A. coin: Conditional Inference Procedures in a Permutation Test Framework. R package version 0.6-6; 2006. URL http://CRAN.R-project.org
  17. Lepage Y. A combination of Wilcoxon’s and Ansari-Bradley’s statistics. Biometrika. 1971;58(1):213–217.
  18. Levene H. Robust tests for equality of variances. In: Olkin I, editor. Contributions to Probability and Statistics. Palo Alto, CA: Stanford University Press; 1960. pp. 278–292.
  19. Niu W, Qi Y. Matrix metalloproteinase family gene polymorphisms and risk for coronary artery disease: systematic review and meta-analysis. Heart. 2012;98:1483–1491. doi: 10.1136/heartjnl-2012-302085.
  20. Paré G, Cook NR, Ridker PM, Chasman DI. On the use of variance per genotype as a tool to identify quantitative trait interaction effects: a report from the Women’s Genome Health Study. PLoS Genetics. 2010;6(6):e1000981. doi: 10.1371/journal.pgen.1000981.
  21. Pavlicev M, Kenney-Hunt JP, Norgard EA, Roseman CC, Wolf JB, Cheverud JM. Genetic variation in pleiotropy: differential epistasis as a source of variation in the allometric relationship between long bone lengths and body weight. Evolution. 2008;62(1):199–213. doi: 10.1111/j.1558-5646.2007.00255.x.
  22. Pavlicev M, Norgard EA, Fawcett GL, Cheverud JM. Evolution of pleiotropy: epistatic interaction pattern supports a mechanistic model underlying variation in genotype–phenotype map. Journal of Experimental Zoology Part B: Molecular and Developmental Evolution. 2011;316(5):371–385. doi: 10.1002/jez.b.21410.
  23. Rönnegård L, Valdar W. Detecting major genetic loci controlling phenotypic variability in experimental crosses. Genetics. 2011;188(2):435–447. doi: 10.1534/genetics.111.127068.
  24. Rönnegård L, Valdar W. Recent developments in statistical methods for detecting genetic loci affecting phenotypic variability. BMC Genetics. 2012;13:63. doi: 10.1186/1471-2156-13-63.
  25. Shen X, Pettersson M, Rönnegård L, Carlborg Ö. Inheritance beyond plain heritability: variance-controlling genes in Arabidopsis thaliana. PLoS Genetics. 2012;8(8):e1002839. doi: 10.1371/journal.pgen.1002839.
  26. Smyth GK. Generalized linear models with varying dispersion. Journal of the Royal Statistical Society. Series B. 1989;51:47–60.
  27. Struchalin MV, Dehghan A, Witteman JCM, van Duijn C, Aulchenko YS. Variance heterogeneity analysis for detection of potentially interacting genetic loci: method and its limitations. BMC Genetics. 2010;11:92. doi: 10.1186/1471-2156-11-92.
  28. Trojanowski JQ, Vanderstichele H, Korecka M, Clark CM, Aisen PS, Petersen RC, Blennow K, Soares H, Simon A, Lewczuk P. Update on the biomarker core of the Alzheimer’s Disease Neuroimaging Initiative subjects. Alzheimer’s & Dementia: The Journal of the Alzheimer’s Association. 2010;6(3):230. doi: 10.1016/j.jalz.2010.03.008.
  29. Weiner MW, Aisen PS, Jack CR, Jr, Jagust WJ, Trojanowski JQ, Shaw L, Saykin AJ, Morris JC, Cairns N, Beckett LA. The Alzheimer’s disease neuroimaging initiative: progress report and future plans. Alzheimer’s & Dementia: The Journal of the Alzheimer’s Association. 2010;6(3):202. doi: 10.1016/j.jalz.2010.03.007.
  30. Wray NR. Allele frequencies and the r2 measure of linkage disequilibrium: impact on design and interpretation of association studies. Twin Research and Human Genetics. 2005;8(2):87–94. doi: 10.1375/1832427053738827.
  31. Yang J, Loos RJF, Powell JE, Medland SE, Speliotes EK, Chasman DI, Rose LM, Thorleifsson G, Steinthorsdottir V, Mägi R. FTO genotype is associated with phenotypic variability of body mass index. Nature. 2012;490:267–272. doi: 10.1038/nature11401.
