Behavior of QQ-Plots and Genomic Control in Studies of Gene-Environment Interaction

Arend Voorman; Thomas Lumley; Barbara McKnight; Kenneth Rice

doi:10.1371/journal.pone.0019416

. 2011 May 12;6(5):e19416. doi: 10.1371/journal.pone.0019416

Behavior of QQ-Plots and Genomic Control in Studies of Gene-Environment Interaction

Arend Voorman ¹, Thomas Lumley ¹, Barbara McKnight ¹, Kenneth Rice ^1,^*

Editor: Stacey Cherny²

PMCID: PMC3093379 PMID: 21589913

Abstract

Genome-wide association studies of gene-environment interaction (GxE GWAS) are becoming popular. As with main effects GWAS, quantile-quantile plots (QQ-plots) and Genomic Control are being used to assess and correct for population substructure. However, in G Inline graphic E work these approaches can be seriously misleading, as we illustrate; QQ-plots may give strong indications of substructure when absolutely none is present. Using simulation and theory, we show how and why spurious QQ-plot inflation occurs in GE GWAS, and how this differs from main-effects analyses. We also explain how simple adjustments to standard regression-based methods used in G Inline graphic E GWAS can alleviate this problem.

Introduction

Genome-wide association studies of Gene-environment interaction (G Inline graphic E GWAS) are now being undertaken to search for modification of environmental effects by genotypes [1], [2]. As in main-effects GWAS that search for the effects of genotype alone, differences in recent ancestry, termed population substructure, can be mistaken for true genetic effects, and is therefore a serious concern [1], [3].

In main-effects GWAS, the extent of the substructure problem is typically addressed using Genomic Control [4]. Here, under the assumption that processes of local mating and genetic drift inflate measures of association in the same way genome-wide, the degree of inflation of the median test statistic (known as Inline graphic ) is a useful assessment of the degree of test statistic inflation at all levels. Dividing test statistics by is a widely-used approach to correct for minor substructure problems; for examples, see e.g. [5], [6]. Adjusting for principal components, which we will use in this paper, is another popular correction method [7], [8].

In G Inline graphic E GWAS, one can also argue that substructure leads to inflation of test statistics by a multiplicative factor. However, in GE GWAS the same inflation can also be caused by an entirely different mechanism: systematic underestimation of variability of effect estimates across the genome. This is not confounding, but it gives the appearance of confounding; hence nave use of Genomic Control can be misleading.

In this paper, we show how the separate effects of population substructure and underestimation of variability affect interpretation of G Inline graphic E GWAS results, and we show how this problem can be solved. In the Results section, using simulation and theory, we describe how spurious QQ-plot inflation can occur. We also illustrate how model-robust estimates of standard errors (also known as “sandwich” standard errors) rectify the problem, while retaining Inline graphic 's ability to identify true substructure.

Assumptions in GE GWAS: classical approaches

In general, regression methods incorporate assessments of variability by estimating standard errors; for a given estimated effect (i.e. Inline graphic ), larger standard errors reflect greater variability from sample to sample, and produce less significant results. However, the precise assumptions reflected in these statements of variability differ between methods.

Under “classical” or “model-based” regression approaches, standard errors only account for random variation in the phenotype (denoted Inline graphic ). Furthermore, for their validity these classical variability estimates require that the mean value of is truly linear in the coefficients of the independent variables, such as environmental variables (denoted ) or genotypes (denoted ) [9].

To illustrate these classical assumptions, we consider linear regression, with Inline graphic coded as 0/1/2 copies of the minor allele. For classical main-effects analysis one might assume that the mean value of truly is

Association would be assessed using the least squares regression estimator Inline graphic and its estimated standard error, which is based on estimated random variation in the phenotype with the values of the observed predictor fixed. (Formally, the analysis is ‘conditioned’ on the independent variable ) [10].

Using the classical approach for interaction analyses, one might instead assume that

(1)

Inference would use Inline graphic and its estimated standard error, where again the variability accounted for by model-based standard errors is that of the phenotype, , in replicate experiments where and are fixed at the values observed in the original data.

How does the mean model assumption affect GWAS work? In main effects analyses, the validity of the mean model is not a major concern. Under the ‘strong null hypothesis' of no association between Inline graphic and , the true mean value of is simply

This means that the model assumptions hold under the null hypothesis, which is sufficient for valid p-values. But in G Inline graphic E work, even under the null hypothesis of no statistical interaction ( in (1)), model-based standard errors assume that the mean of is truly linear in and the residual variance is constant with respect to . When this assumption fails, model-based errors may be too small.

How does accounting for different sources of variability impact GWAS work? In main-effects analyses, we typically have the same, well-specified model for each gene we test, under the null hypothesis. In this case, the variability in our estimates is the same whether or not Inline graphic is truly fixed. As a result, model-based standard errors can be used to produce valid QQ-plots, even though each point on the plot represents a different . But when there is mean-model mis-specification in GE GWAS, variability in interaction term coefficient estimates from to becomes important. QQ-plots using model-based standard errors provide results based on viewing Inline graphic as random, and and as fixed. This contrasts with the observed variation in p-values entering the computation of , where is fixed, but varies – all along the genome. In particular, this means that varies in a way not accounted for by model-based analysis.

We will see that in G Inline graphic E GWAS using model-based standard errors, the behavior of QQ-plots and may not be as straightforward as in main-effects work. In Results, we show how violation of the assumptions both about mean-model validity and what is considered random can lead to misbehaved QQ-plots in studies.

Assumptions in GE GWAS: robust approaches

‘Model-robust’ standard errors are an alternative to model-based. Here, instead of assuming a particular form for the mean Inline graphic given and , standard error estimation views regression estimates as simple summaries of the observed association between and , or and . For example, interaction terms summarize how a measure of the association differs between values of . While the summary is expressed linearly, no underlying assumption of true linearity, in either the Inline graphic relationship or how it differs between levels of , is required for accurate standard error estimates [11]. Thus, concerns about mis-specification of the mean model in GE GWAS disappear. This form of standard error estimation should give inherently better-behaved QQ-plots than the model-based approach.

Model-robust standard error estimates are known as “heteroscedasticity-consistent”, “model-agnostic”, “Huber-White”, or “sandwich” standard errors, and are available in standard statistical software [12]–[14]. Unlike model-based standard errors, they summarize uncertainty in estimates where Inline graphic and all independent variables are considered random. In GWAS work, this means that repeated sampling variability in , and is accounted for. However, when we examine QQ plots we have and fixed while only varies. As will be discussed in the theoretical portion of the Results, this produces about the same amount of variability as when all variables are considered random, and more than when only Inline graphic is considered random. As a result, robust standard errors should give a better assessment of variability than model-based standard errors when we vary due to genome-wide comparison as we do on a QQ-plot.

Results

Simulation results

Before deriving theoretical results, we illustrate the scope of the difference between model-based and model-robust inference in G Inline graphic E GWAS, and the extent of QQ-plot inflation that may be produced in the absence of population substructure.

In Figures 1 and 2, we show the QQ plots for linear regression results in G Inline graphic E GWAS, based on simulations of well specified and misspecified modeled relationships between and . All simulations use Wald tests, independent Normal phenotypes , biallelic genotypes in Hardy Weinberg equilibrium with MAF varying between 0.02 and 0.5 and coded as 0/1/2 copies of the minor allele; for details see Methods. Importantly, the null hypothesis of no Inline graphic interaction holds throughout, and no population substructure is present. Using model-based standard errors, in Figure 1 we see no inflation beyond that expected by chance alone. In Figure 2, in the presence of either of two types of slight model mis-specification, substantial inflation of model-based statistics is observed ( Inline graphic and ), well beyond chance, despite the absence of real interactions or of population substructure. Using the model-robust approach, we see no inflation in the correctly specified model (Figure 1), or for either of the mis-specified models (Figure 2).

In this scenario the data is generated according to , independent of . Both the model-based and robust standard errors are valid estimates of variability, as demonstrated by the QQ-plot.

Panels A and C show scatterplots of vs. generated according to and respectively, independent of . Panels B and D demonstrate the corresponding effect of this mis-specified mean model and non-constant variance.

In Figure 3, we show that similar behavior can occur when substructure is present in an interaction analysis with model mis-specification. Here, structure was incorporated by assigning MAFs to two sub-populations, choosing Wright's Inline graphic to be 0.01, and the mis-specification exactly that displayed in panels A and B of Figure 2. Using a model-based analysis that accounts for the substructure by including one principal component of the SNP data as a covariate in the regression, we see that inflation persists, spuriously. However, the principal component-adjusted model-robust inference removes the substructure problem, and again gives correctly-calibrated Inline graphic -values.

In the left panel, nothing is done to account for the structure. On the right, the results are adjusted for principal components, leaving about the same amount of inflation as the case with no population stratification.

Finally, in Figure 4, we show that similar behavior holds for non-linear regression analysis. In these, model-based errors assume linearity on a modified scale: logit Inline graphic for logistic regression, and the log hazard for Cox proportional hazards regression. Here, in the top row we show results for binary , a relationship that is non-linear on a logit scale, and no true interaction. In the bottom row, we show similar results for a mis-specified Cox proportional hazards regression, with uniform censoring at the median [15]. Similar results hold when using likelihood ratio tests and joint tests of Inline graphic .

The top row displays the results for logistic regression, and the bottom for proportional hazards. The data was simulated according to and with half of the data censored at the median survival time. The top left shows the log odds of an event, which demonstrates non-linearity that was not specified in the model. The plot on the lower left displays a loess curve through the Schoenfeld residuals from the regression of Y on E. A non-zero slope is indicative of violation of the proportional hazards assumption.

Theoretical results

We now develop theoretical results governing the behavior of Inline graphic under model-based and model-robust analyses of GE GWAS.

In the absence of population structure, the population parameter consistently estimated by Inline graphic for interaction terms can be viewed as a ratio of conditional and unconditional variances, as follows:

(2)

where .4549 is the median of the Inline graphic distribution and is the variance estimate, either model-based or robust, used in the analysis. For simplicity we first consider the situation where 1) for all , where 2) is independent of , and where 3) the minor allele frequency is the same for all SNPs . We note that, in the absence of population stratification, the first two conditions are approximately true for nearly all SNPs. The third condition will later be relaxed. Under these three conditions, Inline graphic is approximately constant and can be factored out of the computation of the median in equation (2).

Since Inline graphic and is asymptotically Normal,

is consistent for the variance of Inline graphic taken over the distribution of but conditioning on and . The genomic control can then be written as

The numerator of Inline graphic is the empirical variance of the regression coefficients and is always a good estimate of , the true variance over genotypes fixing the outcome and exposure variable. The denominator of is the estimated variance of from the regression analysis. If model-based inference is used, this estimates Inline graphic , the variance taken over the distribution of the outcome, conditional on the predictor variables. If a model-robust variance estimator is used, the denominator estimates , the unconditional variance of taken over the distribution of all variables.

To see that Inline graphic should be approximately 1 when there is no population structure, despite the conditioning on and that is implicit in the computation of its numerator, we can examine the variance decomposition:

(3)

The numerator of Inline graphic accurately estimates the second term in this decomposition. We show in Appendix S1 that the first term is approximately zero for the case of linear regression, so

as required. Our simulations confirm that this result also holds for logistic regression and Cox regression.

So far we have assumed constant MAF, but the arguments do not depend on the value of the MAF, nor does the conclusion that Inline graphic . Since is defined from the median of the chi-squared statistic, if for the SNPs with each fixed MAF we must also have pooling over a range of MAF. For this reason, the results should hold with typical range of MAFs seen in GWAS so long as the sample size and MAF are large enough to allow accurate estimation of the sandwich variances. This is further supported by the simulation results, which used a wide range of MAFs.

The analog of equation 3 for the model-based estimator is

(4)

The first term in this decomposition is not negligible unless the Inline graphic model is correctly specified, so under model mis-specification

and Inline graphic will tend to be greater than 1 even when there is no confounding by population substructure. Figure 5 shows an example of this.

The panels show estimates of over replications with different variables held constant. At left, the relationship is truly linear. Because is the same regardless of which variables are held constant, then according to the variance decomposition, so is the variability. In the right panel the relationship is exponential. With and fixed, a certain amount of within-sample correlation remains fixed, making different for each instance of . Both the setting where and are fixed and is random, and the setting when all variables are random incorporate this extra variability.

As a further complication, the model-based variance estimator Inline graphic need not be close to the true variance, the second term in equation 4, if the model is misspecified [16].

Discussion

We have seen in the above that standard errors that rely on model assumptions can be underestimates of Inline graphic when those model assumptions are not met, while model-robust estimates of variance provide well-calibrated standard errors and p-values. This distinction can be seen in all types of regression examined. The problem is not merely theoretical; our research was motivated by seeing apparent population substructure similar to that in Figure 2 in initial analyses of a Inline graphic GWAS of echocardiographic traits [17] and noticing that the inflation was absent in cohorts that had used model-robust standard error estimates. The simulation results from linear regression show that even mild heteroskedasticity or mean-model mis-specification can inflate model-based test statistics.

Intuitively explaining sources of variability

The impact of different sources of variability and its relation to model mis-specification is not well recognized. We illustrate the situation for Inline graphic GWAS in 5. Here, for a continuous phenotype , continuous exposure , and binary genotype , we show the spread of estimates holding different variables constant when there is no true interaction or population structure present. Within the blue boxes, and are held fixed while is varied to produce different estimates of Inline graphic . From boxplot to boxplot is varied. Each blue boxplot illustrates the variability in using what model-based errors assume is fixed; it can be compared to the variability with , , and all random, and with and fixed. Under model mis-specification, it is clear that the distribution of Inline graphic varies from to , and that the variability in is larger when , and are all random or when only and are fixed.

When the linear model is true, as in the data summarized in left panel, then the linear trend is the same for any level of Inline graphic . When this is true, the variability in is the same whether or not and are taken to be random. However, when the linear model is not true, then the linear trend need not be the same at different levels of . In right panel of Figure 5, the data were generated according to an exponential relationship between Inline graphic and . Under this model the linear trend will be steeper in samples where the values of are larger. Now for any single instance of and there is always some small degree of correlation between them within the data. As each of these small, fixed associations between and varies over Inline graphic , there is truly effect modification: subjects with different genotypes will tend to have slightly different levels of , and hence a slightly different relationship with . So in addition to the usual sampling variability in estimating , we have this ‘bias’ that varies from each pair of Inline graphic and to the next. If we add these two sources of variability, we obtain the full variability that we observe when and are also random.

Conclusions

In G Inline graphic E GWAS, nave use of QQ-plots and genomic control with model-based standard errors may lead to false conclusions about substructure. The extent of this problem depends on the degree of mis-specification of the mean-model, the form of regression used, and the distribution of the environmental exposure. Use of model-robust inference offers a simple alternative that avoids these difficulties, and retains genomic control as a useful tool for the assessment of substructure.

Methods

Simulation studies in R [18] were used to assess the performance of model-based standard errors and sandwich standard errors in a variety of scenarios, with the genomic-control Inline graphic used to assess the degree of inflation in the test statistics. Visually, this can be seen in QQ-plots.

We simulated a normally distributed environmental exposure, and a response generated from this either under a correctly specified linear model, or under a quadratic mean-model. Genotypes at 10,000 loci were simulated according to a binomial distribution, with minor allele frequency (MAF), drawn from a beta(.5,.5) distribution truncated at 1/2, and with frequencies filtered to be above 0.02. We found that the behavior of the simulations was not affected in a substantial way when the MAF was fixed at any particular value for all loci. In this way, genotype is entirely unrelated to phenotype in these simulations, and so we would hope that tests for gene-environment interaction yield uniformly distributed p-values, as they should be under the null hypothesis.

Population stratification was simulated by drawing an MAF for each of two sub-populations at each locus, centered around some MAF drawn from the distribution described above. These sub-population MAFs were distributed according to a beta distribution parametrized by the central MAF and Wright's Inline graphic , in this case chosen to be 0.01 [4]. In order to allow for confounding, we created a slight difference in the relationship between phenotype and environmental exposure: the linear component of the relationship was 20% of the population and in the other population, while the quadratic component was Inline graphic in both groups.

In addition to linear regression, performance of model-based and sandwich standard errors was assessed in logistic and proportional hazards regression. In these situations we generated simulations in which departures from linearity were on the appropriate transformed scale. In logistic regression, this meant that the linearity was judged on the scale of the logit of the probability of ‘success’. In proportional hazards, the scale was on the log hazard scale. To achieve this, we generated exponentially distributed event times where the exponentiated ‘rate’ parameter was related quadratically to exposure.

Supporting Information

Appendix S1

(PDF)

Click here for additional data file.^{(41.1KB, pdf)}

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: This research was funded in part by NIH/NHLBI training grant T32 HL07183-34 and by research grant R01 HL074745. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1.Hunter DJ. Gene-environment interactions in human diseases. Nat Rev Genet. 2005;6:287–298. doi: 10.1038/nrg1578. [DOI] [PubMed] [Google Scholar]
2.Thomas D. Gene-environment-wide association studies: emerging approaches. Nat Rev Genet. 2010;11:259–272. doi: 10.1038/nrg2764. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Pearson T, Manolio T. How to interpret a genome-wide association study. Jama. 2008;299:1335. doi: 10.1001/jama.299.11.1335. [DOI] [PubMed] [Google Scholar]
4.Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55:997–1004. doi: 10.1111/j.0006-341x.1999.00997.x. [DOI] [PubMed] [Google Scholar]
5.Ganesh S, Zakai N, van Rooij F, Soranzo N, Smith A, et al. Multiple loci inuence erythrocyte phenotypes in the CHARGE Consortium. Nature genetics. 2009;41:1191–1198. doi: 10.1038/ng.466. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Nolte I, Wallace C, Newhouse S, Waggott D, Fu J, et al. Common genetic variation near the phospholamban gene is associated with cardiac repolarisation: meta-analysis of three genome-wide association studies. PLoS One. 2009;4:e6138. doi: 10.1371/journal.pone.0006138. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Price A, Patterson N, Plenge R, Weinblatt M, Shadick N, et al. Principal components analysis corrects for stratification in genome-wide association studies. Nature genetics. 2006;38:904–909. doi: 10.1038/ng1847. [DOI] [PubMed] [Google Scholar]
8.Zhang F, Wang Y, Deng H. Comparison of population-based association study methods correcting for population stratification. PLoS One. 2008;3:3392. doi: 10.1371/journal.pone.0003392. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Draper N, Smith H. Applied regression analysis. 1998;706 John Wiley and Sons. New York. [Google Scholar]
10.Cox D. Principles of statistical inference. Cambridge Univ Pr; 2006. [Google Scholar]
11.White H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica. 1980;48:817–838. [Google Scholar]
12.StataCorp. Stata statistical software: Release. 2009;11 [Google Scholar]
13.Zeileis A. Econometric computing with hc and hac covariance matrix estimators. Journal of Statistical Software. 2004;11:1–17. [Google Scholar]
14.Zeileis A. Object-oriented computation of sandwich estimators. Journal of Statistical Software. 2006;16:1–16. [Google Scholar]
15.Therneau T original R port by Thomas Lumley. survival: Survival analysis, including penalised likelihood. 2009. URL http://CRAN.R-project.org/package=survival. Accessed 2011 April 7. R package version 2.35-4.
16.Royall R. Model robust confidence intervals using maximum likelihood estimators. International Statistical Review/Revue Internationale de Statistique. 1986;54:221–226. [Google Scholar]
17.Glazer NL, Felix JF, Dörr M, Chen MH, Schmidt R, et al. Genome-wide meta-analyses of snp by environmental factor interactions on echocardiographic traits: a charge-echogen study. 2010. In Press.
18.R Development Core Team. R: A Language and Environment for Statistical Computing. 2009. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org. ISBN 3-900051-07-0.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Appendix S1

(PDF)

Click here for additional data file.^{(41.1KB, pdf)}

[pone.0019416-Hunter1] 1.Hunter DJ. Gene-environment interactions in human diseases. Nat Rev Genet. 2005;6:287–298. doi: 10.1038/nrg1578. [DOI] [PubMed] [Google Scholar]

[pone.0019416-Thomas1] 2.Thomas D. Gene-environment-wide association studies: emerging approaches. Nat Rev Genet. 2010;11:259–272. doi: 10.1038/nrg2764. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0019416-Pearson1] 3.Pearson T, Manolio T. How to interpret a genome-wide association study. Jama. 2008;299:1335. doi: 10.1001/jama.299.11.1335. [DOI] [PubMed] [Google Scholar]

[pone.0019416-Devlin1] 4.Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55:997–1004. doi: 10.1111/j.0006-341x.1999.00997.x. [DOI] [PubMed] [Google Scholar]

[pone.0019416-Ganesh1] 5.Ganesh S, Zakai N, van Rooij F, Soranzo N, Smith A, et al. Multiple loci inuence erythrocyte phenotypes in the CHARGE Consortium. Nature genetics. 2009;41:1191–1198. doi: 10.1038/ng.466. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0019416-Nolte1] 6.Nolte I, Wallace C, Newhouse S, Waggott D, Fu J, et al. Common genetic variation near the phospholamban gene is associated with cardiac repolarisation: meta-analysis of three genome-wide association studies. PLoS One. 2009;4:e6138. doi: 10.1371/journal.pone.0006138. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0019416-Price1] 7.Price A, Patterson N, Plenge R, Weinblatt M, Shadick N, et al. Principal components analysis corrects for stratification in genome-wide association studies. Nature genetics. 2006;38:904–909. doi: 10.1038/ng1847. [DOI] [PubMed] [Google Scholar]

[pone.0019416-Zhang1] 8.Zhang F, Wang Y, Deng H. Comparison of population-based association study methods correcting for population stratification. PLoS One. 2008;3:3392. doi: 10.1371/journal.pone.0003392. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0019416-Draper1] 9.Draper N, Smith H. Applied regression analysis. 1998;706 John Wiley and Sons. New York. [Google Scholar]

[pone.0019416-Cox1] 10.Cox D. Principles of statistical inference. Cambridge Univ Pr; 2006. [Google Scholar]

[pone.0019416-White1] 11.White H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica. 1980;48:817–838. [Google Scholar]

[pone.0019416-StataCorp1] 12.StataCorp. Stata statistical software: Release. 2009;11 [Google Scholar]

[pone.0019416-Zeileis1] 13.Zeileis A. Econometric computing with hc and hac covariance matrix estimators. Journal of Statistical Software. 2004;11:1–17. [Google Scholar]

[pone.0019416-Zeileis2] 14.Zeileis A. Object-oriented computation of sandwich estimators. Journal of Statistical Software. 2006;16:1–16. [Google Scholar]

[pone.0019416-Therneau1] 15.Therneau T original R port by Thomas Lumley. survival: Survival analysis, including penalised likelihood. 2009. URL http://CRAN.R-project.org/package=survival. Accessed 2011 April 7. R package version 2.35-4.

[pone.0019416-Royall1] 16.Royall R. Model robust confidence intervals using maximum likelihood estimators. International Statistical Review/Revue Internationale de Statistique. 1986;54:221–226. [Google Scholar]

[pone.0019416-Glazer1] 17.Glazer NL, Felix JF, Dörr M, Chen MH, Schmidt R, et al. Genome-wide meta-analyses of snp by environmental factor interactions on echocardiographic traits: a charge-echogen study. 2010. In Press.

[pone.0019416-R1] 18.R Development Core Team. R: A Language and Environment for Statistical Computing. 2009. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org. ISBN 3-900051-07-0.

PERMALINK

Behavior of QQ-Plots and Genomic Control in Studies of Gene-Environment Interaction

Arend Voorman

Thomas Lumley

Barbara McKnight

Kenneth Rice

Roles

Abstract

Introduction

Assumptions in GE GWAS: classical approaches

Assumptions in GE GWAS: robust approaches

Results

Simulation results

Figure 1. Correctly Specified Model.

Figure 2. Mis-specified model.

Figure 3. QQ-plots with added population structure.

Figure 4. Example of behavior in logistic and proportional hazards regression.

Theoretical results

Figure 5. Illustrating the variance decomposition.

Discussion

Intuitively explaining sources of variability

Conclusions

Methods

Supporting Information

Footnotes

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

Behavior of QQ-Plots and Genomic Control in Studies of Gene-Environment Interaction

Arend Voorman

Thomas Lumley

Barbara McKnight

Kenneth Rice

Roles

Abstract

Introduction

Assumptions in GE GWAS: classical approaches

Assumptions in GE GWAS: robust approaches

Results

Simulation results

Figure 1. Correctly Specified Model.

Figure 2. Mis-specified model.

Figure 3. QQ-plots with added population structure.

Figure 4. Example of behavior in logistic and proportional hazards regression.

Theoretical results

Figure 5. Illustrating the variance decomposition.

Discussion

Intuitively explaining sources of variability

Conclusions

Methods

Supporting Information

Footnotes

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases