Author manuscript; available in PMC: 2014 Oct 1.
Published in final edited form as: Genet Epidemiol. 2014 May 6;38(5):416–429. doi: 10.1002/gepi.21810

The role of covariate heterogeneity in meta-analysis of gene-environment interactions with quantitative traits

Shi Li 1, Bhramar Mukherjee 1,*, Jeremy MG Taylor 1, Kenneth M Rice 2, Xiaoquan Wen 1, John D Rice 1, Heather M Stringham 1, Michael Boehnke 1
PMCID: PMC4108593  NIHMSID: NIHMS595394  PMID: 24801060

Abstract

With challenges in data harmonization and covariate heterogeneity across various data sources, meta-analysis of gene-environment interaction studies can often involve subtle statistical issues. In this paper, we study the effect of environmental covariate heterogeneity (within and between cohorts) on two approaches for fixed-effect meta-analysis: the standard inverse-variance weighted meta-analysis and a meta-regression approach. Akin to the results in Simmonds and Higgins (2007), we obtain analytic efficiency results for both methods under the assumption of gene-environment independence. The relative efficiency of the two methods depends on the ratio of within- versus between- cohort variability of the environmental covariate. We propose to use an adaptively weighted estimator (AWE), between meta-analysis and meta-regression, for the interaction parameter. The AWE retains full efficiency of the joint analysis using individual level data under certain natural assumptions. Lin and Zeng (2010a, b) showed that a multivariate inverse-variance weighted estimator also had asymptotically full efficiency as joint analysis using individual level data, if the estimates with full covariance matrices for all the common parameters are pooled across all studies. We show consistency of our work with Lin and Zeng (2010a, b). Without sacrificing much efficiency, the AWE uses only univariate summary statistics from each study, and bypasses issues with sharing individual level data or full covariance matrices across studies. We compare the performance of the methods both analytically and numerically. The methods are illustrated through meta-analysis of interaction between Single Nucleotide Polymorphisms in FTO gene and body mass index on high-density lipoprotein cholesterol data from a set of eight studies of type 2 diabetes.

Keywords: ADAPTIVELY WEIGHTED ESTIMATOR, COVARIATE HETEROGENEITY, GENE-ENVIRONMENT INTERACTION, INDIVIDUAL PATIENT DATA, META-ANALYSIS, META-REGRESSION, POWER CALCULATION

1 Introduction

Genome-wide association studies (GWAS) provide tremendous opportunities for large-scale exploration of associations between genetic variants and complex traits. GWAS-based searches for genetic associations have successfully identified marginal effects of variants at multiple susceptibility loci for a wide spectrum of complex traits, e.g. type 2 diabetes (T2D) (Scott et al. (2007), Zeggini et al. (2008), Morris et al. (2012), Saxena et al. (2013)), cardiovascular outcomes (Psaty et al. (2009), Sarwar et al. (2012)) and cancer (Song et al., 2013). The agnostic discovery strategy of GWAS can be used to detect gene-environment interactions (GEI) that can further characterize the genetic architecture of complex traits through sub-group or joint effects (Khoury and Wacholder (2009); Mukherjee et al. (2012)). In general, the definition of ‘environment’ can be broad, including demographic factors (age, gender etc.), behavioral factors (smoking, alcohol consumption, diet, medication use etc.), and external factors (exposure to air pollution, radioactive substances etc.). Complex traits are influenced by both genetic and environmental factors and possibly their interaction, e.g., physical activity appeared to attenuate the effect of fat mass associated FTO gene variants on obesity risk (Kilpeläinen et al., 2011). With a limited number of findings on GEIs so far, it is likely that the GEI effects are small to modest, warranting the need for larger sample sizes and collaboration across different study sites for joint or meta-analysis. Many collaborative networks have been formed to share individual or summary level data from multiple GWAS of related traits, e.g. the DIAGRAM (T2D) (Zeggini et al. (2008), Voight et al. (2010), Morris et al. (2012)), MAGIC (glucose and insulin related traits) (Dupuis et al. (2010), Scott et al. (2012)), CHARGE (heart and aging research) (Psaty et al., 2009), GIANT (anthropometrics) (Speliotes et al., 2010), and Global Lipids (Teslovich et al., 2010) GWAS consortia. There are also computationally efficient tools (e.g. METAL (Willer et al., 2010)) to implement GWA meta-analysis (GWAMA). However, compared to meta-analysis of marginal genetic associations, relatively few papers explore analytical issues for meta-analysis of GEI (e.g. Manning et al. (2011) and Aschard et al. (2011)).

Several meta-analytic techniques used for randomized clinical trials can be adapted in genetic epidemiology, e.g., the fixed-effects model (FEM) (Whitehead and Whitehead, 1991) and random-effects model (REM) (DerSimonian and Laird, 1986). The term ‘fixed effect model’ in the classical literature (Whitehead and Whitehead (1991), Fleiss (1993), Borenstein et al. (2010), Lin and Zeng (2010b)) most often refers to a model with fixed and common effect. But in general, ‘fixed effects model’ (in plural) only requires that there are fixed and unrelated effects in each study, regardless of the homogeneity assumption. Effect homogeneity can be tested by the Cochran’s Q-test (Cochran, 1954). In this paper, we consider the fixed and common effect framework as in Lin and Zeng (2010b) to derive our analytical results. We comment on this choice as opposed to a general fixed effects model where the interaction parameter can be different across studies later in the paper.

The joint analysis of individual patient data (IPD) from all studies is typically regarded as the ‘gold standard’ for evidence synthesis. However, considerable time and resources are required to share individual level data even in an existing consortium. We refer to the joint analysis of raw data from all studies as IPD analysis (also called mega-analysis in some papers, e.g. Lin and Zeng (2010a)), and classify the methods that combine summary statistics derived from analysis of different studies as meta-analysis. A natural question to ask is how much efficiency gain, if any, can be achieved by analyzing IPD over meta-analysis. Recently, Lin and Zeng (2010b) considered a multivariate IVW (MIVW) estimator under the common effect model. In constructing the MIVW, if the estimates with full covariance matrix for all the common parameters are pooled across studies, then the MIVW is asymptotically equivalent to the IPD estimator. However, in meta-analysis of published results, it is often difficult to obtain the full covariance matrix, while univariate summary statistics (e.g. estimate and standard error) for the effects of interest are more likely to be available. Lin and Zeng also quantified the efficiency loss of using a univariate IVW (UIVW) versus a MIVW estimator. The results from Lin and Zeng are derived in a very general setting. In this paper, we specifically focus on the estimation and testing of the GEI parameter. Our goal is to construct an estimator for the GEI parameter using only univariate summary statistics, bypassing issues with sharing individual level data or multivariate covariance matrices across studies, without sacrificing much efficiency or incurring increased bias.

Another pragmatic question to ask is whether we can detect GEI from summary statistics obtained from previously conducted genome-wide meta-analysis of marginal genetic effects, without the knowledge of IPD. Meta-regression (MR) is a regression-based technique to investigate whether some particular study-level covariates explain heterogeneity among effect estimates from multiple studies. Many studies (e.g., Simmonds and Higgins (2007); Kovalchik (2013)) have compared aggregate data analysis (e.g. MR) with IPD analysis to detect treatment-biomarker interactions for randomized clinical trials (analogous to gene-environment interactions in our case). Simmonds and Higgins considered three methods IPD, UIVW and MR and showed that under certain natural assumptions, analytical power formulae to detect interactions can be expressed in terms of total, within and between study sum of squares (TSS, WSS and BSS respectively) corresponding to the environmental covariate. In absence of IPD, they recommended using UIVW rather than MR if the WSS exceeds BSS and vice versa. We borrow from their work to derive similar analytical expressions for testing GEI under certain assumptions.

Instead of making a discrete choice between UIVW and MR, we propose a novel adaptively weighted estimator (AWE) that combines UIVW and MR and achieves the same asymptotic efficiency as the IPD estimator under certain conditions. The AWE has two major advantages over the MIVW estimator, as shown in the main text that follows: (1) AWE requires only univariate summary statistics from each study (study-specific estimate and standard error for the marginal association of G and for the GEI parameter, and the study-level mean of E); (2) AWE has less efficiency loss than MIVW under model misspecification, for example, when the main effects of G or E are heterogeneous across studies or when a continuous covariate E is centered within each study at the study-level mean. Our simulation studies indicate that AWE is very robust across the multiple model violation scenarios we considered, including the presence of non-linearity in the interaction term.

The rest of the paper is organized as follows. In the methods section we describe different strategies for meta-analysis of GEI, followed by analytical results on bias, variance and power properties of the newly proposed method. A comprehensive simulation study was performed to assess the performance of the meta-analysis methods under a variety of scenarios. We primarily focus on the issue of covariate heterogeneity, but also explore several other important factors that could potentially affect the relative performance of these methods: (1) departures from gene-environment independence; (2) heterogeneity in minor allele frequencies (MAFs) across cohorts; (3) lack of a common set of covariates to adjust for across studies; (4) misspecification of the genetic susceptibility model (dominant/co-dominant/additive); and finally (5) the presence of a non-linear form of interaction. In the results section, we report simulation findings followed by an illustrative example, where we examine whether variants in FTO gene modify the effect of environmental factors (age and BMI) on high-density lipoprotein cholesterol (HDL-C) levels, a T2D related quantitative trait. This paper is expected to provide useful insights and guidelines for practitioners conducting meta-analyses of GEI.

2 Methods

2.1 Meta-analysis of GEI under a common effect model

Consider a quantitative trait Y, a continuous environmental exposure E, a bi-allelic genetic locus G with genotypes AA, Aa and aa (where A is the minor allele), and other covariates Z. Suppose that there are K independent studies and a total of N participants, with $n_k$ participants in the k-th study, $k = 1, \ldots, K$, so that $\sum_{k=1}^{K} n_k = N$. Let $Y_{ki}$, $E_{ki}$, $G_{ki}$ and $Z_{ki}$ be the corresponding observations for participant i in study k, for $i = 1, \ldots, n_k$ and $k = 1, \ldots, K$. The assumed model for individual responses is

$$Y_{ki} = \beta_{0k} + \beta_G G_{ki} + \beta_E E_{ki} + \delta G_{ki} E_{ki} + \beta_Z^T Z_{ki} + \varepsilon_{ki}, \qquad (1)$$

where $\beta_{0k}$ is the study-specific intercept, $\beta_G$, $\beta_E$ and $\beta_Z$ are the main effects corresponding to G, E and Z, and $\delta$ is the GEI effect of primary interest. The vector $\beta = (\beta_G, \beta_E, \delta, \beta_Z)$ is assumed to be fixed and common across studies. The random errors $\varepsilon_{ki}$ are assumed to satisfy $\varepsilon_{ki} \overset{iid}{\sim} N(0, \sigma_k^2)$. Our interest lies in estimating the common interaction parameter $\delta$ and in testing $H_0: \delta = 0$.

There are multiple reasons for assuming a common effect model (1). First, this model is used quite frequently in the literature (Lin and Zeng (2010a), Lin and Zeng (2010b), Hartung et al. (2011)). Second, it facilitates the analytical derivation of relative efficiency and power. Third, meta-regression can only be meaningfully conceptualized if the interaction parameter is assumed to be the same across studies. Fourth, with unrelated but distinct fixed effects across studies, it is often hard to find a scientific interpretation/relevance for the limiting/expected value to which the standard inverse-variance weighted estimator converges; thus quantities like bias and mean-squared error become less interpretable. Finally, there was no evidence of effect heterogeneity for the interaction parameters in our T2D data analysis example of 8 European studies. However, as we will discuss subsequently, the ‘common effect’ assumption can be relaxed for testing purposes, and most of the methods we discuss are valid for testing the stronger null hypothesis $H_0: \delta_1 = \cdots = \delta_K = 0$ if we allow study-specific parameters $\delta_k$ in (1).

Various susceptibility models are considered, including the dominant model (G = 1 if AA or Aa; G = 0 if aa), the recessive model (G = 1 if AA; G = 0 if Aa or aa), the additive model (G = 2 if AA; G = 1 if Aa; G = 0 if aa) and the co-dominant model (G = AA, Aa or aa with aa as the reference level). For the co-dominant model, $\beta_G = (\beta_G^{Aa}, \beta_G^{AA})$ and $\delta = (\delta_{Aa}, \delta_{AA})$ for genotypes Aa and AA can be modified accordingly in model (1).

In the following, we first describe in section 2.1.1 three traditional approaches to detect GEI under model (1). The approaches are IPD analysis, standard meta-analysis (UIVW or MIVW) and MR. For the sake of completeness, we also describe a two-step estimator previously suggested by Simmonds and Higgins (2007). We then propose the new AWE in section 2.1.2. Throughout the paper, we use the generic notation $v(\hat{\delta})$ for the asymptotic variance (covariance matrix for multivariate $\hat{\delta}$) of any given estimator $\hat{\delta}$, and $\hat{v}(\hat{\delta})$ for the corresponding estimated variance.

2.1.1 Existing methods

(i) Individual patient data analysis (IPD)

The IPD analysis fits model (1) using individual level data. Methods such as weighted least squares (WLS) can be used to handle the heterogeneous $\sigma_k^2$ across studies, if $\sigma_k^2$ can be estimated with sufficient accuracy. However, for simplicity, we consider a simple linear regression model that assumes a common residual variance $\sigma_k^2 = \sigma^2$ for $k = 1, \ldots, K$ in (1), as the standard implementation of the IPD analysis. Denote the maximum likelihood estimate (MLE) of $\delta$ as $\hat{\delta}_{IPD}$, and its estimated variance as $\hat{v}(\hat{\delta}_{IPD})$.

(ii) Meta-analysis using inverse-variance weighted estimator: Since the data required for IPD analysis are seldom available in published results, meta-analysis combining summary statistics across individual studies is often used. We consider some variants of the IVW estimator under model (1). (ii.A) UIVW: A UIVW estimator needs the collection of the MLEs $\hat{\delta}_k$ and $\hat{v}(\hat{\delta}_k)$ estimated from model (1) using data from only study k. Under the above model, $\hat{\delta}_k \overset{iid}{\sim} N(\delta, v(\hat{\delta}_k))$. The UIVW estimator is given by

$$\hat{\delta}_{UIVW} = \Big\{\sum_k \hat{v}(\hat{\delta}_k)^{-1}\Big\}^{-1} \sum_k \hat{v}(\hat{\delta}_k)^{-1} \hat{\delta}_k, \qquad \hat{v}(\hat{\delta}_{UIVW}) = \Big\{\sum_k \hat{v}(\hat{\delta}_k)^{-1}\Big\}^{-1}.$$

The validity of the method requires that $\hat{\delta}_k$ is asymptotically normal, $\hat{\delta}_k \overset{iid}{\sim} N(\delta, v(\hat{\delta}_k))$ for large $n_k$, and that the asymptotic variance $v(\hat{\delta}_k)$ can be estimated by $\hat{v}(\hat{\delta}_k)$ with negligible error (Whitehead and Whitehead, 1991). We refer to these conditions as ‘standard conditions’ throughout, and we note that they are often implicitly assumed to hold in the classic meta-analysis literature (e.g., DerSimonian and Laird (1986), Whitehead and Whitehead (1991), Lin and Zeng (2010b)).
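As a minimal illustration of the UIVW formula above, the following sketch pools per-study interaction estimates by their inverse variances. The function name `uivw` and the three-study input values are hypothetical and chosen only so the arithmetic is easy to follow; they are not taken from the T2D data.

```python
from math import sqrt

def uivw(estimates, variances):
    """Univariate inverse-variance weighted (UIVW) pooling.

    estimates: per-study interaction estimates (delta_hat_k)
    variances: per-study estimated variances v_hat(delta_hat_k)
    Returns (pooled estimate, pooled variance).
    """
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need one variance per estimate")
    weights = [1.0 / v for v in variances]      # precision of each study
    total_precision = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, estimates)) / total_precision
    return pooled, 1.0 / total_precision

# Hypothetical summary statistics from three studies (illustrative only).
delta_hat, var_hat = uivw([1.0, 2.0, 4.0], [1.0, 0.5, 0.25])
se_hat = sqrt(var_hat)  # standard error of the pooled estimate
```

Only the pair (estimate, variance) is needed from each study, which is what makes this estimator usable with published results.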

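(ii.B) MIVW: Let $\hat{\beta}_k = (\hat{\beta}_{Gk}, \hat{\beta}_{Ek}, \hat{\delta}_k, \hat{\beta}_{Zk})$ be the MLE of $\beta$ from study k, with estimated variance-covariance matrix $\hat{v}(\hat{\beta}_k)$. When both $\hat{\beta}_k$ and $\hat{v}(\hat{\beta}_k)$ are available from each study, we consider the MIVW estimator following Lin and Zeng (2010b),

$$\hat{\beta}_{MIVW} = \Big\{\sum_k \hat{v}(\hat{\beta}_k)^{-1}\Big\}^{-1} \sum_k \hat{v}(\hat{\beta}_k)^{-1} \hat{\beta}_k, \qquad \hat{v}(\hat{\beta}_{MIVW}) = \Big\{\sum_k \hat{v}(\hat{\beta}_k)^{-1}\Big\}^{-1}.$$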

Then δ^MIVW and v^(δ^MIVW) corresponding to the interaction parameter δ can be obtained from the corresponding element in β^MIVW and v^(β^MIVW) respectively. Following Lin and Zeng (2010b), δ^MIVW has full asymptotic efficiency as δ^IPD under the common effect model (1). However the full covariance matrix v^(β^k) is not likely to appear in meta-analysis of published results and different studies may adjust for different covariates Z. So δ^UIVW remains the most commonly used meta-analytic method in spite of potential efficiency loss as compared to δ^MIVW and δ^IPD.

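(iii) Meta-regression: The true model (1) implies that the Y − G association depends linearly on E. So we consider a linear MR model to reveal the underlying dependence between the marginal genetic effects and the aggregated study mean values of E (say $m_k = \sum_i E_{ki}/n_k$). Screening for the marginal effect of G is routinely performed as the first step in GWA analysis. For each study k, we first consider the association model

$$Y_{ki} = \lambda_{0k} + \lambda_{Gk} G_{ki} + \lambda_{Zk}^T Z_{ki} + \eta_{ki}, \quad i = 1, \ldots, n_k, \qquad (2)$$

where the errors $\eta_{ki} \overset{iid}{\sim} N(0, \sigma_{\eta k}^2)$. At the second step, the MLE $\hat{\lambda}_{Gk}$ is regressed on $m_k$ through the MR model

$$\hat{\lambda}_{Gk} = \gamma_0 + \gamma m_k + \nu_k, \quad k = 1, \ldots, K. \qquad (3)$$

To account for the potential heterogeneity in $\hat{v}(\hat{\lambda}_{Gk})$ across studies, we consider the WLS estimator of $\gamma$, with weight $w_k = \hat{v}(\hat{\lambda}_{Gk})^{-1}$ assumed as known, i.e., $\nu_k \sim N(0, \hat{v}(\hat{\lambda}_{Gk}))$. Denote the WLS estimator of $\gamma$ in model (3) by $\hat{\delta}_{MR}$. Then $\hat{\delta}_{MR}$ and $\hat{v}(\hat{\delta}_{MR})$ can be expressed as

$$\hat{\delta}_{MR} = \frac{(\sum_k w_k)(\sum_k w_k m_k \hat{\lambda}_{Gk}) - (\sum_k w_k m_k)(\sum_k w_k \hat{\lambda}_{Gk})}{(\sum_k w_k)(\sum_k w_k m_k^2) - (\sum_k w_k m_k)^2}, \qquad \hat{v}(\hat{\delta}_{MR}) = \frac{\sum_k w_k}{(\sum_k w_k)(\sum_k w_k m_k^2) - (\sum_k w_k m_k)^2}\,\hat{\sigma}^2.$$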

The advantage of MR approach is that one can identify GEI with only limited summary data on E (only the mean mk's) and published results of marginal genetic effects (λ^Gk and v^(λ^Gk)).
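The weighted least squares slope of the MR model (3) can be sketched in a few lines. The function name `meta_regression` and the toy inputs are hypothetical. The `scale` argument stands in for the trailing variance factor in the formula above, which is not defined in this section; its default of 1 corresponds to treating the weights as exactly known and is an assumption of this sketch.

```python
def meta_regression(lam, var_lam, m, scale=1.0):
    """WLS slope of marginal genetic effects on study means of E.

    lam:     per-study marginal estimates lambda_hat_Gk
    var_lam: their estimated variances; weights are w_k = 1 / var_lam[k]
    m:       per-study means m_k of the environmental covariate
    scale:   multiplicative factor on the variance (assumption: 1.0
             treats the weights as exactly known)
    Returns (slope estimate, variance of the slope).
    """
    w = [1.0 / v for v in var_lam]
    sw = sum(w)
    swm = sum(wk * mk for wk, mk in zip(w, m))
    swl = sum(wk * lk for wk, lk in zip(w, lam))
    swmm = sum(wk * mk * mk for wk, mk in zip(w, m))
    swml = sum(wk * mk * lk for wk, mk, lk in zip(w, m, lam))
    denom = sw * swmm - swm ** 2
    if denom <= 0:
        raise ValueError("study means must vary across studies")
    slope = (sw * swml - swm * swl) / denom
    return slope, scale * sw / denom

# Toy input: marginal effects lie exactly on a line with slope 0.5 in m_k.
slope_hat, slope_var = meta_regression([0.5, 1.0, 1.5], [1.0, 1.0, 1.0], [0.0, 1.0, 2.0])
```

The denominator is proportional to the weighted spread of the study means, so the slope is poorly determined when the $m_k$ are nearly equal across studies.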

(iv) Two-stage estimator

Let $m = \sum_{k,i} E_{ki}/N$ denote the overall sample mean of E, $s_E^2 = N^{-1} \sum_{k,i} (E_{ki} - m)^2$ be the total sample variance of E, and $s_{Ek}^2 = n_k^{-1} \sum_{i=1}^{n_k} (E_{ki} - m_k)^2$ be the sample variance of E within the k-th study. Denote the corresponding population parameters for $m$, $m_k$, $s_E^2$, $s_{Ek}^2$ as $\mu$, $\mu_k$, $\sigma_E^2$, $\sigma_{Ek}^2$ respectively. We make the usual partition of the total sum of squares (TSS) of E as the sum of the within-study sum of squares (WSS) and the between-study sum of squares (BSS), i.e., TSS = WSS + BSS, where $TSS = \sum_{k,i} (E_{ki} - m)^2 = N s_E^2$, $WSS = \sum_k \sum_i (E_{ki} - m_k)^2 = \sum_k n_k s_{Ek}^2$ and $BSS = \sum_k n_k (m_k - m)^2$. Throughout this paper, we assume $n_k/N \to \varrho_k \in (0,1)$ as $N \to \infty$. Consider the limiting true population quantities $tss = \sigma_E^2$, $wss = \sum_k \varrho_k \sigma_{Ek}^2$ and $bss = \sum_k \varrho_k (\mu_k - \mu)^2$. We have $TSS/N \to_p tss$, $WSS/N \to_p wss$ and $BSS/N \to_p bss$ as $N \to \infty$.

Motivated by the fact that asymptotic relative efficiency (ARE) of δ^MR compared to δ^UIVW is driven by bss/wss, we consider a two-stage approach analogous to Simmonds and Higgins (2007) as

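$$\hat{\delta}_{TS} = \begin{cases} \hat{\delta}_{UIVW}, & \text{if } BSS/WSS \le 1; \\ \hat{\delta}_{MR}, & \text{if } BSS/WSS > 1, \end{cases}$$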

i.e., using δ^UIVW instead of δ^MR if W SS ≥ BSS and vice versa. Note that δ^TS is an ad-hoc procedure of discretely determining which method to use, based on the statistic BSS/W SS that measures heterogeneity in E between studies relative to within study variation in E.
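The sums of squares and the two-stage rule can be computed from study-level summaries alone. The sketch below assumes each study reports its sample size, its mean of E, and its within-study variance of E with divisor $n_k$ (as in the definition above). The function names and the two-study inputs are hypothetical.

```python
def sums_of_squares(n, m, s2):
    """Partition TSS = WSS + BSS from study-level summaries of E.

    n:  study sample sizes n_k
    m:  study means m_k
    s2: within-study variances s_Ek^2 (divisor n_k)
    Returns (TSS, WSS, BSS).
    """
    total_n = sum(n)
    overall_mean = sum(nk * mk for nk, mk in zip(n, m)) / total_n
    wss = sum(nk * vk for nk, vk in zip(n, s2))
    bss = sum(nk * (mk - overall_mean) ** 2 for nk, mk in zip(n, m))
    return wss + bss, wss, bss

def two_stage(delta_uivw, delta_mr, wss, bss):
    """Return the UIVW estimate when BSS <= WSS, otherwise the MR estimate."""
    return delta_uivw if bss <= wss else delta_mr

# Hypothetical example: two studies of 100 subjects with mean BMI 24 and 28.
tss, wss, bss = sums_of_squares([100, 100], [24.0, 28.0], [9.0, 9.0])
chosen = two_stage(1.7, -0.7, wss, bss)  # WSS exceeds BSS, so UIVW is chosen
```

Here WSS = 1800 and BSS = 800, so the within-study variation dominates and the rule selects the UIVW estimate.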

2.1.2 Adaptively weighted estimator

We note that, using only summary statistics, both δ^UIVW and δ^MR can potentially lack precision. Moreover, δ^MR can have significant ecological bias (Morgenstern (1982), Greenland (1987), Schwartz (1994), Berlin et al. (2002)) if the aggregate data relationship differs from the one observed in individual level data. Thus, we propose an adaptive estimator that combines δ^UIVW and δ^MR to trade-off between bias and efficiency in a data adaptive way. We first prove the following lemma which is also used in Kooperberg and LeBlanc (2008) and Dai et al. (2012).

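Lemma 1. Let $Y_i$ be independent random variables with equal variance, for $i = 1, \ldots, n$, and let $X_j = (X_{1j}, \ldots, X_{nj})^T$ be the j-th predictor, $j = 1, \ldots, p+q$. Let $\hat{\lambda}_j$ $(j = 1, \ldots, p)$ and $\hat{\beta}_j$ be the MLEs of the parameters under the two nested linear regression models

$$Y_i = \lambda_0 + \sum_{j=1}^{p} \lambda_j X_{ij} + \eta_i \quad \text{and} \quad Y_i = \beta_0 + \sum_{j=1}^{p+q} \beta_j X_{ij} + \varepsilon_i,$$

then $(\hat{\lambda}_1, \ldots, \hat{\lambda}_p)$ and $(\hat{\beta}_{p+1}, \ldots, \hat{\beta}_{p+q})$ are asymptotically independent.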

Proof of Lemma 1 is presented in Appendix B.1.

Applying Lemma 1 to models (1) and (2), the marginal genetic association λ^Gk and GEI δ^k are asymptotically independent within each study k, as they are coming from two nested linear regression models using data within study k. Note that δ^UIVW is a linear combination of δ^k, and that δ^MR is a linear combination of λ^Gk, then the following corollary holds.

Corollary 1. δ^UIVW and δ^MR are asymptotically independent.

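The independence of the two estimators is critical, as we can now borrow the classical idea of constructing an IVW estimator using these two independent ingredients. Assuming the standard conditions, we propose an AWE of the form

$$\hat{\delta}_{AWE} = \{\hat{v}(\hat{\delta}_{UIVW})^{-1} + \hat{v}(\hat{\delta}_{MR})^{-1}\}^{-1} \{\hat{v}(\hat{\delta}_{UIVW})^{-1} \hat{\delta}_{UIVW} + \hat{v}(\hat{\delta}_{MR})^{-1} \hat{\delta}_{MR}\},$$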

which combines δ^UIVW and δ^MR using their inverse-variances as weights. In order to calculate δ^AWE, summary statistics of study-specific effect estimates (δ^k, v^(δ^k), λ^Gk and v^(λ^Gk)) and study-level covariate means mk are needed from each study k. The intuitive rationale behind the AWE is that, when v^(δ^UIVW) is relatively smaller than v^(δ^MR), δ^AWE puts more weight on δ^UIVW and vice versa.
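A minimal sketch of the AWE combination follows. The function name `awe` is hypothetical. As a plausibility check, the inputs are the rounded UIVW and MR rows of the BMI panel in Table 4 (estimates and standard errors on the ×1000 scale); the result comes out close to the AWE row reported there, up to rounding of the inputs.

```python
from math import sqrt

def awe(delta_uivw, var_uivw, delta_mr, var_mr):
    """Combine UIVW and MR estimates using inverse variances as weights.

    Returns (AWE estimate, AWE variance, weight placed on the MR estimate).
    """
    prec_uivw = 1.0 / var_uivw
    prec_mr = 1.0 / var_mr
    total = prec_uivw + prec_mr              # precisions add under independence
    estimate = (prec_uivw * delta_uivw + prec_mr * delta_mr) / total
    return estimate, 1.0 / total, prec_mr / total

# Rounded inputs from Table 4 (rs1121980 x BMI); variances are SE squared.
est, var, w_mr = awe(1.731, 0.675 ** 2, -0.719, 3.136 ** 2)
se = sqrt(var)  # est is roughly 1.62 and se roughly 0.66 on the x1000 scale
```

In this example the MR estimate is very imprecise, so it receives only a small share of the weight and the AWE stays close to the UIVW estimate.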

Theorem 1. For the class of weighted estimators $\hat{\delta}_{AWE}(w) = w\hat{\delta}_{UIVW} + (1-w)\hat{\delta}_{MR}$, $0 \le w \le 1$, $v(\hat{\delta}_{AWE}(w))^{-1}$ attains its maximum at $v(\hat{\delta}_{UIVW})^{-1} + v(\hat{\delta}_{MR})^{-1}$ if and only if the weight $w = v(\hat{\delta}_{MR}) / \{v(\hat{\delta}_{UIVW}) + v(\hat{\delta}_{MR})\}$.

Theorem 1, proved in Appendix B.1, establishes the optimality of the inverse variance weights for AWE. A consequence of Theorem 1 is that the precision of $\hat{\delta}_{AWE}$ is the sum of the precisions of $\hat{\delta}_{UIVW}$ and $\hat{\delta}_{MR}$. Under the standard conditions, the estimated variance of the AWE estimator is given by $\hat{v}(\hat{\delta}_{AWE})^{-1} = \hat{v}(\hat{\delta}_{UIVW})^{-1} + \hat{v}(\hat{\delta}_{MR})^{-1}$. We will further show in section 2.2 that $\hat{\delta}_{AWE}$ is as efficient as $\hat{\delta}_{IPD}$ under certain plausible assumptions.
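The optimal weight in Theorem 1 can be recovered by a short calculus argument. The following is a sketch of the standard reasoning under the asymptotic independence of Corollary 1, not necessarily the proof given in Appendix B.1; the shorthand $v_U$ and $v_M$ is introduced here only for brevity.

```latex
% Shorthand (introduced for this sketch): v_U = v(\hat\delta_{UIVW}), v_M = v(\hat\delta_{MR}).
% Independence of the two estimators removes the covariance term.
\begin{align*}
v\{\hat\delta_{AWE}(w)\} &= w^2 v_U + (1-w)^2 v_M, \\
\frac{d}{dw}\, v\{\hat\delta_{AWE}(w)\} &= 2 w v_U - 2 (1-w) v_M = 0
  \;\Longrightarrow\; w^* = \frac{v_M}{v_U + v_M}, \\
v\{\hat\delta_{AWE}(w^*)\} &= \frac{v_M^2 v_U + v_U^2 v_M}{(v_U + v_M)^2}
  = \frac{v_U v_M}{v_U + v_M}
  = \left( v_U^{-1} + v_M^{-1} \right)^{-1}.
\end{align*}
```

The second derivative $2(v_U + v_M)$ is positive, so $w^*$ is the unique minimizer of the variance on $[0, 1]$, which is the same as the unique maximizer of the precision.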

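Remark 1: Co-dominant model. For the co-dominant model with $\delta = (\delta_{Aa}, \delta_{AA})$, it is straightforward to translate the proposed methods to their bivariate counterparts. In particular, $\hat{\delta}_{IPD}$ and $\hat{v}(\hat{\delta}_{IPD})$ can be directly obtained from (1); $\hat{\delta}_{UIVW}$ and $\hat{v}(\hat{\delta}_{UIVW})$ can be obtained as $\{\sum_k \hat{v}(\hat{\delta}_k)^{-1}\}^{-1} \sum_k \hat{v}(\hat{\delta}_k)^{-1} \hat{\delta}_k$ and $\{\sum_k \hat{v}(\hat{\delta}_k)^{-1}\}^{-1}$; $\hat{\delta}_{MIVW}$ and $\hat{v}(\hat{\delta}_{MIVW})$ can be obtained from $\hat{\beta}_{MIVW}$ and $\hat{v}(\hat{\beta}_{MIVW})$; the MR model can be modified as a multiple response regression $\hat{\lambda}_k = \gamma_0 + \gamma m_k + \nu_k$, where $\hat{\lambda}_k = (\hat{\lambda}_{Gk}^{Aa}, \hat{\lambda}_{Gk}^{AA})^T$ and $\nu_k \overset{iid}{\sim} N(0, \hat{v}(\hat{\lambda}_k))$. Corollary 1 and Theorem 1 also hold following Lemma 1 for bivariate $\delta$. A bivariate form of AWE can be considered as $\hat{\delta}_{AWE} = \{\hat{v}(\hat{\delta}_{UIVW})^{-1} + \hat{v}(\hat{\delta}_{MR})^{-1}\}^{-1} \{\hat{v}(\hat{\delta}_{UIVW})^{-1} \hat{\delta}_{UIVW} + \hat{v}(\hat{\delta}_{MR})^{-1} \hat{\delta}_{MR}\}$.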

2.2 Analytical results

This section presents some analytical results regarding bias, variance and power properties for the adaptive estimator described in section 2.1.2. We consider models without covariate Z to simplify the presentation.

2.2.1 Bias

Following classic linear regression and meta-analysis results, δ^IPD, δ^UIVW and δ^MIVW are all asymptotically unbiased estimators of δ. However, δ^MR is not necessarily unbiased for δ in general. The relationship between the marginal effect of G and the study-specific means mk may differ from the underlying relationship between the marginal effect of G and individual level data for E. This phenomenon is known as ‘ecological bias’ or ‘ecological fallacy’, and is well characterized in the literature (Morgenstern (1982), Greenland (1987), Schwartz (1994)). However, we note that δ^MR is an unbiased estimator of δ under the following G-E independence assumption, which is a plausible assumption well-discussed in the literature. We use the generic notation P (·) to denote the distribution of a random variable.

Assumption 1. P(G, E| study = k) = P(G| study = k)P (E| study = k), for k = 1, ..., K, i.e., G and E are independent within each study.

Proposition 1. Under Assumption 1, δ^MR of model (3) is asymptotically unbiased for δ.

Proof of Proposition 1 is presented in Appendix B.2. In the following Remark 2, we further discuss the issue of potential bias of δ^MR and thus in δ^AWE (which assigns a positive weight on the MR estimator) if Assumption 1 is violated.

Remark 2: Bias of $\hat{\delta}_{MR}$ and $\hat{\delta}_{AWE}$. Without Assumption 1, we show (in Appendix B.2) that the limiting value of the bias of $\hat{\delta}_{MR}$ is proportional to the ratio tss/bss and to the correlation between G and E. If the G-E correlations within each study are 0, then $E(\hat{\delta}_{MR}) - \delta \to_p 0$. If Assumption 1 holds, $\hat{\delta}_{AWE}$ is an asymptotically unbiased estimator of $\delta$, as both its components are unbiased. Moreover, we show later in section 2.2.2 that the limiting value of the weight corresponding to $\hat{\delta}_{MR}$ in $\hat{\delta}_{AWE}$ is bss/tss. Because the bias of $\hat{\delta}_{MR}$ grows with tss/bss while its weight is bss/tss, $\hat{\delta}_{AWE}$ adaptively puts less weight on $\hat{\delta}_{MR}$ when the bias of $\hat{\delta}_{MR}$ increases. We find through our numerical investigation that $\hat{\delta}_{AWE}$ is robust to potential ecological bias in $\hat{\delta}_{MR}$ even when the ecological bias is substantial (please see Appendix Figure 9 and Table 6 for the simulation results, and main text Table 4 corresponding to the T2D example).

Table 4.

IPD/Meta-analysis results of GEI for the T2D study, where log transformed HDL-C level was regressed on SNP, age, BMI, sex, T2D status, cohorts, and SNP x E interaction (E as BMI and age in two separate analyses) in the IPD model. Estimates, SEs and CIs have been multiplied by 1000.

rs1121980 (additive) x BMI

| Method (a) | Estimate | SE (b) | 95% CI (b) | P-value*, additive | P-value*, co-dominant |
|---|---|---|---|---|---|
| IPD | 1.474 | 0.687 | (0.128, 2.821) | 0.03** | 0.03* |
| UIVW | 1.731 | 0.675 | (0.407, 3.054) | 0.01* | 0.02* |
| MIVW | 1.518 | 0.663 | (0.219, 2.816) | 0.02* | 0.01* |
| MR | −0.719 | 3.136 | (−6.866, 5.429) | 0.81 | 0.69 |
| AWE | 1.622 | 0.660 | (0.328, 2.916) | 0.01* | 0.02* |

rs1121980 (additive) x age

| Method (a) | Estimate | SE (b) | 95% CI (b) | P-value*, additive | P-value*, co-dominant |
|---|---|---|---|---|---|
| IPD | 0.011 | 0.304 | (−0.585, 0.606) | 0.97 | 0.68* |
| UIVW | 0.046 | 0.337 | (−0.613, 0.706) | 0.89 | 0.74 |
| MIVW | −0.008 | 0.307 | (−0.610, 0.594) | 0.97 | 0.69 |
| MR | 0.180 | 0.522 | (−0.843, 1.203) | 0.73 | 0.77 |
| AWE | 0.086 | 0.283 | (−0.469, 0.640) | 0.76 | 0.69 |

(a) IPD: individual patient data; UIVW: univariate inverse-variance weighted estimator; MIVW: multivariate inverse-variance weighted estimator; MR: Meta-regression; AWE: adaptively weighted estimator combining UIVW and MR.

(b) SE: standard error; CI: confidence interval.

* indicating significance at α = 0.05 level.

** indicating whether additive or co-dominant model has smaller AIC under the IPD model.

2.2.2 Variance and Relative Efficiency

Explicit variance formulae $\hat{v}(\hat{\delta})$ and $v(\hat{\delta})$ for each estimator of $\delta$ are derived under Assumption 1 in Appendix B.3. Because the linear regression likelihood $\prod_{k,i} P(Y_{ki} \mid G_{ki}, E_{ki})$ corresponding to model (1) does not use any assumptions about the joint distribution of G and E, the role of the G-E independence assumption in this paper is only to provide simpler expressions for the variances. This is different from case-control studies, where assuming G-E independence and using the retrospective likelihood can lead to large gains in efficiency (Piegorsch et al. (1994), Umbach and Weinberg (1997), Chatterjee and Carroll (2005)).

In this section 2.2.2, for simplicity of presentation we assume $\sigma_k^2 = \sigma^2$ for $k = 1, \ldots, K$, and consider a dominant susceptibility model for stating Theorems 2 and 3. Let G = 1 (G = 0) indicate whether an individual is a carrier (non-carrier) of the minor allele A, and let $p_k$ denote $P(G = 1 \mid \text{study} = k)$, the carrier frequency in study k, $k = 1, \ldots, K$.

Theorem 2. Under Assumption 1, $v(\hat{\delta}_{IPD})^{-1} \ge v(\hat{\delta}_{UIVW})^{-1} + v(\hat{\delta}_{MR})^{-1} = v(\hat{\delta}_{AWE})^{-1}$. The equality holds if and only if $p_k = p$ for $k = 1, 2, \ldots, K$, where p is the common carrier frequency across all studies.

Proof of Theorem 2 is given in Appendix B.4. Under Assumption 1, the precision of δ^IPD is in general greater than that of δ^AWE. However, under the additional assumption of homogeneity of the MAFs (Assumption 2 stated below), we have equality v(δ^IPD)=v(δ^AWE).

Assumption 2. The MAFs corresponding to the susceptible SNP are constant across all studies, i.e. pk = p, for k = 1, 2, ..., K.

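Theorem 3. Under Assumptions 1 and 2, $v(\hat{\delta}_{IPD})^{-1} = v(\hat{\delta}_{AWE})^{-1} = v(\hat{\delta}_{UIVW})^{-1} + v(\hat{\delta}_{MR})^{-1}$, where $v(\hat{\delta}_{UIVW}) = \{N p(1-p)\, wss\}^{-1} \sigma^2$, $v(\hat{\delta}_{MR}) = \{N p(1-p)\, bss\}^{-1} \sigma^2$ and $v(\hat{\delta}_{IPD}) = v(\hat{\delta}_{AWE}) = \{N p(1-p)\, tss\}^{-1} \sigma^2$.

Proof of Theorem 3 is given in Appendix B.5. Following Theorem 3, the asymptotic variances $v(\hat{\delta}_{IPD})$, $v(\hat{\delta}_{UIVW})$, $v(\hat{\delta}_{MR})$ and $v(\hat{\delta}_{AWE})$ are all expressed in terms of covariate heterogeneity of E. The ARE of $\hat{\delta}_{UIVW}$ ($\hat{\delta}_{MR}$) relative to $\hat{\delta}_{IPD}$ is wss/tss (bss/tss). So $v(\hat{\delta}_{UIVW}) \le v(\hat{\delta}_{MR})$ if $wss \ge bss$, and vice versa. In the extreme case when there is no between-study heterogeneity in the study means of E (i.e. $\mu_k = \mu$), $v(\hat{\delta}_{UIVW}) = v(\hat{\delta}_{IPD})$ and $\hat{\delta}_{AWE}$ reduces to $\hat{\delta}_{UIVW}$; in contrast, if all $\sigma_{Ek}^2 = 0$ (i.e. E is constant within each study), $v(\hat{\delta}_{MR}) = v(\hat{\delta}_{IPD})$ and $\hat{\delta}_{AWE}$ reduces to $\hat{\delta}_{MR}$.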
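The variance expressions of Theorem 3 are simple enough to evaluate directly, which is convenient for design calculations. The sketch below uses a hypothetical function name and hypothetical inputs (N = 10000, carrier frequency 0.3, unit residual variance, wss = 9, bss = 4) and checks numerically that the UIVW and MR precisions add up to the IPD precision.

```python
def theorem3_variances(n_total, p, sigma2, wss, bss):
    """Asymptotic variances under Assumptions 1 and 2 (dominant model).

    n_total: total sample size N
    p:       common carrier frequency
    sigma2:  common residual variance
    wss,bss: limiting within- and between-study variation of E (tss = wss + bss)
    """
    tss = wss + bss
    base = sigma2 / (n_total * p * (1.0 - p))
    return {
        "UIVW": base / wss if wss > 0 else float("inf"),
        "MR": base / bss if bss > 0 else float("inf"),
        "IPD": base / tss,            # also the variance of the AWE
        "ARE_UIVW": wss / tss,        # efficiency of UIVW relative to IPD
        "ARE_MR": bss / tss,          # efficiency of MR relative to IPD
    }

# Hypothetical design values, for illustration only.
v = theorem3_variances(10000, 0.3, 1.0, 9.0, 4.0)
precision_gap = 1.0 / v["UIVW"] + 1.0 / v["MR"] - 1.0 / v["IPD"]  # close to zero
```

With these inputs UIVW retains 9/13 of the IPD efficiency and MR retains 4/13, so neither summary-based method alone is fully efficient but their combination is.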

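The limiting weight on $\hat{\delta}_{UIVW}$ in $\hat{\delta}_{AWE}$ can be simplified as $w = v(\hat{\delta}_{MR}) / \{v(\hat{\delta}_{UIVW}) + v(\hat{\delta}_{MR})\} = bss^{-1} / \{wss^{-1} + bss^{-1}\} = wss/tss$. Since $WSS/TSS \to_p wss/tss$ and $BSS/TSS \to_p bss/tss$ as $N \to \infty$, we can use the estimated weights WSS/TSS and BSS/TSS in $\hat{\delta}_{AWE}$, which leads to

$$\hat{\delta}_{AWE} = \frac{WSS}{TSS}\, \hat{\delta}_{UIVW} + \frac{BSS}{TSS}\, \hat{\delta}_{MR}.$$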

$\hat{\delta}_{AWE}$ adaptively captures the precision trade-off between the two estimators: $\hat{\delta}_{AWE}$ puts more weight on $\hat{\delta}_{UIVW}$ if WSS is relatively larger than BSS, and vice versa. In summary, under Assumptions 1 and 2, $\hat{\delta}_{AWE}$ is a consistent, unbiased, and asymptotically fully efficient estimator, which uses only univariate summary statistics without the knowledge of the original IPD. The operating characteristics for the proposed meta-analytic methods are summarized in Table 1. The results in Theorems 2 and 3 are numerically evaluated through a simulation study to examine the effect of relaxing Assumption 1 or 2, and relaxing the homogeneity assumption of $\sigma_k^2$.
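The sum-of-squares form of the AWE needs no variance estimates at all once the two component estimates are available. A minimal sketch, with a hypothetical function name and hypothetical inputs:

```python
def awe_from_sums_of_squares(delta_uivw, delta_mr, wss, bss):
    """AWE with weights WSS/TSS on UIVW and BSS/TSS on MR (TSS = WSS + BSS)."""
    tss = wss + bss
    if tss <= 0:
        raise ValueError("the covariate E must vary")
    w_uivw = wss / tss
    return w_uivw * delta_uivw + (1.0 - w_uivw) * delta_mr

# Hypothetical values: WSS = 1800 and BSS = 800 give weights 9/13 and 4/13.
delta_awe = awe_from_sums_of_squares(1.7, -0.7, 1800.0, 800.0)
```

These weights are justified only under Assumptions 1 and 2; when those assumptions are in doubt, the inverse-variance form of the AWE from section 2.1.2 does not rely on them for its weights.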

Table 1.

Glossary of the meta-analysis methods for GEI with summary properties. [IPD: individual patient data analysis; UIVW: univariate inverse-variance weighted estimator; MIVW: multivariate inverse-variance weighted estimator; MR: Meta-regression; AWE: adaptively weighted estimator combining UIVW and MR. $\hat{\lambda}_{Gk}$ stands for estimates of marginal genetic association and $\hat{\delta}_k$ stands for estimates of gene-environment interaction.]

| Method | Data shared | Bias | ARE (a) |
|---|---|---|---|
| IPD | individual level data | unbiased | 1 |
| UIVW | $\hat{\delta}_k$, $\hat{v}(\hat{\delta}_k)$ | unbiased | wss/tss under Assumptions 1 and 2 |
| MIVW | $\hat{\beta}_k$, $\hat{v}(\hat{\beta}_k)$ | unbiased | 1 under assumptions in LZ |
| MR | $\hat{\lambda}_{Gk}$, $\hat{v}(\hat{\lambda}_{Gk})$ and $m_k$ | unbiased under Assumption 1 | bss/tss under Assumptions 1 and 2 |
| AWE | $\hat{\delta}_k$, $\hat{v}(\hat{\delta}_k)$; $\hat{\lambda}_{Gk}$, $\hat{v}(\hat{\lambda}_{Gk})$ and $m_k$ | unbiased under Assumption 1 (b) | 1 under Assumptions 1 and 2 |

(a) ARE: asymptotic relative efficiency as compared to $\hat{\delta}_{IPD}$.

(b) bias adaptively controlled in AWE.

Remark 3: Additive and co-dominant models. In general, it is difficult to provide analytical results related to $\hat{\delta}_{AWE}$ in Theorems 2 and 3 for an additive model, but we can directly translate Theorems 2 and 3 for $\delta_{Aa}$ and $\delta_{AA}$ respectively under a co-dominant model if we assume $\mathrm{diag}(\hat{v}(\hat{\lambda}_{Gk}^{Aa}), \hat{v}(\hat{\lambda}_{Gk}^{AA}))$ for $\hat{v}(\hat{\lambda}_k)$ in the MR model, i.e., two separate MRs. The statements in Theorems 2 and 3 are numerically evaluated for additive and co-dominant models through a simulation study relaxing Assumptions 1 or 2, and relaxing the homogeneity assumption of $\sigma_k^2$.

Remark 4: Centering of covariate E. A continuous E is often centered to facilitate the interpretation of $\beta_G$ as the main effect of G at the mean value of E. In a meta-analysis set-up, it is natural for each study k to fit model (1) with E centered at its study-specific mean $m_k$. For the IPD analysis, it is natural to center E at the overall mean m. With these centering strategies, $\hat{\delta}_{IPD}$, $\hat{\delta}_{UIVW}$, $\hat{\delta}_{MR}$ and $\hat{\delta}_{AWE}$ remain invariant, and the results in Theorems 1-3 still hold for the centered models. The details are shown in Appendix B.4. However, the properties of $\hat{\delta}_{MIVW}$ do not hold under the above centering strategy. It is not fully efficient, because the $m_k$-centered model creates artificially heterogeneous main effects of G (depending on $m_k$) across studies. The $m_k$-centered model has only two common fixed effects, compared to three in the true model (Appendix B.4), and this leads to efficiency loss of $\hat{\delta}_{MIVW}$ according to Lin and Zeng (2010b). In terms of efficiency, $\hat{\delta}_{AWE}$ is preferable to $\hat{\delta}_{MIVW}$ when covariate E is centered at the study-level means $m_k$.

Remark 5: Relaxing the common effect assumption. First, we consider heterogeneous main effects of G and E, namely (βGk, βEk), with a common GEI δ across studies in model (1). To handle effect heterogeneity, we could replace (βG, βE) by (βGk, βEk) in model (1) for the IPD analysis; replace MR model (3) by λ^Gk=γ0k+γmk+νk for the MR analysis; and still use δ^UIVW, as it does not require homogeneity of (βGk, βEk). According to Lin and Zeng (2010b), the modified estimators have the property that v(δ^IPD)=v(δ^MIVW)=v(δ^UIVW), because δ is the only common parameter across studies. Theorem 1 still holds since it makes no homogeneity assumption on (βGk, βEk). Then we have v(δ^AWE) < v(δ^MIVW). In terms of precision, δ^AWE is better than δ^MIVW when (βGk, βEk) are heterogeneous. Next, we consider heterogeneous GEI δ1, ..., δK across studies in model (1), regardless of whether (βGk, βEk) are homogeneous or not. In this case, it is hard to interpret the expected value of δ^UIVW, δ^MIVW, δ^MR or δ^AWE as a scientifically relevant population parameter. Thus, estimation properties such as bias and mean squared error (MSE) become less meaningful, and we are simply obtaining a weighted average of within-study interaction estimates. Although the analytical results corresponding to δ^AWE are derived under a common effect model, the test based on δ^AWE is still valid for the stronger null hypothesis H0 : δ1 = ... = δK = 0. We will numerically evaluate the power and Type-I error under violation of the common effect assumption through simulation studies.

2.2.3 Power

For dominant and additive models, we consider the Wald-type test statistic $T = \hat{v}(\hat{\delta})^{-1/2}\hat{\delta}$ for testing the null hypothesis H0: δ = 0 against H1: δ ≠ 0. The power to detect an effect size δ* at level α is approximately $Pw(\delta^*, \alpha) = \Phi(-z_{\alpha/2} + \hat{v}(\hat{\delta})^{-1/2}\delta^*) + \Phi(-z_{\alpha/2} - \hat{v}(\hat{\delta})^{-1/2}\delta^*)$, where Φ is the cumulative distribution function (CDF) of a standard normal variable Z and $z_{\alpha/2}$ is the corresponding upper α/2 percentile. For co-dominant models, we consider a joint Wald test statistic $T = \hat{\delta}^T \hat{v}(\hat{\delta})^{-1} \hat{\delta}$, which follows a $\chi^2_2$ distribution (Chi-square with two degrees of freedom) under H0, for testing H0: δ = 0 against H1: δ ≠ 0. The power is approximately $Pw(\delta^*, \alpha) = 1 - \Phi_{\chi^2_2}(\chi^2_{2,\alpha} - \delta^{*T} \hat{v}(\hat{\delta})^{-1} \delta^*)$, where $\Phi_{\chi^2_2}$ is the CDF of a $\chi^2_2$ distributed random variable and $\chi^2_{2,\alpha}$ is the corresponding upper α percentile. The power function Pw(δ*, α), or simply Pw, is strictly decreasing in the variance $\hat{v}(\hat{\delta})$. Thus, the results regarding variances in Theorems 1-3 also determine relative power properties.
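As an illustration of the two-sided Wald power approximation for the dominant and additive models, the following minimal sketch evaluates Pw(δ*, α) for a given effect size and variance. The function name `wald_power` and the numerical inputs are hypothetical and are not taken from the paper's software; only the Python standard library is assumed.

```python
from math import sqrt
from statistics import NormalDist


def wald_power(delta_star, variance, alpha=0.05):
    """Approximate power of the two-sided Wald test for effect size delta_star.

    variance is the (model-based) variance of the interaction estimator.
    """
    std_normal = NormalDist()
    z_upper = std_normal.inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 percentile
    shift = delta_star / sqrt(variance)              # v^{-1/2} * delta*
    return std_normal.cdf(-z_upper + shift) + std_normal.cdf(-z_upper - shift)


if __name__ == "__main__":
    # Hypothetical inputs: the same effect size under increasing variance.
    for v in (0.002, 0.004, 0.008):
        print(v, round(wald_power(0.2, v), 3))
```

For a fixed non-zero δ*, the printed power falls as the variance grows, which is the monotonicity used to translate the variance results into power comparisons.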

Table 1 provides a glossary of all the methods we have discussed, along with their properties and the summary statistics required to carry them out.

2.3 Simulation study

In order to study the role of G-E independence (Assumption 1) and homogeneity in MAFs across cohorts (Assumption 2), we consider P (G, E) under four different settings, defined by whether each of Assumptions 1 and 2 holds or does not hold. To study the role of covariate heterogeneity in E, we consider cases where wss is greater or smaller than bss, for a fixed value of tss. The details of jointly generating the data pairs (Gki, Eki) are described in Appendix B.6.

Given (Gki, Eki), we then generate the continuous trait Yki under the IPD model (1), where the study-specific intercepts are sampled from $\beta_{0k} \overset{iid}{\sim} U(1.3, 1.5)$, and the true effect sizes (βG, βE, δ*) are determined such that G, E and GEI explain 1%, 10% and 0.5-1% of the total variation in Y respectively, in terms of partial R2. The residuals follow a $N(0, \sigma_k^2)$ distribution, i.e., no homogeneity of $\sigma_k^2$ is required. In particular, we generate $\sigma_k^2 \overset{iid}{\sim} U(0.3, 0.45)$, which leads to a marginal distribution of $Y \sim N(1.4, 0.4^2)$. The choice of $N(1.4, 0.4^2)$ is motivated by the distribution of log HDL-C level (mmol/l) in our T2D data set. We generate K = 20 studies with different sample sizes involving a total of N = 10,000 participants (nk = 200, for k = 1, ..., 6; nk = 400, for k = 7, ..., 11; nk = 500, for k = 12, ..., 17; n18 = 800; n19 = 1000; n20 = 2000).

We calculate $\hat{\delta}$ and $\hat{v}(\hat{\delta})$ corresponding to each proposed estimator, including δ^IPD, δ^UIVW, δ^MIVW, δ^MR, δ^TS and δ^AWE. We carry out R = 1,000 replications under each setting, and summarize the results in terms of relative bias $\left(\frac{1}{R}\sum_{r=1}^{R}\hat{\delta}^{(r)} - \delta^*\right)/\delta^* \times 100\%$, average model-based variance $\frac{1}{R}\sum_{r=1}^{R}\hat{v}(\hat{\delta}^{(r)})$, empirical variance $\frac{1}{R-1}\sum_{r=1}^{R}(\hat{\delta}^{(r)} - \bar{\hat{\delta}})^2$, MSE $\frac{1}{R}\sum_{r=1}^{R}(\hat{\delta}^{(r)} - \delta^*)^2$, power (proportion of simulations rejecting the null hypothesis using the Wald test) and Type-I error (proportion of simulations rejecting the null hypothesis when the data are generated under the null).
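The summary measures above can be sketched as follows for a single estimator. This is a minimal illustration, not the authors' code: the function names `summarize_replicates` and `rejection_rate` are hypothetical, and the inputs are assumed to be plain lists of per-replicate estimates and model-based variances.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def summarize_replicates(estimates, model_variances, true_delta):
    """Relative bias (%), average model-based variance, empirical variance, MSE.

    true_delta is the data-generating interaction effect and must be non-zero.
    """
    n_rep = len(estimates)
    return {
        "rel_bias_pct": (mean(estimates) - true_delta) / true_delta * 100.0,
        "avg_model_var": mean(model_variances),
        "empirical_var": variance(estimates),  # divides by R - 1
        "mse": sum((d - true_delta) ** 2 for d in estimates) / n_rep,
    }


def rejection_rate(estimates, model_variances, alpha=0.05):
    """Proportion of replicates in which the two-sided Wald test rejects.

    This is the empirical power when data are generated under the
    alternative, and the empirical Type-I error under the null.
    """
    z_upper = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejected = [abs(d) / sqrt(v) > z_upper
                for d, v in zip(estimates, model_variances)]
    return sum(rejected) / len(rejected)
```

Comparing `avg_model_var` with `empirical_var` is a quick check on whether the model-based standard errors are calibrated.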

Lack of common set of covariates across studies: We then consider covariates Z = (Z1, Z2, Z3) that stand for typical covariates (age, sex, race) in the IPD model (1). In particular, age (Z1) is continuous and associated with E; sex (Z2) is binary and independent of both (G, E); race (Z3) is a 3-level categorical variable and associated with both (G, E). βZ is determined such that the Type-III partial R2 corresponding to (Z1, Z2, Z3) is (2%, 1%, 1%) respectively. Let Zk be the set of covariates for the k-th study. We consider an analysis where Zk is only partially available from individual studies, and refer to this situation as ‘lack of common set of covariates across studies’. In particular, we consider Zk=(Zk1,Zk2,Zk3) for k = 1, 2, 3; Zk=(Zk1,Zk2) for k = 4, 5, 6; Zk=(Zk1,Zk3) for k = 7, 8, 9; Zk=(Zk2,Zk3) for k = 10, 11, 12; Zk=Zk1 for k = 13, 14; Zk=Zk2 for k = 15, 16; Zk=Zk3 for k = 17, 18; and no Zk available for k = 19, 20. For IPD analysis without any imputation of covariates, one can only obtain an IPD estimator based on the common subset of variables available across all studies, which reduces to an unadjusted model in the above setting. We refer to it as a naive IPD estimator (δ^NIPD). For the meta-analysis, we obtain δ^UIVW and δ^MIVW from the k-th study model adjusted for the available Zk, for k = 1, ..., K. For MR, we adjust for Zk at the first stage in the marginal genetic association model, and regress the MLEs of the adjusted effects of G on mk. These methods are compared with an ideal IPD estimator δ^IPD that adjusts for all of Z.

Non-linear GEI model: We consider a non-linear GEI model where the phenotype-genotype association parameter βG(E) varies with E through a sigmoid function βG(E) = 2 exp(E − 50) /{1 + exp(E − 50)} + 2, instead of the assumed linear interaction (shown in Figure 1). In this case, βG(E) changes at different rates for different values of E (sharper around the mean value of E, flatter at more extreme values of E), which leads to non-linear interaction. In Figure 1, most studies only contribute to a restricted range of E, leading to heterogeneity of individual interaction estimates across studies. In this case, meta-analysis with a misspecified linear interaction model might fail to detect the true non-linear interaction. In the simulation study, we generate K = 20 studies, where 4 studies have relatively larger within-study variability (studies 5, 10, 11, 15 in Figures 1 and 2) as compared to the other 16 studies. The complete description of nk, mk and σEk for the 20 studies is given in Figures 1 and 2. We generate Y through the non-linear interaction model Yki = β0k + βG(Eki)Gki + εki, where $\varepsilon_{ki} \overset{iid}{\sim} N(0, \sigma_k^2)$. The within-study relationships of the marginal effects of G as a function of E, namely βG(E), are substantially different across studies. The effect heterogeneity and non-linearity might influence the validity and relative performance of the proposed methods where a linear form of interaction is assumed. Therefore, we evaluate the robustness of the proposed meta-analysis estimators under this non-linear GEI model.
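To make the shape of the sigmoid concrete, the sketch below evaluates βG(E) and its local slope with respect to E, which is what a linear interaction term fitted within a narrow range of E would roughly pick up. The function names `beta_g` and `beta_g_slope` are hypothetical; the slope formula 2s(1 − s), with s the logistic term, follows from differentiating the sigmoid.

```python
from math import exp


def beta_g(e_value, midpoint=50.0):
    """Genetic effect beta_G(E) = 2*exp(E-50)/(1+exp(E-50)) + 2."""
    x = e_value - midpoint
    # Numerically stable logistic term: avoids overflow for large |x|.
    if x >= 0:
        s = 1.0 / (1.0 + exp(-x))
    else:
        s = exp(x) / (1.0 + exp(x))
    return 2.0 * s + 2.0


def beta_g_slope(e_value, midpoint=50.0):
    """Local slope d beta_G / dE = 2 * s * (1 - s)."""
    s = (beta_g(e_value, midpoint) - 2.0) / 2.0
    return 2.0 * s * (1.0 - s)


if __name__ == "__main__":
    for e in (40.0, 47.0, 50.0, 53.0, 60.0):
        print(e, round(beta_g(e), 4), round(beta_g_slope(e), 4))
```

The slope peaks at E = 50 and decays toward zero in both tails, so a study whose E values lie far from 50 sees almost no within-study interaction even though the genetic effect differs markedly between the two tails.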

Figure 1.

Non-linear GEI model: the (red) sigmoid curve shows the true relationship between the Y-G association and E, namely, βG(E) = 2 exp(E − 50)/{1 + exp(E − 50)} + 2; the boxplots show the covariate heterogeneity of E across studies, where the dots show the corresponding covariate means of E.

Figure 2.

Non-linear GEI model: the heights of the bars represent the power to detect GEI in individual studies; the (green) curve shows the value of the true non-linear GEI parameter; the top panel shows the sample sizes nk and the within-study standard deviations σEk of E; the four studies with relatively greater σEk are highlighted (in red).

3 Results

3.1 Simulation results

The simulation results are summarized in terms of bias, variance, MSE and power (Appendix Tables 1-4). The relative performances of the methods are very similar across all three susceptibility models and all four settings, so we only present in the main text the most general setting where the data are generated without either Assumption 1 or 2. The detailed simulation results are given in the Appendix, and we only summarize the key features in the following.

Gene-environment independence and ecological bias:

For bias, comparing settings with and without G-E independence, we observe no substantial difference among the proposed estimators, including the potentially biased estimators δ^MR and δ^AWE (Appendix Tables 1-4). When Assumption 1 is relaxed (Appendix Tables 3 and 4), the magnitude of relative bias of δ^MR may be up to ±7%, but the bias of δ^AWE is still well controlled (up to ±4%). Compared with the Monte Carlo error (up to ±3% even for the unbiased estimators), the bias of δ^AWE does not reach a level of practical concern even when there is some bias in δ^MR. In additional simulation settings where δ^MR is susceptible to substantial ecological bias (up to 35%) and G-E correlation is extremely strong, our results (Appendix Figure 9 and Table 6) indicate the adaptive feature of δ^AWE in controlling the bias from δ^MR by assigning it a decreased weight. The relative bias of δ^AWE is still < 5%. Thus the issue of ecological bias for aggregate analysis in δ^MR is less of a concern for δ^AWE. For variance, as expected, we did not observe any precision gain from making the G-E independence assumption. Results stated in Theorem 2 appear to hold numerically for all three genetic susceptibility models, even when Assumption 1 is relaxed (Appendix Tables 3 and 4).

Homogeneity in allele frequencies across cohorts

Comparing settings with and without homogeneous allele frequencies across studies, we did not observe any appreciable differences in results. Results in Theorem 3 hold numerically for all three genetic susceptibility models, even when Assumption 2 is relaxed.

Covariate heterogeneity in E

We observe that the ARE of δ^UIVW (δ^MR) relative to δ^IPD is well characterized by wss/tss (bss/tss). We found that δ^UIVW is more efficient than δ^MR if wss > bss, and vice versa. The precision trade-off is captured well by the adaptively determined weights in δ^AWE. We observe that δ^AWE is more efficient than the usual meta-analytic estimators δ^UIVW, δ^MR or δ^TS, and has almost the same efficiency as δ^IPD and δ^MIVW under all simulation scenarios. The findings with finite samples are consistent with our analytical results in Theorems 2 and 3 and with Lin and Zeng (2010b).

In terms of power, the proposed methods (IPD, UIVW, MIVW, MR, TS and AWE) fall into three groups in Figure 3, as expected. Group 1: IPD, MIVW and AWE; group 2: UIVW; group 3: MR. Group 1 has the most powerful tests, which is consistent with our analytical results and Lin and Zeng (2010b); group 2 is more powerful than group 3 if bss < wss, and vice versa. TS performs similarly to the better of groups 2 and 3. The empirical estimates of Type-I error are close to the nominal 0.05 level for all tests.

Figure 3.

Comparison of the proposed meta-analytical methods (in terms of power) under different scenarios of susceptibility models and covariate heterogeneity through a simulation study, where data are simulated without any assumption on gene-environment independence or homogeneity in allele frequencies across studies.

Heterogeneous GEI effects across studies

We examine the power and Type-I error corresponding to the proposed methods, where δ1, ..., δK are heterogeneous across studies in model (1). In particular, for each true effect size δ* determined by a given R2 as in section 2.3, we generate $\delta_k \overset{iid}{\sim} U(0, 2\delta^*)$, so that the δk vary across studies but have the same mean δ*. Appendix Table 5 shows the power and Type-I error for data generated from the heterogeneous GEI model, where we do not observe any substantial difference from the testing results under the homogeneous GEI model.

Misspecification of the genetic susceptibility model

We examine the power under misspecified susceptibility models (dominant/additive), where the true generating model is co-dominant. When δAA = 1.5 δAa (we accordingly choose βGAA=1.5βGAa), i.e., the second copy of A has an effect size between those assumed in the dominant (δAA = δAa) and additive (δAA = 2 δAa) models, there is no substantial difference in power between the misspecified dominant/additive model and the co-dominant model (shown in Appendix Figure 7), because the misspecification is not strong and the fitted dominant or additive models use one less parameter. When δAA = − δAa (we accordingly choose βGAA = −βGAa), i.e., the second copy of A has a reverse effect, the fitted dominant or additive models have much less power than the co-dominant model (shown in Figure 5). Thus, the co-dominant model can have more power than the simpler models, even though it uses two additional parameters for capturing GEI.

Figure 5.

Power curves under misspecified susceptibility models (dominant/additive), where the generating co-dominant model has δAA = −δAa, and where data are simulated without any assumption on gene-environment independence or homogeneity in allele frequencies.

Lack of common set of covariates across studies

Figure 4 shows the power curves under this situation without either Assumption 1 or 2. Compared to the basic setting without covariate adjustment (Figure 3), there is no substantial difference in the relative performances of these methods. We observe that the GEI estimate δ^ and its variance v^(δ^) are fairly stable, though the main effect estimates β^G and β^E are substantially influenced under this situation. VanderWeele et al. (2012) show similar results: under G-E independence, unmeasured environmental confounding has no effect on the GEI parameter; and if G and E are dependent, the environmental confounding needs to be very strong to incur substantial bias in GEI. Power curves under various other settings with similar results are given in Appendix Figures 4-6.

Figure 4.

Comparison of the proposed meta-analytical methods (in terms of power) under different scenarios of susceptibility models and covariate heterogeneity through a simulation study (representing the situation of lack of common set of covariates across studies), where data are simulated without any assumption on gene-environment independence or homogeneity in allele frequencies.

Non-linear GEI model

When the IPD are generated under the non-linear GEI model, the power to detect GEI from individual studies is very low (< 0.25), except in study 10, where the sample size n10, the effect size (which depends on E) and the variance $\sigma^2_{E10}$ are all relatively larger than in the other studies (Figure 2). In Table 2, IPD, MIVW and AWE show the highest power. PwMIVW is close to PwIPD because the model-based standard errors of δ^IPD and δ^MIVW are asymptotically the same. Because most of the 20 studies are unable to capture the true non-linear GEI, especially those with a restricted range of E at the two extremes of the E distribution, the non-linearity of GEI leads to the low power of δ^UIVW. In this particular example, we observe that PwMR is greater than PwUIVW. Instead of choosing between δ^UIVW and δ^MR, we can use δ^AWE as the default meta-analytic estimator. The relative performance of δ^AWE is close to that of δ^IPD. This is a practically noteworthy finding, as a linear interaction model is typically the initial screening tool, and the AWE is able to pick up signals under model misspecification that univariate meta-analysis methods cannot.

Table 2.

Comparison of methods in terms of estimate, standard error of the estimate and power for GEI, under a simulation study of non-linear GEI. [IPD: individual patient data; UIVW: univariate inverse-variance weighted estimator; MIVW: multivariate inverse-variance weighted estimator; MR: Meta-regression; AWE: adaptively weighted estimator combining UIVW and MR; TS: two-stage approach.]

Methods | Estimate | SEa | Power
IPD | 0.21 | 0.045 | 0.98
UIVW | 0.18 | 0.070 | 0.69
MIVW | 0.21 | 0.045 | 0.98
MR | 0.23 | 0.060 | 0.82
AWE | 0.21 | 0.045 | 0.98
TS | - | - | 0.85

a SE: standard error.
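As a rough numerical check of how the UIVW and MR rows relate to the AWE row, the sketch below combines the two rounded summaries from Table 2 with weights proportional to inverse variance, the weight form quoted for δ^MR in Section 3.2. It assumes the two component estimates are approximately uncorrelated; the function name `inverse_variance_combine` is hypothetical and this is not the authors' implementation.

```python
from math import sqrt


def inverse_variance_combine(est_a, se_a, est_b, se_b):
    """Combine two estimates with weights proportional to 1/variance.

    Assumes the two estimates are (approximately) uncorrelated.
    Returns (combined estimate, combined SE, weight placed on est_b).
    """
    prec_a = 1.0 / se_a ** 2
    prec_b = 1.0 / se_b ** 2
    weight_b = prec_b / (prec_a + prec_b)
    combined = (1.0 - weight_b) * est_a + weight_b * est_b
    combined_se = sqrt(1.0 / (prec_a + prec_b))
    return combined, combined_se, weight_b


if __name__ == "__main__":
    # Rounded UIVW and MR summaries from Table 2.
    est, se, w_mr = inverse_variance_combine(0.18, 0.070, 0.23, 0.060)
    print(round(est, 3), round(se, 4), round(w_mr, 3))
```

With these rounded inputs the combined estimate is about 0.21 with a standard error of roughly 0.046, in line with the AWE row up to rounding, and the weight on MR is a little above one half because MR is the more precise component in this scenario.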

3.2 Data analysis for a set of studies investigating T2D

The proposed methods are applied to a set of studies investigating T2D, including 8 European cohorts: FIN-D2D 2007 study (D2D2007), DIAbetes GENetic study (DIAGEN), Finnish Diabetes Prevention Study (DPS), Finland-United States Investigation of NIDDM Genetics study (FUSION, FUSION S2), Nord-Trøndelag Health Study 2 (HUNT), METabolic Syndrome in Men study (METSIM) and Tromsø study (TROMSO). A number of SNPs in the FTO gene region (16q12.2) have previously been identified as associated with T2D and BMI in the DIAGRAM consortium (Zeggini et al. (2008), Voight et al. (2010)), where the variants in the FTO gene are known to influence T2D predisposition through an effect on BMI. Age, BMI and sex are all known risk factors for T2D and the T2D-related quantitative trait HDL-C (Scott et al. (2012), Morris et al. (2012)). In this paper, we investigate whether SNPs in the FTO gene modify the effect of environmental factors (e.g. age and BMI) on HDL-C. Effect modification characterized by interaction on HDL-C has not been reported so far, though marginal associations between SNPs in FTO and HDL-C have been noted previously (Kring et al. (2008), Doney et al. (2009)).

Among the 8 cohorts, we have a total of N = 11,150 genotyped participants who have HDL-C levels, age, sex and BMI available, with sample sizes ranging between 172 and 2,729. Participants known to be on lipid medication are excluded from this analysis. The descriptive summary statistics for the 8 cohorts are shown in Table 3. Since the SNPs we initially examined (namely, 10 SNPs in FTO strongly associated with T2D/obesity/BMI that are listed on the National Human Genome Research Institute GWAS catalog) are in high linkage disequilibrium and show very similar results, we only present our results for one representative SNP, rs1121980. The SNP's genotype follows Hardy-Weinberg equilibrium, and we did not need any imputation given that the missing genotype proportion is < 0.1%. The MAF of rs1121980 ranges from 0.40 to 0.49 across cohorts, which suggests no violation of Assumption 2. As shown in Table 3, the mean age ranges from 55 to 67 years except in FUSION (mean age=39). This cohort is younger because it is actually a sub-cohort of the original FUSION study, consisting of selected spouses or offspring of T2D cases. The mean BMI ranges from 26 to 28 kg/m2 except in the DPS cohort (mean BMI=31), which has an inclusion criterion of BMI > 25 at baseline. The covariate heterogeneity of E between cohorts is relatively small, with BSSage/TSSage = 15% and BSSBMI/TSSBMI = 2% respectively. The two ‘outlier’ cohorts, FUSION and DPS, both have very small sample sizes compared to the other studies, so their influence on UIVW and MIVW is expected to be small. However, their influence on MR could be substantial because the number of studies is small.

Table 3.

Summary statistics for the 8 European cohorts.

Cohortsa | N | T2D | HDL-C (mmol/l), Mean (SD) | rs1121980 MAFb | Age (year), Mean (SDb) | BMI (kg/m2), Mean (SD) | Gender, Female (%) | SNP*Age, Corr (P)c | SNP*BMI, Corr (P)
D2D2007 | 2116 | 14 | 1.46 (0.35) | 0.41 | 58.8 (8.3) | 27.2 (4.8) | 54 | −0.03 (0.19) | 0.03 (0.24)
DIAGEN | 1510 | 29 | 1.45 (0.47) | 0.46 | 63.3 (14.3) | 27.9 (5.2) | 55 | −0.01 (0.76) | 0.03 (0.24)
DPS | 433 | 0.0 | 1.22 (0.29) | 0.44 | 55.1 (7.1) | 31.3 (4.6) | 68 | −0.02 (0.69) | 0.16 (<.01)
FUSION | 172 | 0.0 | 1.29 (0.32) | 0.43 | 38.6 (10.9) | 26.2 (4.9) | 55 | 0.04 (0.56) | 0.23 (<.01)
FUSION-S2 | 2729 | 31 | 1.45 (0.41) | 0.40 | 57.3 (8.4) | 27.9 (5.1) | 44 | −0.02 (0.22) | 0.06 (<.01)
HUNT | 1324 | 43 | 1.26 (0.38) | 0.47 | 67.2 (13.1) | 28.0 (4.4) | 48 | <.01 (0.94) | 0.06 (0.03)
METSIM | 1456 | 43 | 1.42 (0.40) | 0.44 | 56.3 (6.6) | 27.9 (4.7) | 0 | −0.05 (0.08) | 0.03 (0.32)
TROMSO | 1410 | 50 | 1.43 (0.42) | 0.49 | 59.9 (12.5) | 27.6 (4.7) | 50 | <.01 (0.91) | 0.04 (0.15)
Entire study | 11150 | 31 | 1.41 (0.40) | 0.44 | 59.4 (11.3) | 27.8 (4.9) | 44 | <.01 (0.85) | 0.05 (<.01)

a Data reflect patients who were genotyped from the 8 European cohorts;

b SD: standard deviation; BMI: body mass index; MAF: minor allele frequency;

c Corr(P): Spearman correlation between SNP rs1121980 and environmental factor with corresponding P-value.

Analysis Model: The IPD model we fitted is given by

$$\log(\text{HDL-C}_{ki}) = \beta_{0k} + \beta_G G_{ki} + \delta\, G_{ki} \times E_{ki} + \beta_a\, \text{age}_{ki} + \beta_b\, \text{BMI}_{ki} + \beta_s\, \text{sex}_{ki} + \beta_t\, \text{T2D}_{ki} + \varepsilon_{ki} \qquad (4)$$

for k = 1, ..., 8; i = 1, ..., nk. In model (4), SNP rs1121980 is used for G with both additive and co-dominant coding; BMI and age are used as E in two separate analyses; T2D status is adjusted for to account for biased sampling of the genotyped subjects (more T2D cases are genotyped than non-cases). HDL-C is log-transformed in order to reduce the skewness of its distribution. The proposed methods, including IPD, UIVW, MIVW, MR and AWE, are implemented and compared. G-E independence appears to be violated for the rs1121980×BMI analysis (Spearman correlations across studies are reported in Table 3). This is expected, as FTO is an obesity-related gene. G-E independence does not appear to be violated for rs1121980×age (Table 3).

Results: Figure 6 shows the forest plots of estimated GEI from individual cohorts and the combined estimates using joint analysis and meta-analysis. The corresponding numerical results are summarized in Table 4. There is no evidence of effect heterogeneity for either the rs1121980×BMI (P = 0.87) or the rs1121980×age (P = 0.90) interaction based on Cochran's Q test, so we proceed with a common effect model. In Figure 6 and Table 4, the meta-analytical methods UIVW, MIVW and AWE all show results very similar to IPD, whereas MR does not. For example, for rs1121980×BMI, the marginal SNP effects of rs1121980 are plotted against the mean BMI values across cohorts in Appendix Figure 8, where MR is very sensitive to the outliers because the number of cohorts is small (K=8). MR also appears to lack efficiency due to the small K and the small ratio BSS/WSS. Here, δ^AWE is robust to the bias from δ^MR since it only assigns a weight of 0.05 to δ^MR, where the weight is $\hat{v}(\hat{\delta}_{MR})^{-1} / \{\hat{v}(\hat{\delta}_{UIVW})^{-1} + \hat{v}(\hat{\delta}_{MR})^{-1}\}$. This is further evidence that δ^AWE can data-adaptively shrink to the ‘better’ estimator.
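For readers who want to reproduce the heterogeneity check from study-level summaries, a minimal sketch of the standard univariate inverse-variance pooled estimate and Cochran's Q statistic follows. The function name `uivw_and_cochran_q` and the example inputs are hypothetical and do not correspond to the cohort estimates in Table 4; the P-value would be obtained by referring Q to a Chi-square distribution with K − 1 degrees of freedom using any statistics library.

```python
from math import sqrt


def uivw_and_cochran_q(estimates, std_errors):
    """Inverse-variance pooled estimate, its SE, Cochran's Q and its df.

    estimates: per-study interaction estimates
    std_errors: per-study standard errors of those estimates
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, estimates)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q_stat = sum(w * (d - pooled) ** 2 for w, d in zip(weights, estimates))
    return pooled, pooled_se, q_stat, len(estimates) - 1


if __name__ == "__main__":
    # Hypothetical study-level summaries, for illustration only.
    deltas = [0.002, 0.001, 0.003, 0.0015]
    ses = [0.001, 0.002, 0.004, 0.0015]
    print(uivw_and_cochran_q(deltas, ses))
```

A small Q relative to its degrees of freedom, as reported for both interactions here, gives no reason to abandon the common effect model.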

Figure 6.

Forest plots showing the estimated gene-environment interactions (under additive model of rs1121980) across the 8 European cohorts, as well as the combined estimates through meta-analysis. [IPD: individual patient data; UIVW: univariate inverse-variance weighted estimator; MIVW: multivariate inverse-variance weighted estimator; AWE: adaptively weighted estimator combining UIVW and Meta-regression.]

In the interaction model (4), positive rs1121980×BMI interactions are found under all proposed methods (except MR) in Table 4, with P-values ranging from 0.01 to 0.03 for the additive model. In particular, the estimates obtained from model (4), when converted in terms of percentage change in actual HDL-C levels, indicated that: with 1 kg/m2 increase in BMI, (1) under additive model, HDL-C level on average decreased by 1.53% (95% CI: (1.37, 1.70)) given rs1121980=GG, by 1.39% (95% CI: (1.28, 1.49)) given rs1121980=AG or GA; and by 1.24% (95% CI: (1.06, 1.42)) given rs1121980=AA; (2) under co-dominant model, HDL-C level decreased by 1.51% (95% CI: (1.33, 1.69)) given rs1121980=GG, by 1.41% (95% CI: (1.27, 1.56)) given rs1121980=AG or GA; and by 1.21% (95% CI: (0.99, 1.42)) given rs1121980=AA. The results under additive and co-dominant models are very close. The trend of the effects of BMI among the three groups defined by rs1121980 indicated that the presence of minor allele A in rs1121980 attenuated the negative association between BMI and HDL-C. We did not find similar rs1121980×BMI interaction effect on low-density lipoprotein cholesterol (LDL-C), total cholesterol or LDL-C/HDL-C ratio. The suggestive effect modification of BMI by the SNPs on FTO that we have found for HDL-C needs to be replicated in independent studies and validated in larger meta-analysis.
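The conversion from coefficients on the log(HDL-C) scale to percentage changes in HDL-C can be sketched as follows for the additive model. Under a log-linear model, a one-unit increase in E multiplies the outcome by exp(βE + δ × number of minor alleles). The function name `pct_change_per_unit` is hypothetical, and the coefficient values used below are illustrative numbers chosen only to be of the same order of magnitude as the reported percentages; they are not the fitted coefficients from model (4).

```python
from math import exp


def pct_change_per_unit(beta_env, delta, n_minor_alleles):
    """Percent change in the untransformed trait per one-unit increase in E.

    Assumes a log-transformed outcome and additive genotype coding, so the
    slope of E on the log scale is beta_env + delta * n_minor_alleles.
    """
    slope = beta_env + delta * n_minor_alleles
    return (exp(slope) - 1.0) * 100.0


if __name__ == "__main__":
    # Illustrative (hypothetical) coefficients, NOT estimates from the paper.
    beta_bmi, delta = -0.0154, 0.0015
    for genotype, copies in (("GG", 0), ("AG/GA", 1), ("AA", 2)):
        print(genotype, round(pct_change_per_unit(beta_bmi, delta, copies), 2))
```

With a negative BMI slope and a positive interaction, each additional copy of the minor allele A moves the percentage change toward zero, which is the attenuation pattern described above.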

4 Discussion

In this paper, we proposed and compared a set of meta-analysis approaches for analyzing GEI. We showed that the proposed AWE, as a combination of meta-analysis and meta-regression estimators, performs better than discretely choosing between the two estimators in terms of precision and power. We showed that the precision trade-off between the two components in AWE depends on the covariate heterogeneity through the ratio of the between- and within-study variances of the covariate E, and that the AWE adaptively weights its component estimators to minimize the variance of the resulting hybrid estimator. The resulting AWE retains full efficiency of the joint analysis using IPD under certain assumptions. The AWE is very simple to calculate based on summary statistics for marginal genetic association and gene-environment interaction parameters (estimate and standard error) along with the covariate mean of E (see Table 1). The computation is simple and scalable to genome-wide analysis. We suggest possible use of AWE as a default choice for the meta-analysis of GEI based on summary data. We studied several key features that could potentially influence the efficiency and power for meta-analysis of GEI. The features included: (1) departures from G-E independence; (2) heterogeneity in MAFs across cohorts; (3) lack of a common set of covariates across studies; (4) misspecification of the genetic susceptibility model (dominant/co-dominant/additive); and (5) the presence of a non-linear form of interaction. Under all the above situations, we found the performance of AWE to be close to that of the IPD estimator. In particular, under the non-linear interaction model setting, where the standard meta-analytical technique failed, the AWE was able to recover the lost efficiency based on the summary data. We also reported some suggestive evidence for GEI between rs1121980 on the FTO gene and BMI on HDL-C levels.

As a reviewer has pointed out, we are risking some bias for gaining precision in AWE by including MR as a component, and MR is susceptible to ecological bias. However, as we note in the analysis of the T2D example in Table 4, where the MR estimate is quite different from the rest, the AWE aligns itself with the more sensible UIVW estimator. Our simulation results also indicate this adaptive feature of AWE in controlling the bias from the MR component by assigning decreased weight to it (Appendix Figure 9 and Table 6). Moreover, regardless of ecological fallacy, under the additional assumption of G-E independence, AWE remains unbiased. Thus the issue of ecological bias for aggregate analysis in MR is less of a concern for AWE. We also noted that AWE performs well across the whole spectrum of the BSS/TSS ratio, not just intermediate values of this quantity (Appendix Figure 9 and Table 6).

We have mainly focused on quantitative traits with an underlying common fixed effect model. The potential limitation of this approach is that the results might not translate directly to dichotomous traits under a case-control design, where assuming G-E independence can lead to huge gain in efficiency (Piegorsch et al. (1994), Umbach and Weinberg (1997), Chatterjee and Carroll (2005)). We plan to extend our methods using a retrospective likelihood framework under a case-control design. Investigating the results under a truly random effects meta-analysis model is another possible extension to our work. Sample code for all methods is available at http://www-personal.umich.edu/~bhramar/software/.

Supplementary Material

Acknowledgement

The research of Bhramar Mukherjee was supported by NSF DMS 1007494 and NIH grants ES 20811, and CA 156608. The FUSION study was supported by DK062370. We thank the D2D2007, DIAGEN, DPS, FUSION, HUNT, METSIM and TROMSO investigators for providing access to their data.

References

  1. Aschard H, Hancock DB, London SJ, et al. Genome-wide meta-analysis of joint tests for genetic and gene-environment interaction effects. Human Heredity. 2011;70:292–300. doi: 10.1159/000323318. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Berlin JA, Santanna J, Schmid CH, et al. Individual patient-versus group-level data meta-regressions for the investigation of treatment effect modifiers: Ecological bias rears its ugly head. Statistics in Medicine. 2002;21:371–387. doi: 10.1002/sim.1023. [DOI] [PubMed] [Google Scholar]
  3. Borenstein M, Hedges LV, Higgins J, et al. A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods. 2010;1:97–111. doi: 10.1002/jrsm.12. [DOI] [PubMed] [Google Scholar]
  4. Chatterjee N, Carroll RJ. Semiparametric maximum likelihood estimation exploiting gene-environment independence in case-control studies. Biometrika. 2005;92:399–418. [Google Scholar]
  5. Cochran W. The combination of estimates from different experiments. Biometrics. 1954;10:101–129. [Google Scholar]
  6. Dai J, Kooperberg C, Leblanc M, et al. Two-stage testing procedures with independent filtering for genome-wide gene-environment interaction. Biometrika. 2012;99:929–944. doi: 10.1093/biomet/ass044. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clinical Trials. 1986;7:177–188. doi: 10.1016/0197-2456(86)90046-2. [DOI] [PubMed] [Google Scholar]
  8. Doney A, Dannfald J, Kimber C, et al. The FTO gene is associated with an atherogenic lipid profile and myocardial infarction in patients with type 2 diabetes: A genetics of diabetes audit and research study in tayside Scotland (Go-DARTS) study. Circ Cardiovasc Genet. 2009;2:255–259. doi: 10.1161/CIRCGENETICS.108.822320. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Dupuis J, Langenberg C, Prokopenko I, et al. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nature Genetics. 2010;42:105–116. doi: 10.1038/ng.520.
  10. Fleiss J. Review papers: The statistical basis of meta-analysis. Statistical Methods in Medical Research. 1993;2:121–145. doi: 10.1177/096228029300200202.
  11. Greenland S. Quantitative methods in the review of epidemiologic literature. Epidemiologic Reviews. 1987;9:1–30. doi: 10.1093/oxfordjournals.epirev.a036298.
  12. Hartung J, Knapp G, Sinha BK. Statistical meta-analysis with applications. Wiley; New York: 2011.
  13. Khoury M, Wacholder S. Invited commentary: From genome-wide association studies to gene-environment-wide interaction studies: Challenges and opportunities. American Journal of Epidemiology. 2009;169:227–230. doi: 10.1093/aje/kwn351.
  14. Kilpeläinen T, Qi L, Brage S, et al. Physical activity attenuates the influence of FTO variants on obesity risk: A meta-analysis of 218,166 adults and 19,268 children. PLoS Medicine. 2011;8:e1001116. doi: 10.1371/journal.pmed.1001116.
  15. Kooperberg C, LeBlanc M. Increasing the power of identifying gene×gene interactions in genome-wide association studies. Genetic Epidemiology. 2008;32:255–263. doi: 10.1002/gepi.20300.
  16. Kovalchik SA. Aggregate-data estimation of an individual patient data linear random effects meta-analysis with a patient covariate-treatment interaction term. Biostatistics. 2013;14:273–283. doi: 10.1093/biostatistics/kxs035.
  17. Kring S, Holst C, Zimmermann E, et al. FTO gene associated fatness in relation to body fat distribution and metabolic traits throughout a broad range of fatness. PLoS ONE. 2008;3:e2958. doi: 10.1371/journal.pone.0002958.
  18. Lin D, Zeng D. Meta-analysis of genome-wide association studies: No efficiency gain in using individual participant data. Genetic Epidemiology. 2010a;34:60–66. doi: 10.1002/gepi.20435.
  19. Lin D, Zeng D. On the relative efficiency of using summary statistics versus individual-level data in meta-analysis. Biometrika. 2010b;97:321–332. doi: 10.1093/biomet/asq006.
  20. Manning A, LaValley M, Liu C, et al. Meta-analysis of gene-environment interaction: Joint estimation of SNP and SNP×environment regression coefficients. Genetic Epidemiology. 2011;35:11–18. doi: 10.1002/gepi.20546.
  21. Morgenstern H. Uses of ecologic analysis in epidemiologic research. American Journal of Public Health. 1982;72:1336–1344. doi: 10.2105/ajph.72.12.1336.
  22. Morris AP, Voight BF, Teslovich TM, et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature Genetics. 2012;44:981–990. doi: 10.1038/ng.2383.
  23. Mukherjee B, Ahn J, Gruber S, et al. Testing gene-environment interaction in large-scale case-control association studies: Possible choices and comparisons. American Journal of Epidemiology. 2012;175:177–190. doi: 10.1093/aje/kwr367.
  24. Piegorsch WW, Weinberg CR, Taylor JA. Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case-control studies. Statistics in Medicine. 1994;13:153–162. doi: 10.1002/sim.4780130206.
  25. Psaty B, O'Donnell C, Gudnason V, et al. Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium: Design of prospective meta-analyses of genome-wide association studies from 5 cohorts. Circulation: Cardiovascular Genetics. 2009;2:73–80. doi: 10.1161/CIRCGENETICS.108.829747.
  26. Sarwar N, Butterworth A, Freitag D, et al. Interleukin-6 receptor pathways in coronary heart disease: A collaborative meta-analysis of 82 studies. The Lancet. 2012;379:1205–1213. doi: 10.1016/S0140-6736(11)61931-4.
  27. Saxena R, Saleheen D, Been LF, et al. Genome-wide association study identifies a novel locus contributing to type 2 diabetes susceptibility in Sikhs of Punjabi origin from India. Diabetes. 2013;62:1746–1755. doi: 10.2337/db12-1077.
  28. Schwartz S. The fallacy of the ecological fallacy: The potential misuse of a concept and the consequences. American Journal of Public Health. 1994;84:819–824. doi: 10.2105/ajph.84.5.819.
  29. Scott L, Mohlke K, Bonnycastle L, et al. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 2007;316:1341–1345. doi: 10.1126/science.1142382.
  30. Scott RA, Lagou V, Welch RP, et al. Large-scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways. Nature Genetics. 2012;44:991–1005. doi: 10.1038/ng.2385.
  31. Simmonds M, Higgins J. Covariate heterogeneity in meta-analysis: Criteria for deciding between meta-regression and individual patient data. Statistics in Medicine. 2007;26:2982–2999. doi: 10.1002/sim.2768.
  32. Song C, Chen GK, Millikan RC, et al. A genome-wide scan for breast cancer risk haplotypes among African American women. PLoS ONE. 2013;8:e57298. doi: 10.1371/journal.pone.0057298.
  33. Speliotes EK, Willer CJ, Berndt SI, et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nature Genetics. 2010;42:937–948. doi: 10.1038/ng.686.
  34. Teslovich TM, Musunuru K, Smith AV, et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature. 2010;466:707–713. doi: 10.1038/nature09270.
  35. Umbach DM, Weinberg CR. Designing and analysing case-control studies to exploit independence of genotype and exposure. Statistics in Medicine. 1997;16:1731–1743. doi: 10.1002/(sici)1097-0258(19970815)16:15<1731::aid-sim595>3.0.co;2-s.
  36. VanderWeele T, Mukherjee B, Chen J. Sensitivity analysis for interactions under unmeasured confounding. Statistics in Medicine. 2012;31:2552–2564. doi: 10.1002/sim.4354.
  37. Voight BF, Scott LJ, Steinthorsdottir V, et al. Twelve type 2 diabetes susceptibility loci identified through large-scale association analysis. Nature Genetics. 2010;42:579–589. doi: 10.1038/ng.609.
  38. Whitehead A, Whitehead J. A general parametric approach to the meta-analysis of randomized clinical trials. Statistics in Medicine. 1991;10:1665–1677. doi: 10.1002/sim.4780101105.
  39. Willer C, Li Y, Abecasis G. METAL: Fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26:2190–2191. doi: 10.1093/bioinformatics/btq340.
  40. Zeggini E, Scott L, Saxena R, et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nature Genetics. 2008;40:638–645. doi: 10.1038/ng.120.
