Testing Departure from Additivity in Tukey’s Model using Shrinkage: Application to a Longitudinal Setting

Yi-An Ko; Bhramar Mukherjee; Jennifer A Smith; Sung Kyun Park; Sharon LR Kardia; Matthew A Allison; Pantel S Vokonas; Jinbo Chen; Ana V Diez-Roux

doi:10.1002/sim.6281

Stat Med. Author manuscript; available in PMC: 2014 Dec 20.

Published in final edited form as: Stat Med. 2014 Aug 11;33(29):5177–5191. doi: 10.1002/sim.6281


PMCID: PMC4227925 NIHMSID: NIHMS615558 PMID: 25112650

Abstract

While there has been extensive research developing gene-environment interaction (GEI) methods in case-control studies, little attention has been given to sparse and efficient modeling of GEI in longitudinal studies. In a two-way table for GEI with rows and columns as categorical variables, a conventional saturated interaction model involves estimation of a specific parameter for each cell, with constraints ensuring identifiability. The estimates are unbiased but are potentially inefficient because the number of parameters to be estimated can grow quickly with increasing categories of row/column factors. On the other hand, Tukey’s one degree of freedom (df) model for non-additivity treats the interaction term as a scaled product of row and column main effects. Due to the parsimonious form of interaction, the interaction estimate leads to enhanced efficiency and the corresponding test could lead to increased power. Unfortunately, Tukey’s model gives biased estimates and low power if the model is misspecified. When screening multiple GEIs where each genetic and environmental marker may exhibit a distinct interaction pattern, a robust estimator for interaction is important for GEI detection. We propose a shrinkage estimator for interaction effects that combines estimates from both Tukey’s and saturated interaction models and use the corresponding Wald test for testing interaction in a longitudinal setting. The proposed estimator is robust to misspecification of interaction structure. We illustrate the proposed methods using two longitudinal studies — the Normative Aging Study and the Multi-Ethnic Study of Atherosclerosis.

Keywords: adaptive shrinkage estimation, gene-environment interaction, longitudinal data, Tukey’s one df test for non-additivity

1. Introduction

The presence of gene-environment interactions (GEI) implies that the effect of an environmental exposure (E) is enhanced or reduced for a sub-group with a certain genotype or vice versa. Investigation of GEI is essential to better understand the etiology and development of common, complex diseases. Many longitudinal environmental epidemiology studies have been collecting genetic data with the goal of identifying GEI. In these cohort studies, GEI is often investigated by focusing on an established association between an exposure biomarker (e.g., lead levels in blood or bone) and a quantitative trait (e.g., pulse pressure), and how this association is modified by a selected set of genetic markers. The set of genes (candidate genes) to be studied is often determined by the metabolic pathway related to the exposure instead of an agnostic search across the genome.

While there is an extensive literature on GEI regarding ways to enhance the efficiency of interaction tests in case-control studies [1, 2, 3], statistical methods for GEI in longitudinal settings remain limited. Methods to study disease-gene association in longitudinal settings, however, have started to receive attention. For instance, Wang et al. [4] proposed to estimate and test for time-varying genetic effects using semiparametric models with penalized splines. Fan et al. [5] also used penalized spline models to estimate the mean function and genetic regression coefficients, with extensions to linkage disequilibrium (LD) mapping. Nevertheless, very few studies have focused on testing gene-gene interactions (GGI) or GEI for complex traits in longitudinal settings. The multivariate adaptive splines presented by Zhang [6, 7] have been applied to analyze GEI in longitudinal cohort studies (e.g., Zhu et al. [8]). Xu [9] developed an empirical Bayes method to estimate GGI effects under the mixed model framework and compared it with several variable selection procedures. Malzahn et al. [10] developed a nonparametric test for GGI in repeated measures data using a rank procedure. Mukherjee et al. [11] proposed to explore the GEI structure with various parsimonious classical ANOVA models for non-additivity by averaging the repeated measurements and forming the cell means of a two-way GEI table. Along the same lines, Ko et al. [12] extended the classical ANOVA models to a mixed model framework and developed a resampling-based test for GEI that accounts for the correlation among repeated measures.

Typically, an interaction model including cross-product terms of gene and environment under the mixed model framework is used for testing GGI and GEI in longitudinal studies [13]. When GEI is estimated from longitudinal data in which both the genetic factor (G) and E are categorical, this conventional approach estimates a distinct parameter for each GEI configuration (i.e., a saturated interaction form), with sum-to-zero constraints to ensure identifiability. Estimation bias is minimized because the model imposes no structural assumptions on the interaction term. However, the number of parameters, and hence the degrees of freedom (df) of the interaction test, can become large as the number of categories of G and/or E increases. In addition, under a saturated interaction model only the observations in a cell contribute to the estimation of that cell’s parameter. This may reduce efficiency and power for detecting interactions because of small cell sample sizes in human studies involving a gene with a modest minor allele frequency.

Tukey’s one df model for non-additivity [14], originally proposed for data with no replication per cell, has been applied to the modeling of GGI in cohort studies. Maity et al. [15] used Tukey’s form of interaction for repeated measures data to test main genetic associations in the presence of GGI. The interaction term in Tukey’s model is treated as a scaled product of main effects, implying that the existence of interaction is conditional on the presence of main effects. When a GEI study is based on a two-stage strategy, namely, the candidate genes are selected based on marginal genetic associations [16, 17], it may be reasonable to adopt Tukey’s interaction form for GEI. Chatterjee et al. [18] proposed that Tukey’s model is also consistent with the notion that individual markers within a gene are associated with disease through a common biological mechanism. However, when candidate genes are chosen in relation to an exposure pathway, genes may not necessarily have main effects. Also, when the assumption of Tukey’s interaction structure is violated (e.g., absence of genetic main effects), the estimate for the interaction effect using Tukey’s model will be biased and the corresponding one-df test can result in extremely low power [11, 19].

When searching for GEI across multiple genetic markers, different GEIs may exhibit distinct interaction patterns that depart from Tukey’s model. Conducting multiple tests under a fixed interaction structure (e.g., Tukey’s) may miss interactions of other forms. At the same time, it would be advantageous to leverage the power of Tukey’s test when it is a plausible model. Accordingly, we propose to model GEI using a shrinkage estimator that combines estimates from Tukey’s model and from the saturated interaction model, within an adaptive framework similar to that described by Mukherjee et al. [2]. This estimator shrinks the maximum likelihood estimates (MLEs) under a flexible interaction structure toward Tukey’s model estimates. The amount of shrinkage is data adaptive, so that the estimator is asymptotically unbiased even if Tukey’s assumption is violated. More importantly, compared with the saturated model, the shrinkage estimator has reduced mean squared error (MSE) in small samples [20]. Although Tukey’s model has been used to model GEI or GGI under a generalized linear model setting [15, 18, 19], no prior work has data-adaptively combined Tukey’s model and the saturated interaction model to take advantage of both for testing GEI. Thus, the shrinkage approach is new for cross-sectional as well as longitudinal data.

In Section 2, we introduce notation for GEI models in a mixed-effects model framework. Parameter estimation for Tukey’s model with repeated measures data is described in Section 3. In Section 4, we propose a shrinkage estimator and derive an approximate variance estimate. In Section 5, we summarize the test for interaction corresponding to each method. In Section 6, we evaluate the performance of the proposed methods via simulation studies; in particular, we compare average performance when GEIs with different interaction structures are generated to mimic a hypothetical GEI search involving multiple genetic markers. In Section 7, we apply the proposed methods to search for GEI between 105 single-nucleotide polymorphisms (SNPs) within 22 genes in the iron metabolism pathway and cumulative lead exposure on pulse pressure using data from the Normative Aging Study (NAS). We also test GEI between 27 SNPs and energy intake and intentional exercise on body mass index (BMI) using data from the Multi-Ethnic Study of Atherosclerosis (MESA). These 27 SNPs have been shown to be significantly associated with BMI in previous genome-wide association studies (GWAS). In NAS, genes are chosen in relation to the exposure pathway. In MESA, the question is whether loci identified by GWAS (with marginal effects) modify the effect of certain exposures. Another distinction between the two data examples is that one of the exposure variables considered in MESA, intentional exercise, is time-varying, while the other two, energy intake in MESA and cumulative lead exposure in NAS, are time-invariant (i.e., both are baseline measurements).

2. Model

Let y_kt be the value of the t-th repeated measure of a phenotypic response Y for the k-th individual (t = 1, …, n_k; k = 1, …, N). Define a mixed-effects model for the n_k × 1 response vector y_k = (y_{k1}, y_{k2}, …, y_{kn_k})^⊤ that relates it, through some nonlinear function f, to an n_k × ν matrix of explanatory variables X_k = (x_{k1}, x_{k2}, …, x_{kn_k})^⊤, where each x_kt is the ν × 1 vector associated with y_kt. Namely,

y_{k} = f (η, X_{k}) + Z_{k} b_{k} + e_{k},

(1)

where η is the p-dimensional vector of fixed effects, f(η, X_k) is the n_k × 1 mean vector, b_k ~ 𝒩(0, Ψ) is the q-dimensional vector of random effects, Z_k is the n_k × q design matrix for the random effects with rank(Z_k) = q ≤ n_k for all k, and e_k = (e_{k1}, …, e_{kn_k})^⊤ ~ (0, Σ_k) is the n_k-dimensional vector of random errors with mean 0 and covariance matrix Σ_k. The random effects b_k are assumed to be independent of e_k. Let V_k(ω) be the variance matrix of y_k, $V_{k}(ω) = Z_{k} Ψ Z_{k}^{⊤} + Σ_{k}$, where ω consists of the parameters in Ψ and Σ_k.

We use (1) to model the association between the phenotypic response of interest and genetic and environmental exposure factors. Let G_k be the genotype of the k-th subject and E_kt the exposure level of the k-th subject at the t-th measurement, with G_k ∈ {1, 2, …, I} and E_kt ∈ {1, 2, …, J}; that is, both G_k and E_kt are categorical. Without covariates, the mean structure for y_kt under Tukey’s model [14] has the form

f (η, x_{k t}) = f (β, θ, x_{k t}) = β_{0} + \sum_{i = 1}^{I} β_{i}^{G} I (G_{k} = i) + \sum_{j = 1}^{J} β_{j}^{E} I (E_{k t} = j) + θ \sum_{i = 1}^{I} \sum_{j = 1}^{J} β_{i}^{G} β_{j}^{E} I (G_{k} = i, E_{k t} = j) .

(2)

Here η has two components, η = (β^⊤, θ)^⊤. β consists of the intercept β₀, the parameters for genetic main effects, $β^{G} = {(β_{1}^{G}, \dots, β_{I}^{G})}^{⊤}$ , and exposure main effects, $β^{E} = {(β_{1}^{E}, \dots, β_{J}^{E})}^{⊤}$ . θ is a scale parameter representing the interaction effect. A saturated interaction model, on the other hand, allows for separate interaction parameters for each GEI configuration:

f (η, x_{k t}) = f (β, τ, x_{k t}) = β_{0} + \sum_{i = 1}^{I} β_{i}^{G} I (G_{k} = i) + \sum_{j = 1}^{J} β_{j}^{E} I (E_{k t} = j) + \sum_{i = 1}^{I} \sum_{j = 1}^{J} τ_{i j} I (G_{k} = i, E_{k t} = j),

(3)

where τ = (τ₁₁, …, τ_IJ)^⊤ is the interaction parameter vector of length IJ. For identifiability, we impose $\sum_{i} β_{i}^{G} = \sum_{j} β_{j}^{E} = 0$, so β^G and β^E have (I − 1) and (J − 1) free parameters, respectively. Similarly, the constraints Σ_i τ_ij = 0 (for each j) and Σ_j τ_ij = 0 (for each i) leave (I − 1)(J − 1) free parameters in τ.
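As a small illustration of the two mean structures, the sketch below computes the I × J table of cell means under (2) and (3) from the free parameters, filling in the last level of each factor from the sum-to-zero constraints. It is a sketch under stated assumptions (NumPy, zero-based indexing), and the function names are hypothetical. It also makes explicit that Tukey’s interaction θβ_i^Gβ_j^E satisfies the row and column constraints automatically, because β^G and β^E each sum to zero.

```python
import numpy as np

def sum_to_zero(free):
    """Append the last level so the effects sum to zero: (b_1..b_{n-1}) -> (b_1..b_n)."""
    free = np.asarray(free, dtype=float)
    return np.append(free, -free.sum())

def tukey_cell_means(beta0, bG_free, bE_free, theta):
    """I x J cell means under Tukey's model (2): interaction is theta * bG_i * bE_j."""
    bG, bE = sum_to_zero(bG_free), sum_to_zero(bE_free)
    return beta0 + bG[:, None] + bE[None, :] + theta * np.outer(bG, bE)

def saturated_cell_means(beta0, bG_free, bE_free, tau_free):
    """I x J cell means under the saturated model (3); tau_free is (I-1) x (J-1)."""
    bG, bE = sum_to_zero(bG_free), sum_to_zero(bE_free)
    tau = np.asarray(tau_free, dtype=float)
    tau = np.vstack([tau, -tau.sum(axis=0)])                 # last row: columns sum to 0
    tau = np.hstack([tau, -tau.sum(axis=1, keepdims=True)])  # last column: rows sum to 0
    return beta0 + bG[:, None] + bE[None, :] + tau
```

For I = J = 3, Tukey’s model uses 6 mean parameters (intercept, 2 + 2 main effects, θ), whereas the saturated model uses IJ = 9.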

3. Parameter Estimation for Tukey’s Model with Repeated Measures Data

We describe the estimation strategy for the parameters of Tukey’s model. The log-likelihood for the data y₁, …, y_N is

ℓ(η, ω ∣ y_{1}, \dots, y_{N}) = \text{const.} - \frac{1}{2} \sum_{k = 1}^{N} \log |V_{k}(ω)| - \frac{1}{2} \sum_{k = 1}^{N} [y_{k} - f(η, X_{k})]^{⊤} V_{k}(ω)^{-1} [y_{k} - f(η, X_{k})] .

(4)

Given V_k(ω), maximizing the likelihood is equivalent to minimizing the objective function

Q (η ∣ ω) = \sum_{k = 1}^{N} {[y_{k} - f (η, X_{k})]}^{⊤} V_{k} {(ω)}^{- 1} [y_{k} - f (η, X_{k})]

(5)

with respect to η. The solution is the generalized least squares (GLS) estimator. Because the fixed effects in Tukey’s model have no closed-form estimator, we use an iterative linearization method.

The linearization method approximates a nonlinear function by a linear one using a first-order Taylor series expansion [21]; it has been applied to nonlinear mixed-effects models [22, 23, 24]. Let η^* = η̂^(0) = (β̂^(0)⊤, θ̂^(0))^⊤ denote the initial estimate of η = (β^⊤, θ)^⊤. The first-order Taylor series expansion of f(η, X_k) about η = η^* is

f (η, X_{k}) \approx f (η^{*}, X_{k}) + D_{k}^{*} (η - η^{*}),

(6)

where D_k^* = D_k(η^*) is the n_k × p matrix of first derivatives of f(η, X_k), $D_{k}(η) = \partial f(η, X_{k}) / \partial η^{⊤} = [\partial f / \partial η_{1}, \dots, \partial f / \partial η_{p}]$, evaluated at η = η^*. Initial values η^* can be obtained by fitting a saturated interaction model (via a standard linear mixed-effects model) and using its main effect estimates as β^*. After the main effects are removed, the residuals can be regressed on the product terms $β_{i}^{G *} β_{j}^{E *}$ (without an intercept) to obtain θ^*. The mean function of Tukey’s model for the k-th subject at the t-th measurement is

f (η, x_{k t}) \approx f (η^{*}, x_{k t}) + (β_{0} - β_{0}^{*}) + \sum_{i} \sum_{j} [(1 + θ^{*} β_{j}^{E *}) (β_{i}^{G} - β_{i}^{G *}) + (1 + θ^{*} β_{i}^{G *}) (β_{j}^{E} - β_{j}^{E *}) + β_{i}^{G *} β_{j}^{E *} (θ - θ^{*})] I (G_{k} = i, E_{k t} = j),

where $f (η^{*}, x_{k t}) = β_{0}^{*} + \sum_{i} β_{i}^{G *} I (G_{k} = i) + \sum_{j} β_{j}^{E *} I (E_{k t} = j) + θ^{*} \sum_{i} \sum_{j} β_{i}^{G *} β_{j}^{E *} I (G_{k} = i, E_{k t} = j)$ . Following (1), the expansion in (6) yields the approximation

y_{k} = f (η^{*}, X_{k}) + D_{k}^{*} (η - η^{*}) + Z_{k} b_{k} + e_{k},

which can be expressed as a linear model

y_{k}^{*} = D_{k}^{*} η + Z_{k} b_{k} + e_{k},

(7)

where $y_{k}^{*} = y_{k} - f (η^{*}, X_{k}) + D_{k}^{*} η^{*}$ . Then the GLS estimator for η is given by

{\hat{η}}_{GLS} = {(\sum_{k = 1}^{N} D_{k}^{* ⊤} {\hat{V}}_{k}^{* - 1} D_{k}^{*})}^{- 1} \sum_{k = 1}^{N} D_{k}^{* ⊤} {\hat{V}}_{k}^{* - 1} y_{k}^{*},

(8)

where ${\hat{V}}_{k}^{*}$ is the assumed covariance matrix of $y_{k}^{*}$ evaluated at ω = ω^*. When η and ω are unknown, a common strategy is to replace V_k(ω) with a consistent estimate and minimize the corresponding weighted sum of squares to obtain an initial estimate of η. The MLE of ω is then obtained by maximizing (4) with respect to ω, with η replaced by the estimate in (8).

This iteratively reweighted generalized least squares (IRGLS) algorithm alternates between [a] Taylor series linearization, in which, given the w-th iterates η̂^(w) and ω̂^(w), one constructs $D_{k}^{(w)} = D_{k}(\hat{η}^{(w)})$ and the pseudo-response $\hat{r}_{k}^{(w)} = y_{k} - f(\hat{η}^{(w)}, X_{k}) + D_{k}^{(w)} \hat{η}^{(w)}$ to obtain a pseudo model of the form (7), and [b] updating the estimates η̂^(w+1) via (8) and ω̂^(w+1). Steps [a] and [b] are repeated until a convergence criterion is met.

The linearization method simplifies computation for nonlinear models by translating the nonlinear estimation problem into a sequence of linear models, and only first-order derivatives are required. Although the IRGLS procedure itself does not require normality, minimizing the objective function (5) is equivalent to maximizing the joint log-likelihood of y_k in (4); hence, the procedure yields MLEs [25]. Vonesh et al. [26] argued that the IRGLS estimator is consistent and asymptotically normal even when the variance-covariance structure is misspecified, provided the mean function f(η, X_k) is correctly specified. In our experience, the estimation algorithm for Tukey’s model converges relatively quickly and the final estimates are insensitive to the initial values. However, convergence can be very slow, or fail, when one or both main effects are truly absent, a situation in which θ is not identifiable.
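Steps [a] and [b] can be sketched in Python with NumPy for the case without covariates. This is a simplified sketch, not the authors’ implementation: it assumes each V_k is known and held fixed (so the ω-update in [b] is omitted and the loop reduces to Gauss–Newton iterations of the GLS update (8)), codes genotypes and exposures as zero-based integers, and imposes the sum-to-zero constraints by setting the last level of each factor to minus the sum of the others. All function names are hypothetical.

```python
import numpy as np

def contrast(n):
    """n x (n-1) matrix C mapping n-1 free effects to n effects that sum to zero."""
    return np.vstack([np.eye(n - 1), -np.ones((1, n - 1))])

def unpack(eta, I, J):
    """eta = (beta0, bG_1..bG_{I-1}, bE_1..bE_{J-1}, theta), sum-to-zero coding."""
    bG = contrast(I) @ eta[1:I]
    bE = contrast(J) @ eta[I:I + J - 1]
    return eta[0], bG, bE, eta[-1]

def mean_and_jacobian(eta, g, e, I, J):
    """Tukey mean f(eta, X_k) and its n_k x p Jacobian D_k for one subject.

    g: genotype index (0-based); e: length-n_k integer array of exposure indices.
    """
    beta0, bG, bE, theta = unpack(eta, I, J)
    f = beta0 + bG[g] + bE[e] + theta * bG[g] * bE[e]
    D = np.empty((len(e), len(eta)))
    D[:, 0] = 1.0                                                     # d f / d beta0
    D[:, 1:I] = (1.0 + theta * bE[e])[:, None] * contrast(I)[g]       # chain rule via C_I
    D[:, I:I + J - 1] = (1.0 + theta * bG[g]) * contrast(J)[e]        # chain rule via C_J
    D[:, -1] = bG[g] * bE[e]                                          # d f / d theta
    return f, D

def irgls_fixed_v(ys, gs, es, Vs, eta0, I, J, tol=1e-10, max_iter=100):
    """Alternate linearization [a] and the GLS update (8) [b], with V_k held fixed."""
    eta = np.asarray(eta0, dtype=float)
    W = [np.linalg.inv(V) for V in Vs]
    for _ in range(max_iter):
        A = np.zeros((eta.size, eta.size))
        b = np.zeros(eta.size)
        for y, g, e, Wk in zip(ys, gs, es, W):
            f, D = mean_and_jacobian(eta, g, e, I, J)
            y_star = y - f + D @ eta          # pseudo-response of the linear model (7)
            A += D.T @ Wk @ D
            b += D.T @ Wk @ y_star
        eta_new = np.linalg.solve(A, b)
        if np.max(np.abs(eta_new - eta)) < tol:
            return eta_new
        eta = eta_new
    return eta
```

In a full implementation, ω would be re-estimated by maximizing (4) after each η-update, and η^* would come from the saturated-model fit described above. The Jacobian columns for the free main effects follow from the expansion of the Tukey mean function by the chain rule through the sum-to-zero coding.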

4. Shrinkage Estimator

We now construct a shrinkage estimator for interaction that is a weighted average of the estimators from Tukey’s model and the saturated interaction model. Denote the interaction parameters to be estimated for an I × J GEI table by τ = (τ_11, τ_21, …, τ_{(I−1)1}, τ_12, …, τ_{(I−1)(J−1)})^⊤. Let τ_tuk and τ_sat be the asymptotic limits of the estimators of τ from Tukey’s model and the saturated interaction model, respectively, each a vector of length (I − 1)(J − 1). When Tukey’s one-df model is the true model, τ_tuk − τ_sat = δ, say, equals 0. To relax this assumption, let δ ~ 𝒩(0, Θ). A conservative estimate of Θ is δ̂δ̂^⊤, where δ̂ = τ̂_tuk − τ̂_sat and ${\hat{τ}}_{tuk} = \hat{θ} {({\hat{β}}_{1}^{G} {\hat{β}}_{1}^{E}, {\hat{β}}_{2}^{G} {\hat{β}}_{1}^{E}, \dots, {\hat{β}}_{I - 1}^{G} {\hat{β}}_{J - 1}^{E})}^{⊤}$. We define B = V̂_τ(V̂_τ + δ̂δ̂^⊤)⁻¹, where V̂_τ is the estimated variance-covariance matrix of τ̂_sat. The proposed shrinkage estimator for τ is then

{\hat{τ}}_{shk} = {\hat{τ}}_{sat} + B ({\hat{τ}}_{tuk} - {\hat{τ}}_{sat}),

(9)

where τ̂_tuk and τ̂_sat are MLEs from (2) and (3), respectively.

The shrinkage factor B in (9) determines how far τ̂_sat is shrunk toward τ̂_tuk. As δ̂ → 0, B approaches the identity matrix and τ̂_shk → τ̂_tuk (the data are indicative of Tukey’s interaction structure). On the other hand, as the estimated bias δ̂ of Tukey’s model grows relative to V̂_τ, the shrinkage term Bδ̂ goes to 0 and τ̂_shk → τ̂_sat (the data do not favor Tukey’s form of interaction). Applying the Sherman–Morrison formula to (V̂_τ + δ̂δ̂^⊤)⁻¹, the shrinkage estimator in (9) can be written as

{\hat{τ}}_{shk} = {\hat{τ}}_{sat} + {\hat{V}}_{τ} ({\hat{V}}_{τ}^{- 1} - \frac{{\hat{V}}_{τ}^{- 1} \hat{δ} {\hat{δ}}^{⊤} {\hat{V}}_{τ}^{- 1}}{1 + {\hat{δ}}^{⊤} {\hat{V}}_{τ}^{- 1} \hat{δ}}) \hat{δ} = {\hat{τ}}_{sat} + \hat{δ} - \hat{δ} (\frac{{\hat{δ}}^{⊤} {\hat{V}}_{τ}^{- 1} \hat{δ}}{1 + {\hat{δ}}^{⊤} {\hat{V}}_{τ}^{- 1} \hat{δ}}) .

When the data follow Tukey’s model, δ̂ → 0 as N → ∞. When they do not, the largest eigenvalue of V̂_τ goes to 0 and ${\hat{δ}}^{⊤} {\hat{V}}_{τ}^{- 1} \hat{δ} \to \infty$ as N → ∞, so the term $({\hat{δ}}^{⊤} {\hat{V}}_{τ}^{- 1} \hat{δ}) / (1 + {\hat{δ}}^{⊤} {\hat{V}}_{τ}^{- 1} \hat{δ})$ converges to 1. Hence τ̂_shk is asymptotically equivalent to τ̂_sat, which is an unbiased estimator of τ. With moderate sample sizes, however, shrinkage toward τ̂_tuk introduces a small bias in τ̂_shk that can be traded for a larger decrease in variance, improving the finite-sample MSE [2]. In addition, when main effects are absent, the shrinkage estimator guards against the instability of Tukey’s model estimates by shrinking τ̂_shk toward τ̂_sat.
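The last expression in the display reduces to τ̂_shk = τ̂_sat + δ̂/(1 + q) with q = δ̂^⊤V̂_τ⁻¹δ̂, so the data-adaptive weight placed on Tukey’s estimate is 1/(1 + q). The NumPy sketch below (function names hypothetical) computes (9) directly through B and through this reduced form, which makes the two limiting cases easy to check numerically.

```python
import numpy as np

def shrinkage_estimate(tau_sat, tau_tuk, V_tau):
    """Shrinkage estimator (9): tau_sat + B (tau_tuk - tau_sat), B = V (V + d d^T)^{-1}."""
    tau_sat = np.asarray(tau_sat, dtype=float)
    delta = np.asarray(tau_tuk, dtype=float) - tau_sat
    B = V_tau @ np.linalg.inv(V_tau + np.outer(delta, delta))
    return tau_sat + B @ delta

def shrinkage_closed_form(tau_sat, tau_tuk, V_tau):
    """Equivalent reduced form after the Sherman-Morrison step: tau_sat + delta / (1 + q)."""
    tau_sat = np.asarray(tau_sat, dtype=float)
    delta = np.asarray(tau_tuk, dtype=float) - tau_sat
    q = delta @ np.linalg.solve(V_tau, delta)   # q = delta^T V^{-1} delta
    return tau_sat + delta / (1.0 + q)
```

For example, with τ̂_sat = (0.4, −0.1, 0.2, 0.0), τ̂_tuk = (0.3, 0.0, 0.1, 0.1) and V̂_τ = 0.01 I, q = 0.04/0.01 = 4, so one fifth of δ̂ is added and τ̂_shk = (0.38, −0.08, 0.18, 0.02). Scaling V̂_τ up pushes the result toward τ̂_tuk; scaling it down (as happens when N grows) pushes it toward τ̂_sat.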

4.1. Variance Estimation for the Shrinkage Estimator

We now estimate the covariance matrix of τ̂_shk. Because τ̂_shk and τ̂_sat are asymptotically equivalent, the covariance matrix of τ̂_sat can be used to estimate that of τ̂_shk in large samples. Since this estimator is often too conservative in finite samples, we develop an approximate covariance matrix estimator for τ̂_shk using the delta method.

Define $\hat{ϕ} = {({\hat{τ}}_{sat}^{⊤}, {\hat{η}}_{tuk}^{⊤})}^{⊤}$ as the MLEs under the saturated form of interaction and Tukey’s model, with ${\hat{η}}_{tuk} = {({\hat{β}}_{1}^{G}, \dots, {\hat{β}}_{I - 1}^{G}, {\hat{β}}_{1}^{E}, \dots, {\hat{β}}_{J - 1}^{E}, \hat{θ})}^{⊤}$. Further define ξ̂ = (τ̂_sat^⊤, τ̂_tuk^⊤)^⊤ = h(ϕ̂) such that τ̂_shk = g(ξ̂) = g(h(ϕ̂)), where ξ̂ and g(ξ̂) have 2(I − 1)(J − 1) and (I − 1)(J − 1) elements, respectively. We first derive the joint distribution of the components of ϕ̂. Let ℐ be the information matrix, of dimension (I − 1)(J − 1) × (I − 1)(J − 1), and ℓ the log-likelihood corresponding to the saturated interaction model (3). Let ℐ₀ be the information matrix, of dimension (I + J − 1) × (I + J − 1), and ℓ₀ the log-likelihood for Tukey’s model (2). By the consistency of ϕ̂, the MLE τ̂_sat has the asymptotic linear representation

\sqrt{N} ({\hat{τ}}_{sat} - τ) = \frac{1}{\sqrt{N}} \sum_{k = 1}^{N} \mathcal{I}^{- 1} {\dot{ℓ}}_{k} + o_{p} (1) \quad \text{as } N \to \infty,

where ℓ̇_k = ∂ℓ(X_k)/∂τ.

Similarly,

\sqrt{N} ({\hat{η}}_{tuk} - η) = \frac{1}{\sqrt{N}} \sum_{k = 1}^{N} \mathcal{I}_{0}^{- 1} {\dot{ℓ}}_{0 k} + o_{p} (1) \quad \text{as } N \to \infty,

where ℓ̇_0k = ∂ℓ₀(X_k)/∂η_tuk.

Denote the asymptotic variance-covariance matrix of ϕ̂ by Σ_ϕ̂. Then by multivariate Taylor series expansion, the variance-covariance matrix of ξ̂ = h(ϕ̂) is approximated by

\hat{\Sigma}_{\hat{ξ}} \approx \nabla h(\hat{ϕ})^{⊤} \hat{\Sigma}_{\hat{ϕ}} \nabla h(\hat{ϕ}),

where ∇h = ∂h/∂ϕ is the gradient matrix of h evaluated at ϕ̂. Finally, the variance-covariance matrix of τ̂_shk is approximated by applying the delta method:

\hat{\Sigma}_{{\hat{τ}}_{shk}} = \widehat{\mathrm{cov}}({\hat{τ}}_{shk}) = \widehat{\mathrm{cov}}(g(\hat{ξ})) \approx \nabla g(\hat{ξ})^{⊤} \hat{\Sigma}_{\hat{ξ}} \nabla g(\hat{ξ}),

(10)

where ∇g = ∂g/∂ξ is evaluated at ξ̂ (see the supporting information for ∇h(ϕ̂) and ∇g(ξ̂)). Comparing Σ̂_{τ̂_shk} with the empirical variance-covariance matrix across simulations, we found that the variance terms (diagonal elements) are estimated very well by Σ̂_{τ̂_shk}, but the covariance terms are not necessarily. Accurate estimates of the covariance terms require either a small variance of the random measurement errors or a large sample size (see Table 1 in the supporting information). Because the covariance estimates are small relative to the variance estimates, their influence on the Wald test statistic is expected to be small. Thus, the proposed shrinkage test (see below) is still approximately valid, with conservative Type I error rates.
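A numerical version of the two-step delta method can be sketched with finite-difference Jacobians. This is an illustrative sketch under several assumptions: the analytic gradients given in the supporting information are replaced by central differences, V̂_τ is treated as fixed when differentiating g, g is written in the equivalent reduced form τ̂_sat + δ̂/(1 + δ̂^⊤V̂_τ⁻¹δ̂), and Σ̂_ϕ̂ is supplied by the user (for example, assembled from the two asymptotic linear representations above). Function names are hypothetical. The code stores Jacobians as (output × input) matrices, so J Σ J^⊤ here corresponds to ∇^⊤ Σ ∇ in (10).

```python
import numpy as np

def numerical_jacobian(fun, x, eps=1e-6):
    """Central-difference Jacobian with shape (len(fun(x)), len(x))."""
    x = np.asarray(x, dtype=float)
    cols = []
    for m in range(x.size):
        step = np.zeros_like(x)
        step[m] = eps
        cols.append((np.asarray(fun(x + step)) - np.asarray(fun(x - step))) / (2 * eps))
    return np.column_stack(cols)

def tukey_tau(eta_tuk, I, J):
    """tau_tuk = theta * bG_i * bE_j for i < I, j < J, stacked with i varying fastest."""
    bG, bE, theta = eta_tuk[:I - 1], eta_tuk[I - 1:I + J - 2], eta_tuk[-1]
    return theta * np.outer(bG, bE).ravel(order="F")

def shrinkage_cov(phi_hat, Sigma_phi, V_tau, I, J):
    """Delta-method covariance of tau_shk following (10), with V_tau held fixed."""
    d = (I - 1) * (J - 1)
    phi_hat = np.asarray(phi_hat, dtype=float)

    def h(phi):   # phi = (tau_sat, eta_tuk) -> xi = (tau_sat, tau_tuk)
        return np.concatenate([phi[:d], tukey_tau(phi[d:], I, J)])

    def g(xi):    # xi -> tau_shk = tau_sat + delta / (1 + q)
        delta = xi[d:] - xi[:d]
        q = delta @ np.linalg.solve(V_tau, delta)
        return xi[:d] + delta / (1.0 + q)

    Jh = numerical_jacobian(h, phi_hat)
    Sigma_xi = Jh @ Sigma_phi @ Jh.T
    Jg = numerical_jacobian(g, h(phi_hat))
    return Jg @ Sigma_xi @ Jg.T
```

When V̂_τ is small relative to δ̂, the result approaches the τ̂_sat block of Σ̂_ϕ̂, mirroring the asymptotic equivalence of τ̂_shk and τ̂_sat noted above.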

5. Tests for Interaction Effects

We are interested in testing the null hypothesis of no interaction, H₀ : τ = 0, versus H₁ : τ ≠ 0. For Tukey’s model, this is equivalent to H₀ : θ = 0 versus H₁ : θ ≠ 0. The likelihood ratio test (LRT) statistic is T_L = −2(l₀ − l₁), where l₀ and l₁ are the maximized log-likelihoods under H₀ and H₁, respectively. Under regularity conditions and H₀, $T_{L} \sim χ_{1}^{2}$ for Tukey’s model and $T_{L} \sim χ_{(I - 1)(J - 1)}^{2}$ for the saturated model in large samples. Based on (9) and [20], the limiting distribution of the shrinkage estimator is technically not normal. The simulation results, however, show that this estimator is well approximated by a normal density, with small departures from normality (see Figure 1 in the supporting information). Hence, we use the Wald test as an approximate test for interaction. The test statistic for H₀ : τ = 0 is ${\tilde{T}}_{W} = {\hat{τ}}_{shk}^{⊤} \hat{\Sigma}_{{\hat{τ}}_{shk}}^{- 1} {\hat{τ}}_{shk}$, where Σ̂_{τ̂_shk} is given in (10). Under H₀, T̃_W approximately follows a χ² distribution with (I − 1)(J − 1) df (see Figure 2 in the supporting information).
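The Wald statistic and its χ² reference distribution are straightforward to compute once τ̂_shk and Σ̂_{τ̂_shk} are available. The sketch below uses only NumPy and the Python standard library; the χ² tail probability is computed from the series expansion of the regularized lower incomplete gamma function, which is adequate for moderate statistics, and a library routine such as scipy.stats.chi2.sf would normally be used instead. Function names are hypothetical.

```python
import math
import numpy as np

def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), via the series for the regularized lower
    incomplete gamma function P(df/2, x/2). Intended for moderate x only."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

def wald_test(tau_shk, Sigma_shk):
    """Wald statistic tau^T Sigma^{-1} tau, with df = len(tau) = (I-1)(J-1)."""
    tau_shk = np.asarray(tau_shk, dtype=float)
    stat = float(tau_shk @ np.linalg.solve(Sigma_shk, tau_shk))
    df = tau_shk.size
    return stat, df, chi2_sf(stat, df)
```

The same chi2_sf can be applied to the LRT statistic T_L with df = 1 (Tukey’s model) or df = (I − 1)(J − 1) (saturated model).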

6. Simulation Study

6.1. Settings for Evaluation of Test Properties for a Single GEI Test

We investigated the Type I error and power of three test procedures for interaction: the LRT under Tukey’s model, the Wald test using the proposed adaptive shrinkage estimator, and the LRT using a saturated interaction model. Two null hypotheses of no interaction were considered: (i) genetic main effects were present (additive) and (ii) genetic main effects were absent (null). Exposure main effects were always present in our simulations, representing a study looking for genetic modification of an established phenotype-exposure association. For these comparisons, we used 3×3 GEI tables with N = 1200. The number of repeated measurements per subject was generated from a multinomial distribution similar to that of the example data: n_ijk ∈ {2, 3, 4, 5, 6} and n = {n_ijk : 1 ≤ k ≤ N_ij, 1 ≤ i ≤ I, 1 ≤ j ≤ J} ~ Mult(N, p) with p = (0.15, 0.2, 0.3, 0.2, 0.15), which implies that dropouts are missing completely at random. Data were simulated under a first-order autoregressive (AR-1) correlation structure, with (t, t′)-th element σ²ρ^|t−t′| in Σ_k (σ² = 4 or 8, ρ = 0.7). We also evaluated the test properties under misspecification of the correlation structure: data were still generated under the AR-1 structure but analyzed using a compound symmetric covariance structure. A total of 1000 datasets were generated for each setting, and Type I error and power were estimated as the proportion of datasets in which the null hypothesis was rejected.
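The design elements of these simulations (visit counts, the AR-1 covariance used to generate data, and the compound symmetric structure used in the misspecified analyses) can be sketched as follows. This is an illustration with NumPy, not the authors’ simulation code. Drawing each subject’s number of visits independently with probabilities p gives multinomial counts Mult(N, p) over the five values; the compound symmetric matrix is shown with the same σ² and ρ only for comparison, since in the analysis its parameters are estimated. Function names are hypothetical.

```python
import numpy as np

def ar1_cov(n, sigma2, rho):
    """AR-1 covariance: element (t, t') equals sigma2 * rho**|t - t'|."""
    t = np.arange(n)
    return sigma2 * rho ** np.abs(t[:, None] - t[None, :])

def cs_cov(n, sigma2, rho):
    """Compound symmetry: common variance sigma2 and common correlation rho."""
    return sigma2 * ((1.0 - rho) * np.eye(n) + rho * np.ones((n, n)))

def draw_visit_counts(N, rng, values=(2, 3, 4, 5, 6), p=(0.15, 0.2, 0.3, 0.2, 0.15)):
    """Number of repeated measurements per subject; counts over values ~ Mult(N, p)."""
    return rng.choice(values, size=N, p=p)

def simulate_subject(mean, sigma2, rho, rng):
    """One subject's response vector from a multivariate normal with AR-1 covariance."""
    return rng.multivariate_normal(mean, ar1_cov(len(mean), sigma2, rho))
```

A typical call sequence is rng = np.random.default_rng(seed), n_k = draw_visit_counts(1200, rng), then simulate_subject(cell_mean_vector, 4.0, 0.7, rng) for each subject.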

In the 3×3 GEI table settings, G had three genotype categories, with minor allele frequency 0.4 and Hardy-Weinberg equilibrium. An environmental exposure with three categories (with probabilities 0.25, 0.25, and 0.50) was considered. Cell means for all GEI configurations were first generated under a pre-specified interaction model. Given the mean and covariance structure, each individual’s vector of observations was generated from a multivariate normal distribution. In addition to Tukey’s and saturated models, we considered simulations under additive main effects and multiplicative interaction (AMMI) models [27, 28]. AMMI models are a flexible class of interaction models that essentially apply a singular value decomposition (SVD) to the matrix of cell residuals after the additive main effects are removed. Following the notation in (2), the mean structure for y_kt under an AMMI model is given by

f (η, x_{k t}) = f (β, d, α, γ, x_{k t}) = β_{0} + \sum_{i = 1}^{I} β_{i}^{G} I (G_{k} = i) + \sum_{j = 1}^{J} β_{j}^{E} I (E_{k t} = j) + \sum_{i = 1}^{I} \sum_{j = 1}^{J} \sum_{m = 1}^{M} d_{m} α_{i m} γ_{j m} I (G_{k} = i, E_{k t} = j) .

The m-th interaction factor is subject to the constraints $\sum_{i = 1}^{I} α_{i m}^{2} = \sum_{j = 1}^{J} γ_{j m}^{2} = 1$ and $\sum_{i = 1}^{I} α_{i m} = \sum_{j = 1}^{J} γ_{j m} = 0$, as well as the orthogonality restrictions Σ_i α_im α_im′ = Σ_j γ_jm γ_jm′ = 0 for m ≠ m′. Specifically, AMMI models with M = 1 (AMMI1) were considered in the simulation as an intermediate model between Tukey’s model and the saturated model; in the 3×3 table settings, AMMI2 is equivalent to the saturated interaction model. We compared test performance under AMMI1 models because Tukey’s test may not capture interactions of AMMI1 form. Although AMMI1 is nested within the saturated interaction model, the test based on the saturated model may not have as much power to detect such interactions.
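The AMMI decomposition of a table of cell means can be sketched with NumPy: remove the grand mean and the row and column main effects, take the SVD of the double-centred residual matrix, and keep the first M multiplicative terms d_m α_im γ_jm. This sketch assumes unweighted cell means (each cell counts equally), which may differ from a fit that weights cells by sample size; the function name is hypothetical.

```python
import numpy as np

def ammi_decompose(cell_means, M):
    """Grand mean, row/column main effects, and the first M SVD terms of an I x J table."""
    Y = np.asarray(cell_means, dtype=float)
    grand = Y.mean()
    bG = Y.mean(axis=1) - grand                       # row (genotype) main effects
    bE = Y.mean(axis=0) - grand                       # column (exposure) main effects
    resid = Y - grand - bG[:, None] - bE[None, :]     # double-centred interaction residuals
    U, d, Vt = np.linalg.svd(resid)
    alpha, gamma = U[:, :M], Vt[:M].T                 # unit-norm, mutually orthogonal columns
    interaction = (alpha * d[:M]) @ gamma.T           # sum over m of d_m * alpha_im * gamma_jm
    return grand, bG, bE, d[:M], alpha, gamma, interaction
```

Because the double-centred residual matrix of a 3×3 table has rank at most 2, keeping M = 2 terms reproduces it exactly, consistent with the equivalence of AMMI2 and the saturated model noted above. A Tukey interaction θβ_i^Gβ_j^E has rank one, so it is an AMMI1 interaction with α proportional to β^G and γ proportional to β^E.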

6.2. Settings for Assessment of Average Performance for Multiple GEI Tests

When GEI tests are conducted across a moderately large number of SNPs within several gene regions, the average performance of each method over many GEI tests is of particular interest rather than a single specific GEI test. As such, we assessed the Type I error and power of the tests for interaction using Tukey’s model, saturated interaction model, and the proposed shrinkage estimator, averaged over a set of genetic markers. We based our simulation studies on the setting of the NAS data example where the candidate genes were chosen based on some pathway analysis. For each dataset, one exposure factor and 100 independent SNPs (without LD) were generated, with the minor allele frequencies ranging from 0.3 to 0.5. (Simulations with a wider range of minor allele frequencies can be found in the supporting information.) The exposure had five categories, each with probability 0.2. Thus, a 3×5 table was constructed for each GEI test.
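Generating the markers for one simulated dataset can be sketched as follows. Several details are assumptions of this sketch rather than statements from the text: minor allele frequencies are drawn uniformly on [0.3, 0.5], genotypes follow Hardy-Weinberg equilibrium as in the single-test settings (so the minor-allele count is Binomial(2, MAF)), and the exposure is a time-invariant category coded 0 to 4 with equal probabilities. The helper hwe_genotype_probs lists the implied genotype probabilities, for example (0.36, 0.48, 0.16) for the MAF of 0.4 used in Section 6.1. Function names are hypothetical.

```python
import numpy as np

def hwe_genotype_probs(maf):
    """Probabilities of 0, 1, 2 minor alleles under Hardy-Weinberg equilibrium."""
    return np.array([(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2])

def simulate_markers(n_subjects, n_snps, rng, maf_range=(0.3, 0.5), n_exposure=5):
    """Independent SNPs (no LD) plus one categorical exposure with equal probabilities."""
    mafs = rng.uniform(maf_range[0], maf_range[1], size=n_snps)
    # Under HWE the minor-allele count is Binomial(2, maf); SNP columns are independent.
    G = rng.binomial(2, mafs, size=(n_subjects, n_snps))
    E = rng.integers(0, n_exposure, size=n_subjects)
    return mafs, G, E
```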

We considered two simulation schemes for multiple GEI tests: (i) 100 marginal models, Y_i | G_i, E, i = 1, …, 100, were generated with a common E for each subject, and (ii) a joint multivariate model, Y | G₁, G₂, …, G₁₀₀, E, was generated. In both (i) and (ii), 15 of the 100 SNPs were assigned GEI effects on Y, another five SNPs had only additive main effects on Y, and the remaining 80 SNPs were not associated with Y. This design represents a study in which GEI is tested across multiple SNPs, most of which have no GEI effects, with only a relatively small number exhibiting GEI effects.

To assess the sensitivity of tests in response to the underlying composition of different interaction models, we created three scenarios by assigning each of the 15 GEI to have either a Tukey’s or a saturated form of interaction: Scenario (A): all 15 were of Tukey’s form of interaction; Scenario (B): 10 were of Tukey’s form, and 5 had saturated interaction structures; Scenario (C): 10 had saturated interaction structures, and 5 were of Tukey’s form. For example, the mean function of the simulation model for subject k under scenario (B) in simulation scheme (ii), following the notations in (3), is given by

f (η, x_{k t}) = β_{0} + \sum_{j = 1}^{J} β_{j}^{E} I (E_{k t} = j) + \sum_{s = 1}^{20} \sum_{i = 1}^{I} β_{i}^{G s} I (G_{s k} = i) + \sum_{s = 1}^{10} θ^{s} \sum_{i = 1}^{I} \sum_{j = 1}^{J} β_{i}^{G s} β_{j}^{E} I (G_{s k} = i, E_{k t} = j) + \sum_{s = 11}^{15} \sum_{i = 1}^{I} \sum_{j = 1}^{J} τ_{i j}^{s} I (G_{s k} = i, E_{k t} = j),

where $β_{i}^{G s}$ is the genetic main effect of the i-th genotype of the s-th SNP, G_sk is the genotype of the s-th SNP for the k-th subject, and θ^s and $τ_{i j}^{s}$ are the interaction parameters for the s-th SNP. An individual-level outcome Y with repeated measures was generated for 1000 subjects in each simulation using (1) with a random intercept, Z_k = 1_{n_k}, $b_{k} \sim N(0, σ_{b}^{2})$ and $e_{k} \sim N(0, σ_{e}^{2} I_{n_{k}})$, where $σ_{b}^{2} = 2.8$ and $σ_{e}^{2} = 1.2$. The number of repeated measurements per subject was generated from the same multinomial distribution described previously.
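Under this random-intercept specification, V_k = σ_b² 1 1^⊤ + σ_e² I_{n_k}, a compound symmetric matrix with total variance σ_b² + σ_e² = 4 and within-subject correlation σ_b²/(σ_b² + σ_e²) = 0.7. The short NumPy sketch below (hypothetical function names) builds V_k as Z_k Ψ Z_k^⊤ + Σ_k and draws one subject’s outcome around a given mean vector.

```python
import numpy as np

def random_intercept_cov(n_k, sigma_b2=2.8, sigma_e2=1.2):
    """V_k = Z_k Psi Z_k^T + Sigma_k with Z_k = 1_{n_k}, Psi = sigma_b2, Sigma_k = sigma_e2 I."""
    Z = np.ones((n_k, 1))
    return sigma_b2 * (Z @ Z.T) + sigma_e2 * np.eye(n_k)

def simulate_outcome(mean, rng, sigma_b2=2.8, sigma_e2=1.2):
    """y_k = mean + b_k * 1 + e_k, with b_k ~ N(0, sigma_b2) and e_k ~ N(0, sigma_e2 I)."""
    mean = np.asarray(mean, dtype=float)
    b_k = rng.normal(0.0, np.sqrt(sigma_b2))
    e_k = rng.normal(0.0, np.sqrt(sigma_e2), size=mean.shape)
    return mean + b_k + e_k
```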

The average performance for each test procedure was quantified by true positive rate (TPR) and false positive rate (FPR). The TPR is defined as the proportion of interactions detected in the 15 simulated SNPs with GEI associations. The FPR is the proportion of interactions detected among the 85 simulated SNPs without GEI effects. The TPR and FPR were then averaged over 10,000 simulation datasets. To control the family-wise error rate (FWER), the significance level was adjusted according to the total number of SNPs (i.e., number of GEI tests) using Bonferroni correction, α^* = 0.05/100 = 5 × 10⁻⁴.
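The per-dataset TPR and FPR under the Bonferroni threshold can be computed as in the following sketch (hypothetical function name, NumPy); averaging the returned rates over simulated datasets gives the summaries described above. A test is counted as significant when its p-value is strictly below α*, an assumption since the text does not state how ties are handled.

```python
import numpy as np

def tpr_fpr(pvalues, is_gei, alpha=0.05):
    """TPR and FPR for one dataset, rejecting when p < alpha / (number of tests)."""
    p = np.asarray(pvalues, dtype=float)
    truth = np.asarray(is_gei, dtype=bool)
    alpha_star = alpha / p.size            # Bonferroni: 0.05 / 100 = 5e-4
    reject = p < alpha_star
    return reject[truth].mean(), reject[~truth].mean(), alpha_star
```

With 100 SNPs of which 15 carry GEI effects, the TPR is the rejection rate among those 15 and the FPR the rejection rate among the other 85.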

6.3. Power and Type I Error

The upper panel of Table 1 shows the power and Type I error of tests using Tukey’s model, the saturated model, and the shrinkage estimator for GEI. In general, the saturated interaction model has less power to detect interactions when the true interaction has Tukey’s form. For example, for σ² = 4, the LRT for Tukey’s form of interaction has power 0.76, while the saturated model has power 0.54. Conversely, when the true interaction has a saturated form, Tukey’s model can hardly detect it: for σ² = 4, the saturated model has power 0.81, whereas the LRT using Tukey’s model has power of only 0.09. In both situations, the interaction test using the shrinkage estimator has power 0.69. When the true interaction has an AMMI1 form, the saturated interaction test and the shrinkage estimator detect 82% and 72% of interactions, respectively, whereas Tukey’s model detects only 30%. Under additive models, all testing procedures maintain the nominal Type I error rate, with the Wald test using the shrinkage estimator being slightly conservative. However, both Tukey’s test and the shrinkage estimator have inflated Type I error under the null model in which one of the main effects (the genetic main effect) is absent.
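The power pattern above reflects how the two interaction structures spend parameters. The sketch below assumes reference (indicator) coding, so a 3×3 table has I = J = 2 non-reference levels; Tukey’s form θβ_i^Gβ_j^E uses one free parameter, while the saturated model (under the usual identifiability constraints) uses (r − 1)(c − 1) for r genotype and c exposure levels. It also checks whether a given interaction matrix happens to be of Tukey’s form; a pattern that is not (such as a saturated or AMMI1 pattern not aligned with the main effects) cannot be captured by the single θ, which is consistent with the low power of Tukey’s LRT in those rows. Function names are hypothetical.

```python
def tukey_interaction(theta, beta_g, beta_e):
    """Cell interaction effects under Tukey's one-df model:
    theta * beta_i^G * beta_j^E for the non-reference levels."""
    return [[theta * bg * be for be in beta_e] for bg in beta_g]


def interaction_df(n_row_levels, n_col_levels):
    """Free interaction parameters: 1 for Tukey's model and
    (rows - 1) * (cols - 1) for the saturated model."""
    return {"tukey": 1, "saturated": (n_row_levels - 1) * (n_col_levels - 1)}


def is_tukey_form(tau, beta_g, beta_e, tol=1e-9):
    """Check whether tau[i][j] == theta * beta_g[i] * beta_e[j] for a single
    scalar theta (fitted by least squares over all cells)."""
    pairs = [(tau[i][j], bg * be)
             for i, bg in enumerate(beta_g) for j, be in enumerate(beta_e)]
    denom = sum(x * x for _, x in pairs)
    if denom == 0:
        # All products are zero, so only a zero interaction fits.
        return all(abs(t) < tol for t, _ in pairs)
    theta = sum(t * x for t, x in pairs) / denom
    return all(abs(t - theta * x) < tol for t, x in pairs)
```

For a 3×3 table, `interaction_df(3, 3)` returns one Tukey parameter versus four saturated parameters.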

Table 1.

Power for detecting GEI and Type I error rates using Tukey’s model, the proposed adaptive shrinkage estimator, and the saturated interaction model under different interaction structures in 3×3 table settings (N=1200). Data were simulated under a first-order autoregressive (AR-1) correlation structure, while analysis was performed under correctly specified and misspecified correlation structures (see Section 6.1 for simulation details).

True Model	Tukey LRT (σ² = 4)	Shrinkage Wald (σ² = 4)	Saturated LRT (σ² = 4)	Tukey LRT (σ² = 8)	Shrinkage Wald (σ² = 8)	Saturated LRT (σ² = 8)
Correctly Specified Correlation Structure (AR-1)
Tukey’s one-df	0.758	0.686	0.540	0.479	0.409	0.273
AMMI1	0.303	0.720	0.817	0.211	0.398	0.514
Saturated	0.094	0.690	0.806	0.086	0.325	0.459

H₀ : θ = 0 (Additive)	0.047	0.042	0.053	0.053	0.043	0.051
H₀ : θ = 0 (Null)	0.104	0.081	0.049	0.107	0.087	0.052

Misspecified Correlation Structure (Compound Symmetric)
Tukey’s one-df	0.730	0.640	0.496	0.435	0.376	0.244
AMMI1	0.287	0.708	0.792	0.185	0.372	0.489
Saturated	0.060	0.646	0.775	0.065	0.308	0.432

H₀ : θ = 0 (Additive)	0.048	0.046	0.053	0.070	0.048	0.060
H₀ : θ = 0 (Null)	0.118	0.064	0.053	0.143	0.065	0.061


When the within-subject correlation structure is misspecified (lower panel of Table 1), the patterns of power comparison are similar to those in the upper panel. Under the null hypothesis of an additive model in which both main effects are present, the Type I error rates of the two LRTs are still maintained near the 0.05 level when σ² = 4 but are inflated when σ² = 8. Only the proposed Wald test using the shrinkage estimator maintains the nominal Type I error rate in both settings. Under the null hypothesis that the genetic main effects are absent, the Type I error rate exceeds 0.05 for all three tests.

6.4. Average Performance for Multiple GEI Tests

The upper panel of Table 2 shows the average performance of the three GEI tests for marginal models under the three scenarios. Under scenario (A), where all 15 simulated GEI are of Tukey’s form, the LRT using Tukey’s model has a TPR of 0.72, whereas the saturated model has a TPR of 0.43. Under scenario (B), where 2/3 of the simulated GEI are of Tukey’s form, the LRT using Tukey’s model and the saturated interaction test have comparable TPRs (0.56 and 0.58), and the Wald test using the shrinkage estimator has the highest TPR (0.63). Under scenario (C), where 2/3 of the interactions are of saturated form, the Wald test using the shrinkage estimator and the saturated interaction test have comparable performance, but the TPR for the LRT using Tukey’s model is substantially lower. Relative to the Bonferroni-adjusted level of 5 × 10⁻⁴, the FPR is maintained for the test using a saturated model and slightly inflated for the shrinkage estimator, whereas the LRT for Tukey’s model has the highest FPR (0.0024).

Table 2.

Average performance of tests using Tukey’s model, saturated interaction model, and the adaptive shrinkage estimator for detecting GEI across 100 simulated SNPs under scenarios (A): all simulated GEI are of Tukey’s form, (B): 2/3 of simulated GEI are of Tukey’s form and 1/3 are of saturated form, and (C): 2/3 of simulated GEI are of saturated form and 1/3 are of Tukey’s form

Measure	Scenario	Tukey LRT	Shrinkage Wald	Saturated LRT
Marginal Models
True Positive Rate	(A)	0.7221	0.5766	0.4302
	(B)	0.5611	0.6317	0.5769
	(C)	0.4923	0.6699	0.7357
False Positive Rate		0.0024	0.0007	0.0006

Multivariate Models
True Positive Rate	(A)	0.3264	0.2810	0.0706
	(B)	0.2882	0.2602	0.2247
	(C)	0.2073	0.2507	0.2911
False Positive Rate		0.0045	0.0027	0.0006


The lower panel of Table 2 shows the results for multivariate models (single outcome) across the 100 simulated SNPs. The LRT using a saturated interaction form yields relatively low TPRs under scenarios (A) and (B). The test based on the shrinkage estimator maintains a similar TPR across scenarios (0.25 to 0.28). In summary, compared with the tests using Tukey’s and saturated interaction models, the GEI test using the shrinkage estimator has the most robust average performance across various GEI structures.

7. Application

7.1. Normative Aging Study (NAS)

The Normative Aging Study (NAS) is a multidisciplinary longitudinal study initiated by the U.S. Veterans Administration in 1963 to investigate the effects of aging on various health outcomes [29]. We focus on pulse pressure (PP), an important risk factor for heart disease [30]. Several studies have indicated a relationship between iron deficiency and increased lead absorption [31, 32], and increased cumulative lead exposure has been shown to be associated with elevated PP [33]. It is therefore reasonable to hypothesize that genes involved in iron metabolism could alter lead absorption and modify the effect of lead exposure on PP. The objective of this pathway-driven GEI study was to test for interactions between cumulative lead exposure and iron metabolism genes on PP.

Zhang et al. [34] observed a significant interaction between a polymorphism in the hemochromatosis (HFE) gene (rs1799945) and cumulative lead exposure on PP. We revisited this study to include 105 SNPs with minor allele frequency > 0.1 in 22 genes of the iron metabolic pathway and tested for GEI using the proposed shrinkage estimation framework. Candidate genes were chosen based on a priori knowledge of iron metabolism and previous studies of iron-related genes [35, 36]. We analyzed 729 participants from a subset of the NAS data who were successfully genotyped for the iron metabolism genes and had baseline measurements of cumulative lead concentrations (measured in the tibia and patella bones). The majority (97%) of the participants were Caucasian. The mean age at the time of bone lead measurement was 66.37 ± 7.12 years (range 48–93). From 1991 to 2011, blood pressure was measured every 3–5 years, with a median follow-up time of 12 years. More than 94% of subjects had repeated measurements of blood pressure, and over 48% of them had at least four measurements during the study period, for a total of 3013 observations (see Table 7 in the supporting information).

Each of the 105 SNPs had three possible genotypes (homozygous wild-type, heterozygous, and homozygous mutant). For illustration, we categorized bone lead concentrations into three groups: Low: ≤15, Medium: (15, 25], and High: >25 μg/g for tibia bone lead, and Low: ≤20, Medium: (20, 32], and High: >32 μg/g for patella bone lead. We used Tukey’s model, the saturated interaction model, and the shrinkage approach to model the GEI structure for each SNP×Lead interaction. Covariates in the model included baseline age, time since baseline, and squared time. Based on the Akaike information criterion (AIC), we chose a random-intercept mixed-effects model for analysis, given by $y_{k} = f(η, X_{k}) + b_{k} 1_{n_{k}} + e_{k}$, where $b_{k} \sim N(0, σ_{b}^{2})$ and $e_{k} \sim N(0, σ_{e}^{2} I_{n_{k}})$.
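The cutpoints above define right-closed intervals. A small sketch of the categorization using Python's standard `bisect` module follows; it is an illustration rather than the authors' code, and the names are hypothetical.

```python
from bisect import bisect_left

# Cutpoints in ug/g from the text. Groups: Low <= c1, Medium in (c1, c2], High > c2.
CUTPOINTS = {"tibia": (15.0, 25.0), "patella": (20.0, 32.0)}
LABELS = ("Low", "Medium", "High")


def lead_group(value, bone):
    """Map a bone lead concentration to Low / Medium / High.

    bisect_left returns 0 for value <= c1, 1 for c1 < value <= c2, and 2 for
    value > c2, which matches the right-closed intervals used in the text.
    """
    return LABELS[bisect_left(CUTPOINTS[bone], value)]
```

For example, a tibia lead value of exactly 25 μg/g falls in the Medium group, while 25.5 μg/g is High.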

Because these SNPs lie in a small number of genomic regions, nearby SNPs may be in LD. To control the FWER while accounting for correlation among SNPs in the multiple testing procedure, we adjusted the significance level according to the effective number of independent tests (denoted by M_eff) using the simple M method [37]. This method first estimates the correlation matrix among the 105 SNPs using composite LD, computes its eigenvalues λ₁ ≥ λ₂ ≥ ··· ≥ λ₁₀₅, and then takes M_eff to be the smallest number of principal components satisfying $\sum_{s = 1}^{M_{eff}} λ_{s} / \sum_{s = 1}^{105} λ_{s} > C$. With C = 99.5%, this gave M_eff = 89, so the adjusted significance level was α* = 0.05/M_eff = 0.05/89 ≈ 5.6 × 10⁻⁴.
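Once the eigenvalues are available, the simple M criterion reduces to a cumulative-eigenvalue rule. The sketch assumes the eigenvalues of the composite-LD correlation matrix have already been computed (for example with `numpy.linalg.eigvalsh`, not shown) and returns the smallest M_eff whose leading eigenvalues explain more than a fraction C of the total; the names are hypothetical.

```python
def effective_number_of_tests(eigenvalues, c=0.995):
    """Smallest M such that the M largest eigenvalues explain more than a
    fraction c of the sum of all eigenvalues (simple M criterion)."""
    lam = sorted(eigenvalues, reverse=True)
    total = sum(lam)
    cumulative = 0.0
    for m, value in enumerate(lam, start=1):
        cumulative += value
        if cumulative / total > c:
            return m
    return len(lam)  # reached only if c >= 1


def adjusted_alpha(m_eff, fwer=0.05):
    """Per-test significance level after adjusting for M_eff effective tests."""
    return fwer / m_eff
```

With M_eff = 89, `adjusted_alpha(89)` is about 5.6 × 10⁻⁴, the level used above. Uncorrelated SNPs (all eigenvalues equal) give M_eff equal to the number of SNPs, i.e., the ordinary Bonferroni correction.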

Table 3 lists the p-values of GEI tests for the SNPs ranked among the top three (ranks in parentheses) by Tukey’s model, the proposed shrinkage estimator, or the saturated interaction model within iron gene regions in the NAS data. Compared with Tukey’s and saturated interaction models, the Wald test via the shrinkage estimator yielded the smallest p-value for every SNP listed in the table, and three of these reached statistical significance. For tibia bone lead, we found a significant modifying effect of SNP rs1799945 in the HFE gene using the shrinkage estimator (p = 1 × 10⁻⁴). Among wild-type participants, mean PP was nearly unchanged between the High and Low tibia lead groups. In contrast, among homozygous mutant carriers, mean PP was estimated to be 20.35 mmHg (95% CI = [14.53, 26.17]) higher in the High tibia lead group than in the Low tibia lead group. These results replicate the finding of Zhang et al. [34] that the positive association between PP and lead exposure was strongest among HFE homozygous mutant carriers. For patella bone lead, the Wald test based on the shrinkage estimator detected significant modifying effects of SNP rs17484524 in the IREB2 (iron-responsive element binding protein 2) gene (p = 3 × 10⁻⁴) and SNP rs7165535 in the B2M (beta-2-microglobulin) gene (p = 4 × 10⁻⁴); neither was captured by the LRTs using Tukey’s or the saturated interaction model. Among wild-type and heterozygous participants at both SNPs, higher lead levels corresponded to higher mean PP (the estimated differences in mean PP between the High and Low patella lead groups ranged from 3.12 to 4.32 mmHg). However, among homozygous mutant carriers, mean PP was estimated to be 3.90 mmHg (95% CI = [1.45, 6.35]) and 7.73 mmHg (95% CI = [1.88, 13.58]) lower in the High lead group than in the Low lead group at SNP rs17484524 in the IREB2 gene and SNP rs7165535 in the B2M gene, respectively. These two homozygous mutant genotypes may therefore have a protective effect (i.e., preventing PP from rising with increased lead exposure).

Table 3.

The p-values of GEI tests for the top three (ranks in parentheses) single-nucleotide polymorphisms (SNPs) by using Tukey’s model, the proposed shrinkage estimator, and saturated interaction model within iron gene regions in the NAS data (adjusted α = 5.6 × 10⁻⁴).

Bone Lead	SNP ID	Gene	Tukey LRT	Shrinkage Wald	Saturated LRT
Tibia	rs1799945	HFE	0.003 (1)	1 × 10⁻⁴ (1)	0.006 (1)
	rs2285228	DMT1	0.005 (2)	0.001 (2)	0.017 (2)
	rs3821716	MFI2	0.014 (3)	0.012	0.120
	rs422982	DMT1	0.016	0.003 (3)	0.072 (3)

Patella	rs7165535	B2M	0.001 (1)	4 × 10⁻⁴ (2)	0.014 (1)
	rs17484524	IREB2	0.002 (2)	3 × 10⁻⁴ (1)	0.021 (2)
	rs7866419	ACO1	0.009 (3)	0.005	0.054
	rs1358024	TF	0.016	0.004 (3)	0.038
	rs2304704	SLC40A1	0.044	0.030	0.034 (3)


7.2. Multi-Ethnic Study of Atherosclerosis (MESA)

The Multi-Ethnic Study of Atherosclerosis (MESA) is a longitudinal study investigating characteristics related to the progression from subclinical to clinical cardiovascular disease [38]. More than 6,800 men and women aged 45–84 years were recruited from six U.S. communities. Participants had a baseline examination (exam 1) in 2000–2002 and three follow-up examinations 18–24 months apart (exams 2–4). We aimed to explore GEI effects on BMI in four race groups: Caucasians (N=2526), Chinese (N=775), African Americans (N=1611), and Hispanics (N=1449). Most (84%) of the participants had four BMI measurements, and over 92% had at least two measurements during the study period from 2000 to 2007 (see Table 3 in the supporting information). A total of 27 SNPs with significant and replicated evidence of marginal association with BMI were selected as candidate SNPs [39]. The environmental exposures of interest were energy intake, measured at exam 1, and total intentional exercise, measured at exams 1–3. Both exposure variables were categorized into five groups: 0, (0, 7], (7, 14], (14, 28], and >28 MET-hr/week for total intentional exercise, and <1000, (1000, 1300], (1300, 1600], (1600, 2000], and >2000 kcal/day for energy intake.

We applied Tukey’s model, the saturated interaction model, and the shrinkage test to examine the GEI structure for each SNP×Energy Intake and SNP×Exercise interaction. Covariates in the model included age at the time of data collection (centered), squared age, gender, having a college degree, household income, and the exposure variable (either intentional exercise or energy intake). We also accounted for population stratification by including the first two principal components. All variables except age, BMI, and intentional exercise were time-invariant. Based on AIC, we chose an unstructured covariance matrix for this analysis. A random gender effect was added to allow men and women to have different variances in BMI: let $F_{k} = 1_{n_{k}}$ for women and $F_{k} = 0_{n_{k}}$ for men. The analysis model is given by $y_{k} = f(η, X_{k}) + F_{k} b_{k} + e_{k}$, where $b_{k} \sim N(0, σ_{b}^{2})$ and $e_{k} \sim N(0, Σ_{k})$. We first analyzed the data by race group (see Table 8 in the supporting information) and then applied Fisher’s method [40] to combine the four race groups into a single meta-analysis p-value for each SNP. Not every race group allowed GEI tests for all 27 SNPs because of small sample sizes in certain GEI configurations, so the df for deriving each combined p-value was based on the number of available race groups. The adjusted significance level to control the FWER was set at 0.05/27 ≈ 0.0019.
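Fisher’s method combines G independent p-values through X = −2 Σ_g log p_g, which follows a χ² distribution with 2G df under the global null; here G is the number of race groups for which the GEI test was available, matching the df rule described above. For even df the χ² survival function has a closed form, so the sketch below needs only the standard library. It assumes independent race-group tests, as Fisher’s method requires, and its names are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(log p_g) ~ chi-square with 2G df under H0,
    where G is the number of race groups that contributed a p-value."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    x = -2.0 * sum(math.log(p) for p in p_values)
    k = len(p_values)  # chi-square df = 2k
    # Closed-form survival function for even df:
    # P(chi2_{2k} > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


FWER_THRESHOLD = 0.05 / 27  # 27 candidate SNPs, about 0.0019
```

With a single available race group the combined p-value equals that group's p-value, so SNPs testable in only one group are not penalized or rewarded by the combination step.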

Table 4 lists the combined p-values for significant SNPs using the three interaction tests. For the association of energy intake with BMI, a significant modifying effect of SNP rs543874 (SEC16B) was observed with all three tests. SNP rs1558902 (FTO) was detected by Tukey’s model (p = 4.8 × 10⁻⁵) and the shrinkage test (p = 7.4 × 10⁻⁴). SNP rs10767664 (BDNF) was also detected by Tukey’s model (p = 1.2 × 10⁻³). For the association between intentional exercise and BMI, we found a significant modifying effect of SNP rs206936 (NUDT3, HMGA1) using Tukey’s model (p = 1.4 × 10⁻⁴). Overall, only one interaction was detected by the standard saturated interaction model used in current practice. Both examples illustrate the utility of leveraging Tukey’s model to enhance the power of a test for interaction. The shrinkage estimator also offers protection against false positives. These findings require replication in further studies.

Table 4.

Findings of GEI with significant meta-analysis p-values for the single-nucleotide polymorphisms (SNPs) that have demonstrated significant and replicated evidence of marginal association with BMI in the MESA data (adjusted α = 1.9 × 10⁻³).

Exposure	SNP ID	Gene	Tukey LRT	Shrinkage Wald	Saturated LRT
Energy Intake	rs543874	SEC16B	<1.0 × 10⁻⁸	<1.0 × 10⁻⁸	1.8 × 10⁻⁴
	rs1558902	FTO	4.8 × 10⁻⁵	7.4 × 10⁻⁴	0.130
	rs10767664	BDNF	1.2 × 10⁻³	0.103	0.124

Exercise	rs206936	NUDT3, HMGA1	1.4 × 10⁻⁴	0.006	0.005


8. Discussion

We proposed a novel adaptive shrinkage estimator that combines estimates from Tukey’s one-df model and a saturated interaction model for GEI effects. The estimator shrinks the MLEs under a general, saturated interaction structure toward Tukey’s one-df estimator, allowing data-adaptive relaxation of the structural assumption in Tukey’s product form.
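The idea of shrinking saturated estimates toward Tukey’s fitted interaction can be illustrated cell by cell. This is a minimal sketch assuming a scalar empirical-Bayes weight of the Mukherjee–Chatterjee type [2], τ_shk = τ_T + d²/(d² + v) · d with d = τ_sat − τ_T and v the estimated variance of τ_sat. It is not necessarily the exact estimator of this paper, which is defined in an earlier section and also has a multivariate (matrix) version not shown here; the function name is hypothetical.

```python
def scalar_shrinkage(tau_sat, tau_tukey, var_sat):
    """Cell-wise empirical-Bayes-type shrinkage of saturated interaction
    estimates toward Tukey's fitted interaction (illustrative scalar version).

    For each cell: d = tau_sat - tau_tukey and
        tau_shk = tau_tukey + d**2 / (d**2 + var_sat) * d.
    Large |d| relative to var_sat keeps tau_shk near tau_sat (robustness);
    small |d| pulls tau_shk toward tau_tukey (efficiency).
    """
    out = []
    for t_s, t_t, v in zip(tau_sat, tau_tukey, var_sat):
        d = t_s - t_t
        denom = d * d + v
        weight = d * d / denom if denom > 0 else 0.0
        out.append(t_t + weight * d)
    return out
```

When the saturated and Tukey estimates agree (d = 0) the result is Tukey’s estimate; when d is one standard error (d² = v) the estimate lies halfway between the two.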

Our simulation of multiple GEI tests mimics a search for GEI over many candidate SNPs with different interaction patterns. The results indicate that the test based on the shrinkage estimator can serve as a robust and unified approach for interaction detection. Moreover, the shrinkage method is not limited to GEI or gene-gene interaction (GGI) detection; it can be extended to any two-way table.

We evaluated the mean squared error (MSE) and bias of these interaction estimators through simulations (Table 2 in the supporting information). The performance of the shrinkage estimator was compared with that of the MLE under a general saturated interaction model using the ratio of MSEs,

\hat{E} \left\{ \sum_{i} \sum_{j} (\hat{τ}_{shk,ij} - τ_{ij})^{2} \right\} / \hat{E} \left\{ \sum_{i} \sum_{j} (\hat{τ}_{sat,ij} - τ_{ij})^{2} \right\} .
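The MSE ratio can be computed from simulation output as follows, where Ê is the average over simulation replicates. This is a minimal sketch assuming each replicate's interaction estimates are stored as a flat list over the cells (i, j); the function name is hypothetical.

```python
def mse_ratio(shk_estimates, sat_estimates, truth):
    """Monte Carlo estimate of
    E[sum_ij (tau_shk - tau)^2] / E[sum_ij (tau_sat - tau)^2].

    shk_estimates and sat_estimates are lists over replicates; each replicate
    is a flat list of interaction estimates, one per cell ij. truth is the
    flat list of true tau_ij values in the same cell order.
    """
    def mean_sse(estimates):
        sses = [sum((e - t) ** 2 for e, t in zip(rep, truth)) for rep in estimates]
        return sum(sses) / len(sses)

    return mean_sse(shk_estimates) / mean_sse(sat_estimates)
```

A ratio below 1 means the shrinkage estimator has smaller total squared error on average than the saturated MLE.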

In our simulations, the MSE ratio was uniformly less than 1, suggesting an efficiency advantage for the shrinkage estimator through the bias-variance trade-off. We also noted that the Wald test using the shrinkage estimator is slightly conservative, so the small finite-sample bias of the shrinkage estimator does not lead to inflated Type I error. In addition, we compared shrinkage estimates of the interaction parameters using only the diagonal elements of B (i.e., scalar shrinkage) with those using the whole B matrix (i.e., multivariate shrinkage), and found that multivariate shrinkage is required in certain situations (see Table 3 in the supporting information). Chen et al. [20] proposed both multivariate and scalar shrinkage estimators for case-control studies and also found that the scalar shrinkage estimator can lead to appreciable bias.

Although the methods we discussed were developed for a two-dimensional interaction structure (i.e., the genetic and interaction effects are assumed to be invariant over time), they can be readily modified to allow for time-dependent effects. To allow for temporal changes in the main effects and interaction effects, one may use spline functions. For example, the mean function for Tukey’s model at time (or age) of measurement t can be expressed as

f (η (t), x_{k t}) = f (β (t), θ (t), x_{k t}) = β_{0} (t) + β^{G} (t) g_{k} + β^{E} (t) e_{k t} + θ (t) β^{G} (t) β^{E} (t) g_{k} e_{k t},

where the genotype g_k and the exposure variable e_kt for subject k at time t can be treated as continuous, β₀(t) is the baseline function, β^G(t) and β^E(t) are the time-varying genetic and exposure functions, and θ(t) is the time-varying interaction function. Each of these functions can be approximated by a linear combination of basis functions [41]. We plan to address estimation and testing for temporal changes in interaction effects using alternative models in future work.
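To make the basis-function idea concrete, the sketch below evaluates the time-varying Tukey mean function with each coefficient function expanded in a truncated linear (piecewise-linear spline) basis [1, t, (t − κ₁)₊, …]. The basis, knots, and coefficient values are assumptions chosen for illustration, not choices made in the paper, and all names are hypothetical; in practice the basis coefficients would be estimated from the data (not shown).

```python
def truncated_linear_basis(t, knots):
    """Basis [1, t, (t - k1)_+, ..., (t - kM)_+] evaluated at time t."""
    return [1.0, t] + [max(t - k, 0.0) for k in knots]


def varying_coefficient(t, coefs, knots):
    """Coefficient function value sum_m coefs[m] * phi_m(t)."""
    basis = truncated_linear_basis(t, knots)
    if len(coefs) != len(basis):
        raise ValueError("need one coefficient per basis function")
    return sum(c * phi for c, phi in zip(coefs, basis))


def tukey_mean_t(t, g, e, params, knots):
    """Time-varying Tukey mean function:
    beta0(t) + betaG(t) g + betaE(t) e + theta(t) betaG(t) betaE(t) g e.

    params maps 'beta0', 'betaG', 'betaE', 'theta' to basis coefficients.
    """
    b0, bg, be, th = (varying_coefficient(t, params[name], knots)
                      for name in ("beta0", "betaG", "betaE", "theta"))
    return b0 + bg * g + be * e + th * bg * be * g * e
```

Because the interaction term multiplies coefficient functions together, the mean is nonlinear in the basis coefficients, just as Tukey’s one-df model is nonlinear in (θ, β).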

We have proposed a new approach for GEI analysis in longitudinal cohort studies. Tukey’s one-df test for non-additivity can be very powerful for detecting GEI in studies where the search for GEI is based on the presence of genetic main effects (e.g., MESA), but the test can suffer from misspecification of the interaction structure. The proposed shrinkage estimation procedure, on the other hand, is useful for pathway-driven GEI studies (e.g., NAS), where there is no prior knowledge of the existence of genetic main effects, and it performs well across many scenarios. Despite its efficiency advantage, the adaptive shrinkage approach still uses the same df for the interaction parameters as a saturated model, so the gain in power from shrinking parameter estimates toward Tukey’s model estimates may be limited. However, its robust performance across multiple loci with different interaction structures remains an appealing feature of such adaptive screening tests.

Supplementary Material

NIHMS615558-supplement-supplementary.pdf (534.6 KB, PDF)

Acknowledgments

This research was supported by the NSF grant DMS-1007494 and the NIH grants CA156608 and ES020811. Support was also provided by the MESA Stress Study (R01 HL101161) and the Michigan Center for Integrative Approaches to Health Disparities (P60 MD002249). The VA Normative Aging Study is supported by the Cooperative Studies Program/Epidemiology Research and Information Center of the U.S. Department of Veterans Affairs and is a component of the Massachusetts Veterans Epidemiology Research and Information Center, Boston, Massachusetts. The authors would like to thank Dr. Joel Schwartz, Dr. Howard Hu, and NAS participants for sharing the data resources. The MESA SHARe project is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support for MESA is provided by contracts N01-HC-95159 through N01-HC-95169 and UL1-RR-024156. Funding for genotyping was provided by NHLBI Contracts N02-HL-6-4278 and N01-HC-65226.

References

1. Kraft P, Yen Y, Stram D, Morrison J, Gauderman W. Exploiting gene-environment interaction to detect genetic associations. Human Heredity. 2007;63(2):111–119. doi: 10.1159/000099183.
2. Mukherjee B, Chatterjee N. Exploiting gene-environment independence for analysis of case–control studies: an empirical Bayes-type shrinkage estimator to trade-off between bias and efficiency. Biometrics. 2008;64(3):685–694. doi: 10.1111/j.1541-0420.2007.00953.x.
3. Mukherjee B, Ahn J, Gruber SB, Chatterjee N. Testing gene-environment interaction in large-scale case-control association studies: possible choices and comparisons. American Journal of Epidemiology. 2012;175(3):177–190. doi: 10.1093/aje/kwr367.
4. Wang Y, Huang C, Fang Y, Yang Q, Li R. Flexible semiparametric analysis of longitudinal genetic studies by reduced rank smoothing. Journal of the Royal Statistical Society: Series C (Applied Statistics). 2012;61(1):1–24. doi: 10.1111/j.1467-9876.2011.01016.x.
5. Fan R, Zhang Y, Albert PS, Liu A, Wang Y, Xiong M. Longitudinal association analysis of quantitative traits. Genetic Epidemiology. 2012;36(8):856–869. doi: 10.1002/gepi.21673.
6. Zhang H. Multivariate adaptive splines for analysis of longitudinal data. Journal of Computational and Graphical Statistics. 1997;6(1):74–91.
7. Zhang H. Mixed effects multivariate adaptive splines model for the analysis of longitudinal and growth curve data. Statistical Methods in Medical Research. 2004;13(1):63–82. doi: 10.1191/0962280204sm353ra.
8. Zhu W, Cho K, Chen X, Zhang M, Wang M, Zhang H. A genome-wide association analysis of Framingham Heart Study longitudinal data using multivariate adaptive splines. BMC Proceedings. 2009;3:S119.
9. Xu S. An empirical Bayes method for estimating epistatic effects of quantitative trait loci. Biometrics. 2007;63(2):513–521. doi: 10.1111/j.1541-0420.2006.00711.x.
10. Malzahn D, Schillert A, Müller M, Bickeböller H. The longitudinal nonparametric test as a new tool to explore gene-gene and gene-time effects in cohorts. Genetic Epidemiology. 2010;34(5):469–478. doi: 10.1002/gepi.20500.
11. Mukherjee B, Ko Y, VanderWeele T, Roy A, Park S, Chen J. Principal interactions analysis for repeated measures data: application to gene–gene and gene–environment interactions. Statistics in Medicine. 2012;31(22):2531–2551. doi: 10.1002/sim.5315.
12. Ko Y, Chudhuri P, Park S, Vokonas P, Mukherjee B. Novel likelihood ratio tests for screening gene-gene and gene-environment interactions with unbalanced repeated-measures data. Genetic Epidemiology. 2013;37:581–591. doi: 10.1002/gepi.21744.
13. Moreno-Macias H, Romieu I, London SJ, Laird NM. Gene-environment interaction tests for family studies with quantitative phenotypes: A review and extension to longitudinal measures. Human Genomics. 2010;4(5):302–326. doi: 10.1186/1479-7364-4-5-302.
14. Tukey J. One degree of freedom for non-additivity. Biometrics. 1949;5(3):232–242.
15. Maity A, Carroll R, Mammen E, Chatterjee N. Testing in semiparametric models with interaction, with applications to gene–environment interactions. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2009;71(1):75–96. doi: 10.1111/j.1467-9868.2008.00671.x.
16. Kooperberg C, LeBlanc M. Increasing the power of identifying gene × gene interactions in genome-wide association studies. Genetic Epidemiology. 2008;32(3):255–263. doi: 10.1002/gepi.20300.
17. Murcray C, Lewinger J, Gauderman W. Gene-environment interaction in genome-wide association studies. American Journal of Epidemiology. 2009;169(2):219–226. doi: 10.1093/aje/kwn353.
18. Chatterjee N, Kalaylioglu Z, Moslehi R, Peters U, Wacholder S. Powerful multilocus tests of genetic association in the presence of gene-gene and gene-environment interactions. The American Journal of Human Genetics. 2006;79(6):1002–1016. doi: 10.1086/509704.
19. Barhdadi A, Dubé M. Testing for gene-gene interaction with AMMI models. Statistical Applications in Genetics and Molecular Biology. 2010;9(Article 2):1–27. doi: 10.2202/1544-6115.1410.
20. Chen Y, Chatterjee N, Carroll R. Shrinkage estimators for robust and efficient inference in haplotype-based case-control studies. Journal of the American Statistical Association. 2009;104(485):220–233. doi: 10.1198/jasa.2009.0104.
21. Bates D, Watts D. Nonlinear Regression Analysis and Its Applications. Wiley; New York: 1988.
22. Lindstrom M, Bates D. Nonlinear mixed effects models for repeated measures data. Biometrics. 1990;46:673–687.
23. Vonesh E, Carter R. Mixed-effects nonlinear regression for unbalanced repeated measures. Biometrics. 1992;48(1):1–17.
24. Crainiceanu C, Ruppert D. Likelihood ratio tests for goodness-of-fit of a nonlinear regression model. Journal of Multivariate Analysis. 2004;91(1):35–52.
25. Gallant A. Nonlinear Statistical Models. Wiley; New York: 2009.
26. Vonesh E, Wang H, Majumdar D. Generalized least squares, Taylor series linearization and Fisher’s scoring in multivariate nonlinear regression. Journal of the American Statistical Association. 2001;96(453):282–291.
27. Gollob H. A statistical model which combines features of factor analytic and analysis of variance techniques. Psychometrika. 1968;33(1):73–115. doi: 10.1007/BF02289676.
28. Mandel J. A new analysis of variance model for non-additive data. Technometrics. 1971;13(1):1–18.
29. Bell B, Rose C, Damon A. The veterans administration longitudinal study of healthy aging. The Gerontologist. 1966;6(4):179–184. doi: 10.1093/geront/6.4.179.
30. Franklin SS, Khan SA, Wong ND, Larson MG, Levy D. Is pulse pressure useful in predicting risk for coronary heart disease? The Framingham Heart Study. Circulation. 1999;100(4):354–360. doi: 10.1161/01.cir.100.4.354.
31. Kwong WT, Friello P, Semba RD. Interactions between iron deficiency and lead poisoning: epidemiology and pathogenesis. Science of The Total Environment. 2004;330(1):21–37. doi: 10.1016/j.scitotenv.2004.03.017.
32. Bradman A, Eskenazi B, Sutton P, Athanasoulis M, Goldman LR. Iron deficiency associated with higher blood lead in children living in contaminated environments. Environmental Health Perspectives. 2001;109(10):1079–1084. doi: 10.1289/ehp.011091079.
33. Perlstein T, Weuve J, Schwartz J, Sparrow D, Wright R, Litonjua A, Nie H, Hu H. Cumulative community-level lead exposure and pulse pressure: the Normative Aging Study. Environmental Health Perspectives. 2007;115(12):1696. doi: 10.1289/ehp.10350.
34. Zhang A, Park S, Wright R, Weisskopf M, Mukherjee B, Nie H, Sparrow D, Hu H. HFE H63D polymorphism as a modifier of the effect of cumulative lead exposure on pulse pressure: the Normative Aging Study. Environmental Health Perspectives. 2010;118(9):1261–1266. doi: 10.1289/ehp.1002251.
35. Knutson M, Wessling-Resnick M. Iron metabolism in the reticuloendothelial system. Critical Reviews in Biochemistry and Molecular Biology. 2003;38(1):61–88. doi: 10.1080/713609210.
36. Chung J, Wessling-Resnick M. Molecular mechanisms and regulation of iron transport. Critical Reviews in Clinical Laboratory Sciences. 2003;40(2):151–182. doi: 10.1080/713609332.
37. Gao X, Starmer J, Martin ER. A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms. Genetic Epidemiology. 2008;32(4):361–369. doi: 10.1002/gepi.20310.
38. Bild DE, Bluemke DA, Burke GL, Detrano R, Roux AVD, Folsom AR, Greenland P, Jacobs DR, Jr, Kronmal R, Liu K, et al. Multi-ethnic study of atherosclerosis: objectives and design. American Journal of Epidemiology. 2002;156(9):871–881. doi: 10.1093/aje/kwf113.
39. Speliotes EK, Willer CJ, Berndt SI, Monda KL, Thorleifsson G, Jackson AU, Allen HL, Lindgren CM, Luan J, Mägi R, et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nature Genetics. 2010;42(11):937–948. doi: 10.1038/ng.686.
40. Fisher R. Statistical Methods for Research Workers. Oliver & Boyd; Edinburgh: 1925.
41. Hoover DR, Rice JA, Wu CO, Yang LP. Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika. 1998;85(4):809–822.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

supplementary

NIHMS615558-supplement-supplementary.pdf^{(534.6KB, pdf)}
