Author manuscript; available in PMC: 2011 May 18.
Published in final edited form as: J Am Stat Assoc. 2008 Dec 1;103(484):1693–1704. doi: 10.1198/016214508000001084

Multiply robust inference for statistical interactions

Stijn Vansteelandt 1, Tyler J VanderWeele 2, James M Robins 3
PMCID: PMC3097121  NIHMSID: NIHMS120876  PMID: 21603124

Abstract

A primary focus of an increasing number of scientific studies is to determine whether two exposures interact in the effect that they produce on an outcome of interest. Interaction is commonly assessed by fitting regression models in which the linear predictor includes the product between those exposures. When the main interest lies in the interaction, this approach is not entirely satisfactory because it is prone to (possibly severe) bias when the main exposure effects or the association between outcome and extraneous factors are misspecified. In this article, we therefore consider conditional mean models with identity or log link which postulate the statistical interaction in terms of a finite-dimensional parameter, but which are otherwise unspecified. We show that estimation of the interaction parameter is often not feasible in this model because it would require nonparametric estimation of auxiliary conditional expectations given high-dimensional variables. We thus consider ‘multiply robust estimation’ under a union model that assumes at least one of several working submodels holds. Our approach is novel in that it makes use of information on the joint distribution of the exposures conditional on the extraneous factors in making inferences about the interaction parameter of interest. In the special case of a randomized trial or a family-based genetic study in which the joint exposure distribution is known by design or by Mendelian inheritance, the resulting multiply robust procedure leads to asymptotically distribution-free tests of the null hypothesis of no interaction on an additive scale. We illustrate the methods via simulation and the analysis of a randomized follow-up study.

Keywords: Double robustness, Gene-environment interaction, Gene-gene interaction, Longitudinal data, Semiparametric inference

1 Introduction

A primary focus of an increasing number of scientific studies is to determine whether two given exposures interact to produce their effect, i.e. to determine whether the effect of one exposure is modified by the second. For instance, in many longitudinal studies, the question of whether the time evolution of the response differs for subjects with different baseline characteristics/interventions is of primary interest. In genetic association studies of complex disorders, the discovery of gene-environment and gene-gene interactions is of great interest, because most complex disorders are thought to be caused by numerous genes and environmental factors, a subset of which may act synergistically. The development of robust and powerful tests of gene-gene interaction is of special interest to geneticists, since, when the effect of one locus is modified by alleles at another locus, the power to detect a phenotypic association with the first locus can be greatly reduced unless the interaction is explicitly modelled (Cordell, 2002).

When the outcome is continuous or positive-constrained and uncensored, the presence of effect modification between exposures A1 and A2 is commonly assessed by fitting a linear or loglinear conditional mean model for the outcome Y, in which the linear predictor includes the product between these exposures. To be specific, let X be a vector of measured pre-exposure variables such that conditioning on X suffices to control for confounding when estimating the effects of A1 and A2 on outcome Y. Then the term β* in the conditional mean model

$$E(Y \mid A, X) = g(\gamma_0^* + \gamma_1^* A_1 + \gamma_2^* A_2 + \gamma_3^* X + \beta^* A_1 A_2) \qquad (1)$$

with A = (A1, A2)’ and g(x) = x or g(x) = exp(x) known, encodes the degree to which exposure A2 modifies the effect of A1 on outcome (on the scale g), and vice versa. Specifically, the choice β* = 0 expresses that the effect of exposure A1 on outcome is the same (on the scale g), regardless of the other exposure A2. It thus encodes the absence of effect modification (on scale g). More generally, one may fit a conditional mean model of the form

$$E(Y \mid A, X) = g\{m(A, X; \beta^*, \gamma^*)\} \qquad (2)$$

where

$$m(A, X; \beta, \gamma) = q_3(A, X; \beta) + q_2(X, A_2; \gamma_2) + q_1(X, A_1; \gamma_1) + h(X; \gamma_0)$$

with g defined as before, with q3 (A, X; β) a known function smooth in β and satisfying q3 (A, X; β) = 0 when A1A2 = 0, with q2 (X, A2; γ2), q1 (X, A1; γ1) and h (X; γ0) known functions smooth in γ=(γ0,γ1,γ2) (and γ0, γ1 and γ2 variation independent parameters), satisfying q1 (X, 0; γ1) = q2 (X, 0; γ2) = 0, with β* ∈ Rp and γ* ∈ Rq unknown parameters and with the joint law of (A, X) unrestricted. In this model, the term q3 (A, X; β) encodes the statistical interaction between exposures A1 and A2 (possibly as a function of X). Without loss of generality, we can require q3 (A, X; β) to satisfy q3 (A, X; 0) = 0 so that β* = 0 continues to encode the absence of statistical interaction. The functions q2 (X, A2; γ2) and q1 (X, A1; γ1) encode the main effects (possibly as functions of X) of the exposures A2 and A1, respectively. Finally, h (X; γ0) encodes the main effect of the extraneous factors X. For instance, model (1) is the special case in which q3 (A, X; β) = βA1A2, q2 (X, A2; γ2) = γ2A2, q1 (X, A1; γ1) = γ1A1 and h(X; γ0) = γ0 + γ3X.
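As a brief illustration of how β* encodes interaction, suppose, purely for concreteness, that both exposures in model (1) are binary. Writing $\mu_{a_1 a_2}(X) = E(Y \mid A_1 = a_1, A_2 = a_2, X)$ (a shorthand introduced here only for this illustration), model (1) implies

$$g^{-1}\{\mu_{11}(X)\} - g^{-1}\{\mu_{10}(X)\} - g^{-1}\{\mu_{01}(X)\} + g^{-1}\{\mu_{00}(X)\} = \beta^*,$$

since the terms in γ cancel in the double difference. With g(x) = x this is the additive interaction contrast; with g(x) = exp(x), the inverse link is the logarithm and exp(β*) is the corresponding ratio of ratios, $\mu_{11}(X)\mu_{00}(X) / \{\mu_{10}(X)\mu_{01}(X)\}$.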

In observational studies, X will typically be high-dimensional with a number of continuous components. For instance, in genetic association studies, X might include a high-dimensional collection of substructure-informative loci (Epstein, Allen and Satten, 2007). This makes models for the main exposure effects q2 (X, A2; γ2), q1 (X, A1; γ1) and for the association h(X; γ0) of extraneous factors X with outcome prone to misspecification. These models are not in themselves of scientific interest when the primary goal is to test for statistical interaction between the exposures A1 and A2. As such, standard tests of β* = 0 and inference for statistical interaction under the above model are less than satisfactory. This is so because standard tests for statistical interaction tend to be heavily sensitive to misspecification of the models for the main exposure effects (Greenland, 1993). In particular, they may fail to attain the nominal significance level when these nuisance models are misspecified. Similarly, as demonstrated in the simulation study of Section 5, estimates of statistical interaction may be severely biased under misspecification of these nuisance models. As a consequence, in longitudinal studies, standard tests to determine whether the change in outcome mean over time depends on a particular baseline exposure may be compromised when the time evolution or main exposure effect is misspecified, or when important interactions with extraneous variables X have been neglected or mismodelled. In genetic association studies, tests for gene-gene or gene-environment interaction may be biased when the main effect of the gene/environment is incorrectly modelled (e.g. when a dominant genetic model was assumed but not appropriate) or interactions with extraneous confounders have been inadvertently omitted.

Our concern about the consequence of misspecifying the main exposure effects or the association between outcome and extraneous factors in statistical interaction tests is additionally motivated by a problem arising in the sufficient-component cause framework (Rothman, 1976). It is well known that whether two variables statistically interact may depend on the particular model being used (e.g. on the chosen scale g in model (1)) (Mantel et al., 1977, Greenland, 1993). Specifically, two variables that have an interaction under one statistical model may not have an interaction under a different model (e.g. with a different link function). When the outcome and exposures are dichotomous, it has been argued (Rothman, 1976; Koopman, 1981) that there is a natural, scale-independent way in which to assess the presence of interactions between two exposures, based on the sufficient-component cause framework. This framework makes reference to the actual causal mechanisms involved in bringing about the outcome: when two or more binary causes participate in the same causal mechanism, it becomes proper to speak of sufficient cause interactions. In recent work, VanderWeele and Robins (2007, 2008) derived various conditions which necessarily entail the presence of sufficient cause interactions. When the exposure effects are assumed to be monotone (Greenland, 1993; VanderWeele and Robins, 2007), these conditions involve testing for effect modification on the risk difference scale. This scale suggests a Bernoulli regression model with linear link as the natural choice to test for sufficient cause interactions. Our interest in semiparametric tests now stems from the fact that such models are likely misspecified because for dichotomous outcomes they may not yield expected outcomes between 0 and 1. See the supplemental material (Vansteelandt et al., 2008) for further discussion of the relation between the estimators we derive below and interactions in the sufficient cause framework (see also VanderWeele (2008)).

A number of approaches have been developed which avoid modelling the effects of extraneous factors and/or main exposure effects. Robins, Mark and Newey (1992) propose G-estimation which avoids modelling the effects of extraneous factors when a model is specified for the conditional mean exposure given these extraneous factors. Correlated data methods, such as conditional likelihood estimation (Verbeke, Spiessens and Lesaffre, 2001), regression of changes (Louis, 1988) and regression on within- and between-cluster effects (Neuhaus and Kalbfleisch, 1998) can be viewed as variants of this approach in the case of the linear link (Goetgeluk and Vansteelandt, 2007). However, all these approaches require a correct model for the main exposure effects.

In the context of longitudinal studies, Zeger and Diggle (1994) avoid modelling a main exposure effect (i.e. the time effect) via a backfitting algorithm which iterates between kernel estimation of the main time effect and generalized least squares estimation of the remaining parameters. When measurements are collected at discrete time points, Lin and Ying (2001) avoid nonparametric smoothing via a weighted least squares approach that is equivalent to G-estimation. Fan and Li (2004) relax these authors’ restriction of measurements being taken at discrete time points via an approximate regression of changes and profile least squares estimation. These approaches were specifically designed for longitudinal data and, with the exception of the approximate regression of changes (Fan and Li, 2004), they require modelling the effects of extraneous factors X. Multivariate adaptive regression splines (MARS) (Friedman, 1991) and interaction spline models (Chen, 1994) avoid parametric modelling assumptions on all exposures and extraneous factors. Although well suited for high-dimensional problems, they still suffer from the curse of dimensionality when the predictor space is large.

In this article, we develop a novel semiparametric approach that can perform well in moderate sized samples even when X is high dimensional. In contrast to previous approaches, the performance guarantees offered by our new approach depend on the extent of prior knowledge concerning the joint exposure distribution f (A1, A2|X) conditional on the covariates. Previous approaches do not make use of information concerning this law; as described below, by incorporating such information our estimators have certain multiple robustness properties described in detail in the next section. By incorporating information on the joint conditional exposure distribution f(A1, A2|X), the class of estimators derived below also essentially encompasses a ‘propensity score’ approach to the estimation of interaction parameters.

Specifically suppose first that the joint law of A1 and A2 is known, as could be the case in either a clinical trial with A1 and A2 both randomly assigned or in a family-based gene-gene interaction study where the law of the genetic markers A1 and A2 is determined by Mendelian inheritance. In this setting, when g is the identity link, our approach delivers consistent and asymptotically normal (CAN) estimators of the interaction β* and an asymptotically distribution free (ADF) test of the hypothesis β* = 0 of no interaction, even when the models q2 (X, A2; γ2), q1 (X, A1; γ1), h(X; γ0) are all misspecified. In contrast, we prove that, even with f (A1, A2|X) known, if the vector X has a continuously distributed component and none of the models q2 (X, A2; γ2), q1 (X, A1; γ1), h(X; γ0) is guaranteed to be correct, it is impossible to obtain either a CAN estimator of β* or an ADF test of no interaction when g is the exponential function.

Suppose next A1 and A2 are correctly assumed to be conditionally independent given X so f (A1, A2|X)=f (A1|X) f (A2|X), but neither the models q2 (X, A2; γ2), q1 (X, A1; γ1), h(X; γ0) nor models f (A1|X; α1) and f (A2|X; α2) (with α1 and α2 variation independent parameters) for f (A1|X) and f (A2|X) are guaranteed correct. This would often be the case in population-based gene-environment interaction studies of a genetic marker score A1 and an environmental exposure A2 or in a population-based gene-gene interaction study where the two genetic markers A1 and A2 are unlinked, provided sufficient information on ethnicity and geographic origin, or on parental genetic markers, is recorded in X to remove the effects of population stratification. In this setting, with g the identity link, we construct a CAN estimator of β* under a union model that assumes at least one of the following four statements is true: (i) the models f (A1|X; α1) and f (A2|X; α2) are both correct, (ii) the models q2 (X, A2; γ2), q1 (X, A1; γ1), h(X; γ0) are all correct, (iii) the models f (A1|X; α1) and q1 (X, A1; γ1) are both correct, or (iv) the models f (A2|X; α2) and q2 (X, A2; γ2) are both correct. We refer to our estimation approach as quadruply robust as only one of (i)−(iv) needs to hold to obtain a CAN estimator of β*. For g the exponential link, it is only triply robust, delivering CAN estimators of β* if at least one of (ii), (iii), or (iv) holds.

Finally suppose A1 and A2 are not known to be conditionally independent, given X, and we therefore specify a model f (A1, A2|X; α) = f (A1|A2, X; α1) × f (A2| X; α2) that allows for conditional dependence. None of the models q2 (X, A2; γ2), q1 (X, A1; γ1), h(X; γ0), f (A1|A2, X; α1), f (A2| X; α2) is guaranteed correct. Then we shall see that, even for the identity link, quadruply robust estimators are not possible, because there do not exist compatible models f (A1|A2, X; α1) and f (A2|A1, X; α2) with variation independent parameters. Specifically in this setting, with g the identity link, we construct a CAN estimator of β* which we refer to as triply robust because it is CAN under a union model that assumes at least one of the following three statements is true: (i) the models f (A1|A2, X; α1) and f (A2|X; α2) are both correct, (ii) the models q2 (X, A2; γ2), q1 (X, A1; γ1), h(X; γ0) are all correct, (iii) the models f (A1|A2, X; α1) and q1 (X, A1; γ1) are both correct. For g the exponential link, our approach is also triply robust when (i) just above is replaced by the more restrictive condition that the models f (A1|A2, X; α1), f (A2|X; α2) and q2 (X, A2; γ2) are all correct.

In summary, because of the multiple robustness property enjoyed by our approach, we would recommend that, when X is high dimensional, it be used quite generally because an inference concerning an interaction effect under our approach, unlike under previous approaches, has multiple chances, rather than only one chance, to be correct or nearly correct.

The paper is organized as follows. In Section 2 we introduce semiparametric statistical interaction models. These parameterize the statistical interaction between exposures A1 and A2 (on the chosen scale g) as a function of exposures and extraneous variables X in terms of a finite number of parameters, but leave the observed data law otherwise unrestricted. In particular, the proposed models leave the main effects of both exposures on the outcome unspecified, along with their interactions with extraneous variables. We examine properties of these models. We show that, due to the curse of dimensionality, no general ADF test for statistical interaction exists that is guaranteed to perform well in realistic-sized samples because estimation of the interaction parameters requires the auxiliary estimation of conditional expectations given high-dimensional variables. We therefore introduce parametric models that we characterize as ‘working’ models because they are not guaranteed to be correct. In Sections 3 and 4, we show how to construct the multiply robust estimators described above. In Section 3, we do so under the assumption that A1 and A2 are conditionally independent given X. In Section 4 we allow for conditional dependence. We illustrate the performance of our methods via simulation studies in Section 5 and the analysis of a randomized follow-up study in Section 6.

2 Model and inference

Consider a study whose design calls for measurements on a vector of variables (Yi, Ai, Xi) to be recorded for each of i = 1, …, n independent subjects. Here, Yi is the outcome of interest, Ai = (Ai1, Ai2)’ is a vector of exposure variables Ai1 and Ai2, and Xi is a vector of extraneous variables, such as confounders for the association between exposure Ai and outcome Yi. The goal of the study is to assess whether the association between the exposure A1 and the outcome Y is modified by A2 on either an additive or multiplicative scale.

To investigate whether there exist ADF tests of the null hypothesis that the interaction parameter β* = 0, we consider the semiparametric interaction model A which relaxes some of the parametric restrictions of model (2). Specifically, model A is defined by the conditional mean model

$$E(Y \mid A, X) = g\{m(A, X; \beta^*)\} \qquad (3)$$

where

$$m(A, X; \beta) = q_3(A, X; \beta) + q_2(X, A_2) + q_1(X, A_1) + h(X)$$

with q3 (A, X; β) defined as before, q2 (X, A2), q1 (X, A1) and h(X) being unknown functions satisfying q1 (X, 0) = q2 (X, 0) = 0, with the joint law of (A, X) unrestricted, with g(.) known and either the identity or exponential function, and with β* ∈ Rp an unknown parameter vector. For instance, we may postulate that

$$E(Y \mid A, X) = g\{\beta^* A_1 A_2 + q_2(X, A_2) + q_1(X, A_1) + h(X)\} \qquad (4)$$

for unknown functions q2 (X, A2), q1 (X, A1) and h(X).

Theorem 1 gives the influence functions of regular asymptotically linear (RAL) estimators of β* in model A and will form the basis of our argument as to why estimation of β* in model A is infeasible when X is high dimensional. Proofs of this and other results are given in the supplemental material (Vansteelandt et al., 2008).

Theorem 1. If $\hat{\beta}$ is a regular asymptotically linear (RAL) estimator of β* in model A, then there exists a p × 1 function d(A, X) in the set D of all p × 1 functions of (A, X) satisfying

$$E\{d(A, X) \mid A_1, X\} = E\{d(A, X) \mid A_2, X\} = 0, \qquad (5)$$

such that $\hat{\beta}$ has influence function d(A, X)ε(β*), where ε(β) = Y − m(A, X; β) when g(x) = x and ε(β) = Y exp{−m(A, X; β)} − 1 when g(x) = exp(x). That is, $n^{1/2}(\hat{\beta} - \beta^*) = n^{-1/2} \sum_{i=1}^{n} d(A_i, X_i)\,\varepsilon_i(\beta^*) + o_p(1)$.

By standard results from semiparametric theory in Bickel et al. (1993), Theorem 1 implies that all regular and asymptotically linear (RAL) estimators of β* in model A can be obtained (up to asymptotic equivalence) as the solution $\tilde{\beta}(d)$ to the equation

$$\sum_{i=1}^{n} d(A_i, X_i)\,\varepsilon_i(\beta) = 0, \qquad (6)$$

for some d ∈ D. The solution $\tilde{\beta}(d)$ to this equation is an infeasible estimator as the set of functions D satisfying (5) depends on the unknown conditional law f(Ai|Xi) of exposure Ai, given Xi, and εi(β) depends on the unknown functions q2 (Xi, Ai2), q1 (Xi, Ai1) and h(Xi). A feasible RAL estimator is not possible unless some of the unknowns q2 (Xi, Ai2), q1 (Xi, Ai1), h(Xi), and f(Ai|Xi) can be consistently estimated. While smoothing methods could in principle be used, with the sample sizes found in practice, the data available to estimate the density f(Ai|Xi) and the main effects q2 (Xi, Ai2), q1 (Xi, Ai1) and h(Xi) will be sparse when Xi is a vector with at least several continuous components. As a consequence any feasible estimator of β* under model A will exhibit poor finite sample performance when the predictor space is large. It follows that in general inference about β* in model A is infeasible due to the curse of dimensionality and that dimension-reducing (e.g. parametric) working models must be used to estimate the unknowns q2 (Xi, Ai2), q1 (Xi, Ai1), h(Xi) and f(Ai|Xi). In the following 2 sections, we demonstrate that multiply robust estimators of β* are obtained when the parameters of these models are estimated in an appropriate fashion. In Section 3, we assume that A1 and A2 are conditionally independent given X. This assumption is dropped in Section 4.

3 Conditionally independent exposures

As discussed in the introduction, there are important settings in which A1 and A2 are known to be conditionally independent given X. Therefore define $\mathcal{A}_{cip}$ like model A, but with the additional assumption that A1 ⊥ A2 | X. Under this model, the set of estimating equations (6) with d ∈ D can equivalently be rewritten as

$$0 = \sum_{i=1}^{n} \left[ d(A_i, X_i) - E\{d(A_i, X_i) \mid A_{i1}, X_i\} - E\{d(A_i, X_i) \mid A_{i2}, X_i\} + E\{d(A_i, X_i) \mid X_i\} \right] \varepsilon_i(\beta) \qquad (7)$$

where d = d(Ai, Xi) is a member of the set of p × 1 functions of (Ai, Xi). The solution $\tilde{\beta}(d)$ to this equation is still an infeasible estimator for the reasons discussed previously.

We consider 4 possible dimension-reducing strategies based on working models. The first strategy is to postulate the parametric model (2), i.e., to postulate a parametric model $\mathcal{M}_y$ for $q_2(X, A_2) = q_2(X, A_2; \gamma_2^*)$, $q_1(X, A_1) = q_1(X, A_1; \gamma_1^*)$ and $h(X) = h(X; \gamma_0^*)$ with $\gamma^* \equiv (\gamma_0^*, \gamma_1^*, \gamma_2^*)$ unknown finite dimensional parameters, and with γ0, γ1 and γ2 variation independent. The second strategy is to postulate a parametric model $\mathcal{M}_a$ for the conditional densities of Aj, given X for j = 1, 2, i.e.

$$f(A_j \mid X) = f(A_j \mid X; \alpha_j^*),$$

where f(A1|X; α1) and f(A2|X; α2) are known densities smooth in variation independent parameters α1 and α2, $\alpha_1^*$, $\alpha_2^*$ are unknown finite-dimensional parameters and $\alpha^* \equiv (\alpha_1^*, \alpha_2^*)$. The third (fourth) strategy is to postulate the model $\mathcal{M}_{yaj}$, j = 1 (j = 2), that assumes $q_j(X, A_j) = q_j(X, A_j; \gamma_j^*)$ and $f(A_j \mid X) = f(A_j \mid X; \alpha_j^*)$.

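Since we cannot be certain that any of these 4 models is correct, we aim to find an estimator $\hat{\beta}$ of β* that is guaranteed to be CAN when any one of them (but not necessarily more than 1 of them) is correct. That is, we wish to find estimators $\hat{\beta}$ that are CAN in the union submodel $\mathcal{B}_{cip}^{id} \equiv \mathcal{A}_{cip} \cap (\mathcal{M}_y \cup \mathcal{M}_a \cup \mathcal{M}_{ya1} \cup \mathcal{M}_{ya2})$ of model $\mathcal{A}_{cip}$ that assumes that at least one of $\mathcal{M}_y$, $\mathcal{M}_a$, $\mathcal{M}_{ya1}$ and $\mathcal{M}_{ya2}$ is true. In line with Robins, Rotnitzky and van der Laan (2000), Robins and Rotnitzky (2001) and van der Laan and Robins (2003), we will refer to such estimators as quadruply robust and, more generally, as multiply robust estimators (Vansteelandt, Rotnitzky and Robins, 2007). Part (i) of Theorem 2 below shows that, under mild regularity conditions, when g(.) is the identity link, the estimators $\hat{\beta}_{cip} \equiv \hat{\beta}_{cip}(d)$ are multiply robust (in the sense of being CAN for β* under model $\mathcal{B}_{cip}^{id}$) for $\hat{\beta}_{cip}(d)$ the solution to

$$0 = \sum_{i=1}^{n} \left[ d(A_i, X_i) - E\{d(A_i, X_i) \mid A_{i1}, X_i; \hat{\alpha}_2\} - E\{d(A_i, X_i) \mid A_{i2}, X_i; \hat{\alpha}_1\} + E\{d(A_i, X_i) \mid X_i; \hat{\alpha}\} \right] \varepsilon_i(\beta, \hat{\gamma}(\beta)) \qquad (8)$$

with εi(β, γ) = Yi − m(Ai, Xi; β, γ), d(Ai, Xi) an arbitrary p × 1 function of (Ai, Xi), $\hat{\alpha} = (\hat{\alpha}_1, \hat{\alpha}_2)$, with $\hat{\alpha}_j$ satisfying

$$0 = \sum_{i=1}^{n} H_{ij}(\hat{\alpha}_j) \equiv \sum_{i=1}^{n} \frac{\partial}{\partial \alpha_j} \ln f(A_{ij} \mid X_i; \alpha_j) \Big|_{\alpha_j = \hat{\alpha}_j}$$

for j = 1, 2, and $\hat{\gamma}(\beta)$ solving the system of equations

$$0 = \sum_{i=1}^{n} G_{i0}(\beta, \gamma) \equiv \sum_{i=1}^{n} c_0(A_i, X_i)\,\varepsilon_i(\beta, \gamma) \qquad (9)$$
$$0 = \sum_{i=1}^{n} G_{i1}(\beta, \gamma, \hat{\alpha}_1) \equiv \sum_{i=1}^{n} \left[ c_1(A_i, X_i) - E\{c_1(A_i, X_i) \mid A_{i2}, X_i; \hat{\alpha}_1\} \right] \varepsilon_i(\beta, \gamma) \qquad (10)$$
$$0 = \sum_{i=1}^{n} G_{i2}(\beta, \gamma, \hat{\alpha}_2) \equiv \sum_{i=1}^{n} \left[ c_2(A_i, X_i) - E\{c_2(A_i, X_i) \mid A_{i1}, X_i; \hat{\alpha}_2\} \right] \varepsilon_i(\beta, \gamma) \qquad (11)$$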

for arbitrary vector functions c0(Ai, Xi), c1(Ai, Xi) and c2(Ai, Xi) of the dimension of γ0, γ1 and γ2, respectively. The arguments of Robins and Rotnitzky (2001) imply that a necessary condition for the existence of such a quadruply robust estimator of β* in model $\mathcal{A}_{cip} \cap (\mathcal{M}_y \cup \mathcal{M}_a \cup \mathcal{M}_{ya1} \cup \mathcal{M}_{ya2})$ is that there exists an unbiased estimating equation for β* (with non-trivial power against local alternatives) were any of the following four statements to hold: (1) q2 (X, A2), q1 (X, A1) and h (X) are all known, (2) f(A2|X) and f(A1|X) are both known, (3) q1 (X, A1) and f(A1|X) are both known, (4) q2 (X, A2) and f(A2|X) are both known. The main step in the proof of Theorem 2 is showing that, for j = 1, 2, 3, 4, (8) is an unbiased estimating equation for β* when statement j holds and the known values of the functions specified in statement j are substituted for their estimated values in (8). The proof is then completed by showing that all of the following are true: $f(A_{ij} \mid X_i; \hat{\alpha}_j)$ is a CAN estimator of f(Aij|Xi) in models $\mathcal{M}_a$ and $\mathcal{M}_{yaj}$, j = 1, 2, $q_j(X, A_j; \hat{\gamma}_j(\beta^*))$ is a CAN estimator of qj (X, Aj) in models $\mathcal{M}_y$ and $\mathcal{M}_{yaj}$, j = 1, 2, and $h(X; \hat{\gamma}_0(\beta^*))$ is a CAN estimator of h (X) in model $\mathcal{M}_y$.
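To make the structure of equations (8)-(11) concrete, the following minimal sketch considers the identity link with the linear working model (1) and the choices d(A, X) = A1A2, c0 = (1, X)', c1 = A1 and c2 = A2. Under these choices the four equations are jointly linear in (γ0, γ3, γ1, γ2, β) and can be solved in one step. The data-generating mechanism, the function names, and the use of fixed working exposure probabilities in place of fitted ones are illustrative assumptions, not part of the original analysis. The simulated scenario is one in which only the working models for f(A1|X) and q1(X, A1) are correct.

```python
import math
import random


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    k = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, k))
        x[r] = (aug[r][k] - tail) / aug[r][r]
    return x


def robust_interaction_estimate(n=100_000, beta_true=2.0, seed=1):
    """Solve the linear special case of (8)-(11) on simulated data (hypothetical setup)."""
    rng = random.Random(seed)
    wz = [[0.0] * 5 for _ in range(5)]  # sum of w_i z_i'
    wy = [0.0] * 5                      # sum of w_i y_i
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        e1 = expit(x)                   # working model for P(A1=1|X): correct
        e2_working = 0.5                # working model for P(A2=1|X): wrong on purpose
        a1 = 1.0 if rng.random() < e1 else 0.0
        a2 = 1.0 if rng.random() < expit(-x) else 0.0  # true P(A2=1|X)
        # True q1 = A1 (linear working model correct); true q2 = A2(1 + X^2) and
        # h = X^2 are both missed by the linear working model (1).
        y = beta_true * a1 * a2 + a1 + a2 * (1.0 + x * x) + x * x + rng.gauss(0.0, 1.0)
        z = [1.0, x, a1, a2, a1 * a2]            # regressors of working model (1)
        r1, r2 = a1 - e1, a2 - e2_working
        w = [1.0, x, r1, r2, r1 * r2]            # rows: (9), (9), (10), (11), (8)
        for j in range(5):
            wy[j] += w[j] * y
            for m in range(5):
                wz[j][m] += w[j] * z[m]
    theta = solve(wz, wy)
    return theta[4]                              # coefficient of A1*A2


if __name__ == "__main__":
    print("interaction estimate:", robust_interaction_estimate())
```

In this sketch the residual left by the misspecified q2 and h is a function of (A2, X) only, so it is orthogonal to A1 − E(A1|X) under conditional independence; this is the mechanism by which statement (3) suffices for unbiasedness of (8) and (10).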

Part (i) of Theorem 2 further shows that when g(.) is the exponential link, all estimators $\hat{\beta}_{cip} \equiv \hat{\beta}_{cip}(d)$ obtained by solving (8) with εi(β, γ) = Yi exp{−m(Ai, Xi; β, γ)} − 1 are multiply robust in the sense of being CAN in the union model $\mathcal{B}_{cip}^{exp} = \mathcal{A}_{cip} \cap (\mathcal{M}_y \cup \mathcal{M}_{ya1} \cup \mathcal{M}_{ya2})$ when the above conditions hold. As discussed in the introduction, unlike under the identity link, the estimators $\hat{\beta}_{cip}(d)$ are not CAN in model $\mathcal{A}_{cip} \cap \mathcal{M}_a$. In fact, as mentioned above, a necessary condition for any estimator to be CAN in model $\mathcal{A}_{cip} \cap \mathcal{M}_a$, and thus in model $\mathcal{A}_{cip} \cap (\mathcal{M}_y \cup \mathcal{M}_a \cup \mathcal{M}_{ya1} \cup \mathcal{M}_{ya2})$, is that an unbiased estimating equation for β* (with non-trivial power against local alternatives) exists when f(A2|X) and f(A1|X) are known. But in Lemmas 1-3 of the supplemental materials (Vansteelandt et al., 2008) we show that no such unbiased estimating equation need exist when g is the exponential function and X has continuous components. The lack of an unbiased estimating equation in this setting is connected with the following non-collapsibility property of multiplicative interactions.

Remark

Non-Collapsibility of Multiplicative Interactions: Consider again model (4) and suppose A and X are independent. The model $E(Y \mid A) = g(\kappa^* A_1 A_2 + \eta_1^* A_1 + \eta_2^* A_2 + \eta_0^*)$ is derived from model (4) by collapsing over X. Note that this model is saturated when A1 and A2 are dichotomous. If g(x) = x, then β* = κ*, so additive interactions are collapsible over X. However, a trivial calculation shows that if g(x) = exp(x), even β* = 0 fails to imply κ* = 0.
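The calculation referred to in the remark can be sketched numerically. The example below assumes a binary X with P(X = 1) = 1/2, independent of binary exposures, and a linear predictor A1·X + A2·X with no A1A2 term, so that β* = 0 in model (4). These choices and the function name are hypothetical and serve only as an illustration. Because the collapsed model is saturated for dichotomous exposures, κ* equals the interaction contrast of the collapsed means on the link scale.

```python
import math


def collapsed_interaction(inverse_link, link):
    """Interaction contrast kappa* on the link scale after averaging over X.

    Conditional on X the linear predictor is A1*X + A2*X, i.e. beta* = 0.
    X is binary with P(X=1) = 0.5 and independent of (A1, A2).
    """
    def mean_given_a(a1, a2):
        return sum(0.5 * inverse_link(a1 * x + a2 * x) for x in (0, 1))

    return (link(mean_given_a(1, 1)) - link(mean_given_a(1, 0))
            - link(mean_given_a(0, 1)) + link(mean_given_a(0, 0)))


# Identity link: g(x) = x, so both g and its inverse are the identity.
kappa_identity = collapsed_interaction(lambda m: m, lambda mu: mu)
# Exponential link: g(x) = exp(x), with inverse log.
kappa_log = collapsed_interaction(math.exp, math.log)

if __name__ == "__main__":
    print("identity link kappa*:", kappa_identity)
    print("log link kappa*:", kappa_log)
```

Under the identity link the contrast vanishes, whereas under the exponential link it does not, even though no interaction is present within either level of X.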

Theorem 2

Suppose that the regularity conditions stated in the supplemental materials (Vansteelandt et al., 2008) hold and that β, γ1, γ2, α1 and α2 are variation independent.

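(i) Then, when g(x) = x (g(x) = exp(x)), $\sqrt{n}(\hat{\beta}_{cip} - \beta^*)$ is RAL under model $\mathcal{B}_{cip}^{id}$ ($\mathcal{B}_{cip}^{exp}$) with influence function

$$-E^{-1}\left\{ \frac{\partial}{\partial \beta} \mathcal{U}_i(\beta, \tilde{\gamma}(\beta), \tilde{\alpha}) \Big|_{\beta = \beta^*} \right\} \mathcal{U}_i(\beta^*, \tilde{\gamma}(\beta^*), \tilde{\alpha})$$

and thus converges in distribution to a N(0, Γ), where

$$\Gamma = E\left( \left[ E^{-1}\left\{ \frac{\partial}{\partial \beta} \mathcal{U}_i(\beta, \tilde{\gamma}(\beta), \tilde{\alpha}) \Big|_{\beta = \beta^*} \right\} \mathcal{U}_i(\beta^*, \tilde{\gamma}(\beta^*), \tilde{\alpha}) \right]^{\otimes 2} \right)$$

with $v^{\otimes 2} = vv'$ for a vector v, with $\tilde{\gamma}(\beta)$ and $\tilde{\alpha}$ denoting the probability limits of the estimators $\hat{\gamma}(\beta)$ and $\hat{\alpha}$ respectively, and

$$\mathcal{U}_i(\beta, \gamma, \alpha) = U_i(\beta, \gamma, \alpha) - E\left\{ \frac{\partial}{\partial \gamma} U_i(\beta, \gamma, \alpha) \right\} E^{-1}\left\{ \frac{\partial}{\partial \gamma} G_i(\beta, \gamma, \alpha) \right\} G_i(\beta, \gamma, \alpha) - \left[ E\left\{ \frac{\partial}{\partial \alpha} U_i(\beta, \gamma, \alpha) \right\} - E\left\{ \frac{\partial}{\partial \gamma} U_i(\beta, \gamma, \alpha) \right\} E^{-1}\left\{ \frac{\partial}{\partial \gamma} G_i(\beta, \gamma, \alpha) \right\} E\left\{ \frac{\partial}{\partial \alpha} G_i(\beta, \gamma, \alpha) \right\} \right] E^{-1}\left\{ \frac{\partial}{\partial \alpha} H_i(\alpha) \right\} H_i(\alpha) \qquad (12)$$

with $H_i(\alpha) \equiv (H_{i1}(\alpha_1), H_{i2}(\alpha_2))$ and $G_i(\beta, \gamma, \alpha) \equiv (G_{i0}(\beta, \gamma), G_{i1}(\beta, \gamma, \alpha_1), G_{i2}(\beta, \gamma, \alpha_2))$.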

(ii) Furthermore, let $\hat{\beta}(d, G^{(1)}, H^{(1)})$ and $\hat{\beta}(d, G^{(2)}, H^{(2)})$ be 2 estimators of β* under model $\mathcal{B}_{cip}^{id}$ ($\mathcal{B}_{cip}^{exp}$) corresponding to the same index functions d, but different unbiased estimating functions G(1) and G(2) for γ* under model $\mathcal{M}_y$ and H(1) and H(2) for α* under model $\mathcal{M}_a$. Then, $\sqrt{n}\{\hat{\beta}(d, G^{(1)}, H^{(1)}) - \hat{\beta}(d, G^{(2)}, H^{(2)})\} = o_p(1)$ at the intersection submodel $\mathcal{A}_{cip} \cap \mathcal{M}_y \cap \mathcal{M}_a$.

Part (i) of Theorem 2 suggests that multiply robust estimators of β* in model $\mathcal{B}_{cip}^{id}$ ($\mathcal{B}_{cip}^{exp}$) can be obtained by solving an equation of the form (8). General results on doubly robust estimation in Robins and Rotnitzky (2001) further imply that any regular CAN estimator of β* in model $\mathcal{B}_{cip}^{id}$ ($\mathcal{B}_{cip}^{exp}$) has the same asymptotic distribution as an estimator $\hat{\beta}_{cip}(d)$ obtained in this way. Part (ii) of Theorem 2 suggests that the choice of estimators for α* and γ* has no impact on the efficiency of $\hat{\beta}_{cip}$ in model $\mathcal{B}_{cip}^{id}$ ($\mathcal{B}_{cip}^{exp}$) when the models $\mathcal{M}_y$ and $\mathcal{M}_a$ are correctly specified. Thus the fact that γ1 and γ2 are estimated by G-estimators solving (10) and (11), respectively, rather than by their more efficient maximum likelihood estimators under model $\mathcal{A}_{cip} \cap \mathcal{M}_y$ has no effect on the asymptotic variance of $\hat{\beta}_{cip}$ when the law of the data lies in $\mathcal{A}_{cip} \cap \mathcal{M}_y \cap \mathcal{M}_a$. Nonetheless, the use of such G-estimators is critical to control bias. Indeed, while the solution to (8) is a CAN estimator under model $\mathcal{A}_{cip} \cap (\mathcal{M}_y \cup \mathcal{M}_a)$ when γ* is replaced by an arbitrary CAN estimator under model $\mathcal{A}_{cip} \cap \mathcal{M}_y$, it is not CAN under the less restrictive model $\mathcal{B}_{cip}^{id}$ (or $\mathcal{B}_{cip}^{exp}$).

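It follows as a corollary of Theorem 4 in Section 4 that, when the residual outcome variance is constant in A, i.e. Var{ε(β*)|A, X} = σ²(X) for some function σ²(X), the efficient estimating equation at β* in model $\mathcal{A}_{cip}$ is obtained by replacing d (Ai, Xi) in equation (8) with

$$\sigma^{-2}(X_i)\, \frac{\partial}{\partial \beta} q_3(A_i, X_i; \beta)$$

For example, when q3(Ai, Xi;β) = q3(Xi;β)Ai1Ai2, we obtain

$$0 = \sum_{i=1}^{n} \frac{\partial q_3(X_i; \beta)}{\partial \beta} \{A_{i1} - E(A_{i1} \mid X_i; \hat{\alpha}_1)\} \{A_{i2} - E(A_{i2} \mid X_i; \hat{\alpha}_2)\}\, \varepsilon_i(\beta, \hat{\gamma}(\beta))\, \sigma^{-2}(X_i) \qquad (13)$$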

It can be deduced from Robins and Rotnitzky (2001) that the semiparametric variance bounds in models $\mathcal{A}_{cip}$ and $\mathcal{B}_{cip}^{id}$ ($\mathcal{B}_{cip}^{exp}$) are identical whenever the model $\mathcal{M}_y \cap \mathcal{M}_a$ is true, and thus that solving (13) then yields a semiparametric efficient estimator under model $\mathcal{B}_{cip}^{id}$ ($\mathcal{B}_{cip}^{exp}$) at the intersection model $\mathcal{M}_y \cap \mathcal{M}_a$. Note that (13) merely requires specifying the conditional means of A1 and A2, given X, and not the entire conditional distribution. In practice, unless the variance function σ²(X) is further assumed not to depend on X, the unknown function σ²(X) in (13) must be replaced by an estimator.
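As a rough illustration of how (13) uses only the conditional exposure means, the sketch below considers the identity link with q3(A, X; β) = βA1A2, a constant σ²(X), and exposure probabilities that are known by design. To emphasise the robustness to the outcome model in this setting, the working nuisance functions are simply set to zero, which is a deliberate simplification rather than the estimation of γ by (9)-(11) described above. Equation (13) then has a closed-form solution. The simulated data, the function names and the particular nuisance functions are hypothetical.

```python
import math
import random


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def known_design_estimate(n=100_000, beta_true=2.0, seed=7):
    """Closed-form solution of (13) with q3 = beta*A1*A2 and nuisance terms set to 0.

    Solving sum_i r_i1 * r_i2 * (Y_i - beta*A_i1*A_i2) = 0 for beta, where
    r_ij = A_ij - E(A_ij | X_i) uses the known exposure probabilities.
    """
    rng = random.Random(seed)
    numerator = 0.0
    denominator = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e1, e2 = expit(x), expit(-0.5 * x)   # P(A1=1|X), P(A2=1|X): known by design
        a1 = 1.0 if rng.random() < e1 else 0.0
        a2 = 1.0 if rng.random() < e2 else 0.0
        # Nonlinear main effects and covariate effect that no model here tries to capture.
        y = (beta_true * a1 * a2 + a1 * math.sin(x) + a2 * x * x
             + math.cos(x) + rng.gauss(0.0, 1.0))
        weight = (a1 - e1) * (a2 - e2)
        numerator += weight * y
        denominator += weight * a1 * a2
    return numerator / denominator


if __name__ == "__main__":
    print("interaction estimate:", known_design_estimate())
```

The main effects and the effect of X drop out in expectation because each is a function of at most one exposure and X, and is therefore uncorrelated with the product of the two centred exposures under conditional independence.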

The homoscedasticity assumption that Var{ε(β*)|A, X} does not depend on A may often be implausible and is logically impossible for count data with g(x) = exp(x). When this assumption fails, the efficient estimating equation at β* in model $\mathcal{A}_{cip}$ can be obtained following the methods developed in the next section.

4 Conditionally dependent exposures

In this section, we relax the previous assumptions by allowing for the exposures A1 and A2 to be conditionally dependent given X.

4.1 Estimation

We first consider the special case of binary exposures. When A1 and A2 are dichotomous, as when testing for gene-gene interaction between 2 possibly linked bi-allelic markers each with dominant or recessive mode of inheritance, then an arbitrary function d(Ai, Xi) can be written as Ai2Ai1d11 (Xi) + Ai2(1 − Ai1)d01 (Xi) + (1 − Ai2)Ai1d10 (Xi) + (1 − Ai2)(1 − Ai1)d00 (Xi) for given functions dkl(Xi), k, l = 0, 1. It follows that the set D of functions (A, X) satisfying (5) is the set D = {d(X)Δ(A, X);d(X) ∈ Rp} where

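$$\Delta(A, X) \equiv \frac{A_1 A_2}{E\{A_1 A_2 \mid X\}} + \frac{(1 - A_1)(1 - A_2)}{E\{(1 - A_1)(1 - A_2) \mid X\}} - \frac{A_1 (1 - A_2)}{E\{A_1 (1 - A_2) \mid X\}} - \frac{(1 - A_1) A_2}{E\{(1 - A_1) A_2 \mid X\}} \qquad (14)$$
$$= \{f(A \mid X)\}^{-1} \left[ I\{A_1 = A_2\} - I\{A_1 \neq A_2\} \right] \qquad (15)$$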

Hence the estimating equations (6) with d ∈ D can equivalently be written as

$$0 = \sum_{i=1}^{n} d(X_i)\, \Delta(A_i, X_i)\, \varepsilon_i(\beta) \qquad (16)$$

where d is a member of the set of all p × 1 functions of X.
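A quick numerical check that Δ(A, X) satisfies the restriction (5) can be sketched as follows, for a single stratum of X and a hypothetical joint law of two dependent binary exposures. The probabilities and function names are illustrative assumptions only.

```python
# Hypothetical joint law f(a1, a2 | X = x) in one stratum of X; the exposures are dependent.
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}


def delta(a1, a2, joint=JOINT):
    """Delta(A, X) as in (15): {f(A|X)}^{-1} [I{A1 = A2} - I{A1 != A2}]."""
    sign = 1.0 if a1 == a2 else -1.0
    return sign / joint[(a1, a2)]


def mean_delta_given(index, value, joint=JOINT):
    """E{Delta(A, X) | A_index = value, X = x}; index 0 stands for A1, index 1 for A2."""
    cells = [a for a in joint if a[index] == value]
    mass = sum(joint[a] for a in cells)
    return sum(joint[a] / mass * delta(a[0], a[1], joint) for a in cells)


if __name__ == "__main__":
    for index in (0, 1):
        for value in (0, 1):
            print("A%d = %d:" % (index + 1, value), mean_delta_given(index, value))
```

Each conditional mean is zero up to rounding error, because within a level of one exposure the two cells contribute +1 and −1 divided by the same marginal probability.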

When the exposures Ai1 and Ai2 are not both dichotomous and dependent conditional on Xi, we can use the following characterization of the set D of functions (A, X) satisfying (5).

Lemma (Tchetgen and Robins, 2008)

Let f* (A|X) = f* (A1|X) f* (A2|X) be any fixed density for A|X with A1 and A2 conditionally independent given X that is absolutely continuous with respect to the true density f (A|X). Then the set of functions D satisfying (5) is the set {d(A, X, r); r = r(A,X) ∈ Rp} where

$$d(A, X, r) = \frac{f^*(A \mid X)}{f(A \mid X)} \left[ r(A, X) - E^*\{r(A, X) \mid A_1, X\} - E^*\{r(A, X) \mid A_2, X\} + E^*\{r(A, X) \mid X\} \right] \qquad (17)$$

and where the expectations E*(.) are taken w.r.t. f* (A|X).

When A1 and A2 are dichotomous and we choose f* (A|X) ≡ 1/4 w.p.1 and r (A, X) = 4d(X) [I {A1 = A2} − I {A1 ≠ A2}], we obtain d(A, X, r) = d(X)Δ(A, X), thus establishing equation (16) as a special case of equation (17). For non-dichotomous A1 and A2, given any (user-supplied) density f* (A|X) = f* (A1|X) f* (A2|X) satisfying the conditions of the lemma, we can apply equation (17) to an arbitrary (user-supplied) function, say $d^{(1)}(A, X)$, to obtain a function $d(A_i, X_i, d^{(1)})$ that satisfies equation (5).

An alternative way to map an arbitrary function d(1)(A, X) to an element of D is to apply the alternating conditional expectations (ACE) algorithm (Breiman and Friedman, 1985; Bickel et al., 1993). This is an iterative algorithm which, starting from d(1)(Ai, Xi), computes the repeating conditional expectations

$$d^{(2m)}(A_i,X_i) = d^{(2m-1)}(A_i,X_i) - E\{d^{(2m-1)}(A_i,X_i)\mid A_{i1},X_i\} \tag{18}$$
$$d^{(2m+1)}(A_i,X_i) = d^{(2m)}(A_i,X_i) - E\{d^{(2m)}(A_i,X_i)\mid A_{i2},X_i\} \tag{19}$$

for m = 1, 2, … until convergence at the ACE limit dACE(Ai, Xi, d(1)) = limm→∞ d(2m+1)(Ai, Xi). The function dACE(Ai, Xi, d(1)) then satisfies equation (5). Although both are in D, the lemma-based function d*(Ai, Xi, d(1)) obtained from equation (17) and the ACE limit dACE(Ai, Xi, d(1)) will generally differ. The function d*(Ai, Xi, d(1)) exists in closed form and is easy to compute. In contrast, dACE(Ai, Xi, d(1)) cannot be expressed in closed form when A1 and A2 are both continuous, unlike when A1 and/or A2 is discrete (Bickel et al., 1993); even in the discrete case, dACE(Ai, Xi, d(1)) generally remains more difficult to compute than d*(Ai, Xi, d(1)). Furthermore, we shall see below that (a weighted version of) dACE(Ai, Xi, d(1)) is needed to obtain a locally semiparametric efficient estimator.
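The iteration in (18) and (19) can be sketched for discrete exposures, where the conditional expectations are finite sums. The example below is a minimal illustration at one fixed value of X with a made-up joint law `f` and a made-up starting function `d1`; the function names, the stopping tolerance and the iteration cap are assumptions for illustration only.

```python
# Hypothetical joint law f(a1, a2 | X) at one fixed X (dependent exposures).
f = {(0, 0): 0.10, (0, 1): 0.20, (1, 0): 0.25,
     (1, 1): 0.05, (2, 0): 0.15, (2, 1): 0.25}

def cond_mean(d, axis, value):
    """E{d(A) | A_axis = value, X} under f (axis 0 is A1, axis 1 is A2)."""
    cells = [c for c in f if c[axis] == value]
    return sum(f[c] * d[c] for c in cells) / sum(f[c] for c in cells)

def centre(d, axis):
    """One ACE half-step: subtract the conditional mean given A_axis."""
    return {c: d[c] - cond_mean(d, axis, c[axis]) for c in f}

def ace(d1, tol=1e-12, max_iter=1000):
    """Alternate steps (18) and (19) until the function stops changing."""
    d = dict(d1)
    for _ in range(max_iter):
        new = centre(centre(d, 0), 1)  # (18): centre given A1, then (19): given A2
        change = max(abs(new[c] - d[c]) for c in f)
        d = new
        if change < tol:
            break
    return d

# Arbitrary starting function d^(1)(A, X) at the fixed X.
d1 = {(a1, a2): a1 * a2 + 0.5 * a1 ** 2 for (a1, a2) in f}
d_limit = ace(d1)
residuals = [cond_mean(d_limit, axis, c[axis]) for axis in (0, 1) for c in f]
print(max(abs(x) for x in residuals))
```

With continuous exposures the two `cond_mean` calls would have to be replaced by integrals or fitted regressions, which is where the practical difficulty discussed in the text arises.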

Unlike in the previous section, there do not exist compatible models f (A1|A2, X; α1) and f (A2|A1, X; α2) with variation independent parameters when conditional dependence between both exposures is allowed for. Inference for β* therefore cannot be made robust to misspecification of either one of these conditional densities, and thus no consistent estimators can be obtained under model A(Mya1Mya2Mya1).

Remark

More precisely, it can be shown that we could construct compatible models for f(A1|A2, X) and f(A2|A1, X) with variation independent parameters α1 and α2 if we assume that, for chosen values a10, a20, the generalized odds ratio function ρ(A1, A2, X) = f(A1|A2, X)f(A1 = a10|A2 = a20, X)/{f(A1 = a10|A2, X)f(A1|A2 = a20, X)} is a known function, simply by specifying models f(A1|A2 = a20, X; α1) and f(A2|A1 = a10, X; α2) for f(A1|A2 = a20, X) and f(A2|A1 = a10, X). In practice, however, the assumption that ρ(A1, A2, X) is known would never be reasonable, except in the special case that the generalized odds ratio function is the constant function 1, which is equivalent to again assuming that A1 and A2 are conditionally independent given X. If we do not restrict attention to variation independent models, it is possible to drop the assumption that ρ(A1, A2, X) is known and instead specify a model ρ(A1, A2, X; ς) for ρ(A1, A2, X) depending on a parameter vector ς. Then the model ρ(A1, A2, X; ς) together with the aforementioned models f(A1|A2 = a20, X; α1) and f(A2|A1 = a10, X; α2) induces compatible models for f(A1|A2, X) and f(A2|A1, X) with the parameter ς occurring in both. We could then construct consistent estimators of β* when either the model for f(A1|A2, X) or the model for f(A2|A1, X) is correct, because, using methods described in Chen (2007) and Tchetgen and Robins (2008), the common parameter ς can be consistently estimated if either the model f(A1|A2 = a20, X; α1) or the model f(A2|A1 = a10, X; α2) is correct.

We will therefore conduct inference for β* under model Bid ≡ A ∩ (My ∪ Ma ∪ Mya1), where we redefine Ma to be a parametric model for the conditional density of A, given X, of the form

$$f(A\mid X) = f(A\mid X;\alpha) = f(A_1\mid X,A_2;\alpha_1)\,f(A_2\mid X;\alpha_2).$$

Here, f(A1|X, A2; α1) and f(A2|X; α2) are known densities smooth in α1 and α2, and α = (α1, α2) is an unknown finite-dimensional parameter. Further, we define Mya2 ≡ Mya2 ∩ Ma. Let α̂ = (α̂1, α̂2) satisfy

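$$0 = \sum_{i=1}^{n} H_{i1}(\hat\alpha_1) \equiv \sum_{i=1}^{n} \left.\frac{\partial}{\partial\alpha_1}\ln f(A_{i1}\mid A_{i2},X_i;\alpha_1)\right|_{\alpha_1=\hat\alpha_1} \tag{20}$$
$$0 = \sum_{i=1}^{n} H_{i2}(\hat\alpha_2) \equiv \sum_{i=1}^{n} \left.\frac{\partial}{\partial\alpha_2}\ln f(A_{i2}\mid X_i;\alpha_2)\right|_{\alpha_2=\hat\alpha_2} \tag{21}$$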

Hence α̂2 is the MLE of α2 under model Ma, while α̂1 is the MLE of α1 under both models Ma and Mya1.

Let Δ(Ai, Xi; α̂), d*(Ai, Xi, d(1); α̂) and dACE(Ai, Xi, d(1); α̂) be Δ(Ai, Xi), the lemma-based function d*(Ai, Xi, d(1)) of equation (17), and the ACE limit dACE(Ai, Xi, d(1)) of equations (18) and (19), except with the expectations now evaluated under f(A|X; α̂). Given d(Xi) and d(1)(Ai, Xi), let d(Ai, Xi; α̂) be d(Xi)Δ(Ai, Xi; α̂) when Ai1 and Ai2 are binary, and let d(Ai, Xi; α̂) be either d*(Ai, Xi, d(1); α̂) or dACE(Ai, Xi, d(1); α̂) otherwise, where the dependence of d(Ai, Xi; α̂) on d or d(1) is suppressed. In all cases, the function d(α̂) = d(Ai, Xi; α̂) is an element of D(α̂), where the set D(α̂) is defined like the set D but with f(Ai|Xi; α̂) replacing f(Ai|Xi) in equation (5).

Theorem 3 below shows that when g(.) is the identity link, the estimators β̂ ≡ β̂(d(α̂)) for a given d(α̂) = d(Ai, Xi; α̂) are multiply robust (in the sense of being CAN for β* under model Bid), where β̂(d(α̂)) solves

$$0 = \sum_{i=1}^{n} U_i(\beta,\hat\gamma(\beta),\hat\alpha) = \sum_{i=1}^{n} d(A_i,X_i;\hat\alpha)\,\epsilon_i(\beta,\hat\gamma(\beta)) \tag{22}$$

with β̂ ≡ β̂(d(α̂)) still defined as in Section 3. Theorem 3 further shows that when g(.) is the exponential link, the estimators β̂ ≡ β̂(d(α̂)) obtained by solving (22) with ∊i(β, γ) = Yi exp{−m(Ai, Xi; β, γ)} − 1 are multiply robust in the sense of being CAN in the union model Bexp.

Theorem 3

Suppose that the regularity conditions stated in the supplemental materials (Vansteelandt et al., 2008) hold and that β1, γ2, γ1 and α2 are variation independent. Suppose d(α̂) ∈ D(α̂). Then Parts 1 and 2 of Theorem 2 continue to hold with β̂ ≡ β̂(d(α̂)) replacing β̂cip and model Bid (Bexp) replacing Bcipid (Bcipexp), with Hi1(α̂1) and Hi2(α̂2) now defined as in (20) and (21).

We propose two practical strategies for implementing the ACE algorithm when A1 and A2 are both continuous. The first strategy is a numerical integration approach whereby the integrals

$$E\{d(A_i,X_i;\alpha)\mid A_{ij},X_i;\alpha\} = \int d(A_i,X_i;\alpha)\,f(A_{ij'}\mid A_{ij},X_i;\alpha)\,dA_{ij'}$$

for j, j′ = 1, 2, j ≠ j′ in the ACE algorithm are approximated via numerical integration methods, such as the composite Simpson’s rule (with α replaced by α̃). This requires that we can evaluate the function d(2m)(Ai, Xi; α) (and thus that we run the ACE algorithm) at a sufficient number M of points (ai11, ai21), …, (ai1M, ai2M), spread across the support of (A1, A2). These may be chosen for each given Xi separately by drawing a random sample from the joint distribution of (Ai1, Ai2), given Xi, and should additionally include the observed data points at the given Xi. Note that we opt for the composite Simpson’s rule because this merely requires knowing the function values of d(2m)(Ai, Xi; α) at the selected M points.
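To make the numerical integration step concrete, the sketch below approximates one conditional expectation of the above form with the composite Simpson's rule. It is a simplified illustration under stated assumptions: the conditional law of A2 given A1 is taken to be normal with a hypothetical linear mean, the nodes are equally spaced over a truncated range rather than randomly sampled, and the function names are invented for the example.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a N(mean, sd^2) variate at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def composite_simpson(g, lower, upper, n_intervals=200):
    """Composite Simpson's rule on an even number of equal subintervals."""
    if n_intervals % 2:
        raise ValueError("n_intervals must be even")
    h = (upper - lower) / n_intervals
    total = g(lower) + g(upper)
    for k in range(1, n_intervals):
        total += (4 if k % 2 else 2) * g(lower + k * h)
    return total * h / 3.0

def cond_expectation_given_a1(d, a1, mean_fn, sd, width=8.0):
    """Approximate E{d(A1, A2) | A1 = a1} when A2 | A1 ~ N(mean_fn(a1), sd^2)."""
    mean = mean_fn(a1)
    integrand = lambda a2: d(a1, a2) * normal_pdf(a2, mean, sd)
    return composite_simpson(integrand, mean - width * sd, mean + width * sd)

# Hypothetical inputs: d(a1, a2) = a1 * a2 and E(A2 | A1 = a1) = 0.5 * a1.
d = lambda a1, a2: a1 * a2
approx = cond_expectation_given_a1(d, 1.2, lambda a1: 0.5 * a1, 1.0)
exact = 1.2 * (0.5 * 1.2)  # a1 * E(A2 | A1 = a1)
print(approx, exact)
```

Only function values at the nodes enter the rule, so the same routine can be applied to an ACE iterate that is itself known only at a finite set of points.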

The second strategy is an ad-hoc approach which involves postulating separate high-dimensional models for the conditional expectations in (18) and (19) and fitting these each time using standard regression techniques (thus without postulating a model for f(A|X)), as in Breiman and Friedman (1985). A drawback of this approach is that it does not guarantee congenial models for the conditional expectations in (18) and (19) (i.e. there may be no joint law f(A|X) for which the postulated conditional expectations (18) and (19) hold for m = 1, 2, …). Nevertheless, we recommend this approach for data analysis because the numerical integration approach is computer intensive, generally does not lead to improved results in simulation studies (see Section 5) and, to the best of our knowledge, its convergence properties have not been studied, unlike those of the ad-hoc approach (Breiman and Friedman, 1985). Furthermore, while there may be concerns over using automatic model fitting for the conditional expectations in (18) and (19) in the sense that these may be more likely misspecified, these concerns are mitigated to some extent by the robustness property of our estimators.

Remark

Using (12) to estimate the asymptotic variance of β̃ requires knowing the derivative E{∂Ui(β, γ, α)/∂α}. This is difficult when d(Ai, Xi; α) is obtained via the ACE algorithm because it then has no closed-form expression. However, a variance estimate can still be obtained by noting that, as shown in the supplemental materials (Vansteelandt et al., 2008), under models Bid (Bexp),

$$\left. E\left\{\frac{\partial}{\partial\alpha}U_i(\beta,\tilde\gamma(\beta),\alpha)\right\}\right|_{\alpha=\tilde\alpha} = -E\left\{d(A_i,X_i;\tilde\alpha)\,S_i(\tilde\alpha)\,\epsilon_i(\beta,\tilde\gamma(\beta))\right\} \tag{23}$$

where Si(α̃) is the score for α under model Ma, evaluated at α̃.

Expression (23) is not useful under the ad-hoc implementation of the ACE algorithm because the score Si(α) is then unknown. In that case, one might for simplicity choose to ignore estimation of α* when calculating the standard error of β̂. Indeed, Theorem 2.3 in van der Laan and Robins (2003) assures that, if model Ma holds, ignoring efficient estimation of α* leads to conservative inferences for β* under model Bid (Bexp). Furthermore, because E{∂Ui(β, γ̃, α)/∂α}|α=α̃ = 0 and E{∂Gi(β, γ̃, α)/∂α}|α=α̃ = 0 when model My is correctly specified, estimation of α* does not affect the distribution of our estimator for β* at model My (see expression (12)). This approach is not attractive, however, because simulation studies in Section 5 show that ignoring estimation of α* in constructing our variance estimator may imply a serious loss of power when model My is misspecified. We therefore recommend the nonparametric bootstrap for inference under ad-hoc implementations of the ACE algorithm, as the bootstrap always provides a consistent estimator of the asymptotic variance under our assumptions.
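The nonparametric bootstrap recommended above can be sketched generically: resample the observations with replacement, recompute the estimator on each resample, and take the variance of the replicates. In the illustration below the estimator is simply a sample mean standing in for the full procedure (fitting the working models, running the ACE algorithm and solving (22)), which would be re-run inside `estimator` in practice; the function name, the toy data and the number of resamples are assumptions.

```python
import random
import statistics

def bootstrap_variance(rows, estimator, n_resamples=500, seed=0):
    """Variance of an estimator over nonparametric bootstrap resamples of rows."""
    rng = random.Random(seed)
    n = len(rows)
    replicates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement and re-estimate from scratch.
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    return statistics.variance(replicates)

# Toy data and a placeholder estimator (the sample mean).
data_rng = random.Random(1)
rows = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
boot_var = bootstrap_variance(rows, statistics.fmean)

# For the sample mean the bootstrap variance should be close to s^2 / n.
plug_in_var = statistics.pvariance(rows) / len(rows)
print(boot_var, plug_in_var)
```

Because every nuisance fit is repeated within each resample, the variability due to estimating α* is reflected automatically, without needing the score Si(α).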

4.2 Local semiparametric efficiency

We now consider how to obtain locally semiparametric efficient estimators. The key to doing so is the following characterization of the efficient score in model A. Let σ²(A, X) ≡ Var{∊(β*)|A, X}. In Theorem 4, we show that when Ai1 and Ai2 are binary, the efficient score for β* in model A is Seff = dopt(Ai, Xi)∊i(β*) with dopt(Ai, Xi) = dopt(Xi)Δ(Ai, Xi) and

$$d_{opt}(X_i) = E\{\Delta^2(A_i,X_i)\,\sigma^2(A_i,X_i)\mid X_i\}^{-1}\,E\left\{\Delta(A_i,X_i)\left.\frac{\partial}{\partial\beta}q_3(A_i,X_i;\beta)\right|_{\beta=\beta^*}\,\Big|\,X_i\right\}.$$

When Ai1 and Ai2 are both continuous, the efficient score Seff = dopt(Ai, Xi)∊i(β*) does not exist in closed form. However, regardless of the sample spaces of A1 and A2, we show in Theorem 4 that dopt(Ai, Xi) = limm→∞ d(2m+1)(Ai, Xi) is always the function to which the ν-weighted ACE algorithm defined by

$$d^{(2m)}(A_i,X_i) = d^{(2m-1)}(A_i,X_i) - \nu(A_i,X_i)\,\frac{E\{d^{(2m-1)}(A_i,X_i)\mid A_{i1},X_i\}}{E\{\nu(A_i,X_i)\mid A_{i1},X_i\}} \tag{24}$$
$$d^{(2m+1)}(A_i,X_i) = d^{(2m)}(A_i,X_i) - \nu(A_i,X_i)\,\frac{E\{d^{(2m)}(A_i,X_i)\mid A_{i2},X_i\}}{E\{\nu(A_i,X_i)\mid A_{i2},X_i\}} \tag{25}$$

for m = 1, 2, …, converges for the choices ν(Ai, Xi) = σ⁻²(Ai, Xi) and d(1)(Ai, Xi) = σ⁻²(Ai, Xi) ∂q3(Ai, Xi; β)/∂β|β=β*. The unweighted ACE algorithm defined by equations (18) and (19) is the special case of the ν-weighted ACE algorithm with ν(Ai, Xi) = ν*(Xi) only a function of Xi. For any d(1)(Ai, Xi) ∈ Rp and any always-positive function ν(Ai, Xi), the ν-weighted algorithm, like the unweighted algorithm, converges to a function dν(Ai, Xi; d(1)) that satisfies equation (5). This last statement follows from the following arguments. First, d(Ai, Xi) − ν(Ai, Xi)E{d(Ai, Xi)|Aij, Xi}/E{ν(Ai, Xi)|Aij, Xi}, for j ∈ {1, 2}, is the orthogonal projection of the univariate function d(Ai, Xi) on the closed linear subspace Λj = {d(Ai, Xi); E{d(Ai, Xi)|Aij, Xi} = 0} in the Hilbert space of functions of (Ai, Xi) with inner product ⟨d1, d2⟩ ≡ E[{ν(Ai, Xi)}⁻¹ d1(Ai, Xi)d2(Ai, Xi)]. It then follows from a theorem of Von Neumann (Bickel et al., 1993, p.436) that dν(Ai, Xi; d(1)) is the projection of d(1)(Ai, Xi) on the linear space Λ = Λ1 ∩ Λ2, which is precisely the subspace satisfying equation (5).
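The ν-weighted steps (24) and (25) can be sketched for discrete exposures in the same way as the unweighted algorithm. The joint law `f`, the positive weight `nu` and the starting function `d1` below are made up for illustration at one fixed value of X, and the function names and stopping rule are assumptions.

```python
# Hypothetical joint law f(a1, a2 | X) at one fixed X (dependent exposures).
f = {(0, 0): 0.10, (0, 1): 0.20, (1, 0): 0.25,
     (1, 1): 0.05, (2, 0): 0.15, (2, 1): 0.25}

# Hypothetical always-positive weight nu(A, X), e.g. an inverse variance.
nu = {c: 1.0 / (1.0 + c[0] + 2.0 * c[1]) for c in f}

def cond_mean(g, axis, value):
    """E{g(A) | A_axis = value, X} under f (axis 0 is A1, axis 1 is A2)."""
    cells = [c for c in f if c[axis] == value]
    return sum(f[c] * g[c] for c in cells) / sum(f[c] for c in cells)

def weighted_step(d, axis):
    """One half-step of (24)/(25): d - nu * E{d | A_axis} / E{nu | A_axis}."""
    return {c: d[c] - nu[c] * cond_mean(d, axis, c[axis])
            / cond_mean(nu, axis, c[axis]) for c in f}

def weighted_ace(d1, tol=1e-12, max_iter=1000):
    """Alternate the weighted steps (24) and (25) until the iterates settle."""
    d = dict(d1)
    for _ in range(max_iter):
        new = weighted_step(weighted_step(d, 0), 1)
        change = max(abs(new[c] - d[c]) for c in f)
        d = new
        if change < tol:
            break
    return d

d1 = {c: nu[c] * c[0] * c[1] for c in f}  # weight times a toy derivative term
d_nu = weighted_ace(d1)
residuals = [cond_mean(d_nu, axis, c[axis]) for axis in (0, 1) for c in f]
print(max(abs(x) for x in residuals))
```

Setting `nu` to a constant makes `weighted_step` subtract the plain conditional mean, which recovers the unweighted steps (18) and (19).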

We now explain how to obtain a locally efficient estimator of β*. Consider the model

$$\mathrm{Var}\{\epsilon(\beta^*)\mid A,X\} = \sigma^2(A,X;\eta^*) \tag{26}$$

where σ²(A, X; η) is a known function, smooth in η, and η* is an unknown parameter vector. Let η̂ satisfy

$$0 = \sum_{i=1}^{n} H_{i3}(\hat\eta) = \sum_{i=1}^{n} s(A_i,X_i)\left\{\epsilon_i^2(\hat\beta,\hat\gamma(\hat\beta)) - \sigma^2(A_i,X_i;\hat\eta)\right\}$$

where s(Ai, Xi) is a vector of user-supplied functions of the dimension of η, and β̂ = β̂(d(α̂)) for a given d(α̂) = d(Ai, Xi; α̂) ∈ D(α̂). Note that, with any positive ν(Ai, Xi) and any d(1)(Ai, Xi) as input, the ν-weighted ACE algorithm, based on f(Ai|Xi; α̂) rather than on f(Ai|Xi), outputs a function d(Ai, Xi; α̂) ∈ D(α̂).

Theorem 4

(i) The efficient score for β* in model A is dopt(Ai, Xi)∊i(β*)

  1. with dopt(Ai, Xi) = dopt(Xi)Δ(Ai, Xi) for binary Ai1 and Ai2;

  2. with dopt(Ai, Xi) = limm→∞ d(2m+1)(Ai, Xi) in the ν-weighted ACE algorithm with ν(Ai, Xi) = σ⁻²(Ai, Xi) and d(1)(Ai, Xi) = σ⁻²(Ai, Xi) ∂q3(Ai, Xi; β)/∂β|β=β* in general.

(ii) Let β̂(d(α̂)) and β̂(dopt(α̂, η̂)) solve (22), where d(α̂) ≡ d(Ai, Xi; α̂) ∈ D(α̂) and dopt(α̂, η̂) = dopt(Ai, Xi; α̂, η̂) ∈ D(α̂) is the function to which the ν-weighted ACE algorithm based on f(Ai|Xi; α̂) converges for ν(Ai, Xi) = σ⁻²(Ai, Xi; η̂) and d(1)(Ai, Xi) = σ⁻²(Ai, Xi; η̂) ∂q3(Ai, Xi; β)/∂β|β=β*. Then β̂(d(α̂)) and β̂(dopt(α̂, η̂)) are RAL estimators in models Bid or Bexp. If, in addition, the true distribution of the data lies in the intersection submodel A ∩ My ∩ Ma and model (26) holds, then the difference between the asymptotic variance matrices of β̂(d(α̂)) and β̂(dopt(α̂, η̂)) is non-negative definite, with the asymptotic variance of β̂(dopt(α̂, η̂)) equalling {Var(Seff)}⁻¹ = [Var{dopt(Ai, Xi)∊i(β*)}]⁻¹.

It follows from Part (ii) of Theorem 4 that β̂(dopt(α̂, η̂)) is a locally semiparametric efficient estimator of β* in model A (and, following the general results in Robins and Rotnitzky (2001), then also in models Bid or Bexp) at the intersection submodel in which model (26) and models My and Ma all hold.

5 Simulation study

We conducted a simulation experiment to evaluate the behaviour in finite samples of the multiply robust estimators for statistical interaction parameters. Each experiment was based on 1000 replications of random samples of size 500 generated as follows. Exposures were generated as A1 = 1 + X + δU + ∊1 and A2 = 1 − X + δU + ∊2, where X, U, ∊1 and ∊2 are four independent standard normal variates and where δ was set to 0 or 1 to represent settings with and without conditionally independent exposures, given X, respectively. The outcome was generated as Y = −1 + A1 + A2 − A1A2 + X + λ(A1 − A2)X + ∊, where ∊ is a standard normal variate and λ was set to 0 or −2.
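The data-generating mechanism can be sketched as below. Two signs in the outcome equation are not legible in the source and are assumptions here: the interaction term is taken to be −A1A2, in line with the null value β* = −1 used for the power curves, and the λ term is taken to be λ(A1 − A2)X. The function names, seeds and sample sizes are likewise illustrative.

```python
import random

def simulate(n, delta, lam, seed=0):
    """Draw n rows (y, a1, a2, x) from the assumed simulation design."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, u, e1, e2, e = (rng.gauss(0.0, 1.0) for _ in range(5))
        a1 = 1.0 + x + delta * u + e1
        a2 = 1.0 - x + delta * u + e2
        # Assumed signs: interaction coefficient -1 and lam * (a1 - a2) * x.
        y = -1.0 + a1 + a2 - a1 * a2 + x + lam * (a1 - a2) * x + e
        rows.append((y, a1, a2, x))
    return rows

def residual_cov(rows):
    """Mean product of A1 - E(A1 | X) and A2 - E(A2 | X); equals delta^2 in theory."""
    n = len(rows)
    return sum((a1 - 1.0 - x) * (a2 - 1.0 + x) for _, a1, a2, x in rows) / n

cov_independent = residual_cov(simulate(20000, delta=0.0, lam=0.0, seed=1))
cov_dependent = residual_cov(simulate(20000, delta=1.0, lam=-2.0, seed=1))
print(cov_independent, cov_dependent)
```

The shared term δU is what induces dependence between the exposures given X, so the residual covariance is a direct check on which of the two exposure settings a given δ represents.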

In each simulation experiment, 4 estimators were calculated under model A with q3(A, X; β) = βA1A2. The first is an ordinary least squares (OLS) estimate under working model My, which is defined by q2(A2, X; γ2) = γ2A2, q1(A1, X; γ1) = γ1A1 and h(X; γ0) = γ0,0 + γ0,1X. The second is an efficient G-estimate (G) (Robins, Mark and Newey, 1992), assuming that q2(A2, X; γ2) = γ2A2 and q1(A1, X; γ1) = γ1A1 and that, in addition, model My holds or model M2G holds, which is defined by (correctly specified) second-order linear regression models for E(Aj|X), j = 1, 2 and a (correctly specified) third-order linear regression model for E(A1A2|X). The third (CI) is obtained by solving (7) assuming that either model My holds, or model M2CI holds, which is defined by second-order linear regression models for E(Aj|X), j = 1, 2 and the assumption that A1 ⫫ A2 | X. The fourth (ACE) is obtained by solving (22) under working model My, having first applied the ACE algorithm under the ad-hoc strategy of Section 4, using linear regression models for the conditional expectations in (18) and (19) which involve third-order polynomials in Aj (j = 1 and 2, respectively) along with interactions with X, and third-order polynomials in X, and assuming a constant residual variance.

The results of the simulation study are summarized in Table 1 and Figure 1. Variance estimates were obtained via the ordinary nonparametric bootstrap based on 500 resamples for the ACE- and CI-estimate, using sandwich estimators for the G-estimator and using the Fisher information matrix for the OLS estimator. Reported coverage for the ACE and CI-estimates is based on 95% basic bootstrap intervals.

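Table 1.

Bias, variance, empirical variance and Type I error rate (α) of tests performed at the 5% significance level.

| (δ, λ) | Estimator | Bias | Variance | Empirical Var | α |
|---|---|---|---|---|---|
| (0, 0) | ACE | 9 × 10⁻⁵ | 0.0023 | 0.0023 | 0.045 |
| (0, 0) | CI | 6 × 10⁻⁴ | 0.0025 | 0.0024 | 0.037 |
| (0, 0) | G | 8 × 10⁻⁴ | 0.00067 | 0.00072 | 0.066 |
| (0, 0) | OLS | 4 × 10⁻⁴ | 0.00041 | 0.00042 | 0.058 |
| (0, 0) | NI | 6 × 10⁻⁴ | - | 0.0010 | - |
| (0, −2) | ACE | 9 × 10⁻⁵ | 0.0023 | 0.0023 | 0.045 |
| (0, −2) | CI | −0.024 | 0.14 | 0.11 | 0.066 |
| (0, −2) | G | 1.32 | 0.0056 | 0.0058 | 1.00 |
| (0, −2) | OLS | 2.40 | 0.0050 | 0.019 | 1.00 |
| (0, −2) | NI | 0.063 | - | 0.028 | - |
| (1, 0) | ACE | 3 × 10⁻⁴ | 0.0013 | 0.0013 | 0.050 |
| (1, 0) | CI | 3 × 10⁻³ | 0.00059 | 0.00052 | 0.053 |
| (1, 0) | G | 1 × 10⁻⁴ | 0.00028 | 0.00029 | 0.054 |
| (1, 0) | OLS | 5 × 10⁻⁵ | 0.00023 | 0.00023 | 0.050 |
| (1, 0) | NI | 5 × 10⁻⁴ | - | 0.0011 | - |
| (1, −2) | ACE | 0.00029 | 0.0013 | 0.0013 | 0.050 |
| (1, −2) | CI | 0.12 | 0.016 | 0.016 | 0.24 |
| (1, −2) | G | 0.57 | 0.0062 | 0.0063 | 1.00 |
| (1, −2) | OLS | 1.33 | 0.0056 | 0.029 | 1.00 |
| (1, −2) | NI | −0.30 | - | 0.035 | - |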

Figure 1.

Power of 4 statistical interaction tests of the null hypothesis that β* = −1: OLS (long-short dashed), G (long dashed), CI (dotted), ACE (solid).

The results indicate that the ad-hoc implementation of the ACE algorithm yields unbiased estimators for the statistical interaction parameter under each of the four data-generating models. This is because the chosen conditional mean models for (18) and (19) in the ACE algorithm were sufficiently flexible to yield approximately correctly specified models. None of the other estimators shares this property: the OLS and G-estimates are biased whenever the main effects of A1 and A2 are misspecified (i.e. λ ≠ 0), although the OLS estimates are more severely biased. The CI estimate is biased when, in addition, A1 and A2 are conditionally dependent, given X (i.e. λ ≠ 0 and δ ≠ 0). The price to pay for the increased robustness of our estimators is a loss of efficiency. This loss can be important when the conditional mean model for the outcome is correctly specified, but overall, reasonable efficiency was obtained with the semiparametric approach. Estimates obtained via the ACE algorithm were substantially more precise than those obtained under the conditional independence assumption (CI) when the conditional mean model My was incorrectly specified, even when the exposures were conditionally independent given X. This is in conformity with the fact that, whenever model My is incorrectly specified, one may gain efficiency by estimating the exposure distribution under a model that fails to impose a priori known restrictions such as the conditional independence of the exposures (van der Laan and Robins, 2003). Curiously, the CI estimate is much more precise than the estimate obtained using the ACE algorithm when the conditional independence assumption fails and the conditional mean model My is correctly specified. This is because the index function d(Ai, Xi) of the CI estimate is much more variable than the corresponding function obtained via the ACE algorithm when the exposures are conditionally dependent, given X.
For example, in the extreme case that A1 = A2 w.p.1, d(Ai, Xi) = 0 is the only solution to (5) and hence no multiply robust root-n estimators for β* exist under laws at which A1 = A2 w.p.1, whilst the estimating functions in (7) yield root-n estimators of β* under such laws (however, only when the conditional mean model My is correctly specified). We also evaluated doubly robust estimators obtained by replacing γ* by an ordinary least squares estimate (instead of a G-estimate). This had no impact on the bias and variance of the doubly robust estimators obtained by the ACE algorithm because these are based on correctly specified models for the exposure distribution f(A|X). However, it did impact the bias and variance of the CI estimators under the simulation experiments with conditional dependence: (δ, λ) = (1, 0) (bias −3 × 10⁻⁴, bootstrap variance 0.00034, empirical variance 0.00035, Type I error rate 0.063) and (δ, λ) = (1, −2) (bias 0.23, bootstrap variance 0.017, empirical variance 0.019, Type I error rate 0.48).

Table 1 further shows results for the numerical integration approach of Section 4 with M = 100 and using (correctly specified) third-order linear regression models with constant variance and normal errors for the conditional distributions of A1, given (A2, X), and of A2, given (A1, X). The complexity of these models warrants use of the bootstrap for inference. However, no bootstrap-based variance estimates are reported because the numerical integration approach was extremely time-consuming. Table 1 shows that the obtained estimates (NI) are more efficient than those obtained under the ad-hoc strategy when the conditional mean model for the outcome is correctly specified, but they are biased and less precise otherwise. This is due to numerical approximation error and the fact that, when the conditional mean model for the outcome is incorrectly specified, the estimation procedure relies more heavily on restriction (5), and thus on the numerical integration. Indeed, the bias of the estimates diminished noticeably upon repeating the numerical integration approach for M = 200, at the expense of a serious increase in computation time.

6 Data analysis

To illustrate the methods, we re-analyze data from a placebo-controlled randomized trial conducted in 1989-1990 in the UK to study blood pressure reduction, as described in Goetghebeur and Lapp (1997). The trial started with a run-in period of 4 weeks whereby all patients received placebo tablets and after which they were randomized to 4 weeks of one of two active treatments (A or B) or placebo. Diastolic blood pressure measurements were taken every 2 weeks. For illustration, we analyze the subset of 105 patients randomized to treatment A or placebo, ignoring 2 patients with missing outcome data. Figure 2 shows a profile plot of the data.

Figure 2.

Profile plot of diastolic blood pressure in 2 treatment arms.

Let Y denote diastolic blood pressure, A1 be a binary variable taking the value 1 for patients randomized to the experimental treatment A during the active study period and 0 otherwise, A2 denote time in days since enrollment into the study and X measure centered body weight (in kg). Fitting the following model

$$E(Y\mid A_1,A_2,X) = \gamma_0 + \gamma_1X + \gamma_2A_2 + \beta_1A_1A_2 + \beta_2A_1A_2X$$

using generalized estimating equations with exchangeable working correlation, yields β̂1 = 0.12 (SE 0.029) and β̂2 = 0.0089 (SE 0.0044). This suggests that the average change in diastolic blood pressure per day is 0.12 (95% CI 0.067 to 0.18) higher in the experimental treatment arm than in the placebo arm among patients of average body weight. This difference decreases by 0.0089 (95% CI 0.00027 to 0.017) per kg increase in body weight.

To examine whether these results continue to hold, even under possible misspecification of the time evolution and possible interactions of time with body weight, we use the methods developed in this article. These methods, including the efficient score expressions, continue to hold for correlated data provided that scalar outcomes Yi are replaced by vectors that contain all outcome measurements for the ith cluster, and likewise for the remaining data Ai, Xi, etc. In the analyses below, we let σ²(Xi; β, γ) in (26) be the working covariance matrix obtained via generalized estimating equations and use the bootstrap for inference. Under the valid assumption that the observation times are independent of assigned treatment, given body weight, we now estimate that the average change in diastolic blood pressure per day is 0.14 (95% CI 0.075 to 0.18) higher in the experimental treatment arm than in the placebo arm among patients with average body weight. This difference decreases by 0.014 (95% CI −0.0044 to 0.023) per kg increase in body weight. These estimates are distribution-free because E(A1|X) = E(A1) by the fact that randomization happened independently of body weight, and likewise, because E(A2|X) = E(A2) by the fact that the study design was completely balanced in time. In particular, the obtained estimates will be valid, even if the main effects of time and body weight (and possible interactions between both) have been incorrectly specified. Using the ACE algorithm we obtain similar, but slightly less efficient estimates of 0.13 (95% CI 0.075 to 0.18) for the main effect and 0.017 (95% CI −0.0078 to 0.028) for the interaction. All results confirm that the reduction in blood pressure over time differs significantly between the two treatment arms. Given that standard estimates for statistical interactions can be very sensitive to the model for the main effects, these new results are more trustworthy, at the expense of a relatively limited loss of precision.

Alternatively, we could have used randomization inference (Rosenbaum, 2002) for estimating the 2 considered interactions. This would also yield distribution-free inference by the fact that the ‘main’ effect of A1 can be assumed to be zero and thus its misspecification is not at issue. Even so, the proposed multiply robust estimators are more attractive than estimators obtained from the computationally more involved randomization inference, because they are available in closed form. Furthermore, it is unclear how randomization inference could protect against misspecification of both main exposure effects. For further discussions on randomization inference versus semiparametric inference, see Robins (2002).

7 Conclusion

In this article, we have developed multiply robust estimators for statistical interaction parameters indexing additive or multiplicative conditional mean models. The estimators in the additive model are especially attractive in settings where the distribution of exposure given the extraneous covariates X is known, as is generally the case in randomized follow-up studies and family-based genetic association studies, because they can be used to construct asymptotically distribution-free tests of the no-interaction hypothesis, even when the vector X is high dimensional with continuous components. This makes our approach distinct from existing approaches, which ignore prior information on the exposure distribution. Our proposed approach can be used quite generally, even when, as in most observational studies, no such prior information is available, because an inference concerning an interaction effect under our approach has multiple chances, rather than only one chance, to be correct or nearly correct. In future work, we will apply the proposed estimators to develop scale-invariant interaction tests based on the sufficient-component cause framework. In addition, we will extend the proposed methods to allow for ascertainment conditions, such as frequently encountered in genetic association studies, whereby data are sampled conditional on the outcome.

Acknowledgements

We are grateful to Eric Tchetgen, the Editor, the Associate Editor and 2 referees for helpful comments. The first author acknowledges support from IAP research network grant nr. P06/03 from the Belgian government (Belgian Science Policy).

References

  1. Chamberlain G. Asymptotic efficiency in estimation with conditional moment restrictions. Journal of Econometrics. 1987;34:305–334.
  2. Chen HY. A semiparametric odds ratio model for measuring association. Biometrics. 2007;63:413–421. doi: 10.1111/j.1541-0420.2006.00701.x.
  3. Chen ZH. Fitting multivariate regression-functions by interaction spline models. Journal of the Royal Statistical Society Series B. 1993;55:473–491.
  4. Cordell HJ. Epistasis: what it means, what it doesn’t mean, and statistical methods to detect it in humans. Human Molecular Genetics. 2002;11:2463–2468. doi: 10.1093/hmg/11.20.2463.
  5. Epstein MP, Allen AS, Satten GA. A simple and improved correction for population stratification in case-control studies. American Journal of Human Genetics. 2007;80:921–930. doi: 10.1086/516842.
  6. Fan J, Li R. New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. Journal of the American Statistical Association. 2004;99:710–723.
  7. Flanders WD. On the relationship of sufficient component cause models with potential outcome (counterfactual) models. European Journal of Epidemiology. 2006;21:847–853. doi: 10.1007/s10654-006-9048-3.
  8. Friedman J. Multivariate adaptive regression splines (with discussion). Annals of Statistics. 1991;19:1–141.
  9. Goetghebeur E, Lapp K. The effect of treatment compliance in a placebo-controlled trial: regression with unpaired data. Applied Statistics. 1997;46:351–364.
  10. Goetgeluk S, Vansteelandt S. Conditional generalized estimating equations for the analysis of clustered and longitudinal data. Biometrics. 2007. doi: 10.1111/j.1541-0420.2007.00944.x.
  11. Greenland S. Basic problems in interaction assessment. Environmental Health Perspectives. 1993;101:59–66. doi: 10.1289/ehp.93101s459.
  12. Greenland S, Brumback B. An overview of relations among causal modelling methods. International Journal of Epidemiology. 2002;31:1030–1037. doi: 10.1093/ije/31.5.1030.
  13. Greenland S, Poole C. Invariants and noninvariants in the concept of interdependent effects. Scandinavian Journal of Work Environment and Health. 1988;14:125–129. doi: 10.5271/sjweh.1945.
  14. Koopman JS. Interaction between discrete causes. American Journal of Epidemiology. 1981;113:716–724. doi: 10.1093/oxfordjournals.aje.a113153.
  15. Lin DY, Ying Z. Semiparametric and nonparametric analysis of longitudinal data (with discussion). Journal of the American Statistical Association. 2001;96:103–126.
  16. Louis TA. General methods for analyzing repeated measures. Statistics in Medicine. 1988;7:29–45. doi: 10.1002/sim.4780070108.
  17. Mantel N, Brown C, Byar DP. Tests for homogeneity of effect in an epidemiologic investigation. American Journal of Epidemiology. 1977;106:125–129. doi: 10.1093/oxfordjournals.aje.a112441.
  18. Miettinen OS. Causal and preventive interdependence: Elementary principles. Scandinavian Journal of Work Environment and Health. 1982;8:159–168. doi: 10.5271/sjweh.2479.
  19. Miettinen OS. Modern Epidemiology. John Wiley; New York: 1985.
  20. Neuhaus JM, Kalbfleisch JD. Between- and within-cluster covariate effects in the analysis of clustered data. Biometrics. 1998;54:638–645.
  21. Robins JM, Mark SD, Newey WK. Estimating exposure effects by modeling the expectation of exposure conditional on confounders. Biometrics. 1992;48:479–495.
  22. Robins JM, Rotnitzky A, van der Laan M. Comment on ‘On Profile Likelihood’ by S. A. Murphy and A. W. van der Vaart. Journal of the American Statistical Association. 2000;95:431–435.
  23. Robins JM, Rotnitzky A. Inference for semiparametric models: Some questions and an answer - Comments. Statistica Sinica. 2001;11:920–936.
  24. Robins JM. Comment on ‘Covariance adjustment in randomized experiments and observational studies’, by P. R. Rosenbaum. Statistical Science. 2002;17:286–327.
  25. Robins J, Li L, Tchetgen E, van der Vaart A. Higher order influence functions and minimax estimation of nonlinear functionals. In: IMS Collections Probability and Statistics: Essays in Honor of David A. Freedman. Vol. 2. 2008. pp. 335–421.
  26. Rosenbaum PR. Covariance adjustment in randomized experiments and observational studies. Statistical Science. 2002;17:286–304.
  27. Rothman KJ. Causes. American Journal of Epidemiology. 1976;104:587–592. doi: 10.1093/oxfordjournals.aje.a112335.
  28. Rothman KJ, Greenland S. Modern Epidemiology. Lippincott-Raven; Philadelphia, PA: 1998.
  29. Tchetgen ET, Robins JM. On doubly robust estimation in a semiparametric odds ratio model. Technical report. Dept. of Epidemiology, Harvard School of Public Health; 2008.
  30. Umbach DH, Weinberg CR. The use of case-parent triads to study joint effects of genotype and exposure. American Journal of Human Genetics. 2000;66:251–261. doi: 10.1086/302707.
  31. van der Laan MJ, Robins JM. Unified Methods for Censored Longitudinal Data and Causality. Springer-Verlag; New York: 2003.
  32. van der Vaart AW. Asymptotic Statistics. Cambridge University Press; Cambridge: 1998.
  33. VanderWeele TJ. Sufficient cause interactions and statistical interactions. Epidemiology. 2008, in press. doi: 10.1097/EDE.0b013e31818f69e7.
  34. VanderWeele TJ, Robins JM. The identification of synergism in the sufficient-component cause framework. Epidemiology. 2007;18:329–339. doi: 10.1097/01.ede.0000260218.66432.88.
  35. VanderWeele TJ, Robins JM. Empirical and counterfactual conditions for sufficient cause interactions. Biometrika. 2008;95:49–61.
  36. Vansteelandt S, Rotnitzky A, Robins JM. Estimation of regression models for the mean of repeated outcomes under non-ignorable non-monotone non-response. Biometrika. 2007;94:841–860. doi: 10.1093/biomet/asm070.
  37. Vansteelandt S, VanderWeele TJ, Robins JM. Supplemental materials for “Multiply robust inference for statistical interactions”. 2008. doi: 10.1198/016214508000001084. http://www.amstat.org/publications/jasa/supplemental_materials
  38. Verbeke G, Spiessens B, Lesaffre E. Conditional linear mixed models. The American Statistician. 2001;55:25–34.
  39. Zeger SL, Diggle PJ. Semi-parametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters. Biometrics. 1994;50:689–699.
