Abstract
A sensitivity analysis technique is developed to assess the sensitivity of interaction analyses to unmeasured confounding. Bias formulas for sensitivity analysis for interaction under unmeasured confounding are given on both the additive and multiplicative scales. Simplified formulas are provided for the case in which one of the two factors does not interact with the unmeasured confounder in its effects on the outcome. An interesting consequence of the results is that if the two exposures of interest are independent (e.g. gene-environment independence), then, even under unmeasured confounding, a non-zero estimate of interaction implies that either there is a true interaction between the two factors or there is an interaction between one of the factors and the unmeasured confounder; an interaction must be present in either scenario. The results are applied to two examples drawn from the literature.
Keywords: Bias Analysis, Gene-Environment, Independence, Interaction, Sensitivity Analysis, Unmeasured Confounding
Introduction
Unmeasured confounding is a challenge in epidemiologic research and can bias effect measures. Various sensitivity analysis techniques have been developed and employed in the literature to assess the extent to which an unmeasured confounder would have to affect both the exposure and the outcome in order to change the qualitative conclusions drawn from an analysis [1-7]. Most of this literature has focused on sensitivity analysis for the overall effect. To the best of our knowledge there is at present no sensitivity analysis technique available to assess the impact of unmeasured confounding for interactions.
Biological and chemical exposures may interact with one another in producing their effects. The effects of such exposures may also be modified by various genetic factors, and as genetics research progresses, gene-gene and gene-environment interactions are gaining increasing prominence in the literature. Although genetic factors are often assumed to be effectively randomized, environmental factors and biological and chemical exposures are subject to the same confounding as they would be in any observational study. In some studies the effects of genetic factors may also be confounded by population stratification if adequate control for this has not been made. Unmeasured confounding is thus clearly an issue in gene-environment interaction analyses.
In this paper we develop a sensitivity analysis approach using bias formulas to assess the sensitivity of interaction estimates to the presence of an unmeasured confounder. We consider a general setting in which one or more unmeasured confounders may affect both factors of interest. We give results for both the additive and multiplicative scales. In addition to the general case, we also consider several more specific cases in which: (a) the unmeasured confounder affects only one of the two exposures, (b) the unmeasured confounder does not interact with one of the exposures in its effects on the outcome, or (c) the two exposures are independent of one another (e.g. gene-environment independence).
Notation and Definitions
We will let G and E denote our two factors or exposures of interest. These might represent genetic and environmental factors respectively, but nothing in our development restricts the application to gene-environment interaction. The two factors might both be environmental or both genetic. We will let Y denote the outcome of interest. The two exposures and the outcome may be binary or continuous. We let Yge denote the counterfactual or potential outcome [8,9] for Y for each individual if, possibly contrary to fact, the first exposure had been set to g and the second to e. Thus, if the two exposures were both binary, then for each individual there would be four counterfactual or potential outcomes, Y11, Y10, Y01 and Y00. We do not know the counterfactual outcomes for each individual, but we can hope to estimate them on average for the population. For example, if the two factors were both randomized, we could consistently estimate E[Y11] by E[Y|G = 1, E = 1], E[Y10] by E[Y|G = 1, E = 0], etc.
In an observational study, the exposures are not randomized and estimates are potentially subject to confounding. An investigator therefore typically tries to collect data on a set of covariates C that suffices to control for this confounding; essentially, within strata of C the groups with different exposure status should be comparable. More formally, we use A ∐ B|C to denote that A is independent of B conditional on C. We say that the effects of G and E on Y are unconfounded given C if for all g and e, Yge∐{G, E}|C. If the effects of G and E on Y are unconfounded given C, then we can consistently estimate E[Yge|c] by E[Y|g, e, c]. Often the set of measured covariates C will not suffice to control for confounding. Instead we might hypothesize a set of unmeasured confounders U such that the effects of G and E on Y are unconfounded given {C, U}, i.e. Yge∐{G, E}|{C, U}. Unfortunately, if we do not have data on U, we cannot stratify on or otherwise adjust for U.
If we only have data on C and we are interested in interaction on the additive scale then we would typically use the following measure for additive interaction
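$$E[Y \mid g_1, e_1, c] - E[Y \mid g_1, e_0, c] - E[Y \mid g_0, e_1, c] + E[Y \mid g_0, e_0, c] \tag{1}$$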
An additive interaction measure of 0 corresponds to exact additivity (i.e. no additive interaction). If the effects of G and E on Y are unconfounded given C then this will consistently estimate the true causal interaction on the additive scale:
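$$E[Y_{g_1 e_1} \mid c] - E[Y_{g_1 e_0} \mid c] - E[Y_{g_0 e_1} \mid c] + E[Y_{g_0 e_0} \mid c] \tag{2}$$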
If, however, there are one or more unmeasured confounding variables U such that the effects of G and E on Y are unconfounded given {C, U} but not given C alone, then the estimate in (1) will not be consistent for the causal interaction in (2). If the effects of G and E on Y are unconfounded given {C, U}, i.e. Yge∐{G, E}|{C, U}, then the true causal interaction in (2) is equal to:
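$$\sum_u \{E[Y \mid g_1, e_1, c, u] - E[Y \mid g_1, e_0, c, u] - E[Y \mid g_0, e_1, c, u] + E[Y \mid g_0, e_0, c, u]\}\, P(u \mid c) \tag{3}$$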
However, without data on U we cannot estimate the expression in (3).
In the remainder of this paper we develop sensitivity analysis results to address this problem. We refer to the difference between (1) and (2) as the bias for the interaction on the additive scale, and we derive expressions for the bias in terms of sensitivity analysis parameters that relate the unmeasured confounder(s) U to the exposures and to the outcome. One can consider a variety of values for the sensitivity analysis parameters and assess how an unmeasured confounder with the indicated properties would affect conclusions drawn about the causal interaction in (2) from the estimated interaction in (1). All inferences will be for interaction parameters for the overall population within strata of covariates C, though we also comment below on interaction parameters marginalized over C.
In the next section we give sensitivity analysis results for interaction on the additive scale. In the following section we give analogous results for interaction on the multiplicative scale. The additive scale is generally considered the scale most relevant for assessing the public health implications of interaction [10-13]; the additive scale is also most closely related to the notion of synergism within the sufficient cause framework, and these relations are described in detail elsewhere [13-16]. The sensitivity analysis technique for additive interaction will thus also be useful if investigators want to reason about synergism in the sufficient cause framework. Although additive interaction is most relevant both for public health purposes and for assessing mechanistic interaction, the multiplicative scale is what is most often used in practice (generally out of convenience), and we thus also consider sensitivity analysis techniques for interaction parameters on the multiplicative scale. Finally, sometimes case-control data are used and a multiplicative model is fit, but the risk ratios or odds ratios from the multiplicative model are used to estimate measures of additive interaction using Rothman's “relative excess risk due to interaction” [17-19]; we thus give sensitivity analysis techniques for this measure as well.
Before proceeding, one final point deserves attention. In this paper we consider measures of causal interaction, that is, measures that examine the extent to which an outcome would change under interventions on both of the factors of interest. This is different from mere “effect heterogeneity”, in which the effect of one intervention varies across strata of another variable that is not intervened upon [20,21]. The sensitivity analysis techniques in this paper apply to causal interaction. If effect heterogeneity is in view, then sensitivity analysis techniques for the causal effect of a single exposure [7] could be applied to each stratum of the secondary factor separately; no additional theory or results are needed for sensitivity analysis in this setting.
Sensitivity Analysis for Interactions on the Additive Scale
The results that follow relate the causal interaction contrast in (2) to the interaction estimated from the data in (1). We define the bias on the additive scale as
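$$B_{add} = \{E[Y \mid g_1,e_1,c] - E[Y \mid g_1,e_0,c] - E[Y \mid g_0,e_1,c] + E[Y \mid g_0,e_0,c]\} - \{E[Y_{g_1e_1} \mid c] - E[Y_{g_1e_0} \mid c] - E[Y_{g_0e_1} \mid c] + E[Y_{g_0e_0} \mid c]\} \tag{4}$$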
The following Theorem 1 gives a general formula for the bias for the interaction on the additive scale, Badd, in terms of various sensitivity analysis parameters. Proofs of all results are given in the Appendix.
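Theorem 1. Suppose that for all g and e, Yge∐{G, E}|{C, U}, and for any particular reference level u′ of U define γij(u) = E[Y|gi, ej, c, u] − E[Y|gi, ej, c, u′]. We then have that

$$
\begin{aligned}
B_{add} ={}& \sum_u \gamma_{11}(u)\{P(u \mid g_1,e_1,c) - P(u \mid c)\} - \sum_u \gamma_{10}(u)\{P(u \mid g_1,e_0,c) - P(u \mid c)\} \\
&- \sum_u \gamma_{01}(u)\{P(u \mid g_0,e_1,c) - P(u \mid c)\} + \sum_u \gamma_{00}(u)\{P(u \mid g_0,e_0,c) - P(u \mid c)\}.
\end{aligned}
$$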
The use of Theorem 1 requires specifying what might be interpreted as the effect of U in each stratum of G and E, γij(u), and also the distribution of U in each stratum of G and E, P(u|gi, ej, c), along with the prevalence of U overall, P(u|c). Each of these could be taken as sensitivity analysis parameters. Theorem 1 applies to measures of additive interaction conditional on C = c. If the sensitivity analysis parameters are assumed to be constant over C, then the result also applies immediately to measures of additive interaction marginalized over C. Alternatively, an investigator could specify different sensitivity analysis parameters for each level of C and marginalize the stratum-specific corrected estimates over C. However, as can be seen, the use of Theorem 1 in its most general form requires the specification of a large number of sensitivity analysis parameters. Similar comments apply to the results below.
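The bookkeeping in Theorem 1 can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes a discrete U and that the user supplies E[Y|gi, ej, c, u], P(u|gi, ej, c) and P(u|c) for a single stratum C = c as dictionaries; the name `additive_bias` and the dictionary layout are hypothetical choices.

```python
from typing import Dict, Hashable, Tuple

Cell = Tuple[int, int]  # (i, j) indexes the exposure stratum (g_i, e_j)

# Signs of the four cells in the interaction contrasts (1)-(3).
SIGNS: Dict[Cell, int] = {(1, 1): 1, (1, 0): -1, (0, 1): -1, (0, 0): 1}


def additive_bias(
    mean_y: Dict[Cell, Dict[Hashable, float]],        # E[Y | g_i, e_j, c, u]
    p_u_given_ge: Dict[Cell, Dict[Hashable, float]],  # P(u | g_i, e_j, c)
    p_u: Dict[Hashable, float],                       # P(u | c)
    u_ref: Hashable,                                  # reference level u'
) -> float:
    """B_add from Theorem 1 within one stratum C = c (hypothetical helper)."""
    bias = 0.0
    for cell, sign in SIGNS.items():
        ref = mean_y[cell][u_ref]
        for u, pu in p_u.items():
            gamma = mean_y[cell][u] - ref  # gamma_ij(u): effect of u versus u'
            bias += sign * gamma * (p_u_given_ge[cell][u] - pu)
    return bias
```

Because the weights P(u|gi, ej, c) − P(u|c) sum to zero over u, the returned value does not depend on the choice of u′, and it coincides with (1) minus (3) computed directly from the same inputs.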
Under the simplifying assumption that U does not interact with one of the two factors on the additive scale, the expression for the bias on the additive scale, Badd, simplifies considerably as stated in the next corollary.
Corollary 1A. Suppose that the effects of G and E on Y are unconfounded conditional on {C, U}. Suppose further that U is binary and that, for fixed c, γ1 = E[Y|g, e1, c, U = 1] − E[Y|g, e1, c, U = 0] and γ0 = E[Y|g, e0, c, U = 1] − E[Y|g, e0, c, U = 0] are constant across strata of g, so that G does not interact with U on the additive scale, and let δ1 = P(U = 1|g1, e1, c) − P(U = 1|g0, e1, c) and δ0 = P(U = 1|g1, e0, c) − P(U = 1|g0, e0, c). Then

$$B_{add} = \gamma_1 \delta_1 - \gamma_0 \delta_0.$$
To use Corollary 1A, one needs to specify far fewer parameters than in Theorem 1. One simply needs to specify the effect of U, γj, for E = e1 and E = e0, along with the prevalence difference of U comparing G = g1 and G = g0, δj = P(U = 1|g1, ej, c) − P(U = 1|g0, ej, c), for E = e1 and E = e0. Corollary 1A is thus far more straightforward to use than Theorem 1, but it requires the stronger assumptions that U is binary and that G does not interact with U on the additive scale.
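Note that, by symmetry, if E does not interact with U on the additive scale, then Badd = γ′1δ′1 − γ′0δ′0, where γ′i = E[Y|gi, e, c, U = 1] − E[Y|gi, e, c, U = 0] (which then does not depend on e) and δ′i = P(U = 1|gi, e1, c) − P(U = 1|gi, e0, c).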
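Corollary 1A reduces the sensitivity analysis to four numbers. A minimal sketch, assuming binary U and user-chosen values of γ1, γ0 and P(U = 1|gi, ej, c); the helper name is hypothetical and the inputs in the usage line are made up:

```python
from typing import Dict, Tuple


def additive_bias_binary_u(
    gamma1: float,                       # effect of U when E = e1 (same for g1 and g0)
    gamma0: float,                       # effect of U when E = e0 (same for g1 and g0)
    p_u1: Dict[Tuple[int, int], float],  # P(U = 1 | g_i, e_j, c), keyed by (i, j)
) -> float:
    """B_add = gamma1 * delta1 - gamma0 * delta0 (Corollary 1A); hypothetical helper."""
    delta1 = p_u1[(1, 1)] - p_u1[(0, 1)]  # prevalence difference of U within E = e1
    delta0 = p_u1[(1, 0)] - p_u1[(0, 0)]  # prevalence difference of U within E = e0
    return gamma1 * delta1 - gamma0 * delta0


# Made-up sensitivity parameters, for illustration only.
bias = additive_bias_binary_u(0.10, -0.05, {(1, 1): 0.7, (1, 0): 0.4, (0, 1): 0.5, (0, 0): 0.3})
print(bias)  # approximately 0.025
```

The bias-corrected additive interaction is then the estimate in (1) minus the returned value.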
Suppose now that U is a confounder only for E and that we have G×E independence in the sense that {E, U}∐G|C. We then have the following result.
Corollary 1B. Suppose that the effects of G and E on Y are unconfounded conditional on {C, U} and that we have G×E independence in the sense that {E, U}∐G|C. Then if U does not interact with G on the additive scale, in the sense that γij(u) = E[Y|gi, ej, c, u] − E[Y|gi, ej, c, u′] is constant across g, we have Badd = 0.
By symmetry, if U is a confounder only for G and we have G×E independence in the sense that {G, U}∐E|C, then if U does not interact with E on the additive scale, Badd = 0. Note that Corollary 1B does not assume that U is binary.
Remark 1. Suppose that U is an environmental factor that is a confounder only for E, not G. An interesting consequence of Corollary 1B is that, under G×E independence, if there is no interaction between U and G on the additive scale, then Badd = 0, so that the estimate in (1) is consistent for the causal interaction in (2). Thus if we found our estimated measure of interaction in (1) to be non-zero, then either there is an actual G×E interaction (because Badd = 0 and the estimated interaction is equal to the causal interaction) or there is a G×U interaction, another form of gene-environment interaction. Essentially, under gene-environment independence, even with unmeasured confounding, a non-zero estimate implies some form of gene-environment interaction, either with E or with U.
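Corollary 1B can be checked numerically. The sketch below is illustrative only: it builds a hypothetical joint distribution in which G is independent of (E, U), U has three levels and confounds E, and compares the observed contrast (1) with the causal contrast (3). All numbers and function names are made up.

```python
from typing import Callable, Dict, Tuple


def interaction_contrasts(
    p_eu: Dict[Tuple[int, int], float],         # P(E = e, U = u | c)
    mean_y: Callable[[int, int, int], float],   # E[Y | g, e, c, u]
) -> Tuple[float, float]:
    """Return (observed contrast (1), causal contrast (3)) within one stratum c,
    assuming {E, U} is independent of G given C, so P(u | g, e, c) = P(u | e, c)."""
    us = sorted({u for (_, u) in p_eu})
    p_u = {u: sum(p for (_, v), p in p_eu.items() if v == u) for u in us}
    p_e = {e: sum(p for (f, _), p in p_eu.items() if f == e) for e in (0, 1)}

    def observed(g: int, e: int) -> float:
        return sum(mean_y(g, e, u) * p_eu[(e, u)] / p_e[e] for u in us)

    def causal(g: int, e: int) -> float:
        return sum(mean_y(g, e, u) * p_u[u] for u in us)

    def contrast(f: Callable[[int, int], float]) -> float:
        return f(1, 1) - f(1, 0) - f(0, 1) + f(0, 0)

    return contrast(observed), contrast(causal)


# Made-up three-level confounder U of E only (G is independent of E and U).
P_EU = {(0, 0): 0.30, (0, 1): 0.15, (0, 2): 0.05,
        (1, 0): 0.10, (1, 1): 0.15, (1, 2): 0.25}


def no_gu(g: int, e: int, u: int) -> float:
    # U interacts with E but not with G on the additive scale.
    return 0.10 + 0.05 * g + 0.08 * e + 0.04 * g * e + 0.06 * u + 0.03 * e * u


def with_gu(g: int, e: int, u: int) -> float:
    # Same model plus a G x U interaction on the additive scale.
    return no_gu(g, e, u) + 0.05 * g * u


print(interaction_contrasts(P_EU, no_gu))    # both about 0.04: no bias
print(interaction_contrasts(P_EU, with_gu))  # about (0.08, 0.04)
```

In the second model the bias equals 0.05 × (E[U|e1, c] − E[U|e0, c]) = 0.05 × (1.3 − 0.5) = 0.04, so the observed contrast is non-zero because of a G×U interaction, which is exactly the dichotomy described in Remark 1.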
A result similar to Corollary 1B holds under G×E independence if there is an unmeasured genetic confounder U1 for G and another unmeasured environmental confounder U2 for E that are binary and independent of one another. In this case, if G does not interact with U2 on the additive scale, E does not interact with U1 on the additive scale, and U1 does not interact with U2 on the additive scale, then Badd = 0. Thus if the estimated interaction measure in (1) were non-zero, one could conclude that there is either a true causal G×E interaction, a G×U1 interaction, an E×U2 interaction, or a U1×U2 interaction; i.e. some form of gene-environment interaction would be present. A formal statement of the result is given in the Appendix as Corollary 1C.
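The two-confounder case can be illustrated the same way. This hypothetical sketch assumes, for simplicity, that the pair (G, U1) is independent of (E, U2) given C, which is one way of meeting the independence conditions of Corollary 1C; the model `allowed` includes G×U1 and E×U2 interactions but no G×U2, E×U1 or U1×U2 interaction. All names and numbers are made up.

```python
import itertools
from typing import Callable, Dict, Tuple

Joint = Dict[Tuple[int, int], float]


def contrasts_two_confounders(
    p_g_u1: Joint,   # P(G = g, U1 = u1 | c)
    p_e_u2: Joint,   # P(E = e, U2 = u2 | c); (G, U1) assumed independent of (E, U2)
    mean_y: Callable[[int, int, int, int], float],  # E[Y | g, e, c, u1, u2]
) -> Tuple[float, float]:
    """Return (observed contrast (1), causal contrast (3)) within one stratum c."""
    p_g = {g: p_g_u1[(g, 0)] + p_g_u1[(g, 1)] for g in (0, 1)}
    p_e = {e: p_e_u2[(e, 0)] + p_e_u2[(e, 1)] for e in (0, 1)}
    p_u1 = {u: p_g_u1[(0, u)] + p_g_u1[(1, u)] for u in (0, 1)}
    p_u2 = {u: p_e_u2[(0, u)] + p_e_u2[(1, u)] for u in (0, 1)}
    levels = list(itertools.product((0, 1), repeat=2))

    def observed(g: int, e: int) -> float:
        # P(u1, u2 | g, e, c) = P(u1 | g, c) P(u2 | e, c) under the assumed independence.
        return sum(mean_y(g, e, u1, u2) * p_g_u1[(g, u1)] / p_g[g] * p_e_u2[(e, u2)] / p_e[e]
                   for u1, u2 in levels)

    def causal(g: int, e: int) -> float:
        return sum(mean_y(g, e, u1, u2) * p_u1[u1] * p_u2[u2] for u1, u2 in levels)

    def contrast(f: Callable[[int, int], float]) -> float:
        return f(1, 1) - f(1, 0) - f(0, 1) + f(0, 0)

    return contrast(observed), contrast(causal)


P_G_U1 = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.20, (1, 1): 0.30}
P_E_U2 = {(0, 0): 0.35, (0, 1): 0.15, (1, 0): 0.15, (1, 1): 0.35}


def allowed(g: int, e: int, u1: int, u2: int) -> float:
    # G x U1 and E x U2 interactions only.
    return (0.05 + 0.04 * g + 0.06 * e + 0.03 * g * e
            + 0.10 * u1 + 0.07 * u2 + 0.05 * g * u1 + 0.04 * e * u2)


def with_u1u2(g: int, e: int, u1: int, u2: int) -> float:
    return allowed(g, e, u1, u2) + 0.06 * u1 * u2  # adds a U1 x U2 interaction


print(contrasts_two_confounders(P_G_U1, P_E_U2, allowed))
print(contrasts_two_confounders(P_G_U1, P_E_U2, with_u1u2))
```

With `allowed`, both contrasts equal the true 0.03. Adding the U1×U2 term leaves the causal contrast at 0.03 but moves the observed one to 0.03 + 0.06 × 0.4 × 0.4 = 0.0396.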
Sensitivity Analysis for Interactions on the Multiplicative Scale
On the multiplicative scale if we had data on covariates C we would typically use the following measure for multiplicative interaction on the risk ratio scale
$$\frac{E[Y \mid g_1,e_1,c]\, E[Y \mid g_0,e_0,c]}{E[Y \mid g_1,e_0,c]\, E[Y \mid g_0,e_1,c]} \tag{5}$$
A multiplicative interaction measure of 1 corresponds to exact multiplicativity (i.e. no multiplicative interaction). If the effects of G and E on Y are unconfounded given C, then this will consistently estimate the true causal interaction on the multiplicative scale:
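$$\frac{E[Y_{g_1e_1} \mid c]\, E[Y_{g_0e_0} \mid c]}{E[Y_{g_1e_0} \mid c]\, E[Y_{g_0e_1} \mid c]} \tag{6}$$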
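If, however, there are one or more unmeasured confounding variables U such that the effects of G and E on Y are unconfounded given {C, U}, i.e. Yge∐{G, E}|{C, U}, but not given C alone, then the estimate in (5) will not be consistent for the causal interaction in (6). We can then define the bias on the multiplicative scale as the ratio of (5) to (6):

$$B_{mult} = \frac{E[Y \mid g_1,e_1,c]\, E[Y \mid g_0,e_0,c] \,/\, \{E[Y \mid g_1,e_0,c]\, E[Y \mid g_0,e_1,c]\}}{E[Y_{g_1e_1} \mid c]\, E[Y_{g_0e_0} \mid c] \,/\, \{E[Y_{g_1e_0} \mid c]\, E[Y_{g_0e_1} \mid c]\}}.$$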
The following theorem gives a general formula for the bias for the interaction on the multiplicative scale, Bmult, in terms of various sensitivity analysis parameters.
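Theorem 2. Suppose that for all g and e, Yge∐{G, E}|{C, U}, and for any particular reference level u′ of U define γij(u) = E[Y|gi, ej, c, u]/E[Y|gi, ej, c, u′]. Then we have that

$$B_{mult} = \frac{\dfrac{\sum_u \gamma_{11}(u)P(u \mid g_1,e_1,c)}{\sum_u \gamma_{11}(u)P(u \mid c)} \cdot \dfrac{\sum_u \gamma_{00}(u)P(u \mid g_0,e_0,c)}{\sum_u \gamma_{00}(u)P(u \mid c)}}{\dfrac{\sum_u \gamma_{10}(u)P(u \mid g_1,e_0,c)}{\sum_u \gamma_{10}(u)P(u \mid c)} \cdot \dfrac{\sum_u \gamma_{01}(u)P(u \mid g_0,e_1,c)}{\sum_u \gamma_{01}(u)P(u \mid c)}}.$$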
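Theorem 2 has the same structure on the ratio scale. A minimal sketch under the same assumptions as the additive case (discrete U, strictly positive conditional means, inputs for one stratum c); `multiplicative_bias` is a hypothetical name:

```python
from typing import Dict, Hashable, Tuple

Cell = Tuple[int, int]  # (i, j) indexes the exposure stratum (g_i, e_j)


def multiplicative_bias(
    mean_y: Dict[Cell, Dict[Hashable, float]],        # E[Y | g_i, e_j, c, u] > 0
    p_u_given_ge: Dict[Cell, Dict[Hashable, float]],  # P(u | g_i, e_j, c)
    p_u: Dict[Hashable, float],                       # P(u | c)
    u_ref: Hashable,                                  # reference level u'
) -> float:
    """B_mult from Theorem 2 within one stratum C = c (hypothetical helper)."""
    ratio: Dict[Cell, float] = {}
    for cell in ((1, 1), (1, 0), (0, 1), (0, 0)):
        ref = mean_y[cell][u_ref]
        gamma = {u: mean_y[cell][u] / ref for u in p_u}  # gamma_ij(u) on the ratio scale
        observed = sum(gamma[u] * p_u_given_ge[cell][u] for u in p_u)
        counterfactual = sum(gamma[u] * p_u[u] for u in p_u)
        ratio[cell] = observed / counterfactual  # E[Y | g_i, e_j, c] / E[Y_{g_i e_j} | c]
    return ratio[(1, 1)] * ratio[(0, 0)] / (ratio[(1, 0)] * ratio[(0, 1)])
```

The corrected multiplicative interaction is the estimate in (5) divided by the returned value.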
The use of Theorem 2 in its most general form requires the specification of a large number of sensitivity analysis parameters. Under the simplifying assumption that U does not interact on the multiplicative scale with one of the two factors, the expression for the bias on the multiplicative scale, Bmult, simplifies considerably, as stated in the next corollary. Note that it is not in general possible for U not to interact with a specific factor, say G, on both the additive and the multiplicative scales (unless U has no effect on the outcome or, e.g., the baseline risk in the absence of U is the same in all strata of G). In general, at most one of "no additive interaction" and "no multiplicative interaction" between G and U would hold.
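Corollary 2A. Suppose that the effects of G and E on Y are unconfounded conditional on {C, U}. Suppose further that U is binary and that γ1 = E[Y|g, e1, c, U = 1]/E[Y|g, e1, c, U = 0] and γ0 = E[Y|g, e0, c, U = 1]/E[Y|g, e0, c, U = 0] are constant across strata of g, so that G does not interact with U on the multiplicative scale. Then

$$B_{mult} = \frac{\{1 + (\gamma_1 - 1)P(U=1 \mid g_1,e_1,c)\}\{1 + (\gamma_0 - 1)P(U=1 \mid g_0,e_0,c)\}}{\{1 + (\gamma_0 - 1)P(U=1 \mid g_1,e_0,c)\}\{1 + (\gamma_1 - 1)P(U=1 \mid g_0,e_1,c)\}}.$$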
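Under the assumptions of Corollary 2A the bias needs only the two risk ratios for U and its four stratum-specific prevalences. A minimal sketch with a hypothetical helper name:

```python
from typing import Dict, Tuple


def multiplicative_bias_binary_u(
    gamma1: float,                       # risk ratio for U when E = e1 (same for g1 and g0)
    gamma0: float,                       # risk ratio for U when E = e0 (same for g1 and g0)
    p_u1: Dict[Tuple[int, int], float],  # P(U = 1 | g_i, e_j, c), keyed by (i, j)
) -> float:
    """B_mult from Corollary 2A (hypothetical helper)."""
    def factor(gamma: float, p: float) -> float:
        return 1.0 + (gamma - 1.0) * p

    numerator = factor(gamma1, p_u1[(1, 1)]) * factor(gamma0, p_u1[(0, 0)])
    denominator = factor(gamma0, p_u1[(1, 0)]) * factor(gamma1, p_u1[(0, 1)])
    return numerator / denominator
```

P(U = 1|c) does not enter: it cancels because the (g1, e1) and (g0, e1) cells share γ1 and the (g1, e0) and (g0, e0) cells share γ0.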
A similar result holds by symmetry if E does not interact with U on the multiplicative scale. If U is a confounder only for E and we also have G×E independence in the sense that {E, U}∐G|C, we then have the following result.
Corollary 2B. Suppose that the effects of G and E on Y are unconfounded conditional on {C, U} and that we have G×E independence in the sense that {E, U}∐G|C. Then if U does not interact with G on the multiplicative scale, in the sense that γij(u) = E[Y|gi, ej, c, u]/E[Y|gi, ej, c, u′] is constant across g, we have Bmult = 1.
Remark 2. From Corollary 2B it follows that if, under G×E independence, we found our estimated measure of interaction in (5) to be non-null, then either there is an actual G×E interaction or there is a G×U interaction. The result would apply to multiplicative interaction estimates from either a case-control or a case-only study design. Note that, by symmetry, if U were a confounder only for G and we had G×E independence in the sense that {G, U}∐E|C, then if U does not interact with E on the multiplicative scale, Bmult = 1. Note also that Corollary 2B does not assume that U is binary.
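Corollary 2B can be illustrated numerically in the same way as Corollary 1B. In this made-up multiplicative model, U has three levels and is associated with E but independent of G; without a G×U term the observed and causal ratio contrasts coincide, and adding one breaks the equality. Names and numbers are hypothetical.

```python
from typing import Callable, Dict, Tuple


def ratio_contrasts(
    p_eu: Dict[Tuple[int, int], float],        # P(E = e, U = u | c)
    mean_y: Callable[[int, int, int], float],  # E[Y | g, e, c, u] > 0
) -> Tuple[float, float]:
    """Return (observed (5), causal (6)) within one stratum c,
    assuming {E, U} is independent of G given C, so P(u | g, e, c) = P(u | e, c)."""
    us = sorted({u for (_, u) in p_eu})
    p_u = {u: sum(p for (_, v), p in p_eu.items() if v == u) for u in us}
    p_e = {e: sum(p for (f, _), p in p_eu.items() if f == e) for e in (0, 1)}

    def observed(g: int, e: int) -> float:
        return sum(mean_y(g, e, u) * p_eu[(e, u)] / p_e[e] for u in us)

    def causal(g: int, e: int) -> float:
        return sum(mean_y(g, e, u) * p_u[u] for u in us)

    def contrast(f: Callable[[int, int], float]) -> float:
        return f(1, 1) * f(0, 0) / (f(1, 0) * f(0, 1))

    return contrast(observed), contrast(causal)


P_EU = {(0, 0): 0.30, (0, 1): 0.15, (0, 2): 0.05,
        (1, 0): 0.10, (1, 1): 0.15, (1, 2): 0.25}


def no_gu(g: int, e: int, u: int) -> float:
    # U multiplies the risk by 0.8 per level, identically for g1 and g0.
    return 0.02 * 1.5 ** g * 1.2 ** e * 1.3 ** (g * e) * 0.8 ** u


def with_gu(g: int, e: int, u: int) -> float:
    return no_gu(g, e, u) * 1.25 ** (g * u)  # adds a G x U interaction on the ratio scale


print(ratio_contrasts(P_EU, no_gu))    # both about 1.3
print(ratio_contrasts(P_EU, with_gu))  # observed about 1.55, causal about 1.3
```

In the second model the G×U factor exactly offsets the effect of U when G = g1, so the observed contrast becomes 1.3 × 0.904/0.76 ≈ 1.55 while the causal contrast stays at 1.3.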
A result similar to Corollary 2B holds under G×E independence if there is an unmeasured genetic confounder U1 for G and another unmeasured environmental confounder U2 for E that are binary and independent of one another. In this case, if G does not interact with U2 on the multiplicative scale, E does not interact with U1 on the multiplicative scale, and U1 does not interact with U2 on the multiplicative scale, then Bmult = 1. Thus if the estimated interaction measure in (5) were non-null, one could conclude that there is either a true causal G×E interaction, a G×U1 interaction, an E×U2 interaction, or a U1×U2 interaction; i.e. some form of gene-environment interaction would be present. A formal statement of the result is given in the Appendix as Corollary 2C.
Sensitivity Analysis for the Relative Excess Risk Due to Interaction
Often in case-control studies, logistic regression is used to accommodate the case-control design. In such studies, if investigators want to assess interaction on the additive scale for public health purposes [10-13] or to assess biologic interaction [13-16] then a measure referred to as the relative excess risk due to interaction (RERI) [17-19] is sometimes used. The measure is also sometimes used when a logistic regression model is fit to the data out of convenience rather than by necessity due to a case-control design. The RERI conditional on c would generally be estimated by
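$$\frac{E[Y \mid g_1,e_1,c]}{E[Y \mid g_0,e_0,c]} - \frac{E[Y \mid g_1,e_0,c]}{E[Y \mid g_0,e_0,c]} - \frac{E[Y \mid g_0,e_1,c]}{E[Y \mid g_0,e_0,c]} + 1 \tag{7}$$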
If the outcome is rare, so that odds ratios approximate risk ratios, then each ratio in (7) can be approximated by the corresponding estimated odds ratio from the logistic regression. We define the causal RERI conditional on C = c by
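$$\mathrm{RERI}_c = \frac{E[Y_{g_1e_1} \mid c]}{E[Y_{g_0e_0} \mid c]} - \frac{E[Y_{g_1e_0} \mid c]}{E[Y_{g_0e_0} \mid c]} - \frac{E[Y_{g_0e_1} \mid c]}{E[Y_{g_0e_0} \mid c]} + 1 \tag{8}$$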
If the effects of G and E on Y were unconfounded conditional on (C, U) but data were only available on C we might estimate the RERI by (7) but this would be biased for the true quantity in (8) because of the unmeasured confounding due to U. The following results can help reason about the causal RERIc.
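Theorem 3. Suppose that for all g and e, Yge∐{G, E}|{C, U}, and for any particular reference level u′ of U define γij(u) = E[Y|gi, ej, c, u]/E[Y|gi, ej, c, u′]. Then we have that

$$
\begin{aligned}
\mathrm{RERI}_c ={}& \frac{E[Y \mid g_1,e_1,c]}{E[Y \mid g_0,e_0,c]} \cdot \frac{\sum_u \gamma_{11}(u)P(u \mid c)}{\sum_u \gamma_{11}(u)P(u \mid g_1,e_1,c)} \cdot \frac{\sum_u \gamma_{00}(u)P(u \mid g_0,e_0,c)}{\sum_u \gamma_{00}(u)P(u \mid c)} \\
&- \frac{E[Y \mid g_1,e_0,c]}{E[Y \mid g_0,e_0,c]} \cdot \frac{\sum_u \gamma_{10}(u)P(u \mid c)}{\sum_u \gamma_{10}(u)P(u \mid g_1,e_0,c)} \cdot \frac{\sum_u \gamma_{00}(u)P(u \mid g_0,e_0,c)}{\sum_u \gamma_{00}(u)P(u \mid c)} \\
&- \frac{E[Y \mid g_0,e_1,c]}{E[Y \mid g_0,e_0,c]} \cdot \frac{\sum_u \gamma_{01}(u)P(u \mid c)}{\sum_u \gamma_{01}(u)P(u \mid g_0,e_1,c)} \cdot \frac{\sum_u \gamma_{00}(u)P(u \mid g_0,e_0,c)}{\sum_u \gamma_{00}(u)P(u \mid c)} + 1.
\end{aligned}
$$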
To apply the result one would again specify the effect of U, γij(u), in each of the G × E strata, along with the distribution of U, P(u|gi, ej, c), in each of the G × E strata and its overall distribution P(u|c). The corrected causal RERI could then be computed using the expression in Theorem 3. Under the simplifying assumptions that U is binary with a constant effect across the G × E strata, a more straightforward adjustment is possible, as stated in the following corollary, which follows immediately from Theorem 3.
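Theorem 3 rescales each observed risk ratio by the factor E[Ygiej|c]/E[Y|gi, ej, c] implied by the sensitivity parameters. A minimal sketch, assuming a discrete U and that observed risk ratios (or odds ratios, for a rare outcome) are supplied; `corrected_reri` is a hypothetical name:

```python
from typing import Dict, Hashable, Tuple

Cell = Tuple[int, int]  # (i, j) indexes the exposure stratum (g_i, e_j)


def corrected_reri(
    rr_obs: Dict[Cell, float],                        # E[Y|g_i,e_j,c] / E[Y|g_0,e_0,c]
    gamma: Dict[Cell, Dict[Hashable, float]],         # E[Y|g_i,e_j,c,u] / E[Y|g_i,e_j,c,u']
    p_u_given_ge: Dict[Cell, Dict[Hashable, float]],  # P(u | g_i, e_j, c)
    p_u: Dict[Hashable, float],                       # P(u | c)
) -> float:
    """Causal RERI_c from Theorem 3 (hypothetical helper)."""
    def adjust(cell: Cell) -> float:
        # Ratio E[Y_{g_i e_j} | c] / E[Y | g_i, e_j, c] implied by the sensitivity parameters.
        num = sum(gamma[cell][u] * p_u[u] for u in p_u)
        den = sum(gamma[cell][u] * p_u_given_ge[cell][u] for u in p_u)
        return num / den

    a00 = adjust((0, 0))
    return (rr_obs[(1, 1)] * adjust((1, 1))
            - rr_obs[(1, 0)] * adjust((1, 0))
            - rr_obs[(0, 1)] * adjust((0, 1))) / a00 + 1.0
```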
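Corollary 3A. Suppose that for all g and e, Yge∐{G, E}|{C, U}, and suppose that U is binary and that γ = E[Y|g, e, c, U = 1]/E[Y|g, e, c, U = 0] is constant over g and e. Then we have that

$$
\begin{aligned}
\mathrm{RERI}_c ={}& \frac{E[Y \mid g_1,e_1,c]}{E[Y \mid g_0,e_0,c]} \cdot \frac{1 + (\gamma-1)P(U=1 \mid g_0,e_0,c)}{1 + (\gamma-1)P(U=1 \mid g_1,e_1,c)} - \frac{E[Y \mid g_1,e_0,c]}{E[Y \mid g_0,e_0,c]} \cdot \frac{1 + (\gamma-1)P(U=1 \mid g_0,e_0,c)}{1 + (\gamma-1)P(U=1 \mid g_1,e_0,c)} \\
&- \frac{E[Y \mid g_0,e_1,c]}{E[Y \mid g_0,e_0,c]} \cdot \frac{1 + (\gamma-1)P(U=1 \mid g_0,e_0,c)}{1 + (\gamma-1)P(U=1 \mid g_0,e_1,c)} + 1.
\end{aligned}
$$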
A simpler result is also possible under G×E independence as stated in the next Corollary.
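Corollary 3B. Suppose that for all g and e, Yge∐{G, E}|{C, U}, and that we have G×E independence in the sense that {E, U}∐G|C. Suppose further that U is binary and that γ1 = E[Y|g, e1, c, U = 1]/E[Y|g, e1, c, U = 0] and γ0 = E[Y|g, e0, c, U = 1]/E[Y|g, e0, c, U = 0] are constant across strata of g, so that G does not interact with U on the multiplicative scale. Then

$$\mathrm{RERI}_c = \frac{1}{\kappa}\,\frac{E[Y \mid g_1,e_1,c]}{E[Y \mid g_0,e_0,c]} - \frac{E[Y \mid g_1,e_0,c]}{E[Y \mid g_0,e_0,c]} - \frac{1}{\kappa}\,\frac{E[Y \mid g_0,e_1,c]}{E[Y \mid g_0,e_0,c]} + 1, \tag{9}$$

where

$$\kappa = \frac{\{1 + (\gamma_1 - 1)P(U=1 \mid e_1,c)\}\{1 + (\gamma_0 - 1)P(U=1 \mid c)\}}{\{1 + (\gamma_1 - 1)P(U=1 \mid c)\}\{1 + (\gamma_0 - 1)P(U=1 \mid e_0,c)\}}.$$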
Corollary 3B assumes that U does not interact with G on the risk ratio scale. Under Corollary 3B, it is only necessary to specify γ1 and γ0 (the effect of U when E = e1 and E = e0, respectively) and the probability of U = 1 when E = e1 and when E = e0; the value of P(U = 1|c) can then be calculated from these two probabilities together with the prevalence of E, as P(U = 1|c) = P(U = 1|e1, c)P(e1|c) + P(U = 1|e0, c)P(e0|c), and it cancels from κ altogether when γ1 = γ0. Once these are specified, one can calculate κ and use (9) to obtain the corrected measure of the causal relative excess risk due to interaction. Note that even under the assumptions of Corollaries 3A or 3B, the confidence interval for the corrected RERI cannot simply be obtained by applying a formula to the confidence limits of the uncorrected RERI. Confidence limits for the corrected RERI could be obtained either by using the delta method or by bootstrapping. Finally, note that if the estimated RERI in (7) were found to be non-zero, then it would also follow that the quantity in (1) was non-zero, since (7) is simply (1) divided by E[Y|g0, e0, c]. If in addition we have G × E independence, we could still apply Corollary 1B to conclude that there was either a true causal G × E interaction or a G × U interaction.
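A sketch of Corollary 3B follows. The extra input `p_e1`, the prevalence P(E = e1|c), is an assumption needed only to compute P(U = 1|c) by the law of total probability; both function names are hypothetical.

```python
def kappa_cor3b(gamma1: float, gamma0: float,
                p_u1_e1: float, p_u1_e0: float, p_e1: float) -> float:
    """kappa in (9). p_e1 = P(E = e1 | c) is used only to obtain P(U = 1 | c)."""
    p_u1 = p_u1_e1 * p_e1 + p_u1_e0 * (1.0 - p_e1)  # P(U = 1 | c)

    def factor(gamma: float, p: float) -> float:
        return 1.0 + (gamma - 1.0) * p

    return (factor(gamma1, p_u1_e1) * factor(gamma0, p_u1)) / (
        factor(gamma1, p_u1) * factor(gamma0, p_u1_e0))


def corrected_reri_cor3b(rr11: float, rr10: float, rr01: float, kappa: float) -> float:
    """Equation (9); rr_ij = E[Y|g_i,e_j,c] / E[Y|g_0,e_0,c] (or odds ratios if Y is rare)."""
    return (rr11 - rr01) / kappa - rr10 + 1.0
```

When γ1 = γ0 the P(U = 1|c) factors cancel, so `p_e1` then has no effect on κ. Confidence limits would still require the delta method or a bootstrap, as noted above.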
As noted above, the additive scale is most useful for assessing causal notions of interaction in the sufficient cause framework. Simple relations hold between the causal relative excess risk due to interaction and the presence of synergism in the sufficient cause framework. Specifically, if an investigator is interested in detecting “sufficient cause interactions” [14,15] corresponding to individuals with response patterns such that D11 = 1 but D10 = D01 = 0 then if both exposures are never preventive for any individual, then RERIc > 0 implies the presence of this response pattern [14,15]. Without this no-preventive-action assumption, RERIc > 1 still implies the presence of this response pattern [14,15]. If we are interested in detecting an even stronger notion of interaction that D11 = 1 but D10 = D01 = D00 = 0 (i.e. individuals for whom the outcome occurs if and only if both exposures are present, “epistatic interactions”, [16]) then RERIc > 2 suffices without any assumptions about preventive action, RERIc > 1 suffices if at least one of the exposures is never preventive for any individual, and RERIc > 0 suffices if both are never preventive [16].
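The thresholds listed above can be encoded as a small lookup. This hypothetical helper treats RERIc as known (it ignores sampling uncertainty) and, because the text gives no separate threshold for a sufficient cause interaction when only one exposure is never preventive, it uses the assumption-free threshold of 1 in that case.

```python
from typing import List


def sufficient_cause_conclusions(
    reri_c: float,
    g_never_preventive: bool,  # assumption: G is never preventive for any individual
    e_never_preventive: bool,  # assumption: E is never preventive for any individual
) -> List[str]:
    """Response patterns implied by the RERI_c thresholds quoted above [14-16]."""
    n_monotone = int(g_never_preventive) + int(e_never_preventive)
    found = []
    # D11 = 1 but D10 = D01 = 0 ("sufficient cause interaction")
    if reri_c > 1 or (n_monotone == 2 and reri_c > 0):
        found.append("sufficient cause interaction")
    # D11 = 1 but D10 = D01 = D00 = 0 ("epistatic interaction")
    if reri_c > {0: 2, 1: 1, 2: 0}[n_monotone]:
        found.append("epistatic interaction")
    return found
```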
Applications
Using a case-only design [22,23], Bennett et al. [24] studied the interaction between passive smoking and glutathione S-transferase M1 (GSTM1) on lung cancer risk among non-smokers. The investigators genotyped 106 lung cancer cases and estimated a case-only measure of interaction. Let G denote GSTM1 (g1 present, g0 absent) and E passive smoking (e1 present, e0 absent). The case-only estimate is the odds ratio relating G and E among the cases,

$$\widehat{OR}_{CO} = \frac{P(g_1, e_1 \mid Y=1)\, P(g_0, e_0 \mid Y=1)}{P(g_1, e_0 \mid Y=1)\, P(g_0, e_1 \mid Y=1)},$$

which, under G×E independence and a rare outcome, estimates the multiplicative interaction in (5). The 95% confidence interval for the investigators' estimate was (1.1, 6.1). The case-only study design assumes that the genetic factor is independent of passive smoking. The estimate and its confidence interval suggest a gene-environment interaction between passive smoking and GSTM1 on lung cancer risk. The effect of passive smoking may, however, be confounded by air pollution: poorer neighborhoods, in which air pollution may be high, may also have a higher prevalence of smoking or more extensive advertising for cigarettes.
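The case-only estimate is an odds ratio computed from the G by E table among cases. A minimal sketch with a Woolf (log-scale) confidence interval, which is a standard choice but not necessarily the method Bennett et al. used; the counts are hypothetical and are not the study data:

```python
import math
from typing import Tuple


def case_only_or(n11: int, n10: int, n01: int, n00: int,
                 z: float = 1.96) -> Tuple[float, float, float]:
    """Case-only odds ratio relating G and E among cases, with a Woolf CI.
    n_ij is the number of cases with G = g_i and E = e_j; all counts must be > 0."""
    or_hat = (n11 * n00) / (n10 * n01)
    se_log = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)  # SE of log(OR)
    return or_hat, or_hat * math.exp(-z * se_log), or_hat * math.exp(z * se_log)


# Hypothetical counts for illustration only.
print(case_only_or(20, 10, 10, 20))
```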
Suppose that the genetic factor (GSTM1) is independent of both passive smoking and air pollution (the case-only design itself already assumes that the genetic factor is independent of passive smoking). By Corollary 2B, it would then follow from the case-only estimate that either there is a true causal gene × passive smoking interaction or there is an interaction between the genetic factor and air pollution, in the sense that there are some levels of air pollution, u and u′, such that the effect of air pollution when the genetic factor is present, E[Y|g1, e, c, u]/E[Y|g1, e, c, u′], differs from the effect when the genetic factor is absent, E[Y|g0, e, c, u]/E[Y|g0, e, c, u′].
As a second example, we consider a study by Ahsan et al. [25] examining the evidence for additive interaction between the effects of arsenic exposure in well water and BMI in producing pre-malignant skin lesions. Data come from a large cohort study of 11,746 individuals in Bangladesh, many of whom had been exposed to various doses of arsenic through drinking well water. Following their analysis, let G = 1 for high vs. low arsenic (> 175 vs. < 8 μg/L) and let E = 1 for low vs. high BMI (< 18.1 vs. > 20.4), with Y = 1 denoting the presence of pre-malignant skin lesions. Ahsan et al. [25] adjusted for gender, age, education, cigarette smoking, hukka smoking, sun exposure and land ownership, and used logistic regression to estimate the relative excess risk due to interaction in assessing potential additive interaction between BMI and arsenic exposure. Compared with the reference of G = 0, E = 0, the odds ratio for G = 1, E = 1 was 5.25 (95% CI: 3.07, 8.99), for G = 1, E = 0 was 2.96 (95% CI: 1.63, 5.37), and for G = 0, E = 1 was 0.71 (95% CI: 0.38, 1.32). The overall prevalence of skin lesions is 6.3%, which is generally considered sufficiently small that odds ratios approximate risk ratios. The estimated RERI was thus 5.25 − 2.96 − 0.71 + 1 ≈ 2.59, with a 95% confidence interval of (0.75, 4.24), suggesting evidence for positive additive interaction. Until the study was conducted, there was very little knowledge of which wells had high levels of arsenic; the correlation between arsenic exposure and other covariates is thus very weak, and it is unlikely that the effects of arsenic are subject to substantial confounding. The effects of BMI on skin lesions are, however, likely confounded by, say, nutritional intake. The conditional association between BMI and arsenic exposure is not statistically significant in the sample, and we could therefore potentially employ Corollary 1B or Corollary 3B.
By Corollary 1B, either there is an interaction between arsenic and BMI or there is an interaction between arsenic and the confounders of the effect of BMI, e.g. nutritional intake. If we further wanted corrected estimates of the RERI between arsenic and BMI, we could use the sensitivity analysis technique in Corollary 3B. Let U denote a hypothetical binary unmeasured confounder, with U = 1 indicating high vs. low nutritional intake. Suppose high nutritional intake decreased the likelihood of skin lesions 3-fold (γ1 = γ0 = 1/3) in all strata of arsenic and BMI, with a prevalence of high nutritional intake of 0.6 among those with high BMI (E = 0) and 0.2 among those with low BMI (E = 1). Because γ1 = γ0, the factors involving P(U = 1|c) cancel from κ, and we then have that:

$$\kappa = \frac{1 + (1/3 - 1)(0.2)}{1 + (1/3 - 1)(0.6)} = \frac{0.867}{0.6} \approx 1.44.$$
Under the rare outcome assumption, so that odds ratios approximate risk ratios, we would have, by Corollary 3B, a corrected RERI of:

$$\mathrm{RERI}_c = \frac{5.25}{1.44} - 2.96 - \frac{0.71}{1.44} + 1 \approx 1.19.$$
As an alternative scenario assuming weaker confounding, if high nutritional intake decreased the likelihood of skin lesions 2-fold (γ1 = γ0 = 1/2), with a prevalence of 0.4 among those with high BMI and 0.2 among those with low BMI, we would have κ = 1.13 and a corrected RERIc = 2.06.
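The arithmetic of both scenarios can be reproduced as follows, treating the reported odds ratios as risk ratios. This is a sketch; `kappa_equal_gammas` is a hypothetical helper for the case γ1 = γ0, in which P(U = 1|c) cancels from κ. The text rounds κ to two decimals before computing RERIc (κ = 1.13 gives 2.06), so carrying κ unrounded, as below, changes the second decimal. With the rounded odds ratios shown, the observed RERI is 2.58; the reported 2.59 presumably reflects unrounded estimates.

```python
def kappa_equal_gammas(gamma: float, p_u1_e1: float, p_u1_e0: float) -> float:
    """kappa in (9) when gamma1 = gamma0 = gamma, so the P(U = 1 | c) factors cancel."""
    return (1 + (gamma - 1) * p_u1_e1) / (1 + (gamma - 1) * p_u1_e0)


OR11, OR10, OR01 = 5.25, 2.96, 0.71  # Ahsan et al. odds ratios, used as risk ratios
print(f"observed RERI: {OR11 - OR10 - OR01 + 1:.2f}")  # observed RERI: 2.58

# (label, gamma, P(U=1 | low BMI, e1), P(U=1 | high BMI, e0)) for the two scenarios
for label, gamma, p1, p0 in [("scenario 1", 1 / 3, 0.2, 0.6), ("scenario 2", 1 / 2, 0.2, 0.4)]:
    kappa = kappa_equal_gammas(gamma, p1, p0)
    reri_c = (OR11 - OR01) / kappa - OR10 + 1
    print(f"{label}: kappa = {kappa:.3f}, corrected RERI = {reri_c:.2f}")
# scenario 1: kappa = 1.444, corrected RERI = 1.18
# scenario 2: kappa = 1.125, corrected RERI = 2.08
```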
Discussion
In this paper we have provided several methods for sensitivity analysis for unmeasured confounding in studies of interaction between biological or chemical exposures or between such exposures and genetic factors. We have considered both the additive and multiplicative scales, along with additive interaction obtained from a multiplicative model (the relative excess risk due to interaction). We have given results in considerable generality and have also provided much simpler, easier-to-use techniques that can be employed under simplifying assumptions such as that of a single binary unmeasured confounder. The techniques will likely be useful in a wide range of interaction studies and can be applied across numerous study designs.
The results on additive interaction (Theorem 1 and Corollaries 1A-1C) are applicable to cohort designs. The multiplicative interaction results are also applicable to cohort designs and, moreover, to case-control designs when the outcome is rare so that odds ratios approximate risk ratios. The multiplicative results are likewise applicable to case-only designs when the genetic and environmental factors are independent, since under such an independence assumption the case-only design allows one to estimate interaction on the multiplicative scale. The multiplicative results are also applicable to family-based study designs that estimate the interaction on the log scale, or in settings in which the outcome is rare. The results on the relative excess risk due to interaction are likewise applicable to cohort designs, case-control and case-only designs with a rare outcome, and family-based genetic designs.
The techniques may prove especially useful in assessing gene-environment interaction. Many such studies have not paid much attention to potential confounding of the environmental factor. Ideally, better control for such confounding will be made. However, in settings in which the requisite data are not available, the techniques here will allow investigators to assess the extent to which unmeasured environmental confounding may affect unadjusted results. One issue that has not been examined here is the extent to which unmeasured confounding may affect the power of interaction analyses [26]. This is especially important since power to detect interaction is often quite low. Considerations of power in interaction analyses are left to future work.
In many studies of gene-environment interaction, the genetic and environmental factors are assumed to be independent. We have seen above that, under this assumption, interaction findings are particularly robust to unmeasured confounding. Suppose we are concerned that the environmental factor is confounded by another, unmeasured environmental exposure. If we nonetheless find interaction in the observed data, then either there is a true causal interaction between the genetic and environmental factors or there is an interaction between the genetic factor and the unmeasured environmental confounder; in either case, gene-environment interaction is present. It is hoped that these results will facilitate inference about interaction in genetic and epidemiologic practice and will assist in assessing the robustness of findings that do not account for the possibility of unmeasured confounding.
Acknowledgements
The authors thank two anonymous referees for helpful comments and the editors of the special issue, Enrique Schisterman and Paul Albert, for facilitating this research. This research was partially supported by the Long-Range Research Initiative of the American Chemistry Council and the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health and the National Institute of Environmental Health Sciences.
Appendix. Proofs
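Proof of Theorem 1. If for all g and e, Yge∐{G, E}|{C, U}, then

$$E[Y_{ge} \mid c] = \sum_u E[Y_{ge} \mid c, u]\, P(u \mid c) = \sum_u E[Y_{ge} \mid g, e, c, u]\, P(u \mid c) = \sum_u E[Y \mid g, e, c, u]\, P(u \mid c),$$

where the first equality follows by the law of iterated expectations, the second by Yge∐{G, E}|{C, U} and the third by consistency. Thus, for any fixed reference value u′ of U,

$$
\begin{aligned}
E[Y \mid g, e, c] - E[Y_{ge} \mid c] &= \sum_u E[Y \mid g, e, c, u]\{P(u \mid g, e, c) - P(u \mid c)\} \\
&= \sum_u \{E[Y \mid g, e, c, u] - E[Y \mid g, e, c, u']\}\{P(u \mid g, e, c) - P(u \mid c)\},
\end{aligned}
$$

where the second equality holds because the differences P(u|g, e, c) − P(u|c) sum to zero over u.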
By applying this equality for (g1, e1), (g1, e0), (g0, e1) and (g0, e0), the result for Badd follows.
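Proof of Corollary 1A. If γ1 and γ0 are constant across strata of g then, applying Theorem 1 with u′ = 0 (so that only the term u = 1 contributes),

$$
\begin{aligned}
B_{add} &= \gamma_1\{P(U=1 \mid g_1,e_1,c) - P(U=1 \mid c)\} - \gamma_0\{P(U=1 \mid g_1,e_0,c) - P(U=1 \mid c)\} \\
&\quad - \gamma_1\{P(U=1 \mid g_0,e_1,c) - P(U=1 \mid c)\} + \gamma_0\{P(U=1 \mid g_0,e_0,c) - P(U=1 \mid c)\} \\
&= \gamma_1\delta_1 - \gamma_0\delta_0.
\end{aligned}
$$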
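Proof of Corollary 1B. If there is no interaction between G and U on the additive scale, then γ1j(u) = γ0j(u). By Theorem 1 and {E, U}∐G|C, which gives P(u|gi, ej, c) = P(u|ej, c), we have

$$
\begin{aligned}
B_{add} ={}& \sum_u \gamma_{11}(u)\{P(u \mid e_1,c) - P(u \mid c)\} - \sum_u \gamma_{10}(u)\{P(u \mid e_0,c) - P(u \mid c)\} \\
&- \sum_u \gamma_{11}(u)\{P(u \mid e_1,c) - P(u \mid c)\} + \sum_u \gamma_{10}(u)\{P(u \mid e_0,c) - P(u \mid c)\} = 0.
\end{aligned}
$$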
This completes the proof.
Corollary 1C. Suppose that for all g and e and for binary U1, U2 we have Yge∐{G, E}|{C, U1, U2}, and that we have G×E independence in the sense that {G, U1}∐E|C and {E, U2}∐G|C. Then if G does not interact with U2 on the additive scale, in the sense that E[Y|g, e, c, u1, U2 = 1] − E[Y|g, e, c, u1, U2 = 0] does not vary with g; E does not interact with U1 on the additive scale, in the sense that E[Y|g, e, c, U1 = 1, u2] − E[Y|g, e, c, U1 = 0, u2] does not vary with e; and U1 does not interact with U2 on the additive scale, in the sense that E[Y|g, e, c, u1, U2 = 1] − E[Y|g, e, c, u1, U2 = 0] does not vary with u1; we have Badd = 0.
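Proof of Corollary 1C. If we let U = (U1, U2) and u′ = (0, 0), then by Theorem 1 we have that

$$B_{add} = \sum_{i,j \in \{0,1\}} (-1)^{i+j} \sum_{u} \gamma_{ij}(u)\{P(u \mid g_i,e_j,c) - P(u \mid c)\},$$

where γij(u1, u2) = E[Y|gi, ej, c, u1, u2] − E[Y|gi, ej, c, 0, 0], so that γij(0, 0) = 0.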
Now
(A1)
Let so that
Summing (A1) over i = 0, 1 and j = 0, 1, and noting that because G and U2 do not interact on the additive scale, τ1j = τ0j and γ1j(0, 1) = γ0j(0, 1), and because E and U1 do not interact on the additive scale, γi1(1, 0) = γi0(1, 0), we then have that Badd
Furthermore, because U1 and U2 do not interact on the additive scale, τ1j = γ1j(0, 1), and so we can group the first and seventh terms, the second and eighth, the third and fifth, and the fourth and sixth to obtain Badd
This completes the proof.
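Corollary 1C can likewise be illustrated numerically. The sketch below rests on assumptions of our own: a single stratum of C, hypothetical parameter values and hypothetical function names. The joint distribution is built as P(g)P(e)P(u1|g)P(u2|e), which gives both {G, U1}∐E|C and {E, U2}∐G|C. The outcome mean lets the effect of U1 vary with g only, lets the effect of U2 vary with e only, and has no U1×U2 term, matching the three no-interaction conditions.

```python
from itertools import product

P_G1, P_E1 = 0.3, 0.6            # hypothetical P(G=1 | c), P(E=1 | c)
P_U1_GIVEN_G = {0: 0.2, 1: 0.7}  # hypothetical P(U1=1 | g, c)
P_U2_GIVEN_E = {0: 0.1, 1: 0.7}  # hypothetical P(U2=1 | e, c)
U_VALUES = list(product((0, 1), repeat=2))  # all (u1, u2) pairs


def bern(p1, x):
    """P(X = x) for a binary X with P(X = 1) = p1."""
    return p1 if x == 1 else 1 - p1


def p_joint(g, e, u1, u2):
    # P(g)P(e)P(u1|g)P(u2|e): {G, U1} indep. of E and {E, U2} indep. of G, given C
    return (bern(P_G1, g) * bern(P_E1, e)
            * bern(P_U1_GIVEN_G[g], u1) * bern(P_U2_GIVEN_E[e], u2))


def mean_y(g, e, u1, u2):
    # Effect of U1 varies with g only, effect of U2 varies with e only,
    # and U1 and U2 enter additively (no U1 x U2 term).
    h = 0.05 + 0.1 * g + 0.1 * e + 0.1 * g * e
    return h + (0.15 + 0.1 * g) * u1 + (0.2 + 0.05 * e) * u2


def p_u(u):
    return sum(p_joint(g, e, *u) for g in (0, 1) for e in (0, 1))


def p_u_given(u, g, e):
    total = sum(p_joint(g, e, *v) for v in U_VALUES)
    return p_joint(g, e, *u) / total


def cell_bias(g, e):
    # E[Y | g, e, c] - E[Y_ge | c], assuming no confounding given {C, U1, U2}
    return sum(mean_y(g, e, *u) * (p_u_given(u, g, e) - p_u(u)) for u in U_VALUES)


biases = {(g, e): cell_bias(g, e) for g in (0, 1) for e in (0, 1)}
b_add = biases[(1, 1)] - biases[(1, 0)] - biases[(0, 1)] + biases[(0, 0)]
print(b_add)
```

Under these choices each cell bias splits into a part depending only on g (through U1) and a part depending only on e (through U2), so the additive interaction contrast of the biases vanishes even though, for example, the (g1, e1) cell is biased by about 0.1475.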
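Proof of Theorem 2. If for all g and e, Yge∐{G, E}|{C, U}, then, as in the proof of Theorem 1,

$$E[Y_{ge} \mid c] = \sum_u E[Y \mid g, e, c, u]\,P(u \mid c),$$

and thus for any fixed value u′ of U we have that:

$$\frac{E[Y \mid g, e, c]}{E[Y_{ge} \mid c]} = \frac{\sum_u E[Y \mid g, e, c, u]\,P(u \mid g, e, c)}{\sum_u E[Y \mid g, e, c, u]\,P(u \mid c)} = \frac{\sum_u \dfrac{E[Y \mid g, e, c, u]}{E[Y \mid g, e, c, u']}\,P(u \mid g, e, c)}{\sum_u \dfrac{E[Y \mid g, e, c, u]}{E[Y \mid g, e, c, u']}\,P(u \mid c)}.$$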
By applying this equality for (g1, e1), (g1, e0), (g0, e1) and (g0, e0) and taking ratios the result for Bmult follows.
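The ratio identity behind this proof can also be checked numerically: for each cell, E[Y|g,e,c]/E[Yge|c] equals Σ_u γ(u)P(u|g,e,c) divided by Σ_u γ(u)P(u|c), where γ(u) = E[Y|g,e,c,u]/E[Y|g,e,c,u′]. The sketch assumes binary G, E and U, a single stratum of C, hypothetical strictly positive outcome means, and hypothetical function names. It also takes Bmult to be the ratio of the observed to the causal multiplicative interaction, with the interaction measured as f(1,1)f(0,0)/{f(1,0)f(0,1)}; that definition is our assumption about the notation used earlier in the paper.

```python
# Hypothetical joint distribution P(G=g, E=e, U=u) within one stratum of C.
JOINT = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.05,
    (0, 1, 0): 0.10, (0, 1, 1): 0.15,
    (1, 0, 0): 0.15, (1, 0, 1): 0.05,
    (1, 1, 0): 0.10, (1, 1, 1): 0.20,
}


def mean_y(g, e, u):
    """Hypothetical, strictly positive E[Y | g, e, c, u]."""
    return 0.1 + 0.1 * g + 0.15 * e + 0.05 * g * e + 0.2 * u + 0.1 * g * u + 0.05 * e * u


def p_u(u):
    return sum(p for (_, _, uu), p in JOINT.items() if uu == u)


def p_u_given(u, g, e):
    return JOINT[(g, e, u)] / (JOINT[(g, e, 0)] + JOINT[(g, e, 1)])


def observed_mean(g, e):
    return sum(mean_y(g, e, u) * p_u_given(u, g, e) for u in (0, 1))


def causal_mean(g, e):
    # Valid under no confounding given {C, U}
    return sum(mean_y(g, e, u) * p_u(u) for u in (0, 1))


def cell_ratio_formula(g, e, u_ref):
    # gamma(u) = E[Y|g,e,c,u] / E[Y|g,e,c,u_ref]
    gamma = {u: mean_y(g, e, u) / mean_y(g, e, u_ref) for u in (0, 1)}
    num = sum(gamma[u] * p_u_given(u, g, e) for u in (0, 1))
    den = sum(gamma[u] * p_u(u) for u in (0, 1))
    return num / den


def ratio_contrast(f):
    """Multiplicative interaction f(1,1) f(0,0) / (f(1,0) f(0,1))."""
    return f(1, 1) * f(0, 0) / (f(1, 0) * f(0, 1))


b_mult_direct = ratio_contrast(observed_mean) / ratio_contrast(causal_mean)
b_mult_formula = {
    u_ref: ratio_contrast(lambda g, e: cell_ratio_formula(g, e, u_ref))
    for u_ref in (0, 1)
}
print(b_mult_direct, b_mult_formula)
```

Dividing numerator and denominator by E[Y|g,e,c,u′] leaves each cell ratio unchanged, which is why the result does not depend on the choice of reference value.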
Proof of Corollary 2A. If U is binary and and are constant across strata of g, then
Proof of Corollary 2B. If there is no interaction between G and U on the multiplicative scale, then γ1j(u) = γ0j(u). By Theorem 2 and {E, U}∐G|C we have
This completes the proof.
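A multiplicative-scale counterpart of the Corollary 1B illustration is sketched below, again under our own assumptions: binary G, E and U, a single stratum of C, hypothetical numbers and hypothetical function names, with Bmult taken as the ratio of observed to causal multiplicative interaction. G is independent of {E, U}, E is confounded by U, and the outcome mean is a product of a (g, e) term and a factor whose dependence on U varies with e but not with g, so G and U do not interact on the multiplicative scale.

```python
# Hypothetical P(E=e, U=u | c): E is strongly associated with U.
P_EU = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
P_G1 = 0.3  # hypothetical P(G=1 | c); G is independent of {E, U} given C


def p_joint(g, e, u):
    return (P_G1 if g == 1 else 1 - P_G1) * P_EU[(e, u)]


def mean_y(g, e, u):
    # E[Y|g,e,c,U=1] / E[Y|g,e,c,U=0] = 2.5 + 0.5e, which does not vary with g.
    base = 0.05 + 0.05 * g + 0.1 * e + 0.1 * g * e
    return base * (1 + (1.5 + 0.5 * e) * u)


def p_u(u):
    return sum(p_joint(g, e, u) for g in (0, 1) for e in (0, 1))


def p_u_given(u, g, e):
    return p_joint(g, e, u) / (p_joint(g, e, 0) + p_joint(g, e, 1))


def cell_ratio(g, e):
    # E[Y | g, e, c] / E[Y_ge | c], assuming no confounding given {C, U}
    observed = sum(mean_y(g, e, u) * p_u_given(u, g, e) for u in (0, 1))
    causal = sum(mean_y(g, e, u) * p_u(u) for u in (0, 1))
    return observed / causal


ratios = {(g, e): cell_ratio(g, e) for g in (0, 1) for e in (0, 1)}
b_mult = ratios[(1, 1)] * ratios[(0, 0)] / (ratios[(1, 0)] * ratios[(0, 1)])
print(ratios)
print(b_mult)
```

The cell ratios are far from 1 (1.3 when e = 1 and about 0.743 when e = 0) but depend only on e, so they cancel in the multiplicative interaction and b_mult equals 1 up to rounding.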
Corollary 2C. Suppose that for all g and e and for binary U1, U2 we have Yge∐{G, E}|{C, U1, U2}, and that we have G×E independence in the sense that {G, U1}∐E|C and {E, U2}∐G|C. Then, if G does not interact with U2 on the multiplicative scale in the sense that does not vary with g, if E does not interact with U1 on the multiplicative scale in the sense that does not vary with e, and if U1 does not interact with U2 on the multiplicative scale in the sense that does not vary with u1, then Bmult = 1.
Proof of Corollary 2C. If we let U = (U1, U2) and u′ = (0, 0) then by Theorem 2, we have
Let so that γij(1, 1) = τij γij(1, 0). We then have that
Using this in Theorem 2 for Bmult for i = 0, 1 and j = 0, 1, and noting that because G and U2 do not interact on the multiplicative scale, τ1j = τ0j and γ1j(0, 1) = γ0j(0, 1), because E and U1 do not interact on the multiplicative scale, γi1(1, 0) = γi0(1, 0), and because U1 and U2 do not interact on the multiplicative scale, τ1j = γ1j(0, 1), we have that
This completes the proof.
Proof of Theorem 3. Using the result in the proof of Theorem 1, we have that
Thus, RERIc
Proof of Corollary 3B. If U does not interact with G on the multiplicative scale, then γ10(u) = γ00(u) = γ0(u) and γ11(u) = γ01(u) = γ1(u), and if we have G × E independence, then by Theorem 3, RERIc
where
Contributor Information
Tyler J. VanderWeele, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston MA 02115, Phone: 617-432-7855; Fax: 617-432-1884; tvanderw@hsph.harvard.edu
Bhramar Mukherjee, Department of Biostatistics, University of Michigan.
Jinbo Chen, Department of Biostatistics, University of Pennsylvania.
References
- 1. Cornfield J, Haenszel W, Hammond EC, Lilienfeld AM, Shimkin MB, Wynder EL. Smoking and lung cancer: recent evidence and a discussion of some questions. Journal of the National Cancer Institute. 1959;22:173–203.
- 2. Rosenbaum PR, Rubin DB. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society Series B. 1983;45:212–218.
- 3. Flanders WD, Khoury MJ. Indirect assessment of confounding: graphic description and limits on effect of adjusting for covariates. Epidemiology. 1990;1:239–246. doi: 10.1097/00001648-199005000-00010.
- 4. Lin DY, Psaty BM, Kronmal RA. Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics. 1998;54:948–63.
- 5. Robins JM, Scharfstein D, Rotnitzky A. Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In: Halloran E, Berry D, editors. Statistical Models for Epidemiology, the Environment, and Clinical Trials. Springer-Verlag; New York: 2000. pp. 1–95.
- 6. Rosenbaum PR. Observational Studies. 2nd ed. Springer-Verlag; New York: 2002.
- 7. VanderWeele TJ, Arah OA. Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments and confounders. Epidemiology. 2011;22:42–52. doi: 10.1097/EDE.0b013e3181f74493.
- 8. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology. 1974;66:688–701.
- 9. Neyman J. Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes. (1923). Excerpts reprinted in English (D. Dabrowska and T. Speed, Trans.) Statistical Science. 1990;5:463–472.
- 10. Blot WJ, Day NE. Synergism and interaction: are they equivalent? American Journal of Epidemiology. 1979;110:99–100. doi: 10.1093/oxfordjournals.aje.a112793.
- 11. Rothman KJ, Greenland S, Walker AM. Concepts of interaction. American Journal of Epidemiology. 1980;112:467–470. doi: 10.1093/oxfordjournals.aje.a113015.
- 12. Saracci R. Interaction and synergism. American Journal of Epidemiology. 1980;112:465–466. doi: 10.1093/oxfordjournals.aje.a113014.
- 13. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. 3rd ed. Lippincott Williams & Wilkins; Philadelphia: 2008.
- 14. VanderWeele TJ, Robins JM. The identification of synergism in the sufficient-component cause framework. Epidemiology. 2007;18:329–339. doi: 10.1097/01.ede.0000260218.66432.88.
- 15. VanderWeele TJ. Sufficient cause interactions and statistical interactions. Epidemiology. 2009;20:6–13. doi: 10.1097/EDE.0b013e31818f69e7.
- 16. VanderWeele TJ. Empirical tests for compositional epistasis. Nature Reviews Genetics. 2010;11:166. doi: 10.1038/nrg2579-c1.
- 17. Rothman KJ. Modern Epidemiology. 1st ed. Little, Brown and Company; Boston, MA: 1986.
- 18. Hosmer DW, Lemeshow S. Confidence interval estimation of interaction. Epidemiology. 1992;3:452–56. doi: 10.1097/00001648-199209000-00012.
- 19. Assmann SF, Hosmer DW, Lemeshow S, Mundt KA. Confidence intervals for measures of interaction. Epidemiology. 1996;7:286–90. doi: 10.1097/00001648-199605000-00012.
- 20. VanderWeele TJ. On the distinction between interaction and effect modification. Epidemiology. 2009;20:863–871. doi: 10.1097/EDE.0b013e3181ba333c.
- 21. VanderWeele TJ, Knol MJ. The interpretation of subgroup analyses in randomized trials: heterogeneity versus secondary interventions. Annals of Internal Medicine. 2011;154:680–683. doi: 10.7326/0003-4819-154-10-201105170-00008.
- 22. Khoury MJ, Flanders WD. Nontraditional epidemiologic approaches in the analysis of gene-environment interaction: case-control studies with no controls! American Journal of Epidemiology. 1996;144:207–213. doi: 10.1093/oxfordjournals.aje.a008915.
- 23. Piegorsch WW, Weinberg CR, Taylor JA. Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case-control studies. Statistics in Medicine. 1994;13:153–162. doi: 10.1002/sim.4780130206.
- 24. Bennett WP, Alavanja MCR, Blomeke B, Vähäkangas KH, Castrén K, Welsh JA, Bowman ED, Khan MA, Flieder DB, Harris CC. Environmental tobacco smoke, genetic susceptibility, and risk of lung cancer in never-smoking women. Journal of the National Cancer Institute. 1999;91:2009–2014. doi: 10.1093/jnci/91.23.2009.
- 25. Ahsan H, Chen Y, Parvez F, Zablotska L, Argos M, Hussain I, Momotaj H, Levy D, Cheng Z, Slavkovich V, van Geen A, Howe GR, Graziano JH. Arsenic exposure from drinking water and risk of premalignant skin lesions in Bangladesh: baseline results from the Health Effects of Arsenic Longitudinal Study. American Journal of Epidemiology. 2006;163:1138–1148. doi: 10.1093/aje/kwj154.
- 26. Greenland S. Tests for interaction in epidemiologic studies: A review and a study of power. Statistics in Medicine. 1983;2:243–251. doi: 10.1002/sim.4780020219.