Supplemental Digital Content is available in the text.
Abstract
Unmeasured confounding may undermine the validity of causal inference with observational studies. Sensitivity analysis provides an attractive way to partially circumvent this issue by assessing the potential influence of unmeasured confounding on causal conclusions. However, previous sensitivity analysis approaches often make strong and untestable assumptions such as having an unmeasured confounder that is binary, or having no interaction between the effects of the exposure and the confounder on the outcome, or having only one unmeasured confounder. Without imposing any assumptions on the unmeasured confounder or confounders, we derive a bounding factor and a sharp inequality such that the sensitivity analysis parameters must satisfy the inequality if an unmeasured confounder is to explain away the observed effect estimate or reduce it to a particular level. Our approach is easy to implement and involves only two sensitivity parameters. Surprisingly, our bounding factor, which makes no simplifying assumptions, is no more conservative than a number of previous sensitivity analysis techniques that do make assumptions. Our new bounding factor implies not only the traditional Cornfield conditions that both the relative risk of the exposure on the confounder and that of the confounder on the outcome must satisfy but also a high threshold that the maximum of these relative risks must satisfy. Furthermore, this new bounding factor can be viewed as a measure of the strength of confounding between the exposure and the outcome induced by a confounder.
Causal inference with observational studies is of great interest and importance in many scientific disciplines. Although unmeasured confounding between the exposure and the outcome may bias the estimation of the true causal effect, an approach often called “sensitivity analysis” or “bias analysis” over a range of sensitivity parameters sometimes allows researchers to make causal inferences even without full control of the confounders of the relationship between the exposure and outcome.
Sensitivity analysis plays a central role in assessing the influence of the unmeasured confounding on the causal conclusions. However, many sensitivity analysis techniques often require additional untestable assumptions. For instance, some authors assume a single binary confounder.1–6 Researchers also often assume a homogeneity assumption that there is no interaction between the effects of the exposure and the confounder on the outcome.5–9 Some sensitivity analysis techniques only allow one to assess how strong an unmeasured confounder would have to be to completely explain away an effect1–3,10,11 but do not allow one to assess what the effect estimate might be under weaker unmeasured confounding scenarios (i.e., do not allow one to do sensitivity analysis under alternative hypotheses). Performing sensitivity analysis under alternative hypotheses can be quite challenging due to more parameters needed in the sensitivity analysis. The Cornfield et al. early study1 on sensitivity analysis for the cigarette smoking and lung cancer association, which helped initiate the entire field of sensitivity analysis, in fact made all three simplifying assumptions: a single binary confounder, no interaction, and only sensitivity analysis for the null hypothesis of no causal effect. Although some sensitivity analysis results exist for general confounders,8,12 they are only easy to implement under some of the above simplifying assumptions.
In this article, we propose a new bounding factor and sensitivity analysis technique without any assumptions about the unmeasured confounder or confounders. None of the assumptions of the null hypothesis, a single binary confounder, or no interaction is required for using the bounding factor. Nonetheless, our new bounding factor, which makes no simplifying assumptions, is no more conservative than many previous sensitivity analysis techniques that do make assumptions and is furthermore easy to implement. Moreover, we show that the new bounding factor implies not only the classical Cornfield conditions1 that both the relative risk of the exposure on the confounder and that of the confounder on the outcome must satisfy but also a stronger condition that the maximum of these relative risks must satisfy. The new bounding factor can be viewed as a measure of the strength of confounding between the exposure and the outcome resulting from the confounder. We begin by considering outcomes that are binary and extend our results further to time-to-event and non-negative count or continuous outcomes. We consider both ratio and difference scales.
The claim that our technique is “without assumptions” requires some clarification. As we will see below, we will, without any assumptions, be able to make statements of the form: “For an observed association to be due solely to unmeasured confounding, two sensitivity analysis parameters must satisfy [a specific inequality].” We will also, without assumptions, be able to make statements of the form: “For unmeasured confounding alone to be able to reduce an observed association [to a given level], two sensitivity analysis parameters must satisfy [another specific inequality].” We believe the ability to make statements of this form without imposing any specific structure on the nature of the unmeasured confounder or confounders constitutes a major advance in the literature.
However, if statements are made of the form, “If the sensitivity analysis parameters take [specified values], then such unmeasured confounding can reduce the observed estimate by no more than [a specific level],” then the specification of the sensitivity analysis parameters could itself of course be viewed as an assumption. Moreover, when placing the results within a counterfactual or potential outcomes framework, the assumptions implicit within that framework of course would be needed also to give a potential outcomes interpretation to the sensitivity analysis. Thus, certain types of statements concerning the sensitivity of conclusions to unmeasured confounding can be made “without assumptions,” while other types of statements do require assumptions concerning the specification of the sensitivity analysis parameters themselves, or those implicit within the potential outcomes framework.
Our title perhaps merits one further qualification which is that what is called in this article “sensitivity analysis” is generally now referred to as “bias analysis” in the epidemiologic literature. Moreover, such “bias analysis” is relevant not only to problems of unmeasured confounding but also measurement error and selection bias, and our focus in this article only concerns unmeasured confounding. The term “sensitivity analysis” is, however, still employed in statistics, econometrics, and in many of the social sciences for issues of unmeasured confounding. We believe the technique presented in this article will be useful across this range of disciplines and have chosen to use the broader term, while acknowledging that terminology in epidemiology has shifted.
SENSITIVITY ANALYSIS: A NEW BOUNDING FACTOR
Let E denote the exposure, D denote a binary outcome, C denote the measured confounders, and U denote one or more unmeasured confounders. We will assume for what follows that the exposure E is binary, but all of the results on sensitivity analysis below are also applicable to a categorical or continuous exposure and could be applied comparing any two levels of E. For ease of notation, we assume that the unmeasured confounder U is categorical with levels 0, 1, ..., K – 1. But all the conclusions hold for U of general type (categorical, continuous, or mixed; single or multiple confounders). We provide proofs and theoretical technical details for general U in the eAppendix (http://links.lww.com/EDE/B16).
Let $\mathrm{RR}^{\mathrm{obs}}_{ED|c} = P(D=1 \mid E=1, C=c)/P(D=1 \mid E=0, C=c)$ denote the observed relative risk of the exposure E on the outcome D within stratum of measured confounders C = c. Define $\mathrm{RR}_{EU,k|c} = P(U=k \mid E=1, C=c)/P(U=k \mid E=0, C=c)$ as the relative risk of exposure on category k of the unmeasured confounder within stratum of measured confounders C = c. We use $\mathrm{RR}_{EU|c} = \max_k \mathrm{RR}_{EU,k|c}$ to denote the maximum of these relative risks between E and U, which we will call the maximal relative risk of E on U within stratum C = c. Define
$$\mathrm{RR}_{UD|E=0,c} = \frac{\max_k P(D=1 \mid E=0, C=c, U=k)}{\min_k P(D=1 \mid E=0, C=c, U=k)}$$
as the maximum of the effect of U on D among the unexposed comparing any two categories of U (i.e., the ratio of the maximum and minimum of the probabilities of the outcome over strata of U without exposure and within stratum C = c); similarly, define
$$\mathrm{RR}_{UD|E=1,c} = \frac{\max_k P(D=1 \mid E=1, C=c, U=k)}{\min_k P(D=1 \mid E=1, C=c, U=k)}$$
as the maximum of the effect of U on D among the exposed comparing any two categories of U (i.e., the ratio of the maximum and minimum of the probabilities of the outcome over strata of U with exposure and within stratum C = c). We use $\mathrm{RR}_{UD|c} = \max(\mathrm{RR}_{UD|E=0,c}, \mathrm{RR}_{UD|E=1,c})$ to denote the maximum of the relative risks between U and D with and without exposure, defined as the maximal relative risk of U on D within stratum C = c. Note that if U is a vector that contains multiple unmeasured confounders, then $\mathrm{RR}_{EU|c}$ and $\mathrm{RR}_{UD|c}$ are defined as the maximum relative risk comparing any two categories of the vector U.
If C and U suffice to control for confounding for the effect of E on D, the standardized relative risk
$$\mathrm{RR}^{\mathrm{true}}_{ED|c} = \frac{\sum_{k=0}^{K-1} P(D=1 \mid E=1, C=c, U=k)\, P(U=k \mid C=c)}{\sum_{k=0}^{K-1} P(D=1 \mid E=0, C=c, U=k)\, P(U=k \mid C=c)}$$
is the true causal relative risk of the exposure E on the outcome D within stratum C = c. In the main text, we focus the discussion on the whole population. We further show in eAppendix 2 (http://links.lww.com/EDE/B16) that all the conclusions also hold for exposed and unexposed subpopulations. We will for the next several sections assume all analyses are carried out within strata of C, and thus the condition C = c is omitted and kept implicitly in all the conditional probabilities (e.g., $\mathrm{RR}_{EU|c}$ is replaced by $\mathrm{RR}_{EU}$ for notational simplicity). Later in the article, we will comment on how the results are applicable to estimation averaged over C, rather than conditional on C.
The relative risk pair (RREU, RRUD) measures the strength of confounding between the exposure E and the outcome D induced by the confounder U. Our main result ties the ratio of the observed relative risk adjusted only for measured confounders C and the true relative risk adjusted also for unmeasured confounders U, to the strength of confounding, (RREU, RRUD). Without any assumptions, we have the following result:
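Result 1:
$$\frac{\mathrm{RR}^{\mathrm{obs}}_{ED}}{\mathrm{RR}^{\mathrm{true}}_{ED}} \le \frac{\mathrm{RR}_{EU} \times \mathrm{RR}_{UD}}{\mathrm{RR}_{EU} + \mathrm{RR}_{UD} - 1}. \tag{1}$$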
Result 1 shows that even in the presence of unmeasured confounding, the true relative risk must be at least as large as $\mathrm{RR}^{\mathrm{obs}}_{ED} \big/ [(\mathrm{RR}_{EU} \times \mathrm{RR}_{UD})/(\mathrm{RR}_{EU} + \mathrm{RR}_{UD} - 1)]$. In eAppendix 2 (http://links.lww.com/EDE/B16), we provide a proof for Result 1 and also show that the inequality is sharp in the sense that we can always construct a model with a confounder U to attain the equality. The quantity (RREU × RRUD)/(RREU + RRUD – 1) is a new joint bounding factor for the relative risk. Although quite simple, this bound using both RREU and RRUD has several important implications.
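First, the result essentially allows for sensitivity analysis without assumptions insofar as, for an unmeasured confounder to reduce an observed relative risk of $\mathrm{RR}^{\mathrm{obs}}_{ED}$ to a true relative risk of $\mathrm{RR}^{\mathrm{true}}_{ED}$, the sensitivity analysis parameters RREU and RRUD must be sufficiently large to satisfy the inequality
$$\frac{\mathrm{RR}_{EU} \times \mathrm{RR}_{UD}}{\mathrm{RR}_{EU} + \mathrm{RR}_{UD} - 1} \ge \frac{\mathrm{RR}^{\mathrm{obs}}_{ED}}{\mathrm{RR}^{\mathrm{true}}_{ED}}.$$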
This statement holds without any assumptions about the nature of the unmeasured confounder. One could plot those values of RREU and RRUD that would be required to explain away the effect estimate (or the lower limit of a confidence interval). To conduct sensitivity analysis with prespecified strength of the unmeasured confounder, (RREU, RRUD), we can divide the observed relative risk and its confidence limits by (RREU × RRUD)/(RREU + RRUD – 1), to obtain a point estimate and confidence limits of the lower bound of the true causal effect of the exposure E on the outcome D. We will refer to the relative risk adjusted only for C, when divided by the bounding factor (RREU × RRUD)/(RREU + RRUD – 1) as the corrected relative risk. It is “corrected” in the sense that an unmeasured confounder cannot reduce the relative risk any further than what is obtained by division by its bounding factor. As an example, suppose we have an observed relative risk of 2.1 with a 95% confidence interval [1.4,3.1]. If we consider an unmeasured confounder with (RREU, RRUD) = (2,2), then the joint bounding factor is 2 × 2/(2 + 2 – 1) = 1.33, and the corrected relative risk is 2.1/1.33 = 1.58 with a 95% confidence interval [1.4/1.33, 3.1/1.33] = [1.05, 2.33]. Therefore, an unmeasured confounder with (RREU, RRUD) = (2, 2) cannot explain away the observed relative risk 2.1 or its lower confidence limit 1.4, i.e., it cannot reduce the point estimate and lower confidence limit of the relative risk to be smaller than 1. If we consider an unmeasured confounder with (RREU, RRUD) = (2.5, 3.5), then the joint bounding factor is 2.5 × 3.5/(2.5 + 3.5 – 1) = 1.75, and an estimate for the lower bound of the true causal relative risk is 2.1/1.75 = 1.20 with a 95% confidence interval [1.4/1.75, 3.1/1.75] = [0.8, 1.77]. Although the confounder with (RREU, RRUD) = (2.5, 3.5) cannot explain away the observed relative risk of 2.1, it reduces the original lower confidence limit 1.4 to 0.8 (i.e., less than 1). 
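The correction described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' SAS code; the function names `bounding_factor` and `corrected_relative_risk` are hypothetical, and the inputs are the numbers from the worked example.

```python
def bounding_factor(rr_eu, rr_ud):
    """Joint bounding factor (RR_EU * RR_UD) / (RR_EU + RR_UD - 1)."""
    # Both sensitivity parameters are maxima of ratios, so each is at least 1.
    if rr_eu < 1 or rr_ud < 1:
        raise ValueError("RR_EU and RR_UD must both be at least 1")
    return rr_eu * rr_ud / (rr_eu + rr_ud - 1)


def corrected_relative_risk(rr_obs, ci_low, ci_high, rr_eu, rr_ud):
    """Divide the observed estimate and its confidence limits by the bounding factor."""
    bf = bounding_factor(rr_eu, rr_ud)
    return rr_obs / bf, ci_low / bf, ci_high / bf


if __name__ == "__main__":
    # Observed relative risk 2.1 with 95% confidence interval [1.4, 3.1].
    for rr_eu, rr_ud in [(2.0, 2.0), (2.5, 3.5)]:
        est, low, high = corrected_relative_risk(2.1, 1.4, 3.1, rr_eu, rr_ud)
        print(f"(RR_EU, RR_UD) = ({rr_eu}, {rr_ud}): "
              f"corrected RR {est:.3f}, CI [{low:.3f}, {high:.3f}]")
```

Because the sketch divides by the unrounded factor 4/3 rather than 1.33, the first corrected point estimate is 1.575, which the text reports as 1.58 after rounding the factor first.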
Note that we are not merely assessing a binary confounder, and we are not imposing the no interaction assumption. Moreover, we are not restricted to only assessing how much confounding can explain away an effect nor are we even assuming that there is a single unmeasured confounder (since U can be a vector of unmeasured confounders). The corrected estimates and confidence intervals above are applicable irrespective of the underlying confounder (or confounders). We can apply the technique to obtain a range of values for the true causal effect under different specifications of RREU and RRUD.
Table 1 shows the magnitudes of the joint bounding factor for different combinations of RREU and RRUD. The entries in the table for the joint bounding factor are the largest observed relative risks that such an unmeasured confounder could explain away. We can see from the table that the joint bounding factor is always smaller than both RREU and RRUD, and much smaller than the larger of the two.
TABLE 1.
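A grid of joint bounding factors like the one tabulated here can be generated with a short Python sketch. The grid values below are illustrative assumptions and are not taken from the published table; the layout (rows for RREU, columns for RRUD) and the function names are likewise choices made for this sketch.

```python
def bounding_factor(rr_eu, rr_ud):
    """Joint bounding factor (RR_EU * RR_UD) / (RR_EU + RR_UD - 1)."""
    return rr_eu * rr_ud / (rr_eu + rr_ud - 1)


def bounding_factor_table(rr_eu_values, rr_ud_values):
    """Return a nested list: one row per RR_EU value, one column per RR_UD value."""
    return [[bounding_factor(e, d) for d in rr_ud_values] for e in rr_eu_values]


if __name__ == "__main__":
    grid = [1.5, 2.0, 3.0, 5.0, 10.0]  # hypothetical grid of sensitivity parameters
    table = bounding_factor_table(grid, grid)
    print("RR_EU \\ RR_UD " + " ".join(f"{d:>6.1f}" for d in grid))
    for e, row in zip(grid, table):
        print(f"{e:>13.1f} " + " ".join(f"{bf:>6.2f}" for bf in row))
```

For parameters strictly greater than 1, every entry is strictly smaller than both of its row and column values, which is the pattern the paragraph above describes.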
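As a second important consequence of our main Result 1, we also show in eAppendix 2 (http://links.lww.com/EDE/B16) that once we specify one of the unmeasured confounding measures, for example RREU, then to be able to reduce an observed relative risk of $\mathrm{RR}^{\mathrm{obs}}_{ED}$ to a true causal relative risk of $\mathrm{RR}^{\mathrm{true}}_{ED}$, the other confounding measure RRUD must be at least of the magnitude
$$\mathrm{RR}_{UD} \ge \frac{\mathrm{RR}_{EU} \times \mathrm{RR}^{\mathrm{obs}}_{ED} - \mathrm{RR}^{\mathrm{obs}}_{ED}}{\mathrm{RR}_{EU} \times \mathrm{RR}^{\mathrm{true}}_{ED} - \mathrm{RR}^{\mathrm{obs}}_{ED}}.$$
For an unmeasured confounder to completely explain away the relative risk, i.e., reduce $\mathrm{RR}^{\mathrm{obs}}_{ED}$ to $\mathrm{RR}^{\mathrm{true}}_{ED} = 1$, once we specify RREU the other unmeasured confounding measure must be at least of the magnitude
$$\mathrm{RR}_{UD} \ge \frac{\mathrm{RR}_{EU} \times \mathrm{RR}^{\mathrm{obs}}_{ED} - \mathrm{RR}^{\mathrm{obs}}_{ED}}{\mathrm{RR}_{EU} - \mathrm{RR}^{\mathrm{obs}}_{ED}}.$$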
For example, suppose we have an observed relative risk $\mathrm{RR}^{\mathrm{obs}}_{ED} = 2.5$ and we specify the exposure–confounder association RREU = 3. Then, to reduce the observed relative risk to a true causal relative risk $\mathrm{RR}^{\mathrm{true}}_{ED} = 1.5$, the confounder–outcome association must be at least as large as (3 × 2.5 – 2.5)/(3 × 1.5 – 2.5) = 2.5; to completely explain away the observed relative risk (i.e., to reduce the observed relative risk to $\mathrm{RR}^{\mathrm{true}}_{ED} = 1$), the confounder–outcome association must be at least as large as (3 × 2.5 – 2.5)/(3 – 2.5) = 10. The symmetry of Result 1 implies that a similar result also holds for RREU with prespecified RRUD.
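The calculation in this example follows from rearranging the bounding-factor inequality for RRUD, and can be sketched as below. The function name `min_rr_ud` and its handling of edge cases are assumptions of this sketch: it returns infinity when the specified RREU is too small for any finite RRUD to suffice, and 1 when the observed relative risk does not exceed the target.

```python
import math


def min_rr_ud(rr_obs, rr_eu, rr_true=1.0):
    """Smallest RR_UD that lets confounding reduce rr_obs to rr_true, given RR_EU.

    Rearranges RR_EU*RR_UD/(RR_EU+RR_UD-1) >= rr_obs/rr_true for RR_UD.
    """
    if rr_obs <= rr_true:
        return 1.0  # the bounding factor is always >= 1, so no confounding is needed
    denominator = rr_eu * rr_true - rr_obs
    if denominator <= 0:
        # The bounding factor stays below RR_EU, so no finite RR_UD is enough.
        return math.inf
    return (rr_eu * rr_obs - rr_obs) / denominator


if __name__ == "__main__":
    print(min_rr_ud(2.5, 3.0, rr_true=1.5))  # reduce 2.5 to 1.5 with RR_EU = 3
    print(min_rr_ud(2.5, 3.0))               # explain away 2.5 with RR_EU = 3
```

By the symmetry of the bounding factor, the same function gives the minimum RREU when its second argument is read as a prespecified RRUD.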
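Third, we show in eAppendix 2 (http://links.lww.com/EDE/B16) that if both the generalized relative risks RREU and RRUD have the same magnitude, for an unmeasured confounder to reduce an observed relative risk of $\mathrm{RR}^{\mathrm{obs}}_{ED}$ to a true causal relative risk of $\mathrm{RR}^{\mathrm{true}}_{ED}$, both of the confounding relative risks must be at least as large as
$$\frac{\mathrm{RR}^{\mathrm{obs}}_{ED} + \sqrt{\mathrm{RR}^{\mathrm{obs}}_{ED}\,(\mathrm{RR}^{\mathrm{obs}}_{ED} - \mathrm{RR}^{\mathrm{true}}_{ED})}}{\mathrm{RR}^{\mathrm{true}}_{ED}}.$$
For an unmeasured confounder to completely explain away an observed relative risk of $\mathrm{RR}^{\mathrm{obs}}_{ED}$ (i.e., to reduce $\mathrm{RR}^{\mathrm{obs}}_{ED}$ to a true causal relative risk of $\mathrm{RR}^{\mathrm{true}}_{ED} = 1$), both RREU and RRUD must be at least as large as
$$\mathrm{RR}^{\mathrm{obs}}_{ED} + \sqrt{\mathrm{RR}^{\mathrm{obs}}_{ED}\,(\mathrm{RR}^{\mathrm{obs}}_{ED} - 1)}.$$
If one of the confounding relative risks is smaller than the lower bound above, we then know that the other one must be larger. Thus even if RREU and RRUD are not of the same magnitude, the maximum of RREU and RRUD must satisfy the inequality above. We then have the following “high threshold” condition:
$$\max(\mathrm{RR}_{EU}, \mathrm{RR}_{UD}) \ge \frac{\mathrm{RR}^{\mathrm{obs}}_{ED} + \sqrt{\mathrm{RR}^{\mathrm{obs}}_{ED}\,(\mathrm{RR}^{\mathrm{obs}}_{ED} - \mathrm{RR}^{\mathrm{true}}_{ED})}}{\mathrm{RR}^{\mathrm{true}}_{ED}}.$$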
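For example, to reduce an observed relative risk of $\mathrm{RR}^{\mathrm{obs}}_{ED} = 2.5$ to a true causal relative risk of $\mathrm{RR}^{\mathrm{true}}_{ED} = 1.5$, the high threshold is $[2.5 + \sqrt{2.5 \times (2.5 - 1.5)}]/1.5 = 2.72$; at least one of RREU and RRUD must be of magnitude 2.72 or above. To completely explain away an observed relative risk of $\mathrm{RR}^{\mathrm{obs}}_{ED} = 2.5$, the high threshold is $2.5 + \sqrt{2.5 \times (2.5 - 1)} = 4.44$; at least one of RREU and RRUD must be of magnitude 4.44 or higher to completely explain away the effect.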
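The high threshold is the value t at which a confounder with RREU = RRUD = t has a bounding factor exactly equal to the ratio of the observed to the true relative risk. A minimal sketch follows; the function names are hypothetical.

```python
import math


def bounding_factor(rr_eu, rr_ud):
    """Joint bounding factor (RR_EU * RR_UD) / (RR_EU + RR_UD - 1)."""
    return rr_eu * rr_ud / (rr_eu + rr_ud - 1)


def high_threshold(rr_obs, rr_true=1.0):
    """Value that max(RR_EU, RR_UD) must reach to reduce rr_obs to rr_true."""
    if rr_obs <= rr_true:
        return 1.0  # nothing to explain, so no confounding strength is required
    return (rr_obs + math.sqrt(rr_obs * (rr_obs - rr_true))) / rr_true


if __name__ == "__main__":
    for target in (1.5, 1.0):
        t = high_threshold(2.5, target)
        # With RR_EU = RR_UD = t, the bounding factor equals rr_obs / rr_true.
        print(f"target {target}: threshold {t:.2f}, "
              f"bounding factor {bounding_factor(t, t):.4f}")
```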
Fourth, the bias formula in (1) is relevant for an apparently causative exposure, which allows researchers to get lower bounds of the true causal relative risk given prespecified sensitivity parameters RREU and RRUD. If the exposure E is apparently preventive with $\mathrm{RR}^{\mathrm{obs}}_{ED} < 1$, we can use the following formula to conduct sensitivity analysis:
$$\frac{\mathrm{RR}^{\mathrm{true}}_{ED}}{\mathrm{RR}^{\mathrm{obs}}_{ED}} \le \frac{\mathrm{RR}_{EU} \times \mathrm{RR}_{UD}}{\mathrm{RR}_{EU} + \mathrm{RR}_{UD} - 1}, \tag{2}$$
where we modify the definition of RREU as $\mathrm{RR}_{EU} = \max_k \mathrm{RR}_{EU,k}^{-1}$ (i.e., the maximum of the inverse relative risks relating E and U), or equivalently the inverse of the minimum of the relative risks relating E and U. For an apparently preventive exposure, (2) allows researchers to obtain an upper bound of the causal relative risk by multiplying the observed relative risk by the joint bounding factor (RREU × RRUD)/(RREU + RRUD – 1). We present the proof in eAppendix 2 (http://links.lww.com/EDE/B16), and omit analogous discussion based on (2).
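For an apparently preventive exposure, the same bounding factor is applied by multiplication rather than division. The sketch below uses entirely hypothetical numbers (an observed relative risk of 0.5 and a two-category confounder); the function names are also assumptions of the sketch.

```python
def bounding_factor(rr_eu, rr_ud):
    """Joint bounding factor (RR_EU * RR_UD) / (RR_EU + RR_UD - 1)."""
    return rr_eu * rr_ud / (rr_eu + rr_ud - 1)


def modified_rr_eu(p_u_given_exposed, p_u_given_unexposed):
    """Maximum over categories k of the inverse relative risk P(U=k|E=0)/P(U=k|E=1)."""
    return max(q0 / q1 for q1, q0 in zip(p_u_given_exposed, p_u_given_unexposed))


def upper_bound_preventive(rr_obs, rr_eu, rr_ud):
    """Upper bound on the true relative risk when rr_obs < 1."""
    return rr_obs * bounding_factor(rr_eu, rr_ud)


if __name__ == "__main__":
    # Hypothetical confounder distributions among the exposed and the unexposed.
    rr_eu = modified_rr_eu([0.2, 0.8], [0.4, 0.6])
    print(rr_eu, upper_bound_preventive(0.5, rr_eu, 2.0))
```

If the resulting upper bound stays below 1, confounding of the specified strength cannot explain away the apparently preventive association.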
Finally, all the results above are within strata of the observed covariates C, as would be obtained from a log-binomial regression model or a logistic regression model with a rare outcome. If the relative risk averaged over the observed covariates C is of interest, the true causal relative risk must be at least as large as the minimum of $\mathrm{RR}^{\mathrm{obs}}_{ED|c} \times (\mathrm{RR}_{EU|c} + \mathrm{RR}_{UD|c} - 1)/(\mathrm{RR}_{EU|c} \times \mathrm{RR}_{UD|c})$ over c. If we assume a common causal relative risk among the levels of C, as in the usual log-linear or logistic regression with rare outcomes, then the true causal relative risk must be at least as large as the maximum of $\mathrm{RR}^{\mathrm{obs}}_{ED|c} \times (\mathrm{RR}_{EU|c} + \mathrm{RR}_{UD|c} - 1)/(\mathrm{RR}_{EU|c} \times \mathrm{RR}_{UD|c})$ over c. See eAppendix 2 (http://links.lww.com/EDE/B16) for further discussion.
RELATION WITH CORNFIELD CONDITIONS
Under the assumptions of a binary confounder U and the conditional independence between the exposure E and the outcome D given the confounder U, Cornfield et al.1 showed that the exposure–confounder relative risk must be at least as large as the observed exposure–outcome relative risk:
$$\mathrm{RR}_{EU} \ge \mathrm{RR}^{\mathrm{obs}}_{ED}. \tag{3}$$
Schlesselman7 further showed that the confounder–outcome relative risk must also be at least as large as the observed exposure–outcome relative risk:
$$\mathrm{RR}_{UD} \ge \mathrm{RR}^{\mathrm{obs}}_{ED}. \tag{4}$$
We show in eAppendix 2 (http://links.lww.com/EDE/B16) that the classical Cornfield conditions (3) and (4) are just special cases of our result, obtained by letting one of RREU or RRUD go to infinity in (1). Moreover, our results apply to general confounders, not just binary confounders, and our results also apply to other possible values of the true causal relative risk of the exposure on the outcome. We are not restricted to only assessing how strong the unmeasured confounder would have to be to completely explain away the effect. Thus, for example, for confounding to reduce the observed relative risk to a true causal relative risk of $\mathrm{RR}^{\mathrm{true}}_{ED}$, the unmeasured confounding measures have to satisfy
$$\mathrm{RR}_{EU} \ge \frac{\mathrm{RR}^{\mathrm{obs}}_{ED}}{\mathrm{RR}^{\mathrm{true}}_{ED}}, \qquad \mathrm{RR}_{UD} \ge \frac{\mathrm{RR}^{\mathrm{obs}}_{ED}}{\mathrm{RR}^{\mathrm{true}}_{ED}}. \tag{5}$$
Perhaps even more importantly with regard to Cornfield-like conditions, our main Result 1 not only leads to the conditions in (5) that both RREU and RRUD must satisfy but also implies the following condition that the maximum of RREU and RRUD must satisfy:
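$$\max(\mathrm{RR}_{EU}, \mathrm{RR}_{UD}) \ge \frac{\mathrm{RR}^{\mathrm{obs}}_{ED} + \sqrt{\mathrm{RR}^{\mathrm{obs}}_{ED}\,(\mathrm{RR}^{\mathrm{obs}}_{ED} - \mathrm{RR}^{\mathrm{true}}_{ED})}}{\mathrm{RR}^{\mathrm{true}}_{ED}} \tag{6}$$
to reduce an observed relative risk $\mathrm{RR}^{\mathrm{obs}}_{ED}$ to a true causal relative risk $\mathrm{RR}^{\mathrm{true}}_{ED}$. We show this in eAppendix 2 (http://links.lww.com/EDE/B16). As a special case, for the unmeasured confounder to completely explain away the observed relative risk (i.e., $\mathrm{RR}^{\mathrm{true}}_{ED} = 1$), it is necessary that
$$\max(\mathrm{RR}_{EU}, \mathrm{RR}_{UD}) \ge \mathrm{RR}^{\mathrm{obs}}_{ED} + \sqrt{\mathrm{RR}^{\mathrm{obs}}_{ED}\,(\mathrm{RR}^{\mathrm{obs}}_{ED} - 1)}.$$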
Once again the results do not require a binary unmeasured confounder. They are applicable to any unmeasured confounder. Similar low and high threshold Cornfield conditions that the minimum and maximum of the confounding measures must satisfy to completely explain away an effect were derived on an odds ratio scale of exposure–confounder association by Flanders and Khoury12 and Lee,10 and we comment and extend these results in eAppendix 2 (http://links.lww.com/EDE/B16).
The classical Cornfield conditions and the high threshold generalization are useful to answer the question about the magnitude of the association between the exposure and the confounder and that between the confounder and the outcome, to explain away the observed exposure–outcome association or, with our new results, to reduce it to a prespecified magnitude. The Cornfield conditions in (5) and (6) are especially useful when we want to specify only one of the marginal associations RREU or RRUD as well as their relative magnitudes. However, they are inferior to the main Result 1, which is essentially the condition that the joint values of (RREU, RRUD) must satisfy. As will be seen below, although the high threshold conclusions are a useful heuristic, they are weaker than the use of our new joint bounding factor in Result 1 insofar as there are scenarios in which the joint bounding factor in Result 1 can rule out an estimate as being due to unmeasured confounding but the high threshold conditions cannot. For example, when we have an observed exposure–outcome relative risk of $\mathrm{RR}^{\mathrm{obs}}_{ED} = 3$, the low threshold (i.e., the classical Cornfield condition) is given by
$$\min(\mathrm{RR}_{EU}, \mathrm{RR}_{UD}) \ge 3,$$
the high threshold is given by
$$\max(\mathrm{RR}_{EU}, \mathrm{RR}_{UD}) \ge 3 + \sqrt{3 \times (3 - 1)} = 5.45,$$
and the joint threshold condition is given by
$$\frac{\mathrm{RR}_{EU} \times \mathrm{RR}_{UD}}{\mathrm{RR}_{EU} + \mathrm{RR}_{UD} - 1} \ge 3.$$
Thus, the low Cornfield threshold is 3, and so we know that both RREU and RRUD must be at least 3 to explain away the effect. The high Cornfield threshold is 5.45, and so at least one of RREU and RRUD must be at least 5.45 to explain away the effect. Consider an unmeasured confounder with (RREU = 5.5, RRUD = 3.1). These values exceed both the low Cornfield threshold (since RREU > 3, RRUD > 3) and the high threshold (since RREU > 5.45), and we might thus think such a confounder can explain away the observed exposure–outcome relative risk. However, using our joint threshold condition in (1), an unmeasured confounder with (RREU = 5.5, RRUD = 3.1) has a bounding factor 5.5 × 3.1/(5.5 + 3.1 – 1) = 2.24 < 3 and thus such confounding could not explain away an observed relative risk of 3. We can see this from our result in (1), but we cannot see this from the classical Cornfield conditions or even the new high threshold Cornfield condition. The Cornfield conditions, both low and high thresholds, although a useful heuristic, are not as useful for sensitivity analysis as our bounding factor. This is because there are scenarios, such as the one above, in which our bounding factor can rule out an estimate as due to unmeasured confounding, while the low and high threshold Cornfield conditions cannot.
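The comparison in this example can be checked directly. The sketch below evaluates the low threshold, the high threshold, and the joint bounding-factor condition for a given pair of sensitivity parameters when the target is to explain away the effect completely; the function name `explain_away_checks` is hypothetical.

```python
import math


def bounding_factor(rr_eu, rr_ud):
    """Joint bounding factor (RR_EU * RR_UD) / (RR_EU + RR_UD - 1)."""
    return rr_eu * rr_ud / (rr_eu + rr_ud - 1)


def explain_away_checks(rr_obs, rr_eu, rr_ud):
    """Which necessary conditions for explaining away rr_obs does (RR_EU, RR_UD) meet?"""
    high = rr_obs + math.sqrt(rr_obs * (rr_obs - 1))
    return {
        "low": min(rr_eu, rr_ud) >= rr_obs,                # classical Cornfield condition
        "high": max(rr_eu, rr_ud) >= high,                 # high threshold condition
        "joint": bounding_factor(rr_eu, rr_ud) >= rr_obs,  # Result 1
    }


if __name__ == "__main__":
    print(explain_away_checks(3.0, 5.5, 3.1))
```

Only the joint condition is sharp: a pair that passes the low and high thresholds may still fail it, as the pair (5.5, 3.1) does here.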
ILLUSTRATION
Consider the historical study conducted by Hammond and Horn,13 in which the point estimate of the observed relative risk of cigarette smoking on lung cancer was $\mathrm{RR}^{\mathrm{obs}}_{ED} = 10.73$ with 95% confidence interval [8.02, 14.36]. Fisher14 suggested that the observed relative risk of the exposure E on the outcome D might be completely due to the existence of a common genetic confounder. The work of Cornfield et al.1 showed that for a binary unmeasured confounder to completely explain away the observed relative risk, both the exposure–confounder relative risk and the confounder–outcome relative risk would have to be at least 10.73. Let us now assume then that both the exposure–confounder relative risk and the confounder–outcome relative risk have the magnitude 10.73. The joint bounding factor is
$$\frac{10.73 \times 10.73}{10.73 + 10.73 - 1} = 5.63.$$
Even if we assume such a strong confounder, the point estimate of the causal relative risk of cigarette smoking on lung cancer must still be at least as large as 10.73/5.63 = 1.91, and the 95% confidence interval is [8.02/5.63, 14.36/5.63] = [1.42, 2.55], with the lower confidence limit still larger than 1. Thus, in fact, not even exposure–confounder and confounder–outcome relative risks of 10.73 suffice to explain away the point estimate or the lower confidence limit. To explain away the point estimate of the observed relative risk, 10.73, the magnitude of RREU and RRUD (if RREU = RRUD) should be at least as large as $10.73 + \sqrt{10.73 \times (10.73 - 1)} = 20.95$. And to explain away the lower confidence limit 8.02, these two confounding relative risks should be at least as large as $8.02 + \sqrt{8.02 \times (8.02 - 1)} = 15.52$. More generally, we can plot those values of RREU and RRUD that would be required to explain away the effect estimate or the lower limit of the confidence interval. This is given in the Figure. To explain away the point estimate, the two parameters would have to lie on or above the solid line. To explain away the lower confidence limit, the two parameters would have to lie on or above the dotted line. These results hold without any assumptions on the structure of the unmeasured confounding. The numerical results above show that, with the new joint bounding factor, it is even more implausible than the Cornfield conditions alone suggest that a genetic confounder explains away the relative risk between cigarette smoking and lung cancer.
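The numbers in this illustration, and the boundary curves described for the Figure, can be reproduced with a short sketch. The function names are hypothetical, and the boundary is obtained by solving the bounding-factor equality for RRUD at a given RREU, which is this sketch's reading of how such a curve would be drawn.

```python
import math

RR_OBS, CI_LOW, CI_HIGH = 10.73, 8.02, 14.36  # Hammond and Horn estimate and 95% CI


def bounding_factor(rr_eu, rr_ud):
    """Joint bounding factor (RR_EU * RR_UD) / (RR_EU + RR_UD - 1)."""
    return rr_eu * rr_ud / (rr_eu + rr_ud - 1)


def equal_strength_threshold(rr):
    """Common value of RR_EU = RR_UD needed to explain away a relative risk rr."""
    return rr + math.sqrt(rr * (rr - 1))


def boundary_rr_ud(rr, rr_eu):
    """RR_UD on the explain-away boundary for a given RR_EU (infinite if RR_EU <= rr)."""
    if rr_eu <= rr:
        return math.inf
    return rr * (rr_eu - 1) / (rr_eu - rr)


if __name__ == "__main__":
    bf = bounding_factor(RR_OBS, RR_OBS)
    print(f"bounding factor {bf:.2f}")
    print(f"corrected RR {RR_OBS / bf:.2f} [{CI_LOW / bf:.2f}, {CI_HIGH / bf:.2f}]")
    print(f"thresholds {equal_strength_threshold(RR_OBS):.2f}, "
          f"{equal_strength_threshold(CI_LOW):.2f}")
    for rr_eu in (12, 15, 20, 30):  # a few points on the point-estimate boundary
        print(rr_eu, round(boundary_rr_ud(RR_OBS, rr_eu), 2))
```

The sketch divides by the unrounded bounding factor, so the last digit of a corrected limit can differ slightly from values obtained by dividing by 5.63.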
More generally, we could consider corrected estimates and confidence intervals for the effect over a range of different values of the sensitivity analysis parameters, RREU and RRUD, as in Table 2. The columns of Table 2 correspond to RRUD and the rows to RREU. The entries are the corrected estimates and confidence intervals for the effect under the different confounding scenarios. In general, a table like this one is most informative for sensitivity analysis. SAS code to carry out such a sensitivity analysis and to provide such a table is given in eAppendix 9 (http://links.lww.com/EDE/B16).
TABLE 2.
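A table of corrected estimates and confidence intervals of this kind can be produced by dividing the observed estimate and its limits by the bounding factor at each grid point. The following Python sketch is not the SAS code referenced in the eAppendix; the grid of sensitivity parameters is a hypothetical choice and need not match the published table.

```python
def bounding_factor(rr_eu, rr_ud):
    """Joint bounding factor (RR_EU * RR_UD) / (RR_EU + RR_UD - 1)."""
    return rr_eu * rr_ud / (rr_eu + rr_ud - 1)


def corrected_table(rr_obs, ci_low, ci_high, rr_eu_values, rr_ud_values):
    """Rows correspond to RR_EU, columns to RR_UD; each cell is (estimate, low, high)."""
    table = []
    for rr_eu in rr_eu_values:
        row = []
        for rr_ud in rr_ud_values:
            bf = bounding_factor(rr_eu, rr_ud)
            row.append((rr_obs / bf, ci_low / bf, ci_high / bf))
        table.append(row)
    return table


if __name__ == "__main__":
    grid = [2.0, 5.0, 10.0, 20.0]  # hypothetical grid of sensitivity parameters
    table = corrected_table(10.73, 8.02, 14.36, grid, grid)
    for rr_eu, row in zip(grid, table):
        cells = "  ".join(f"{e:5.2f} [{lo:5.2f}, {hi:5.2f}]" for e, lo, hi in row)
        print(f"RR_EU={rr_eu:>4.1f}: {cells}")
```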
DISCUSSION
A crucial task in causal inference with observational studies is to assess the sensitivity of causal conclusions with respect to unmeasured confounding. In sensitivity analysis, because one is assessing the sensitivity of conclusions to the assumption of no unmeasured confounding, additional untestable assumptions may often seem undesirable and suspect to researchers. We have introduced a new joint bounding factor that allows researchers to conduct sensitivity analysis without assumptions, that is, we provide an inequality, which is applicable without any assumptions, such that the sensitivity analysis parameters must satisfy the inequality if an unmeasured confounder is to explain away the observed effect estimate or reduce it to a particular level. We can obtain a conservative estimate of the true causal effect by dividing the observed relative risk by the bounding factor; the method does not assume a single binary confounder or no exposure–confounder interaction on the outcome.
Previous sensitivity analysis approaches in the literature often relied on the assumption of a single binary confounder and no interaction between the effects of the exposure and the confounder on the outcome.5,8,9 For example, Schlesselman7 assumed a binary confounder and a common relative risk, γ, of the confounder on the outcome both with and without exposure (i.e., a no interaction assumption). Under these assumptions, he obtained the bias factor $[1 + (\gamma - 1) P(U=1 \mid E=1)]/[1 + (\gamma - 1) P(U=1 \mid E=0)]$ for sensitivity analysis, requiring specification of $\gamma$, $P(U=1 \mid E=1)$, and $P(U=1 \mid E=0)$. Our result requires fewer assumptions and fewer sensitivity parameters (two rather than three). We further discuss in eAppendix 4 (http://links.lww.com/EDE/B16) that, under Schlesselman’s formula, if $P(U=1 \mid E=1)/P(U=1 \mid E=0)$ is constrained to be no larger than some limit RREU, then the maximum bias factor that can be obtained from Schlesselman’s formula is $(\mathrm{RR}_{EU} \times \gamma)/(\mathrm{RR}_{EU} + \gamma - 1)$, which is the same as our bounding factor. Thus, in this setting Schlesselman’s no interaction assumption does not strengthen the bounds; the no interaction assumption is unnecessary. Without the no interaction assumption, Flanders and Khoury12 and VanderWeele and Arah8 derived general formulas for sensitivity analysis. However, unless the confounder is binary, these formulas require specifying a very large number of parameters. They also require specifying the prevalence of each confounder level. Flanders and Khoury12 derive bounds for the true causal relative risk for the exposed population which are potentially applicable without specifying the prevalence of the unmeasured confounder. However, without specifying the prevalence, their formula only leads to a low threshold Cornfield condition, and these bounds are thus much weaker than those in this article. We discuss further the relation between their results and ours in eAppendix 4 (http://links.lww.com/EDE/B16).
The relative risk scale is widely used for sensitivity analysis in epidemiology and elsewhere, but the risk difference scale is also often of interest and importance.11,15 We show, in Appendix 1, that similar conditions for sensitivity analysis also hold for the risk difference. If we use similar sensitivity parameters on the relative risk scale for the risk difference estimate, then we can derive similar lower bounds on the effects and determine how much confounding is required to explain away an effect or reduce it to a specific level. SAS code for this approach is also given in eAppendix 10 (http://links.lww.com/EDE/B16). We can also do sensitivity analysis for the risk difference using sensitivity parameters on the risk difference scale. Unfortunately, however, these conditions for the risk difference using risk difference sensitivity parameters then depend on the number of categories of the unmeasured confounder, and become weaker for confounders with more categories. This is not the case for sensitivity analysis of the risk difference (or the relative risk) if the sensitivity parameters themselves are expressed on the relative risk scale, in which case the bounding factor is applicable and is the same regardless of the number of categories. Due to this property, it is perhaps more suitable to conduct sensitivity analysis for the risk difference using sensitivity parameters on the relative risk scale. See Appendix 2 for further discussion.
The hazard ratio is widely used for analyzing data with time-to-event outcomes. In eAppendix 7 (http://links.lww.com/EDE/B16), we show that under the assumption of having a rare outcome at the end of follow-up, the same bounding factor also applies to the hazard ratio with the confounder–outcome relative risk replaced by the confounder–outcome hazard ratio. Likewise similar results also apply to non-negative outcomes (e.g., counts or positive continuous outcomes) by replacing the confounder–outcome relative risk by the maximum ratio by which the confounder may increase the expected outcome comparing any two confounder categories.
The new joint bounding factor (RREU × RRUD)/(RREU + RRUD – 1) plays a central role in our sensitivity analysis approach, which, in turn, gives us a new measure of the strength of unmeasured confounding induced by a confounder U. Our approach has the advantage of making no assumptions about the structure of the unmeasured confounder or confounders, and of delivering conclusions much stronger than the original Cornfield conditions.
In general, a table with many different possible sensitivity analysis parameters, including values that are quite extreme, such as Table 2, will be most informative. However, at the very least, in any observational study, researchers should report how much confounding would be needed to reduce the estimate to the null, and how much confounding would be needed for the confidence interval to include the null. We believe that if this were always done in observational studies, the evidence for causality could much more easily be assessed and science would be better served.
APPENDIX 1
Conditions for the Risk Difference Using Sensitivity Parameters on the Relative Risk Scale
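As in the text, we assume that the analysis is conducted, and all probabilities below are defined, conditional on (or within strata of) the measured covariates C. Define the bounding factor as $\mathrm{BF}_U = (\mathrm{RR}_{EU} \times \mathrm{RR}_{UD})/(\mathrm{RR}_{EU} + \mathrm{RR}_{UD} - 1)$, the prevalence of the exposure as $f = P(E=1)$, and the probabilities of the outcome with and without exposure as $p_1 = P(D=1 \mid E=1)$ and $p_0 = P(D=1 \mid E=0)$. The causal risk differences for the exposed and unexposed populations are
$$\mathrm{RD}^{\mathrm{true}}_{ED+} = p_1 - \sum_k P(D=1 \mid E=0, U=k)\, P(U=k \mid E=1), \qquad \mathrm{RD}^{\mathrm{true}}_{ED-} = \sum_k P(D=1 \mid E=1, U=k)\, P(U=k \mid E=0) - p_0,$$
and the causal risk difference for the whole population is
$$\mathrm{RD}^{\mathrm{true}}_{ED} = f \times \mathrm{RD}^{\mathrm{true}}_{ED+} + (1 - f) \times \mathrm{RD}^{\mathrm{true}}_{ED-}.$$
We show in eAppendix 5 (http://links.lww.com/EDE/B16) that the lower bounds for the causal risk differences are
$$\mathrm{RD}^{\mathrm{true}}_{ED+} \ge p_1 - p_0 \times \mathrm{BF}_U, \qquad \mathrm{RD}^{\mathrm{true}}_{ED-} \ge \frac{p_1}{\mathrm{BF}_U} - p_0, \qquad \mathrm{RD}^{\mathrm{true}}_{ED} \ge (p_1 - p_0 \times \mathrm{BF}_U) \times \left( f + \frac{1 - f}{\mathrm{BF}_U} \right).$$
Note that even without knowing f, we can use the inequality $f + (1 - f)/\mathrm{BF}_U \ge 1/\mathrm{BF}_U$ to obtain a lower bound for $\mathrm{RD}^{\mathrm{true}}_{ED}$.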
As an example, suppose the probabilities of the outcome with and without exposure are , and therefore the observed risk difference is . If we assume that the unmeasured confounding measures are with the joint bounding factor of , then the true risk difference for the exposed is at least as large as , the true risk difference for the unexposed is at least as large as , and the true risk difference for the whole population is at least as large as . If we further know that the prevalence of the exposure is , the true risk difference for the whole population is at least as large as
The above results imply that, for an unmeasured confounder to reduce the observed risk difference to be and , respectively, the Cornfield conditions for the joint bounding factor for the exposed, the unexposed, and the whole population, respectively, are
Note that if the true causal risk difference is , the above conditions all reduce to Suppose, again, the probabilities of the observed outcome with and without exposure are , and the prevalence of the exposure is . For an unmeasured confounder to reduce the observed risk difference of to a true risk difference of , the joint bounding factor resulting from the confounder must be at least as large as
Therefore, as in the text both of the confounding measures and must be at least as large as , and the maximum of them must be at least as large as
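The required size of the joint bounding factor can be obtained by inverting the lower bounds; the following is a sketch under stated assumptions rather than the authors' software. With p1 and p0 the outcome probabilities with and without exposure, f the exposure prevalence, and RD the target true risk difference, it assumes BF ≥ (p1 − RD)/p0 for the exposed, BF ≥ p1/(p0 + RD) for the unexposed, and, for the whole population, the positive root of the quadratic f·p0·BF² − (f·p1 − (1 − f)·p0 − RD)·BF − (1 − f)·p1 = 0. All names and numbers are hypothetical.

```python
import math


def required_bf_exposed(p1: float, p0: float, rd_true: float) -> float:
    """Smallest BF with p1 - p0*BF <= rd_true (assumes p0 > 0)."""
    return (p1 - rd_true) / p0


def required_bf_unexposed(p1: float, p0: float, rd_true: float) -> float:
    """Smallest BF with p1/BF - p0 <= rd_true (assumes p0 + rd_true > 0)."""
    return p1 / (p0 + rd_true)


def required_bf_whole(p1: float, p0: float, f: float, rd_true: float) -> float:
    """Smallest BF with (p1 - p0*BF) * (f + (1 - f)/BF) <= rd_true.

    Assumes 0 < f < 1 and p0 > 0; takes the positive root of the quadratic.
    """
    a = f * p0
    b = f * p1 - (1 - f) * p0 - rd_true
    c = (1 - f) * p1
    return (b + math.sqrt(b * b + 4 * a * c)) / (2 * a)


def max_strength_threshold(bf: float) -> float:
    """Value that max(RR_EU, RR_UD) must reach for the bounding factor to equal bf."""
    return bf + math.sqrt(bf * (bf - 1.0))


if __name__ == "__main__":
    p1, p0, f, target = 0.25, 0.10, 0.2, 0.05  # hypothetical inputs
    bf = required_bf_whole(p1, p0, f, target)
    print("required bounding factor:", round(bf, 3))
    print("both parameters must be at least:", round(bf, 3))
    print("the larger one must be at least:", round(max_strength_threshold(bf), 3))
```

When the target risk difference is zero, all three conditions reduce to the bounding factor being at least p1/p0, the observed relative risk.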
The above results are useful for apparently causative exposures with , which give (possibly positive) lower bounds for the causal risk differences. However, for apparently preventive exposure with , we need to modify the definition of as . And we have the following analogous results on the upper bounds of the causal risk differences:
The results above are conditional on measured covariates C. Due to the linearity of the risk difference, we can also obtain the lower bound of the marginal risk differences averaged over the observed covariates C using
In eAppendix 5 (http://links.lww.com/EDE/B16), we provide details and proofs for the results above, discuss statistical inference for the causal risk difference bounds under finite samples, and give formulas for how large the bounding factor would have to be to reduce an estimate or a confidence interval to 0 or to some other specified quantity. In the eAppendix (http://links.lww.com/EDE/B16), we also provide software code to implement this sensitivity analysis approach for the risk difference.
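Independently of the authors' software, the averaging over measured covariates can be sketched in a few lines. The sketch assumes that the marginal lower bound is the sum of the stratum-specific whole-population lower bounds weighted by the stratum probabilities P(C = c), and, for simplicity, that the same bounding factor applies in every stratum. The function names, the strata, and all numbers are hypothetical.

```python
def stratum_lower_bound(p1: float, p0: float, f: float, bf: float) -> float:
    """Whole-population lower bound within one stratum of C."""
    return (p1 - p0 * bf) * (f + (1 - f) / bf)


def marginal_lower_bound(strata: list, bf: float) -> float:
    """Average the stratum-specific lower bounds over the distribution of C.

    strata: list of (weight, p1, p0, f) tuples; the weights are P(C = c)
    and are assumed to sum to 1.
    """
    total_weight = sum(w for w, _, _, _ in strata)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("stratum weights must sum to 1")
    return sum(w * stratum_lower_bound(p1, p0, f, bf) for w, p1, p0, f in strata)


if __name__ == "__main__":
    bf = 4.0 / 3.0  # hypothetical bounding factor, e.g. RR_EU = RR_UD = 2
    strata = [
        (0.5, 0.25, 0.10, 0.2),  # hypothetical stratum 1
        (0.5, 0.30, 0.15, 0.5),  # hypothetical stratum 2
    ]
    print(round(marginal_lower_bound(strata, bf), 4))
```

A stratum-specific bounding factor could be passed instead by storing it in each tuple; the linearity of the risk difference is what makes the weighted average valid.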
APPENDIX 2
Conditions for the Risk Difference Using Sensitivity Parameters on the Risk Difference Scale
In Appendix 1, we considered sensitivity analysis for the risk difference with sensitivity analysis parameters on the relative risk scale. In this Appendix, we consider sensitivity analysis for the risk difference with parameters defined on the risk difference scale. Unfortunately, for the reasons described below, the results for the risk difference with parameters defined on the difference scale are not as practically useful as when the parameters are defined on the relative risk scale.
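Let $\mathrm{RD}_{ED}^{\text{obs}} = P(D = 1 \mid E = 1) - P(D = 1 \mid E = 0)$ denote the observed risk difference, and
$$\mathrm{RD}_{ED}^{\text{true}} = \sum_{k} \{P(D = 1 \mid E = 1, U = k) - P(D = 1 \mid E = 0, U = k)\}\, P(U = k)$$
denote the standardized risk difference.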
Define as the difference in the probability that the confounder U takes a particular value k comparing exposed and unexposed. We use , the maximum of these absolute differences, to measure the exposure–confounder association on the risk difference scale, defined as the maximal risk difference of the exposure E on the confounder U. Define and as the difference in the probability of the outcome comparing the category k and 0 of the confounder U with and without exposure. Define and as the maximums of these differences with and without exposure, respectively. We use to measure the confounder–outcome association in the risk difference scale, defined as the maximal risk difference of the confounder U on the outcome D.
We first consider a binary unmeasured confounder. For binary confounder U, the maximal risk difference becomes the ordinary risk difference , and the maximal risk difference becomes the maximum of two conditional risk difference . We have that
which further leads to the following low and high thresholds:
which generalize previous results under the null of zero causal effect of E on D.11,15,16
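The binary case can be checked numerically with a short sketch. For a binary U, direct algebra shows that the observed risk difference minus the standardized risk difference equals the exposure–confounder risk difference multiplied by a prevalence-weighted average of the two conditional confounder–outcome risk differences. The sketch therefore assumes the conditions take the form: product of the two measures ≥ observed minus true risk difference, smaller measure ≥ that gap, and larger measure ≥ its square root. The last two follow from the first because each measure is at most 1. All names and probabilities are hypothetical.

```python
import math


def rd_scale_measures(f, q1, q0, r11, r10, r01, r00):
    """Risk-difference-scale quantities for a binary confounder U.

    f:   exposure prevalence P(E = 1)
    q_e: P(U = 1 | E = e)
    r_eu: P(D = 1 | E = e, U = u)
    Returns (rd_obs, rd_true, rd_eu, rd_ud).
    """
    rd_obs = (r11 * q1 + r10 * (1 - q1)) - (r01 * q0 + r00 * (1 - q0))
    pu = f * q1 + (1 - f) * q0                      # marginal P(U = 1)
    rd_true = (r11 - r01) * pu + (r10 - r00) * (1 - pu)
    rd_eu = abs(q1 - q0)                            # exposure-confounder
    rd_ud = max(abs(r11 - r10), abs(r01 - r00))     # confounder-outcome
    return rd_obs, rd_true, rd_eu, rd_ud


if __name__ == "__main__":
    # Hypothetical probabilities, chosen only for illustration.
    rd_obs, rd_true, rd_eu, rd_ud = rd_scale_measures(
        f=0.5, q1=0.6, q0=0.2, r11=0.5, r10=0.2, r01=0.3, r00=0.1
    )
    gap = rd_obs - rd_true
    print("observed:", round(rd_obs, 3), "true:", round(rd_true, 3))
    print("product condition:", rd_eu * rd_ud >= gap)
    print("low threshold:", min(rd_eu, rd_ud) >= gap)
    print("high threshold:", max(rd_eu, rd_ud) >= math.sqrt(gap))
```

Setting the true risk difference to zero recovers the conditions under the null of no causal effect.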
For categorical confounder U, no simple form of the bounding factor is available, but we can still show that and must satisfy the following conditions:
When K = 3 such as a three-level genetic confounder, these conditions reduce to
The results above generalize previous results11 from the null hypothesis of no effect (a true causal risk difference of zero) to alternative hypotheses (an arbitrary true causal risk difference). We show the proofs and extensions for the above results in eAppendix 6 (http://links.lww.com/EDE/B16).
We can see from the above that the generalized Cornfield conditions for the risk difference under alternative hypotheses depend on the number of categories of U and become less informative as the number of categories increases. Therefore, a binary confounder is not the most conservative case for sensitivity analysis with parameters expressed on the risk difference scale. The Cornfield conditions for the relative risk, however, do not suffer from this problem. It therefore seems more appropriate to conduct sensitivity analysis with parameters expressed on the risk ratio scale, for which a binary confounder is the most conservative case.17,18
Footnotes
This work was partly supported by National Institutes of Health Grant R01 ES017876.
The authors report no conflicts of interest.
Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (www.epidem.com).
REFERENCES
1. Cornfield J, Haenszel W, Hammond EC, et al. Smoking and lung cancer: recent evidence and a discussion of some questions. J Natl Cancer Inst. 1959;22:173–203.
2. Bross IDJ. Spurious effects from an extraneous variable. J Chronic Dis. 1966;19:637–647. doi: 10.1016/0021-9681(66)90062-2.
3. Bross IDJ. Pertinency of an extraneous variable. J Chronic Dis. 1967;20:487–495. doi: 10.1016/0021-9681(67)90080-x.
4. Yanagawa T. Case-control studies: assessing the effect of a confounding factor. Biometrika. 1984;71:191–194.
5. Rosenbaum PR, Rubin DB. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J R Stat Soc Series B. 1983;45:212–218.
6. Imbens GW. Sensitivity to exogeneity assumptions in program evaluation. Am Econ Rev. 2003;93:126–132.
7. Schlesselman JJ. Assessing effects of confounding variables. Am J Epidemiol. 1978;108:3–8.
8. VanderWeele TJ, Arah OA. Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders. Epidemiology. 2011;22:42–52. doi: 10.1097/EDE.0b013e3181f74493.
9. Lin DY, Psaty BM, Kronmal RA. Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics. 1998;54:948–963.
10. Lee WC. Bounding the bias of unmeasured factors with confounding and effect-modifying potentials. Stat Med. 2011;30:1007–1017. doi: 10.1002/sim.4151.
11. Ding P, VanderWeele TJ. Generalized Cornfield conditions for the risk difference. Biometrika. 2014;101:971–977.
12. Flanders WD, Khoury MJ. Indirect assessment of confounding: graphic description and limits on effect of adjusting for covariates. Epidemiology. 1990;1:239–246. doi: 10.1097/00001648-199005000-00010.
13. Hammond EC, Horn D. Smoking and death rates: report on forty four months of follow-up of 187,783 men. J Am Med Assoc. 1958;166:1159–1172, 1294–1308. doi: 10.1001/jama.1958.02990110030007.
14. Fisher RA. Dangers of cigarette smoking [letter]. Brit Med J. 1957;2:297–298.
15. Poole C. On the origin of risk relativism. Epidemiology. 2010;21:3–9. doi: 10.1097/EDE.0b013e3181c30eba.
16. Gastwirth JL, Krieger AM, Rosenbaum PR. Cornfield’s inequality. In: Armitage P, Colton T, editors. Encyclopedia of Biostatistics. New York, NY: Wiley; 1998. pp. 952–955.
17. Wang L, Krieger AM. Causal conclusions are most sensitive to unobserved binary covariates. Stat Med. 2006;25:2257–2271. doi: 10.1002/sim.2344.
18. Ichino A, Mealli F, Nannicini T. From temporary help jobs to permanent employment: what can we learn from matching estimators and their sensitivity? J Appl Econom. 2008;23:305–327.