Author manuscript; available in PMC: 2023 Nov 1.
Published in final edited form as: Epidemiology. 2022 Jul 27;33(6):832–839. doi: 10.1097/EDE.0000000000001528

Negative Control Exposures: Causal effect Identifiability and Use in Probabilistic-Bias and Bayesian Analyses with Unmeasured Confounders

WD Flanders a, LA Waller b, Q Zhang a, D Getahun c, M Silverberg d, M Goodman a
PMCID: PMC9562027  NIHMSID: NIHMS1825160  PMID: 35895515

Abstract

Background:

Probabilistic bias and Bayesian analyses are important tools for bias correction, particularly when required parameters are nonidentifiable. Negative controls are another tool; they can be used to detect and correct for confounding. Our goals are to present conditions that assure identifiability of certain causal effects and to describe and illustrate a probabilistic bias analysis and related Bayesian analysis that use a negative control exposure.

Methods:

Using potential-outcome models, we characterized assumptions needed for identification of causal effects using a dichotomous, negative control exposure when residual confounding exists. We defined bias parameters, characterized their relationships with the negative control and with specified causal effects, and described the corresponding probabilistic-bias and Bayesian analyses. We present analytic examples using data on hormone therapy and suicide attempts among transgender people. To address possible confounding by healthcare utilization, we used prior TdaP (tetanus–diphtheria–pertussis) vaccination as a negative control exposure.

Results:

Hormone therapy was weakly associated with risk (risk ratio (RR) = 0.9). The negative control exposure was associated with risk (RR = 1.7), suggesting confounding. Based on an assumed prior distribution for the bias parameter, the 95% simulation interval for the distribution of confounding-adjusted RR was (0.17, 1.6), with median 0.5; the 95% credibility interval was similar.

Conclusion:

We used a dichotomous negative control exposure to identify causal effects, under strong assumptions, when a confounder was unmeasured. It may be possible to relax these assumptions, and the negative control exposure could prove helpful for probabilistic bias analyses and Bayesian analyses.

Keywords: bias, confounding, negative controls, negative control exposure, probabilistic bias analysis, Bayesian Analysis, adjustment

Background:

Residual confounding often threatens valid estimation of causal effects, especially absent randomization of exposure. In a potential outcome framework, confounding implies non-exchangeability, defined below as an association of the exposure with the potential outcomes.

Numerous approaches can adjust or account for measured confounders, including restriction, control, or adjustment by stratification or modeling in the analysis, difference in difference and regression discontinuity analyses, and use of instrumental variables1.

To detect residual confounding, perhaps due to an unmeasured confounder, one can use a negative control outcome or negative control exposure2. A negative control exposure, our focus, is a variable that does not cause the outcome but that is associated with the unmeasured confounder (detailed in Methods). In an early application, Yerushalmy studied the effects of maternal cigarette smoking on birth weight (reprinted3). To detect confounding, he assessed the association of paternal smoking with his offspring’s birthweight; this alternative “exposure” was at that time thought not to affect the outcome of interest – a negative control exposure; he observed an association, which he interpreted as suggesting confounding. Later work using cotinine levels suggested this use of paternal smoking as a negative control was valid2,4. As another example, Flanders et al. studied the effects of air pollution on emergency department visits for respiratory diseases. They used pollutant levels the day after the outcome had occurred, which could not cause the outcome, as a negative control exposure to detect residual confounding or other bias5,6. A study of influenza vaccination and deaths from influenza7 exemplifies negative control outcomes. A strong association of vaccination status with influenza deaths before the influenza season (an alternative “outcome” thought to be unaffected by the exposure of interest, in other words a negative control outcome) suggested bias in the estimated effect of vaccination. Lipsitch et al. discussed and formalized these concepts for detection of residual confounding8.

Much subsequent work goes beyond bias detection to use negative controls to adjust for residual confounding2,9. Flanders et al. used a negative control exposure to partially correct for residual confounding10. Their approach, however, involved certain distributional assumptions. Other approaches have involved outcome calibration under a rank preservation assumption11, and use of a linear model for the unmeasured confounder with factor analysis12,13. Miao et al. recently provided conditions that, if met, allow identification of causal effects by using two negative controls, which can act as surrogates for the unmeasured confounder(s)14–16.

Assumptions needed for identifiability can be rather strong if a confounder remains unmeasured. For example, in the categorical case, the approach of Miao et al.14 requires two negative controls that serve as proxies for the unmeasured confounder (say U). They must have several properties including: each proxy has at least as many categories as U, the proxies are independent conditional on U and certain probability matrices have inverses.

Probabilistic bias analyses can address residual biases in the effect estimate that remain after conventional analyses17. For residual confounding, probabilistic bias analyses use substantive knowledge to help formulate a distribution of bias parameters that characterize unobserved associations (specified in methods); apply that distribution to correct conventional effect estimates, such as risk ratios; and produce a distribution of plausible corrected estimates. Our goals are to describe and illustrate a method that uses a negative control exposure to partially correct for confounding and to formulate probabilistic bias analyses. We extend the approach to a fully Bayesian analysis.

Methods:

Background, Notation, and Definitions

Our specific objectives are to: present and justify conditions sufficient for using a negative control exposure (N) to identify causal effects of an exposure (E) on an outcome (Y) when a confounder (U) is unmeasured; describe probabilistic bias analyses to address confounding that incorporate information from the negative control; describe a related Bayesian formulation (Appendix); and provide R code to implement these analyses. Here, we measure effects with risk ratios observable in a cohort study. These approaches rely on substantive knowledge to inform the choice of the prior distribution of plausible bias parameters.

We assume measured confounders, denoted collectively by X, are categorical, or can be adequately approximated as categorical to control confounding (this imposes little restriction other than regularity conditions). All results are conditional on X, but for simplicity that dependence is suppressed in the notation. For example, Menx denotes the number of people at baseline in the cohort with E = e, N = n, X = x for e, n = 0,1, where x is the value of X; for simplicity we write Men (conditioning on X = x is implicit). Conditional risk, defined as the probability that the outcome occurs (Y = 1) during the follow-up period among those with E = e, N = n at baseline in the cohort, is denoted by Ren = p(Y = 1|E = e, N = n) = E(Y|E = e, N = n). We denote the counterfactual outcome and counterfactual risk among those with E = e, N = n, if E were set to e′, by Y(e′) and Ren(e′) = E[Y(e′)|E = e, N = n] for e, n, e′ = 0 or 1 (Table 1), respectively; Ren(1) or Ren(0) must be counterfactual. Similarly, R(e) is the counterfactual risk in the population if E were set to e for all.

The causal relationships are summarized in Figure 1, a Single World Intervention Template (SWIT)18. Those relationships, assumed correct, are consistent with assumptions A1–A4:

  • A1)

    $N \perp\!\!\!\perp (E, Y(e)) \mid U, X$; $Y(n, e) = Y(e)$; conditional independence between N and (E, Y(e)); N has no effect on the outcome;

  • A2)

    $E \perp\!\!\!\perp Y(e) \mid U, X$; conditional exchangeability;

  • A3)

    Conditional on E, X, we expect the negative control to be associated with Y and unmeasured confounders U;

  • A4)

    Ren(e′) = Ren and Ren,u(e′) = Ren,u if e′ = e; counterfactual-model consistency.

Figure 1.


Single World Intervention Template18 showing assumed causal relationships. E represents exposure, X measured confounder(s), U unmeasured confounder(s), N a negative control and Y(e) the potential outcome if E were set to e.

Ideal or U-comparable negative control exposures described by Lipsitch et al.8 should satisfy assumptions A1–A4 (eAppendix S4). However, variables that are not ideal or U-comparable negative controls can satisfy A1–A4, serve as indicators of residual confounding or other bias10 and be used for the probabilistic bias or Bayesian analyses described here (e.g., Supplemental Figure S3).

We consider the causal effects of exposure among those with E = e, N = n, denoted by CEen for e, n = 0,1 (Table 1). Using risk ratios and counterfactuals, we express CEen as:

$$CE_{en} = R_{en}(1)/R_{en}(0) \quad \text{for } e, n = 0, 1; \tag{1}$$

and, the population average causal effect as:

$$PACE = R(1)/R(0). \tag{2}$$

Table 1.

Summary of Parameter Definitions and relationship to Causal Effects

| Parameter Symbol | Definition |
|---|---|
| Men | Number at risk at baseline in cohort, with E = e, N = n for e, n = 0,1 |
| Ren | Risk during follow-up among those with E = e, N = n for e, n = 0,1 |
| Renu | Risk during follow-up among those with E = e, N = n, U = u for e, n = 0,1 |
| Re. | Risk during follow-up among those with E = e for e = 0,1 |
| RRe. | Risk ratio comparing risk among those with E = e, N = 1 to that among those with E = e, N = 0 for e = 0,1; e.g., RR1. = R11/R10 |
| Y(e) | Counterfactual value of Y if E were set to e |
| Ren(e′) | Counterfactual risk among those with E = e, N = n, if E were set to e′ for e, n, e′ = 0,1 |
| Renu(e′) | Counterfactual risk among those with E = e, N = n, U = u, if E were set to e′ for e, n, e′ = 0,1 |
| CEen | Causal effect of E among those with E = e, N = n for e, n = 0,1; CEen = Ren(1)/Ren(0) |
| ε1 (ϰ1) | Bias parameter for probabilistic bias analysis of CE10; Equation 5, Assumption A5c |
| ε2 (ϰ2)ǂ | Bias parameter for probabilistic bias analysis of CE01; Equation 6, Assumption A5d |
| ε3, ε4ǂ | Bias parameters for probabilistic bias analyses of CE11, CE00; Assumption A7 |

Symbols outside parentheses are bias parameters for probabilistic bias analysis; the corresponding parameter for Bayesian analysis is given in parentheses.

ǂ These parameters are discussed and used for probabilistic bias analyses for CE11 and CE00.

In the remainder of methods, we first consider CE10 in detail, providing and justifying assumptions under which CE10 can be identified. We then introduce a bias parameter to relax the assumption needed for identification and use this parameter as the basis for probabilistic bias analyses. Finally, we consider other causal effects. The Appendix provides a fully Bayesian formulation of our approach.

Identifiability Conditions and Probabilistic Bias Analysis for CE10

We show that CE10 is identifiable in a cohort study under an easily-specified, but strong assumption involving the distribution of the negative control N.

Consistent with the pattern of causal effects summarized in Figure 1 and assumptions (A1)–(A4), we can write the identifiable risk R00 as:

$$R_{00} = \frac{\sum_u R_{00u}\, p(E=0 \mid U=u)\, p(N=0 \mid U=u)\, p(U=u)}{\sum_u p(E=0 \mid U=u)\, p(N=0 \mid U=u)\, p(U=u)} \tag{3}$$

where Renu is the conditional risk among those with E = e, N = n, U = u. Let Renu(e′) denote the counterfactual risk among those with E = e, N = n, U = u if E were set to e′. We can write the counterfactual risk R10(0) as the weighted average:

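$$
\begin{aligned}
R_{10}(0) &= \frac{\sum_u R_{10u}(0)\, p(E=1 \mid U=u)\, p(N=0 \mid U=u)\, p(U=u)}{\sum_u p(E=1 \mid U=u)\, p(N=0 \mid U=u)\, p(U=u)} \\
&= \frac{\sum_u R_{00u}(0)\, p(E=1 \mid U=u)\, p(N=0 \mid U=u)\, p(U=u)}{\sum_u p(E=1 \mid U=u)\, p(N=0 \mid U=u)\, p(U=u)} && \text{(substitute } R_{00u}(0) \text{ for } R_{10u}(0)\text{, assumption A2)} \\
&= \frac{\sum_u R_{01u}(0)\, p(E=1 \mid U=u)\, p(N=0 \mid U=u)\, p(U=u)}{\sum_u p(E=1 \mid U=u)\, p(N=0 \mid U=u)\, p(U=u)} && \text{(substitute } R_{01u}(0) \text{ for } R_{00u}(0) \text{ by A1)} \\
&= \frac{\sum_u R_{01u}\, p(E=1 \mid U=u)\, p(N=0 \mid U=u)\, p(U=u)}{\sum_u p(E=1 \mid U=u)\, p(N=0 \mid U=u)\, p(U=u)} && \text{(substitute } R_{01u} \text{ for } R_{01u}(0) \text{ by consistency)}
\end{aligned} \tag{4}
$$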

We now state two assumptions, either of which, with Assumptions (A1–A4), suffices (Claim 1) to assure identifiability of CE10:

$$p(E=1 \mid U=u) = p(N=1 \mid U=u) \text{ for all } u \quad \text{(equality of conditional distributions), or} \tag{A5a}$$
$$p(U=u \mid E=1, N=0) = p(U=u \mid E=0, N=1) \text{ for all } u. \tag{A5b}$$

Claim 1: Under assumptions (A1–A4) and (A5a) or (A5b): (i) R10(0) = R01; and (ii) CE10 is identified by the ratio of observable risks R10/R01.

Proof: By assumption (A5a), we can substitute p(N = 1|U = u) for p(E = 1|U = u), and p(E = 0|U = u) for p(N = 0|U = u) into the last line of Expression (4), showing that $R_{10}(0) = \frac{\sum_u R_{01u}\, p(E=0 \mid U=u)\, p(N=1 \mid U=u)\, p(U=u)}{\sum_u p(E=0 \mid U=u)\, p(N=1 \mid U=u)\, p(U=u)}$, which equals R01, proving (i). Proof of Claim (i) using (A5b) is similar (eAppendix S5). Now, $CE_{10} = R_{10}/R_{10}(0)$, which, by (i), equals $R_{10}/R_{01}$. The latter is consistently estimated by the ratio of observable risks in the appropriate subgroups of the cohort study, proving (ii).

Following Claim 1, we take the identifiable risk ratio R10/R01 as the estimator of CE10.

Note: The intuition behind this estimator is that the distortion caused by the association of the unmeasured variable U with exposure is balanced by the association of U with the negative control; under assumption (A5b), the distribution of U is the same in the two groups being compared (E = 1, N = 0 and E = 0, N = 1).
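Claim 1 (i) can be checked numerically on a toy example. The sketch below is not part of the authors' supplement; it assumes a three-level U with made-up values for p(U = u), p(E = 1|U = u) and the unexposed risk given U, imposes (A5a) by setting p(N = 1|U = u) equal to p(E = 1|U = u), and evaluates the weighted averages in Expressions (3) and (4) exactly.

```python
# Toy check of Claim 1 (i): under A1, A2, A4 and A5a, R10(0) equals R01.
# All numbers are hypothetical and chosen only for illustration.
p_u = [0.5, 0.3, 0.2]        # p(U = u)
p_e1 = [0.2, 0.5, 0.8]       # p(E = 1 | U = u)
p_n1 = p_e1                  # A5a: p(N = 1 | U = u) = p(E = 1 | U = u)
risk0 = [0.05, 0.10, 0.30]   # P(Y(0) = 1 | U = u); by A1-A2 free of E and N

def weighted_risk(risks, w_e, w_n):
    """Average `risks` over U with weights p(E=e|U=u) p(N=n|U=u) p(U=u)."""
    weights = [a * b * c for a, b, c in zip(w_e, w_n, p_u)]
    return sum(r * w for r, w in zip(risks, weights)) / sum(weights)

p_e0 = [1 - p for p in p_e1]
p_n0 = [1 - p for p in p_n1]

# Counterfactual risk R10(0): unexposed risk averaged over U among E = 1, N = 0.
r10_cf = weighted_risk(risk0, p_e1, p_n0)
# Observable risk R01: among E = 0, N = 1 the observed risk is the Y(0) risk.
r01 = weighted_risk(risk0, p_e0, p_n1)
# Observable risk R00, for contrast: it generally differs from R10(0).
r00 = weighted_risk(risk0, p_e0, p_n0)

print(r10_cf, r01, r00)
```

In this toy setting R01 reproduces R10(0) while R00 does not, which is why the estimator contrasts R10 with R01 rather than with R00.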

Assumptions (A5a–A5b) differ from the equi-distributional confounding assumption of Sofer et al.19, which concerns equality of the conditional distributions of the outcome and of a negative control outcome; see also the “confounding bridge” assumption of Miao et al.20 that involves conditional distributions of the negative control outcome and the outcome, rather than the negative control exposure and the exposure.

There is some plausibility that assumption (A5a) or (A5b) would hold, at least approximately, since negative controls “… should be selected such that they share a common confounding mechanism as the exposure and outcome variables …”2. Nevertheless, the assumption (A5a) or (A5b) is strong and, with U unmeasured, unverifiable. Therefore, we introduce a bias parameter that allows for deviations from (A5a–A5b) and that can be used in probabilistic bias analyses. In particular, we relax the key implication of assumption (A5a, A5b) that R10(0) = R01, and instead assume:

$$\frac{R_{10}(0)}{1 - R_{10}(0)} = \varepsilon_1 \frac{R_{01}}{1 - R_{01}}. \tag{A5c}$$

A5c assumes that the counterfactual odds R10(0)/(1 − R10(0)) equal the (observable) odds R01/(1 − R01) times a bias parameter ε1. Using risk odds in assumption (A5c), rather than, say, risks, assures that 0 < R10(0) < 1 for ε1 ∈ (0, ∞). Substituting ε1R01/(1 − R01 + ε1R01) for R10(0), justified by Assumption (A5c), gives:

$$CE_{10} = \frac{R_{10}}{R_{10}(0)} = \frac{1}{\varepsilon_1} \cdot \frac{R_{10}\,(1 - R_{01} + \varepsilon_1 R_{01})}{R_{01}} \tag{5}$$

Note: The bias parameter ε1 reflects residual bias in R01 as an estimator of the counterfactual risk R10(0) on the odds scale. Equation (5) shows that ε1 equals the ratio of the estimator R10/R01 to the causal effect, sometimes referred to as the confounding risk ratio1,21–23, multiplied by (1 – R01(1 − ε1)). For rare outcomes, (1 – R01(1 − ε1)) ≈ 1, and ε1 approximates the confounding risk ratio. This conceptualization may aid interpretation (see also, eAppendix S1).
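As a small illustration of Assumption (A5c) and Equation (5), the following Python sketch converts a value of ε1 into the implied counterfactual risk and the bias-adjusted risk ratio. The function names and the risks 0.10 and 0.20 are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def counterfactual_risk(r01, eps1):
    """R10(0) implied by A5c: odds of R10(0) = eps1 * odds of R01."""
    return eps1 * r01 / (1 - r01 + eps1 * r01)

def ce10(r10, r01, eps1):
    """Equation (5): bias-adjusted causal risk ratio CE10."""
    return r10 * (1 - r01 + eps1 * r01) / (eps1 * r01)

# Hypothetical observed risks, for illustration only.
r10, r01 = 0.10, 0.20
for eps1 in (0.5, 1.0, 2.0):
    print(eps1, counterfactual_risk(r01, eps1), ce10(r10, r01, eps1))
```

With ε1 = 1 the adjusted estimate reduces to R10/R01, as in Claim 1; values of ε1 above 1 imply a larger counterfactual risk R10(0) and therefore a smaller adjusted risk ratio.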

To implement a probabilistic bias analysis for residual confounding, we specify a distribution for ε1 (with support from 0 to infinity); the distribution has greater or less weight in the tails, depending on the extent to which R10(0) and R01 are thought to differ (reflecting differences in the conditional distributions of E and N). If the negative-control association with U is thought to mirror the corresponding exposure association fairly accurately, the ε1-distribution can be formulated with a substantial probability that ε1 is near 1, whereas if the associations differ substantially, greater weight can be assigned elsewhere. An R program to implement probabilistic bias analysis is given in eAppendix S2. Using a Monte Carlo approach, the program: randomly selects a value of the bias parameter from the specified distribution; applies the bias model to calculate the counterfactual risk (R10(0), Assumption A5c); accounts for random error by sampling R10(0) and R10 from binomial distributions; applies Equation (5) to calculate a bias-adjusted estimate of CE10; and creates a simulation interval.17
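The Monte Carlo steps just listed can be sketched in Python as follows. This is an independent illustration, not the R program in eAppendix S2. It assumes a log-normal prior for ε1 with median 1, handles random error with a simple parametric bootstrap of the two observed risks (one possible reading of the binomial sampling step), and uses the counts 11/127 and 30/182 from Table 2 of the Example. The function name and defaults are hypothetical.

```python
import math
import random
from statistics import NormalDist

def simulate_ce10(y10, m10, y01, m01, sigma, reps=5000, seed=1):
    """Return the 2.5th, 50th and 97.5th percentiles of bias-adjusted CE10."""
    rng = random.Random(seed)
    r10_hat, r01_hat = y10 / m10, y01 / m01
    draws = []
    for _ in range(reps):
        # Step 1: draw the bias parameter; the prior median is exp(0) = 1.
        eps1 = rng.lognormvariate(0.0, sigma)
        # Step 2: binomial resampling of each observed risk
        # (an assumption of this sketch about how random error is handled).
        r10 = sum(rng.random() < r10_hat for _ in range(m10)) / m10
        r01 = sum(rng.random() < r01_hat for _ in range(m01)) / m01
        if r01 == 0:
            continue  # Equation (5) is undefined when R01 = 0
        # Step 3: apply Equation (5).
        draws.append(r10 * (1 - r01 + eps1 * r01) / (eps1 * r01))
    draws.sort()
    pick = lambda q: draws[int(q * (len(draws) - 1))]
    return pick(0.025), pick(0.5), pick(0.975)

# Choose sigma so that the 10th percentile of eps1 is 0.5 times its median.
sigma = math.log(0.5) / NormalDist().inv_cdf(0.10)
lower, median, upper = simulate_ce10(11, 127, 30, 182, sigma)
print(lower, median, upper)
```

Because the resampling scheme and number of replications are assumptions, the resulting interval should be read as indicative rather than as a reproduction of the published simulation interval.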

Identifiability Conditions and Probabilistic Bias Analysis for CE01

Assumptions (A1–A4; A5a or A5b) imply that CE01 = CE10 (proven as Claim 2, eAppendix S5), which implies that CE01, like CE10, is identifiable as the ratio of observable risks R10/R01. To relax assumption (A5a or A5b) and conduct probabilistic bias analyses, we introduce a second bias parameter (ε2) that plays a role like that of ε1: ε2 reflects differences between the estimator R10/R01 based on observed risks and the estimand CE01. We have assumption (A5d):

$$\frac{R_{01}(1)}{1 - R_{01}(1)} = \varepsilon_2 \frac{R_{10}}{1 - R_{10}} \tag{A5d}$$

implying:

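$$CE_{01} = \frac{R_{01}(1)}{R_{01}} = \frac{R_{10}}{R_{01}} \left( \frac{\varepsilon_2}{\varepsilon_2 R_{10} + 1 - R_{10}} \right) \tag{6}$$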

By specifying a distribution for ε2 we can conduct probabilistic bias analyses for CE01, like those for CE10.
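A minimal Python sketch of Assumption (A5d) and Equation (6) is given below. The function name and the risks used are hypothetical; the only inputs are the two observable risks and a value of ε2.

```python
def ce01(r10, r01, eps2):
    """Equation (6): bias-adjusted causal risk ratio CE01."""
    # A5d: odds of the counterfactual risk R01(1) = eps2 * odds of R10.
    r01_cf = eps2 * r10 / (1 - r10 + eps2 * r10)
    # Among those with E = 0, the risk under no exposure is the observed R01.
    return r01_cf / r01

# Hypothetical observed risks, for illustration only.
r10, r01 = 0.10, 0.20
for eps2 in (0.5, 1.0, 2.0):
    print(eps2, ce01(r10, r01, eps2))
```

With ε2 = 1 the result equals R10/R01; here, unlike for CE10, larger values of ε2 raise the counterfactual numerator R01(1) and so increase the adjusted risk ratio.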

Identifiability Conditions and Probabilistic Bias Analysis for CE11, CE00 and PACE

Causal effects CE11, CE00 and PACE can differ from CE10 and CE01 and so are not necessarily identified as R10/R01 under Assumptions A1–A4, A5b. However, we can identify these effects if we can assume a multiplicative model for the effect of E given U, conditional on U = u and N = n:

$$R_{enu}(e') = e^{\beta_1 e'} R_{0nu} \quad \text{(multiplicative homogeneity of effects of E)} \tag{A6}$$

In words, assumption (A6) states that the counterfactual risk, if E were set to e′, is $e^{\beta_1 e'}$ times the risk among those with E = 0, N = n and U = u. We show in eAppendix S5 (Claim 3) that Assumption (A6) implies that CEen and PACE both equal $R_{en}(1)/R_{en}(0) = e^{\beta_1}$. By Assumptions (A1–A5), CE10 (and CE01) are identified, and therefore so are CE11, CE00 and PACE. Using bias parameters to account for errors in assumptions (A1–A6) and combining results, we have

$$CE_{11} = \frac{1}{\varepsilon_3} CE_{10} \quad \text{and} \quad CE_{00} = \frac{1}{\varepsilon_4} CE_{10}. \tag{A7}$$

Analyses based on use of ε3 and ε4 use the strong assumption of a multiplicative effect of exposure in addition to (A1–A5) and are viewed as supplementary.

Fully Bayesian Analysis for CE10 and CE01

The Appendix outlines use of prior distributions for R10 and R01 and two parameters ϰ1 and ϰ2 to provide a fully Bayesian formulation of the problem24. Parameters ϰ1 and ϰ2 (defined in the Appendix: Equation 1A and just below) reflect the same associations in the Bayesian formulation as do ε1 and ε2 in probabilistic bias analyses. eAppendix S3 documents an R program to implement Bayesian analyses for CE10 and CE01.

Example

We illustrate these methods by applying them to investigate the possible effect of gender-affirming hormone therapy on risk of suicide attempts. We use data from ongoing studies of transgender people25,26 (approved by the Institutional Review Board of Emory University and all participating sites); data are summarized in Table 2 and eAppendix Tables S1–S2. The cohort consists of people from two health plans in California, who were 20 years old or younger on 31 December 2015, and received a transgender-specific diagnosis (e.g., ‘gender dysphoria’) by age 20. We defined exposure as receiving gender-affirming hormones or puberty suppression therapy at or before age 20. The outcome of interest was at least one episode of self-inflicted injury or poisoning, or any hospitalization or emergency room visit for a mental health problem documented in the medical records during the 1-year follow-up period starting at age 20. eAppendix S6 includes additional descriptive information. We were concerned about potential confounding by healthcare utilization, as greater utilization might associate with both more hormone therapy and more (documentation of) mental health diagnoses. Therefore, we used recorded receipt of TdaP vaccine at or before age 20 years as a negative control exposure. If healthcare utilization was a confounder, we thought that TdaP vaccination should, like hormone therapy, be associated with both healthcare utilization and the outcome. The crude risk ratio (cRR) for exposure is 0.88. After adjusting for the negative control, the Mantel–Haenszel mRR is 0.88 (95% CI: 0.56 – 1.4; Table 3). The negative control was associated with risk, both among the exposed (RR1 = 1.7; 95% CI: 0.76 – 3.8) and the unexposed (RR0 = 1.7; 95% CI: 1.1 – 2.6).

Table 2.

Distribution of Self-harm Episodes (Y), Hormone Use (E), Prior Vaccination (N)

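| Variable | Value of the Variable | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Number self-harm episodes (Y) | 1+ | 0 | 1+ | 0 | 1+ | 0 | 1+ | 0 |
| Hormone use (E; yes/no) | yes | yes | no | no | yes | yes | no | no |
| Vaccinated (N; yes/no) | yes | yes | yes | yes | no | no | no | no |
| Number with this combination | 10 | 58 | 30 | 152 | 11 | 116 | 42 | 380 |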

Table 3.

Summary of Estimated Values of Selected, Identifiable Parameters in the Example

| Parameter | Estimated Value | Description |
|---|---|---|
| cRR | 0.88 | Crude risk ratio: association of risk with E |
| mRR | 0.88 | Mantel-Haenszel risk ratio, adjusted for the negative control |
| RR1 | 1.7 | Risk ratio for association of risk with N among those with E = 1 |
| RR0 | 1.7 | Risk ratio for association of risk with N among those with E = 0 |
| R10 | 0.087 | Risk among those with E = 1 and N = 0 |
| R01 | 0.17 | Risk among those with E = 0 and N = 1 |
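Several of the identifiable quantities in Table 3 can be recomputed directly from the counts in Table 2. The Python sketch below does this for the stratum-specific risks, the negative control risk ratios, and the Mantel-Haenszel risk ratio for E stratified by N; the variable names are illustrative, and small differences from the published values may arise from rounding.

```python
# (E, N): (number with outcome, number without outcome), from Table 2.
counts = {
    (1, 1): (10, 58),
    (0, 1): (30, 152),
    (1, 0): (11, 116),
    (0, 0): (42, 380),
}

# Risk R_en in each exposure-by-negative-control cell.
risk = {cell: a / (a + b) for cell, (a, b) in counts.items()}

# Association of the negative control N with risk, within levels of E.
rr1 = risk[(1, 1)] / risk[(1, 0)]
rr0 = risk[(0, 1)] / risk[(0, 0)]

# Mantel-Haenszel risk ratio for E, stratified by N.
num = den = 0.0
for n in (0, 1):
    a, b = counts[(1, n)]      # exposed: events, non-events
    c, d = counts[(0, n)]      # unexposed: events, non-events
    n1, n0 = a + b, c + d
    total = n1 + n0
    num += a * n0 / total
    den += c * n1 / total
mrr = num / den

# Claim 1 estimator of CE10 under A1-A4 and A5a or A5b.
ce10_identified = risk[(1, 0)] / risk[(0, 1)]

print(risk[(1, 0)], risk[(0, 1)], rr1, rr0, mrr, ce10_identified)
```

The last quantity, R10/R01, is the point estimate that the probabilistic bias and Bayesian analyses then widen into an interval by allowing ε1 (or ϰ1) to differ from 1.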

To implement the probabilistic bias analyses, we specify a prior distribution for ε1. As provided in the R program (eAppendix S2), we chose a log-normal distribution for ε1, with median 1 and the ratio of the 10th percentile to the median of 0.5. For this specification, the 10th and 90th percentiles of the prior for ε1 are 0.50 and 2.0, indicating that ε1 will fall in this range with 80% probability under the prior. The resulting distribution of confounding-adjusted causal effect estimates (simulation intervals)17 is given in Figure 2, for the assumed prior. The 95% simulation interval is from 0.17 to 1.6, with median 0.52. These results are approximately interpretable as semi-Bayesian, as the bias parameter is sampled from a prior distribution17. Using the fully Bayesian analysis described in the Appendix with uninformative priors for P(Y = 1|E = 1, N = 0) and P(Y = 1|E = 0, N = 1), and a log-normal prior for ϰ1 with median 1 and variance set so that the ratio of the 10th percentile to the median was 0.5, the 95% credibility interval was (0.19, 1.7) with median 0.54. In a sensitivity analysis, we doubled the variance of the prior for ϰ1, reflecting greater uncertainty in the value of the bias parameter. The 95% credibility interval was then (0.16, 2.1).

Figure 2.


Histogram showing the distribution of confounding-adjusted causal effect estimates from probabilistic bias analysis.

Discussion

Using an equidistributional assumption (A5a or A5b) and others (A1–A4), we have shown how a negative control exposure can be used to estimate certain causal effects (CE10 and CE01) when an unmeasured confounder distorts results. If exposure effects are multiplicative, these assumptions suffice to also identify CE11 and CE00 as well as other effects, such as the population average effect. The key assumption (A5a or A5b) is strong, so we have also described and illustrated two ways to relax the assumption. These inter-related methods (probabilistic bias analyses and Bayesian analyses) both use information from a negative control exposure to account for residual confounding and require the researcher to specify a prior distribution for a bias parameter. They yielded similar results, as expected, in our example. Our formulation of probabilistic bias analyses, as is common17, includes a bias model, postulating a prior distribution for the bias parameters, and Monte Carlo simulation to obtain a distribution for the bias-adjusted estimate of interest. Our method extends the approach to incorporate information from the negative control. We also describe a complementary Bayesian formulation.

Exchangeability implies that risk among those actually exposed is the same as the risk among the unexposed if they had been exposed, and conversely; it can be defined as independence of the actual exposure and response types27–29. The methods proposed for addressing non-exchangeability are natural ones in the sense that they use the negative control as a reflection of exposure associations with unmeasured confounders that define non-exchangeability. If the negative control has more than two categories, say N = n for n = 0,1, …, N, then the approach described here is still applicable by selecting two categories (or combinations of categories) and contrasting them. For example, if a priori knowledge suggested that P(U = u|E = 1, N = 0) ≈ P(U = u|E = 0, N ∈ S1) where S1 = {3,4}, then R10(0) could be estimated by R0S1. To the extent allowed by a priori knowledge, original categories can be combined to most closely approximate the identifying assumption.

Two causal effects are most readily addressed by the proposed approach, the effect of exposure among the exposed without negative control exposure (E = 1, N = 0), and the effect of exposure among the unexposed with negative control exposure (E = 0, N = 1). The researcher can use substantive knowledge to assess how well the association of the negative control with the unmeasured confounder reflects the association of exposure with the confounder. In the example when using Bayesian analyses, doubling the variance of the prior distribution for ϰ1 led to wider credibility intervals, but not substantially so (Example 1) – suggesting some degree of robustness to a modest change in uncertainty regarding the prior distribution of ϰ1. While it is possible to extend the approach to apply it to effects in other subgroups and to a population average causal effect, assignment of the prior distribution is perhaps more uncertain because an additional assumption (multiplicative effect of exposure) is used. Therefore, we view these additional analyses as secondary. We caution that an association between a negative control and the outcome can reflect bias other than non-exchangeability, such as misclassification or model mis-specification5. Our analyses are not designed to correct for these other biases.

The probabilistic bias analysis and the Bayesian analysis both used a negative control to correct for residual confounding and were based on researcher-supplied inputs that are informed, to the extent possible, by subject-matter knowledge. The probabilistic bias analyses depend on the prior distributions for ε1 and ε2 and the Bayesian results on those for ϰ1 and ϰ2; here we used log-normal distributions for both. With a non-informative prior for the other parameters, the 95% credibility interval (Bayesian analysis) was similar to the simulation interval (probabilistic bias analysis). Absent assumptions (such as A5b), parameters ϰ1 and ϰ2 or bias parameters ε1 and ε2 are not identifiable; however, “indirect learning” is possible30, evidenced here in the change from prior to posterior distribution of ϰ1 (eAppendix S1: Figures S1 and S2). This indirect learning and changes in the distributions result from learning about the identifiable parameters30.

In some situations, when the confounder is known but unmeasured, external results may provide direct estimates of ϰ1 or ε1. For example, using Equation 5, ε1 could be estimated as $\varepsilon_1 = \frac{R_{10} - R_{10}R_{01}}{aCE_{10}\,R_{01} - R_{10}R_{01}}$, where aCE10 is an external estimate of the causal effect (e.g., adjusted for all confounders in a study where U was measured). However, if external measurements of the exposure, confounders, and outcome are available, we can also consider other, possibly more efficient, approaches17 or perhaps use of a directly calculated confounding risk ratio. We could also use priors for both ϰ1 and a causal effect (e.g., CE01), plus another parameter (e.g., R10). Evaluation of the posterior distribution, however, would likely then require Gibbs sampling or another technique more complicated than the straightforward one used here.
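The expression for ε1 in terms of an external estimate follows from solving Equation (5) for ε1, and a short Python sketch can confirm that the two formulas are inverses of one another. The function names and the numerical inputs below are hypothetical.

```python
def ce10(r10, r01, eps1):
    """Equation (5): bias-adjusted causal risk ratio CE10."""
    return r10 * (1 - r01 + eps1 * r01) / (eps1 * r01)

def eps1_from_external(r10, r01, ace10):
    """Solve Equation (5) for eps1, given an external estimate aCE10.

    Requires ace10 > r10, so that the implied counterfactual risk
    R10(0) = r10 / ace10 lies below 1.
    """
    return (r10 - r10 * r01) / (ace10 * r01 - r10 * r01)

# Hypothetical inputs: pick eps1, compute CE10, then recover eps1.
r10, r01, eps1_true = 0.10, 0.20, 2.0
ace10 = ce10(r10, r01, eps1_true)
eps1_recovered = eps1_from_external(r10, r01, ace10)
print(ace10, eps1_recovered)
```

The round trip returns the starting value of ε1, which is a quick consistency check on the algebra rather than evidence about any particular data set.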

In summary, we have provided assumptions sufficient for using a negative control to identify causal effects when a confounder is unmeasured. We have also described and illustrated the application of both probabilistic bias analysis and Bayesian formulations to address residual confounding. These methods use a negative control exposure and researcher-supplied prior information about how well the negative control captures the associations that create confounding, to produce results partially adjusted for residual confounding.

Supplementary Material

Supplemental Digital Content

Funding information:

This work was supported by Contract AD-12-11-4532 from the Patient Centered Outcome Research Institute and Grant R21HD076387 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development.

Abbreviations:

RR: risk ratio

CE: causal effect

Appendix

The Appendix builds on arguments in the main text to describe a Bayesian formulation of our approach to estimating causal effects. The goal is to calculate Bayesian credibility intervals and the posterior median for CE10, conditional on the observations and using the negative control exposure; arguments for CE01 are directly analogous. Results for CE11 and CE00 depend on an additional, strong assumption and are considered supplementary. We use terminology from Lash1,17.

Parameters

The bias parameter ε1 of the main text was introduced to relax assumption (A5a); it allows for differences between the distribution of E given U and that of N given U. Paralleling the formulation of probabilistic bias analysis (main text; Assumption A5c), we define the (relative) bias parameter ϰ1 as:

$$\varkappa_1 = \frac{R_{10}(0)}{1 - R_{10}(0)} \Big/ \frac{R_{01}}{1 - R_{01}}. \tag{1A}$$

Parameters R10, R01 (Table 1) and ϰ1 fully parameterize the conditional distributions of: outcome Y among those with E = 1, N = 0, and among those with E = 0, N = 1. They also parameterize the distribution of Y(0) among those with E = 1, N = 0:

$$P(Y(0)=1 \mid E=1, N=0) = R_{10}(0) = \varkappa_1 R_{01}/(1 - R_{01} + \varkappa_1 R_{01}), \quad \text{(expression 1A).}$$

CE10 can be expressed using only R10, R01 and ϰ1.

Sampling Distribution

The conditional likelihood of the observed data, given the parameters R10 = p1 and R01 = p0, is:

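$$P(Y=y_1 \mid E=1, N=0, R_{10}=p_1)\, P(Y=y_0 \mid E=0, N=1, R_{01}=p_0) = \frac{M_{10}!}{(M_{10}-y_1)!\, y_1!}\, p_1^{y_1} (1-p_1)^{M_{10}-y_1}\, \frac{M_{01}!}{(M_{01}-y_0)!\, y_0!}\, p_0^{y_0} (1-p_0)^{M_{01}-y_0} \tag{2A}$$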

Here y1 is the number of subjects with Y = 1, E = 1 and N = 0; M10 is the number with E = 1 and N = 0; y0 and M01 are the corresponding numbers where E = 0 and N = 1; the “data” are y1, M10, y0, M01, E and N.

Prior Distributions for parameters R10, R01, and ϰ1

We use a log-normal prior for ϰ1:

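$$f_{\varkappa_1}(e_1) = \frac{1}{e_1 \sigma_1 \sqrt{2\pi}}\, e^{-\frac{1}{2}(\ln(e_1) - \mu_1)^2/\sigma_1^2} \quad \text{for } 0 < e_1 < \infty \text{ and 0 elsewhere,} \tag{3A}$$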

where μ1 and $\sigma_1^2$ are the mean and variance of ln(e1). The median of e1 is $e^{\mu_1}$ and the variance is $(e^{\sigma_1^2} - 1)\,e^{2\mu_1 + \sigma_1^2}$. To double the variance of e1 (original, non-log scale), we solve $(e^{\sigma_2^2} - 1)\,e^{2\mu_1 + \sigma_2^2} = 2(e^{\sigma_1^2} - 1)\,e^{2\mu_1 + \sigma_1^2}$ for σ2. For σ1 = 0.54 (the initial value of σ1 used in Example, main text), we use σ2 = 0.67 to double the variance of e1.
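The two calculations in this paragraph can be sketched in Python: deriving σ1 from the stated prior (median 1, 10th percentile equal to half the median) and solving for the σ2 that doubles the variance on the original scale. The closed-form solution below uses the substitution x = exp(σ2²), which turns the equation into a quadratic; that step and the variable names are this sketch's own, not taken from the supplement.

```python
import math
from statistics import NormalDist

mu1 = 0.0                                   # log of the prior median (median = 1)
z10 = NormalDist().inv_cdf(0.10)            # standard normal 10th percentile
sigma1 = math.log(0.5) / z10                # 10th percentile / median = 0.5

def lognormal_variance(mu, sigma):
    """Variance of a log-normal variable with log-scale mean mu and SD sigma."""
    return (math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2)

target = 2 * lognormal_variance(mu1, sigma1)

# With x = exp(sigma2^2): (x - 1) * x * exp(2 * mu1) = target, a quadratic in x.
c = target / math.exp(2 * mu1)
x = (1 + math.sqrt(1 + 4 * c)) / 2          # positive root
sigma2 = math.sqrt(math.log(x))

print(sigma1, sigma2)
```

Under these inputs σ1 is about 0.54 and σ2 comes out near 0.675, in line with the rounded values quoted in the text.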

We use beta priors for R10:

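$$f_{R_{10}}(p_1) = \frac{\Gamma(\alpha_1 + \beta_1)}{\Gamma(\alpha_1)\Gamma(\beta_1)}\, p_1^{\alpha_1 - 1} (1 - p_1)^{\beta_1 - 1}, \quad 0 < p_1 < 1 \tag{4A}$$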

and for R01:

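$$f_{R_{01}}(p_0) = \frac{\Gamma(\alpha_0 + \beta_0)}{\Gamma(\alpha_0)\Gamma(\beta_0)}\, p_0^{\alpha_0 - 1} (1 - p_0)^{\beta_0 - 1}, \quad 0 < p_0 < 1 \tag{5A}$$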

fR10(p1) and fR01(p0) are 0 for p1 or p0 outside the interval (0, 1).

The priors fR01(p0) and fR10(p1) are non-informative uniform priors if we use αj = βj = 1.

Posterior Distribution

For the analysis of CE10, the posterior distribution is:

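$$
\begin{aligned}
f_{R_{10},R_{01},\varkappa_1 \mid Y_1,Y_0}(p_1, p_0, e_1 \mid y_1, y_0) &= \frac{1}{G}\, p_1^{y_1}(1-p_1)^{M_{10}-y_1}\, \frac{\Gamma(\alpha_1+\beta_1)}{\Gamma(\alpha_1)\Gamma(\beta_1)}\, p_1^{\alpha_1-1}(1-p_1)^{\beta_1-1}\, p_0^{y_0}(1-p_0)^{M_{01}-y_0}\, \frac{\Gamma(\alpha_0+\beta_0)}{\Gamma(\alpha_0)\Gamma(\beta_0)} \\
&\quad \times\, p_0^{\alpha_0-1}(1-p_0)^{\beta_0-1}\, \frac{1}{e_1}\, e^{-\frac{1}{2}(\ln(e_1)-\mu_1)^2/\sigma_1^2} \\
&= \frac{1}{G}\, p_1^{y_1+\alpha_1-1}(1-p_1)^{M_{10}-y_1+\beta_1-1}\, p_0^{y_0+\alpha_0-1}(1-p_0)^{M_{01}-y_0+\beta_0-1}\, \frac{1}{e_1}\, e^{-\frac{1}{2}(\ln(e_1)-\mu_1)^2/\sigma_1^2}
\end{aligned} \tag{6A}
$$

for 0 < p1 < 1, 0 < p0 < 1, and 0 < e1 < ∞, and = 0 elsewhere,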

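where $\frac{1}{G} = \frac{\Gamma(M_{10}+\alpha_1+\beta_1)}{\Gamma(y_1+\alpha_1)\,\Gamma(M_{10}-y_1+\beta_1)} \cdot \frac{\Gamma(M_{01}+\alpha_0+\beta_0)}{\Gamma(y_0+\alpha_0)\,\Gamma(M_{01}-y_0+\beta_0)} \cdot \frac{1}{\sigma_1\sqrt{2\pi}}$.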

This posterior distribution is the product of two independent beta distributions and a log-normal distribution. We can sample from it by independently sampling p1 (for R10) from a beta(y1 + α1, M10 − y1 + β1) distribution, p0 (for R01) from a beta(y0 + α0, M01 − y0 + β0) distribution, and e1 (for ϰ1) from a log-normal($\mu_1$, $\sigma_1^2$) distribution.
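
The independent sampling described above might be sketched as follows. This is an illustrative Python version, not the supplemental R code; the function name, argument names, and input values are hypothetical. Note that Python's `random.lognormvariate` takes the standard deviation of the log, not the variance.

```python
import random


def draw_posterior(y1, m10, y0, m01, mu1, sigma1,
                   a1=1.0, b1=1.0, a0=1.0, b0=1.0, rng=random):
    """One joint draw (p1, p0, e1) from the posterior in expression 6A."""
    p1 = rng.betavariate(y1 + a1, m10 - y1 + b1)  # draw for R10
    p0 = rng.betavariate(y0 + a0, m01 - y0 + b0)  # draw for R01
    e1 = rng.lognormvariate(mu1, sigma1)          # draw for the bias parameter
    return p1, p0, e1


# Hypothetical counts and prior settings (not study data).
sample = draw_posterior(y1=3, m10=50, y0=5, m01=40, mu1=0.0, sigma1=0.54,
                        rng=random.Random(2022))
```

The draw for the bias parameter uses only μ1 and σ1, because the data enter the posterior only through the two beta components.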

Evaluation

To evaluate the posterior distribution, the supplemental R code (eAppendix S3) performs the described sampling in 100,000 independent replications. For each sample, it calculates the parameter of interest, CE10 = $\frac{1}{\varepsilon_1}\,\frac{R_{10}(1 - R_{01} + \varepsilon_1 R_{01})}{R_{01}}$ (or CE01 = $\frac{R_{10}}{R_{01}(\varepsilon_2 - \varepsilon_2 R_{10} + R_{10})}$). It then calculates desired statistics from the empirical distribution of sampled values (e.g., the 2.5th and 97.5th percentiles for the 95% credibility interval for CE10, and the 50th percentile for the median). The default parameters for the beta distributions are αi = βi = 1, which yield uniform priors. The user must input the parameters of the log-normal prior distribution of ϰi (details and further explication in eAppendices S1–S2).
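
A rough Python analogue of this evaluation is sketched below for CE10 only. It is not the supplemental R code. The function names, seed, and counts are hypothetical, and the sketch assumes that ε1 in the formula denotes the sampled value of ϰ1.

```python
import random
import statistics


def simulate_ce10(y1, m10, y0, m01, mu1, sigma1,
                  a1=1.0, b1=1.0, a0=1.0, b0=1.0,
                  n_draws=100_000, seed=1):
    """Monte Carlo draws of CE10 = R10 (1 - R01 + e1 R01) / (e1 R01)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        r10 = rng.betavariate(y1 + a1, m10 - y1 + b1)
        r01 = rng.betavariate(y0 + a0, m01 - y0 + b0)
        eps1 = rng.lognormvariate(mu1, sigma1)  # sigma1 is the SD of the log
        draws.append(r10 * (1.0 - r01 + eps1 * r01) / (eps1 * r01))
    return draws


def summarize(draws):
    """Return the 2.5th percentile, median, and 97.5th percentile."""
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return cuts[0], statistics.median(draws), cuts[-1]


# Hypothetical counts and prior settings (not study data).
ce10_draws = simulate_ce10(y1=3, m10=50, y0=5, m01=40, mu1=0.0, sigma1=0.54,
                           n_draws=5_000)
lower, median, upper = summarize(ce10_draws)
```

Splitting the range into 40 equal-probability intervals places the first and last cut points at the 2.5th and 97.5th percentiles, which gives the limits of the 95% interval directly.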

Footnotes

Conflict of Interest Statement: Drs. Flanders and Goodman and Ms. Zhang provide consulting services through Epidemiologic Research & Methods, LLC. This company is owned by Dr. Flanders and provides consulting services to clients. None of the work on the present paper was related to that consulting.

