Abstract
In this work, we propose an approach for assessing sensitivity to unobserved confounding in studies with multiple outcomes. We demonstrate how prior knowledge unique to the multi-outcome setting can be leveraged to strengthen causal conclusions beyond what can be achieved from analyzing individual outcomes in isolation. We argue that it is often reasonable to make a shared confounding assumption, under which residual dependence amongst outcomes can be used to simplify and sharpen sensitivity analyses. We focus on a class of factor models for which we can bound the causal effects for all outcomes conditional on a single sensitivity parameter that represents the fraction of treatment variance explained by unobserved confounders. We characterize how causal ignorance regions shrink under additional prior assumptions about the presence of null control outcomes, and provide new approaches for quantifying the robustness of causal effect estimates. Finally, we illustrate our sensitivity analysis workflow in practice, in an analysis of both simulated data and a case study with data from the National Health and Nutrition Examination Survey (NHANES).
Keywords: Observational studies, multiple outcomes, sensitivity analysis, factor model, latent confounders, deconfounder
1. Introduction
Large observational datasets often include measurements of multiple outcomes of interest. For example, in high-throughput biology, the goal might be to understand the effect of a treatment on multiple biomarkers (Leek and Storey, 2007; Zhao et al., 2018), or in patient-centered epidemiologic studies, researchers might study the potential impact of health recommendations on multiple disease-related outcomes (Sánchez et al., 2005; Kennedy et al., 2019). While it is always possible to analyze each outcome separately, there has been a recent emphasis on the importance of techniques for simultaneously inferring the effect of an intervention on multiple outcomes (VanderWeele, 2017).
As in any observational study, the validity of multi-outcome causal inference rests on untestable and often implausible assumptions about unconfoundedness. As such, methods which explore the sensitivity and robustness of effects to assumptions about unconfoundedness are increasingly recognized as a crucial part of any rigorous analysis. In this paper, we demonstrate how to leverage prior knowledge to strengthen causal conclusions from observational datasets with multiple outcomes. We propose a sensitivity analysis unique to multi-outcome inference, which leads to characterizations of the robustness of causal effects that are both simpler and sharper than those that can be achieved by analyzing each outcome separately. We focus on linear factor models of the outcomes where we 1) establish bounds on the magnitude of unobserved confounding bias for all outcomes as a function of a single interpretable sensitivity parameter under an assumption about shared confounding, 2) provide novel theoretical results that demonstrate how assumptions about null control outcomes inform the sensitivity analysis and 3) provide practical guidance and a workflow for sensitivity analysis for multi-outcome studies. We demonstrate this workflow in simulation and an analysis of the effect of light drinking on multiple biomarkers for health.
After reviewing related literature in Section 1.1, we introduce the problem setting and the challenges in multi-outcome causal inference (Section 2). In Section 3, we establish the framework for our proposed sensitivity analysis with multivariate outcomes and introduce theoretical results about confounding bias for a model in which the expected outcomes are linear in the unobserved confounders (given observed covariates). We provide additional novel theoretical results demonstrating how null control outcomes can be leveraged in conjunction with our sensitivity analysis and how to characterize the robustness of treatment effect estimates. We illustrate the theoretical insights with a Bayesian implementation of our sensitivity analysis on simulated data in Section 4. In Section 5, we discuss the interpretation and calibration of sensitivity parameters. In Section 6, we illustrate our approach on a real-world example of the effects of light drinking on health measures using data from the National Health and Nutrition Examination Survey (NHANES).
1.1. Related Literature
There are a number of methods for causal inference with multiple outcomes, although the majority of these appear in the context of randomized controlled trials (Freemantle et al., 2003; Mattei et al., 2013). In the context of observational studies, Sammel et al. (1999) propose a multivariate linear mixed effects model for multi-outcome causal inference, Thurston et al. (2009) consider a Bayesian generalization of multiple outcomes which are nested across different domains, and Sánchez et al. (2005) review a variety of structural equation models for multiple outcomes with application to epidemiological problems. These works have the important caveat that they typically assume a version of the “no unobserved confounding” (NUC) assumption. More recently, Kennedy et al. (2019) develop a nonparametric doubly robust method for estimation and hypothesis testing of scaled treatment effects on multiple outcomes, but again assume NUC.
When unobserved confounding is expected to be an issue, certain assumptions about additional outcomes can sometimes be used to identify the effects of alternative primary outcomes. As one important example, additional outcomes called “proxy confounders” or “null control outcomes” can sometimes be used to identify causal effects for a set of primary outcomes (Shi et al., 2020; Tchetgen et al., 2020). Relatedly, Wang et al. (2017) establish identification assumptions in a linear Gaussian model with multiple outcomes when the non-null effects are assumed to be sparse. Although not explicitly framed in causal language, a closely related line of work explores approaches for de-biasing estimates in the presence of confounders, for example when there are batch effects in high-throughput biological datasets (e.g. Gagnon-Bartsch and Speed, 2012; Gagnon-Bartsch et al., 2013). These works all focus on various assumptions that can be made to point identify causal effects for each outcome.
In contrast, sensitivity analyses are useful for explicitly relaxing such identification assumptions. There are a variety of different approaches for assessing sensitivity to potential unmeasured confounders in single-treatment, single-outcome settings (Rosenbaum and Rubin, 1983; Tan, 2006; Franks et al., 2019; Cinelli and Hazlett, 2020; Veitch and Zaveri, 2020). In the multi-outcome setting, Fogarty and Small (2016) consider sensitivity analysis for multiple comparisons in matched observational studies using weighting estimators. They focus on the implications of the key fact that omitted variable biases across multiple outcomes are connected through the shared effect of the unmeasured confounders on the treatment. Under a similar matched pairs framework, Rosenbaum (2021) considers a sensitivity analysis for a single primary outcome and shows how a null control outcome can sometimes increase the evidence for the robustness of the primary outcome. One potential concern with their sensitivity analyses for weighting estimators is that the sensitivity analyses are implicitly based on the overly conservative assumption that unmeasured confounders explain nearly all the variation in the outcomes. We address this concern by developing a sensitivity analysis in which we explicitly account for the effect of unmeasured confounders on the outcomes.
Building on existing parametric models for multiple outcomes (e.g. Sammel et al., 1999; Sánchez et al., 2005; Thurston et al., 2009), we propose an outcome model with latent variables which account for the residual correlations between outcomes. Unlike these previous works, we expressly consider the possibility that these latent variables might correspond to potential confounders, in that they may also correlate with the treatment. To account for the potential dependence between latent variables and treatment, we use a so-called latent variable sensitivity analysis (Rosenbaum and Rubin, 1983). Typically, in a latent variable sensitivity analysis, sensitivity parameters govern the functional relationship between a latent variable and both the outcome and the treatment. For instance, Cinelli and Hazlett (2020) propose a particularly intuitive latent variable sensitivity analysis for single-treatment, single-outcome problems, with two sensitivity parameters corresponding to the fraction of outcome residual variance explained by latent confounders and the fraction of treatment residual variance explained by confounders. We generalize their approach to multi-outcome settings, paralleling a sensitivity analysis strategy developed in the context of causal inference with multiple treatments and a scalar outcome (Zheng et al., 2021b).
2. Setup
We let $Y$ denote a $q$-vector of outcomes, $T$ a scalar treatment variable, $U$ an $m$-vector of potential unobserved confounders, and $X$ any observed pre-treatment covariates. The goal of multi-outcome causal inference is to infer the effect of a scalar treatment on the $q$-dimensional outcomes. In this setting, we define a class of causal estimands as the population average treatment effect (PATE) for any linear combination of the outcomes, $a^\top Y$, as
$\tau_a(t_1, t_2) := E[a^\top Y \mid do(T = t_1)] - E[a^\top Y \mid do(T = t_2)]$ | (1)
where the $do$-operator indicates that we are intervening to set the treatment level to $t$ rather than merely conditioning on $T = t$ (Pearl, 2009). Most commonly, we take $a = e_j$, the $j$th canonical basis vector, for some $j$, so that $\tau_{e_j}$ is simply the causal effect on measured outcome $Y_j$ in the original coordinates. In some cases, other linear combinations may be of interest, for example when there is not enough power to detect differences in individual outcomes, but there are detectable and interesting differences for linear combinations of outcomes (e.g. see Cook et al., 2010). Relatedly, it is often desirable to define the PATE on standardized outcomes, so that each dimension of $Y$ has unit variance (e.g. Kennedy et al., 2019).
In general, the PATEs cannot be identified from observational data without assumptions about the absence of unobserved confounders. If, in addition to $X$, the unmeasured confounders $U$ were observed, then the following three assumptions would be sufficient to identify the causal effects:
Assumption 1 (Latent unconfoundedness).
$U$ and $X$ block all backdoor paths between $T$ and $Y$ (Pearl, 2009).
Assumption 2 (Latent positivity).
$p(T = t \mid U = u, X = x) > 0$ for all $t$, $u$, and $x$.
Assumption 3 (SUTVA).
There is no hidden version of the treatment and no interference between units (Rubin, 1980).
Since $U$ is not observed, we cannot identify causal effects without additional assumptions. Instead of making potentially implausible assumptions about NUC, we advocate for reasoning about the strength of potential unobserved confounding. Here, we argue that sensitivity analysis can be a useful tool for characterizing how robust our multiple causal conclusions are to such confounding.
3. Sensitivity Analysis with Multiple Outcomes
Let $f(Y \mid do(t))$ denote the distribution of $Y$ if we were to intervene to set the level of treatment to $t$. As shown in D’Amour (2019) and Zheng et al. (2021b), $f(Y \mid do(t))$ can be written as $\int\!\!\int f(Y \mid t, u, \theta)\, \tilde f(u \mid t', \phi)\, f(t')\, dt'\, du$, where $f(Y \mid t, u, \theta)$ is the full conditional outcome density, $\tilde f(u \mid t, \phi)$ is a proposed distribution for unobserved confounders given the treatment and $f(t)$ is the marginal density of the treatment. In the multivariate outcome setting, $\theta$ are sensitivity parameters which govern the relationship between the $q$-dimensional outcomes and the $m$-dimensional unobserved confounders, whereas $\phi$ are parameters which govern the relationship between the scalar treatment and unobserved confounders.
A potential difficulty with multi-outcome sensitivity analysis is that the dimension of the outcome-confounder sensitivity parameters scales with the number of outcomes. However, with multiple outcomes, there is also often additional prior knowledge that can be brought to bear on the problem. We consider two such assumptions that mitigate the challenges associated with having to reason about high-dimensional sensitivity parameters, and can, in some cases, tighten bounds beyond what we would obtain from analyzing outcomes one at a time. First, we explore the implications of assuming that confounding is shared across multiple outcomes. Second, we consider how the sensitivity analysis changes under the additional assumption that there is no causal effect for some specific outcomes (null controls).
To begin, in Section 3.1 we propose our model for multi-outcome causal inference and then in Section 3.2, we establish our sensitivity parameterization and derive worst-case bounds on the causal effects for all outcomes. For this model, under standard identifiability conditions for factor models, we show that the relevant dimensions of the confounder-outcome parameters are identified and the bound on the magnitude of the confounding bias for all outcomes depends on a single unknown scalar sensitivity parameter governing the strength of the confounder-treatment relationship. In Section 3.4, we characterize the joint relationship between the confounding biases across outcomes by establishing how assumptions about null control outcomes constrain the set of plausible causal conclusions. Finally, we discuss different robustness measures in Section 3.5.
3.1. A Multi-outcome Model with Factor Confounding
In this paper, we focus on models for observed outcomes which have factor-structured residuals. We seek to leverage the factor-structured residuals for assessing the sensitivity of causal conclusions to assumptions about unobserved confounders. We define the conditional mean and covariance of the observed outcome distribution to be
$E[Y \mid T = t, X = x] = \mu(t, x), \qquad \mathrm{Cov}(Y \mid T = t, X = x) = \Gamma_{t,x}\Gamma_{t,x}^\top + \Delta$ | (2)
where $\Gamma_{t,x} \in \mathbb{R}^{q \times m}$ are rank-$m$ factor loadings and $\Delta$ is a constant diagonal matrix. For now, we assume the factor loadings can vary with both $t$ and $x$, but later make the stronger assumption that the factor loadings are constant.
There are several causal models consistent with the observed data moments in (2). Here, we propose a causal model which is explicitly parameterized in terms of latent factors, $U$, motivated by the idea that unmeasured confounders can induce residual correlation between outcomes. Throughout, we assume the following structural equation model:
$U = \epsilon_u$ | (3)
$T = f_t(X, U, \epsilon_t)$ | (4)
$Y = g(X) + \tau T + \Gamma U + \epsilon_y$ | (5)
where $\epsilon_u$ is mean zero and has identity covariance without loss of generality, and we define $\sigma^2_{t|x} := \mathrm{Var}(T \mid X = x)$. $\epsilon_y$ is mean zero with diagonal covariance, independent of $U$ and $T$. The proposed structural model satisfies (2) and implies that $E[Y \mid do(T = t), X = x] = g(x) + \tau t$ is the intervention mean, where $\tau$ is the $q$-vector of causal effects. The intervention mean differs from the observed data mean by an unidentifiable bias, $\Gamma E[U \mid T = t, X = x]$. We say that $U$ are potential confounders because they are upstream of the treatment and outcomes, but are only truly confounding if $\mathrm{Cov}(U, T \mid X)$ is non-zero.
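As a concrete illustration, the structural model above can be simulated directly. The sketch below uses hypothetical parameter values (not taken from the paper) to show how latent confounders that load on both the treatment and the outcomes bias naive per-outcome regressions, while an oracle regression that adjusts for the normally unobserved $U$ recovers $\tau$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, m = 200_000, 4, 2

tau = np.array([1.0, 0.5, 0.0, -1.0])            # hypothetical causal effects
Gamma = np.array([[1.0, 0.0],                    # hypothetical loadings of U on Y
                  [0.5, 0.5],
                  [0.0, 1.0],
                  [1.0, 1.0]])
gamma_t = np.array([0.6, 0.3])                   # hypothetical effect of U on T

U = rng.normal(size=(n, m))                      # latent confounders, eq. (3)
T = U @ gamma_t + rng.normal(size=n)             # a linear instance of eq. (4)
Y = np.outer(T, tau) + U @ Gamma.T + 0.5 * rng.normal(size=(n, q))  # eq. (5)

# Naive per-outcome slopes of Y on T absorb the confounding bias ...
naive = np.array([np.polyfit(T, Y[:, j], 1)[0] for j in range(q)])
# ... while adjusting for the (normally unobserved) U recovers tau.
oracle = np.linalg.lstsq(np.column_stack([T, U]), Y, rcond=None)[0][0]
```

Here every naive slope is shifted by a term proportional to the corresponding row of the loading matrix, which is exactly the structure the bounds below exploit.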
Definition 1 (Potential confounding).
$U$ are potential confounders in that they are possible causes of $T$ and $Y$. Further, $T$ and $Y$ are not causes for any function of $U$.
We cannot test whether latent variables from the outcome model are potential confounders without the structural equation assumption, since the observed moments in (2) are also consistent with a model in which the latent variables are mediators caused by the treatment. However, even if $U$ are not potential confounders, our sensitivity analysis will yield conservative bounds on the treatment effects (see Appendix B).
When model (3)–(5) holds, the sensitivity analysis is well-defined in that the PATEs are identified given the parameters $\gamma$, which govern the relationship between confounders and the treatment, and $\Gamma_{t,x}$, which are the factor loading matrices governing the influence of the unmeasured confounders on each outcome.
3.2. Establishing Sensitivity Bounds
To complete our model specification, we establish an interpretable sensitivity parameterization for the conditional distribution of $U$ given $T$ and $X$. Following Cinelli and Hazlett (2020), we propose a parametrization which imposes no restrictions on the observed data and reflects the strength of linear dependence between $U$ and $T$ via the partial correlation, generalized to account for the possibility that each dimension of $U$ can have a different confounding effect on each outcome.
Assumption 4 (Conditional Moments of Potential Confounders).
$E[U \mid T, X]$ is linear in $T$, has constant variance, and is uncorrelated with observed covariates. Further, let $\gamma$ be the $m$-dimensional sensitivity vector corresponding to the partial correlation vector between $U$ and $T$ after regressing out $X$. Then, the conditional mean and covariance of $U$ is
$E[U \mid T = t, X = x] = \gamma\,(t - \mu_{t|x})/\sigma_{t|x}$ | (6)
$\mathrm{Cov}(U \mid T = t, X = x) = I_m - \gamma\gamma^\top$ | (7)
for all $t$, $x$, where we define $\mu_{t|x} := E[T \mid X = x]$, and $\sigma^2_{t|x} := \mathrm{Var}(T \mid X = x)$.
Equation (7) is implied by Equation (6) and the constraint that $\mathrm{Cov}(U) = I_m$. Since $\mathrm{Cov}(U \mid T, X)$ is positive definite, we have that $\|\gamma\|_2^2 < 1$. To maintain consistency with related single-outcome sensitivity analyses (Cinelli and Hazlett, 2020), we denote $R^2_{T\sim U|X} := \|\gamma\|_2^2$ to be the squared norm of the partial correlation vector between $T$ and $U$ given $X$, which is identically the partial R-squared from a linear regression of $T$ on $U$ given $X$.
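Because $R^2_{T\sim U|X}$ is just a partial R-squared, its interpretation can be checked by standard double residualization. The sketch below uses hypothetical coefficients (not from the paper) to verify numerically that the squared norm of the partial correlation vector equals the fraction of residual treatment variance explained by the confounders:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
X = rng.normal(size=(n, 2))                      # observed covariates
U = rng.normal(size=(n, 2))                      # confounders, independent of X
T = X @ np.array([1.0, -1.0]) + U @ np.array([0.5, 0.5]) + rng.normal(size=n)

def residualize(v, Z):
    """Residuals of v after an OLS fit on Z (with intercept)."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    return v - Z1 @ np.linalg.lstsq(Z1, v, rcond=None)[0]

# Partial R^2 of T on U given X: residualize both on X, then regress.
t_r = residualize(T, X)
u_r = np.column_stack([residualize(U[:, k], X) for k in range(U.shape[1])])
fitted = u_r @ np.linalg.lstsq(u_r, t_r, rcond=None)[0]
r2_tu_x = fitted.var() / t_r.var()
# Here Var(T | X) = 0.5**2 + 0.5**2 + 1 = 1.5, so confounders explain 1/3 of it.
```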
Finally, for the remainder of the paper, we focus on outcome models for which there are no interactions between unobserved confounders and the covariates or the treatment. In the no-interaction model, the residual outcome covariance is constant in $t$ and $x$, and thus the no-interaction restriction is equivalent to the following assumption.
Assumption 5 (Homoscedasticity).
$\mathrm{Cov}(Y \mid T = t, X = x)$ is invariant to $t$ and $x$.
Together, Assumptions (4) and (5) imply that the factor loadings $\Gamma$ are also invariant to $t$ and $x$. With these additional assumptions, we have the following bound on the treatment effect for all outcomes:
Theorem 1.
Assume model (3)–(5) with the conditional distribution of $U$ defined by (6)–(7), and Assumptions 1–5, and let $R^2 := R^2_{T\sim U|X} = \|\gamma\|_2^2$. The partial fraction of outcome variance explained by confounders is $R^2_{a^\top Y\sim U|T,X} = \|\Gamma^\top a\|_2^2 / (a^\top \Sigma a)$, where $\Sigma := \mathrm{Cov}(Y \mid T, X)$. The confounding bias of $\tau_a(t_1, t_2)$ is equal to $a^\top \Gamma\gamma\,(t_1 - t_2)/(\sigma_{t|x}\sqrt{1 - R^2})$ and it is bounded by
$|\mathrm{Bias}_a(t_1, t_2)| \le \|\Gamma^\top a\|_2 \sqrt{\tfrac{R^2}{1 - R^2}}\, \tfrac{|t_1 - t_2|}{\sigma_{t|x}} \le \sqrt{a^\top \Sigma a}\, \sqrt{\tfrac{R^2}{1 - R^2}}\, \tfrac{|t_1 - t_2|}{\sigma_{t|x}}$ | (8)
The first bound is achieved when $\gamma$ is collinear with $\Gamma^\top a$.
We can immediately see that the second bound is achieved when $R^2_{a^\top Y\sim U|T,X} = 1$ by recognizing that the worst-case squared-bias can equivalently be written as $R^2_{a^\top Y\sim U|T,X}\,(a^\top \Sigma a)\,\tfrac{R^2}{1 - R^2}\,\tfrac{(t_1 - t_2)^2}{\sigma^2_{t|x}}$. The theorem implies that the true treatment effect for contrast $a$ lies in an interval centered at the effect identified under NUC, with half-width given by (8), and as a consequence, for any fixed $R^2$, the worst-case bias for outcome $j$ is proportional to the norm of the $j$th row of $\Gamma$.
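Under our reconstruction of the bound in (8), the worst-case bias is a simple function of the loadings. The sketch below (with a hypothetical $\Gamma$) verifies that choosing $\gamma$ collinear with $\Gamma^\top a$ attains the first bound:

```python
import numpy as np

def bias_bound(Gamma, a, r2, sigma_t, dt=1.0):
    """Worst-case |Bias_a|: first inequality of (8), as reconstructed here."""
    return np.linalg.norm(Gamma.T @ a) * np.sqrt(r2 / (1 - r2)) * abs(dt) / sigma_t

def bias(Gamma, a, gamma_vec, sigma_t, dt=1.0):
    """Bias for a fully specified sensitivity vector gamma_vec."""
    r2 = gamma_vec @ gamma_vec
    return (a @ Gamma @ gamma_vec) * dt / (sigma_t * np.sqrt(1 - r2))

Gamma = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]])   # hypothetical loadings
a = np.array([1.0, 0.0, 0.0])                            # contrast: first outcome
direction = Gamma.T @ a / np.linalg.norm(Gamma.T @ a)
gamma_star = np.sqrt(0.3) * direction                    # R^2 = 0.3, collinear case
```

Any other $\gamma$ with the same norm produces a strictly smaller absolute bias for this contrast.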
In the following corollary, we establish a global bound on the biases over all outcome contrasts $a^\top Y$ with $\|a\|_2 = 1$.
Corollary 1.1.
Let $\lambda_1$ be the largest singular value of $\Gamma$. For all unit vectors $a$, the confounding bias, $\mathrm{Bias}_a(t_1, t_2)$, is bounded by
$|\mathrm{Bias}_a(t_1, t_2)| \le \lambda_1 \sqrt{\tfrac{R^2}{1 - R^2}}\, \tfrac{|t_1 - t_2|}{\sigma_{t|x}}$ | (9)
with equality when $a = u_1$, the first left singular vector of $\Gamma$, and when $\gamma$ is collinear with $v_1$, the first right singular vector of $\Gamma$. There is no confounding bias for the causal effect estimates of outcome contrast $a^\top Y$ when $\Gamma^\top a = 0$.
For $a = u_1$, $a^\top Y$ corresponds to the linear combination of outcomes that is most correlated with the confounders. In contrast, when $a$ is in the null space of $\Gamma^\top$, $\tau_a$ is identified because $a^\top Y$ is uncorrelated with $U$.
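The global bound is simply the operator norm of the loading matrix. A quick numerical check with an arbitrary $\Gamma$ confirms that $\|\Gamma^\top a\|_2 \le \lambda_1$ for every unit contrast, with the maximum attained at the first left singular vector:

```python
import numpy as np

rng = np.random.default_rng(2)
Gamma = rng.normal(size=(6, 2))                  # arbitrary illustrative loadings
Uu, S, Vt = np.linalg.svd(Gamma, full_matrices=False)
lam1, u1 = S[0], Uu[:, 0]                        # largest singular value and its vector

# ||Gamma^T a|| over random unit contrasts never exceeds lam1 ...
A = rng.normal(size=(500, 6))
A /= np.linalg.norm(A, axis=1, keepdims=True)
worst = max(np.linalg.norm(Gamma.T @ a) for a in A)
# ... and a = u1 attains it exactly.
```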
3.3. Factor Confounding
Under certain additional assumptions about the factor model, we can identify $\Gamma$ up to rotation, so that the bounds in Theorem 1 and Corollary 1.1 only depend on the single unidentified sensitivity parameter, $R^2_{T\sim U|X}$.
Fact 1.
Assume outcome model (5) with homoscedastic residuals (Assumption 5). If $\Gamma$ is rank $m$ and there remain two disjoint submatrices of rank $m$ after deleting any row of $\Gamma$, then $\Gamma$ is identifiable up to rotations from the right (Anderson and Rubin, 1956).
The bound in Theorem 1 depends on $\Gamma$ only through $\|\Gamma^\top a\|_2$, which is invariant to rotations. Likewise, the bound in Corollary 1.1 depends on the first singular value of $\Gamma$, which is rotation-invariant. In order for $\Gamma$ to be identified up to rotation, we must have $q \ge 2m + 1$ and each confounder must influence at least three outcomes (Anderson and Rubin, 1956). We group the conditions under which the factor model (3)–(5) yields a sensitivity analysis entirely parameterized by $R^2_{T\sim U|X}$ into the following assumption.
Assumption 6 (Factor confounding).
The proposed causal model satisfies factor confounding. We say that a causal model satisfies factor confounding if the outcomes follow the model proposed in equation (5), $U$ are potential confounders (Definition 1), and $\Gamma$ is identifiable up to rotations. We say that a model satisfies factor confounding for outcome $j$ if the $j$th row of $\Gamma$ is identifiable up to rotation. Factor confounding for outcome $j$ implies that the partial fraction of outcome variance explained by confounders, $R^2_{Y_j\sim U|T,X}$, is identifiable.
There are some useful ways that practitioners can reason about the plausibility of factor confounding. In particular, since factor confounding is violated if there are confounders that influence fewer than three outcomes, practitioners should consider carefully whether there are important unmeasured confounders that might influence only one or two outcomes. While the bulk of our analysis is done under the factor confounding assumption, even when factor confounding is violated, we can still apply our sensitivity analysis, albeit with more conservative bounds on the causal effects. As such, we view factor confounding as a useful “reference assumption” that can help establish informative bounds on the causal effects. For now, we assume factor confounding, and explore additional relaxations of Assumption 6 in Appendix B.
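Rotation invariance is easy to verify numerically: for any orthogonal $R$, the matrices $\Gamma$ and $\Gamma R$ imply the same residual covariance, the same row norms, and the same singular values, which are exactly the quantities the bounds above depend on. A minimal sketch with an arbitrary $\Gamma$:

```python
import numpy as np

rng = np.random.default_rng(3)
Gamma = rng.normal(size=(8, 3))                  # arbitrary loadings, q = 8, m = 3
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))     # a random orthogonal matrix
Gamma_rot = Gamma @ Q                            # rotation from the right

# All three quantities below are invariant to the rotation.
cov_match = np.allclose(Gamma @ Gamma.T, Gamma_rot @ Gamma_rot.T)
row_norms_match = np.allclose(np.linalg.norm(Gamma, axis=1),
                              np.linalg.norm(Gamma_rot, axis=1))
sv_match = np.allclose(np.linalg.svd(Gamma, compute_uv=False),
                       np.linalg.svd(Gamma_rot, compute_uv=False))
```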
3.4. Null Control Outcomes
In this section, we establish how additional assumptions about null control outcomes constrain the shared sensitivity vector, $\gamma$, and thus reduce the size of the partial identification regions for the causal effects of the other outcomes. Under null control assumptions, we fix the biases for a set of so-called null control outcomes to match the observed effect under NUC, so that the causal effects for the null control outcomes are zero after accounting for confounding biases. Such assumptions add important context, in particular because Theorem 1 only establishes marginal bounds for any treatment contrast, but does not account for the dependence in the omitted variable bias across outcomes. Null control assumptions reveal the joint relationship between biases across outcomes. Our results complement those from Rosenbaum (2021), who explores sensitivity analysis for matching estimators and demonstrates that null control outcomes can make tests of significance on the primary outcome more robust to confounding.
For a fixed treatment contrast $t_1$ versus $t_2$, and with a slight abuse of notation, we let $\tau$ correspond to the $q$-vector of PATEs on each of the measured outcomes and let $\hat\tau^{NUC}$ denote the $q$-vector of PATEs under NUC. Let $\mathcal{C}$ be a set of indices for $c$ null control outcomes for which there is assumed to be no causal effect of the treatment on these measured outcomes, that is, $\tau_j = 0$ for any $j \in \mathcal{C}$. For these null control outcomes, the corresponding $c$-vector of treatment effects under NUC, $\hat\tau^{NUC}_{\mathcal{C}}$, must equal the corresponding confounding biases. Since the bias is a function of the sensitivity vector $\gamma$, we have that $\hat\tau^{NUC}_{\mathcal{C}} = \Gamma_{\mathcal{C}}\gamma\,(t_1 - t_2)/(\sigma_{t|x}\sqrt{1 - R^2})$, where $\Gamma_{\mathcal{C}}$ is a matrix equal to the rows of $\Gamma$ corresponding to null control outcomes. This equation implies that $\hat\tau^{NUC}_{\mathcal{C}}$ must be in the column space of $\Gamma_{\mathcal{C}}$ and also implies a lower bound on the fraction of confounding variation in the treatment, $R^2_{T\sim U|X}$.
Proposition 1.
Assume model (3)–(5) with sensitivity parameterization (6)–(7) and Assumptions 1–5. Further, suppose there are $c$ null control outcomes, $\mathcal{C}$, such that $\tau_j = 0$ for $j \in \mathcal{C}$. Then, $\hat\tau^{NUC}_{\mathcal{C}}$ must be in the column space of $\Gamma_{\mathcal{C}}$. In addition, the fraction of variation in the treatment due to the confounding is lower bounded by
$R^2_{T\sim U|X} \;\ge\; R^2_{\mathcal{C}} := \tfrac{k_{\mathcal{C}}}{1 + k_{\mathcal{C}}}, \qquad k_{\mathcal{C}} := \|\Gamma_{\mathcal{C}}^{+}\hat\tau^{NUC}_{\mathcal{C}}\|_2^2\, \tfrac{\sigma^2_{t|x}}{(t_1 - t_2)^2}$ | (10)
where $\Gamma_{\mathcal{C}}^{+}$ denotes the pseudoinverse of $\Gamma_{\mathcal{C}}$. The bound $R^2_{\mathcal{C}}$ is identifiable under factor confounding (Assumption 6).
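Under the $k/(1+k)$ form of our reconstruction of (10), the lower bound is a one-liner given estimates of the null-control loadings and their apparent effects. The helper below is a sketch under that assumption, with a made-up single null control:

```python
import numpy as np

def r2_lower_bound(Gamma_C, tau_nuc_C, sigma_t, dt=1.0):
    """Smallest R^2_{T~U|X} consistent with attributing the observed
    null-control effects entirely to confounding (reconstruction of (10))."""
    g_min = np.linalg.pinv(Gamma_C) @ tau_nuc_C * sigma_t / dt  # minimum-norm gamma
    k = g_min @ g_min
    return k / (1 + k)

# One null control loading only on the first of two confounders, with an
# apparent (purely confounded) effect of 0.5 per unit of treatment:
bound = r2_lower_bound(np.array([[1.0, 0.0]]), np.array([0.5]), sigma_t=1.0)
# bound = 0.25 / 1.25 = 0.2
```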
When the number of null control outcomes is smaller than the rank of $\Gamma$, then $\hat\tau^{NUC}_{\mathcal{C}}$ is automatically in the column space of $\Gamma_{\mathcal{C}}$. In order to correct for the biases of the null control outcomes, confounding must explain at least a fraction $R^2_{\mathcal{C}}$ of the residual treatment variance. Moreover, for any assumed $R^2_{T\sim U|X}$, the assumption of null controls constrains the space of possible effects for the non-null outcomes. We formalize this below.
Theorem 2.
Under the assumptions established in Proposition 1, for any value of $R^2 := R^2_{T\sim U|X} \ge R^2_{\mathcal{C}}$, the confounding bias for the treatment effect on outcome $j$ is in the interval
$\Gamma_{j\cdot}\Gamma_{\mathcal{C}}^{+}\hat\tau^{NUC}_{\mathcal{C}} \;\pm\; \|P^{\perp}_{\mathcal{C}}\Gamma_{j\cdot}^\top\|_2 \sqrt{\tfrac{R^2}{1 - R^2} - \tfrac{R^2_{\mathcal{C}}}{1 - R^2_{\mathcal{C}}}}\; \tfrac{|t_1 - t_2|}{\sigma_{t|x}}$ | (11)
where $P^{\perp}_{\mathcal{C}}$ is the projection matrix onto the space orthogonal to the row space of $\Gamma_{\mathcal{C}}$. Under Assumption 6, $R^2$ is the only unidentifiable parameter.
Note that the ignorance region is no longer centered at $\hat\tau^{NUC}_j$ but instead at $\hat\tau^{NUC}_j - \Gamma_{j\cdot}\Gamma_{\mathcal{C}}^{+}\hat\tau^{NUC}_{\mathcal{C}}$, where $\Gamma_{j\cdot}\Gamma_{\mathcal{C}}^{+}\hat\tau^{NUC}_{\mathcal{C}}$ is the bias correction under the null controls assumption. Theorem 2 indicates that whenever $\Gamma_{\mathcal{C}}$ is of rank $m$ or whenever we assume $R^2 = R^2_{\mathcal{C}}$, treatment effects for all outcomes are identifiable under factor confounding. A direct comparison of the ignorance regions from Theorem 1 and Theorem 2 indicates that after incorporating null control outcomes, for any fixed $R^2$ the width of the ignorance region is reduced by a multiplicative factor of
$\sqrt{1 - \tfrac{R^2_{\mathcal{C}}/(1 - R^2_{\mathcal{C}})}{R^2/(1 - R^2)}} \;\times\; \tfrac{\|P^{\perp}_{\mathcal{C}}\Gamma_{j\cdot}^\top\|_2}{\|\Gamma_{j\cdot}^\top\|_2}.$ | (12)
From Equation (12), it is evident that null controls reduce the width of the worst-case ignorance region in two ways. The first factor, under the radical, is due to the fact that only the remaining fraction $R^2 - R^2_{\mathcal{C}}$ of the treatment variance can be due to confounders which are uncorrelated with the null control outcomes. This factor reduces the width of the ignorance regions for all non-null outcomes by an equal proportion. As a special case, when all the unobserved confounders are correlated with the null control outcomes, then $\Gamma_{\mathcal{C}}$ has rank $m$ and the treatment effects for all outcomes are identified. In contrast, the second factor depends on the specific outcome of interest, $j$. The ignorance region shrinks the most for outcomes that are mostly correlated with the same set of confounders as the null control outcomes. Mathematically, when $\Gamma_{j\cdot}$ is in the row space of $\Gamma_{\mathcal{C}}$, $\|P^{\perp}_{\mathcal{C}}\Gamma_{j\cdot}^\top\|_2 = 0$ and the treatment effect of outcome $j$ is identified under factor confounding. When $\Gamma_{j\cdot}$ is orthogonal to the row space of $\Gamma_{\mathcal{C}}$, the confounders affecting the null control outcomes are independent of the confounders affecting $Y_j$, so that $\|P^{\perp}_{\mathcal{C}}\Gamma_{j\cdot}^\top\|_2 = \|\Gamma_{j\cdot}^\top\|_2$ and thus there is no further reduction of the ignorance region beyond the first factor. In summary, the best null control outcomes are those which have large confounding biases (and hence large values of $R^2_{\mathcal{C}}$) and also have similar outcome-confounder associations with the other outcomes of interest. We illustrate these facts in a simulation study in Section 4.
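As reconstructed here, the shrinkage factor in (12) separates into a shared part and an outcome-specific part. The sketch below (with hypothetical loadings) computes it and checks the two limiting cases just discussed:

```python
import numpy as np

def shrink_factor(Gamma, j, C, r2, r2_c):
    """Multiplicative reduction in ignorance-region width for outcome j
    after imposing null controls C (reconstruction of (12))."""
    Gamma_C = Gamma[C]
    P = np.linalg.pinv(Gamma_C) @ Gamma_C        # projector onto row space of Gamma_C
    ortho = Gamma[j] - P @ Gamma[j]              # component orthogonal to null-control rows
    f = lambda r: r / (1 - r)                    # the R^2 / (1 - R^2) scale
    return np.sqrt(1 - f(r2_c) / f(r2)) * np.linalg.norm(ortho) / np.linalg.norm(Gamma[j])

Gamma = np.array([[1.0, 0.0],                    # hypothetical loadings; row 0 is the null control
                  [2.0, 0.0],                    # collinear with the null-control row
                  [0.0, 1.0]])                   # orthogonal to it
identified = shrink_factor(Gamma, 1, [0], r2=0.4, r2_c=0.1)   # -> 0.0 (effect identified)
unchanged = shrink_factor(Gamma, 2, [0], r2=0.4, r2_c=0.0)    # -> 1.0 (no shrinkage)
```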
3.5. Robustness
A common strategy for characterizing the robustness to confounding is to identify the “smallest” sensitivity parameter(s) which nullifies the causal effect. For example, for single-outcome inference Cinelli and Hazlett (2020) define the robustness value, $RV_j$, as the smallest common value of $R^2_{T\sim U|X}$ and $R^2_{Y_j\sim U|T,X}$ needed to change the sign of the effect, and define an “extreme robustness value”, $XRV_j$, as the smallest value of $R^2_{T\sim U|X}$ needed to change the sign without any constraints on $R^2_{Y_j\sim U|T,X}$ (Cinelli and Hazlett, 2022). The $XRV_j$ is a more conservative measure of robustness than $RV_j$, since the smallest value of $R^2_{T\sim U|X}$ needed to change the sign of the causal effect is achieved when $R^2_{Y_j\sim U|T,X} = 1$, that is, all the residual outcome variance is attributable to confounders.
Here, we define the factor confounding robustness value, $RV^{FC}_j$, for outcome $j$ as the smallest value of $R^2_{T\sim U|X}$ needed to make the causal effect zero under factor confounding. $RV^{FC}_j$ is a more accurate reflection of robustness than $RV_j$ or $XRV_j$ when factor confounding holds, because $R^2_{Y_j\sim U|T,X}$ is identified under this assumption for all $j$ (Assumption 6). $RV^{FC}_j$ can be either smaller or larger than $RV_j$, depending on how much variance is attributed to potential confounders by the factor model. When $R^2_{Y_j\sim U|T,X} \ge RV_j$, then $RV^{FC}_j \le RV_j$. Conversely, when $R^2_{Y_j\sim U|T,X} < RV_j$, then $RV^{FC}_j > RV_j$; moreover, $RV^{FC}_j = XRV_j$ if and only if $R^2_{Y_j\sim U|T,X} = 1$, that is, the latent factors explain all the residual outcome variance for outcome $j$. We demonstrate the relationship between these robustness values in a simulation study in Section 4.
In addition, we can quantify how assumptions about null control outcomes influence robustness. First, note that when there is only a single null control, indexed by $c$, we have that $R^2_{T\sim U|X} \ge RV^{FC}_c$, since $RV^{FC}_c$ is the smallest fraction of treatment variance needed to nullify outcome $c$. In other words, the total fraction of treatment variance explained by confounders is lower bounded by the robustness value for the null control. We define the “combined robustness value” with null controls, $RV^{NC}_j$, as
$RV^{NC}_j := \min\bigl\{\, \|\gamma\|_2^2 : \mathrm{Bias}_{e_j}(\gamma) = \hat\tau^{NUC}_j \text{ and } \mathrm{Bias}_{e_c}(\gamma) = \hat\tau^{NUC}_c \text{ for all } c \in \mathcal{C} \,\bigr\}$ | (13)
where $e_c$ is the $c$th canonical basis vector. $RV^{NC}_j$ corresponds to the minimum fraction of treatment variance explained by unobserved confounders that is required to make the causal effect on $Y_j$ equal to zero and satisfy all null control assumptions. Naturally, the minimum fraction of treatment variance explained by confounders needed to satisfy the null control assumptions and nullify $\tau_j$ must be larger than the minimum fraction needed to nullify just $\tau_j$.
Theorem 3.
Let $\hat\tau^{NUC}$ denote the vector of PATEs for all outcomes under NUC and $\hat\tau^{NUC}_{\mathcal{C}}$ be the vector of PATEs for null control outcomes under NUC. Under the assumptions established in Proposition 1, the factor confounding robustness value for outcome $j$ is
$RV^{FC}_j = \tfrac{k_j}{1 + k_j}$ | (14)
where $k_j := (\hat\tau^{NUC}_j)^2\, \sigma^2_{t|x} / (\|\Gamma_{j\cdot}\|_2^2\, (t_1 - t_2)^2)$. The combined robustness value for outcome $j$ given null controls $\mathcal{C}$ is
$RV^{NC}_j = \tfrac{k_{\mathcal{C}\cup\{j\}}}{1 + k_{\mathcal{C}\cup\{j\}}}$ | (15)
where $k_{\mathcal{C}\cup\{j\}} := \|\Gamma_{\mathcal{C}\cup\{j\}}^{+}\hat\tau^{NUC}_{\mathcal{C}\cup\{j\}}\|_2^2\, \sigma^2_{t|x} / (t_1 - t_2)^2$.
In the following examples, in addition to the combined robustness value $RV^{NC}_j$, we report another useful summary, which corresponds to the additional fraction of treatment variance explained by confounders that is needed to nullify outcome $j$ beyond the amount needed to nullify the null control outcomes.
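Under our $k/(1+k)$ reconstruction of Theorem 3, both robustness values are direct to compute. The sketch below (hypothetical loadings and effects) evaluates them and checks that adding a null control can only increase the fraction of treatment variance that confounders must explain:

```python
import numpy as np

def rv_fc(tau_nuc_j, gamma_row_j, sigma_t, dt=1.0):
    """Factor-confounding robustness value (reconstruction of (14))."""
    k = (tau_nuc_j * sigma_t / dt) ** 2 / (gamma_row_j @ gamma_row_j)
    return k / (1 + k)

def rv_nc(Gamma_Cj, tau_nuc_Cj, sigma_t, dt=1.0):
    """Combined robustness value with null controls (reconstruction of (15))."""
    g = np.linalg.pinv(Gamma_Cj) @ tau_nuc_Cj * sigma_t / dt
    k = g @ g
    return k / (1 + k)

row_j = np.array([1.0, 0.0])                     # hypothetical loadings for outcome j
# With no null controls, the single-outcome value: k = 0.25, RV^FC = 0.2 ...
solo = rv_fc(0.5, row_j, sigma_t=1.0)
# ... while a null control loading on an independent confounder raises the
# total amount of confounding required.
combined = rv_nc(np.array([[0.0, 1.0], [1.0, 0.0]]),
                 np.array([0.5, 0.5]), sigma_t=1.0)
```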
4. Simulation Study
In this section, we provide intuition for the theoretical results in a simple simulated example without covariates where both the treatment and outcome model are linear and Gaussian. We generate observations from model (3)–(5) with $m$ latent variables and $q = 10$ outcomes, and fix $\gamma$ in (4) and $\tau$ in (5), where $\tau$ represents the vector of causal effects for a unit change in $T$. We choose $\tau$ so that there is no causal effect on the first, second, and tenth outcomes, and a causal effect of one for the other seven outcomes.
The partial correlation vector between $U$ and $T$, $\gamma$, is chosen uniformly on a sphere of fixed radius $\|\gamma\|_2$. We also choose $\Gamma$ with a particularly simple structure, shown in Figure 1(a). After generating data, we infer $\tau$ and $\Gamma$ using a Bayesian multivariate linear regression model with a factor model structure on the residual covariance, using the probabilistic programming language Stan (Stan Development Team, 2022).
Fig. 1.

a) Heatmap of the factor loadings $\Gamma$. b) 95% posterior credible regions for the causal effects on each of the ten outcomes, at a fixed value of $R^2_{T\sim U|X}$, based on simulated observations without any null controls (blue) and with outcome 1 as a null control (red). With the null control assumption, outcomes 2 and 3 are identifiable because the corresponding rows of $\Gamma$ are collinear with row 1. Rows 4–6 of $\Gamma$ are orthogonal to the first row of $\Gamma$, so there is only a small reduction in the size of the identification region and no change in the midpoint of this region. Multiple robustness values are reported above the intervals, including the single-outcome robustness $RV$, extreme robustness $XRV$, and robustness under factor confounding $RV^{FC}$. The bottom row summarizes the robustness under factor confounding with null controls, where the denominator is the combined robustness value defined in (15) and the numerator is the additional fraction of treatment variance needed to nullify outcome $j$ beyond that induced solely by the null control. The null control assumption significantly increases the robustness of all non-null effects, and is enough to effectively nullify outcomes 2 and 10.
In Figure 1(b), we plot the 95% posterior credible interval for the effect of a one unit change in $T$ on each outcome at the assumed value of $R^2_{T\sim U|X}$ (blue) by computing the relevant posterior quantiles of the endpoints of the partial identification region determined by Theorem 1. The outcomes with the largest intervals are those for which the corresponding rows of $\Gamma$ have the largest magnitudes (i.e. darker colors in Figure 1(a)). Only outcomes 4, 7, and 8 are robustly different from zero at this level of confounding.
We then trace the implications of a null control assumption through the sensitivity model to illustrate the results of Section 3.4. In Figure 1(b) we plot the posterior 95% credible regions under the additional assumption that the first outcome is a null control outcome (red). After incorporating the null control assumption, the posterior credible regions for outcomes 2–10 still include the true causal effects assuming factor confounding and the assumed $R^2_{T\sim U|X}$. Among the non-null outcomes, only one has an ignorance region that still includes zero. Since the first three rows of $\Gamma$ are mutually collinear, fixing the bias of outcome 1 implies that, with infinite data, the effects for outcomes 2 and 3 are also identified, since the reduction factor in Equation (12) is zero for these rows. Rows 4–10 are not collinear with the first row of $\Gamma$ and thus, even with infinite data, the effects remain unidentified. Rows 4–6 of $\Gamma$ are orthogonal to row 1, which implies that the null-control bias correction is zero for outcomes 4–6 and thus the midpoints of the intervals remain unchanged for these outcomes. The interval widths still shrink slightly, since only part of the treatment variance can be explained by the confounders of outcomes 4 through 6 after accounting for the null control outcome. The dot products of row 1 with rows 7 through 10 are all nonzero, which means that the midpoints of the ignorance regions change for outcomes 7 through 10 after incorporating the null control assumption. Consistent with (11), the directional change in the midpoint of the ignorance region for outcome $j$ is determined by the sign of the dot product of row $j$ and row 1.
Finally, we compare four different measures of robustness to confounding in black above the corresponding intervals: single-outcome robustness (Cinelli and Hazlett, 2020), extreme robustness , robustness under factor confounding () and robustness under factor confounding with the first outcome as a null control . By definition, is the most conservative (smallest robustness values). In this simulation, is smaller than for six of the outcomes and larger for the other four. is a more accurate reflection of the true robustness under factor confounding since we are able to infer the implied values of . After correctly incorporating the null control assumption, the robustness of all the outcomes with true non-zero effects increases significantly. The largest increase in robustness occurs for , for which the causal effect is identifiable under the null control assumption. For outcomes and , the other outcomes with no true causal effect, are close to zero, which means that the null control assumption alone is sufficient to nullify these effects as well.
5. Calibration
For a sensitivity analysis to be of practical value, it is essential to calibrate the magnitude of the sensitivity parameters against interpretable benchmarks. In this section, we briefly describe strategies for calibrating , the sole sensitivity parameter of the worst-case bias under factor confounding. In Section 5.2, we propose an alternative sensitivity parameterization tailored to binary treatments which we apply to the analysis in Section 6.
5.1. Calibrating
Recall that the magnitude of the sensitivity vector in (6) can be characterized by . For linear models, can be directly interpreted as the partial fraction of treatment variance explained by confounders given , and more generally, as the squared norm of the partial correlations between the treatment and confounders (see Section 3.2). Following prior work, we can calibrate in linear models by comparing it to an estimable benchmark. For a reference covariate (or set of covariates), , and given all baseline covariates , we compute the partial R-squared , which serves as a point of comparison for the unknown partial correlation . If we believed could be up to times as informative about the treatment as was, meaning , then Cinelli and Hazlett (2020) show that this implies . Thus, following this work, we use as our benchmark statistic for calibrating given an appropriately chosen value κ and covariate . See Cinelli and Hazlett (2020) for additional details on this and related calibration strategies.
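To make this benchmarking concrete, the partial R-squared of a reference covariate with the treatment can be computed by comparing residual sums of squares from linear models for the treatment with and without that covariate. The following sketch uses simulated, hypothetical data (with `age` standing in for the reference covariate and `kappa` for the analyst's multiplier); it is an illustration of the calibration logic, not the paper's exact procedure:

```python
import numpy as np

def partial_r2(t, X_rest, x_ref):
    """Partial R^2 of x_ref with treatment t, given covariates X_rest:
    the relative reduction in residual variance from adding x_ref."""
    def rss(X, y):
        X1 = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        resid = y - X1 @ beta
        return resid @ resid

    rss_reduced = rss(X_rest, t)
    rss_full = rss(np.column_stack([X_rest, x_ref]), t)
    return (rss_reduced - rss_full) / rss_reduced

# Hypothetical data: treatment depends on "age" and two other covariates.
rng = np.random.default_rng(0)
n = 2000
age = rng.normal(size=n)
other = rng.normal(size=(n, 2))
t = 0.5 * age + other @ np.array([0.2, -0.1]) + rng.normal(size=n)

benchmark = partial_r2(t, other, age)  # estimable benchmark statistic
kappa = 1.0                            # confounder "as informative as age"
r2_bound = kappa * benchmark           # implied bound on the sensitivity parameter
```

For this simulated design the population partial R-squared is 0.25/1.25 = 0.2, so `benchmark` should land near 0.2; the bound then scales linearly in `kappa`.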
5.2. Calibration for Binary Treatments
For binary-valued treatments, we suggest an alternative sensitivity parameterization which is closely related to the parameterization of the marginal sensitivity model for single-outcome problems (Tan, 2006). Let be the “nominal” propensity score, be the “true” propensity score, and be the multiplicative change in the odds of treatment after accounting for unmeasured confounders. The core assumption of the marginal sensitivity model is that the odds of treatment are bounded by some constant, that is, there exists a such that holds with probability one. Under parametric assumptions about the conditional distribution of , we can characterize the full distribution of . Although not strictly necessary, in this work we focus our examples on settings in which is assumed to be Gaussian given and .
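The bounded-odds condition can be checked directly: the odds ratio between the true and nominal propensity scores must lie in a band determined by the bound. A minimal sketch, using hypothetical nominal and true propensity scores:

```python
import numpy as np

def odds(p):
    return p / (1 - p)

def within_msm(e_nominal, e_true, lam):
    """Marginal sensitivity model check: the odds ratio between the true
    and nominal propensity scores lies in [1/lam, lam] for every unit."""
    ratio = odds(np.asarray(e_true)) / odds(np.asarray(e_nominal))
    return bool(np.all((ratio >= 1 / lam) & (ratio <= lam)))

# Hypothetical propensity scores for two units.
e_nominal = np.array([0.30, 0.50])
e_true = np.array([0.45, 0.40])
```

Here `within_msm(e_nominal, e_true, 2.0)` holds but `within_msm(e_nominal, e_true, 1.5)` does not, since the first unit's odds of treatment change by a factor of about 1.91.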
Proposition 2.
Assume is conditionally Gaussian with mean and covariance as given in Equations (6) and (7). Further, denote and . Then, for any we have , where with and .
Proposition 2 states that the log odds ratio, , is a two-component mixture of normal distributions. Using this proposition, we can find , such that . Unconditional on is a two-component mixture with means , variance and mixture weights and , which we use to compute . Since and only depend on and , we can also derive the robustness of outcome in the -parameterization by replacing with in the formulas for and . Note that by Proposition 2, there is a one-to-one mapping between the parameterization and the R-squared parameterization. Consequently, we can calibrate by first calibrating against , as described in the previous section, and then converting the corresponding R-squared back to the scale. We demonstrate an analysis using this benchmarking strategy in the empirical example in the next section. Alternatively, we can use an informal benchmarking strategy for by directly computing how much the odds of treatment changes when adding a reference covariate into a propensity model which already includes some baseline covariates, e.g. by computing the 1- quantile of the odds ratio (see e.g. Kallus and Zhou, 2021; Dorn and Guo, 2022). We leave exploration of additional formal benchmarking strategies in this parameterization for future work.
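Given the mixture characterization in Proposition 2, the bound at a chosen coverage level can be found by numerically inverting the mixture CDF of the log odds ratio. The sketch below does this for a generic two-component Gaussian mixture; the means, common standard deviation, and mixture weight are placeholder inputs for illustration, not values derived from the proposition's formulas:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def lambda_at_coverage(mu0, mu1, sigma, weight, alpha=0.05):
    """Find Lambda such that the log odds ratio -- modeled as the mixture
    weight*N(mu0, sigma^2) + (1 - weight)*N(mu1, sigma^2) -- lies in
    [-log(Lambda), log(Lambda)] with probability 1 - alpha."""
    def cdf(x):
        return weight * norm.cdf(x, mu0, sigma) + (1 - weight) * norm.cdf(x, mu1, sigma)

    def coverage(log_lam):
        # Probability mass of the mixture inside the symmetric interval.
        return cdf(log_lam) - cdf(-log_lam)

    # Coverage is monotone increasing in log_lam, so a root bracket suffices.
    log_lam = brentq(lambda L: coverage(L) - (1 - alpha), 1e-9, 50.0)
    return float(np.exp(log_lam))
```

For example, `lambda_at_coverage(-0.8, 0.8, 0.4, 0.5)` gives a Lambda of roughly 4.3; shrinking the coverage requirement (larger `alpha`) yields a smaller Lambda, as expected.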
6. Analyzing the Effects of Light Alcohol Consumption
We apply our proposed multi-outcome sensitivity analysis in an investigation of a long-standing question about the potential health benefits of light to moderate alcohol consumption on health outcomes. In particular, observational data indicates that light alcohol consumption is positively correlated with blood levels of HDL (“good cholesterol”) and negatively correlated with LDL (“bad cholesterol”) (e.g. Choudhury et al., 1994; Meister et al., 2000; O’Keefe et al., 2007). However, there are known to be many potential confounders related to diet and lifestyle which could explain these associations. We consider treated individuals as those who self-reported drinking between one and three alcoholic beverages per day, and untreated individuals as those who averaged one drink per week or less. We make use of laboratory outcomes, , which consist of three measures of cholesterol (HDL, LDL, and triglycerides), as well as blood levels of potassium, iron, sodium, and glucose and levels of three environmental toxicants, methylmercury, cadmium and lead, all collected from 2017–2020 (pre-pandemic) as part of the National Health and Nutrition Examination Survey (NHANES). We control for observed confounders which include age, gender, and an indicator for education beyond a high school degree.
In our analysis, we consider the layers of assumptions that one might make to reason about the set of causal effects which are consistent with the observed data. First, we report posterior intervals for causal effects under the NUC assumption, and then consider bounds on the causal effects under factor confounding (Assumption 6). We calibrate the bounds by benchmarking values of , the 95th percentile of the multiplicative change in the odds of treatment after accounting for unobserved confounders (Section 5.2). Alternatively, in lieu of specifying directly, we also explore the implications of a carefully chosen null control outcome on non-null outcomes. We report robustness values for all outcomes both with and without the null control assumption.
We start by estimating associations under NUC by fitting a multivariate linear regression model with factor-structured residuals, where the estimands reduce to the regression coefficients, , for . The logarithms of all outcomes are approximately unimodal, symmetric, and not heavy-tailed, and thus we regress the log outcomes on age, gender, education, and the alcohol consumption indicator to estimate under an assumption of no unobserved confounding. Here, we assume in (5) and fit a Bayesian multivariate linear regression with a rank- factor model structure on the Gaussian residuals using Stan (Stan Development Team, 2022). Note that factor confounding can only be satisfied if otherwise . As such, we fit models of rank and use Pareto-Smoothed Importance Sampling estimates of the leave-one-out cross-validation loss to evaluate relative model fit (Vehtari et al., 2017). We compare differences in expected log predictive density (ELPD) for different ranks and find that models with ranks and have an ELPD within one standard deviation of the full rank model, and thus can be viewed as statistically indistinguishable from the model in which there are no constraints on the residual covariance of the outcomes (see Appendix C, Table 2). For the remainder of the analysis, we proceed with the rank-5 model as the smallest model which can explain the correlations in the data (see Appendix Figure C.1 for a heatmap of the inferred ).
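Our analysis used a Bayesian factor model fit in Stan with PSIS-LOO for rank comparison; as a lightweight illustration of the same rank-selection logic, the sketch below fits maximum-likelihood factor analyses of increasing rank to simulated residuals and compares held-out log-likelihood, a rough frequentist analogue of the ELPD comparison. All data, dimensions, and the tolerance are hypothetical:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.model_selection import train_test_split

# Simulate residuals from a true rank-3 factor model with q = 10 outcomes.
rng = np.random.default_rng(1)
n, q, true_rank = 1500, 10, 3
B = rng.normal(size=(q, true_rank))  # factor loadings
Y = rng.normal(size=(n, true_rank)) @ B.T + rng.normal(scale=0.5, size=(n, q))

Y_train, Y_test = train_test_split(Y, test_size=0.3, random_state=0)
scores = {}
for k in range(1, 7):
    fa = FactorAnalysis(n_components=k, random_state=0).fit(Y_train)
    scores[k] = fa.score(Y_test)  # mean held-out log-likelihood per sample

# Choose the smallest rank whose held-out score is within a small tolerance
# of the best, mirroring "smallest model statistically indistinguishable
# from the full-rank model".
best = max(scores.values())
chosen = min(k for k, s in scores.items() if best - s < 0.05)
```

With the true rank equal to 3, the held-out score should climb steeply up to rank 3 and then plateau, so the selection rule recovers the generating rank rather than the largest model.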
Next, we use observed covariates to compute benchmark values for the sensitivity value in the -parameterization (Section 5.2). We find that age, gender, and education are significant predictors for almost every outcome (Appendix C, Table 1) and are also all significantly correlated with the propensity for light drinking. We then compute benchmark values of using the procedure proposed in Section 5.2, by assuming for different choices of covariate , and converting the implied value of back to the -scale. (See Sections 5.1 and 5.2). We find that for an unmeasured Gaussian confounder as strong as age, we have , which means that for 95% of the observed units, the predicted odds of light drinking would change by a multiplicative factor of between 1/3.6 and 3.6 when adding the unmeasured covariate into the model which already controls for age, gender and education. When benchmarking against gender we find that and for education we find that .
In Figure 2, we plot the 95% posterior intervals for the causal effects on each of the ten outcomes under the no unobserved confounding assumption as black lines. We find that HDL cholesterol, lead, methylmercury, and potassium are positively associated with light drinking and glucose is negatively associated with light drinking. The 95% credible intervals for all other outcomes include zero. We also include the worst-case bounds for each outcome assuming (black rectangles), which matches the benchmark value computed using age, the observed confounder with the strongest relationship to the treatment. While each marginal interval except methylmercury includes zero under this assumption, it is not true that all of those outcomes could simultaneously be zero at , since the worst-case biases for each outcome are achieved with different values of the sensitivity vector, .
Fig. 2.

95% posterior credible intervals for the causal effects of light drinking on each outcome under NUC (black) and under factor confounding with (black box). A multiplicative change in the odds of light drinking of is consistent with an unmeasured confounder which has a dependence with the treatment which is as strong as the dependence of the treatment on age (See Section 5). In red, we plot the 95% credible intervals when methylmercury is assumed to be a null control outcome and , the value needed to nullify methylmercury under factor confounding (red intervals). Numbers in black indicate different robustness values, converted to the -parameterization, including the single-outcome robustness , extreme robustness , robustness under factor confounding and the combined robustness value with methylmercury as a null control, (in red). Listed values are conservatively reported as the lower endpoint of the 95% posterior credible intervals of the robustness values.
We also include four different robustness values, converted to the - parameterization, above each interval. We include the single-outcome robustness values (, Cinelli and Hazlett, 2020), extreme robustness values (, Cinelli and Hazlett, 2022), robustness under factor confounding and the combined robustness under factor confounding with methylmercury as a null control . Since there is estimation uncertainty for these quantities, we conservatively report the lower endpoint of the 95% posterior interval for each robustness value. We find that under the factor confounding assumption, methylmercury levels are the most robust to confounding followed by HDL . The for methylmercury exceeds the benchmark computed from age whereas the for HDL is only slightly smaller than this benchmark, meaning an unmeasured confounder of comparable strength to age would be enough to nullify the apparent effect of light drinking on HDL. In this analysis, the single-outcome robustness values, , are all much larger than , and likely overstate the robustness of the effects under NUC. In contrast, the inferred factor model implies that for many outcomes, a relatively large fraction of outcome variance may be due to unobserved confounding. As such, for biases large enough to change the sign of the causal effects, for most outcomes. are by definition the most conservative possible robustness values and thus always less than .
Despite the apparent robustness of these effects, we suspect that methylmercury levels are primarily tied to the consumption of fish and note that mercury is not found in alcoholic beverages. Since there is no other known credible mechanism for light drinking directly influencing mercury levels, methylmercury makes an ideal null control outcome. We then evaluate whether correcting the inferred bias in effect estimates for mercury also explains away the apparent effects for the other outcomes. We plot 95% posterior credible intervals for all effects after incorporating methylmercury as a null control, assuming no additional confounding beyond the smallest amount required to explain away methylmercury’s association. Stated differently, these plots show the posterior distribution of which corresponds to the midpoint of the causal effect ignorance regions for outcome given null controls , or equivalently, the identifiable effect estimate under the assumption that (Theorem 2). If the posterior interval for includes 0, then apparently the null control assumption is enough to explain away the significance of without introducing additional unobserved confounding. If it does not include zero, then in red we report the combined robustness, , corresponding to the magnitude of confounding needed to nullify the effect of light drinking on mercury and nullify the effect on the corresponding outcome (Theorem 3, right of slash). We separately report the additional amount needed to nullify the effect beyond what is needed to nullify methylmercury (left of slash).
When methylmercury is a null control, , mercury’s robustness value. The 95% posterior credible intervals of for HDL, lead, and glucose all include zero after incorporating the null control constraint, meaning that factor confounding and the null control assumption is enough to explain away the causal effects for these outcomes, assuming no additional confounding beyond the minimum needed to explain away the effect on mercury. In contrast, the 95% credible interval for the causal effect on potassium, which included zero under NUC, moves away from zero since potassium levels are negatively correlated with mercury levels. On top of the confounders needed to nullify methylmercury, we would need additional confounders that were associated with potassium to change the odds of treatment by as much as about 2.4 (left of slash). In total, we would need unobserved confounders to change the odds of treatment by a multiplicative factor of as much as 4.1 × 2.4 ≈ 9.9 to nullify both methylmercury and potassium (the combined robustness value, right of slash). When factor confounding holds, we conclude that there is evidence that light alcohol consumption has a positive causal effect on potassium levels, but no apparent effect on HDL, lead, or glucose levels. In summary, under factor confounding, a relatively strong association between confounders and treatment is needed to explain the bias in mercury, which further implies a large bias adjustment for potentially non-null outcomes.
Crucially, the factor confounding assumption (Assumption 6) can be violated if, for example, some unmeasured environmental confounders influenced both a single outcome and the propensity for light drinking, but not any other outcomes. In such a case, we could further explore relaxations of factor confounding by manually calibrating to values that are larger than the inferred values. We discuss such strategies and provide additional theory about generalizations in Appendix B. In Appendix C we provide several additional results under relaxations of the factor confounding assumption for this example, focused in particular on the potential for single outcome confounding for methylmercury. At one extreme, when we fix for methylmercury, we assume the strongest possible association between confounders and mercury levels. When this holds, the majority of unobserved confounding for methylmercury is uncorrelated with other outcomes, and thus the null control assumption has a much smaller impact on the conclusions for other outcomes (see Appendix Figure C.2). In full generality, we note that each of the entries of can always be specified manually by a practitioner, although in practice it is likely to be very difficult to rigorously justify any such choice without starting from higher level assumptions like those proposed in this work.
7. Discussion
In this paper, we propose a sensitivity analysis for characterizing the range of potential biases that can arise in observational analyses with multivariate outcomes. Unlike previous work on observational causal inference with multivariate outcomes, which typically requires stronger assumptions for causal identification, we explore the range of causal effects that are compatible with the observed data under different untestable assumptions about the strength of unobserved confounding. We show precisely how the bias varies by outcome and depends on the inferred residual covariance in the outcome model. When appropriate, we show how assumptions about factor model identifiability can be used to provide stronger results about the robustness of effects. We then characterize how null control outcomes influence both the partial identification region and robustness of effects.
There are several extensions and generalizations that are worth considering. Importantly, in this work, we focus primarily on modeling under a factor model structure, although extensions for non-continuous outcomes and more complex structures could be developed, perhaps by leveraging the copula decomposition proposed by Zheng et al. (2021b) or by making use of generalized latent variable models (Skrondal and Rabe-Hesketh, 2004). Also, in the simulation and NHANES example, we used Bayesian inference, and there is room for a more in-depth exploration of the effects of different prior distributions on partially identified parameters (Gustafson, 2015; Zheng et al., 2021a).
Finally, this work builds on closely related work on sensitivity analyses for multi-treatment causal analyses (Zheng et al., 2021b). A natural follow-up would be to consider the identification implications for data that involve both multiple simultaneously applied treatments and multiple outcomes. This would further bridge our work and recent works in proximal causal inference (Tchetgen et al., 2020). There may be particular connections to the work of Miao et al. (2018) who discuss conditions under which the average treatment effects can be nonparametrically identified with a single null control treatment and a null control outcome, via a double null controls design (see also Miao et al., 2018; Shi et al., 2020).
Supplementary Material
Acknowledgements
We thank Peng Ding and Avi Feller as well as participants at ACIC 2022 for their insightful feedback and suggestions. We also thank all the anonymous reviewers for thoughtful comments and recommendations which dramatically improved the paper. Franks’ research is partially supported by NIH R01GM144967.
Footnotes
See Zhang and Ding (2022), who derive omitted variable bounds for the direct and indirect effects in a mediation analysis under a similar framework to the one used in this paper.
We consider extensions for the heteroscedastic outcome model in which can vary with and in Appendix B.
Here we use the superscript 1, to emphasize that this is a robustness value for single-outcome analyses.
Conflict of Interest
The authors have no relevant financial or non-financial competing interests to report.
Contributor Information
Jiajing Zheng, UCSB.
Jiaxi Wu, UCSB.
Alexander D’Amour, Google Research.
Alexander Franks, UCSB.
References
- Anderson TW and Rubin H (1956). Statistical inference in factor analysis. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 5: Contributions to Econometrics, Industrial Research, and Psychometry, pp. 111–150. University of California Press.
- Choudhury SR, Ueshima H, Kita Y, Kobayashi KM, Okayama A, Yamakawa M, Hirao Y, Ishikawa M, and Miyoshi Y (1994). Alcohol intake and serum lipids in a Japanese population. International Journal of Epidemiology 23 (5), 940–947.
- Cinelli C and Hazlett C (2020). Making sense of sensitivity: Extending omitted variable bias. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82 (1), 39–67.
- Cinelli C and Hazlett C (2022). An omitted variable bias framework for sensitivity analysis of instrumental variables. Available at SSRN 4217915.
- Cook RD, Li B, and Chiaromonte F (2010). Envelope models for parsimonious and efficient multivariate linear regression. Statistica Sinica, 927–960.
- D’Amour A (2019). On multi-cause approaches to causal inference with unobserved confounding: Two cautionary failure cases and a promising alternative. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 3478–3486.
- Dorn J and Guo K (2022). Sharp sensitivity analysis for inverse propensity weighting via quantile balancing. Journal of the American Statistical Association (just-accepted), 1–28.
- Fogarty CB and Small DS (2016). Sensitivity analysis for multiple comparisons in matched observational studies through quadratically constrained linear programming. Journal of the American Statistical Association 111 (516), 1820–1830.
- Franks A, D’Amour A, and Feller A (2019). Flexible sensitivity analysis for observational studies without observable implications. Journal of the American Statistical Association.
- Freemantle N, Calvert M, Wood J, Eastaugh J, and Griffin C (2003). Composite outcomes in randomized trials: greater precision but with greater uncertainty? JAMA 289 (19), 2554–2559.
- Gagnon-Bartsch JA, Jacob L, and Speed TP (2013). Removing unwanted variation from high dimensional data with negative controls. Berkeley: Tech Reports from Dep Stat Univ California, 1–112.
- Gagnon-Bartsch JA and Speed TP (2012). Using control genes to correct for unwanted variation in microarray data. Biostatistics 13 (3), 539–552.
- Gustafson P (2015). Bayesian inference for partially identified models: Exploring the limits of limited data. Chapman and Hall/CRC.
- Kallus N and Zhou A (2021). Minimax-optimal policy learning under unobserved confounding. Management Science 67 (5), 2870–2890.
- Kennedy EH, Kangovi S, and Mitra N (2019). Estimating scaled treatment effects with multiple outcomes. Statistical Methods in Medical Research 28 (4), 1094–1104.
- Leek JT and Storey JD (2007). Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genetics 3 (9), e161.
- Mattei A, Li F, and Mealli F (2013). Exploiting multiple outcomes in Bayesian principal stratification analysis with application to the evaluation of a job training program. The Annals of Applied Statistics 7 (4), 2336–2360.
- Meister KA, Whelan EM, and Kava R (2000). The health effects of moderate alcohol intake in humans: an epidemiologic review. Critical Reviews in Clinical Laboratory Sciences 37 (3), 261–296.
- Miao W, Geng Z, and Tchetgen Tchetgen EJ (2018). Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika 105 (4), 987–993.
- Miao W, Shi X, and Tchetgen Tchetgen E (2018). A confounding bridge approach for double negative control inference on causal effects. arXiv preprint arXiv:1808.04945.
- O’Keefe JH, Bybee KA, and Lavie CJ (2007). Alcohol and cardiovascular health: the razor-sharp double-edged sword. Journal of the American College of Cardiology 50 (11), 1009–1014.
- Pearl J (2009). Causality. Cambridge University Press.
- Rosenbaum PR (2021). Sensitivity analyses informed by tests for bias in observational studies. Biometrics.
- Rosenbaum PR and Rubin DB (1983). Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society. Series B (Methodological), 212–218.
- Rubin DB (1980). Comment. Journal of the American Statistical Association 75 (371), 591–593.
- Sammel M, Lin X, and Ryan L (1999). Multivariate linear mixed models for multiple outcomes. Statistics in Medicine 18 (17–18), 2479–2492.
- Sánchez BN, Budtz-Jørgensen E, Ryan LM, and Hu H (2005). Structural equation models: a review with applications to environmental epidemiology. Journal of the American Statistical Association 100 (472), 1443–1455.
- Shi X, Miao W, Nelson JC, and Tchetgen Tchetgen EJ (2020). Multiply robust causal inference with double-negative control adjustment for categorical unmeasured confounding. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82 (2), 521–540.
- Shi X, Miao W, and Tchetgen ET (2020). A selective review of negative control methods in epidemiology. Current Epidemiology Reports 7 (4), 190–202.
- Skrondal A and Rabe-Hesketh S (2004). Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. Chapman and Hall/CRC.
- Stan Development Team (2022). Stan modeling language users guide and reference manual, version 2.30.
- Tan Z (2006). A distributional approach for causal inference using propensity scores. Journal of the American Statistical Association 101 (476), 1619–1637.
- Tchetgen EJT, Ying A, Cui Y, Shi X, and Miao W (2020). An introduction to proximal causal learning. arXiv preprint arXiv:2009.10982.
- Thurston SW, Ruppert D, and Davidson PW (2009). Bayesian models for multiple outcomes nested in domains. Biometrics 65 (4), 1078–1086.
- VanderWeele TJ (2017). Outcome-wide epidemiology. Epidemiology 28 (3), 399–402.
- Vehtari A, Gelman A, and Gabry J (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing 27 (5), 1413–1432.
- Veitch V and Zaveri A (2020). Sense and sensitivity analysis: Simple post-hoc analysis of bias due to unobserved confounding. Advances in Neural Information Processing Systems 33, 10999–11009.
- Wang J, Zhao Q, Hastie T, and Owen AB (2017). Confounder adjustment in multiple hypothesis testing. Annals of Statistics 45 (5), 1863–1894.
- Zhang M and Ding P (2022). Interpretable sensitivity analysis for the Baron–Kenny approach to mediation with unmeasured confounding. arXiv preprint arXiv:2205.08030.
- Zhao Q, Small DS, and Rosenbaum PR (2018). Cross-screening in observational studies that test many hypotheses. Journal of the American Statistical Association 113 (523), 1070–1084.
- Zheng J, D’Amour A, and Franks A (2021a). Bayesian inference and partial identification in multi-treatment causal inference with unobserved confounding. arXiv preprint arXiv:2111.07973.
- Zheng J, D’Amour A, and Franks A (2021b). Copula-based sensitivity analysis for multi-treatment causal inference with unobserved confounding. arXiv preprint arXiv:2102.09412.