Abstract
We present a simulation study and an application showing that including binary proxy variables related to binary unmeasured confounders improves the estimate of the treatment effect in binary logistic regression. The simulation study included 60,000 randomly generated parameter scenarios, each with sample size 10,000, across six different simulation structures. We assessed bias by comparing the probability of finding the expected treatment effect relative to the modeled treatment effect with and without the proxy variable. Including a proxy variable in the logistic regression model significantly reduced the bias of the treatment or exposure effect compared with logistic regression without the proxy variable. This improvement held at weak, moderate, and strong associations between the unmeasured confounders and the outcome, treatment, or proxy variables. The comparative advantage also held for weakly and strongly collapsible situations, as the number of unmeasured confounders increased, and as the number of proxy variables adjusted for increased.
Keywords: proxy variables, confounding adjustment, logistic regression
1 |. INTRODUCTION
The presence of unmeasured confounders (variables outside the range of examination that causally link treatment and outcome) is common in observational data analysis. Without adjustment for an unmeasured confounder U in a regression of an outcome on a treatment, the estimated causal effect of the treatment on the outcome is often biased. Adequate adjustment for confounding variables is critical to producing robust, unbiased causal effect estimates in observational study designs.1–4 Experimental or trial designs address bias due to confounders through random allocation of subjects to exposure conditions. A series of methods has been proposed to approximate the strength of evidence from this process in observational designs.5–13
With the advent of Big Data, electronic health care records have made large-scale epidemiological studies easier to conduct. With the inclusion of larger cohorts and more variables, confounding is an obvious issue.14–16 Clinicians may plan ahead to adjust for certain confounders, but the collected data may still lack several confounding variables. Proxy variables, denoted S in our study, are correlated with the unmeasured confounding variables U but not directly with the treatment or outcome. A proxy variable S that is associated with U is a surrogate for this unmeasured confounder, so including S in a logistic regression model when U is not available would conceivably remove some confounding bias.
Wickens showed that, under certain conditions, asymptotic estimation of treatment/exposure effects on the outcome improves by including a continuous proxy variable S for a continuous unmeasured confounder U in the least squares regression model for predicting a normally distributed outcome.17 In modern large-scale observational studies, the outcome, treatment, confounders, and proxies are likely binary, resulting in interpretations dependent upon causal risk ratios.18,19 For example, consider an observational study of how anti-depressants affect suicide risk among veterans with post-traumatic stress disorder. Gravlee et al. modeled the complex interplay between racial status and measured levels of biological stress.20 High stress, which may be unobservable, might increase the probability that a person would seek out anti-depressants, but it may also increase the probability that a person commits suicide. Race may be a useful proxy variable in this case because many researchers have shown that African Americans harbor higher stress levels than other Americans.21,22 By adjusting for race, which may be associated with the unmeasured confounder high stress, the estimate of the treatment effect for anti-depressants might be more accurate than without adjustment.
While there has been plenty of academic work on the use of proxy variables, owing to their importance in econometrics, social sciences, finance, epidemiology, and medical sciences, very little work has explored the role of proxy variables in logistic regression with binary explanatory variables.23–27 Within the causal inference field, proxy variables have also been discussed and researched.28–32 However, these papers do not discuss logistic regression or binary explanatory variables. Many of these authors present a causal diagram similar to the one we present in Figure 1A and use it to discuss the benefits and limits of the proxy variable, but these discussions are limited to linear relationships and continuous variables. Kim and Steiner find results similar to those of Wickens: both find that, asymptotically, there is always a benefit to using the proxy.17 In contrast to Wickens, Kim and Steiner present additional theoretical results that show the limits of the usefulness of the proxy and potential pitfalls based on parameter relationships within their causal diagram.17,28
FIGURE 1.

Simulation structure(s) depicting the true relationships between the outcome, the treatment, U, and S, where (A) represents the structure without effect modifiers and (B) represents the structure with effect modifiers (represented by the dotted line). In both simulation structures, S and U are generated as a latent continuous variable from the multivariate normal distribution parameterized by a vector of means and a symmetric matrix of variances and covariances. This continuous variable is dichotomized according to the indicator variable to reveal the binary variables S and U that are of importance. The treatment is dependent only on U and is generated from the Bernoulli distribution with probability calculated from the anti-logit defined below the causal diagram in (A) and (B) and in Section 2.2. The outcome is dependent upon the treatment and U and is generated from the Bernoulli distribution with probability determined from the logit model specific to the chosen simulation structure.
In our study, the treatment, the outcome, the unmeasured confounders U, and the associated proxy variables S are all binary, and we work in the logistic regression setting. Compared with linear regression models, adequately adjusting for confounding through nested models is more difficult in this setting.33–35 The difficulty stems from the fact that nested linear regression models are collapsible, whereas logistic regression models are non-collapsible.34,35 In logistic regression models, non-collapsibility is often mistaken for bias.6,34 Collapsibility and the steps taken to adjust for this phenomenon in our study are more fully explained in Section 3.3.2.
The effectiveness of using binary proxy variables in place of binary unmeasured confounder(s) in a logistic regression setting has yet to be explored. We seek to answer this question through an extensive simulation study, with thousands of randomly generated simulation scenarios and 1000 simulation replications within each scenario. We randomly generated scenarios, rather than displaying results for only a few, to decide whether proxy inclusion is a good idea in general and to avoid cherry-picking scenarios that showed a benefit of proxy variables.
In this paper, we examine the effect of adjusting for 1, 2, and 3 proxy variables on the estimation of the true treatment effect. We constructed 10,000 randomly generated scenarios under each of six scenario structures, defined by the number of unmeasured confounders (1, 3, or 5) and by whether or not the unmeasured confounder is also an effect modifier of the treatment-outcome relationship. For each randomly generated scenario, 1000 datasets are generated and models with and without proxy variables are fit. We average operating characteristics over the dataset replications for each model fit, and compare the average model performance across the 10,000 randomly generated scenarios.
The rest of this paper is outlined as follows. In Section 2, we describe the simulation study setup under various scenario types, and how we randomly generated these parameters. We also list out the formal proxy and marginal treatment effect models that we will compare. In Section 3, we will describe the operating characteristics used to assess whether proxy variables should be used, and parameters of interest to explore their effects on this performance. Section 4 will describe the simulation study results and in Section 5 we include an application from Woman’s Hospital of Baton Rouge, Louisiana. This application investigates infection risk in pregnant women who received a post-operative wound dressing called negative pressure wound therapy (NPWT) or a standard dressing type. Here high body mass index was a known binary confounder with diabetic status known in the literature as a proxy variable for this confounder. We close the paper with a discussion in Section 6.
2 |. SIMULATION STUDY SETUP
To determine when proxy variables are useful to include in multivariable logistic regression models, we first define the true data generation process. Recall that the outcome is a binary variable, the treatment is a binary indicator, U is a binary vector of unmeasured confounders of length 1, 3, or 5, and S is a length-3 vector of binary proxy variables associated with U.
Our simulation study includes six different simulation scenario structures, and within each simulation structure 10,000 different scenarios are randomly generated to determine whether the inclusion of S results in better treatment effect estimation in general. Three of the structures are determined by the number of unmeasured confounders in the scenario, set to 1, 3, or 5. We double the three structures to six simulation structures by additionally requiring the unmeasured confounder(s) to be an effect modifier. Descriptions of the true models assumed and of how parameters are generated follow in the subsections below.
2.1 |. Treatment and outcome models
In order for U to be a vector of unmeasured confounders, U must be related to both the treatment and the outcome, as depicted in Figure 1A,B. Further, we assume that the vector of proxy variables S is not directly related to the treatment or the outcome. From these facts, we get the following true logistic regression formulations:
| (1) |
| (2) |
In Equation (2), when the treatment-by-confounder interaction coefficients are nonzero, the unmeasured confounding vector U modifies the effect of the binary treatment on the outcome, as shown in Figure 1B. That is, U can be both an unmeasured confounder and an effect modifier. We will investigate settings with unmeasured confounders both with and without effect modifiers.3,36 Note that Equations (1) and (2) imply that, conditional on U, S is independent of both the treatment and the outcome, which is the definition of a set of proxy variables.
To examine the benefits of using the proxy variables S under a myriad of different scenarios, we generated 10,000 random parameter sets, which quantify the relationships between the binary variables for the six different simulation structures described and depicted in Figure 1A,B. These randomly generated parameters are drawn from a uniform distribution with lower and upper bounds of −5 and +5.
For each replicated dataset, data for the patients are generated sequentially by first generating S and U, as discussed in the next section. Then we draw the treatment from a Bernoulli distribution with probability equal to the antilogit of Equation (1), conditional on U. Finally, we draw the outcome from a Bernoulli distribution with probability equal to the antilogit of Equation (2), conditional on both the treatment and U.
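The sequential draw described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function and argument names (`alpha`, `beta_t`, `beta_u`, `gamma`) are hypothetical labels for the intercepts, main effects, and treatment-by-confounder interaction coefficients of Equations (1) and (2), and the linear-predictor form is an assumption based on the verbal description.

```python
import math
import random

def antilogit(x):
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def draw_treatment_outcome(u, alpha0, alpha, beta0, beta_t, beta_u, gamma, rng):
    """Draw (treatment, outcome) for one patient given binary confounders u.

    All coefficient names are hypothetical; gamma holds the interaction
    (effect modifier) coefficients and is all zeros in structure (A).
    """
    # Treatment depends only on U.
    p_treat = antilogit(alpha0 + sum(a * ui for a, ui in zip(alpha, u)))
    t = 1 if rng.random() < p_treat else 0
    # Outcome depends on treatment, U, and optional treatment-by-U terms.
    lin = beta0 + beta_t * t
    lin += sum(b * ui for b, ui in zip(beta_u, u))
    lin += t * sum(g * ui for g, ui in zip(gamma, u))
    y = 1 if rng.random() < antilogit(lin) else 0
    return t, y

# Example: one scenario with three unmeasured confounders and no effect modifiers.
rng = random.Random(1)
coef = lambda n: [rng.uniform(-5, 5) for _ in range(n)]  # Uniform(-5, 5) draws
example = draw_treatment_outcome(
    u=[1, 0, 1], alpha0=rng.uniform(-5, 5), alpha=coef(3),
    beta0=rng.uniform(-5, 5), beta_t=rng.uniform(-5, 5),
    beta_u=coef(3), gamma=[0.0, 0.0, 0.0], rng=rng,
)
```

Setting `gamma` to nonzero values corresponds to the effect modifier structure in Figure 1B.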
2.2 |. Relationship between S and U
The multivariate normal distribution was leveraged to generate the binary and correlated vector (S, U). In our study, we randomly generated (S, U) from a latent continuous vector Z in the following manner:
$$(S, U) = I(Z > 0), \qquad Z \sim \mathrm{MVN}\left(\Phi^{-1}(p),\, \Sigma\right),$$
where I(·) is the element-wise indicator of whether each entry of Z is greater than 0 and Φ⁻¹ is the inverse cumulative distribution function of the standard normal distribution. The vector p, with one entry per element of (S, U), represents the marginal probabilities of each binary outcome and is obtained by drawing independent samples from a uniform distribution with lower bound 0 and upper bound 1.
Next, we construct the variance-covariance matrix of the multivariate normal distribution to induce correlation between the proxy variables S and the unmeasured confounders U. In our study, the variances are set to 1, and therefore the covariances are simply the correlations. The correlations between each proxy variable and each unmeasured confounder are of particular interest to our study, and their use is described later in Section 3.3.1.
Our variance-covariance matrices are generated from a Wishart distribution with a diagonal scale matrix (all off-diagonal entries are 0). Each sampled symmetric, positive-definite matrix then has its variances set to 1, its positive-definiteness is checked again, and it is saved only if it is still positive definite. Finally, with the binary S and U data in hand, we can generate the binary treatment and outcome data through the true treatment and outcome models described previously.
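The latent-variable construction in this subsection can be sketched as follows, using only the Python standard library. Several choices here are assumptions for illustration and are not taken from the paper: the Wishart draw uses an identity scale matrix with a hypothetical `df` value, the variances are set to 1 by rescaling the whole matrix to a correlation matrix, and all function names are made up for this sketch.

```python
import math
import random
from statistics import NormalDist

def wishart_correlation(dim, df, rng):
    """Sample W = G'G (Wishart, identity scale) and rescale to unit variances."""
    g = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(df)]
    w = [[sum(row[i] * row[j] for row in g) for j in range(dim)] for i in range(dim)]
    return [[w[i][j] / math.sqrt(w[i][i] * w[j][j]) for j in range(dim)]
            for i in range(dim)]

def cholesky(a):
    """Lower-triangular L with L L' = a; raises ValueError if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def latent_means(probs):
    """Latent means: inverse standard normal CDF of the marginal probabilities."""
    return [NormalDist().inv_cdf(p) for p in probs]

def draw_binary_vector(mu, chol, rng):
    """One draw of the binary vector (S, U): threshold a correlated latent normal at 0."""
    e = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [mu[i] + sum(chol[i][k] * e[k] for k in range(i + 1)) for i in range(len(mu))]
    return [1 if zi > 0 else 0 for zi in z]

# Example: three proxies plus one unmeasured confounder (dimension 4).
rng = random.Random(2024)
corr = wishart_correlation(dim=4, df=10, rng=rng)  # df=10 is a hypothetical choice
chol = cholesky(corr)                              # doubles as the positive-definite check
mu = latent_means([0.2, 0.5, 0.7, 0.4])            # fixed here; Uniform(0, 1) in the text
sample = [draw_binary_vector(mu, chol, rng) for _ in range(5)]
```

Because each latent variance is 1, the probability that a latent entry exceeds 0 equals the corresponding marginal probability, which is why the inverse normal CDF is used for the means.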
2.3 |. Separation issues in simulation structures
Prior to fitting the models described later in Section 3.1, we check the plausibility of each simulation scenario using contingency tables to determine whether there are complete separation or quasi-separation issues. Complete separation occurs when a set of predictors has no overlap in predicting the outcome, that is, when the predictors perfectly predict the outcome; quasi-separation occurs when there is only a small overlap in the set of predictors when predicting the outcome.37 Separation can occur when the sample size is small and the probability that a covariate perfectly predicts the outcome is non-negligible. When many covariates are used, the data are effectively split into many small strata, which may cause a non-negligible probability of perfect prediction of the outcome within one of the strata.37,38
These types of separation are easily detected by building a contingency table of each binary covariate against the outcome and checking for cells with 0 counts. If separation occurred in any simulation replication (indicated by a 0 in the contingency tables), we deemed the scenario pathological and removed it from consideration. Because each dataset contains 10,000 observations, we could confidently attribute separation to a pathological scenario rather than to a small sample size. Failure to remove even one generated dataset with separation could lead to averages over the simulation replications that do not reflect the actual model performance. Additionally, we removed any scenarios that resulted in no variability for any of the simulated binary variables.
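A zero-cell check of this kind is simple to express in code. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and each covariate is checked against the outcome in a 2x2 table.

```python
def has_zero_cell(x, y):
    """True if the 2x2 contingency table of binary x by binary y has an empty cell."""
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for xi, yi in zip(x, y):
        counts[(xi, yi)] += 1
    return any(c == 0 for c in counts.values())

def dataset_has_separation(covariates, outcome):
    """Flag a dataset if any covariate's table against the outcome has a zero cell.

    `covariates` maps a (hypothetical) column name to a list of 0/1 values.
    """
    return any(has_zero_cell(col, outcome) for col in covariates.values())

# Toy example: `treatment` perfectly predicts the outcome, `proxy` does not.
outcome = [0, 0, 1, 1]
flag_bad = dataset_has_separation({"treatment": [0, 0, 1, 1]}, outcome)
flag_ok = dataset_has_separation({"proxy": [0, 1, 0, 1]}, outcome)
```

A covariate or outcome with no variability also produces empty cells, so the same check flags the no-variability scenarios mentioned above.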
2.4 |. A graphical depiction of the simulation structure
A graphical representation of the entire data-generation structure is shown in Figure 1A,B, with structure B indicating the effect modifier setting. As a reminder to the reader, we generate the parameters in the following manner:
The regression coefficients of the true treatment and outcome models are generated from a uniform distribution with bounds −5 and 5.
The marginal probabilities of the binary vector (S, U), which enter the multivariate mean of the latent variable through the inverse normal cumulative distribution function, are generated from a uniform distribution with bounds 0 and 1.
The variance-covariance matrix of the latent variable is generated from a Wishart distribution with a diagonal mean matrix. The variances are scaled to 1 so that the off-diagonal entries are correlations, with this process being repeated until the matrix is positive definite.
When generating random datasets based on a given scenario, that is, a given set of parameters, if any dataset results in separation, the scenario is removed from consideration. Separation here is indicated by the presence of zeros in the contingency tables of pairwise relationships between the simulated binary variables.
Each simulation structure pair, that is, the effect modifier and non-effect modifier scenarios for the same number of unmeasured confounders, shares the same randomly generated parameters shown in Figure 1A,B. These parameters are generated 10,000 times, giving us 10,000 different random simulation scenarios for each of the simulation structure pairs. From each randomly generated simulation scenario, a dataset of size 10,000 is generated according to the structure in Figure 1A or Figure 1B; this size was chosen to emulate larger observational datasets, which often have unspecified or unknown confounders. Finally, these simulation scenarios are analyzed through five different models: the true model, unobserved by the researcher, which includes the unmeasured confounder(s); a model with one proxy variable; a model with two proxy variables; a model with three proxy variables; and a model with only the treatment variable. In the following sections, we describe the model fitting process in more detail.
3 |. MODEL FITTING, OPERATING CHARACTERISTICS, AND PARAMETERS OF INTEREST
Here we describe how to assess the effectiveness of the models with and without a proxy at estimating the true treatment effect in Equation (2). We also describe quantities that will be useful in exploring when regression models that include a proxy variable improve estimation compared to models that do not.
3.1 |. Model fitting
The purpose of this study is to measure and evaluate how well a vector of proxy variables, S, can stand in for a vector of confounders, U, that we cannot observe. In order to remove the confounding effect of U, we must establish new relationships that include the proxy variables S. Formally, we define a binary outcome modeled by the treatment and the proxy variable(s) as follows:
| (3) |
In this paper, we fit three proxy models: one with a single proxy, one with two proxies, and one with all three proxies. The goal of fitting any model of the form shown in Equation (3) is to obtain an adequate estimate of the treatment effect while removing the effect of U through inclusion of the proxy variables S. We compare these to a no-proxy model that models the binary outcome without adjusting for the proxy variables:
| (4) |
Here the estimated treatment effect under Equation (4) is not adjusted for the unmeasured confounding vector U in any way. The estimated treatment effect under Equation (3) attempts to estimate the adjusted relationship in Equation (2) through the relationship between the proxy variables S and U established in the previous section.
Lastly, we examine the performance of the true model in Equation (2) in estimating the true adjusted treatment effect. We do this to see whether inclusion of S in a logistic regression model performs as well as inclusion of the correct U. However, we should note that the rationale for using proxy variables is that researchers have not collected information on U, so proxy inclusion represents a partial remedy to this situation.
3.2 |. Operating characteristics
We first acknowledge that neither the proxy nor the no-proxy model can accurately estimate the true treatment effect unless the treatment-outcome relationship is collapsible. Therefore, traditional measures like bias are not meaningful, as the proxy, no-proxy, and true confounder models are not directly comparable. Acknowledging this limitation, we attempt to make the models comparable by normalizing by the estimated standard error from the true model represented in Equation (2). This attempts to mitigate the differences in error variance among the true, no-proxy, and proxy models. This issue is specific to logistic regression; in typical linear regression, differences in model choice lead to changes in the error variance, but not necessarily in the estimated coefficient standard error. Formally, for each simulated data set, we compute:
The quantities are calculated for every simulation replication, with the standard error estimated by fitting the true outcome model in Equation (2) with all confounders U. This makes the quantities more comparable than the raw estimates to the true treatment effect in Equation (2). For each simulation replication, we compute
which essentially approximates the increase in bias from omitting proxy variables while accounting for the difference in error rates. After completing 1000 replications of a single simulation structure scenario, the average of this difference across the replications is computed. Positive values of the average difference indicate that, for a specific randomly generated scenario, including a proxy variable should, on average, improve inference on the true marginal treatment effect.
After obtaining the average difference for each randomly generated scenario, we use the following operating characteristics to determine whether proxy models perform best overall, and to identify the specific scenario types where proxy models perform differently from the overall trends. We investigate proxy performance within various scenario subsets through the mean of the average difference in that subset, its standard deviation, and the proportion of scenarios within that subset in which the average difference is positive, that is, in which proxy variables showed promise. This proportion represents the likelihood that proxy variables improve treatment discovery within various true scenario subsets.
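The three operating characteristics can be computed directly from the per-scenario average differences. The following is a minimal sketch; the function name and the input values are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not say which version is used.

```python
from statistics import mean, stdev

def summarize_subset(avg_differences):
    """Summarize per-scenario average differences within one scenario subset.

    Returns the mean, the sample standard deviation, and the proportion of
    scenarios whose average difference is positive (proxy model favored).
    """
    n = len(avg_differences)
    return {
        "mean": mean(avg_differences),
        "sd": stdev(avg_differences),
        "prop_positive": sum(d > 0 for d in avg_differences) / n,
    }

# Made-up average differences for four scenarios, for illustration only.
summary = summarize_subset([0.5, -0.1, 0.3, 0.1])
```

A subset is read as favoring the proxy model when the mean is above zero and the proportion is above 0.5.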
3.3 |. Parameters of interest
For the remainder of this section, we describe the scenario characteristics of interest for exploring proxy variable performance, and how each is computed explicitly. We create two measures for describing these characteristics. The first are effect sizes, which we define to be the strength of the relationship between any variable and the unmeasured confounders, U. The effect sizes simply quantify the degree to which any one variable is related to the unmeasured confounders through the random, true coefficients established in the previous section. The second is collapsibility, denoted C, which measures the degree of collapsibility of a given simulation scenario's randomly generated parameters in a logistic regression.
3.3.1 |. Effect sizes:
First, we define the effect of the unmeasured confounders on the treatment and the effect of the unmeasured confounders on the outcome. These come directly from the true models as
where the required marginal probability is obtained by marginalizing the joint distribution. Formally, this can be obtained directly from the multivariate latent parameterization. Here, larger values of the first effect size indicate a larger effect of the unmeasured confounders on treatment decision, while larger values of the second indicate a larger effect of the unmeasured confounders on outcome. For the effect modifier simulation scenarios, we add the effect modifier terms since, given treatment, these can be considered direct effects of U on the outcome. This term is down-weighted since these terms only modify the outcome for patients who received the treatment.
For the relationship between S and U, we define three effect sizes, one for each number of proxies included in the model, and symbolically we calculate them as follows:
which is the average magnitude of the correlations between the unmeasured confounders and the proxies included in the model, and represents the overall strength of the relationship between U and the first one, two, or three entries of S. We average here so that the three effect sizes are all on the same scale, regardless of the number of unmeasured confounders. We hypothesize that as this effect size increases, so does the relative performance of the proxy model compared to the no-proxy model. The rationale is that as the effect size increases, S becomes more highly correlated with U and can almost serve as a stand-in for U in the model in Equation (2).
3.3.2 |. Collapsibility coefficient, C
Collapsibility/Non-Collapsibility can be explained in two manners. Both Mood and Schuster et al explain the effect of non-collapsibility in logistic regression through the framework of developing the logistic regression model from an unobserved latent variable perspective.34,39 That is, instead of utilizing the link approach in generalized linear models, we build the logistic regression model directly from the logistic distribution. In this manner, the often used nested model approach to understand confounding in linear regression may be misleading because of how model variance is handled compared to logistic regression when explanatory variables are added to the models. When additional explanatory variables are added to a logistic regression model, both the explained variance and the total variance increase as the unexplained variance is set at a fixed value, while in linear regression models this causes the explained variance to increase and the unexplained variance to decrease by the same amount to keep the variance constant.
The second way to view collapsibility/non-collapsibility is through contingency tables, which relate directly to our study since we are modeling only binary variables. The effects of collapsing multi-dimensional contingency tables are well researched and understood.40,41 Within the contingency table framework, researchers have often defined differing degrees of collapsibility. Depending on how a measure of association behaves across the strata of a contingency table compared to the marginal contingency table, the measure is defined as strongly collapsible, partially collapsible, or not collapsible (non-collapsibility).40,41
Therefore, in our study, we have to take collapsibility into account when comparing across models to ensure fair and accurate comparisons. We devised two methods to account for and measure the collapsibility, or lack thereof, in our logistic regression models. First, we standardized our coefficients by the scaling, or standard error, estimate from the true model for each randomly generated data set. Second, we devised a measure, C, the collapsibility coefficient, to quantify the degree of collapsibility in our models, as defined below. With this measure, we have a better understanding of the mixing of bias and non-collapsibility in our logistic regression models.
Here we define a measure of collapsibility in these logistic regression models in an attempt to determine how collapsibility affects the proxy model's relative performance. A logistic regression model is said to be collapsible over U if the intercept and treatment effects within the strata defined by U are equal to those of a marginal logistic regression model for the outcome that ignores U. Guo and Geng define two scenarios for collapsibility with respect to logistic regression, which we attempt to leverage here. They define simple collapsibility as the case where the coefficients of a logistic regression conditional on the covariate (or confounder) equal the marginal coefficients, that is, the coefficients obtained when the covariate (confounder) is removed from the equation. Further, they define strong collapsibility as the case where the coefficients of a logistic regression remain unchanged no matter how the levels of a covariate (confounder) are pooled.40
To assess collapsibility, we create a measure C, which assesses the average distance between the treatment-outcome relationships specific to each unmeasured confounder stratum and the marginal treatment-outcome relationship. First, we need to create the true marginal logistic regression model, which we can obtain from the relationship:
All of these quantities are known from our simulation setup via Equation (1) and (2) for the latent multivariate normal distribution for . Once we obtain this, we solve for the true marginal relationship defined as . This is done by setting and . Note that these quantities by construction of depend on the relationship between and and between and , including effect modifiers. Using the true model 2, we determine the level of collapsibility via:
Higher values of C indicate that a scenario is less collapsible. A scenario with C = 0 is entirely collapsible but is unlikely to occur here, so we focus on values of C > 0.
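To make the gap between stratum-specific and marginal coefficients concrete, the sketch below averages a single binary unmeasured confounder out of a logistic outcome model and compares the resulting marginal log odds ratio with the conditional one. It is an illustration under simplifying assumptions, not the paper's definition of C: the confounder is taken to be independent of treatment (so any gap is pure non-collapsibility, not confounding), and all names and coefficient values are hypothetical.

```python
import math

def expit(x):
    """Inverse logit (antilogit)."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Log odds."""
    return math.log(p / (1.0 - p))

def marginal_log_odds_ratio(b0, b_t, b_u, p_u):
    """Treatment log odds ratio after averaging out one binary confounder.

    Assumes P(U = 1) = p_u regardless of treatment (no confounding).
    """
    def p_outcome(t):
        # Average the conditional outcome probability over U = 0 and U = 1.
        return (1.0 - p_u) * expit(b0 + b_t * t) + p_u * expit(b0 + b_t * t + b_u)
    return logit(p_outcome(1)) - logit(p_outcome(0))

conditional = 1.0  # treatment log odds ratio within each stratum of U
marginal = marginal_log_odds_ratio(b0=0.0, b_t=conditional, b_u=3.0, p_u=0.5)
gap = abs(conditional - marginal)  # nonzero even though U is not a confounder here
```

When the confounder has no effect on the outcome (`b_u = 0`), the marginal and conditional coefficients coincide, which corresponds to the fully collapsible case.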
4 |. RESULTS
In this section, we describe the results of the simulation study. In these results, we are directly comparing the models with proxy variables and the model without any proxy variables . We will mostly focus on when comparing results to , but list all comparative results in the supplement. Additionally, we include the results for the true model that is unobtainable or unobservable by the researcher, .
We first examine the overall results and then delve into the effects that the parameters of interest have on the comparative performance of the proxy and no-proxy models. Whether in tabular or visual format, we use three metrics that were described in previous sections: the expected difference, the standard deviation of the difference, and the probability that the difference is greater than zero. When the expected difference is greater than zero, the proxy model is favored, and when the probability is greater than 0.5, the proxy model is also favored.
4.1 |. Overall results
The first column, labeled "Overall," in Table 1 displays the overall results of our simulation without any conditioning on parameters of interest. The "all scenarios" rows contain results from 30,000 simulation scenarios, as they encompass three simulation structures, each of which has 10,000 randomly generated scenarios. In this column, we see that the mean difference is never negative and the probability is never below 0.5. Adding an effect modifier dampens these results somewhat, as shown by the smaller mean difference and probability compared to the non-effect modifier simulation structure pair, and it also increases the standard deviation. Within each subsection of proxy models, we see that increasing simulation structure complexity, that is, the number of unmeasured confounders, generally results in a decrease in the mean difference and the probability. Additionally, increasing the number of proxy variables in the model improves the mean difference and the probability, but at the expense of a larger standard deviation, which indicates increased variability in the comparative fit relative to the model without proxies.
TABLE 1.
Summary statistics comparing the proxy models and the true model to the model with only a treatment effect: the mean and standard deviation of the average difference, and the probability that the average difference is greater than zero. These results cover both the unconditional (overall) results and results conditioned on strongly collapsible or weakly collapsible scenarios. We defined strongly collapsible scenarios as those below a lower percentile of the collapsibility coefficient C, and weakly collapsible scenarios as those above an upper percentile of C.
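| Scenario | Overall: mean (SD) | Overall: P(difference > 0) | Strongly collapsible: mean (SD) | Strongly collapsible: P(difference > 0) | Weakly collapsible: mean (SD) | Weakly collapsible: P(difference > 0) |
|---|---|---|---|---|---|---|
| Model with unmeasured confounder(s) | | | | | | |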
| All scenarios with no effect modifiers | 8.561 (8.704) | 0.977 | 4.534 (5.812) | 0.942 | 13.780 (10.565) | 0.998 |
| All scenarios with effect modifiers | 4.018 (10.982) | 0.673 | 3.158 (9.030) | 0.659 | 5.696 (12.286) | 0.713 |
| 1 unmeasured confounder | 5.730 (7.410) | 0.963 | 1.288 (2.548) | 0.886 | 10.894 (9.529) | 0.998 |
| 3 unmeasured confounders | 9.682 (8.953) | 0.985 | 5.497 (5.878) | 0.969 | 15.398 (10.865) | 0.999 |
| 5 unmeasured confounders | 10.271 (8.946) | 0.985 | 6.817 (6.605) | 0.971 | 15.049 (10.657) | 0.998 |
| 1 unmeasured confounder + effect modifiers | 3.709 (8.749) | 0.736 | 1.860 (5.286) | 0.665 | 5.744 (9.793) | 0.826 |
| 3 unmeasured confounders + effect modifiers | 4.508 (11.811) | 0.662 | 3.398 (9.798) | 0.650 | 7.241 (13.335) | 0.710 |
| 5 unmeasured confounders + effect modifiers | 3.838 (12.059) | 0.620 | 4.217 (10.858) | 0.653 | 4.103 (13.203) | 0.605 |
| Model with one proxy | | | | | | |
| All scenarios with no effect modifiers | 0.172 (0.600) | 0.730 | 0.099 (0.486) | 0.701 | 0.262 (0.745) | 0.756 |
| All scenarios with effect modifiers | 0.121 (0.646) | 0.642 | 0.087 (0.531) | 0.625 | 0.164 (0.707) | 0.679 |
| 1 unmeasured confounder | 0.178 (0.536) | 0.891 | 0.045 (0.202) | 0.805 | 0.328 (0.817) | 0.922 |
| 3 unmeasured confounders | 0.176 (0.620) | 0.692 | 0.120 (0.547) | 0.681 | 0.253 (0.744) | 0.708 |
| 5 unmeasured confounders | 0.161 (0.639) | 0.618 | 0.132 (0.604) | 0.617 | 0.204 (0.661) | 0.638 |
| 1 unmeasured confounder + effect modifiers | 0.128 (0.590) | 0.726 | 0.052 (0.306) | 0.665 | 0.193 (0.737) | 0.796 |
| 3 unmeasured confounders + effect modifiers | 0.134 (0.657) | 0.619 | 0.107 (0.572) | 0.622 | 0.181 (0.685) | 0.659 |
| 5 unmeasured confounders + effect modifiers | 0.102 (0.686) | 0.580 | 0.103 (0.652) | 0.589 | 0.117 (0.695) | 0.582 |
| Model with two proxies | | | | | | |
| All scenarios with no effect modifiers | 0.338 (0.881) | 0.774 | 0.180 (0.658) | 0.736 | 0.536 (1.122) | 0.817 |
| All scenarios with effect modifiers | 0.229 (0.960) | 0.663 | 0.152 (0.765) | 0.639 | 0.318 (1.073) | 0.704 |
| 1 unmeasured confounder | 0.345 (0.804) | 0.921 | 0.088 (0.298) | 0.842 | 0.652 (1.247) | 0.943 |
| 3 unmeasured confounders | 0.353 (0.907) | 0.729 | 0.219 (0.748) | 0.707 | 0.524 (1.075) | 0.787 |
| 5 unmeasured confounders | 0.317 (0.928) | 0.671 | 0.233 (0.799) | 0.660 | 0.433 (1.021) | 0.720 |
| 1 unmeasured confounder + effect modifiers | 0.243 (0.870) | 0.744 | 0.109 (0.447) | 0.678 | 0.370 (1.085) | 0.811 |
| 3 unmeasured confounders + effect modifiers | 0.248 (1.005) | 0.638 | 0.167 (0.851) | 0.625 | 0.344 (1.058) | 0.689 |
| 5 unmeasured confounders + effect modifiers | 0.195 (0.999) | 0.606 | 0.178 (0.912) | 0.615 | 0.239 (1.071) | 0.612 |
| Model with three proxies | | | | | | |
| All scenarios with no effect modifiers | 0.499 (1.105) | 0.791 | 0.269 (0.791) | 0.753 | 0.795 (1.410) | 0.838 |
| All scenarios with effect modifiers | 0.333 (1.211) | 0.670 | 0.228 (0.962) | 0.642 | 0.450 (1.332) | 0.715 |
| 1 unmeasured confounder | 0.502 (1.005) | 0.925 | 0.122 (0.338) | 0.853 | 0.937 (1.515) | 0.945 |
| 3 unmeasured confounders | 0.540 (1.163) | 0.748 | 0.350 (0.900) | 0.737 | 0.810 (1.430) | 0.819 |
| 5 unmeasured confounders | 0.455 (1.139) | 0.691 | 0.335 (0.961) | 0.669 | 0.638 (1.256) | 0.751 |
| 1 unmeasured confounder + effect modifiers | 0.355 (1.112) | 0.746 | 0.157 (0.606) | 0.678 | 0.523 (1.366) | 0.812 |
| 3 unmeasured confounders + effect modifiers | 0.367 (1.294) | 0.652 | 0.256 (1.089) | 0.639 | 0.514 (1.341) | 0.707 |
| 5 unmeasured confounders + effect modifiers | 0.277 (1.220) | 0.612 | 0.272 (1.103) | 0.610 | 0.344 (1.282) | 0.626 |
The first subsection in Table 1 represents the results using the unmeasured confounder(s), or . We see that including the unmeasured confounders in our model results in the largest and . These results are to be expected as models that include the unmeasured confounder(s) more closely resemble the true data-generating process, but these models are not available to the researcher in this study. Regardless of which model is used, , we see that the proxy model estimates , the true treatment effect, better than , the model without the proxy, in the overall aggregated results.
The other two columns in Table 1 are conditioned on a parameter of interest , our measure of collapsibility. After conditioning on , the rows in these columns contain 2000 random scenarios, as the strongly collapsible and weakly collapsible subsets encompass the bottom and top percentiles, respectively. Most of the trends discussed previously are preserved when conditioning on either strongly or weakly collapsible scenarios. However, there are differences. “Strongly” collapsible scenarios have smaller comparative results using the proxy than the “overall” results, while “weakly” collapsible scenarios show the strongest comparative advantage of using a proxy variable. This is expected, since collapsibility effectively measures the difference between the marginal treatment effects and those adjusted by , with larger values indicating a larger effect of , and hence a stronger rationale for use of . In no scenario set from Table 1 does perform worse than at estimating , as is never below 0 and is never below .5.
To further investigate our simulation study, we partitioned the parameter space by the other measures described in Section 3, called effect sizes . Recall that we define a given effect size as the strength of the relationship between and another variable of interest . Larger values mean a stronger relationship between and that variable and smaller values mean a weaker relationship. In the context of our study, these effect sizes can be viewed as the confounder strength with respect to the relationship between and or (via ) and proxy strength (via ) with respect to the relationship between and . Figure 2 reveals how the estimation of is affected by increasing for each .
FIGURE 2.

Scatterplot of for each proxy model across , where is the number of proxy variables in the given model. Points of interest: Plot A at percentile for , and , respectively. Plot B at percentile for , and , respectively. From this plot, a model with any number of proxies better estimates than even at very weak effect sizes (smaller than percentile) on average.
Figure 2 displays across the entire range of for each for each simulation structure. We see that increasing , that is, the association, for all improves the estimation of across all simulation structures. Figure 2A,B are the plots whose results are the closest to , or parity between any and . However, we see that even at the smallest percentile bin, percentile, for Figure 2A,B, still better estimates . Moreover, referencing Figure S1, better estimates more than of the time across all simulation structures even at a weak proxy strength, as the median value of is above 0. Figures S2–S7 reveal similar marginal trends for both the mean and median of but with , and percentile bins.
4.2 |. Effect sizes: and collapsibility:
, offers the best, most intuitive insight into explaining and extrapolating the results of our overall study. Therefore, the remainder of the results section will focus on this model, or . Table 2 focuses on the results for by conditioning solely on , or and joint conditioning across all simulation structures. Like Table 1, the columns identified by weak and strong effect size scenarios and the strongly and weakly collapsible scenarios are the bottom and top percentiles, respectively.
TABLE 2.
A table containing the results of only , the model with only one proxy variable, for our operating characteristics , and , where in the table is is , and is . These results summarize every simulation structure scenario across either conditioning solely on , or , represented in the upper third of the table, or jointly conditioning on , or , represented in the lower two-thirds of the table. Each effect size can be conditioned by one of two different magnitudes shown as “weak” or “strong.” We define a weak effect size to be smaller than the percentile and a strong effect size to be greater than the percentile. Strongly and weakly collapsible are defined similarly with strongly collapsible being less than the percentile and weakly collapsible to be greater than the percentile across the randomly generated scenarios. This table shows a thorough investigation that reveals better estimates than across a range of effect sizes.
| | Subsetted by | | Subsetted by | | Subsetted by | |
|---|---|---|---|---|---|---|
| | Weak | Strong | Weak | Strong | Weak | Strong |
| Scenario | E (SD) [P] | E (SD) [P] | E (SD) [P] | E (SD) [P] | E (SD) [P] | E (SD) [P] |
| All scenarios with no effect modifiers | 0.024 (0.106) [0.685] | 0.447 (1.085) [0.753] | 0.111 (0.420) [0.712] | 0.215 (0.735) [0.732] | 0.119 (0.431) [0.727] | 0.212 (0.711) [0.727] |
| All scenarios with effect modifiers | 0.015 (0.115) [0.606] | 0.327 (1.205) [0.658] | 0.054 (0.444) [0.579] | 0.166 (0.772) [0.682] | 0.076 (0.486) [0.630] | 0.140 (0.675) [0.640] |
| 1 unmeasured confounder | 0.004 (0.007) [0.762] | 0.599 (1.056) [0.932] | 0.050 (0.183) [0.804] | 0.284 (0.789) [0.925] | 0.057 (0.218) [0.821] | 0.294 (0.774) [0.913] |
| 3 unmeasured confounders | 0.030 (0.096) [0.694] | 0.402 (1.115) [0.683] | 0.137 (0.442) [0.702] | 0.186 (0.714) [0.670] | 0.151 (0.468) [0.720] | 0.183 (0.692) [0.669] |
| 5 unmeasured confounders | 0.039 (0.155) [0.600] | 0.339 (1.067) [0.646] | 0.145 (0.542) [0.630] | 0.175 (0.694) [0.600] | 0.150 (0.535) [0.641] | 0.158 (0.655) [0.599] |
| 1 unmeasured confounder + effect modifiers | 0.003 (0.009) [0.654] | 0.424 (1.226) [0.740] | 0.021 (0.215) [0.591] | 0.231 (0.862) [0.835] | 0.055 (0.267) [0.691] | 0.186 (0.716) [0.737] |
| 3 unmeasured confounders + effect modifiers | 0.021 (0.101) [0.601] | 0.353 (1.189) [0.646] | 0.069 (0.457) [0.584] | 0.150 (0.735) [0.635] | 0.100 (0.515) [0.630] | 0.145 (0.667) [0.610] |
| 5 unmeasured confounders + effect modifiers | 0.019 (0.170) [0.563] | 0.205 (1.190) [0.588] | 0.070 (0.579) [0.563] | 0.118 (0.705) [0.578] | 0.071 (0.608) [0.569] | 0.090 (0.637) [0.575] |
| Strongly collapsible | | | | | | |
| All scenarios with no effect modifiers | 0.018 (0.093) [0.636] | 0.228 (0.863) [0.734] | 0.061 (0.348) [0.687] | 0.151 (0.629) [0.698] | 0.100 (0.413) [0.726] | 0.126 (0.747) [0.672] |
| All scenarios with effect modifiers | 0.011 (0.116) [0.580] | 0.193 (0.941) [0.626] | 0.048 (0.378) [0.571] | 0.119 (0.573) [0.665] | 0.066 (0.450) [0.637] | 0.083 (0.524) [0.631] |
| 1 unmeasured confounder | 0.001 (0.004) [0.619] | 0.150 (0.425) [0.885] | 0.006 (0.024) [0.699] | 0.097 (0.380) [0.869] | 0.046 (0.213) [0.797] | 0.118 (0.499) [0.818] |
| 3 unmeasured confounders | 0.025 (0.087) [0.694] | 0.284 (1.019) [0.666] | 0.083 (0.261) [0.718] | 0.161 (0.729) [0.651] | 0.124 (0.441) [0.707] | 0.092 (0.829) [0.649] |
| 5 unmeasured confounders | 0.028 (0.134) [0.597] | 0.246 (0.994) [0.655] | 0.112 (0.569) [0.640] | 0.184 (0.669) [0.613] | 0.151 (0.562) [0.646] | 0.179 (0.741) [0.625] |
| 1 unmeasured confounder + effect modifiers | 0.002 (0.008) [0.603] | 0.158 (0.628) [0.656] | 1.6e−5 (0.107) [0.553] | 0.123 (0.387) [0.792] | 0.058 (0.282) [0.699] | 0.041 (0.347) [0.664] |
| 3 unmeasured confounders + effect modifiers | 0.020 (0.083) [0.616] | 0.270 (1.008) [0.632] | 0.094 (0.461) [0.606] | 0.101 (0.634) [0.621] | 0.063 (0.403) [0.623] | 0.124 (0.680) [0.609] |
| 5 unmeasured confounders + effect modifiers | 0.011 (0.176) [0.529] | 0.143 (1.122) [0.588] | 0.056 (0.464) [0.557] | 0.135 (0.643) [0.596] | 0.076 (0.618) [0.580] | 0.085 (0.460) [0.615] |
| Weakly collapsible | | | | | | |
| All scenarios with no effect modifiers | 0.031 (0.111) [0.726] | 0.719 (1.380) [0.770] | 0.176 (0.504) [0.736] | 0.320 (0.912) [0.777] | 0.150 (0.511) [0.633] | 0.255 (0.769) [0.764] |
| All scenarios with effect modifiers | 0.021 (0.118) [0.653] | 0.476 (1.361) [0.691] | 0.059 (0.452) [0.610] | 0.212 (0.831) [0.729] | 0.084 (0.564) [0.658] | 0.171 (0.668) [0.672] |
| 1 unmeasured confounder | 0.005 (0.010) [0.837] | 1.180 (1.571) [0.930] | 0.080 (0.186) [0.842] | 0.451 (1.052) [0.939] | NA (NA) [NA] | 0.346 (0.868) [0.926] |
| 3 unmeasured confounders | 0.039 (0.095) [0.712] | 0.563 (1.340) [0.710] | 0.241 (0.622) [0.739] | 0.256 (0.877) [0.709] | 0.088 (0.263) [0.647] | 0.222 (0.732) [0.688] |
| 5 unmeasured confounders | 0.049 (0.164) [0.632] | 0.428 (1.086) [0.674] | 0.184 (0.525) [0.659] | 0.207 (0.698) [0.626] | 0.161 (0.545) [0.630] | 0.151 (0.614) [0.599] |
| 1 unmeasured confounder + effect modifiers | 0.004 (0.009) [0.696] | 0.692 (1.596) [0.817] | 0.055 (0.250) [0.661] | 0.289 (1.042) [0.882] | 0.054 (0.102) [0.881] | 0.209 (0.721) [0.766] |
| 3 unmeasured confounders + effect modifiers | 0.036 (0.103) [0.675] | 0.484 (1.251) [0.684] | 0.074 (0.474) [0.612] | 0.192 (0.705) [0.671] | 0.090 (0.332) [0.692] | 0.172 (0.656) [0.643] |
| 5 unmeasured confounders + effect modifiers | 0.002 (0.175) [0.590] | 0.283 (1.201) [0.588] | 0.047 (0.556) [0.562] | 0.140 (0.637) [0.606] | 0.089 (0.737) [0.574] | 0.121 (0.605) [0.584] |
Table 2 continues many of the trends previously established by Table 1 and Figure 2. First, we concentrate on the first 8 rows, that is, the rows that condition solely on a single effect size. At no point is below .5, nor is below zero. Simulation structures with effect modifiers have smaller and than their counterparts without effect modifiers. Further, increasing increases . As in Figure 2, increasing an effect size increases and . However, when conditioning on weak effect sizes, we see that and increase with .
The bottom two-thirds of Table 2 consist of jointly conditioning on and a given effect size , or . The collapsibility results shown in Table 1 are echoed here. Conditioning on any weak effect size and strongly collapsible scenarios results in the smallest and in the entirety of Table 2. There is a small increase in and when comparing weak effect size strength and moving from strongly collapsible to weakly collapsible. The best result in the table occurs at a weakly collapsible scenario and strong for the simplest simulation structure with unmeasured confounder and no effect modifier: and . The worst result occurs when conditioning on weak and strongly collapsible scenarios for the simplest simulation structure with unmeasured confounder and no effect modifier: and . However, even with being close to zero and close to .5 in certain conditioning scenarios, Table 2 shows that gives better estimates than at every conditioning scenario.
4.3 |. A closer look at
Figure 3, a heatmap of , and Figure 4, a heatmap of , allow for a closer examination of the trends previously discussed under across the pairs of effect sizes and . Marginally (on the top and right of each plot) each bin represents about of the randomly generated scenarios, which are defined by the quantiles of each marginal measure. The interior squares contain scenarios that are jointly within the top and right effect size percentile groups, and do not necessarily contain of the randomly generated scenarios.
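The difference between the marginal bins and the interior squares can be illustrated with a minimal sketch. It assumes two hypothetical effect-size vectors (`a` and `b`, deliberately correlated so the unevenness is easy to see; neither is taken from the study) and splits each into 10 equal-count percentile bins. The marginal bins are equal in size by construction, while the jointly defined cells are not.

```python
import random
from collections import Counter


def decile_bin(values):
    """Assign each value to one of 10 equal-count bins by rank (0 = smallest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * 10 // len(values)
    return bins


rng = random.Random(0)
n = 10_000  # number of hypothetical scenarios

# Two hypothetical effect sizes; the correlation between them is an assumption
# made only so that the joint cells are visibly unbalanced.
a = [rng.random() for _ in range(n)]
b = [0.7 * ai + 0.3 * rng.random() for ai in a]

bin_a, bin_b = decile_bin(a), decile_bin(b)
marginal = Counter(bin_a)            # scenarios per marginal bin of a
joint = Counter(zip(bin_a, bin_b))   # scenarios per interior (a, b) cell

print("marginal bin sizes:", sorted(marginal.values()))
print("smallest / largest non-empty joint cell:",
      min(joint.values()), max(joint.values()))
```

A practical consequence is that statistics in sparsely populated interior cells rest on fewer scenarios than the marginal summaries and are correspondingly noisier.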
FIGURE 3.

Heatmaps for from only the simulation scenario without effect modifiers . These heatmaps depict across the entire range of effect sizes and collapsibility, jointly (the heatmap) and marginally (the column vector to the right or row vector above). Each row or column represents one of the 10 percentile bins of each effect size. This figure shows in greater detail than any previous table or column that better estimates than .
FIGURE 4.

Heatmaps for from only the simulation scenario without effect modifiers . These heatmaps depict across the entire range of effect sizes and collapsibility, jointly (the heatmap) and marginally (the column vector to the right or row vector above). Each row or column represents one of the 10 percentile bins of each effect size. This figure shows in greater detail than any previous table or column that better estimates than . Even at the smallest effect sizes, the is above .5.
Both Figures 3 and 4 reinforce previous findings. Across the entire heatmap for both figures, is never below 0, and is never below .5. Visually, we see that increasing any effect size or jointly or marginally results in increasing and . This is more visually obvious in Figure 3 than in Figure 4, as the does not increase as smoothly. Figure 3D–F and Figure 4D–F depict the results of our collapsibility parameter both jointly and marginally with other effect sizes. At the smallest jointly conditioned bins, we see that and are above 0 and .5, respectively. For plot D, and . For plot E, and . For plot F, and . Therefore, even at very small values of , or what would be strongly collapsible, still better estimates than .
Figures S20 and S21 show the results for unmeasured confounders with effect modifiers for the statistics and under . These heatmaps reveal trends already discussed: increasing effect sizes and results in larger , and small values of result in smaller and . These heatmaps also support the previously mentioned variability in the results. Visually, the variability shows up as a weaker version of the smooth color change found in Figures 3 and 4; in Figures S20 and S21, the color change is more abrupt from bin to bin. Additionally, these supplemental heatmaps contain the first negative values of and values of below .5, favoring over . Nearly all of these values occur at small effect sizes or small values of . Further, it is not a given that a negative is paired with a below .5. Often, they do not coincide at all.
4.4 |. A closer look at
Figures S22–S25 are the heatmaps for simulation structure for both effect modifier and non-effect modifier scenarios. Figures S22 and S23 contain results for the simulation structure with unmeasured confounders for and , respectively. Figures S24 and S25 contain results for the simulation structure with unmeasured confounders with effect modifiers for and , respectively. These heatmaps reveal results and trends similar to those previously reported. That is, increasing an effect size or increases both and . This is more obvious when looking at the marginal results instead of the joint heatmap. Nearly everywhere, and at every effect size or , better estimates , as there are very few negative values of and very few below 0.5. For Figures S22 and S23, when either or favors , it is because the number of scenarios at those values is very small. What is different from previous heatmaps is that the variability in the results is more obvious. Adding an effect modifier does not guarantee that and decrease in value. That said, adding an effect modifier does seem to increase the number of below 0 and below .5 compared with the non-effect modifier scenarios. However, these instances that favor are few in number and exhibit no real trend.
Figures S26–S29 are the heatmaps for simulation structure for both effect modifier and non-effect modifier scenarios. Figures S26 and S27 contain results for the simulation structure with unmeasured confounders for and , respectively. Figures S28 and S29 contain results for the simulation structure with unmeasured confounders with effect modifiers for and , respectively. Similar to prior results, these heatmaps show that increasing an effect size or results in increasing and . However, this trend of increasing and is noisier and less obvious, especially for Figures S28 and S29, which introduce the effect modifier. Further, introducing the effect modifier results in and being below 0 and .5 more often. When is below 0, is not guaranteed to be below .5. These occurrences that favor still show very little trend, but are more often than not associated with smaller values of an effect size or .
5 |. APPLICATION
Using a de-identified dataset from a retrospective cohort study conducted by Woman’s Hospital (Baton Rouge, Louisiana), we sought to determine the true treatment effect of a specific dressing, negative pressure wound therapy (NPWT), on the risk of infection. Clinicians at Woman’s Hospital reported an increased likelihood of using NPWT if the patient had a . Since is also associated with a higher risk of infection, it is a confounder for the NPWT treatment effect.
In the results reported below, we assumed that we did not have access to the variable BMI. Instead, we used diabetes as a proxy for , as diabetes is associated with high BMI. This was to properly mirror the simulation study and relationship structure described in previous sections and in Figure 1A. The notation used in previous sections continues in this section: is proxy, is outcome, is treatment, and is the unmeasured confounder. Therefore, can be thought of as the unmeasured confounder, , that is correlated with the infection, , an outcome and NPWT, , the treatment. We examine the estimated treatment effect for in three different models: (1) with only, (2) with and , and (3) with and .
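Because the application data cannot be shared, the three-model comparison can only be sketched on simulated data. The following is a minimal illustration, not the study's code: the variable names `u` (unmeasured confounder), `p` (proxy), `x` (treatment), and `y` (outcome), all coefficients, and the hand-written Newton-Raphson fitter are assumptions chosen for a self-contained example. The proxy depends only on the confounder, mirroring the structure described above.

```python
import math
import random
from collections import Counter


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logit(rows, n_iter=25):
    """Newton-Raphson logistic regression on (covariates, y) rows.

    An intercept is added; returns [intercept, coef_1, coef_2, ...].
    """
    # All variables are binary, so the data collapse to a few weighted cells.
    cells = Counter((tuple(x), y) for x, y in rows)
    k = len(rows[0][0]) + 1
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for (x, y), cnt in cells.items():
            z = (1.0,) + x
            prob = expit(sum(bi * zi for bi, zi in zip(beta, z)))
            for i in range(k):
                grad[i] += cnt * (y - prob) * z[i]
                for j in range(k):
                    info[i][j] += cnt * prob * (1.0 - prob) * z[i] * z[j]
        beta = [bi + si for bi, si in zip(beta, solve(info, grad))]
    return beta


def simulate(n=10_000, seed=1):
    """One hypothetical scenario; every coefficient here is an assumption."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = int(rng.random() < 0.5)                    # unmeasured confounder
        p = int(rng.random() < expit(-1.5 + 3.0 * u))  # proxy: depends on u only
        x = int(rng.random() < expit(-1.0 + 2.0 * u))  # treatment: depends on u
        y = int(rng.random() < expit(-2.0 + 0.5 * x + 2.0 * u))  # outcome
        rows.append((u, p, x, y))
    return rows


data = simulate()
no_proxy = fit_logit([((x,), y) for u, p, x, y in data])[1]   # (1) treatment only
proxy = fit_logit([((x, p), y) for u, p, x, y in data])[1]    # (2) treatment + proxy
true = fit_logit([((x, u), y) for u, p, x, y in data])[1]     # (3) treatment + confounder

print(f"treatment only:        {no_proxy:.3f}")
print(f"treatment + proxy:     {proxy:.3f}")
print(f"treatment + confounder: {true:.3f}")
```

In this single hypothetical scenario the proxy-adjusted coefficient should fall between the other two; it says nothing about how often that ordering holds across scenarios, which is what the simulation study itself addresses.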
The data analyzed from the retrospective cohort study had an data points, consistent with the simulation study presented in previous sections. In the data, the participants are either African American, , or White, . The median and mean ages of participants in the study are both 29 years.
, is a known confounder for and we can see that and , representing a large shift in treatment assignment probability based on high BMI and an estimated effect size of .22. The data also shows that is correlated with , infection, because those in the class were twice as likely to get an infection, compared to and thus had an estimated effect size of .023. Finally, those in the class were correlated with diabetes status, , because and , that is, the probability of being diabetic given was twice the probability of being diabetic for patients who did not have a and thus had an estimated effect size of .
Table 3 shows the results of the three models using the application data. Since we have the unmeasured confounder, , we can run the true model, denoted . From the logistic regression of , which includes the “unmeasured” confounder and the treatment, the treatment effect estimate is 0.504 with a 95% confidence interval of [0.134, 0.857].
TABLE 3.
Table of true model, proxy model, and no proxy model estimated treatment effects and confidence intervals.
| Application results by model | | |
|---|---|---|
| Model | Treatment effect | 95% confidence interval |
| True model | 0.504 | [0.134, 0.857] |
| Proxy model | 0.794 | [0.452, 1.11] |
| No-proxy model | 0.844 | [0.507, 1.15] |
The logistic regression of , which included the proxy and the treatment, resulted in a treatment effect estimate of 0.794 with a 95% confidence interval of [0.452, 1.11]. Last, the model with only the treatment, , resulted in a treatment effect estimate of 0.844, with a 95% confidence interval of [0.507, 1.15]. Here the proxy-model estimate is closer to the confounder-adjusted estimate than the estimate from the model without the proxy, and the same holds for the confidence intervals.
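As a worked check of this comparison, the distances below use the point estimates reported in Table 3, assuming its rows follow the order given in the caption (true model, proxy model, no-proxy model) and taking the true-model estimate as the reference.

```latex
\begin{align*}
|0.794 - 0.504| &= 0.290 && \text{(proxy model vs. true model)} \\
|0.844 - 0.504| &= 0.340 && \text{(no-proxy model vs. true model)} \\
0.340 - 0.290 &= 0.050 && \text{(reduction in distance from adding the proxy)}
\end{align*}
```

Under that reading, adding the proxy moves the estimate about 0.05 closer to the confounder-adjusted value, a modest shift relative to the width of the reported confidence intervals.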
6 |. DISCUSSION
Focusing on the model and the simulation structure with unmeasured confounders without effect modifiers may best explain the results of our study. For one, we know that using the model best represents the truth in this scenario. That is, there is just one , it is standing in for exactly one , and there are no effect modifiers. In other scenarios, any number of s are standing in for any number of s, and there may also exist effect modifiers. With this in mind, we can look back at the presented results and see that even conditioning on weak effect sizes and strongly collapsible scenarios reveals that still best estimates . Jointly conditioning on both weak effect sizes and strongly collapsible scenarios, however, does reveal times where approaches very close to 0 and approaches .5. Even so, at no time is worse at estimating .
The simulation scenario of with no effect modifiers under the model should be the most readily collapsible scenario (strongly collapsible). This is supported by the fact that is closest to the null and closest to .5 when conditioning on strongly collapsible scenarios at weak effect sizes. Comparing jointly conditioned strongly collapsible and weak effect size scenarios to weakly collapsible scenarios at weak effect sizes reveals what is termed non-collapsibility in the literature. This non-collapsibility is the small difference in between weakly collapsible scenarios and strongly collapsible scenarios at weak effect sizes, and it can often look like bias. Therefore, collapsible scenarios should see the two models, and , move closer together in their power to estimate , which is exactly what we see in our results. Further, non-collapsible scenarios (weakly collapsible) would introduce an effect often misinterpreted as bias in favor of any model with a proxy, also a result supported by our study. However, this difference between the collapsible and non-collapsible scenarios according to our parameter, , is quite small, revealing that collapsibility does have an effect in our scenarios but that, at the simplest simulation structure, the effect is quite small. Therefore, non-collapsibility cannot account for all of the advantage or difference in results with respect to or any proxy model, . Lastly, moving beyond in the collapsibility discussion, we see that the results for or 5 move further from the null. This is because these scenarios are difficult to collapse, especially when adding in effect modifiers.
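Non-collapsibility of the odds ratio, and why it can look like bias, can be seen in a short numerical sketch. The example assumes a hypothetical binary covariate `u` that affects the outcome but is independent of treatment, so it is not a confounder; the coefficient values are illustrative and not taken from the study. Even so, the marginal log odds ratio for treatment differs from the conditional one once `u` has a nonzero effect on the outcome.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def odds(prob):
    return prob / (1.0 - prob)


def marginal_log_or(b0, b_x, b_u, p_u=0.5):
    """Treatment log odds ratio after averaging outcome risk over u.

    u is independent of treatment here, so it is NOT a confounder;
    any gap between this value and b_x is pure non-collapsibility.
    """
    risk = {}
    for x in (0, 1):
        risk[x] = sum(
            weight * expit(b0 + b_x * x + b_u * u)
            for u, weight in ((0, 1.0 - p_u), (1, p_u))
        )
    return math.log(odds(risk[1]) / odds(risk[0]))


conditional = 1.0  # hypothetical conditional (u-adjusted) log odds ratio
for b_u in (0.0, 1.0, 3.0):
    marginal = marginal_log_or(b0=-1.0, b_x=conditional, b_u=b_u)
    print(f"effect of u on outcome = {b_u:.1f}: "
          f"conditional = {conditional:.3f}, marginal = {marginal:.3f}")
```

The gap grows with the strength of the covariate's effect on the outcome, which is consistent with the text's point that the adjusted and unadjusted models move closer together in strongly collapsible scenarios.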
The effect sizes , and are also important to the performance of the proxy variable. Our results reveal that the stronger these relationships are, the more easily the proxy variable can stand in for the unmeasured confounder, which makes intuitive sense. This is supported by the fact that at weak effect sizes, is closest to the null and closest to .5. However, even at these small effect sizes is still better at estimating . Therefore, our results show that even a weak proxy standing in for a weak confounder is enough to improve the estimation of .
From our results, it is hard to say conclusively beyond which effect size is most important. Given that confounding is present, our results do show that prioritizing the correlation between and results in the greatest improvement in estimating . Beyond prioritizing , the answer is much less clear. Our results show that the difference between the strength of and is negligible when focused on proxy variable performance. It is obvious that having a sufficient relationship between and is important to ensure that confounding exists. However, this confounding relationship between and need not be overly strong; it can in fact be quite weak and still result in better estimation of under .
Moving beyond without effect modifiers to or with and without effect modifiers reveals trends and results similar to those previously discussed. That is, the sensitivities to effect size strength and collapsibility still exist in these other simulation scenarios. However, these more complex simulation scenarios reveal an increase in variability that is shown by the larger . This variability can be explained by the fact that the proxy variable now has to stand in for more unmeasured confounders than before, as well as for causal mechanisms with which it is not strongly correlated. As we increase the count of unmeasured confounders, there is a larger chance that is related to any . Conversely, there is a chance that the proxy variable is either only weakly related to any , or only related to a subset of ’s. Further, the effect modifier is a causal mechanism that may not be strongly correlated with. This mechanism is difficult for the proxy variable to sufficiently stand in for in a model. The increased count of unmeasured confounders and the possibility of effect modifiers impart variability to our results and drive the larger standard deviations observed in these simulation scenarios. However, even in these more complex simulation scenarios, better estimates on average and in the long run, as supported by and .
Increasing the number of proxy variables, , available to the model, , results in better estimation of . The simple explanation here is that having variables that are correlated with the unmeasured confounder(s) means a greater chance that one or more of the proxy variables is an adequate stand-in. The improvement is not linear from one proxy to two proxies to three. The results show nearly a doubling between one and two proxy variables but a tapering off at three, showing that there is an attenuation in the improvement as more and more proxy variables are added. However, there are drawbacks to adding more proxy variables. Adding more, according to our results, increases the variability in the comparative results. This means there may be times when the proxy variables are not suitable stand-ins for the unmeasured confounder(s) and, in fact, make the estimation of worse.
The real-world application of proxy variables requires careful consideration and domain expertise to ensure that the bias-reducing effects are captured. As outlined in our study, one could utilize a proxy variable after data collection, when control of confounding was incomplete, poor, or not done, or when new confounders are discovered. Unmeasured confounders are often noticed when our expectations of treatment effects are subverted. This is seen in our application. The clinicians’ expectations of the treatment effect differed from the initial results. This subversion was caused by an unmeasured confounder. When this is noticed, a search can begin for a proxy that is a suitable stand-in for the unmeasured confounder. The hard part is determining what could be confounding the treatment and outcome and then finding a suitable substitute for the unmeasured confounder.
Our results definitively show that a single proxy variable, or even up to 3 proxy variables, in a simple logistic regression improves the estimation of a true adjusted treatment effect, . Table 1 shows from an overall view that any better estimates than . These results stand up to further scrutiny when various conditioning procedures are applied, as shown in Table 2, Figures 2–4, as well as their analogs in the supplemental section. The times that any is worse than at estimating are few and far between. In the results reviewed in the main manuscript, no aggregated result shows better estimating , and in the Supplemental material, is only better at estimating at extremely weak effect sizes, which also may be an artifact of the small number of scenarios in these cases. These occurrences of better estimation do not happen often, have no real trend other than being at the extremes of effect sizes, and are not as large in magnitude compared to how much better estimates at large effect sizes. Lastly, we showed that even when a simulation is nearly collapsible (Tables 1 and 2, Figures 3D–F and 4D–F, as well as numerous Supplemental Figures), any is still the preferred model in the aggregate. These results hold over increasingly complex simulation scenarios and with the addition of effect modifiers.
7 |. CONCLUSION
In our study, backed by extensive simulation studies, we found that inclusion of proxy variables in logistic regression models can improve (reduce bias in) the estimated treatment effects in retrospective studies when correlated with unmeasured confounders. Using one, two, or three proxy variables improves the estimation of the treatment effect compared to using no proxy variables. Increasing the number of proxy variables in the model improved bias linearly, which makes sense intuitively because three proxy variables should have a higher probability of being correlated with the unmeasured confounder. We also constructed three measures that we called effect sizes that summarize the strength of the relationship between the variables of interest in our model. These effect sizes measured the strength of the collapsibility of the model and the relationships between (1) proxy variables and unmeasured confounders, (2) unmeasured confounders and the treatment, and (3) unmeasured confounders and the outcome. We found that even at very small values of these effect sizes (essentially weak relationships), proxy variables can provide an improved estimation of the true treatment effect when compared to a model without the proxy variable. This means if we know that the proxy variable has a relationship with the treatment variables in our model, we should be able to use them in such a manner to de-bias the estimated treatment effect. Finally, our simulation studies included treatment effect modifiers and different numbers of unmeasured confounders. The results described above hold in these settings, but the results have more variance due to the increasing complexity of the constructed simulation studies. In summary, we found that inclusion of proxy variables in logistic regression for retrospective studies, within the constraints of our simulation study, can improve the estimated treatment effect toward the true treatment effect.
ACKNOWLEDGMENTS
This research was done using resources provided by the Open Science Grid, which is supported by the National Science Foundation award #2030508.42,43
Footnotes
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.
SUPPORTING INFORMATION
Additional supporting information can be found online in the Supporting Information section at the end of this article.
DATA AVAILABILITY STATEMENT
The dataset used in the application is not available for data sharing.
REFERENCES
- 1. Drake C, McQuarrie A. A note on the bias due to omitted confounders. Biometrika. 1995;82(3):633–641.
- 2. Pearl J. Why There Is No Statistical Test for Confounding, Why Many Think There Is, and Why They Are Almost Right. Tech. rep. University of California; 1998.
- 3. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. 3rd ed. LWW; 2012.
- 4. VanderWeele TJ, Shpitser I. On the definition of a confounder. Ann Statist. 2013;41:196–220.
- 5. Benedetto U, Head SJ, Angelini GD, Blackstone EH. Statistical primer: propensity score matching and its alternatives. Eur J Cardiothorac Surg. 2018;53:1112–1117.
- 6. Greenland S, Robins JM, Pearl J. Confounding and collapsibility in causal inference. Statist Sci. 1999;14(1):29–46.
- 7. Haneuse S, VanderWeele TJ, Arterburn D. Using the E-Value to Assess the Potential Effect of Unmeasured Confounding in Observational Studies. Feb. 2019.
- 8. Haukoos JS, Lewis RJ. The propensity score. JAMA. 2015;314:1637–1638.
- 9. Hernán MA, Robins JM. Instruments for causal inference: an epidemiologist’s dream? Epidemiology. 2006;17:360–372.
- 10. Maciejewski ML, Brookhart MA. Using instrumental variables to address bias from unobserved confounders. JAMA. 2019;321(21):2124–2125.
- 11. Rothman KJ. Epidemiology: An Introduction. Oxford University Press; 2002.
- 12. Schneeweiss S. Sensitivity analysis and external adjustment for unmeasured confounders in epidemiologic database studies of therapeutics. Pharmacoepidemiol Drug Saf. 2006;15:291–303.
- 13. Stürmer T, Wyss R, Glynn RJ, Brookhart MA. Propensity scores for confounder adjustment when assessing the effects of medical interventions using nonexperimental study designs. J Intern Med. 2014;275:570–580.
- 14. Nørgaard M, Ehrenstein V, Vandenbroucke JP. Confounding in observational studies based on large health care databases: problems and potential solutions – a primer for the clinician. Clin Epidemiol. 2017;9:185–193.
- 15. Bosco JL, Silliman RA, Thwin SS, et al. A most stubborn bias: no adjustment method fully resolves confounding by indication in observational studies. J Clin Epidemiol. 2010;63:64–74.
- 16. Kahlert J, Gribsholt SB, Gammelager H, Dekkers OM, Luta G. Control of confounding in the analysis phase – an overview for clinicians. Clin Epidemiol. 2017;9:195–204.
- 17. Wickens MR. A note on the use of proxy variables. Econometrica. 1972;40:759.
- 18. Chiba Y. Sensitivity analysis of unmeasured confounding for the causal risk ratio by applying marginal structural models. Commun Stat–Theory Methods. 2010;39:65–76.
- 19. Kasza J, Wolfe R, Schuster T. Assessing the impact of unmeasured confounding for binary outcomes using confounding functions. Int J Epidemiol. 2017;46:1303–1311.
- 20. Gravlee CC. How race becomes biology: embodiment of social inequality. Am J Phys Anthropol. 2009;139:47–57.
- 21. Howell EA. Reducing disparities in severe maternal morbidity and mortality. Clin Obstet Gynecol. 2018;61:387–399.
- 22. Jackson JS, Knight KM, Rafferty JA. Race and unhealthy behaviors: chronic stress, the HPA axis, and physical and mental health disparities over the life course. Am J Public Health. 2010;100:933–939.
- 23. Geronimus AT, Bound J. Use of census-based aggregate variables to proxy for socioeconomic group: evidence from national samples. Am J Epidemiol. 1998;148:475–486.
- 24. Hyland A, Cummings KM, Lynn WR, Corle D, Giffen CA. Effect of proxy-reported smoking status on population estimates of smoking prevalence. Am J Epidemiol. 1997;145:746–751.
- 25. McDonald JF. The use of proxy variables in housing price analysis. J Urban Econ. 1980;7:75–83.
- 26. Montgomery MR, Gragnolati M, Burke KA, Paredes E. Measuring living standards with proxy variables. Demography. 2000;37:155–174.
- 27. Root M. The use of race in medicine as a proxy for genetic differences. Philos Sci. 2003;70:1173–1183.
- 28. Kim Y, Steiner PM. Gain scores revisited: a graphical models perspective. Sociol Methods Res. 2019;50(3):1353–1375.
- 29. Steiner PM, Kim Y. The mechanics of omitted variable bias: bias amplification and cancellation of offsetting biases. J Causal Inference. 2016;4(2):1–22.
- 30. Miao W, Geng Z, Tchetgen EJ. Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika. 2018;105:987–993.
- 31. Lubotsky D, Wittenberg M. Interpretation of Regressions with Multiple Proxies p. 29.
- 32. Kuroki M. Measurement Bias and Effect Restoration in Causal Inference p. 25.
- 33. Fewell Z, Davey Smith G, Sterne JAC. The impact of residual and unmeasured confounding in epidemiologic studies: a simulation study. Am J Epidemiol. 2007;166:646–655.
- 34. Schuster NA, Twisk JWR, ter Riet G, Heymans MW, Rijnhart JJM. Noncollapsibility and its role in quantifying confounding bias in logistic regression. BMC Med Res Methodol. 2021;21:136.
- 35. Robinson LD, Jewell NP. Some surprising results about covariate adjustment in logistic regression models. Int Stat Rev/Revue Internationale de Stat. 1991;59:227.
- 36. Pearl J. Causality. Cambridge University Press; 2009.
- 37. Albert A, Anderson JA. On the existence of maximum likelihood estimates in logistic regression models. Biometrika. 1984;71(1):1–10.
- 38. Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol. 1996;49:1373–1379.
- 39. Mood C. Logistic regression: why we cannot do what we think we can do, and what we can do about it. Eur Sociol Rev. 2010;26:67–82.
- 40. Guo J, Geng Z. Collapsibility of logistic regression coefficients. J R Stat Soc B Methodol. 1995;57:263–267.
- 41. Cantis S, Oliveri AM. An overview of collapsibility. In: Banks D, McMorris FR, Arabie P, Gaul W, eds. Classification, Clustering, and Data Mining Applications. Springer Berlin Heidelberg; 2004:587–596.
- 42. Pordes R, Petravick D, Kramer B, et al. The open science grid. J Phys Conf Ser. 2007;78:012057.
- 43. Sfiligoi I, Bradley DC, Holzman B, Mhashilkar P, Padhi S, Würthwein F. The Pilot Way to Grid Resources Using glideinWMS. IEEE Computer Society; 2009:428–432.
