Author manuscript; available in PMC: 2024 Nov 1.
Published in final edited form as: Pharm Stat. 2023 Jul 16;22(6):995–1015. doi: 10.1002/pst.2323

Inclusion of binary proxy variables in logistic regression improves treatment effect estimation in observational studies in the presence of binary unmeasured confounding variables

Cornelius Rosenbaum 1, Qingzhao Yu 1, Sarah Buzhardt 2, Elizabeth Sutton 3, Andrew G Chapple 4
PMCID: PMC11345871  NIHMSID: NIHMS2014961  PMID: 37986712

Abstract

We present a simulation study and application that show that inclusion of binary proxy variables related to binary unmeasured confounders improves the estimate of a related treatment effect in binary logistic regression. The simulation study included 60,000 randomly generated parameter scenarios of sample size 10,000 across six different simulation structures. We assessed bias by comparing the probability of finding the expected treatment effect relative to the modeled treatment effect with and without the proxy variable. Inclusion of a proxy variable in the logistic regression model significantly reduced the bias of the treatment or exposure effect when compared to logistic regression without the proxy variable. Including proxy variables in the logistic regression model improves the estimation of the treatment effect at weak, moderate, and strong association with unmeasured confounders and the outcome, treatment, or proxy variables. Comparative advantages held for weakly and strongly collapsible situations, as the number of unmeasured confounders increased, and as the number of proxy variables adjusted for increased.

Keywords: proxy variables, confounding adjustment, logistic regression

1 |. INTRODUCTION

The presence of unmeasured confounders (U), variables outside the range of examination that causally link treatment (T) and outcome (Y), is common in observational data analysis. Without adjustment for unmeasured confounders in a regression of an outcome on a treatment, the estimated causal effect of T on Y is often biased. Adequate adjustment for confounding variables is critical to producing robust, unbiased causal effect estimates in observational study designs.1–4 Experimental or trial designs address bias due to confounders through random allocation of subjects to exposure conditions. A series of methods have been proposed to approximate the strength of evidence from this process in observational designs.5–13

With the advent of Big Data, large-scale epidemiological studies can more easily occur due to electronic health care records. With the inclusion of larger cohorts and more variables, confounding is an obvious issue.14–16 Clinicians may plan ahead to adjust for certain confounders, but the collected data may still omit several confounding variables, which therefore remain unmeasured. Proxy variables, denoted S in our study, are correlated with the unmeasured confounding variables U but not directly with the treatment T or outcome Y. A proxy variable that is associated with U is a surrogate for this unmeasured confounder, so inclusion of S in a logistic regression model when U is not available conceivably would remove some confounding bias.

Wickens showed that under certain conditions asymptotic estimation of the treatment/exposure effect of T on Y improves by including a continuous proxy variable S for a continuous U in the least squares regression model for predicting a normally distributed outcome Y.17 In modern large-scale observational studies, U, S, and Y are likely binary variables, resulting in interpretations dependent upon causal risk ratios.18,19 For example, consider an observational study of how anti-depressants affect suicide risk among veterans with post-traumatic stress disorder. Gravlee et al. modeled the complex interplay between racial status and measured levels of biological stress.20 High stress, which may be unobservable, might increase the probability that a person would seek out anti-depressants, but may also increase the probability that a person commits suicide. Race may be a useful proxy variable in this case because many researchers have shown that African Americans harbor higher stress levels than other Americans.21,22 By adjusting for race, which may be associated with the unmeasured confounder high stress, the estimate of the treatment effect for anti-depressants might be more accurate than without adjustment.

While there has been plenty of academic work on the use of proxy variables, owing to their importance in econometrics, social sciences, finance, epidemiology, and medical sciences, very little work has been done to explore the role of proxy variables in logistic regression with binary explanatory variables.23–27 Within the causal inference field, proxy variables have also been discussed and researched.28–32 However, these papers do not discuss logistic regression and do not discuss binary explanatory variables. Many of these authors present a causal diagram similar to the one we present in Figure 1A. With this diagram, they do discuss the benefits and limits of the proxy variable. However, these discussions are limited to linear relationships and continuous variables. Kim and Steiner find results similar to those in the paper by Wickens. Both find that, asymptotically, there is always a benefit to using the proxy.17 In contrast to Wickens, Kim and Steiner discuss more theoretical results that show the limits of the usefulness of the proxy and potential pitfalls based on parameter relationships within their causal diagram.17,28

FIGURE 1.


Simulation structure(s) depicting true relationships between U, S, T, and Y, where (A) represents the structure without effect modifiers and (B) represents the structure with effect modifiers (represented by the dotted line). In both simulation structures, U and S are generated from a latent continuous variable Z drawn from the multivariate normal distribution parameterized by a vector of means, M, and a symmetric matrix of variances and covariances, Σ. This continuous variable Z is dichotomized according to the indicator variable I[Z>0] to reveal the binary variables U and S that are of importance. T is dependent only on U and is generated from the Bernoulli distribution with probability that is calculated from the anti-logit defined below the causal diagram in (A) and (B) and in Section 2.2. Y is dependent upon U and T and is generated from the Bernoulli distribution with probability determined from the logit model specific to the chosen simulation structure.

In our study, the treatment (T), outcome (Y), unmeasured confounders (U), and associated proxy variables (S) are all binary, and we explore the logistic regression setting. Compared with linear regression models, adequate adjustment for confounding through nested models is more difficult in this setting.33–35 This difficulty stems from the fact that nested linear regression models are collapsible, whereas logistic regression models are non-collapsible.34,35 In logistic regression models, non-collapsibility is often mistaken for bias.6,34 Collapsibility and the steps taken to adjust for this phenomenon in our study are more fully explained in Section 3.3.2.

The effectiveness of utilizing binary proxy variables in place of binary unmeasured confounder(s) in a logistic regression setting has yet to be explored. We seek to answer this question through an extensive simulation study, with thousands of randomly generated simulation scenarios and 1000 simulation replications within each scenario. We randomly generated scenarios, instead of displaying results for only a few, to decide whether proxy inclusion is a good idea in general and to avoid cherry-picking scenarios that showed a benefit of proxy variables.

In this paper, we examine the effect of adjusting for 1, 2, and 3 proxy variables on the estimation of the true treatment effect. For each of six scenario structures, defined by the number of unmeasured confounders (1, 3, or 5) and by whether or not the unmeasured confounder is also an effect modifier of the T–Y relationship, 10,000 randomly generated scenarios were constructed. For each randomly generated scenario, 1000 datasets are generated and models with and without proxy variables are fit. We average operating characteristics over the dataset replications for each model fit, and compare the average model performance across the 10,000 randomly generated scenarios.

The rest of this paper is outlined as follows. In Section 2, we describe the simulation study setup under various scenario types, and how we randomly generated these parameters. We also list the formal proxy and marginal treatment effect models that we compare. In Section 3, we describe the operating characteristics used to assess whether proxy variables should be used, and the parameters of interest whose effects on this performance we explore. Section 4 describes the simulation study results, and in Section 5 we include an application from Woman’s Hospital of Baton Rouge, Louisiana. This application investigates infection risk in pregnant women who received a post-operative wound dressing called negative pressure wound therapy (NPWT) or a standard dressing type. Here high body mass index (>40 kg/m²) was a known binary confounder, with diabetic status known in the literature as a proxy variable for this confounder. We close the paper with a discussion in Section 6.

2 |. SIMULATION STUDY SETUP

To determine when proxy variables are useful to include in multivariable logistic regression models, we first define the true data generation process. Recall that Y is a binary outcome variable, T is a binary treatment indicator, U is a length-$G$ binary vector of unmeasured confounders where $G \in \{1,3,5\}$, and S is a length-3 vector of binary proxy variables associated with U.

Our simulation study includes six different simulation scenario structures, and within each simulation structure 10,000 different scenarios are randomly generated to determine whether the inclusion of S results in better treatment effect estimation in general. Three of the structures are determined by the number of unmeasured confounders in the scenario, obtained by setting G to 1, 3, or 5. We double these three structures to six by additionally requiring the unmeasured confounder(s) to be effect modifiers. The true models assumed and the way parameters are generated are described in the following subsections.

2.1 |. Treatment and outcome models

In order for U to be a vector of unmeasured confounders, U must be related to both T and Y, as depicted in Figure 1A,B. Further, we assume that the vector of proxy variables S is unrelated to (Y, T) conditional on U. From these facts, we get the following true logistic regression formulations:

$$\operatorname{logit}\left(P[T=1 \mid U, \theta_0^T, \theta_U^T]\right) = \theta_0^T + \sum_{g=1}^{G} U_g\,\theta_{U_g}^T \tag{1}$$

$$\operatorname{logit}\left(P[Y=1 \mid U, T, \theta_0^Y, \theta_U^Y, \theta_{U,T}^Y, \theta_T^Y]\right) = \theta_0^Y + T\,\theta_T^Y + \sum_{g=1}^{G} U_g\,\theta_{U_g}^Y + T \sum_{g=1}^{G} U_g\,\theta_{U_g,T}^Y \tag{2}$$

In Equation (2), when all entries of $\theta_{U,T}^Y$ are nonzero, the unmeasured confounding vector U modifies the effect of the binary treatment T on Y, as shown in Figure 1B. That is, U can be both an unmeasured confounder and an effect modifier. We will investigate settings with $G \in \{1,3,5\}$ unmeasured confounders with and without effect modifiers.3,36 Note that Equations (1) and (2) imply that, conditional on U, S is independent of both treatment T and outcome Y, which is the definition of a set of proxy variables.

To examine the benefits of using proxy variables, S, under a myriad of different scenarios, we generated 10,000 random sets of the parameters $\theta_0^T, \theta_U^T, \theta_0^Y, \theta_U^Y, \theta_{U,T}^Y, \theta_T^Y$, which quantify the relationships among the binary variables (U, T, Y), for the six different simulation structures described and depicted in Figure 1A,B. These randomly generated parameters are drawn from a uniform distribution with lower and upper bounds of −5 and +5.

For each replicated dataset, data for the N=10,000 patients are generated sequentially by first generating U, which will be discussed in the next section. Then we draw T from a Bernoulli distribution with probability equal to the antilogit of the right-hand side of Equation (1), conditional on U. Finally, we draw Y from a Bernoulli distribution with probability equal to the antilogit of the right-hand side of Equation (2), conditional on both U and T.
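The sequential draw of T and Y described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' code: the function and argument names (`expit`, `draw_treatment_outcome`, `th0_T`, and so on) are hypothetical, and the placeholder U used here is drawn independently rather than through the latent construction of Section 2.2.

```python
import numpy as np

def expit(x):
    """Antilogit (inverse logit)."""
    return 1.0 / (1.0 + np.exp(-x))

def draw_treatment_outcome(U, th0_T, thU_T, th0_Y, thT_Y, thU_Y, thUT_Y, rng):
    """Draw binary T and Y given an N x G binary matrix U.

    thU_T, thU_Y, thUT_Y are length-G coefficient vectors; passing zeros
    for thUT_Y gives the structure without effect modifiers.
    """
    U = np.asarray(U, dtype=float)
    p_T = expit(th0_T + U @ thU_T)                               # Equation (1)
    T = rng.binomial(1, p_T)
    lin_Y = th0_Y + T * thT_Y + U @ thU_Y + T * (U @ thUT_Y)     # Equation (2)
    Y = rng.binomial(1, expit(lin_Y))
    return T, Y

rng = np.random.default_rng(0)
G, N = 3, 10_000
U = rng.integers(0, 2, size=(N, G))           # placeholder U (assumption)
theta = rng.uniform(-5, 5, size=3 * G + 3)    # one randomly generated scenario
T, Y = draw_treatment_outcome(
    U,
    th0_T=theta[0], thU_T=theta[1:1 + G],
    th0_Y=theta[1 + G], thT_Y=theta[2 + G],
    thU_Y=theta[3 + G:3 + 2 * G], thUT_Y=theta[3 + 2 * G:],
    rng=rng,
)
```

Each scenario needs 3G+3 coefficients in the effect modifier structures (two intercepts, the treatment effect, and three length-G vectors), which is why `theta` has that length here.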

2.2 |. Relationship between S and U

The multivariate normal distribution was leveraged to generate binary, correlated (U,S). In our study, we randomly generated (U,S) from a latent continuous vector Z in the following manner:

$$(S,U) = I[Z>0] \quad \text{with} \quad Z \sim \mathrm{MVN}_{G+3}\left(\Phi^{-1}(\Pi), \Sigma\right),$$

where $I[Z>0]$ is the element-wise indicator of whether each entry of Z is greater than 0 and $\Phi^{-1}(\cdot)$ is the inverse cumulative distribution function of the standard normal distribution. Π is a length-$(G+3)$ vector that represents the marginal probabilities of each binary variable, obtained by drawing independent samples from a uniform distribution with lower bound 0 and upper bound 1.
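The latent-variable construction can be illustrated with a short sketch. It assumes NumPy and the standard-library `statistics.NormalDist` for $\Phi^{-1}$; the function name, the marginal probabilities in `Pi`, and the exchangeable correlation matrix are hypothetical values chosen only for demonstration.

```python
import numpy as np
from statistics import NormalDist

def draw_proxies_and_confounders(Pi, Sigma, n, rng):
    """Return an n x len(Pi) binary matrix I[Z > 0], Z ~ MVN(Phi^{-1}(Pi), Sigma).

    With unit variances in Sigma, P[Z_j > 0] = Phi(mean_j) = Pi_j, so column j
    has marginal probability Pi_j.
    """
    mean = np.array([NormalDist().inv_cdf(p) for p in Pi])
    Z = rng.multivariate_normal(mean, Sigma, size=n)
    return (Z > 0).astype(int)

rng = np.random.default_rng(1)
G = 1
Pi = np.array([0.3, 0.5, 0.7, 0.2])     # hypothetical marginals for (S1, S2, S3, U1)
Sigma = np.full((G + 3, G + 3), 0.4)    # hypothetical correlations, unit variances
np.fill_diagonal(Sigma, 1.0)

SU = draw_proxies_and_confounders(Pi, Sigma, 10_000, rng)
S, U = SU[:, :3], SU[:, 3:]             # ordering (S, U) as in the display above
```

Note that the correlation between the binary variables is weaker than the latent correlation in Σ, because dichotomizing Z discards information.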

Next, we construct the variance–covariance matrix of the multivariate normal distribution, Σ, to induce correlation between the proxy variables S and the unmeasured confounders U. In our study, the variances are set to 1, so the covariances are simply the correlations. The specific correlations between any U and S, denoted $\Sigma_{U_g,S_k}$, where $g \in \{1,\dots,G\}$ and $k \in \{1,2,3\}$, are of interest to our study, and their usefulness is described later, in Section 3.3.1.

Our variance–covariance matrices are generated from a Wishart distribution, $\Sigma \sim W(G+3, V)$, where V is a $(G+3) \times (G+3)$ scale matrix whose diagonal entries equal $1/(G+3)$ and whose off-diagonal entries are 0. Each symmetric, positive-definite matrix sampled from the Wishart distribution then has its variances set to 1, its positive-definiteness is checked again, and it is saved as Σ only if it is still positive definite. Finally, with binary U data in hand, we can generate binary T and Y data through the true treatment and outcome models described previously.
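One way to realize this step is sketched below. It is an illustration under stated assumptions rather than the authors' procedure: the Wishart draw is built as a sum of outer products of normal vectors (valid for integer degrees of freedom), and "setting the variances to 1" is interpreted as rescaling the draw to a correlation matrix. The source does not specify the rescaling, and the function name and `max_tries` argument are hypothetical.

```python
import numpy as np

def random_correlation_matrix(G, rng, max_tries=100):
    """Draw W ~ Wishart(df = G+3, V = I/(G+3)) and rescale it to unit variances."""
    p = G + 3
    V = np.eye(p) / p                       # scale matrix with diagonal 1/(G+3)
    L = np.linalg.cholesky(V)
    for _ in range(max_tries):
        X = rng.standard_normal((p, p)) @ L.T   # p rows, each ~ N(0, V)
        W = X.T @ X                             # Wishart draw with p degrees of freedom
        d = np.sqrt(np.diag(W))
        Sigma = W / np.outer(d, d)              # assumed rescaling to correlations
        np.fill_diagonal(Sigma, 1.0)
        if np.all(np.linalg.eigvalsh(Sigma) > 1e-10):   # positive-definite check
            return Sigma
    raise RuntimeError("no positive-definite matrix found")

Sigma = random_correlation_matrix(G=3, rng=np.random.default_rng(2))
```

Under this particular rescaling the result stays positive definite whenever the Wishart draw is, so the retry loop mainly guards against numerically near-singular draws.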

2.3 |. Separation issues in simulation structures

Prior to fitting the models described in Section 3.1, we check the plausibility of each simulation scenario through the use of contingency tables to determine whether there are complete separation or quasi-separation issues. Complete separation occurs when a set of predictors has no overlap in predicting the outcome, and quasi-separation occurs when there is only a small overlap in the set of predictors when predicting the outcome.37 Separation can occur when the dataset is small and the probability of having a covariate perfectly predict the outcome is non-negligible. When using many covariates, the data are effectively chopped into many small pieces per covariate, which may cause a non-negligible probability of perfect prediction of the outcome for one of the pieces.37,38

These types of separation are easily seen by building a contingency table of each binary covariate (T,U,S) against the outcome Y and checking for cells with 0 counts. If separation occurred for any simulation replication (indicated by a 0 in the contingency tables), we deemed the scenario to be pathological and removed it from consideration. Because each dataset contains N=10,000 patients, we could confidently attribute separation to a pathological scenario rather than to a small sample size. Failure to remove even one generated dataset with separation could lead to averages over the simulation replications that do not reflect the actual model performance. Additionally, we removed any scenarios that resulted in no variability for any of the entries of (Y,T,U,S).
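The zero-cell check can be sketched as follows. This is a minimal illustration assuming NumPy; the function names are hypothetical, and the covariate matrix is assumed to hold the binary columns of T, U, and S.

```python
import numpy as np

def has_zero_cell(x, y):
    """True if the 2 x 2 contingency table of binary x and y has an empty cell."""
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    table = np.zeros((2, 2), dtype=int)
    np.add.at(table, (x, y), 1)        # count each (x, y) combination
    return bool((table == 0).any())

def dataset_has_separation(covariates, y):
    """covariates: N x p binary matrix; flags any column with a zero cell against y."""
    covariates = np.asarray(covariates, dtype=int)
    return any(has_zero_cell(covariates[:, j], y)
               for j in range(covariates.shape[1]))
```

A column with no variability also produces an empty cell, so the same check covers the no-variability exclusion for the covariates and the outcome.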

2.4 |. A graphical depiction of the simulation structure

A graphical representation of the entire data-generation structure is shown in Figure 1A,B, with structure B indicating the effect modifier setting. As a reminder to the reader, we generate the parameters in the following manner:

  1. $\theta_0^T, \theta_U^T, \theta_0^Y, \theta_U^Y, \theta_{U,T}^Y, \theta_T^Y$ are generated from a uniform distribution with bounds −5 and 5.

  2. Π used in the multivariate mean for the latent relationship for (U,S) via Z is generated from a uniform distribution with the bounds 0 and 1. These are the marginal probabilities of the binary vector (U,S).

  3. Σ is generated from a Wishart distribution with G+3 degrees of freedom and a diagonal scale matrix with entries 1/(G+3). The variances are rescaled to 1 so that the off-diagonal entries are correlations, with this process being repeated until Σ is positive definite.

  4. When generating random datasets based on a given scenario, that is, a given set of $\theta_0^T, \theta_U^T, \theta_0^Y, \theta_U^Y, \theta_{U,T}^Y, \theta_T^Y$, Π, and Σ, if any dataset results in separation, the scenario is removed from consideration. Separation here is indicated by the presence of zeros in the contingency tables of pairwise relationships between entries of (S,U,T,Y).

Each simulation structure pair, that is, the effect modifier and non-effect modifier structures for the same value of $G \in \{1,3,5\}$, shares the same randomly generated parameters shown in Figure 1A,B. These parameters are generated 10,000 times, giving us 10,000 random and different simulation scenarios for each of the simulation structure pairs. From each randomly generated simulation scenario, a dataset of size 10,000 is generated according to the structure in Figure 1A or Figure 1B; this size was chosen to emulate larger observational datasets that often have unspecified or unknown confounders. Finally, these simulation scenarios are analyzed through five different models: the true model unobserved by the researcher, which includes the unmeasured confounder(s); a model with one proxy variable; a model with two proxy variables; a model with three proxy variables; and a model with only the treatment variable. In the following sections, we describe the model fitting process in more detail.

3 |. MODEL FITTING, OPERATING CHARACTERISTICS, AND PARAMETERS OF INTEREST

Here we describe how to assess the effectiveness of the models with and without a proxy at estimating the true treatment effect, $\theta_T^Y$, in Equation (2). We also describe quantities that will be useful in exploring when regression models including a proxy variable improve estimation compared to models that do not include them.

3.1 |. Model fitting

The purpose of this study is to measure and evaluate how well a vector of proxy variables, S, can stand in for a vector of confounders, U, that we cannot observe. In order to remove the confounding effect of U, we must establish new relationships that include the proxy variables S. Formally, we define the model for a binary outcome Y with proxy variable(s) as follows:

$$\operatorname{logit}\left(P[Y=1 \mid T, S, \theta_0^Y, \theta_T^Y, \theta_S^Y]\right) = \theta_0^Y + T\,\theta_T^Y + \sum_{k=1}^{K} S_k\,\theta_{S_k}^Y; \quad \text{where } K \in \{1,2,3\} \tag{3}$$

In this paper, we refer to the proxy model with one proxy as $m_{S_1}$ (i.e., K=1), the proxy model with two proxies as $m_{S_2}$ (i.e., K=2), and the proxy model with three proxies as $m_{S_3}$ (i.e., K=3). The goal of fitting any model shown in Equation (3) is to obtain an adequate $\hat{\theta}_T^Y$, the estimated treatment effect, while removing the effect of U through inclusion of the proxy variables S. We compare this to a model, denoted $m_N$, that models the binary outcome Y without adjusting for the proxy variables S:

$$\operatorname{logit}\left(P[Y=1 \mid T, \theta_0^Y, \theta_T^Y]\right) = \theta_0^Y + T\,\theta_T^Y \tag{4}$$

Here the estimated $\hat{\theta}_T^Y$ under model (4) is not adjusted for the unmeasured confounding vector U in any way. The estimated $\hat{\theta}_T^Y$ under Equation (3) attempts to estimate the adjusted relationship in (2) through the relationship between the proxy variables S and U established in the previous section.

Lastly, we examine the performance of the true model (2) in estimating the true adjusted treatment effect $\theta_T^Y$, denoting this model as $m_U$. We do this to see whether inclusion of S in a logistic regression model performs as well as inclusion of the correct U. However, we should note that the rationale for using proxy variables S is that researchers have not collected information on U, so proxy inclusion represents a partial remedy to this situation.
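To make the comparison of the no-proxy, proxy, and true-confounder models concrete, the sketch below fits all three to one toy dataset. It is an assumption-laden illustration, not the study's code: the logistic regression is fit by a hand-written Newton-Raphson routine (`fit_logistic`, a hypothetical helper), the coefficients are arbitrary, and the single proxy is generated by flipping U 10% of the time rather than through the latent multivariate normal construction.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    An intercept column is added here. Returns (coefficients, standard
    errors), with the intercept first.
    """
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (p * (1.0 - p))[:, None])     # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))     # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
N = 20_000
U = rng.binomial(1, 0.5, size=N)                    # one unmeasured confounder
S = np.where(rng.random(N) < 0.9, U, 1 - U)         # toy proxy: agrees with U 90% of the time
T = rng.binomial(1, expit(-1 + 2 * U))              # Equation (1), toy coefficients
Y = rng.binomial(1, expit(-1 + 1.0 * T + 2 * U))    # Equation (2), true treatment effect 1.0

b_N, se_N = fit_logistic(T, Y)                          # model (4): treatment only
b_S, se_S = fit_logistic(np.column_stack([T, S]), Y)    # model (3) with K = 1
b_U, se_U = fit_logistic(np.column_stack([T, U]), Y)    # model (2): true model
```

In each fit the treatment coefficient is the entry at index 1. In this toy setting the proxy-adjusted estimate `b_S[1]` falls between the unadjusted estimate `b_N[1]` and the estimate from the true model `b_U[1]`.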

3.2 |. Operating characteristics

We first acknowledge that neither the proxy nor the no-proxy model can accurately estimate $\theta_T^Y$ unless the relationship between (Y,T,U) is collapsible. Therefore traditional measures like bias are not meaningful, as the proxy, no-proxy, and true confounder models are not directly comparable. Acknowledging this limitation, we attempt to make each model comparable by normalizing by the estimated standard error $\widehat{SE}(\hat{\theta}_T^Y)$ from the true model represented in Equation (2). This attempts to mitigate the spread of error variance between the terms $\theta_0^Y, \theta_T^Y, \theta_U^Y$ for the true model, $\theta_0^Y, \theta_T^Y$ for the no-proxy model, and $\theta_0^Y, \theta_T^Y, \theta_S^Y$ for the proxy model. This issue is unique to logistic regression, compared to typical linear regression, where differences in model choices lead to changes in the error variance, but not necessarily in the estimated coefficient standard error. Formally, for each simulated data set, we compute:

$$B_U = \frac{\theta_T^Y}{\widehat{SE}(\hat{\theta}_T^Y)} \ \text{under (2)}, \qquad B_S = \frac{\hat{\theta}_T^Y}{\widehat{SE}(\hat{\theta}_T^Y)} \ \text{under (3)}, \qquad B_N = \frac{\hat{\theta}_T^Y}{\widehat{SE}(\hat{\theta}_T^Y)} \ \text{under (4)}$$

The quantities $B_U, B_N, B_S$ are calculated for every simulation replication, with $\widehat{SE}(\hat{\theta}_T^Y)$ estimated by fitting the true outcome model (2) with all confounders U. This makes the quantities $B_U, B_N, B_S$ more comparable than a direct comparison of the estimated $\hat{\theta}_T^Y$ values to the true value of $\theta_T^Y$ found in Equation (2). For each simulation replication, we compute

$$B_D = |B_N - B_U| - |B_S - B_U|,$$

which essentially approximates the increase in bias from omitting proxy variables while accounting for the difference in error rates. After completing 1000 replications of a single simulation structure scenario, the average value of $B_D$ across these replications, denoted $\bar{B}_D$, is computed. Positive values of $\bar{B}_D$ indicate that, for a specific randomly generated scenario, including a proxy variable should on average improve inference on the true marginal treatment effect.

After obtaining $\bar{B}_D$ values for each randomly generated scenario, we use the following operating characteristics to determine overall whether proxy models perform best, and the specific scenario types where proxy models perform differently from overall trends. We investigate proxy performance within various scenario subsets by examining $E_{\bar{B}_D}$, the average value of $\bar{B}_D$ in that subset; $SD_{\bar{B}_D}$, the standard deviation of $\bar{B}_D$; and $P_{\bar{B}_D>0}$, the proportion of scenarios within that subset where proxy variables showed promise in terms of $\bar{B}_D$. The value $P_{\bar{B}_D>0}$ represents the likelihood that proxy variables improve treatment discovery within various true scenario subsets.
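The operating characteristics can be computed as in the sketch below, assuming NumPy. The function names are hypothetical. Following the text, all three standardized quantities are divided by the standard error from the fitted true model, and the sample standard deviation with `ddof=1` is an assumption, since the source does not state which form was used.

```python
import numpy as np

def bias_difference(theta_true, est_proxy, est_no_proxy, se_true_model):
    """B_D = |B_N - B_U| - |B_S - B_U| for one simulated dataset.

    theta_true is the true treatment effect; est_proxy and est_no_proxy are the
    fitted treatment coefficients under models (3) and (4); se_true_model is the
    standard error of the treatment coefficient from the fitted true model (2).
    """
    B_U = theta_true / se_true_model
    B_S = est_proxy / se_true_model
    B_N = est_no_proxy / se_true_model
    return abs(B_N - B_U) - abs(B_S - B_U)

def summarize(bd_bar):
    """Summaries over scenario-level averages of B_D within a scenario subset."""
    bd_bar = np.asarray(bd_bar, dtype=float)
    return {
        "E": bd_bar.mean(),              # average of the scenario-level means
        "SD": bd_bar.std(ddof=1),        # assumed sample standard deviation
        "P_pos": (bd_bar > 0).mean(),    # share of scenarios favoring the proxy model
    }
```

For a single scenario, the scenario-level average is the mean of `bias_difference` over its 1000 replicated datasets; `summarize` is then applied to the collection of those averages.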

3.3 |. Parameters of interest

For the remainder of this section, we describe different scenario characteristics subsets of interest for exploring proxy variable performance, and how each is computed explicitly. We create two measures for describing these important characteristics. The first are effect sizes which we define to be the strength of the relationship between any variable (S,T,Y) and the unmeasured confounders, U. The effect sizes simply quantify the degree to which any one variable is related to the unmeasured confounder through the random, true coefficients established in the previous section. The second is collapsibility, denoted C, which measures the degree of collapsibility of a given simulation scenario’s randomly generated parameters in a logistic regression.

3.3.1 |. Effect sizes: $\Delta_T$, $\Delta_Y$, $\Delta_S$

First we define the effect of the unmeasured confounders on T as $\Delta_T$ and the effect of the unmeasured confounders on Y as $\Delta_Y$. These come directly from Equations (1) and (2) as

$$\Delta_T = \frac{1}{G}\sum_{g=1}^{G} \left|\theta_{U_g}^T\right| \quad \text{and} \quad \Delta_Y = \frac{1}{G}\sum_{g=1}^{G} \left( \left|\theta_{U_g}^Y\right| + \left|\theta_{U_g,T}^Y\right| P[T=1] \right),$$

where $P[T=1]$ is obtained by marginalizing out U from the joint distribution of (T,U). Formally, this is $P[T=1] = \sum_U P[T=1 \mid U]\,P[U]$, where $P[U]$ can be obtained directly from the multivariate latent parameterization on (S,U). Here increased values of $\Delta_T$ indicate a larger effect of the unmeasured confounders on treatment decision, while larger values of $\Delta_Y$ indicate a larger effect of the unmeasured confounders on outcome. For the effect modifier simulation scenarios, we add $\left|\theta_{U_g,T}^Y\right| P[T=1]$ since, given T=1, these can be considered direct effects of U on Y. This term is down-weighted by $P[T=1]$ since these terms only modify the outcome for patients who received the treatment.
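A small sketch of these two effect sizes follows, using only the Python standard library. The names are hypothetical, and the distribution of U is passed in as a dictionary `p_U` mapping each binary pattern of U to its probability, which is an assumption made for illustration (in the study it would come from the latent multivariate normal).

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def marginal_p_treated(th0_T, thU_T, p_U):
    """P[T=1] = sum over u of P[T=1 | U=u] P[U=u]."""
    return sum(
        expit(th0_T + sum(t * ui for t, ui in zip(thU_T, u))) * pu
        for u, pu in p_U.items()
    )

def effect_sizes(thU_T, thU_Y, thUT_Y, p_T1):
    """Return (Delta_T, Delta_Y); thUT_Y is all zeros without effect modifiers."""
    G = len(thU_T)
    delta_T = sum(abs(t) for t in thU_T) / G
    delta_Y = sum(abs(a) + abs(b) * p_T1 for a, b in zip(thU_Y, thUT_Y)) / G
    return delta_T, delta_Y

# Toy example with G = 1 and U equally likely to be 0 or 1 (hypothetical values).
p_U = {(0,): 0.5, (1,): 0.5}
p_T1 = marginal_p_treated(0.0, [0.0], p_U)
delta_T, delta_Y = effect_sizes([2.0], [-3.0], [1.0], p_T1)
```

In this toy case the treatment model has no dependence on U, so `p_T1` is one half and the interaction coefficient contributes half its magnitude to `delta_Y`.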

For the effect of U on S, we define three effect sizes, denoted $\Delta_{S_1}$, $\Delta_{S_2}$, and $\Delta_{S_3}$, and calculate them as follows:

$$\Delta_{S_K} = \frac{1}{G+K}\sum_{g=1}^{G}\sum_{k=1}^{K} \left|\Sigma_{U_g,S_k}\right|,$$

which summarizes the magnitude of the correlations between $S_k$ for all $k \le K$ and U, and represents an overall strength of the relationship between U and the first K entries of S. We divide by G+K here so that $\Delta_{S_1}$, $\Delta_{S_2}$, and $\Delta_{S_3}$ are all on the same scale, regardless of the number of unmeasured confounders G. We assume that as $\Delta_{S_K}$ increases, so does the relative performance of the proxy model compared to the non-proxy model. The rationale is that as $\Delta_{S_K}$ increases, S becomes more highly correlated with U and can almost serve as a stand-in for U in model (2).
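As a brief illustration, the proxy effect size can be read directly off the latent correlation matrix. The sketch assumes NumPy and assumes that Σ is ordered as (S1, S2, S3, U1, …, UG), matching the ordering (S,U) used in Section 2.2; the function name and the example matrix are hypothetical.

```python
import numpy as np

def delta_S(Sigma, G, K):
    """Sum of |corr(U_g, S_k)| over g <= G and k <= K, divided by G + K."""
    Sigma = np.asarray(Sigma, dtype=float)
    block = np.abs(Sigma[:K, 3:3 + G])     # rows: first K proxies, columns: U
    return block.sum() / (G + K)

G = 1
Sigma = np.full((G + 3, G + 3), 0.4)       # hypothetical latent correlations
np.fill_diagonal(Sigma, 1.0)
d1, d2, d3 = (delta_S(Sigma, G, K) for K in (1, 2, 3))
```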

3.3.2 |. Collapsibility coefficient, C

Collapsibility/Non-Collapsibility can be explained in two manners. Both Mood and Schuster et al explain the effect of non-collapsibility in logistic regression through the framework of developing the logistic regression model from an unobserved latent variable perspective.34,39 That is, instead of utilizing the link approach in generalized linear models, we build the logistic regression model directly from the logistic distribution. In this manner, the often used nested model approach to understand confounding in linear regression may be misleading because of how model variance is handled compared to logistic regression when explanatory variables are added to the models. When additional explanatory variables are added to a logistic regression model, both the explained variance and the total variance increase as the unexplained variance is set at a fixed value, while in linear regression models this causes the explained variance to increase and the unexplained variance to decrease by the same amount to keep the variance constant.

The second way to view collapsibility/non-collapsibility is through contingency tables, which our study directly relates to since we are modeling only binary variables. The effects of collapsing multi-dimensional contingency tables are well researched and understood.40,41 Within the contingency table framework, researchers have often defined differing degrees of collapsibility. Depending on how a measure of association behaves across the strata of a contingency table compared to the marginal contingency tables, that measure is described as strongly collapsible, partially collapsible, or not collapsible (non-collapsibility).40,41

Therefore in our study, we have to take collapsibility into account when comparing across models to ensure fair and accurate comparisons. We devised two methods to account for and measure the collapsibility, or lack thereof, in our logistic regression models. First, we standardized our coefficients by the scaling, or standard error, estimate from the true model for each randomly generated data set. Second, we devised a measure, C, the collapsibility coefficient, to describe the degree of collapsibility in our models; it is defined below. With this measure, we have a better understanding of the mixing of bias and non-collapsibility in our logistic regression models.

Here we define a measure of collapsibility in these logistic regression models in an attempt to determine how collapsibility affects the proxy model’s relative performance. A logistic regression model is said to be collapsible over U if the intercept and treatment effects within the strata defined by U are equal to those of a marginal logistic regression model for Y given T that ignores U. Guo and Geng define two scenarios for collapsibility with respect to logistic regression, which we attempt to leverage here. They define simple collapsibility as the case where the coefficients of a logistic regression conditional on the covariate (or confounder) equal the marginal coefficients obtained when the covariate (confounder) is removed from the equation. Further, they define strong collapsibility as the case where the coefficients of a logistic regression remain unchanged no matter how the levels of a covariate (confounder) are pooled.40

To assess collapsibility, we create a measure C, which assesses the average distance between the treatment and outcome relationships specific to each unmeasured confounder stratum and the marginal treatment and outcome relationships. First, we need to create the true marginal logistic regression model $P[Y \mid T]$, which we can obtain from the relationship:

$$P[Y \mid T] = \frac{P[Y,T]}{P[T]} = \frac{\sum_U P[Y \mid T,U]\,P[T \mid U]\,P[U]}{\sum_U P[T \mid U]\,P[U]}.$$

All of these quantities are known from our simulation setup via Equations (1) and (2) and the latent multivariate normal distribution for (S,U). Once we obtain this, we solve for the true marginal relationship defined as $\operatorname{logit}(P[Y=1 \mid T]) = \beta_0^* + \beta_1^* T$. This is done by setting $\beta_0^* = \operatorname{logit}(P[Y=1 \mid T=0])$ and $\beta_1^* = \operatorname{logit}(P[Y=1 \mid T=1]) - \beta_0^*$. Note that these quantities, by construction of $P[Y=1 \mid T]$, depend on the relationship between T and U and between Y and (T,U), including effect modifiers. Using the true model (2), we determine the level of collapsibility via:

$$C = \frac{1}{G}\sum_{g=1}^{G} \left( \left|\theta_0^Y + \theta_{U_g}^Y - \beta_0^*\right| + \left|\theta_T^Y + \theta_{U_g,T}^Y - \beta_1^*\right| \right).$$

Higher values indicate that the scenario is less collapsible. A case where C=0 is entirely collapsible but is unlikely to occur here, so we focus on values of C>0.
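The marginal coefficients and the collapsibility coefficient can be sketched with the standard library alone. As before, this is an illustration with hypothetical names, and the distribution of U is supplied as a dictionary `p_U` from binary patterns to probabilities rather than derived from the latent multivariate normal.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def marginal_coefficients(th0_T, thU_T, th0_Y, thT_Y, thU_Y, thUT_Y, p_U):
    """Return (beta0*, beta1*) of the true marginal model logit P[Y=1 | T]."""
    def p_y_given_t(t):
        num = den = 0.0
        for u, pu in p_U.items():
            p_t1 = expit(th0_T + dot(thU_T, u))
            p_t = p_t1 if t == 1 else 1.0 - p_t1                 # P[T=t | U=u]
            p_y = expit(th0_Y + t * thT_Y + dot(thU_Y, u) + t * dot(thUT_Y, u))
            num += p_y * p_t * pu
            den += p_t * pu
        return num / den
    b0 = logit(p_y_given_t(0))
    b1 = logit(p_y_given_t(1)) - b0
    return b0, b1

def collapsibility(th0_Y, thT_Y, thU_Y, thUT_Y, b0, b1):
    """Collapsibility coefficient C; larger values mean less collapsible."""
    G = len(thU_Y)
    return sum(
        abs(th0_Y + a - b0) + abs(thT_Y + b - b1)
        for a, b in zip(thU_Y, thUT_Y)
    ) / G

# Toy example: G = 1, U equally likely to be 0 or 1 (hypothetical values).
p_U = {(0,): 0.5, (1,): 0.5}
b0, b1 = marginal_coefficients(-1.0, [2.0], -1.0, 1.0, [2.0], [0.0], p_U)
C = collapsibility(-1.0, 1.0, [2.0], [0.0], b0, b1)
```

When U affects neither T nor Y, the marginal coefficients reduce to the conditional ones and C is zero, which matches the fully collapsible case described above.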

4 |. RESULTS

In this section, we describe the results of the simulation study. In these results, we directly compare the models with proxy variables, $m_{S_1}$, $m_{S_2}$, and $m_{S_3}$, and the model without any proxy variables, $m_N$. We will mostly focus on $m_{S_1}$ when comparing results to $m_N$, but list all comparative results in the supplement. Additionally, we include the results for the true model that is unobtainable or unobservable by the researcher, $m_U$.

We first examine overall results and then delve into the effects that the parameters of interest $C$, $\Delta_{S_K}$, $\Delta_Y$, and $\Delta_T$ have on the comparative performance of the proxy and no-proxy models. Whether in tabular or visual format, we utilize three metrics that were described in previous sections. These are the expected difference, $E_{\bar{B}_D}$; the standard deviation of the difference, $SD_{\bar{B}_D}$; and the probability that the difference is greater than zero, $P_{\bar{B}_D>0}$. When $E_{\bar{B}_D}$ is greater than zero, $m_{S_K}$ is favored, and when $P_{\bar{B}_D>0}$ is greater than .5, $m_{S_K}$ is also favored.

4.1 |. Overall results

The first column, labeled “Overall,” in Table 1 displays the overall results of our simulation without any conditioning on parameters of interest. The “all scenario” rows contain results from 30,000 simulation scenarios, as they encompass three simulation structures, each of which has 10,000 randomly generated scenarios. In this column, we see that $E_{\bar{B}_D}$ is never negative, nor is $P_{\bar{B}_D>0}$ below .5. Adding an effect modifier dampens these results somewhat, as shown by the smaller $E_{\bar{B}_D}$ and $P_{\bar{B}_D>0}$ compared to the non-effect modifier simulation structure pair, and also increases $SD_{\bar{B}_D}$. Within a given set of proxy models, we see that increasing simulation structure complexity, that is, G, the number of unmeasured confounders, results in a decrease in $E_{\bar{B}_D}$ and $P_{\bar{B}_D>0}$. Additionally, increasing the number of proxy variables, K, in the model improves $E_{\bar{B}_D}$ and $P_{\bar{B}_D>0}$, but at the expense of increasing $SD_{\bar{B}_D}$, which indicates increased variability in the comparative fit relative to the model without proxies.

TABLE 1.

A table of summary statistics $E_{\bar{B}_D}$, $SD_{\bar{B}_D}$, and $P_{\bar{B}_D>0}$ comparing the proxy models ($m_{S_1}$, $m_{S_2}$, $m_{S_3}$) and the true model $m_U$ to the model with only a treatment effect, $m_N$. These results cover both the unconditional, or overall, results and results conditioned on strongly collapsible or weakly collapsible scenarios. We defined strongly collapsible to be smaller than the 20th percentile and weakly collapsible to be larger than the 80th percentile in terms of the collapsibility coefficient C.

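Column groups (left to right): Overall; Strongly collapsible; Weakly collapsible. Each group reports $E_{\bar{B}_D}$ ($SD_{\bar{B}_D}$) followed by $P_{\bar{B}_D>0}$.
Scenario  Overall: E (SD) P  Strongly collapsible: E (SD) P  Weakly collapsible: E (SD) P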
Model with unmeasured confounder(s)
All scenarios with no effect modifiers  8.561 (8.704) 0.977 4.534 (5.812) 0.942 13.780 (10.565) 0.998
All scenarios with effect modifiers  4.018 (10.982) 0.673 3.158 (9.030) 0.659  5.696 (12.286) 0.713
1 unmeasured confounder  5.730 (7.410) 0.963 1.288 (2.548) 0.886 10.894 (9.529) 0.998
3 unmeasured confounders  9.682 (8.953) 0.985 5.497 (5.878) 0.969 15.398 (10.865) 0.999
5 unmeasured confounders 10.271 (8.946) 0.985 6.817 (6.605) 0.971 15.049 (10.657) 0.998
1 unmeasured confounder + effect modifiers  3.709 (8.749) 0.736 1.860 (5.286) 0.665  5.744 (9.793) 0.826
3 unmeasured confounders + effect modifiers  4.508 (11.811) 0.662 3.398 (9.798) 0.650  7.241 (13.335) 0.710
5 unmeasured confounders + effect modifiers  3.838 (12.059) 0.620 4.217 (10.858) 0.653  4.103 (13.203) 0.605

Model with one proxy
All scenarios with no effect modifiers  0.172 (0.600) 0.730 0.099 (0.486) 0.701  0.262 (0.745) 0.756
All scenarios with effect modifiers  0.121 (0.646) 0.642 0.087 (0.531) 0.625  0.164 (0.707) 0.679
1 unmeasured confounder  0.178 (0.536) 0.891 0.045 (0.202) 0.805  0.328 (0.817) 0.922
3 unmeasured confounders  0.176 (0.620) 0.692 0.120 (0.547) 0.681  0.253 (0.744) 0.708
5 unmeasured confounders  0.161 (0.639) 0.618 0.132 (0.604) 0.617  0.204 (0.661) 0.638
1 unmeasured confounder + effect modifiers  0.128 (0.590) 0.726 0.052 (0.306) 0.665  0.193 (0.737) 0.796
3 unmeasured confounders + effect modifiers  0.134 (0.657) 0.619 0.107 (0.572) 0.622  0.181 (0.685) 0.659
5 unmeasured confounders + effect modifiers  0.102 (0.686) 0.580 0.103 (0.652) 0.589  0.117 (0.695) 0.582

Model with two proxies
All scenarios with no effect modifiers  0.338 (0.881) 0.774 0.180 (0.658) 0.736  0.536 (1.122) 0.817
All scenarios with effect modifiers  0.229 (0.960) 0.663 0.152 (0.765) 0.639  0.318 (1.073) 0.704
1 unmeasured confounder  0.345 (0.804) 0.921 0.088 (0.298) 0.842  0.652 (1.247) 0.943
3 unmeasured confounders  0.353 (0.907) 0.729 0.219 (0.748) 0.707  0.524 (1.075) 0.787
5 unmeasured confounders  0.317 (0.928) 0.671 0.233 (0.799) 0.660  0.433 (1.021) 0.720
1 unmeasured confounder + effect modifiers  0.243 (0.870) 0.744 0.109 (0.447) 0.678  0.370 (1.085) 0.811
3 unmeasured confounders + effect modifiers  0.248 (1.005) 0.638 0.167 (0.851) 0.625  0.344 (1.058) 0.689
5 unmeasured confounders + effect modifiers  0.195 (0.999) 0.606 0.178 (0.912) 0.615  0.239 (1.071) 0.612

Model with three proxies
All scenarios with no effect modifiers  0.499 (1.105) 0.791 0.269 (0.791) 0.753  0.795 (1.410) 0.838
All scenarios with effect modifiers  0.333 (1.211) 0.670 0.228 (0.962) 0.642  0.450 (1.332) 0.715
1 unmeasured confounder  0.502 (1.005) 0.925 0.122 (0.338) 0.853  0.937 (1.515) 0.945
3 unmeasured confounders  0.540 (1.163) 0.748 0.350 (0.900) 0.737  0.810 (1.430) 0.819
5 unmeasured confounders  0.455 (1.139) 0.691 0.335 (0.961) 0.669  0.638 (1.256) 0.751
1 unmeasured confounder + effect modifiers  0.355 (1.112) 0.746 0.157 (0.606) 0.678  0.523 (1.366) 0.812
3 unmeasured confounders + effect modifiers  0.367 (1.294) 0.652 0.256 (1.089) 0.639  0.514 (1.341) 0.707
5 unmeasured confounders + effect modifiers  0.277 (1.220) 0.612 0.272 (1.103) 0.610  0.344 (1.282) 0.626

The first subsection in Table 1 presents the results using the unmeasured confounder(s), or mU. We see that including the unmeasured confounders in our model results in the largest EBD and PBD>0. These results are to be expected, as models that include the unmeasured confounder(s) more closely resemble the true data-generating process, but these models are not available to the researcher in this study. Regardless of which proxy model mSk is used, we see that the proxy model estimates θTY, the true treatment effect, better than mN, the model without the proxy, in the overall aggregated results.

The other two columns in Table 1 are conditioned on a parameter of interest, C, our measure of collapsibility. After conditioning on C, the rows in these columns contain 2000 random scenarios, as strongly collapsible and weakly collapsible encompass the bottom 20% and top 20% of scenarios, respectively. Most of the trends discussed previously are conserved when conditioning on either strongly or weakly collapsible scenarios. However, there are differences. “Strongly” collapsible scenarios have smaller comparative results using the proxy than the “overall” results, while “weakly” collapsible scenarios show the strongest comparative advantage of using a proxy variable. This is expected, since collapsibility effectively measures the difference between the marginal treatment effects and those adjusted by U, with larger values indicating a larger effect of U and hence a stronger rationale for use of S. In no scenario set from Table 1 does mSk perform worse than mN at estimating θTY, as EBD is never below 0 and PBD>0 is never below .5.
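The conditioning behind the “strongly” and “weakly” collapsible columns can be sketched as follows. This is only an illustration: the scenario records, the field names `C` and `BD`, and the helper `split_by_collapsibility` are hypothetical, and `statistics.quantiles` is used with its default interpolation, which may differ from the percentile convention used in the study.

```python
import statistics

def split_by_collapsibility(scenarios):
    """Return (strongly, weakly) collapsible subsets:
    C below the 20th percentile and C above the 80th percentile."""
    cuts = statistics.quantiles([s["C"] for s in scenarios], n=5)  # 20th, 40th, 60th, 80th
    p20, p80 = cuts[0], cuts[-1]
    strongly = [s for s in scenarios if s["C"] < p20]
    weakly = [s for s in scenarios if s["C"] > p80]
    return strongly, weakly

# Hypothetical scenarios in which BD grows with C (illustration only).
scenarios = [{"C": float(c), "BD": 0.01 * c} for c in range(1, 101)]
strongly, weakly = split_by_collapsibility(scenarios)

mean_bd_strong = statistics.fmean([s["BD"] for s in strongly])
mean_bd_weak = statistics.fmean([s["BD"] for s in weakly])
print(len(strongly), len(weakly), mean_bd_strong, mean_bd_weak)
```

Each subset holds 20% of the scenarios, and summary metrics are then computed within each subset separately.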

To further investigate our simulation study, we partitioned the parameter space by the other measures described in Section 3, the effect sizes ΔSk, ΔT, and ΔY. Recall that we define a given effect size as the strength of the relationship between U and another variable of interest (S, T, or Y). Larger values mean a stronger relationship between U and that variable, and smaller values mean a weaker relationship. In the context of our study, these effect sizes can be viewed as the confounder strength with respect to the relationship between U and T or Y (via ΔT, ΔY) and the proxy strength (via ΔSk) with respect to the relationship between U and S. Figure 2 reveals how the estimation of θTY is affected by increasing ΔSk for each mSk.

FIGURE 2.

Scatterplot of EBD for each proxy model mS1, mS2, mS3 across ΔSk, where k is the number of proxy variables in the given model. Points of interest: in Plot A, at ΔSk below the 10th percentile, EBD=0.00139, 0.0151, and 0.0465 for mS1, mS2, and mS3, respectively. In Plot B, at ΔSk below the 10th percentile, EBD=0.00164, 0.0134, and 0.0376 for mS1, mS2, and mS3, respectively. From this plot, a model with any number of proxies better estimates θTY than mN on average, even at very weak effect sizes (smaller than the 10th percentile).

Figure 2 displays EBD across the entire range of ΔSk for each mSk for each simulation structure. We see that increasing ΔSk, that is, the (S,U) association, improves the estimation of θTY for all mSk across all simulation structures. Figure 2A,B are the plots with results closest to EBD=0, or parity between any mSk and mN. However, even at the smallest percentile bin (below the 10th percentile) in Figure 2A,B, mSk still better estimates θTY. Moreover, referencing Figure S1, mSk better estimates θTY more than 50% of the time across all simulation structures even at a weak proxy strength, as the median value of BD is above 0. Figures S2–S7 reveal similar marginal trends for both the mean and median of BD but with ΔY, ΔT, and C percentile bins.

4.2 |. Effect sizes: ΔS,ΔT,ΔY and collapsibility: C

The model mS1 offers the best, most intuitive insight into explaining and extrapolating the results of our overall study. Therefore, the remainder of the results section will focus on this model (k=1). Table 2 focuses on the results for mS1 by conditioning solely on ΔS1, ΔT, or ΔY and by jointly conditioning on (ΔS1, C), (ΔT, C), or (ΔY, C) across all simulation structures. As in Table 1, weak effect sizes and strongly collapsible scenarios are those below the 20th percentile, while strong effect sizes and weakly collapsible scenarios are those above the 80th percentile.

TABLE 2.

A table containing the results of only mS1, the model with only one proxy variable, for our operating characteristics EBD, SDBD, and PBD>0, where in the table EBD is E, SDBD is (SD), and PBD>0 is [P]. These results summarize every simulation structure scenario when conditioning solely on ΔS1, ΔT, or ΔY, represented in the upper third of the table, or jointly conditioning on (ΔS1, C), (ΔT, C), or (ΔY, C), represented in the lower two-thirds of the table. Each effect size can be conditioned on at one of two magnitudes, shown as “weak” or “strong.” We define a weak effect size to be smaller than the 20th percentile and a strong effect size to be greater than the 80th percentile. Strongly and weakly collapsible are defined similarly, with strongly collapsible being less than the 20th percentile and weakly collapsible being greater than the 80th percentile across the randomly generated scenarios. This table shows that mS1 better estimates θTY than mN across a range of effect sizes.

Scenario | Subsetted by ΔS1: Weak E (SD) [P], Strong E (SD) [P] | Subsetted by ΔT: Weak E (SD) [P], Strong E (SD) [P] | Subsetted by ΔY: Weak E (SD) [P], Strong E (SD) [P]
All scenarios with no effect modifiers 0.024 (0.106) [0.685] 0.447 (1.085) [0.753] 0.111 (0.420) [0.712] 0.215 (0.735) [0.732] 0.119 (0.431) [0.727] 0.212 (0.711) [0.727]
All scenarios with effect modifiers 0.015 (0.115) [0.606] 0.327 (1.205) [0.658] 0.054 (0.444) [0.579] 0.166 (0.772) [0.682] 0.076 (0.486) [0.630] 0.140 (0.675) [0.640]
1 unmeasured confounder 0.004 (0.007) [0.762] 0.599 (1.056) [0.932] 0.050 (0.183) [0.804] 0.284 (0.789) [0.925] 0.057 (0.218) [0.821] 0.294 (0.774) [0.913]
3 unmeasured confounders 0.030 (0.096) [0.694] 0.402 (1.115) [0.683] 0.137 (0.442) [0.702] 0.186 (0.714) [0.670] 0.151 (0.468) [0.720] 0.183 (0.692) [0.669]
5 unmeasured confounders 0.039 (0.155) [0.600] 0.339 (1.067) [0.646] 0.145 (0.542) [0.630] 0.175 (0.694) [0.600] 0.150 (0.535) [0.641] 0.158 (0.655) [0.599]
1 unmeasured confounder + effect modifiers 0.003 (0.009) [0.654] 0.424 (1.226) [0.740] 0.021 (0.215) [0.591] 0.231 (0.862) [0.835] 0.055 (0.267) [0.691] 0.186 (0.716) [0.737]
3 unmeasured confounders + effect modifiers 0.021 (0.101) [0.601] 0.353 (1.189) [0.646] 0.069 (0.457) [0.584] 0.150 (0.735) [0.635] 0.100 (0.515) [0.630] 0.145 (0.667) [0.610]
5 unmeasured confounders + effect modifiers 0.019 (0.170) [0.563] 0.205 (1.190) [0.588] 0.070 (0.579) [0.563] 0.118 (0.705) [0.578] 0.071 (0.608) [0.569] 0.090 (0.637) [0.575]

Strongly collapsible
All scenarios with no effect modifiers 0.018 (0.093) [0.636] 0.228 (0.863) [0.734] 0.061 (0.348) [0.687] 0.151 (0.629) [0.698] 0.100 (0.413) [0.726] 0.126 (0.747) [0.672]
All scenarios with effect modifiers 0.011 (0.116) [0.580] 0.193 (0.941) [0.626] 0.048 (0.378) [0.571] 0.119 (0.573) [0.665] 0.066 (0.450) [0.637] 0.083 (0.524) [0.631]
1 unmeasured confounder 0.001 (0.004) [0.619] 0.150 (0.425) [0.885] 0.006 (0.024) [0.699] 0.097 (0.380) [0.869] 0.046 (0.213) [0.797] 0.118 (0.499) [0.818]
3 unmeasured confounders 0.025 (0.087) [0.694] 0.284 (1.019) [0.666] 0.083 (0.261) [0.718] 0.161 (0.729) [0.651] 0.124 (0.441) [0.707] 0.092 (0.829) [0.649]
5 unmeasured confounders 0.028 (0.134) [0.597] 0.246 (0.994) [0.655] 0.112 (0.569) [0.640] 0.184 (0.669) [0.613] 0.151 (0.562) [0.646] 0.179 (0.741) [0.625]
1 unmeasured confounder + effect modifiers 0.002 (0.008) [0.603] 0.158 (0.628) [0.656] 1.6e−5 (0.107) [0.553] 0.123 (0.387) [0.792] 0.058 (0.282) [0.699] 0.041 (0.347) [0.664]
3 unmeasured confounders + effect modifiers 0.020 (0.083) [0.616] 0.270 (1.008) [0.632] 0.094 (0.461) [0.606] 0.101 (0.634) [0.621] 0.063 (0.403) [0.623] 0.124 (0.680) [0.609]
5 unmeasured confounders + effect modifiers 0.011 (0.176) [0.529] 0.143 (1.122) [0.588] 0.056 (0.464) [0.557] 0.135 (0.643) [0.596] 0.076 (0.618) [0.580] 0.085 (0.460) [0.615]

Weakly collapsible
All scenarios with no effect modifiers 0.031 (0.111) [0.726] 0.719 (1.380) [0.770] 0.176 (0.504) [0.736] 0.320 (0.912) [0.777] 0.150 (0.511) [0.633] 0.255 (0.769) [0.764]
All scenarios with effect modifiers 0.021 (0.118) [0.653] 0.476 (1.361) [0.691] 0.059 (0.452) [0.610] 0.212 (0.831) [0.729] 0.084 (0.564) [0.658] 0.171 (0.668) [0.672]
1 unmeasured confounder 0.005 (0.010) [0.837] 1.180 (1.571) [0.930] 0.080 (0.186) [0.842] 0.451 (1.052) [0.939] NA (NA) [NA] 0.346 (0.868) [0.926]
3 unmeasured confounders 0.039 (0.095) [0.712] 0.563 (1.340) [0.710] 0.241 (0.622) [0.739] 0.256 (0.877) [0.709] 0.088 (0.263) [0.647] 0.222 (0.732) [0.688]
5 unmeasured confounders 0.049 (0.164) [0.632] 0.428 (1.086) [0.674] 0.184 (0.525) [0.659] 0.207 (0.698) [0.626] 0.161 (0.545) [0.630] 0.151 (0.614) [0.599]
1 unmeasured confounder + effect modifiers 0.004 (0.009) [0.696] 0.692 (1.596) [0.817] 0.055 (0.250) [0.661] 0.289 (1.042) [0.882] 0.054 (0.102) [0.881] 0.209 (0.721) [0.766]
3 unmeasured confounders + effect modifiers 0.036 (0.103) [0.675] 0.484 (1.251) [0.684] 0.074 (0.474) [0.612] 0.192 (0.705) [0.671] 0.090 (0.332) [0.692] 0.172 (0.656) [0.643]
5 unmeasured confounders + effect modifiers 0.002 (0.175) [0.590] 0.283 (1.201) [0.588] 0.047 (0.556) [0.562] 0.140 (0.637) [0.606] 0.089 (0.737) [0.574] 0.121 (0.605) [0.584]

Table 2 continues many of the trends previously established by Table 1 and Figure 2. First, we concentrate on the first 8 rows, which condition solely on a single effect size. At no point is PBD>0 below .5, nor is EBD below zero. Simulation structures with effect modifiers have smaller PBD>0 and EBD than their counterparts without effect modifiers. Further, increasing G increases SDBD. As in Figure 2, increasing an effect size generally increases EBD and PBD>0. However, when conditioning on weak effect sizes, we see that EBD tends to increase with G, whereas PBD>0 decreases.

The bottom two-thirds of Table 2 consist of jointly conditioning on C and a given effect size, ΔT, ΔY, or ΔS. The collapsibility results shown in Table 1 are echoed here. Conditioning on any weak effect size and strongly collapsible scenarios results in the smallest EBD and PBD>0 in the entirety of Table 2. There is a small increase in EBD and PBD>0 when moving from strongly collapsible to weakly collapsible scenarios at weak effect sizes. The best result in the table occurs at weakly collapsible scenarios and strong ΔS1 for the simplest simulation structure, with G=1 unmeasured confounder and no effect modifier: EBD=1.180 and PBD>0=.930. The worst result occurs when conditioning on weak ΔT and strongly collapsible scenarios for the simulation structure with G=1 unmeasured confounder and effect modifiers: EBD=1.6e-5 and PBD>0=.553. However, even with EBD close to zero and PBD>0 close to .5 in certain conditioning scenarios, Table 2 shows that mS1 better estimates θTY than mN in every conditioning scenario.

4.3 |. A closer look at G=1

Figure 3, a heatmap of EBD, and Figure 4, a heatmap of PBD>0, allow for a closer examination of the trends previously discussed under mS1 across the pairs of effect sizes ΔS, ΔT, ΔY and C. Marginally (on the top and right of each plot), each bin represents about 10% of the randomly generated scenarios; the bins are defined by the 0.1, 0.2, …, 0.9 quantiles of each marginal measure. The interior squares contain scenarios that fall jointly within the corresponding top and right percentile groups, and do not necessarily contain .1×.1=.01 of the randomly generated scenarios.
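The binning described above can be sketched with made-up data. In the example below, `x` and `y` are hypothetical correlated measures standing in for an effect size and C; they are not the study's parameters. The marginal decile bins each hold about 10% of the scenarios, while the joint cells are far from a uniform 1% whenever the two measures are related.

```python
import bisect
import random
import statistics

def decile_bin(value, cuts):
    """Index 0..9 of the decile bin that `value` falls into, given 9 cut points."""
    return bisect.bisect_right(cuts, value)

random.seed(0)
n = 1000
# Hypothetical correlated measures (assumption for illustration only).
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.8 * xi + 0.6 * random.gauss(0, 1) for xi in x]

x_cuts = statistics.quantiles(x, n=10)   # 0.1, 0.2, ..., 0.9 quantiles
y_cuts = statistics.quantiles(y, n=10)

marginal_x = [0] * 10
joint = [[0] * 10 for _ in range(10)]
for xi, yi in zip(x, y):
    i, j = decile_bin(xi, x_cuts), decile_bin(yi, y_cuts)
    marginal_x[i] += 1
    joint[i][j] += 1

print(marginal_x)                 # marginal bins: about 10% of scenarios each
print(joint[0][0], joint[0][9])   # joint cells: unequal counts when x and y are related
```

Sparse joint cells are one reason a heatmap entry can be driven by only a handful of scenarios.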

FIGURE 3.

Heatmaps for mS1 from only the simulation scenario without effect modifiers (G=1). These heatmaps depict EBD across the entire range of effect sizes and collapsibility, jointly (the heatmap) and marginally (the column vector to the right or row vector above). Each row or column represents one of the 10 percentile bins of each effect size. This figure shows in greater detail than any previous table or figure that mS1 better estimates θTY than mN.

FIGURE 4.

Heatmaps for mS1 from only the simulation scenario without effect modifiers (G=1). These heatmaps depict PBD>0 across the entire range of effect sizes and collapsibility, jointly (the heatmap) and marginally (the column vector to the right or row vector above). Each row or column represents one of the 10 percentile bins of each effect size. This figure shows in greater detail than any previous table or figure that mS1 better estimates θTY than mN. Even at the smallest effect sizes, PBD>0 is above .5.

Both Figures 3 and 4 reinforce previous findings. Across the entire heatmap for both figures, EBD is never below 0, and PBD>0 is never below .5. Visually, we see that increasing any effect size or C, jointly or marginally, results in increasing EBD and PBD>0. This is more visually obvious in Figure 3 than in Figure 4, as PBD>0 does not increase as smoothly. Figure 3D–F and Figure 4D–F depict the results of our collapsibility parameter both jointly and marginally with other effect sizes. At the smallest jointly conditioned bins, we see that EBD and PBD>0 are above 0 and .5, respectively. For plot D, in the lowest (ΔT, C) bin [0.1, 0.1], EBD=9e-4 and PBD>0=.584. For plot E, in the lowest (ΔS1, C) bin [0.1, 0.1], EBD=7e-4 and PBD>0=.512. For plot F, in the lowest (ΔY, C) bin [0.1, 0.1], EBD=0.0442 and PBD>0=.742. Therefore, even at very small values of C, or what would be strongly collapsible, mS1 still better estimates θTY than mN.

Figures S20 and S21 show the results for G=1 unmeasured confounder with effect modifiers for the statistics EBD and PBD>0 under mS1. These heatmaps reveal already discussed trends: increasing effect sizes and C result in larger EBD and PBD>0, and small values of C result in smaller EBD and PBD>0. These heatmaps also support the previously mentioned variability in the results. Visually, the variability shows up as a less smooth color change than that found in Figures 3 and 4; in Figures S20 and S21, the color change is more abrupt from bin to bin. Additionally, these supplemental heatmaps contain the first negative values of EBD and the first values of PBD>0 below .5, which favor mN over mS1. Nearly all of these values occur at small effect sizes or small values of C. Further, it is not a given that a negative EBD is paired with a PBD>0 below .5; often, they do not coincide at all.

4.4 |. A closer look at G=3,5

Figures S22–S25 are the heatmaps for the G=3 simulation structure for both effect modifier and non-effect modifier scenarios. Figures S22 and S23 contain results for the simulation structure with G=3 unmeasured confounders for EBD and PBD>0, respectively. Figures S24 and S25 contain results for the simulation structure with G=3 unmeasured confounders with effect modifiers for EBD and PBD>0, respectively. These heatmaps reveal results and trends similar to those previously reported. That is, increasing an effect size or C increases both EBD and PBD>0. This is more obvious when looking at the marginal results instead of the joint heatmap. Nearly everywhere, and at nearly every effect size or C, mS1 better estimates θTY, as there are very few negative values of EBD and very few values of PBD>0 below 0.5. For Figures S22 and S23, when either EBD or PBD>0 favors mN, it is due to the number of scenarios at those values being very small. What differs from previous heatmaps is that the variability in the results is more obvious. Adding an effect modifier does not guarantee that EBD and PBD>0 decrease in value, although it does seem to increase the number of bins with EBD below 0 and PBD>0 below .5 compared to the non-effect modifier scenarios. However, these instances that favor mN are few in number and exhibit no real trend.

Figures S26–S29 are the heatmaps for the G=5 simulation structure for both effect modifier and non-effect modifier scenarios. Figures S26 and S27 contain results for the simulation structure with G=5 unmeasured confounders for EBD and PBD>0, respectively. Figures S28 and S29 contain results for the simulation structure with G=5 unmeasured confounders with effect modifiers for EBD and PBD>0, respectively. Similar to prior results, these heatmaps show that increasing an effect size or C results in increasing EBD and PBD>0. However, this trend of increasing EBD and PBD>0 is noisier and less obvious, especially for Figures S28 and S29, which introduce the effect modifier. Further, introducing the effect modifier results in EBD and PBD>0 being below 0 and .5, respectively, more often. When EBD is below 0, PBD>0 is not guaranteed to be below .5. These occurrences that favor mN still show very little trend, but are more often than not associated with smaller values of an effect size or C.

5 |. APPLICATION

Utilizing a de-identified dataset from a retrospective cohort study conducted by Woman’s Hospital (Baton Rouge, Louisiana), we sought to determine the true treatment effect of a specific dressing, negative pressure wound therapy (NPWT), on the risk of infection. Clinicians at Woman’s Hospital reported an increased likelihood of using NPWT if the patient had a BMI>40kg/m2. Since BMI>40kg/m2 is also associated with a higher risk of infection, it is a confounder for the NPWT treatment effect.

In the results reported below, we assumed that we did not have access to the variable BMI. Instead, we used diabetes as a proxy for BMI>40kg/m2, as diabetes is associated with high BMI. This was to properly mirror the simulation study and relationship structure described in previous sections and in Figure 1A. The notation used in previous sections continues in this section: S is the proxy, Y is the outcome, T is the treatment, and U is the unmeasured confounder. Therefore, BMI>40kg/m2 can be thought of as the unmeasured confounder, U, that is correlated with both infection, Y, the outcome, and NPWT, T, the treatment. We examine the estimated treatment effect for T in three different models: (1) with T only, (2) with T and S, and (3) with T and U.
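To make the three-model comparison concrete, the sketch below fits the same three logistic regressions to a fully specified hypothetical population rather than to the application data. Everything here is an assumption made for illustration: the probabilities, the coefficients `b0`, `theta`, and `gamma`, the helper names, and the structure in which S depends on U only. Because the population is enumerated cell by cell, the fitted coefficients are large-sample limits and carry no sampling noise; this is not the simulation design of the study.

```python
import math
from itertools import product

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_logistic(cells, n_iter=50):
    """Newton-Raphson for logistic regression on weighted cells (x, P(Y=1), weight)."""
    k = len(cells[0][0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, p_y, w in cells:
            mu = expit(sum(b * xi for b, xi in zip(beta, x)))
            for i in range(k):
                grad[i] += w * (p_y - mu) * x[i]
                for j in range(k):
                    hess[i][j] += w * mu * (1.0 - mu) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

# Hypothetical population (all values are assumptions, not study estimates).
p_u = 0.2                             # P(U=1)
p_t = {1: 0.24, 0: 0.02}              # P(T=1 | U=u)
p_s = {1: 0.70, 0: 0.10}              # P(S=1 | U=u); S depends on U only
b0, theta, gamma = -3.66, 0.5, 0.7    # logit P(Y=1 | T, U) = b0 + theta*T + gamma*U

def bern(p, v):
    return p if v == 1 else 1.0 - p

population = []                       # (u, s, t, P(Y=1 | t, u), cell probability)
for u, s, t in product((0, 1), repeat=3):
    w = bern(p_u, u) * bern(p_s[u], s) * bern(p_t[u], t)
    population.append((u, s, t, expit(b0 + theta * t + gamma * u), w))

# Coefficient of T under the three models: T only, T and S, T and U.
coef_n = fit_logistic([([1, t], py, w) for u, s, t, py, w in population])[1]
coef_s = fit_logistic([([1, t, s], py, w) for u, s, t, py, w in population])[1]
coef_u = fit_logistic([([1, t, u], py, w) for u, s, t, py, w in population])[1]
print(coef_n, coef_s, coef_u)
```

Under these assumed values, the model with U recovers `theta`, and the proxy-adjusted coefficient lies between the unadjusted coefficient and `theta`. Other parameter choices can behave differently, which is why the study evaluates many randomly generated scenarios.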

The data analyzed from the retrospective cohort study comprised N=10,517 data points, a sample size comparable to that of the simulation study presented in previous sections. In the data, the participants are either African American, NAA=4461, or White, NW=6056. The median and mean age of participants in the study are both 29 years.

BMI>40kg/m2, U, is a known confounder for T: P̂(T=1|U=1)=.240 and P̂(T=1|U=0)=.020, representing a large shift in treatment assignment probability based on high BMI and an estimated effect size of .22. The data also show that U is correlated with Y, infection, because those in the class BMI>40kg/m2 were about twice as likely to get an infection, P̂(Y=1|U=1)=.048 compared to P̂(Y=1|U=0)=.025, for an estimated effect size of .023. Finally, U was correlated with diabetes status, S, because P̂(S=1|U=1)=.250 and P̂(S=1|U=0)=.125; that is, the probability of being diabetic given BMI>40kg/m2 was twice the probability of being diabetic for patients who did not have a BMI>40kg/m2, for an estimated effect size of .125.
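The effect sizes quoted above are consistent with simple differences in conditional probabilities between the U=1 and U=0 groups. The short sketch below reproduces that arithmetic; the function name `effect_size` is hypothetical, and treating the effect size as a plain difference is inferred from the quoted numbers rather than taken from the formal definition in Section 3.

```python
def effect_size(p_given_u1, p_given_u0):
    """Difference in conditional probabilities between U=1 and U=0."""
    return p_given_u1 - p_given_u0

# Conditional probabilities quoted in the text (U: BMI > 40 kg/m2).
delta_t = effect_size(0.240, 0.020)   # treatment (NPWT)
delta_y = effect_size(0.048, 0.025)   # outcome (infection)
delta_s = effect_size(0.250, 0.125)   # proxy (diabetes)

# Ratios behind the "twice as likely" statements.
ratio_y = 0.048 / 0.025
ratio_s = 0.250 / 0.125

print(round(delta_t, 3), round(delta_y, 3), round(delta_s, 3))
print(round(ratio_y, 2), round(ratio_s, 2))
```

On the difference scale, the U to T relationship is by far the strongest of the three, while the U to Y difference is small even though the corresponding ratio is close to 2.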

Table 3 shows the results of the three models using the application data. Since we have the “unmeasured” confounder, BMI>40kg/m2, we can fit the true model, denoted mU. From the logistic regression mU, which includes the “unmeasured” confounder and the treatment, the treatment effect estimate is θ̂TY=.504 with a 95% confidence interval of [0.134, 0.857].

TABLE 3.

Table of true model, proxy model, and no proxy model estimated treatment effects and confidence intervals.

Application results by model
Model | Treatment effect | 95% confidence interval
mU | 0.504 | [0.134, 0.857]
mS | 0.794 | [0.452, 1.11]
mN | 0.844 | [0.507, 1.15]

The logistic regression of mS, which included the proxy and the treatment, resulted in a treatment effect estimate of θ̂TY=0.794 with a 95% confidence interval of [0.452, 1.11]. Last, the model with only the treatment, mN, resulted in a treatment effect estimate of θ̂TY=0.844 with a 95% confidence interval of [0.507, 1.15]. Here, the proxy-adjusted estimate is closer to the confounder-adjusted estimate than the estimate from the model without the proxy, and the same holds for the confidence interval bounds.
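The comparison of the reported estimates can be checked with a few lines of arithmetic. In this sketch the estimates are assumed to be logistic regression coefficients (log odds ratios), which is why they are exponentiated to odds ratios; the dictionary keys are labels chosen here for the confounder-adjusted, proxy, and treatment-only models.

```python
import math

# Treatment effect estimates reported for the application
# (assumed to be log odds ratios from logistic regression).
est = {"mU": 0.504, "mS": 0.794, "mN": 0.844}

# Distance of each model's estimate from the confounder-adjusted estimate.
gap_proxy = abs(est["mS"] - est["mU"])
gap_no_proxy = abs(est["mN"] - est["mU"])

# Same estimates on the odds ratio scale.
odds_ratios = {name: math.exp(value) for name, value in est.items()}

print(round(gap_proxy, 3), round(gap_no_proxy, 3))
print({name: round(value, 2) for name, value in odds_ratios.items()})
```

The proxy model closes only part of the gap to the confounder-adjusted estimate, which is in line with the modest proxy strength of diabetes reported for this dataset.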

6 |. DISCUSSION

Focusing on the model mS1 and the simulation structure with G=1 unmeasured confounder without effect modifiers may best explain the results of our study. For one, we know that the model mS1 best represents the truth in this scenario. That is, there is just one S, it is standing in for exactly one U, and there are no effect modifiers. In other scenarios, any number of S's are standing in for any number of U's, and there may also exist effect modifiers. With this in mind, we can look back at the presented results and see that even conditioning on weak effect sizes and strongly collapsible scenarios reveals that mS1 still better estimates θTY than mN. Jointly conditioning on both weak effect sizes and strongly collapsible scenarios, however, does reveal cases where EBD approaches very close to 0 and PBD>0 approaches .5. Even so, at no time is mS1 worse than mN at estimating θTY.

The simulation scenario of G=1 with no effect modifiers under the model mS1 should be the most readily collapsible scenario (strongly collapsible). This is supported by the fact that EBD is closest to the null and PBD>0 closest to .5 when conditioning on strongly collapsible scenarios at weak effect sizes. Comparing jointly conditioned strongly collapsible, weak effect size scenarios to weakly collapsible, weak effect size scenarios reveals what is termed non-collapsibility in the literature. This non-collapsibility appears as the small difference in EBD between weakly collapsible and strongly collapsible scenarios at weak effect sizes, and it can often look like bias. Therefore, collapsible scenarios should see the two models, mS1 and mN, move closer together in their ability to estimate θTY, which is exactly what we see in our results. Further, non-collapsible scenarios (weakly collapsible) would introduce an effect often misinterpreted as bias in favor of any model with a proxy, a result also supported by our study. However, this difference between the collapsible and non-collapsible scenarios according to our parameter, C, is quite small. This reveals that collapsibility does have an effect in our scenarios, but at the simplest simulation structure the effect is quite small. Therefore, non-collapsibility cannot account for all of the advantage or difference in results with respect to mS1 or any proxy model, mSk. Lastly, moving beyond G=1 in the collapsibility discussion, we see that the results for G=3 or 5 move further from the null. This is because these scenarios are difficult to collapse, especially when adding in effect modifiers.
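Non-collapsibility of the odds ratio can be seen in a small numeric example that involves no confounding at all. In the sketch below, U is independent of T, yet the marginal log odds ratio for T differs from the U-conditional one. The coefficients `b0`, `theta`, and `gamma` and the prevalence `p_u` are hypothetical values chosen for illustration, and the example does not reproduce the collapsibility measure C used in the study.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical outcome model: logit P(Y=1 | T, U) = b0 + theta*T + gamma*U.
b0, theta, gamma = -1.0, 1.0, 2.0
p_u = 0.5   # P(U=1), the same in both treatment groups (no confounding)

def marginal_risk(t):
    """P(Y=1 | T=t), averaging over U, which is independent of T here."""
    return (1.0 - p_u) * expit(b0 + theta * t) + p_u * expit(b0 + theta * t + gamma)

conditional_log_or = theta
marginal_log_or = logit(marginal_risk(1)) - logit(marginal_risk(0))

print(conditional_log_or, round(marginal_log_or, 3))
```

The marginal log odds ratio is attenuated toward zero relative to the conditional one even though U does not confound T, which is why a difference between adjusted and unadjusted logistic coefficients cannot be read purely as confounding bias.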

The effect sizes ΔS, ΔT, and ΔY are also important to the performance of the proxy variable. Our results reveal that the stronger these relationships are, the more easily the proxy variable can stand in for the unmeasured confounder, which should make intuitive sense. This is supported by the fact that at weak effect sizes, EBD is closest to the null and PBD>0 closest to .5. However, even at these small effect sizes, mS1 is still better at estimating θTY. Therefore, our results show that even a weak proxy standing in for a weak confounder is enough to improve the estimation of θTY.

From our results, it is hard to say conclusively which effect size beyond ΔS is most important. Given that confounding is present, our results do show that prioritizing the correlation between U and S results in the greatest improvement in estimating θTY. Beyond prioritizing ΔS, the answer is much less clear. Our results show that the difference between the strength of the U–T relationship and that of the U–Y relationship is negligible with respect to proxy variable performance. Clearly, a sufficient relationship between U and T and between U and Y is needed to ensure that confounding exists. However, this confounding relationship need not be strong; it can in fact be quite weak and still result in better estimation of θTY under mS1.

Moving beyond G=1 without effect modifiers to G=3 or G=5 with and without effect modifiers reveals trends and results similar to those previously discussed. That is, the sensitivities to effect size strength and collapsibility still exist in these other simulation scenarios. However, these more complex simulation scenarios reveal an increase in variability, shown by the larger SDBD. This variability can be explained by the fact that the proxy variable now has to stand in for more unmeasured confounders than before, as well as for causal mechanisms with which it is not strongly correlated. As we increase the count of unmeasured confounders, there is a larger chance that S is related to at least one U. Conversely, there is a chance that the proxy variable is only weakly related to any U, or related to only a subset of the U’s. Further, the effect modifier is a causal mechanism with which S may not be strongly correlated, and this mechanism is difficult for the proxy variable to sufficiently stand in for in a model. The increased count of unmeasured confounders and the possibility of effect modifiers impart variability to our results and account for the larger standard deviations observed in these simulation scenarios. However, even in these more complex simulation scenarios, mS1 better estimates θTY on average and in the long run, as supported by EBD and PBD>0.

Increasing the number of proxy variables, K, available to the model, mSk, results in better estimation of θTY. The simple explanation is that having more variables that are correlated with the unmeasured confounder(s) means a greater chance that one or more of the proxy variables are an adequate stand-in. The improvement is not proportional from one proxy to two proxies to three. The results show nearly a doubling of EBD between one and two proxy variables but a smaller relative gain at three (for example, overall EBD without effect modifiers in Table 1 is 0.172, 0.338, and 0.499), showing that the relative improvement attenuates as more proxy variables are added. However, there are drawbacks to adding more proxy variables. According to our results, adding more increases the variability in the comparative results, meaning there may be times when the proxy variables are not suitable stand-ins for the unmeasured confounder(s) and, in fact, make the estimation of θTY worse.

The real-world application of proxy variables requires careful consideration and domain expertise to ensure that the bias-reducing effects are captured. As outlined in our study, one could use a proxy variable after data collection, when control of confounding was incomplete, poor, or absent, or when new confounders are discovered. Unmeasured confounders are often noticed when our expectations of treatment effects are subverted. This is seen in our application: the clinicians’ expectations of the treatment effect differed from the initial results, and this subversion was caused by an unmeasured confounder. When this is noticed, a search can begin for a proxy that is a suitable stand-in for the unmeasured confounder. The hard part is determining what could be confounding the treatment and outcome and then finding a suitable substitute for the unmeasured confounder.

Our results show that a single proxy variable, or even up to 3 proxy variables, in a simple logistic regression improves the estimation of a true adjusted treatment effect, θTY. Table 1 shows from an overall view that any mSk better estimates θTY than mN. These results stand up to further scrutiny when various conditioning procedures are applied, as shown in Table 2, Figures 2–4, and their analogs in the supplemental section. The times that any mSk is worse than mN at estimating θTY are few and far between. In the results reviewed in the main manuscript, no aggregated result shows mN better estimating θTY, and in the Supplemental material, mN is only better at estimating θTY at extremely weak effect sizes, which may also be an artifact of the small number of scenarios in these cases. These occurrences of mN better estimating θTY do not happen often, have no real trend other than being at the extremes of effect sizes, and are not as large in magnitude as the advantage of mSk at large effect sizes. Lastly, we showed that even when a simulation is nearly collapsible (Tables 1 and 2, Figures 3D–F and 4D–F, as well as numerous Supplemental Figures), any mSk is still the preferred model in the aggregate. These results hold over increasingly complex simulation scenarios and with the addition of effect modifiers.

7 |. CONCLUSION

In our study, backed by extensive simulation studies, we found that inclusion of proxy variables in logistic regression models can improve (reduce bias in) the estimated treatment effects in retrospective studies when the proxies are correlated with unmeasured confounders. Using one, two, or three proxy variables improves the estimation of the treatment effect compared to using no proxy variables. Increasing the number of proxy variables in the model further reduced bias, with roughly equal absolute gains in EBD per added proxy, which makes sense intuitively because three proxy variables should have a higher probability of being correlated with the unmeasured confounder. We also constructed three measures that we called effect sizes, together with a measure of collapsibility, that summarize the strength of the relationship between the variables of interest in our model. The effect sizes measured the relationships between (1) proxy variables and unmeasured confounders, (2) unmeasured confounders and the treatment, and (3) unmeasured confounders and the outcome. We found that even at very small values of these effect sizes (essentially weak relationships), proxy variables can provide an improved estimation of the true treatment effect when compared to a model without the proxy variable. This means if we know that the proxy variable has a relationship with the treatment variables in our model, we should be able to use it in such a manner to de-bias the estimated treatment effect. Finally, our simulation studies included treatment effect modifiers and different numbers of unmeasured confounders. The results described above hold in these settings, but the results have more variance due to the increasing complexity of the constructed simulation studies. In summary, we found that inclusion of proxy variables in logistic regression for retrospective studies, within the constraints of our simulation study, can improve the estimated treatment effect toward the true treatment effect.

ACKNOWLEDGMENTS

This research was done using resources provided by the Open Science Grid, which is supported by the National Science Foundation award #2030508 (refs. 42, 43).

Footnotes

CONFLICT OF INTEREST STATEMENT

The authors declare no conflicts of interest.

SUPPORTING INFORMATION

Additional supporting information can be found online in the Supporting Information section at the end of this article.

DATA AVAILABILITY STATEMENT

The dataset used in the application is not available for data sharing.

REFERENCES

  • 1. Drake C, McQuarrie A. A note on the bias due to omitted confounders. Biometrika. 1995;82(3):633–641.
  • 2. Pearl J. Why There Is No Statistical Test for Confounding, Why Many Think There Is, and Why They Are Almost Right. Tech. rep. University of California; 1998.
  • 3. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. 3rd ed. LWW; 2012.
  • 4. VanderWeele TJ, Shpitser I. On the definition of a confounder. Ann Statist. 2013;41:196–220.
  • 5. Benedetto U, Head SJ, Angelini GD, Blackstone EH. Statistical primer: propensity score matching and its alternatives. Eur J Cardiothorac Surg. 2018;53:1112–1117.
  • 6. Greenland S, Robins JM, Pearl J. Confounding and collapsibility in causal inference. Statist Sci. 1999;14(1):29–46.
  • 7. Haneuse S, VanderWeele TJ, Arterburn D. Using the E-value to assess the potential effect of unmeasured confounding in observational studies. Feb. 2019.
  • 8. Haukoos JS, Lewis RJ. The propensity score. JAMA. 2015;314:1637–1638.
  • 9. Hernán MA, Robins JM. Instruments for causal inference: an epidemiologist’s dream? Epidemiology. 2006;17:360–372.
  • 10. Maciejewski ML, Brookhart MA. Using instrumental variables to address bias from unobserved confounders. JAMA. 2019;321(21):2124–2125.
  • 11. Rothman KJ. Epidemiology: An Introduction. Oxford University Press; 2002.
  • 12. Schneeweiss S. Sensitivity analysis and external adjustment for unmeasured confounders in epidemiologic database studies of therapeutics. Pharmacoepidemiol Drug Saf. 2006;15:291–303.
  • 13. Stürmer T, Wyss R, Glynn RJ, Brookhart MA. Propensity scores for confounder adjustment when assessing the effects of medical interventions using nonexperimental study designs. J Intern Med. 2014;275:570–580.
  • 14. Nørgaard M, Ehrenstein V, Vandenbroucke JP. Confounding in observational studies based on large health care databases: problems and potential solutions – a primer for the clinician. Clin Epidemiol. 2017;9:185–193.
  • 15. Bosco JL, Silliman RA, Thwin SS, et al. A most stubborn bias: no adjustment method fully resolves confounding by indication in observational studies. J Clin Epidemiol. 2010;63:64–74.
  • 16. Kahlert J, Gribsholt SB, Gammelager H, Dekkers OM, Luta G. Control of confounding in the analysis phase – an overview for clinicians. Clin Epidemiol. 2017;9:195–204.
  • 17. Wickens MR. A note on the use of proxy variables. Econometrica. 1972;40:759.
  • 18. Chiba Y. Sensitivity analysis of unmeasured confounding for the causal risk ratio by applying marginal structural models. Commun Stat–Theory Methods. 2010;39:65–76.
  • 19. Kasza J, Wolfe R, Schuster T. Assessing the impact of unmeasured confounding for binary outcomes using confounding functions. Int J Epidemiol. 2017;46:1303–1311.
  • 20. Gravlee CC. How race becomes biology: embodiment of social inequality. Am J Phys Anthropol. 2009;139:47–57.
  • 21. Howell EA. Reducing disparities in severe maternal morbidity and mortality. Clin Obstet Gynecol. 2018;61:387–399.
  • 22. Jackson JS, Knight KM, Rafferty JA. Race and unhealthy behaviors: chronic stress, the HPA axis, and physical and mental health disparities over the life course. Am J Public Health. 2010;100:933–939.
  • 23. Geronimus AT, Bound J. Use of census-based aggregate variables to proxy for socioeconomic group: evidence from national samples. Am J Epidemiol. 1998;148:475–486.
  • 24. Hyland A, Cummings KM, Lynn WR, Corle D, Giffen CA. Effect of proxy-reported smoking status on population estimates of smoking prevalence. Am J Epidemiol. 1997;145:746–751.
  • 25. McDonald JF. The use of proxy variables in housing price analysis. J Urban Econ. 1980;7:75–83.
  • 26. Montgomery MR, Gragnolati M, Burke KA, Paredes E. Measuring living standards with proxy variables. Demography. 2000;37:155–174.
  • 27. Root M. The use of race in medicine as a proxy for genetic differences. Philos Sci. 2003;70:1173–1183.
  • 28. Kim Y, Steiner PM. Gain scores revisited: a graphical models perspective. Sociol Methods Res. 2019;50(3):1353–1375.
  • 29. Steiner PM, Kim Y. The mechanics of omitted variable bias: bias amplification and cancellation of offsetting biases. J Causal Inference. 2016;4(2):1–22.
  • 30. Miao W, Geng Z, Tchetgen EJ. Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika. 2018;105:987–993.
  • 31. Lubotsky D, Wittenberg M. Interpretation of Regressions with Multiple Proxies. p. 29.
  • 32. Kuroki M. Measurement Bias and Effect Restoration in Causal Inference. p. 25.
  • 33. Fewell Z, Davey Smith G, Sterne JAC. The impact of residual and unmeasured confounding in epidemiologic studies: a simulation study. Am J Epidemiol. 2007;166:646–655.
  • 34. Schuster NA, Twisk JWR, ter Riet G, Heymans MW, Rijnhart JJM. Noncollapsibility and its role in quantifying confounding bias in logistic regression. BMC Med Res Methodol. 2021;21:136.
  • 35. Robinson LD, Jewell NP. Some surprising results about covariate adjustment in logistic regression models. Int Stat Rev/Revue Internationale de Stat. 1991;59:227.
  • 36. Pearl J. Causality. Cambridge University Press; 2009.
  • 37. Albert A, Anderson JA. On the existence of maximum likelihood estimates in logistic regression models. Biometrika. 1984;71(1):1–10.
  • 38. Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol. 1996;49:1373–1379.
  • 39. Mood C. Logistic regression: why we cannot do what we think we can do, and what we can do about it. Eur Sociol Rev. 2010;26:67–82.
  • 40. Guo J, Geng Z. Collapsibility of logistic regression coefficients. J R Stat Soc B Methodol. 1995;57:263–267.
  • 41. Cantis S, Oliveri AM. An overview of collapsibility. In: Banks D, McMorris FR, Arabie P, Gaul W, eds. Classification, Clustering, and Data Mining Applications. Springer Berlin Heidelberg; 2004:587–596.
  • 42. Pordes R, Petravick D, Kramer B, et al. The open science grid. J Phys Conf Ser. 2007;78:012057.
  • 43. Sfiligoi I, Bradley DC, Holzman B, Mhashilkar P, Padhi S, Würthwein F. The Pilot Way to Grid Resources Using glideinWMS. IEEE Computer Society; 2009:428–432.
