Author manuscript; available in PMC: 2021 May 5.
Published in final edited form as: Eur J Epidemiol. 2019 Jul 25;35(1):87–88. doi: 10.1007/s10654-019-00538-x

The validity of propensity score analysis using complete cases with partially observed covariates

Byeong Yeob Choi 1, Jonathan Gelfond 1
PMCID: PMC8098813  NIHMSID: NIHMS1641749  PMID: 31346889

In their simulation study, Choi et al. [1] compared different methods for handling missing covariate data in the context of propensity score analysis. The study addressed an important question: how do propensity score matching and weighting perform, when combined with different methods for handling missing data, under homogeneous and heterogeneous treatment effects? It concluded that if the treatment effect is homogeneous, then a complete case analysis yields a valid average treatment effect estimate under missing at random (MAR) and even under missing not at random (MNAR) mechanisms. We are concerned that the authors drew this conclusion from a limited range of simulation scenarios for a homogeneous treatment effect, which considered only a null treatment effect and missingness mechanisms independent of the outcome.

The study of Leyrat et al. [2] provides evidence that complete case analysis may be biased for a homogeneous treatment effect if the missingness of covariate values depends on the outcome under an MAR mechanism. Leyrat et al. simulated a binary outcome from a logistic regression model with a binary treatment and three covariates, two of which were subject to missingness with a probability that was a function of the outcome and the fully observed covariate. They evaluated propensity score weighting analysis using complete cases and after multiple imputation in terms of the marginal relative risk (RR). The logistic regression model for the outcome had no interaction terms between the treatment and any covariate, and a zero regression coefficient for the treatment was used to generate a null treatment effect, RR = 1, which was also homogeneous. When RR = 1 and the missingness was independent of the outcome, complete case analysis appeared to be unbiased. However, when RR = 1 and the missingness depended on the outcome, complete case analysis appeared to be biased. Leyrat et al. also considered three other causal estimands under scenarios of a homogeneous treatment effect and showed that treatment effect estimates from complete case analysis were biased whenever the missingness depended on the outcome.

Note that the homogeneity of a treatment effect depends on the type of outcome and on the causal estimand. A homogeneous treatment effect holds for a continuous outcome when the conditional mean model has no interactions between the treatment and the covariates. With a binary outcome, a homogeneous treatment effect may hold for one causal estimand but not for other causal estimands, unless there is no treatment effect [3]. Thus, the assumption of a homogeneous treatment effect is likely to be violated if multiple causal estimands with a binary outcome are considered.
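To make the dependence on the estimand concrete, the following Python sketch evaluates three effect measures within two covariate strata under a logistic outcome model with no treatment-covariate interaction. The coefficient values and the function name `effect_measures` are hypothetical choices for illustration and are not taken from any of the cited studies.

```python
import math


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def effect_measures(beta_z, beta_0=-1.0, beta_x=1.0):
    """Stratum-specific OR, RR and RD for a binary covariate x under
    logit P(Y = 1 | z, x) = beta_0 + beta_x * x + beta_z * z (no interaction).
    All coefficient values are hypothetical."""
    rows = []
    for x in (0, 1):
        p0 = expit(beta_0 + beta_x * x)           # risk if untreated
        p1 = expit(beta_0 + beta_x * x + beta_z)  # risk if treated
        rows.append({
            "x": x,
            "OR": (p1 / (1 - p1)) / (p0 / (1 - p0)),  # odds ratio
            "RR": p1 / p0,                            # relative risk
            "RD": p1 - p0,                            # risk difference
        })
    return rows


for row in effect_measures(beta_z=0.7):
    print(row)
```

With a non-zero treatment coefficient, the odds ratio is identical in both strata while the relative risk and the risk difference differ between them. Setting `beta_z=0.0` makes all three measures homogeneous, which matches the exception for a null treatment effect.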

Choi et al. used the potential outcome framework to describe when complete case analysis yields unbiased treatment effect estimates. Let $Y_1$ and $Y_0$ denote the potential outcomes that would have been observed if an individual had been treated and not treated, respectively, and let $Z$ denote a binary treatment ($Z = 1$ if treated, $Z = 0$ otherwise). Then the average treatment effect (ATE) is defined as $E[Y_1 - Y_0]$. For simplicity, as in Choi et al., we consider two covariates, $X = (X_1, X_2)$; only $X_2$ has missing values, and $R$ denotes the corresponding missingness indicator ($R = 1$ if $X_2$ is missing, $R = 0$ otherwise). If the treatment effect is the same for all individuals, then the ATEs for the populations with and without missing values are identical to the ATE for the whole population:

$$E[Y_1 - Y_0 \mid R = 0] = E[Y_1 - Y_0 \mid R = 1] = E[Y_1 - Y_0]. \tag{1}$$

Condition (1) holds because, under a homogeneous treatment effect, $Y_1 - Y_0$ is the same for every subject, so its mean in any group of subjects equals the ATE for the whole population. Conversely, if the treatment effect is heterogeneous, then complete case analysis with any propensity score method, such as inverse probability of treatment weighting (IPTW) or matching, will generally be invalid, because the ATE for the $R = 0$ group need not equal the ATE for the whole population. Choi et al. used condition (1) to address the validity of complete case analysis. However, condition (1) is only a necessary condition for complete case analysis to be valid for the ATE; it is not sufficient. Importantly, a propensity score analysis using complete cases can yield biased estimates for the ATE even when condition (1) holds.
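The last point can be illustrated with a small simulation. The sketch below does not reproduce the design of Choi et al. or Leyrat et al.; every distribution, coefficient, and function name in it is a hypothetical choice. It generates a continuous outcome with a homogeneous treatment effect of 2, so that condition (1) holds, makes the second covariate missing with a probability that depends on the outcome and on the fully observed covariate, and compares IPTW on the full data with IPTW on complete cases. For simplicity, the propensity score is estimated by the treated fraction within each of the four covariate strata rather than by logistic regression.

```python
import numpy as np


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-t))


def iptw_ate(y, z, x1, x2):
    """Normalized IPTW estimate of the ATE with a saturated propensity score."""
    stratum = 2 * x1 + x2  # four covariate strata, coded 0..3
    # Propensity score = observed treated fraction within each stratum.
    treated = np.bincount(stratum, weights=z, minlength=4)
    total = np.bincount(stratum, minlength=4)
    ps = (treated / total)[stratum]
    w1 = z / ps              # weights for treated subjects
    w0 = (1 - z) / (1 - ps)  # weights for untreated subjects
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)


def simulate(n=200_000, seed=0):
    """Hypothetical data-generating process; true ATE = 2 for every subject."""
    rng = np.random.default_rng(seed)
    x1 = rng.integers(0, 2, n)  # fully observed binary covariate
    x2 = rng.integers(0, 2, n)  # binary covariate subject to missingness
    z = rng.binomial(1, expit(-0.5 + 0.8 * x1 + 0.8 * x2))
    y = 1.0 + x1 + x2 + 2.0 * z + rng.normal(size=n)
    # MAR mechanism: missingness of x2 depends only on observed y and x1.
    r = rng.binomial(1, expit(-3.0 + y + 0.5 * x1))
    keep = r == 0  # complete cases
    return {
        "full": iptw_ate(y, z, x1, x2),
        "cc": iptw_ate(y[keep], z[keep], x1[keep], x2[keep]),
    }


result = simulate()
print(f"full-data IPTW:     {result['full']:.3f}")
print(f"complete-case IPTW: {result['cc']:.3f}")
```

Under these assumed settings, the full-data estimate should be close to 2, whereas the complete-case estimate is noticeably smaller. Subjects with large outcomes are preferentially dropped, and this selection shifts the mean outcome of the treated group more than that of the untreated group.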

The fundamental assumption of propensity score analysis is that treatment assignment should be statistically independent of the potential outcomes conditional on the observed covariates. This is called the assumption of strongly ignorable treatment assignment [4], and can be written as

$$(Y_1, Y_0) \perp\!\!\!\perp Z \mid X, \tag{2}$$

where $\perp\!\!\!\perp$ denotes statistical independence. The IPTW estimator is known to be consistent for the ATE under condition (2). Therefore, the IPTW estimator using complete cases will be consistent if condition (2) holds for the $R = 0$ group, i.e.,

$$(Y_1, Y_0) \perp\!\!\!\perp Z \mid X, R = 0. \tag{3}$$
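For concreteness, one common form of the IPTW estimator restricted to complete cases can be written as follows. This is a sketch rather than a formula taken from Choi et al.: the complete-case propensity score $e_0(X)$, its estimate $\hat{e}_0$, and the number of complete cases $n_0$ are symbols assumed here for illustration.

```latex
\hat{\Delta}_{\mathrm{CC}}
  = \frac{1}{n_0} \sum_{i:\, R_i = 0} \frac{Z_i Y_i}{\hat{e}_0(X_i)}
  - \frac{1}{n_0} \sum_{i:\, R_i = 0} \frac{(1 - Z_i)\, Y_i}{1 - \hat{e}_0(X_i)},
\qquad
e_0(X) = P(Z = 1 \mid X, R = 0).
```

Under the complete-case ignorability condition, together with the usual positivity requirement $0 < e_0(X) < 1$, this estimator targets $E[Y_1 - Y_0 \mid R = 0]$, which coincides with the ATE for the whole population when the treatment effect is homogeneous.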

A question arises as to what additional conditions are needed for condition (3) to follow from condition (2). Clearly, if the data are missing completely at random (MCAR), then condition (3) follows from condition (2), because condition (2) then holds regardless of the value of $R$. Under MAR, simulation studies [1, 2, 5] have shown that complete case analysis is unbiased if the missingness is independent of the outcome and there is no treatment effect for any subject.
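The MCAR step can be spelled out in a few lines. The sketch below formalizes MCAR as $R$ being independent of $(Y_1, Y_0, Z, X)$, which is an assumption about how the term is used here, and writes $f$ for a generic conditional density or probability function.

```latex
\begin{align*}
f(y_1, y_0, z \mid X, R = 0)
  &= f(y_1, y_0, z \mid X)
     && \text{(MCAR: } R \perp\!\!\!\perp (Y_1, Y_0, Z, X) \text{)} \\
  &= f(y_1, y_0 \mid X)\, f(z \mid X)
     && \text{(strongly ignorable treatment assignment)} \\
  &= f(y_1, y_0 \mid X, R = 0)\, f(z \mid X, R = 0)
     && \text{(MCAR again)}
\end{align*}
```

The joint distribution of the potential outcomes and the treatment therefore factorizes within the complete cases, which is exactly the statement that ignorability holds given $X$ and $R = 0$.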

Acknowledgment

This research was supported in part by the National Cancer Institute for the Mays Cancer Center (P30CA054174) at the UT Health Science Center at San Antonio.

Footnotes

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Choi J, Dekkers OM, le Cessie S. A comparison of different methods to handle missing data in the context of propensity score analysis. Eur J Epidemiol. 2019;34(1):23–36. doi: 10.1007/s10654-018-0447-z.
  • 2. Leyrat C, Seaman SR, White IR, Douglas I, Smeeth L, Kim J, et al. Propensity score analysis with partially observed covariates: How should multiple imputation be used? Stat Methods Med Res. 2019;28(1):3–19. doi: 10.1177/0962280217713032.
  • 3. Austin PC. The performance of different propensity score methods for estimating marginal odds ratios. Stat Med. 2007;26(16):3078–94. doi: 10.1002/sim.2781.
  • 4. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55. doi: 10.1093/biomet/70.1.41.
  • 5. White IR, Carlin JB. Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values. Stat Med. 2010;29(28):2920–31. doi: 10.1002/sim.3944.
