Author manuscript; available in PMC: 2015 Dec 1.
Published in final edited form as: Stat Methods Med Res. 2012 Feb 23;24(6):1003–1008. doi: 10.1177/0962280212437451

Consistent causal effect estimation under dual misspecification and implications for confounder selection procedures

Susan Gruber 1, Mark J van der Laan 2
PMCID: PMC4081493  NIHMSID: NIHMS603974  PMID: 22368176

Abstract

In a previously published article in this journal, Vansteelandt et al. [Stat Methods Med Res. Epub ahead of print 12 November 2010. DOI: 10.1177/0962280210387717] address confounder selection in the context of causal effect estimation in observational studies. They discuss several selection strategies and propose a procedure whose performance is guided by the quality of the exposure effect estimator. The authors note that when a particular linearity condition is met, consistent estimation of the target parameter can be achieved even under dual misspecification of models for the association of confounders with exposure and outcome, and they demonstrate the performance of their procedure relative to other estimators when this condition holds. Our earlier published work on collaborative targeted minimum loss based learning provides a general theoretical framework for effective confounder selection that explains the findings of Vansteelandt et al. It also underscores the appropriateness of their suggestions that a confounder selection procedure should be concerned with directly targeting the quality of the estimate and that desirable estimators produce valid confidence intervals and are robust to dual misspecification.

Keywords: collaborative double robustness, TMLE, collaborative targeted maximum likelihood estimation, propensity score, confounder selection, causal effect, causal inference, dual misspecification

1 Introduction

In a statistical analysis of observational data, a number of events, including differential selection into exposure groups, informative treatment switches, and drop-out over time, can bias causal effect estimates if not appropriately handled. Moreover, unless one is willing to rely on untestable modeling assumptions, there must be experimentation within strata defined by combinations of covariates causally related to both treatment and outcome (confounders) in order to adjust a causal effect estimate in a manner that reduces bias. A finite sample from an observational study may contain borderline-sufficient information for identifying the desired causal effect. An exposure effect estimate from such a dataset will tend to be highly variable and often remains biased. Confounder selection is thus an especially important issue in causal inference when there is sparsity in the data,1-6 and estimator performance depends on employing a principled strategy. A theme running through our previous work on targeted minimum loss based learning and targeted maximum likelihood estimation (TMLE) is that estimation procedures should be tailored to provide high-quality answers to questions of scientific interest. From a statistical perspective, this means making a bias-variance trade-off that is targeted to yield maximally efficient, unbiased estimation of a parameter of a statistical distribution that provides an answer to the scientific research question.7-11

2 Collaborative double robust estimation

Double robust (DR) estimators solve an estimating equation defined by a gradient of the pathwise derivative of the target parameter viewed as a mapping from the statistical model to the parameter space. In particular, if the estimating equation corresponds with the so-called canonical gradient (also called the efficient influence curve), then these DR estimators are also tailored to be asymptotically efficient. These estimators have been shown to be consistent for coarsened at random data structures when either the full data distribution (Q0) or censoring mechanism (g0) is consistently estimated.12-14 In a simple binary point treatment (exposure) setting where the data consist of n independent and identically distributed copies of data structure O = (W, A, Y) drawn from joint probability distribution P0 = (Q0, g0), g0 corresponds to the conditional distribution of treatment indicator A, given baseline covariate vector W (i.e. the conditional propensity score distribution), and Q0 factorizes into the conditional distribution of outcome Y, given A and W, and the distribution of W (Q0 = (Q0Y, Q0W)). The observed data can be viewed as a missing data structure O = (W, A, Y = Y(A)) on the full data X = (W, Y(0), Y(1)), and one might assume the randomization assumption, A ⊥ X | W, so that target parameters of Q0 can be interpreted as causal effects.

Consider the additive treatment effect (ATE) target parameter, defined non-parametrically as E0(Y(1) − Y(0)). This causal quantity is identified by the statistical mapping ψ(P0) = E0(E0(Y | A = 1, W) − E0(Y | A = 0, W)) defined on a non-parametric statistical model, which maps the probability distribution to a real number. An asymptotically linear estimator has an influence curve that describes the behavior of the estimator under perturbations in the empirical distribution of the data. Among all the influence curves generated by the class of regular asymptotically linear estimators, the one with the minimum variance is known as the efficient influence curve D*(P). The efficient influence curve can be calculated for any given target parameter mapping Ψ: M → ℝ and statistical model (i.e. class of probability distributions), M, at any P ∈ M. An estimator is efficient at P if and only if it is asymptotically linear with an influence curve equal to D*(P). Continuing our example, the efficient influence curve for the ATE parameter is given by

$$D^*(P) = \frac{2A - 1}{g(A \mid W)}\bigl(Y - \bar{Q}(A, W)\bigr) + \bar{Q}(1, W) - \bar{Q}(0, W) - \Psi(Q)$$

where $\bar{Q}(A, W) = E_P(Y \mid A, W)$ and g(1 | W) = P(A = 1 | W).
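As a minimal numerical sketch of this formula (not code from the original work), the Python snippet below evaluates the efficient influence curve observation by observation. The simulated data-generating process, the function name `eic_ate`, and the decision to plug in the true outcome regression, propensity score, and ATE are illustrative assumptions made only to show that D* has mean zero at the truth.

```python
import numpy as np

def eic_ate(y, a, q1, q0, g1, psi):
    """Evaluate the ATE efficient influence curve D* at each observation.

    q1, q0 : values of Qbar(1, W) and Qbar(0, W)
    g1     : values of g(1 | W) = P(A = 1 | W)
    psi    : value of the target parameter
    """
    q_a = np.where(a == 1, q1, q0)          # Qbar(A, W)
    g_a = np.where(a == 1, g1, 1.0 - g1)    # g(A | W)
    return (2 * a - 1) / g_a * (y - q_a) + q1 - q0 - psi

# Hypothetical simulation with a known truth (all numbers are assumptions).
rng = np.random.default_rng(0)
n = 50_000
w = rng.normal(size=n)
g1 = 1.0 / (1.0 + np.exp(-0.5 * w))         # true propensity score
a = rng.binomial(1, g1)
y = 2.0 * a + w + rng.normal(size=n)        # Qbar0(a, w) = 2a + w, so the ATE is 2

# Plug in the true Qbar0, g0 and psi0.
d_star = eic_ate(y, a, q1=2.0 + w, q0=w, g1=g1, psi=2.0)
mean_d = float(d_star.mean())               # sample analogue of P0 D*(P0), near zero
var_d = float(d_star.var(ddof=1))           # sample variance of D*(P0)
```

The sample variance `var_d` estimates the variance of D*(P0), which divided by n approximates the variance of an efficient estimator in this toy setting.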

All DR estimators based on D* solve $P_n D_n^* \equiv \frac{1}{n}\sum_{i=1}^{n} D_n^*(O_i) = 0$ in some way. For example, an estimating equation approach defines ψn as the solution of PnD*(Qn, gn, ψ) = 0 in ψ for given estimators Qn, gn (where the subscript n indicates an estimate of the truth). A targeted minimum loss based estimator Ψ(Qn) (TMLE) involves constructing an estimator Qn of Q0 that also satisfies the equation PnD*(Q, gn, ψ(Q)) = 0 in Q. TMLEs are substitution estimators Ψ(Qn) obtained by plugging in a targeted estimate Qn of Q0 in the parameter mapping. By construction of the TMLE Qn, the linear span of the score equations solved by TMLE includes the efficient influence curve estimating equation, which explains why the double robustness result also applies to TMLEs.
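The two constructions can be contrasted in a short Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: the outcome is continuous, the TMLE step uses a linear fluctuation with squared-error loss (one possible choice), the propensity score is taken as known, and the initial outcome regression deliberately ignores W. The function names and the simulated data are hypothetical.

```python
import numpy as np

def clever_covariate(a, g1):
    """Return H(A, W) = (2A - 1) / g(A | W) and its values at A = 1 and A = 0."""
    h1, h0 = 1.0 / g1, -1.0 / (1.0 - g1)
    return np.where(a == 1, h1, h0), h1, h0

def estimating_equation_ate(y, a, q1, q0, g1):
    """Solve P_n D*(Q_n, g_n, psi) = 0 in psi (closed form: D* is linear in psi)."""
    h_a, _, _ = clever_covariate(a, g1)
    q_a = np.where(a == 1, q1, q0)
    return float(np.mean(h_a * (y - q_a) + q1 - q0))

def tmle_ate_linear(y, a, q1, q0, g1):
    """TMLE with a linear fluctuation Qbar + eps * H fitted by least squares."""
    h_a, h1, h0 = clever_covariate(a, g1)
    q_a = np.where(a == 1, q1, q0)
    eps = np.sum(h_a * (y - q_a)) / np.sum(h_a ** 2)
    q1_star, q0_star = q1 + eps * h1, q0 + eps * h0   # targeted outcome regression
    psi = float(np.mean(q1_star - q0_star))           # substitution estimator
    q_a_star = np.where(a == 1, q1_star, q0_star)
    d_star = h_a * (y - q_a_star) + q1_star - q0_star - psi
    return psi, float(np.mean(d_star))

# Hypothetical simulation; the true ATE is 2.
rng = np.random.default_rng(0)
n = 50_000
w = rng.normal(size=n)
g1 = 1.0 / (1.0 + np.exp(-0.5 * w))
a = rng.binomial(1, g1)
y = 2.0 * a + w + rng.normal(size=n)

# Misspecified initial fit: arm-specific means that ignore W.
q1_init = np.full(n, y[a == 1].mean())
q0_init = np.full(n, y[a == 0].mean())

psi_ee = estimating_equation_ate(y, a, q1_init, q0_init, g1)
psi_tmle, mean_d_star = tmle_ate_linear(y, a, q1_init, q0_init, g1)
```

Both estimates rely on the correct g to compensate for the misspecified outcome regression. The TMLE additionally returns the empirical mean of D* at the targeted fit, which is zero up to floating-point error because the least-squares choice of `eps` solves the corresponding score equation.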

In this setting, a DR estimator is consistent if either the outcome regression $\bar{Q}_0(A, W) = E_0(Y \mid A, W)$ or the treatment assignment mechanism g0(1 | W) = P(A = 1 | W) is consistently estimated. Beyond this, we have previously shown that for estimators satisfying Pn D*(ψn, Qn, gn) = 0, given a limit Q of Qn, there exists a specified set of possible limits g of gn for which this estimator ψn remains consistent for ψ0.8,10 Let G(Q, P0) be the set of all conditional distributions satisfying this condition: that is, for each data distribution P0 and Q, we define G(Q, P0) = {g: P0 D*(Q, g, ψ0) = 0} as the candidate censoring/treatment mechanisms that would result in an unbiased estimating function for the target ψ0. At a minimum, this set of conditional distributions, G(Q, P0), contains g0. It also contains any additional conditional distribution that is sufficient for removing residual bias in the estimate. For example (Theorem 2, van der Laan and Gruber8), if the residual bias $\bar{Q}_0(A, W) - \bar{Q}(A, W) = f_0(A, W(Q))$ depends on W only through W(Q), and gs(Q) is a conditional distribution of A, given W(Q) (or more), then P0 D*(ψ0, Q, gs(Q)) = 0 and thus gs(Q) ∈ G(Q, P0).

In addition, for the ATE parameter E(Y(1) − Y(0)), a conditional distribution of A, given S(W) with

$$H_g(Q - Q_0) \equiv \frac{(Q - Q_0)(1, W)}{g(1 \mid W)} + \frac{(Q - Q_0)(0, W)}{g(0 \mid W)}$$

being a function of S(W), is also an element of G(Q, P0). In fact, our general result presented in a paper8 in 2010 and described below, applied to this example, shows that we just need that g solves the single score equation $P_0 H_g(Q - Q_0)(W)\,(A - g(1 \mid W)) = 0$, which would be solved by a logistic regression with offset logit(g) and clever covariate $H_g(Q - Q_0)$.

A DR estimator relying on (Qn, gn) is asymptotically unbiased when gn converges to an element in G(Q, P0), with Q being the limit of Qn, but the finite sample efficiency of the estimator of ψ0 varies with the choice of estimator gn. This fundamental collaborative double robustness of the efficient influence curve has important implications for nuisance parameter estimation procedures, which should be tailored for effective estimation of the parameter of scientific interest.

In previous papers inspired by this collaborative double robustness of the efficient influence curve, we presented an estimator within the targeted minimum loss based estimation (TMLE) framework that we refer to as a collaborative targeted maximum likelihood estimator (C-TMLE).8,10 We use the term collaborative to draw attention to the fact that the fits for the outcome regression and the propensity score work in tandem to achieve a full bias reduction for the target parameter. Specifically, candidate updates of a fit of the propensity score (e.g. corresponding with adding a variable to the model for the propensity score) are evaluated by the goodness of fit of the corresponding targeted maximum likelihood update of the current estimator of Q0. In this manner, gn is indeed constructed in response to residual bias Qn − Q0, so that gn is aimed to converge to an element in G(Q, P0).

In our previous work, we gave a general characterization of G(Q, P0) as follows.8 In the case where the efficient influence curve D*(P) can be represented as D*(ψ, Q, g), the efficient influence curve estimating equation for ψ at a (Q, g) is given by P0D*(ψ0, Q, g) = 0. Classical double robustness theory tells us that this equation is solved at the true ψ0 when Q = Q0 at some g ≠ g0, or when g = g0 at any Q. For consistency of an estimator ψn solving 0 = Pn D*(ψn, Qn, gn), we require that the limits (Q, g) of (Qn, gn) satisfy P0D*(ψ0, Q, g) = 0. Equivalently, we can write

$$P_0\bigl[D^*(\psi_0, Q, g) - D^*(\psi_0, Q_0, g)\bigr] = 0 \qquad (1)$$

Recall that the efficient influence curve can be decomposed as D*(ψ, Q, g) = DIPTW(ψ, g) − DCAR(Q, g) in terms of an inverse probability of treatment weighted (IPTW) estimating function and a score DCAR of g in the model that only assumes coarsening at random12 (see Theorem 1.3 in van der Laan and Robins14). Here, DCAR is a function of O with conditional mean zero, given full data X, and Q → DCAR(Q, g) is linear in Q. For many statistical models and target parameters, this representation of the efficient influence curve exists. Substituting this representation into equation (1) yields

$$P_0\bigl[D_{IPTW}(\psi_0, g) - D_{CAR}(Q, g) - \bigl(D_{IPTW}(\psi_0, g) - D_{CAR}(Q_0, g)\bigr)\bigr] = 0$$

or equivalently, P0[DCAR(Q0 − Q, g)] = 0. Thus, P0 DCAR(Q0 − Q, g) = 0 implies P0 D*(ψ0, Q, g) = 0, and thereby under regularity conditions, any estimator, ψn, that solves PnD*(ψn, Qn, gn) = 0 for (Qn, gn) converging to (Q, g) satisfying P0DCAR(Q − Q0, g) = 0 will be consistent for ψ0 (see Theorem 1 in van der Laan and Gruber8). In particular, we can define G(Q, P0) = {g: P0 DCAR(Q − Q0, g) = 0}, which in practice means that the estimator gn needs to approximately solve the score equation PnDCAR(Qn − Q0, gn) = 0 so that in the limit P0DCAR(Q − Q0, g) = 0. Note that in our additive effect example, we have that $D_{CAR}(Q - Q_0, g) = H_g(Q - Q_0)\,(A - g(1 \mid W))$.
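The last identity can be checked directly from the ATE efficient influence curve. The short derivation below is a worked sketch, not taken from the original text; it writes $\bar{Q}(a, W)$ for the outcome regression component of Q, reads off $D_{IPTW}$ and $D_{CAR}$ from the decomposition $D^* = D_{IPTW} - D_{CAR}$, and evaluates $D_{CAR}$ separately at A = 1 and A = 0.

```latex
\begin{align*}
D_{IPTW}(\psi, g) &= \frac{2A - 1}{g(A \mid W)}\, Y - \psi, \\
D_{CAR}(Q, g) &= \frac{2A - 1}{g(A \mid W)}\, \bar{Q}(A, W) - \bar{Q}(1, W) + \bar{Q}(0, W). \\[4pt]
\text{At } A = 1:\quad
D_{CAR}(Q, g) &= \frac{1 - g(1 \mid W)}{g(1 \mid W)}\, \bar{Q}(1, W) + \bar{Q}(0, W)
  = g(0 \mid W) \left[ \frac{\bar{Q}(1, W)}{g(1 \mid W)} + \frac{\bar{Q}(0, W)}{g(0 \mid W)} \right]. \\
\text{At } A = 0:\quad
D_{CAR}(Q, g) &= -\frac{1 - g(0 \mid W)}{g(0 \mid W)}\, \bar{Q}(0, W) - \bar{Q}(1, W)
  = -g(1 \mid W) \left[ \frac{\bar{Q}(1, W)}{g(1 \mid W)} + \frac{\bar{Q}(0, W)}{g(0 \mid W)} \right]. \\[4pt]
\text{Hence}\quad
D_{CAR}(Q, g) &= H_g(Q)\,\bigl(A - g(1 \mid W)\bigr),
\qquad
H_g(Q) = \frac{\bar{Q}(1, W)}{g(1 \mid W)} + \frac{\bar{Q}(0, W)}{g(0 \mid W)}.
\end{align*}
```

The final line uses that A − g(1 | W) equals g(0 | W) when A = 1 and −g(1 | W) when A = 0. Because the map from Q to $D_{CAR}(Q, g)$ is linear, replacing Q by Q − Q0 gives the stated expression.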

3 Causal effect estimation

If we now turn our attention to the estimation of causal effects, knowledge of the collaborative double robustness property helps us to understand why even for non-DR estimators (e.g. estimators that solve the efficient influence curve equation PnD*(Q, gn, ψn) = 0 at an intentionally misspecified Q, such as Q = 0), the likelihood for g is not the most relevant guide for selecting confounders into propensity score models. Predictors of treatment are not necessarily strong predictors of the outcome, and because the goal is to achieve an optimal bias/variance trade-off for the target parameter, the mean squared error for the target parameter should factor into confounder selection for any estimation procedure. Other researchers have reached a similar conclusion and suggest propensity score estimators are best evaluated with respect to their effect on estimation of the causal effect of interest, not by metrics such as likelihoods or classification rates.15-17 Vansteelandt et al.18 propose a stabilized propensity score estimator and report a limited set of conditions under which consistent estimation of a marginal treatment effect is possible even when Q and g are both misspecified. We recognize this as a specific instance of collaborative double robustness.

Section 3.2 of Vansteelandt et al.18 focuses on a space of semi-parametric models of the form Y = βA + θ(W) + ε, E(ε | A, W) = 0, and a target parameter β. If the conditional variance of Y, given A, W, only depends on W, the efficient influence curve for this parameter is, up to a standardizing constant, given by

$$D^*(\theta, g, \beta) = \bigl(A - g(W)\bigr)\bigl(Y - \beta A - \theta(W)\bigr),$$

where g(W) = E(A | W) (for formal derivations and theorems, see, e.g. Yu and van der Laan19 and Robins and Rotnitzky20). Given θ, we define a set G(θ, P0) of conditional distributions, g, that satisfy P0D*(θ, g, β0) = 0, where β0 is the true parameter value. Recall the decomposition $D^*(\theta, g, \beta) = D_{IPTW}(g, \beta) - D_{CAR}(\theta, g)$, where $D_{CAR}(\theta, g)$ is an element of the tangent space of A, given W. The equation for DCAR in this semi-parametric model is given by DCAR(θ, g) = (A − g(1 | W)) θ(W) (page 24 of the original paper7 on TMLE). Thus, G(θ, P0) contains all conditional distributions g such that P0 (A − g(W)) (θ − θ0)(W) = 0. If, for example, g is fitted with logistic regression with covariate θ − θ0 (with θ0 being the truth), then this remains an unbiased estimating function. In the special case described in Vansteelandt et al.18 that P0 is restricted such that θ(W) = γ(W), with γ(W) linear, and g is fitted with logistic linear regression using W, the estimated gn is asymptotically a member of G(θ, P0) for all θ linear in W, including θ = 0. This special case of collaborative double robustness corresponds exactly with the insight provided in Section 3.2 of Vansteelandt et al.18
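A small simulation can illustrate this special case. The sketch below is a hypothetical example, not the simulation design of Vansteelandt et al.: the true θ0 is linear in W, the true propensity score depends on W squared so that a logistic model linear in W is misspecified, and the working θ is set to zero. The helper `fit_logistic` is a plain Newton-Raphson fit written for this illustration.

```python
import numpy as np

def fit_logistic(x, a, n_iter=50):
    """Maximum-likelihood logistic regression via Newton-Raphson (x includes an intercept)."""
    coef = np.zeros(x.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-x @ coef))
        grad = x.T @ (a - p)                            # score vector
        hess = (x * (p * (1.0 - p))[:, None]).T @ x     # observed information
        step = np.linalg.solve(hess, grad)
        coef += step
        if np.max(np.abs(step)) < 1e-12:
            break
    return coef

# Hypothetical data-generating process (all numbers are assumptions).
rng = np.random.default_rng(1)
n = 50_000
w = rng.normal(size=n)
g0 = 1.0 / (1.0 + np.exp(-(0.4 * w + 0.6 * (w ** 2 - 1.0))))   # not logistic-linear in W
a = rng.binomial(1, g0)
beta0 = 1.5
y = beta0 * a + (1.0 + 2.0 * w) + rng.normal(size=n)            # theta0(W) = 1 + 2W

# Misspecified propensity model: logistic regression on (1, W) only.
x = np.column_stack([np.ones(n), w])
g_n = 1.0 / (1.0 + np.exp(-x @ fit_logistic(x, a)))

# Solve P_n (A - g_n(W)) (Y - beta * A - theta(W)) = 0 with the wrong theta = 0.
resid_a = a - g_n
beta_n = float(np.sum(resid_a * y) / np.sum(resid_a * a))

# The logistic score equations force P_n (A - g_n(W)) (1, W) = 0.
score = x.T @ resid_a / n

# For comparison only: unadjusted difference in arm means, confounded by W.
naive = float(y[a == 1].mean() - y[a == 0].mean())
```

The fitted score equations make the empirical mean of (A − gn(W)) times any linear function of W vanish. The term involving θ − θ0 therefore drops out of the estimating equation even though both working models are wrong, which is why `beta_n` stays close to `beta0` in this setup.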

A general C-TMLE introduced in our earlier papers has been implemented and applied to point treatment and longitudinal data.21,22 The targeted forward selection algorithm that chooses covariates for the propensity score model is guided by the theory outlined above and is fully presented in our above-referenced articles on C-TMLE. C-TMLE is DR, and inference can be based on bootstrap variance estimates as well as on the variance of the efficient influence curve. Results when C-TMLE is applied to data generated as described in Section 3.3 of Vansteelandt et al.18 were presented at the WNAR 2011 Spring Meeting23 and are described in a forthcoming paper.

Acknowledgments

Funding

This study was supported by the National Institutes of Health (grant no. 5R01AI74345-5) and the National Institutes of Health/National Heart, Lung, and Blood Institute (grant no. R01HL080644).

References

  • 1. Robins JM. A new approach to causal inference in mortality studies with sustained exposure periods – application to control of the healthy worker survivor effect. Math Model. 1986;7:1393–1512.
  • 2. Robins JM. Addendum to: A new approach to causal inference in mortality studies with a sustained exposure period – application to control of the healthy worker survivor effect [Math Model 1986; 7(9-12): 1393-1512; MR 87m:92078]. Comput Math Appl. 1987;14(9-12):923–945.
  • 3. Robins JM. Robust estimation in sequentially ignorable missing data and causal inference models. Proc Am Stat Assoc Sec Bayesian Stat Sci. 2000:6–10.
  • 4. Tan Z. Bounded, efficient, and doubly robust estimation with inverse weighting. Biometrika. 2008;94:1–22.
  • 5. Tan Z. Bounded, efficient and doubly robust estimation with inverse weighting. Biometrika. 2010;97(3):661–682.
  • 6. Petersen ML, Porter KE, Gruber S, et al. Diagnosing and responding to violations in the positivity assumption. Stat Meth Med Res. Published online 28 October 2010. doi: 10.1177/0962280210386207.
  • 7. van der Laan MJ, Rubin D. Targeted maximum likelihood learning. Int J Biostat. 2006;2(1): article 11. doi: 10.2202/1557-4679.1211.
  • 8. van der Laan MJ, Gruber S. Collaborative double robust penalized targeted maximum likelihood estimation. Int J Biostat. 2010;6(1): article 17. doi: 10.2202/1557-4679.1181.
  • 9. Gruber S, van der Laan MJ. A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. Int J Biostat. 2010;6(1): article 17. doi: 10.2202/1557-4679.1260.
  • 10. Gruber S, van der Laan MJ. An application of collaborative targeted maximum likelihood estimation in causal inference and genomics. Int J Biostat. 2010;6(1): article 18. doi: 10.2202/1557-4679.1182.
  • 11. van der Laan MJ, Rose S. Targeted learning: prediction and causal inference for observational and experimental data. Springer; New York: 2011.
  • 12. Robins JM, Rotnitzky A. Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell N, Dietz K, Farewell V, editors. AIDS epidemiology: methodological issues. Birkhäuser; Boston: 1992. pp. 297–331.
  • 13. Robins JM, Wang N. Inference for imputation estimators. Biometrika. 2000;87:113–124.
  • 14. van der Laan MJ, Robins JM. Unified methods for censored longitudinal data and causality. Springer; New York: 2003.
  • 15. Lee BK, Lessler J, Stuart EA. Improved propensity score weighting using machine learning. Stat Med. 2009;29:337–346. doi: 10.1002/sim.3782.
  • 16. Schneeweiss S, Rassen JA, Glynn RJ, et al. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 2009;20:512–522. doi: 10.1097/EDE.0b013e3181a663cc.
  • 17. Westreich D, Cole SR, Funk MJ, et al. The role of the c-statistic in variable selection for propensity scores. Pharmacoepidemiol Drug Saf. 2011;20:317–320. doi: 10.1002/pds.2074.
  • 18. Vansteelandt S, Bekaert M, Claeskens G. On model selection and model misspecification in causal inference. Stat Methods Med Res. 2012;21(1):7–30. doi: 10.1177/0962280210387717.
  • 19. Yu A, van der Laan MJ. Measuring treatment effects using semiparametric models. U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 136. Sep, 2003.
  • 20. Robins JM, Rotnitzky A. Bickel PJ, Kwon J, editors. Comment on Inference for semiparametric models: some questions and an answer. Stat Sinica. 2001;11:920–935.
  • 21. Porter KE, Gruber S, van der Laan MJ, et al. The relative performance of targeted maximum likelihood estimators. Int J Biostat. 2011;7(1): article 31. doi: 10.2202/1557-4679.1308.
  • 22. Stitelman OM, van der Laan MJ. Collaborative targeted maximum likelihood for time to event data. Int J Biostat. 2010;6(1): article 21. doi: 10.2202/1557-4679.1249.
  • 23. Gruber S, van der Laan MJ. Collaborative targeted maximum likelihood estimation; WNAR 2011, Spring Meeting; Anchorage, AK. 6-14 May, 2011.
