Biometrika. 2017 Oct 16;104(4):863–880. doi: 10.1093/biomet/asx053

Doubly robust nonparametric inference on the average treatment effect

D Benkeser 1, M Carone 2, M J Van Der Laan 3, P B Gilbert 4
PMCID: PMC5793673  PMID: 29430041

Summary

Doubly robust estimators are widely used to draw inference about the average effect of a treatment. Such estimators are consistent for the effect of interest if either one of two nuisance parameters is consistently estimated. However, if flexible, data-adaptive estimators of these nuisance parameters are used, double robustness does not readily extend to inference. We present a general theoretical study of the behaviour of doubly robust estimators of an average treatment effect when one of the nuisance parameters is inconsistently estimated. We contrast different methods for constructing such estimators and investigate the extent to which they may be modified to also allow doubly robust inference. We find that while targeted minimum loss-based estimation can be used to solve this problem very naturally, common alternative frameworks appear to be inappropriate for this purpose. We provide a theoretical study and a numerical evaluation of the alternatives considered. Our simulations highlight the need for and usefulness of these approaches in practice, while our theoretical developments have broad implications for the construction of estimators that permit doubly robust inference in other problems.

Keywords: Adaptive estimation, Doubly robust estimation, Efficient influence function, Targeted minimum loss-based estimation

1. Introduction

In recent years, doubly robust estimators have gained immense popularity in many fields, including causal inference. An estimator is said to be doubly robust if it is consistent for the target parameter of interest when any one of two nuisance parameters is consistently estimated. This property gives doubly robust estimators a natural appeal: any possible inconsistency in the estimation of one nuisance parameter may be mitigated by the consistent estimation of the other. In many problems, doubly robust estimators arise naturally in the pursuit of asymptotic efficiency. Locally efficient estimators often exhibit double robustness due to the form of the efficient influence function of the estimated parameter in the statistical model considered. For many parameters that arise in causal inference, the efficient influence function assumes a doubly robust form, which may explain why doubly robust estimators arise so frequently in that area. For example, under common causal identification assumptions, the statistical parameter identifying the mean counterfactual response under a point treatment yields a doubly robust efficient influence function in a nonparametric model (Robins et al., 1994). Thus, locally efficient estimators of this statistical target parameter are naturally doubly robust. General frameworks for constructing locally efficient estimators can therefore be utilized to generate estimators that are doubly robust.

While the conceptual appeal of doubly robust estimators is clear, questions remain about how they should be constructed in practice. It has long been noted that finite-dimensional models are generally too restrictive to permit consistent estimation of nuisance parameters (Bang & Robins, 2005), but much current work on double robustness involves parametric working models and maximum likelihood estimation. Kang & Schafer (2007) showed that doubly robust estimators can be poorly behaved if both nuisance parameters are inconsistently estimated, leading to recent proposals for estimators that minimize bias resulting from misspecification (Vermeulen & Vansteelandt, 2014, 2016). While providing an improvement over conventional techniques, these estimators rely upon consistent estimation of at least one nuisance parameter using a parametric model. An alternative approach is to employ flexible, data-adaptive estimation techniques for both nuisance parameters to reduce the risk of inconsistency (van der Laan & Rose, 2011).

A general study of the behaviour of doubly robust estimators under inconsistent estimation of a nuisance parameter is needed in order to understand how to perform robust inference. This topic has not received much attention, perhaps because the problems arising from misspecification are well understood when parametric models are used. For example, if nuisance parameters are estimated using maximum likelihood, the resulting estimator of the parameter of interest is asymptotically linear even if one of the nuisance parameter models has been misspecified. Although in this scenario the asymptotic variance of the estimator may not be easy to calculate explicitly, resampling techniques may be employed for inference. When the estimator is the solution of an estimating equation, robust sandwich-type variance estimators may also be available. In contrast, when nuisance parameters are estimated using data-adaptive approaches, the implications of inconsistently estimating one nuisance parameter are much more serious. Generally, the resulting estimator is irregular, exhibits large bias and has a convergence rate slower than root-n. As we show below, the implications for inference are dire: regardless of nominal level, the coverage of naïve two-sided confidence intervals tends to zero and the Type I error rate of two-sided hypothesis tests tends to unity as the sample size increases. This phenomenon cannot simply be avoided by better variance estimation, and it occurs even when the true variance of the estimator is known exactly. Nor is the nonparametric bootstrap a remedy: because of the use of data-adaptive procedures, it is not generally valid for inference.

In view of these challenges, investigators may believe it simpler to use parametric models. However, this is unappealing since both nuisance parameters, and hence also the parameter of interest, are then likely to be inconsistently estimated. The use of flexible data-adaptive techniques, such as ensemble machine learning, appears necessary if one is to have any reasonable expectation of consistency for any of the nuisance parameter estimators (van der Laan & Polley, 2007). However, because such methods are highly adaptive, research is needed into developing appropriate methods for doubly robust inference that use such tools.

A first theoretical study of the problem of doubly robust nonparametric inference is the work of van der Laan (2014), who focused on the counterfactual mean under a single time-point intervention and considered targeted minimum loss-based estimation. As the average treatment effect is the difference between two counterfactual means under different treatments, it too was directly addressed. The estimators proposed therein were shown to be doubly robust with respect to not only consistency but also asymptotic linearity. Furthermore, under regularity conditions, the analytic form of the influence function in van der Laan (2014) is known, paving the way for the construction of doubly robust confidence intervals and p-values. The proposed procedure is quite complex and has never been implemented. We are therefore motivated to study theoretically and numerically the following three questions about doubly robust nonparametric inference on an average treatment effect or, equivalently, on a counterfactual mean:

  1. How badly does inconsistent nuisance parameter estimation affect inference using data-adaptive estimators, and how do estimators that allow doubly robust inference perform?

  2. Can existing targeted minimum loss-based estimators be improved by new versions that require estimation of lower-dimensional nuisance parameters?

  3. Can simpler alternatives to targeted minimum loss-based estimation be used to construct estimators that are doubly robust for inference and also easier to implement?

As we shall demonstrate in § 5, the answer to question 1 is that naïvely constructed confidence intervals can have very poor coverage, whereas intervals constructed based on appropriate correction procedures have coverage near their nominal level. This suggests that the methods discussed in the present paper are truly needed and may be quite useful. For question 2, we show that it is possible to reduce the dimension of the nuisance parameters introduced in the quest for doubly robust inference, which can provide finite-sample benefits relative to van der Laan (2014). More importantly, this methodological advance is likely to be critical to any extension of the methods discussed here to the setting of treatments defined by multiple time-point interventions. For question 3, we show that the most popular alternative framework to targeted minimum loss-based estimation, the so-called one-step approach, may not be used to theoretically guarantee doubly robust inference unless one knows which nuisance parameter is consistently estimated.

2. Doubly robust estimation

2.1. Notation and background

Suppose that the observed data unit is Inline graphic, where Inline graphic is a vector of baseline covariates, Inline graphic is a binary treatment, Inline graphic is an outcome, and Inline graphic is the true data-generating distribution, known only to lie in some model Inline graphic. We take Inline graphic, where Inline graphic is a nonparametric model, although arbitrary restrictions on the distribution of Inline graphic given Inline graphic are allowed without affecting our derivations. We focus on the parameter Inline graphic defined as

graphic file with name M14.gif

where Inline graphic is the so-called outcome regression and Inline graphic is the distribution function of the covariate vector. The parameter value Inline graphic represents the treatment-specific, covariate-adjusted mean outcome implied by Inline graphic. Under additional causal assumptions, it can be interpreted as the mean counterfactual outcome under the treatment corresponding to Inline graphic (Rubin, 1974). Because all developments below immediately apply to the Inline graphic case, and therefore to the average treatment effect, without loss of generality we explicitly examine only the case where Inline graphic.

As the parameter of interest depends on Inline graphic only through Inline graphic, we will at times write Inline graphic in place of Inline graphic. We will denote Inline graphic by Inline graphic for short, where Inline graphic is the true outcome regression and Inline graphic the true distribution of Inline graphic. The propensity score, defined as Inline graphic, plays an important role, and throughout this paper the true propensity score Inline graphic is assumed to satisfy Inline graphic for some Inline graphic and all Inline graphic in the support of Inline graphic. Below, we make use of empirical process notation, letting Inline graphic denote Inline graphic for each Inline graphic and Inline graphic-integrable function Inline graphic. We also denote by Inline graphic the empirical distribution function based on Inline graphic, so Inline graphic is the average Inline graphic.

Recall that a regular estimator Inline graphic of Inline graphic is asymptotically linear if and only if it can be written as Inline graphic, where Inline graphic is a gradient of Inline graphic at Inline graphic relative to model Inline graphic. Here, for each Inline graphic, we denote by Inline graphic the Hilbert space of all real-valued functions with zero mean and finite variance defined on the support of Inline graphic endowed with the covariance inner product. The function Inline graphic is said to be a gradient of Inline graphic at Inline graphic relative to Inline graphic if

graphic file with name M60.gif

for any regular one-dimensional parametric submodel Inline graphic with score Inline graphic for Inline graphic at Inline graphic and such that Inline graphic. In the nonparametric case, where Inline graphic, an example of such a submodel is given by Inline graphic, where Inline graphic is a normalizing constant and Inline graphic for any Inline graphic. Under sampling from Inline graphic, a regular and asymptotically linear estimator is efficient if and only if its influence function is the efficient influence function Inline graphic. The efficient influence function is the unique gradient that lies in the tangent space Inline graphic of Inline graphic at Inline graphic, and it is a crucial ingredient in the construction of asymptotically efficient estimators. For an overview of efficiency theory, see Bickel et al. (1997).

The efficient influence function of Inline graphic at Inline graphic relative to Inline graphic is

graphic file with name M79.gif

with Inline graphic denoting a realized value of Inline graphic (van der Laan & Robins, 2003).

2.2. Doubly robust consistency

Suppose that Inline graphic and Inline graphic are estimators of Inline graphic and Inline graphic, respectively, and denote by Inline graphic and Inline graphic their respective in-probability limits. We write Inline graphic, where Inline graphic is the empirical distribution based on observations Inline graphic. A linearization of the parameter allows us to write

graphic file with name M91.gif

where Inline graphic. The first equality is expected because Inline graphic is strongly differentiable in the sense of Pfanzagl (1982). As shorthand, we will write Inline graphic and Inline graphic. Using this notation, we can express Inline graphic as

graphic file with name M97.gif (1)

This representation reduces the analysis of the plug-in estimator Inline graphic to the consideration of four terms. The first term, Inline graphic, is the average of Inline graphic independent draws of the random variable Inline graphic, which has mean zero if either Inline graphic or Inline graphic. This observation is a simple but fundamental fact underlying doubly robust estimation. Since Inline graphic is known to converge to Inline graphic, the statement Inline graphic is equivalent to Inline graphic. The second term, Inline graphic, is a first-order bias term that must be accounted for to allow Inline graphic to be asymptotically linear. The third term is an empirical process term that is often asymptotically negligible, that is, Inline graphic. This is true, for example, if Inline graphic falls in a Inline graphic-Donsker class with probability tending to 1 and Inline graphic converges to zero in probability. For a comprehensive reference on empirical processes, we refer readers to van der Vaart & Wellner (1996). Finally, the fourth term, Inline graphic, is the remainder from the linearization. By inspection, this term tends to zero at a rate determined by how fast the nuisance functions Inline graphic and Inline graphic are estimated.
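The double robustness underlying this decomposition can be checked with a short calculation. The display below is a sketch in notation that is assumed here, since the symbols in this copy of the article are not recoverable: \bar{Q}(w) stands for the outcome regression among the treated, g(w) for the propensity score, a subscript 0 marks the true functions, and \psi_0 is the true parameter value. It shows that the mean of the augmented inverse-probability-weighted function differs from \psi_0 by the expectation of a product of two errors, which vanishes when either nuisance function equals its true counterpart.

```latex
% Assumed notation (not taken from the source):
%   \bar{Q}(w) = E_P(Y \mid A = 1, W = w), \quad g(w) = \mathrm{pr}_P(A = 1 \mid W = w),
%   \psi_0 = E_{P_0}\{\bar{Q}_0(W)\}.
\begin{align*}
E_{P_0}\!\left[\frac{A}{g(W)}\{Y - \bar{Q}(W)\} + \bar{Q}(W)\right] - \psi_0
  &= E_{P_0}\!\left[\frac{g_0(W)}{g(W)}\{\bar{Q}_0(W) - \bar{Q}(W)\}
       + \bar{Q}(W) - \bar{Q}_0(W)\right] \\
  &= E_{P_0}\!\left[\frac{g(W) - g_0(W)}{g(W)}\,\{\bar{Q}(W) - \bar{Q}_0(W)\}\right].
\end{align*}
```

The first equality conditions on W and uses E_{P_0}(AY \mid W) = g_0(W)\bar{Q}_0(W); the second collects terms. The same product structure explains why the remainder from the linearization is second order when both nuisance functions are estimated consistently.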

To correct for the first-order bias term Inline graphic, two general strategies may be used: the one-step Newton–Raphson approach and targeted minimum loss-based estimation. The first, hereafter called the one-step approach, suggests performing an additive correction for the first-order bias, leading to the estimator

graphic file with name M118.gif

This approach appeared in Ibragimov & Has’minskii (1981) and Pfanzagl (1982), and is the infinite-dimensional extension of the one-step Newton–Raphson technique for efficient estimation in parametric models. In this paper, the efficient influence function is a linear function of the parameter of interest. As such, the one-step estimator equals the solution of the optimal estimating equation for this parameter and is therefore equivalent to the so-called augmented inverse probability of treatment estimator (Robins et al., 1994; van der Laan & Robins, 2003). Owing to their simple construction, one-step estimators are generally computationally convenient, though this convenience comes at a cost, as the one-step correction may produce estimates outside the parameter space, such as probability estimates either below zero or above one. Targeted minimum loss-based estimation, formally developed in van der Laan & Rubin (2006) and comprehensively discussed in van der Laan & Rose (2011), provides an algorithm to convert Inline graphic into a targeted estimator Inline graphic of Inline graphic such that Inline graphic, which may then be used to define the targeted plug-in estimator Inline graphic. The first update of Inline graphic in this recursive scheme consists of the minimizer of an empirical risk over a least favourable submodel through Inline graphic. The process is iterated using updated versions of Inline graphic until convergence to yield Inline graphic. In the problem considered here, convergence occurs in a single step. By virtue of being a plug-in estimator, Inline graphic may exhibit improved finite-sample behaviour relative to its one-step counterpart (Porter et al., 2011).
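The fluctuation step described above can be sketched for a binary outcome as follows. This is a minimal illustration, not the authors' implementation: the data-generating values, the function names `fit_epsilon` and `tmle`, and the choice of 1/g(W) as the covariate of the logistic fluctuation among the treated are assumptions made for the example. The initial outcome regression is deliberately crude, and the propensity score is taken as known.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def fit_epsilon(y, h, offset, max_iter=100, tol=1e-12):
    """Newton's method for a logistic regression of y on h with a fixed
    offset and no intercept; returns the coefficient of h."""
    eps = 0.0
    for _ in range(max_iter):
        p = [expit(o + eps * hi) for hi, o in zip(h, offset)]
        score = sum(hi * (yi - pi) for yi, hi, pi in zip(y, h, p))
        info = sum(hi * hi * pi * (1.0 - pi) for hi, pi in zip(h, p))
        step = score / info
        eps += step
        if abs(step) < tol:
            break
    return eps


def tmle(data, qbar_init, g):
    """Fluctuate qbar_init along logit(qbar) + eps / g using treated units,
    then return the targeted plug-in estimate and the updated regression."""
    treated = [(w, y) for w, a, y in data if a == 1]
    eps = fit_epsilon(
        y=[y for _, y in treated],
        h=[1.0 / g[w] for w, _ in treated],
        offset=[logit(qbar_init[w]) for w, _ in treated],
    )
    qbar_star = {w: expit(logit(q) + eps / g[w]) for w, q in qbar_init.items()}
    psi_star = sum(qbar_star[w] for w, _, _ in data) / len(data)
    return psi_star, qbar_star


# Hypothetical data-generating mechanism with binary W, A and Y.
rng = random.Random(2)
g0 = {0: 0.3, 1: 0.7}      # pr(A = 1 | W = w)
qbar0 = {0: 0.2, 1: 0.6}   # E(Y | A = 1, W = w); true parameter value is 0.4
data = []
for _ in range(20000):
    w = int(rng.random() < 0.5)
    a = int(rng.random() < g0[w])
    y = int(rng.random() < qbar0[w])
    data.append((w, a, y))

# Crude initial regression (constant 0.5) with the true propensity score.
psi_star, qbar_star = tmle(data, qbar_init={0: 0.5, 1: 0.5}, g=g0)
# Empirical mean of the weighted residual, solved to numerical precision.
score = sum(a / g0[w] * (y - qbar_star[w]) for w, a, y in data) / len(data)
```

In this sketch a single fluctuation suffices because the offset and covariate do not change after the update, which mirrors the single-step convergence noted in the text. The updated regression need not be consistent for the true outcome regression; the plug-in estimate is nevertheless close to 0.4 because the propensity score is correct.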

The large-sample properties of both Inline graphic and Inline graphic can be studied through (1). As above, suppose that the empirical process term Inline graphic is asymptotically negligible. If both Inline graphic and Inline graphic are estimated consistently, so that Inline graphic and Inline graphic, and if estimation of these nuisance functions is fast enough to ensure that the remainder term Inline graphic is asymptotically negligible, then Inline graphic is asymptotically linear with influence function equal to Inline graphic, and thus it is asymptotically efficient. The same can be said of Inline graphic if the same conditions on Inline graphic and Inline graphic hold with Inline graphic replaced by Inline graphic. If only one of Inline graphic or Inline graphic holds, it is impossible to guarantee the asymptotic negligibility of the remainder term, even if both Inline graphic and Inline graphic lie in parametric models. Nevertheless, provided Inline graphic or Inline graphic, under very mild conditions, the remainder term Inline graphic based on either Inline graphic or Inline graphic tends to zero in probability, and the empirical process term Inline graphic remains asymptotically negligible. Since Inline graphic has mean zero if either Inline graphic or Inline graphic and has finite variance, the central limit theorem implies that Inline graphic is Inline graphic, so both Inline graphic and Inline graphic are consistent estimators of Inline graphic. This is so-called double robustness: consistent estimation of Inline graphic if either of the nuisance functions Inline graphic or Inline graphic is consistently estimated.
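A small simulation can make the double robustness property concrete. The sketch below is illustrative only: the data-generating values are hypothetical, and fixed nuisance functions stand in for the in-probability limits of the nuisance estimators rather than being estimated from data. It evaluates the one-step (augmented inverse probability of treatment) estimator when the outcome regression is wrong, when the propensity score is wrong, and when both are wrong.

```python
import random

# Hypothetical data-generating mechanism (not from the paper): binary W, A, Y.
G0 = {0: 0.3, 1: 0.7}      # true propensity score pr(A = 1 | W = w)
QBAR0 = {0: 0.2, 1: 0.6}   # true outcome regression E(Y | A = 1, W = w)
PSI0 = 0.5 * QBAR0[0] + 0.5 * QBAR0[1]  # true parameter value, about 0.4


def simulate(n, seed):
    """Draw n observations (w, a, y) with W distributed as Bernoulli(0.5)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < G0[w] else 0
        # For brevity Y depends on W only, so E(Y | A = 1, W = w) = QBAR0[w].
        y = 1 if rng.random() < QBAR0[w] else 0
        data.append((w, a, y))
    return data


def one_step(data, qbar, g):
    """Plug-in estimate plus the empirical mean of the weighted residual."""
    n = len(data)
    plug_in = sum(qbar[w] for w, _, _ in data) / n
    correction = sum(a / g[w] * (y - qbar[w]) for w, a, y in data) / n
    return plug_in + correction


WRONG_Q = {0: 0.5, 1: 0.5}  # deliberately wrong outcome regression
WRONG_G = {0: 0.5, 1: 0.5}  # deliberately wrong propensity score

data = simulate(20000, seed=1)
est_q_wrong = one_step(data, WRONG_Q, G0)          # close to PSI0
est_g_wrong = one_step(data, QBAR0, WRONG_G)       # close to PSI0
est_both_wrong = one_step(data, WRONG_Q, WRONG_G)  # biased
```

With these values the estimator with both nuisance functions wrong converges to 2 E(AY) = 0.48 rather than 0.4, whereas either single misspecification leaves the limit unchanged.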

2.3. Doubly robust asymptotic linearity

Doubly robust asymptotic linearity is a more stringent requirement than doubly robust consistency. It is also arguably more important, since without it the construction of valid confidence intervals and tests may be very difficult, if not impossible. A careful study of Inline graphic is required to determine how doubly robust inference may be obtained.

When both the outcome regression and the propensity score are consistently estimated, Inline graphic is a second-order term consisting of the product of two differences, both tending to zero. Hence, provided Inline graphic and Inline graphic are estimated sufficiently fast, Inline graphic. This holds, for example, if both Inline graphic and Inline graphic are Inline graphic with respect to the Inline graphic-norm. If only one of the outcome regression or propensity score is consistently estimated, one of the differences in Inline graphic does not tend to zero. Consequently, Inline graphic is either of the same order as or tends to zero more slowly than Inline graphic. As such, Inline graphic at least contributes to the first-order behaviour of the estimator and may determine it entirely. If a correctly specified parametric model is used to estimate either Inline graphic or Inline graphic, the delta method generally implies that Inline graphic is asymptotically linear. In this case, both Inline graphic and Inline graphic are also asymptotically linear, though their influence function is the sum of two terms: Inline graphic and the influence function of Inline graphic as an estimator of zero. Correctly specifying a parametric model is seldom feasible in realistic settings, however, so it may be preferable to use adaptive estimators of the nuisance functions. In such a situation, whenever one nuisance parameter is inconsistently estimated, the remainder term Inline graphic tends to zero slowly and thus dominates the first-order behaviour of the estimator of Inline graphic. Therefore, in this case the one-step and targeted minimum loss-based estimators are doubly robust with respect to consistency but not with respect to asymptotic linearity.

To illustrate the deleterious effect of the remainder on inference in these situations, suppose that we construct an asymptotic level-Inline graphic two-sided Wald-type confidence interval for Inline graphic based on a consistent estimator Inline graphic, say with true standard error Inline graphic. Suppose further that Inline graphic tends to Inline graphic in probability, which often occurs when the bias of Inline graphic tends to zero more slowly than its standard error. Denoting by Inline graphic the Inline graphic quantile of the standard normal distribution, the coverage of the oracle Wald-type interval Inline graphic is such that

graphic file with name M197.gif

as Inline graphic. This remains true if we replace Inline graphic by any random sequence that converges to zero at the same rate or faster. If asymptotic linearity were preserved under inconsistent estimation of one of the nuisance parameters, Inline graphic would instead tend to a standard normal variate. The oracle Wald-type intervals, and in fact any Wald-type interval using a consistent standard error estimator, would have correct asymptotic coverage. This argument therefore stresses the benefit of constructing estimators that are doubly robust with respect to asymptotic linearity for the sake of obtaining confidence intervals and tests whose validity is doubly robust.
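The collapse of coverage can be illustrated numerically. The sketch below makes simplifying assumptions that are not in the source: the standardized estimator is treated as exactly normal with unit variance and mean equal to the ratio of bias to standard error, the bias is taken to decay like n^(-1/3), and the standard error like n^(-1/2). The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def wald_coverage(bias_over_se, alpha=0.05):
    """Coverage of estimate +/- z * se when (estimate - truth) / se is
    normal with mean bias_over_se and unit variance."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return STD_NORMAL.cdf(z - bias_over_se) - STD_NORMAL.cdf(-z - bias_over_se)


def coverage_by_sample_size(sizes, bias_rate=1.0 / 3.0, sigma=1.0):
    """Assumed rates: bias = n ** -bias_rate, standard error = sigma / sqrt(n)."""
    out = {}
    for n in sizes:
        ratio = n ** (-bias_rate) / (sigma / n ** 0.5)
        out[n] = wald_coverage(ratio)
    return out


coverage = coverage_by_sample_size([10 ** k for k in range(2, 9)])
```

Under these assumed rates the ratio of bias to standard error grows like n^(1/6), so the computed coverage falls from its nominal level towards zero as n increases, even though the standard error is treated as known.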

3. Doubly robust inference via targeted minimum loss-based estimation

3.1. Existing construction

van der Laan (2014) proposed a targeted minimum loss-based estimator of Inline graphic that is not only locally efficient and doubly robust for consistency but also doubly robust for asymptotic linearity. To do so he showed that, with some additional bias correction, Inline graphic may be rendered asymptotically linear with a well-described influence function. This requires approximating the first-order behaviour of Inline graphic using additional nuisance parameters, which consist of a bivariate and a univariate regression, defined respectively as

graphic file with name M204.gif (2)
graphic file with name M205.gif (3)

Expression (2) is the bivariate regression of the true propensity of treatment on an outcome regression and a propensity score, whereas (3) is the univariate regression of the residual from an outcome regression on a propensity score in the treated subgroup. The subscript Inline graphic emphasizes that these nuisance parameters are of reduced dimension relative to Inline graphic and Inline graphic. This dimension reduction is critical since it essentially guarantees that consistent estimators of these parameters can be constructed in practice. For example, we may be unable to consistently estimate Inline graphic, which is a function of the entire vector of potential confounders; however, we can guarantee consistent estimation of Inline graphic, which involves only a bivariate summary of Inline graphic.

Key to the study of how these additional nuisance parameters may be used to approximate the first-order behaviour of the remainder term Inline graphic are the functions

graphic file with name M213.gif

In Appendix A, we show that the remainder term Inline graphic can be represented as

graphic file with name M215.gif (4)

where Inline graphic and Inline graphic are bias terms, and Inline graphic, Inline graphic and Inline graphic are second-order terms. The specific form of these terms is provided in Appendix A, where we also discuss sufficient conditions for ensuring their asymptotic negligibility. Just as the bias term in (1) had to be accounted for to achieve doubly robust consistency, so too must the bias terms in (4) to achieve doubly robust asymptotic linearity.

The iterative targeted minimum loss-based estimation algorithm proposed in Theorem 3 of van der Laan (2014) produces estimators Inline graphic, Inline graphic, Inline graphic and Inline graphic from initial estimators Inline graphic and Inline graphic in such a manner as to ensure that

graphic file with name M227.gif

In view of (1) and (4), Inline graphic is asymptotically linear with influence function

graphic file with name M229.gif

provided either Inline graphic or Inline graphic is estimated consistently. If Inline graphic and Inline graphic are estimated consistently, then Inline graphic and Inline graphic. In this case, Inline graphic and Inline graphic are identically zero, which establishes local efficiency of Inline graphic.

3.2. Novel reduced-dimension construction

We now show how to theoretically improve upon the proposal of van der Laan (2014) through an alternative formulation of a targeted minimum loss-based estimator. We derive an approximation of the remainder that relies on alternative nuisance parameters of lower dimension than those presented in § 3.1. This makes the estimation problem involved more feasible and may also pave the way to a generalization of this work to settings with longitudinal treatments.

In Appendix B, we argue that the remainder term in (4) can be represented using Inline graphic as previously defined and the additional nuisance parameters

graphic file with name M240.gif

These are univariate regressions, in contrast to the bivariate regression Inline graphic described in § 3.1 and used in van der Laan (2014). Nonparametric estimators of these univariate parameters often achieve better rates than those proposed therein. Use of this alternative representation yields estimators guaranteed to be asymptotically linear under weaker conditions than previously required.

Here, we state the main result describing the behaviour of the estimator implied by this parameterization of the remainder term; we also discuss an iterative implementation of the estimator. Redefining

graphic file with name M242.gif

we have the following result.

Theorem 1.

Suppose that either Inline graphic or Inline graphic. Provided that the nuisance estimators Inline graphic satisfy

Theorem 1. (5)

and that the second-order terms Inline graphic and Inline graphic described in Appendix B are Inline graphic, the plug-in estimator Inline graphic is asymptotically linear with influence function Inline graphic. Furthermore, Inline graphic converges in law to a zero-mean normal random variable with variance estimated consistently by

Theorem 1.

An algorithm to construct nuisance estimators that satisfy (5) can be devised based on targeted minimum loss-based estimation. Without loss of generality, suppose Inline graphic. Defining Inline graphic, Inline graphic and Inline graphic, we implement the following recursive procedure.

Step 1.

Construct initial estimates Inline graphic and Inline graphic of Inline graphic and Inline graphic, and set Inline graphic.

Step 2.

Define Inline graphic and Inline graphic; fit a logistic regression with outcome Inline graphic, covariate Inline graphic and offset Inline graphic using only data points with Inline graphic; set Inline graphic as the estimated coefficient of Inline graphic; and define

Step 2.

Step 3.

Construct estimates Inline graphic and Inline graphic of Inline graphic and Inline graphic based on Inline graphic and Inline graphic.

Step 4.

Define Inline graphic and Inline graphic; fit a logistic regression with outcome Inline graphic, covariate Inline graphic and offset Inline graphic using only data points with Inline graphic; set Inline graphic as the estimated coefficient of Inline graphic; and define

Step 4.

Step 5.

Construct estimates Inline graphic of Inline graphic based on Inline graphic and Inline graphic.

Step 6.

Define Inline graphic and Inline graphic; fit a logistic regression with outcome Inline graphic, covariate Inline graphic and offset Inline graphic; set Inline graphic as the estimated coefficient of Inline graphic; and define

Step 6.

Step 7.

Set Inline graphic and iterate Steps 1–6 until Inline graphic is large enough that Inline graphic.

Step 8.

Set Inline graphic, Inline graphic, Inline graphic, Inline graphic and Inline graphic.

Theorem 1 implies that doubly robust confidence intervals and tests can readily be crafted. For example, the Wald construction Inline graphic is a doubly robust Inline graphic asymptotic confidence interval for Inline graphic, and prescribing rejection of the null hypothesis Inline graphic versus the alternative Inline graphic only when Inline graphic constitutes a doubly robust hypothesis test with asymptotic level Inline graphic. Thus, valid statistical inference is preserved when one nuisance parameter is inconsistently estimated, in contrast to conventional doubly robust estimation, wherein only consistency is preserved.
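Given a point estimate and the estimated influence function evaluated at each observation, the Wald interval and test described above reduce to a few lines. The following is a generic sketch: the function name `wald_inference` and its arguments are hypothetical, and the variance estimate is taken to be the empirical mean of the squared influence function values, which is an assumption about the form of the estimator in Theorem 1.

```python
from statistics import NormalDist


def wald_inference(estimate, influence_values, null_value=0.0, alpha=0.05):
    """Wald-type confidence interval and two-sided test based on
    estimated influence function values."""
    n = len(influence_values)
    # Empirical mean of the squared influence function values.
    variance = sum(v * v for v in influence_values) / n
    se = (variance / n) ** 0.5
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    stat = (estimate - null_value) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return {
        "ci": (estimate - z * se, estimate + z * se),
        "p_value": p_value,
        "reject": abs(stat) > z,
    }


# Toy input: 100 influence function values with mean zero and unit variance.
result = wald_inference(0.5, [1.0, -1.0] * 50, null_value=0.0)
```

The validity of such an interval rests entirely on the influence function values being those of an asymptotically linear estimator; applied to an estimator that is only doubly robust for consistency, the same code would produce intervals with the coverage problems discussed in § 2.3.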

4. Doubly robust inference via one-step estimation

In § 2 we discussed the construction of doubly robust, locally efficient estimators of Inline graphic and argued that both the one-step approach and targeted minimum loss-based estimation can be used for bias correction. For the sake of constructing asymptotically efficient estimators, these two strategies are generally considered to be alternatives, with targeted minimum loss-based estimation possibly delivering better finite-sample behaviour and the one-step approach often being simpler to implement. In § 3, we outlined how the bias-correction feature of the targeted minimum loss-based estimation framework could be leveraged to achieve doubly robust asymptotic linearity and thus perform doubly robust inference. Since targeted minimum loss-based estimation can be more complicated to implement than the one-step correction procedure, it is natural to wonder whether a one-step approach could also account for the additional bias terms that result from the inconsistent estimation of either Inline graphic or Inline graphic. If so, the resulting one-step estimator could provide a computationally convenient alternative to the algorithm described in § 3.2.

We recall that the doubly robust, locally efficient one-step estimator Inline graphic was constructed by adding the bias term Inline graphic to the initial plug-in estimator Inline graphic. By extension, it seems sensible to investigate whether the estimator

graphic file with name M320.gif (6)

is doubly robust with respect to asymptotic linearity. By (1) and (4), the estimator

graphic file with name M321.gif

is asymptotically linear with influence function Inline graphic, just as the targeted minimum loss-based estimators in the previous section. Therefore, Inline graphic is locally efficient and doubly robust with respect to asymptotic linearity. Nevertheless, to compute Inline graphic, the analyst must know which nuisance parameter, if any, is inconsistently estimated. Such information will generally not be available, except in the case of a randomized trial, where Inline graphic may be known to the experimenter. To study the properties of Inline graphic, we note that

graphic file with name M327.gif (7)

The one-step estimator Inline graphic corrects for both inconsistent estimation of Inline graphic and inconsistent estimation of Inline graphic. However, for consistent estimation of Inline graphic, no more than one of these two nuisance parameters can in reality be inconsistently estimated. In this case there is necessarily overcorrection in Inline graphic, and it is not a priori obvious whether this may be detrimental to the behaviour of the estimator. Elucidating this issue requires a careful study of each of the two bias-correction terms in settings where they are not in fact needed. For example, the term Inline graphic, used to correct for bias resulting from inconsistent estimation of Inline graphic, must be analysed in the scenario where it is actually Inline graphic that has been inconsistently estimated.

In Appendix C, we show that under reasonable conditions, we can represent the first summand on the right-hand side of (7) as

graphic file with name M336.gif

when Inline graphic, and we can represent the second summand as

graphic file with name M338.gif

when Inline graphic. This implies that the first-order behaviour of Inline graphic is driven by these terms. In particular, the rate of convergence of Inline graphic is determined by that of the estimators Inline graphic, Inline graphic and Inline graphic of the reduced-dimension nuisance parameters. In practice, these terms are unlikely to be estimable at the parametric rate, since this would require the correct specification of a parametric model for a complex object, and adaptive techniques are likely to be used. Because the rates achieved by these techniques are generally slower than n^{-1/2}, the estimator Inline graphic fails to be root-n consistent and hence is not doubly robust with respect to asymptotic linearity. Using an argument identical to that in § 2.3, we can show that Wald-type confidence intervals for Inline graphic have similarly poor asymptotic coverage. Therefore, at least theoretically, the one-step construction does not appear helpful.
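The effect of slowly vanishing bias on Wald-type intervals can be illustrated numerically. The following is a minimal sketch under a stylized normal approximation, not the argument of § 2.3 itself: it assumes the estimator is distributed as N(psi0 + b_n, sigma^2/n) with b_n = c n^{-r}, where the names `bias_const` (c), `bias_rate` (r) and `sigma` are hypothetical and introduced only for illustration.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def wald_coverage(n, bias_rate, bias_const=1.0, sigma=1.0, z=1.959963984540054):
    """Coverage of estimate +/- z * sigma / sqrt(n) for the true value.

    Assumes (hypothetically) that the estimator is normal with standard
    deviation sigma / sqrt(n) and bias b_n = bias_const * n ** (-bias_rate).
    """
    # Standardized bias: sqrt(n) * b_n / sigma. It vanishes only if
    # bias_rate > 1/2 and diverges if bias_rate < 1/2.
    delta = math.sqrt(n) * bias_const * n ** (-bias_rate) / sigma
    # P(|Z + delta| <= z) for a standard normal Z.
    return normal_cdf(z - delta) - normal_cdf(-z - delta)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        # Bias vanishing faster than n^{-1/2} versus more slowly.
        print(n, wald_coverage(n, bias_rate=1.0), wald_coverage(n, bias_rate=0.25))
```

Under these assumptions, coverage approaches the nominal level when the bias vanishes faster than n^{-1/2} and deteriorates with increasing sample size when it vanishes more slowly.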

This warrants further discussion. The above theory shows that the targeted minimum loss-based estimation framework is able to simultaneously account for inconsistent estimation of either the outcome regression or the propensity score, without the analyst needing to know which of the two corrections is required. In contrast, the one-step approach requires knowledge of which nuisance parameter is possibly inconsistently estimated to achieve doubly robust asymptotic linearity. Without this knowledge, asymptotic linearity cannot be guaranteed in a doubly robust fashion. This is relevant for future work to derive procedures for doubly robust inference on other parameters admitting doubly or multiply robust estimators.

5. Simulation study

5.1. Data-generating mechanism and crossvalidation set-up

In each of the simulations below, the baseline covariate vector Inline graphic had independent components with Inline graphic distributed according to a uniform distribution over the interval Inline graphic and Inline graphic distributed as a binary random variable with success probability Inline graphic. The conditional probability of Inline graphic given Inline graphic was Inline graphic. The outcome Inline graphic was a binary random variable whose conditional probability of occurrence given Inline graphic is Inline graphic.

We implemented and compared the performance of the following six estimators:

  • (i) the standard, uncorrected targeted minimum loss-based estimator;

  • (ii) the corrected targeted minimum loss-based estimator using bivariate nuisance regression, as proposed by van der Laan (2014);

  • (iii) the corrected targeted minimum loss-based estimator using univariate nuisance regressions, as introduced in Theorem 1;

  • (iv) the standard, uncorrected one-step estimator, commonly referred to as the augmented inverse probability weighted estimator;

  • (v) the corrected one-step estimator using bivariate nuisance regression;

  • (vi) the corrected one-step estimator using univariate nuisance regressions, as given in (6).
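For concreteness, estimator (iv) can be sketched as follows. This is an illustrative implementation only, assuming the target is the treatment-specific mean E{E(Y | A = 1, W)}, which is not restated in this section; the function name `aipw_treated_mean` and its arguments are hypothetical and are not taken from the drtmle package. The nuisance estimates are supplied as fitted values.

```python
import numpy as np


def aipw_treated_mean(y, a, q1_hat, g_hat, z=1.96):
    """Uncorrected one-step (AIPW) estimate with a Wald-type interval.

    y      : observed outcomes
    a      : binary treatment indicators (1 = treated)
    q1_hat : fitted outcome regression E(Y | A = 1, W_i)   (assumed given)
    g_hat  : fitted propensity score P(A = 1 | W_i)        (assumed given)
    """
    y, a, q1_hat, g_hat = (np.asarray(v, dtype=float) for v in (y, a, q1_hat, g_hat))
    # Initial plug-in estimator: average of the fitted outcome regression.
    plug_in = q1_hat.mean()
    # Estimated efficient influence function evaluated at each observation.
    eif = a / g_hat * (y - q1_hat) + q1_hat - plug_in
    # One-step correction: add the empirical mean of the influence function.
    estimate = plug_in + eif.mean()
    # Influence function-based standard error.
    se = eif.std(ddof=1) / np.sqrt(len(y))
    return estimate, se, (estimate - z * se, estimate + z * se)
```

The corrected estimators (v) and (vi) add further bias-correction terms built from the reduced-dimension nuisance regressions; these are not reproduced in this sketch.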

We evaluated these estimators in the following three scenarios.

  • I. Only the outcome regression is consistently estimated.

  • II. Only the propensity score is consistently estimated.

  • III. Both the outcome regression and the propensity score are consistently estimated.

The consistently estimated nuisance parameter, either the outcome regression or the propensity score, was estimated using a bivariate Nadaraya–Watson estimator with bandwidth selected by crossvalidation, while the inconsistently estimated nuisance parameter was estimated using a logistic regression model with main terms only, thus ignoring the interaction between Inline graphic and Inline graphic. The reduced-dimension nuisance parameters required for the additional correction procedure involved in computing estimators (ii), (iii), (v) and (vi) were estimated using the Nadaraya–Watson estimator with bandwidth selected by leave-one-out crossvalidation (Racine & Li, 2004). For scenarios I and II, we considered sample sizes Inline graphic. For scenario III, theory dictates that all estimators considered should be asymptotically equivalent, so we used only the sample sizes Inline graphic. For each sample size we generated 5000 datasets. We summarized estimator performance in terms of bias, bias times n^{1/2}, coverage of 95% confidence intervals, and accuracy of the standard error estimator, which we assessed by comparing the Monte Carlo variance of the estimator and the average estimated variance across simulations. We examined the following hypotheses based on our theoretical work.
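The kernel regression step can be sketched as below. This is a simplified illustration rather than the procedure of Racine & Li (2004): it assumes a single continuous regressor and a Gaussian kernel, and selects the bandwidth from a user-supplied grid by leave-one-out squared error. The function names and the candidate grid are hypothetical.

```python
import numpy as np


def nw_predict(x_train, y_train, x_eval, h):
    """Nadaraya-Watson regression estimate at x_eval (Gaussian kernel, bandwidth h)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_eval = np.asarray(x_eval, dtype=float)
    # Kernel weights: rows index evaluation points, columns index training points.
    k = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / h) ** 2)
    return k @ y_train / k.sum(axis=1)


def loo_cv_bandwidth(x, y, candidates):
    """Choose the candidate bandwidth minimizing leave-one-out squared error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_h, best_err = None, np.inf
    for h in candidates:
        k = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
        # Zero the diagonal so each observation is predicted without itself.
        np.fill_diagonal(k, 0.0)
        denom = k.sum(axis=1)
        if np.any(denom <= 0.0):
            continue  # bandwidth too small: some point has no neighbours with weight
        err = np.mean((y - k @ y / denom) ** 2)
        if err < best_err:
            best_h, best_err = h, err
    if best_h is None:
        raise ValueError("no candidate bandwidth gave a valid leave-one-out fit")
    return best_h
```

A bivariate version with one continuous and one binary regressor, as in the simulation, would replace the Gaussian kernel by a product kernel over both components.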

  • (A) In scenarios I and II, the bias of estimators (i), (iv), (v) and (vi) tends to zero more slowly than n^{-1/2}, whereas that of estimators (ii) and (iii) does so faster than n^{-1/2}.

  • (B) In scenarios I and II, the slow convergence of the bias for estimators (i), (iv), (v) and (vi) adversely affects the nominal confidence interval coverage, whereas the corrected targeted minimum loss-based estimators (ii) and (iii) have asymptotically nominal coverage.

  • (C) In scenarios I and II, influence function-based variance estimators are accurate for the corrected estimators (ii), (iii), (v) and (vi), but not for the uncorrected estimators (i) and (iv).

  • (D) In scenario III, all estimators exhibit approximately the same behaviour.
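The performance measures used to assess these hypotheses can be computed from the Monte Carlo output as in the following sketch. The function name and the dictionary keys are hypothetical, and reporting the standard error accuracy as the ratio of average estimated variance to Monte Carlo variance is an assumption; the text states only that the two quantities were compared.

```python
import numpy as np


def summarize_simulation(estimates, std_errors, truth, n, z=1.96):
    """Summarize one estimator over repeated simulated datasets of size n.

    estimates  : point estimates, one per simulated dataset
    std_errors : estimated standard errors, one per simulated dataset
    truth      : true value of the target parameter
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    bias = est.mean() - truth
    # A Wald-type interval covers the truth when |estimate - truth| <= z * se.
    covered = np.abs(est - truth) <= z * se
    return {
        "bias": bias,
        "root_n_bias": np.sqrt(n) * bias,
        "coverage": covered.mean(),
        # Ratio above 1 suggests a conservative variance estimator, below 1 a liberal one.
        "variance_ratio": np.mean(se ** 2) / est.var(ddof=1),
    }
```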

5.2. Results

We first focus on the results for scenario I, in which only the outcome regression is consistently estimated. In Fig. 1(a), the bias of each estimator tends to zero, illustrating the conventional double robustness of these estimators. However, Fig. 1(b) supports hypothesis (A), as the bias of the uncorrected estimators tends to zero at a slower rate than n^{-1/2}, while the bias of the corrected targeted minimum loss-based estimators tends to zero faster. The bias of the corrected one-step estimators is reduced relative to the uncorrected estimators, and for the sample sizes considered we do not yet see the expected divergence in the bias when inflated by n^{1/2}. Figure 1(c) indicates strong support for hypothesis (B): the coverage of intervals based on the uncorrected estimators is not only far from the nominal level but also U-shaped, suggesting worsening coverage in larger samples, as is expected based on our arguments in § 2.3. Intervals based on the corrected estimators have approximately nominal coverage in moderate and large samples. Figure 1(d) indicates that the variance estimators for the uncorrected estimators are liberal, which contributes to the poor coverage of intervals. The variance estimators for the corrected estimators are approximately accurate in larger samples, thus supporting hypothesis (C).

Fig. 1.

Simulation results when only the outcome regression is consistently estimated, with the following performance measures plotted against the sample size n: (a) bias; (b) bias times n^{1/2}; (c) coverage of 95% confidence intervals; (d) accuracy of the standard error estimator. Squares represent estimators that do not account for inconsistent nuisance parameter estimation, circles represent estimators using the bivariate correction of van der Laan (2014), and triangles represent estimators using the proposed univariate corrections.

Figure 2 summarizes the results for scenario II. In Fig. 2(b) we again see that the bias of the uncorrected estimators tends to zero more slowly than n^{-1/2}; this is also true of the corrected one-step estimators. In contrast, the bias of the corrected targeted minimum loss-based estimators appears to converge to zero faster than n^{-1/2}. Figure 2(c) partially supports hypothesis (B): intervals based on the uncorrected estimators achieve near-nominal coverage for moderate and large sample sizes in spite of the large bias. However, we again find the expected U-shape, with an eventual downturn in coverage as the sample size increases further. Intervals based on the corrected targeted minimum loss-based estimators using bivariate nuisance regression have improved coverage throughout, and intervals based on either corrected targeted minimum loss-based estimator have nearly nominal coverage in larger samples. Intervals based on the corrected one-step estimator with the univariate correction achieve approximately nominal coverage, while those based on the one-step estimator with bivariate correction do not, probably due to larger bias. Figure 2(d) shows that the variance estimator for the uncorrected one-step estimator is conservative, whereas that based on the uncorrected targeted minimum loss-based estimator is approximately accurate. The variance estimators based on the corrected one-step or targeted minimum loss-based estimators appear to be valid throughout, although that based on the latter using univariate nuisance regressions appears to be liberal in smaller samples.

Fig. 2.

Simulation results when only the propensity score is consistently estimated: (a) bias, (b) bias times n^{1/2}, (c) coverage of 95% confidence intervals, and (d) accuracy of the standard error estimator plotted against the sample size n. Squares represent estimators that do not account for inconsistent nuisance parameter estimation, circles represent estimators using the bivariate correction of van der Laan (2014), and triangles represent estimators using the proposed univariate corrections.

Finally, Fig. 3 supports hypothesis (D): when both the propensity score and the outcome regression are consistently estimated, all of the estimators perform approximately equally well, even in smaller samples. This suggests that implementing the correction procedures needed to achieve doubly robust asymptotic linearity and inference does not come at the cost of estimator performance in situations where the additional corrections are not needed.

Fig. 3.

Simulation results when both the outcome regression and the propensity score are consistently estimated: (a) bias, (b) bias times n^{1/2}, (c) coverage of 95% confidence intervals, and (d) accuracy of the standard error estimator plotted against the sample size n. Squares represent estimators that do not account for inconsistent nuisance parameter estimation, circles represent estimators using the bivariate correction of van der Laan (2014), and triangles represent estimators using the proposed univariate corrections.

The run-times for the targeted minimum loss-based estimators (ii) and (iii) are on average two to three times as long as those of their one-step counterparts. The run-time required for the bivariate correction of estimators (ii) and (v) is on average one and a half times as long as that of the univariate correction for estimators (iii) and (vi). Two additional simulation studies in the Supplementary Material compare the proposed estimators with existing doubly robust estimators. The results demonstrate the advantage of estimators that allow for flexible nuisance parameter estimation in complex data-generating mechanisms. Results from a simulation study including a greater number of covariates, reported in the Supplementary Material, suggest a potential reduction in finite-sample bias for the proposed univariate-corrected targeted minimum loss-based estimator relative to the bivariate-corrected estimator of van der Laan (2014).

6. Concluding remarks

An interesting finding of this work is that it is possible to theoretically guarantee doubly robust inference under mild conditions using targeted minimum loss-based estimation, though not with the one-step approach. While we have found the corrected one-step estimators to perform relatively well in simulations, we cannot expect this in all scenarios, since theory suggests otherwise. Therefore, the preferred approach to providing doubly robust inference, in spite of its computational complexity, may be targeted minimum loss-based estimation. These methods are implemented in the R (R Development Core Team, 2017) package drtmle, available from the Comprehensive R Archive Network (Benkeser, 2017).

It may be fruitful to incorporate universally least favourable parametric submodels (van der Laan, 2016) into the targeted minimum loss-based estimation algorithms used here. Such submodels facilitate the construction of estimators using minimal data-fitting in the bias-reduction step of the algorithm. Rather than requiring iterations to perform bias reduction, use of these submodels would yield algorithms that converge in only a single step. This could yield improved performance in finite samples, particularly in extensions of this work to more complex parameters, such as average treatment effects defined by longitudinal interventions.

Acknowledgement

The authors thank the associate editor and reviewer for helpful suggestions. This work was a portion of the PhD thesis of Benkeser, supervised by Carone and Gilbert and funded by the Bill and Melinda Gates Foundation. Carone, van der Laan and Gilbert were partially funded by the National Institute of Allergy and Infectious Diseases. Carone was also funded by the University of Washington Department of Biostatistics Career Development Fund.

Supplementary material

Supplementary material available at Biometrika online includes results from an additional simulation study.

Appendix A

First-order expansion of the remainder

We derive equation (4) and sufficient conditions under which it holds. Note that

graphic file with name M378.gif

where we define the second-order remainder term by Inline graphic. Adding and subtracting Inline graphic and Inline graphic and simplifying, we find that

graphic file with name M382.gif

where the second-order remainder term is Inline graphic. Assuming that either Inline graphic or Inline graphic, we can write

graphic file with name M386.gif (A1)
graphic file with name M387.gif (A2)

Examining (A1), and with some abuse of notation, we observe that

graphic file with name M388.gif

where we have set Inline graphic. Then we may write

graphic file with name M390.gif

where we define

graphic file with name M391.gif

If, for example, each of Inline graphic, Inline graphic and Inline graphic is Inline graphic in the Inline graphic-norm, then Inline graphic and Inline graphic are Inline graphic. Furthermore, if Inline graphic falls in a Inline graphic-Donsker class with probability tending to one and Inline graphic, then Inline graphic.

Examining (A2) and again allowing some abuse of notation, we find that

graphic file with name M404.gif

where the additional nuisance parameter is defined as Inline graphic. Some algebraic manipulation allows us to write

graphic file with name M406.gif

where we define

graphic file with name M407.gif

If, for example, each of Inline graphic, Inline graphic and Inline graphic is Inline graphic in the Inline graphic-norm, then Inline graphic, Inline graphic and Inline graphic are Inline graphic. Furthermore, if Inline graphic falls in a Inline graphic-Donsker class with probability tending to one and Inline graphic, then Inline graphic. Thus, we have shown that (4) holds with Inline graphic, Inline graphic and Inline graphic.

Appendix B

Derivation of the reduced-dimension remainder representation

We proceed similarly to the derivations above, but now, with regard to (A2) and with some abuse of notation, we have

graphic file with name M424.gif

where we define the nuisance terms Inline graphic and Inline graphic. We can then write

graphic file with name M427.gif

where

graphic file with name M428.gif

If, for example, each of Inline graphic, Inline graphic and Inline graphic is Inline graphic in the Inline graphic-norm, it generally follows that Inline graphic and Inline graphic are Inline graphic. Furthermore, if Inline graphic falls in a Inline graphic-Donsker class with probability tending to one and Inline graphic, it also follows that Inline graphic. This implies that (4) holds with Inline graphic, Inline graphic and Inline graphic when the alternative reduced-dimension parameterization of the remainder is used.

Appendix C

Behaviour of unnecessary correction terms

We first examine the behaviour of Inline graphic when Inline graphic. We note that Inline graphic, where we define the empirical process term Inline graphic, which can reasonably be assumed to be Inline graphic. The second equality is a consequence of the fact that Inline graphic for all Inline graphic, because Inline graphic. With some abuse of notation, we can write

graphic file with name M452.gif

where

graphic file with name M453.gif

which is Inline graphic under the rate conditions outlined in Appendices A and B.

We next examine the behaviour of Inline graphic when Inline graphic. We have

graphic file with name M457.gif

where we define the empirical process term Inline graphic, which can reasonably be assumed to be Inline graphic. As above, the second equality is a consequence of the fact that Inline graphic for all Inline graphic, because Inline graphic when Inline graphic. With some abuse of notation we can write

graphic file with name M464.gif

where

graphic file with name M465.gif

which is Inline graphic under the rate conditions above.

References

  1. Bang H. & Robins J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962–73.
  2. Benkeser D. (2017). drtmle: Doubly-Robust Nonparametric Estimation and Inference. R package version 1.0.0.
  3. Bickel P., Klaassen C., Ritov Y. & Wellner J. (1997). Efficient and Adaptive Estimation for Semiparametric Models. Baltimore, Maryland: Johns Hopkins University Press.
  4. Ibragimov I. A. & Has’minskii R. Z. (1981). Statistical Estimation: Asymptotic Theory, vol. 2. New York: Springer.
  5. Kang J. D. & Schafer J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statist. Sci. 22, 523–39.
  6. Pfanzagl J. (1982). Contributions to a General Asymptotic Statistical Theory. New York: Springer.
  7. Porter K. E., Gruber S., van der Laan M. J. & Sekhon J. S. (2011). The relative performance of targeted maximum likelihood estimators. Int. J. Biostatist. 7, 1–34.
  8. R Development Core Team (2017). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0. http://www.R-project.org.
  9. Racine J. & Li Q. (2004). Nonparametric estimation of regression functions with both categorical and continuous data. J. Economet. 119, 99–130.
  10. Robins J. M., Rotnitzky A. & Zhao L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. J. Am. Statist. Assoc. 89, 846–66.
  11. Rubin D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 66, 688–701.
  12. van der Laan M. J. (2014). Targeted estimation of nuisance parameters to obtain valid statistical inference. Int. J. Biostatist. 10, 29–57.
  13. van der Laan M. J. (2016). One-step targeted minimum loss-based estimation based on universal least favorable one-dimensional submodels. Int. J. Biostatist. 12, 351–78.
  14. van der Laan M. J. & Polley E. C. (2007). Super learner. Statist. Appl. Genet. Molec. Biol. 6, 1–23.
  15. van der Laan M. J. & Robins J. M. (2003). Unified Methods for Censored Longitudinal Data and Causality. New York: Springer.
  16. van der Laan M. J. & Rose S. (2011). Targeted Learning: Causal Inference for Observational and Experimental Data. New York: Springer.
  17. van der Laan M. J. & Rubin D. (2006). Targeted maximum likelihood learning. Int. J. Biostatist. 2, 1–40.
  18. van der Vaart A. W. & Wellner J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. New York: Springer.
  19. Vermeulen K. & Vansteelandt S. (2014). Bias-reduced doubly robust estimation. J. Am. Statist. Assoc. 110, 1024–36.
  20. Vermeulen K. & Vansteelandt S. (2016). Data-adaptive bias-reduced doubly robust estimation. Int. J. Biostatist. 12, 253–82.

Articles from Biometrika are provided here courtesy of Oxford University Press
