Skip to main content
The International Journal of Biostatistics logoLink to The International Journal of Biostatistics
. 2010 Aug 1;6(1):26. doi: 10.2202/1557-4679.1260

A Targeted Maximum Likelihood Estimator of a Causal Effect on a Bounded Continuous Outcome

Susan Gruber 1, Mark J van der Laan 2
PMCID: PMC3126669  PMID: 21731529

Abstract

Targeted maximum likelihood estimation of a parameter of a data generating distribution, known to be an element of a semi-parametric model, involves constructing a parametric model through an initial density estimator with parameter ɛ representing an amount of fluctuation of the initial density estimator, where the score of this fluctuation model at ɛ = 0 equals the efficient influence curve/canonical gradient. The latter constraint can be satisfied by many parametric fluctuation models since it represents only a local constraint of its behavior at zero fluctuation. However, it is very important that the fluctuations stay within the semi-parametric model for the observed data distribution, even if the parameter can be defined on fluctuations that fall outside the assumed observed data model. In particular, in the context of sparse data, by which we mean situations where the Fisher information is low, a violation of this property can heavily affect the performance of the estimator. This paper presents a fluctuation approach that guarantees the fluctuated density estimator remains inside the bounds of the data model. We demonstrate this in the context of estimation of a causal effect of a binary treatment on a continuous outcome that is bounded. It results in a targeted maximum likelihood estimator that inherently respects known bounds, and consequently is more robust in sparse data situations than the targeted MLE using a naive fluctuation model.

When an estimation procedure incorporates weights, observations having large weights relative to the rest heavily influence the point estimate and inflate the variance. Truncating these weights is a common approach to reducing the variance, but it can also introduce bias into the estimate. We present an alternative targeted maximum likelihood estimation (TMLE) approach that dampens the effect of these heavily weighted observations. As a substitution estimator, TMLE respects the global constraints of the observed data model. For example, when outcomes are binary, a fluctuation of an initial density estimate on the logit scale constrains predicted probabilities to be between 0 and 1. This inherent enforcement of bounds has been extended to continuous outcomes. Simulation study results indicate that this approach is on a par with, and many times superior to, fluctuating on the linear scale, and in particular is more robust when there is sparsity in the data.

Keywords: targeted maximum likelihood estimation, TMLE, causal effect

Erratum

On September 20, 2010 the following acknowledgment was added for the article:

“This work was supported by NIH grant R01AI074345-04.”

1. Introduction

Targeted maximum likelihood estimation (TMLE) yields semi-parametric efficient substitution estimators of parameters in semi-parametric models (van der Laan and Rubin, 2006). In particular, it can be applied to estimating the statistical counterpart of a causal parameter. In this article a new targeted maximum likelihood estimator for estimating a causal effect of a binary treatment on a continuous outcome is introduced. This estimator is more robust than a previously presented TMLE procedure when there is sparsity in the data that decreases the identifiability of the parameter of interest.

Sparsity is defined as low information in the dataset for the purpose of learning the target parameter. Formally, the Fisher information, I, is defined as sample size n divided by the variance of the efficient influence curve: I = n/var(D* (O)), where D* (O) is the efficient influence curve of the target parameter at the true data generating distribution. The reciprocal of the variance of the efficient influence curve can be viewed as the information one observation contains for the purpose of learning the target parameter. Since the variance of the efficient influence curve divided by n times the variance of an asymptotically efficient estimator converges to 1 when the sample size converges to infinity, one can also think of the information I as the reciprocal of the variance of an efficient estimator of the target parameter. Thus, sparsity with respect to a particular target parameter corresponds with small sample size relative to the variance of the efficient influence curve for that target parameter.

Section 2 of the paper provides background on the application of TMLE methodology in the context of sparsity, and its power relative to other semi-parametric efficient estimators by being a substitution estimator respecting global constraints of the semi-parametric model. Even though an estimator can be asymptotically efficient without utilizing global constraints, the global constraints are instrumental in the context of sparsity with respect to the target parameter, motivating the need for semi-parametric efficient substitution estimators, and for a careful choice of fluctuation function for the targeted MLE step that fully respects these global constraints. A rigorous demonstration of the proposed targeted MLE of the causal effect of a binary treatment on a bounded continuous outcome follows, and it is contrasted to a targeted MLE that makes use a fluctuation function that does not respect the bounds.

Simulation studies described in Section 3 compare the new TMLE estimator of the causal effect, which relies on a logistic fluctuation of an initial density estimate, with the traditional TMLE estimator, with and without sparsity in the data. Results for other commonly applied estimators, the inverse-probability-of-treatment weighted estimator (IPTW) (Hernan et al., 2000; Robins, 2000b), a double robust augmented IPTW estimator (aug-IPTW) (Robins and Rotnitzky, 2001; Robins et al., 2000; Robins, 2000a) that is efficient but not a substitution estimator, and the maximum likelihood substitution estimator according to a parametric model (MLE) (Robins, 1986) are also presented.

2. TMLE for causal effect estimation on a continuous outcome

The targeted MLE is a semi-parametric efficient substitution estimator of a target parameter Ψ(P0) of a true distribution P0𝒨, based on sampling n i.i.d. O1, …, On from P0. Here P0 is known to be an element of a semi-parametric statistical model 𝒨. We will start with providing a succinct summary of how it works. For more details we refer to our articles on this topic (van der Laan et al., 2009).

Firstly, one notes that Ψ(P0) = Ψ(Q0) only depends on P0 through a relevant part Q0 = Q(P0) of P0. Secondly, one proposes a loss function L(Q)(O) so that Q0 = arg minQ𝒬 E0L(Q)(O), where 𝒬 = {Q(P) : P𝒨}. Thirdly, one uses minimum loss-based learning, such as super learning (van der Laan et al., 2007), fully utilizing the power and optimality results for loss-based cross-validation to select among candidate estimators, to obtain an initial estimator Qn0 of Q0. Fourthly, one proposes a parametric fluctuation Qng0(ε), possibly indexed by nuisance parameter g0 = g(P0), so that

ddεL(Qng0(ε))(O)|ε=0=D*(Qn0,g)(O), (1)

where D* (Q0, g0) is the canonical gradient/efficient influence curve of Ψ : 𝒨 → ℝ at P0. Fifthly, one computes the amount of fluctuation

εn=argminεi=1nL(Qngn0(ε))(Oi),

where gn is an estimator of the unknown nuisance parameter g0. This yields an update Qn1=Qngn0(εn). This updating of an initial estimator Qn0 into a next Qn1 is iterated till convergence resulting in a Qn*. Since at the last step the amount of fluctuation εn ≈ 0, this final Qn* will solve the efficient influence curve estimating equation

0=i=1nD*(Qn*,gn)(Oi),

representing a fundamental ingredient for establishing asymptotic efficiency of Ψ(Qn*): recall that an estimator is efficient if and only if it is asymptotically linear with influence curve equal to the efficient influence curve D* (Q0, g0). Finally, the targeted MLE of ψ0 is the substitution estimator Ψ(Qn*).

Thus we see that the targeted MLE involves constructing a parametric model Qn0(ε) through the initial estimator Qn0 with parameter ε representing an amount of fluctuation of the initial estimator, where the score of this fluctuation model at ε = 0 equals the efficient influence curve. The latter constraint can be satisfied by many parametric models, since it represents only a local constraint of its behavior at zero fluctuation. However, it is very important that the fluctuations stay within the model for the observed data distribution, even if the parameter can be defined on fluctuations that fall outside the assumed observed data model. In particular, in the context of sparse data, a violation of this property can heavily affect the performance of the estimator.

One important strength of the semi-parametric efficient targeted MLE relative to the alternative semi-parametric efficient estimating equation methodology (van der Laan and Robins, 2003) is that it does respect the global constraints of the observed data model since it is a substitution estimator Ψ(Qn*) with Qn* an estimator of a relevant part Q0 of the true distribution of the data in the observed data model. The estimating equation methodology does not result in substitution estimators and thereby often ignores important global constraints of the observed data model, though Tan (2008) introduces a non-parametric likelihood based approach to constructing a double robust estimator that is not a substitution estimator, and offers a comparison with other estimators, including TMLE that is not constrained to remain within the bounds of the observed data model. Ignoring constraints comes at a price in the context of sparsity. Indeed, simulations have confirmed this gain of targeted MLE relative to the efficient estimating equation method in the context of sparsity (Stitelman and van der Laan, 2010), and it is again demonstrated in this article. However, if the targeted MLE starts violating this principle of being a substitution estimator by allowing Qn* to fall outside the assumed observed data model, this advantage is compromised. Therefore, it is crucial that a fluctuation model is used that is guaranteed to stay within the wished observed data model.

To demonstrate this important consideration of selecting a valid fluctuation model in the construction of targeted MLE, we consider the problem of estimating a causal effect of a binary treatment A on a continuous outcome Y, based on observing n i.i.d. copies of O = (W, A, Y) ∼ P0, where W is the set of confounders. Under non-parametric structural equation model (NPSEM) W = fW (UW), A = fA (W, UA), Y = fY (W, A, UY) with a structure on the exogenous variables U = (UW, UA, UY) satisfying the no unmeasured confounders assumption (AY (a) | W for the counterfactuals Y (a) defined by this NPSEM), the additive causal effect E(Y (1) – Y (0)) can be identified from the observed data distribution through the following statistical parameter of P0:

Ψ(P0)=E0(E0(Y|A=1,W)E0(Y|A=0,W)).

Suppose that it is known that Y ∈ [a, b] for some a < b. Alternatively, one might have truncated the original data to fall in such an interval and focus on the causal effect of treatment on this truncated outcome, motivated by the fact that estimating conditional means of unbounded, or very heavy tailed, outcomes requires very large data sets.

Let Y* = (Ya)/(ba) be the linearly transformed outcome within [0, 1], and define

Ψ*(P0)=E0(E0(Y*|A=1,W)E0(Y*|A=0,W)).

We note that

Ψ(P0)=(ba)Ψ*(P0).

An estimate, limit distribution, and confidence interval for Ψ* (P0) is now immediately mapped into an estimate, limit distribution, and confidence interval for Ψ(P0), by simple multiplication by (ba). As a consequence, without loss of generality, we can assume a = 0 and b = 1 so that Y ∈ [0, 1].

The efficient influence curve of the statistical parameter Ψ : 𝒨 → ℝ, defined on a non-parametric statistical model 𝒨 for P0, at the true distribution P0, is given by

D*(P0)=2A1g0(A|W)(YQ¯0(W,A))+Q¯0(W,1)Q¯0(W,0)Ψ(Q0),

where 0(W, A) = E0(Y | A, W), and Q0 = (QW, 0) denotes both this conditional mean 0 as well as the marginal distribution QW of W. Note that indeed Ψ(P0) only depends on P0 through 0 and the marginal distribution of W. We will use the notation Ψ(P0) and Ψ(Q0) interchangeably.

We will now define a targeted MLE of Ψ(Q0) as follows. Let Q¯n0 be an initial estimator of 0(W, A) = E(Y | A, W) with predicted values in (0, 1). In addition, we estimate PW with the empirical distribution of W. Let Qn0 denote the resulting initial estimator of Q0. The targeted MLE step will also require an estimator gn of g0 = PA|W. Only the conditional mean Q¯n0 will be modified by the targeted MLE procedure defined below: this makes sense since the empirical distribution of W is already a non-parametric maximum likelihood estimator so that no bias gain with respect to the target parameter will be obtained by modifying it.

We can represent the estimator Q¯n0 as Q¯n0=11+exp(fn0) with fn0=log(Q¯n0/(1Q¯n0)). Consider now the fluctuation model

Q¯n0(ε)=11+exp({fn0+εh}),

with parameter ε, indexed by a function

h(gn)(W,A)=2A1gn(A|W).

Equivalently, we can write this as logitQ¯n0(ε)=logitQ¯n0+εh(gn).

Consider now the following loss function for 0:

L(Q¯)(O)=YlogQ¯(W,A)+(1Y)log(1Q¯(W,A)).

Note that this is the log-likelihood of the conditional distribution of a binary outcome Y, but now extended to continuous outcomes in [0, 1]. (See also Wedderburn (1974), McCullagh (1983) for earlier use of logistic regression for continuous outcomes.) It is thus known that this loss function is a valid loss function for the conditional distribution of a binary Y, but we need that it is a valid loss function for a conditional mean of a continuous Y ∈ [0, 1]. We have the following lemma establishing this result about this loss function.

Lemma 1 We have that

Q¯0=argminQ¯E0L(Q¯),

where the minimum is taken over all functions of (W, A) which map into (0, 1). In addition, define the fluctuation function

logitQ¯(ε)=logitQ¯+εh.

For any function h we have

ddεL(Q¯(ε))|ε=0=h(W,A)(YQ¯(W,A)).

Proof: Let 1 be a local minimum and consider the fluctuation 1(ε) defined above. Then the derivative of E0L(1 (ε)) at ε = 0 equals zero. However,

ddεL(Q¯1(ε))|ε=0=h(W,A)(YQ¯1(W,A)).

Thus, it follows that

E0h(W,A)(YQ¯1(W,A))=E0h(W,A)(Q¯0Q¯1)(W,A).

But this needs to hold for any function h(W, A), which proves that 1 = 0 a.e.

This proves that L() is a valid loss function for the conditional mean 0. Indeed, we can use L() as loss function to construct an initial estimator of 0, and or use cross-validation to select among candidate targeted maximum likelihood estimators, such as in the collaborative targeted MLE procedure. For the purpose of construction of an initial estimator one could also use a minimum loss-based super learner based on the squared error loss function L2() = (Y – Q̄(W, A))2, possibly with weights.

Given an initial estimator Q¯n0, and our proposed fluctuation function Q¯n0(ε), we have

ddεL(Q¯n0(ε))|ε=0=h(g)(W,A)(YQ¯n0(W,A)),

giving us the wished first component D1* of the efficient influence curve D*=D1*+D2*.

Let’s use the log-likelihood loss function, −logQW, as loss function for the marginal distribution of W, so that our combined loss function is given by L(Q) = −logQW + L(). In addition, we use as fluctuation of the empirical distribution QWn, QWn(ε1)=(1+ε1D2*(Q))QWn, where D2*(Q)=Q¯(W,1)Q¯(W,0)Ψ(Q) is the remaining component of the efficient influence curve. With these choices we indeed now have that

ddεL(Q(ε))|ε=0=D*(Q,g).

This shows that we succeeded in defining a loss function for Q0 = (QW, 0) and fluctuation function so that the wished derivative (1) indeed yields the efficient influence curve.

The MLE of ε1 equals zero, so that the update of QWn equals QWn itself. The empirical mean of the component D2*=Q¯(W,1)Q¯(W,0)Ψ(Q) of the efficient influence curve is always equal to zero, due to the fact that we estimate the marginal distribution of W with the empirical distribution of W.

The amount of fluctuation of ε for fluctuating Q¯n0 is given by

εn0=argminεPnL(Q¯n0(ε)).

This “maximum likelihood” estimator of ε can be computed with generalized linear regression using the binomial link, i.e. the logistic regression MLE procedure, simply ignoring that the outcome is not binary, which also corresponds with iterative re-weighted least squares estimation using weights 1/(1 – ).

This provides us with the targeted MLE update Qn1=Qn0(εn0), where the empirical distribution of W did not get updated, and Q¯n0 did get updated as Q¯n0(εn0). Iterating this procedure now defines the targeted MLE Qn*, but as in the binary outcome case, we have that Qn2=Qn1(εn1)=Qn1 since the next MLE εn1=0. Thus convergence occurs in one step, so that Qn*=Qn1. The targeted MLE of ψ0 is thus given by Ψ(Qn*)=Ψ(Qn1). As predicted, we have that the targeted MLE Qn* solves the efficient influence curve estimating equation PnD*(Qn*,gn,Ψ(Qn*))=0.

An inspection of this efficient influence curve,

D*(P0)=2A1g0(A|W)(YQ¯0(W,A))+Q¯0(W,1)Q¯0(W,0)Ψ(Q0),

reveals that there are two potential sources of sparsity. Small values for g0(A | W) and large outlying values of Y inflate the variance. Enforcing (e.g., known) bounds on Y and g0 in the estimation procedure provides a means for controlling these sources of variance. We note that, even if there is strong confounding causing some large values of hgn0, the resulting targeted MLE Q¯n* remains bounded in (0, 1), so that the targeted MLE Ψ(Qn*) fully respects the global constraints of the observed data model. On the other hand, the augmented IPTW estimator obtained by solving PnD*(Qn0,gn,ψ)=0 in ψ yields the estimator

ψn=1ni=1nhgn0(Wi,Ai)(YiQ¯n0(Wi,Ai))+Q¯n0(Wi,1)Q¯n0(Wi,0),

which can easily fall outside [0, 1] if for some observations Wi, gn(1 | Wi) is close to 1 or 0. This represents the price of not being a substitution estimator.

Contrasting with targeted MLE using linear fluctuation function. Alternatively, we would employ the targeted MLE using the L2() = (Y(W, A))2 loss function, and fluctuation function 0(ε) = 0 + εh(g), so that (1) is still satisfied. In this case, large values of h(g) will result in predicted values of 0(εn) that are out of the bounds [a, b]. Therefore, this version of targeted MLE is not respecting the global constraints of the model, i.e., the knowledge that Y ∈ [a, b]. A comparison based on simulated data of the targeted MLE using the logistic fluctuation function and the targeted MLE using this linear fluctuation function is provided in the next section.

3. Simulation studies for the additive effect of a binary point treatment on a continuous outcome

Two simulation studies illustrate the effects of employing a logistic vs. linear fluctuation on TMLE estimator performance with and without sparsity in the data, where a high degree of sparsity corresponds to a target parameter that is borderline-identifiable. As above, the parameter of interest is defined as the marginal effect of a binary point treatment on the outcome, ψ0 = EW (E(Y | A = 1, W) – E(Y | A = 0, W)).

The “traditional” targeted maximum likelihood approach to estimating an additive treatment effect when the outcome is continuous is to fluctuate the initial density estimate on a linear scale. Given Q¯n0(W,A), an initial estimate of the conditional mean of Y given (W, A), the fluctuation function is defined as Q¯n0(ε)=Q¯n0+ε(hgn) and the loss function L() is chosen to be the squared error loss function, so that we still have the required constraint (1). The estimate εn can be obtained by estimating ε with a linear regression of Y on hgn, using the initial fit, Q¯n0(W,A), as offset.

A second TMLE estimate using the logistic fluctuation method described in Section 2 is also obtained. Y is transformed into Y* ∈ [0, 1] by shifting and scaling the values. In the simulation setting, Y is not bounded, so that we do not have an a priori a and b bound on Y. Instead of truncating Y and redefining the target parameter as the causal effect on the truncated Y, we still aim to estimate the causal effect on the original Y. Therefore, we set a = min(Y), b = max(Y), and

Y*=Yaba.

An initial estimate, Q¯n0,Y*(W,A)=E(Y*|W,A), is obtained, and then represented as a logistic function of its logit-transformation. Note that logit(x) is not defined when x = 0 or 1, therefore in practice Q¯n0,Y*(W,A) is bounded away from 0 and 1 by truncating it at (α, (1 – α)). We used α = 0.005 in these simulation studies, which did not yield appreciably different results than setting α = 0.001 or α = 0.01. The function Q¯n0,Y* is fluctuated on the logit scale with logitQ¯n0,Y*(ε)=logitQ¯n0,Y*+εh(gn), using the same clever covariate, hgn (W, A), employed in the linear fluctuation described above. Fitting ε is again carried out using standard software, but this time using logistic regression of Y* on hgn (W, A) with offset logit(Qn0,Y*(W,A)). This results in the updated Q¯n1,Y*. Fitted values for Q¯n1,Y*(W,A) are mapped back to the original scale: Q¯n1,Y=Q¯n1,Y*(W,A)*(ba)+a. The marginal distribution is estimated with the empirical distribution of W, giving the Qn*=Qn1=(QW,n,Q¯n1,Y) of (QW, 0). The estimate

ψn=Ψ(Qn*)=1ni=1nQ¯n1,Y(Wi,1)Q¯n1,Y(Wi,0)

is the targeted MLE of the wished additive causal effect ψ0.

Parameter estimates were also obtained using the augmented inverse probability of treatment weighed estimator (aug-IPTW),

ψnaugIPTW=1ni=1n2A1gn(Ai|Wi)(YiQ¯n0(Wi,Ai))+1ni=1n(Q¯n0(Wi,1)Q¯n0(Wi,0)).

Both the targeted MLE and the augmented IPTW estimator are double robust so that these estimators will be consistent for ψ0 if either gn or Q¯n0 is consistent for g0 and 0, respectively. Both the targeted MLE and the augmented IPTW estimator are asymptotically efficient if both gn and Q¯n0 are consistent.

In this simulation study we will use simple parametric MLE’s as initial estimators Q¯n0 and gn, even though we recommend the utilization of super learning in practice. The purpose of this simulation is to investigate the performance of the updating step under misspecified and correctly specified Q¯n0, and for that purpose we can work with parametric MLE fits.

Results from two estimation methods that are not double robust and semi-parametric efficient are included as well. The maximum likelihood estimator according to a parametric model for 0 (MLE), used as initial estimator in the targeted MLE and augmented IPTW, is included for the sake of evaluating the bias reduction step carried out by these two semi-parametric efficient procedures. Inverse probability of treatment weighted (IPTW) estimators are consistent when gn(A | W) is a consistent estimator of the treatment mechanism g0(A | W) = P (A = 1|W), but are known to be inefficient. These two estimators are defined as

ψnMLE=1ni1nQ¯n0(Wi1)Q¯n0(Wi,0),ψnIPTW=1ni=1n(2A1)Yign(Ai|Wi).

3.1. Data generation

Covariates W1, W2, W3 were generated as independent binary random variables,

W1,W2,W3Bernoulli(0.5).

Two treatment mechanisms were defined that differ only in the values of the coefficients for each covariate:

g0(1|W)=P(A=1|W)=logit1(aW1+bW2+cW3).

We consider two settings:

a1=0.5,b1=1.5,c1=1anda2=1.5,b2=4.5,c2=3.

We refer to these two treatment mechanisms as g0,1 and g0,2, respectively. The observed outcome Y was generated as

Y=Q¯0(W,A)+e,eN(0,1),Q¯0(W,A)=Aj+2W1+3W24W3.

For both simulations the true additive causal effect equals one: ψ0 = 1. In both simulations predicted values for gn(A | W) are bounded away from 0 and 1 by truncating at (p, (1 – p)), with p = 0.01. Treatment assignment probabilities based on mechanism g0,1 range from 0.269 to 0.881, indicating no sparsity in the data for simulation 1. In contrast, simulation 2 poses a challenging estimation problem in the context of sparse data. Treatment assignment probabilities based on mechanism g0,2 range from 0.047 to 0.998. These extreme values are nevertheless not uncommon for data from observational studies (see for example Dehejia and Wahba (2002); Stukel et al. (2007)).

Estimates were obtained for 1000 samples of size n = 1000 from each data generating distribution. Treatment assignment probabilities, g0(A | W), were estimated using a correctly specified logistic regression model. A correctly specified main terms regression model was used to obtain Q¯cor0(W,A). In addition, a misspecified initial estimate, Q¯mis0(W,A), was obtained by regressing Y on A.

We expect MLE estimates based on Q¯cor0 to be unbiased and efficient, while those based on Q¯mis0 will be biased. IPTW estimates only depend on consistent estimation of g0, thus are identical regardless of how 0 is estimated. For both simulations gn is a consistent estimator, thus it is reasonable to expect unbiased IPTW estimates, with more variation in simulation 2 estimates. The targeted MLE and the augmented IPTW are known to be unbiased if gn is consistent, and asymptotically efficient when both 0 and g0 are consistently estimated. Though correctly estimating g0 will asymptotically correct for any bias due to mis-specification of Q¯n0, this is not guaranteed in finite samples, especially when there is sparsity. For simulation 2 we expect TMLElog, using the logistic fluctuation, to outperform TMLElin, using the linear fluctuation.

3.2. Results

Table 1 reports the average estimate, bias, empirical variance, and mean squared error (MSE) for each estimator, under different specifications of the initial estimator Q¯n0. In all cases gn is consistent, and bounded at (0.01, 0.99). In simulation 1, when 0 is correctly estimated all estimators perform quite well, though as expected, IPTW is the least efficient. However, when 0 is incorrectly estimated, the MLE estimator is biased and has high variance relative to the other estimators. Because gn(A | W) is correctly specified, IPTW and aug-IPTW provide unbiased estimates, as do both TMLEs. TMLElog is on a par with TMLElin, as there is no sparsity in the data, and both are more efficient than any of the other estimators.

Table 1:

Estimator performance for simulations 1 and 2 when the initial estimator of 0 is correct and misspecified. Results are based on 1000 samples of size n = 1000.

0correctly estimated
0incorrectly estimated
ave bias var MSE ave bias var MSE
Simulation 1
  MLE 1.003 0.003 0.005 0.005 3.075 2.075 0.030 4.336
  IPTW 1.006 0.006 0.009 0.009 1.006 0.006 0.009 0.009
  aug-IPTW 1.003 0.003 0.005 0.005 1.005 0.005 0.010 0.010
  TMLElog 0.993 −0.007 0.005 0.005 0.993 −0.007 0.006 0.006
  TMLElin 0.993 −0.007 0.005 0.005 0.993 −0.007 0.006 0.006

Simulation 2
  MLE 1.001 0.001 0.009 0.009 4.653 3.653 0.025 13.370
  IPTW 1.554 0.554 0.179 0.485 1.554 0.554 0.179 0.485
  aug-IPTW 0.999 −0.001 0.023 0.023 1.708 0.708 0.298 0.798
  TMLElog 0.989 −0.011 0.037 0.037 0.722 −0.278 0.214 0.291
  TMLElin 0.986 −0.014 0.042 0.042 −0.263 −1.263 2.581 4.173

In simulation 2 all estimators except IPTW are unbiased when 0 is correctly estimated. In this case, both TMLE estimators have higher variance than aug-IPTW, and all three are more efficient than IPTW, but less efficient than the parametric MLE estimator. Though asymptotically the IPTW estimator is expected to be unbiased in this simulation, since gn is a consistent estimator of g02, these results demonstrate that in finite samples, heavily weighting a subset of observations not only increases variance, but can also bias the estimate.

When the model for 0 is misspecified in simulation 2, The MLE estimator is even more biased than it was in simulation 1. The efficiency of all three double-robust efficient estimators suffers in comparison with simulation 1 as well. Nevertheless, TMLElog, using the logistic fluctuation, has the lowest MSE of all estimators. Its superiority over TMLElin in terms of bias and variance is clear. TMLElog also outperforms aug-IPTW with respect to both bias and variance, and performs much better than IPTW or MLE.

4. Discussion

When an estimation procedure incorporates weights, observations with large weights can heavily influence the point estimate and inflate the variance. Truncating these weights is a common approach to reducing the variance, but it generally introduces bias. The presented TMLE of an additive causal effect of a point treatment intervention, incorporating a logistic fluctuation of the initial conditional mean estimate, dampens the effect of these heavily weighted observations, thereby heavily reducing the reliance on truncation. As a substitution estimator, the proposed TMLE of the additive causal effect respects the global constraints of the observed data model. Simulation study results indicate that this approach is on a par with, and in the context of sparsity often superior to, fluctuating on the linear scale. In particular it is more robust when there is sparsity in the data, outperforming MLE, IPTW, and aug-IPTW.

For the sake of demonstration we considered estimation of the additive causal effect. However, the same targeted MLE, using the logistic fluctuation, can be used to estimate other point-treatment causal effects, including parameters of a marginal structural model. The newly proposed loss function also has applications in prediction of a bounded outcome, and for targeted MLE of the causal effect of a multiple time point intervention in which the final outcome is bounded and continuous. We also pointed out that the proposed fluctuation function and loss function, and corresponding targeted MLE, should also be used for continuous outcomes for which no a priori bounds are known, by simply using the minimal and maximal observed outcome values. In this way, these choices naturally robustify the targeted MLE by enforcing that the updated initial estimator will not predict outcomes outside the observed range.

The TMLE approach presented here using a logistic fluctuation of an initial estimate of the conditional mean of the continuous outcome retains all properties of targeted maximum likelihood estimators, including influence curve-based inference. The method presented here extends to collaborative targeted maximum likelihood estimation without modification.

Contributor Information

Susan Gruber, University of California, Berkeley.

Mark J. van der Laan, University of California, Berkeley

References

  1. Dehejia RH, Wahba S. Propensity score matching methods for nonexperimental causal studies. The Review of Economics and Statistics. 2002;84:151–61. doi: 10.1162/003465302317331982. [DOI] [Google Scholar]
  2. Hernan MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology. 2000;11(5):561–570. doi: 10.1097/00001648-200009000-00012. [DOI] [PubMed] [Google Scholar]
  3. McCullagh P. Quasi-likelihood functions. Annals of Statistics. 1983;11:59–67. doi: 10.1214/aos/1176346056. [DOI] [Google Scholar]
  4. Robins JM, Rotnitzky A. Comment on the Bickel and Kwon article, “Inference for semiparametric models: Some questions and an answer”. Statistica Sinica. 2001;11(4):920–936. [Google Scholar]
  5. Robins JM, Rotnitzky A, van der Laan MJ. Comment on “On Profile Likelihood” by S.A. Murphy and A.W. van der Vaart. Journal of the American Statistical Association – Theory and Methods. 2000;450:431–435. doi: 10.2307/2669381. [DOI] [Google Scholar]
  6. Robins JM. A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect. Mathematical Modelling. 1986;7:1393–1512. doi: 10.1016/0270-0255(86)90088-6. [DOI] [Google Scholar]
  7. Robins JM. Robust estimation in sequentially ignorable missing data and causal inference models. Proceedings of the American Statistical Association. 2000a [Google Scholar]
  8. Robins JM. Statistical models in epidemiology, the environment, and clinical trials (Minneapolis, MN, 1997) Springer; New York: 2000b. Marginal structural models versus structural nested models as tools for causal inference; pp. 95–133. [Google Scholar]
  9. Stitelman OM, van der Laan MJ. University of California; Berkeley: 2010. Collaborative targeted maximum likelihood for time to event data. Technical Report 260, Division of Biostatistics, [DOI] [PubMed] [Google Scholar]
  10. Stukel TA, Fisher ES, Wennberg DE, Alter DA, Gottlieb DJ, Vermeulen MJ. Analysis of observational studies in the presence of treatment selection bias: Effects of invasive cardiac management on AMI survival using propensity score and instrumental variable methods. JAMA. 2007;297:278–85. doi: 10.1001/jama.297.3.278. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Tan Z. Bounded, efficient, and doubly robust estimation with inverse weighting. Biometrika. 2008;94:1–22. [Google Scholar]
  12. van der Laan MJ, Robins JM. Unified methods for censored longitudinal data and causality. Springer; New York: 2003. [Google Scholar]
  13. van der Laan MJ, Rubin D. Targeted maximum likelihood learning. The International Journal of Biostatistics. 2006;2(1) doi: 10.2202/1557-4679.1043. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. van der Laan MJ, Polley E, Hubbard A. Super learner. Statistical Applications in Genetics and Molecular Biology. 2007;6(25) doi: 10.2202/1544-6115.1309. ISSN 1. [DOI] [PubMed] [Google Scholar]
  15. van der Laan MJ, Rose S, Gruber S. University of California; Berkeley: Sep, 2009. Readings in targeted maximum likelihood estimation. Technical report 254, Division of Biostatistics, [Google Scholar]
  16. Wedderburn RWM. Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika. 1974;6:1. [Google Scholar]

Articles from The International Journal of Biostatistics are provided here courtesy of Berkeley Electronic Press

RESOURCES