Comment: Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data

Anastasios A Tsiatis; Marie Davidian

doi:10.1214/07-STS227

Author manuscript; available in PMC: 2008 May 29.

Published in final edited form as: Stat Sci. 2007;22(4):569–573. doi: 10.1214/07-STS227

PMCID: PMC2397555 NIHMSID: NIHMS48048 PMID: 18516239

INTRODUCTION

We congratulate Drs. Kang and Schafer (KS henceforth) for a careful and thought-provoking contribution to the literature regarding the so-called “double robustness” property, a topic that still engenders some confusion and disagreement. The authors’ approach of focusing on the simplest situation of estimation of the population mean μ of a response y when y is not observed on all subjects according to a missing at random (MAR) mechanism (equivalently, estimation of the mean of a potential outcome in a causal model under the assumption of no unmeasured confounders) is commendable, as the fundamental issues can be explored without the distractions of the messier notation and considerations required in more complicated settings. Indeed, as the article demonstrates, this simple setting is sufficient to highlight a number of key points.

As noted eloquently by Molenberghs (2005), in regard to how such missing data/causal inference problems are best addressed, two “schools” may be identified: the “likelihood-oriented” school and the “weighting-based” school. As we have emphasized previously (Davidian, Tsiatis and Leon, 2005), we prefer to view inference from the vantage point of semi-parametric theory, focusing on the assumptions embedded in the statistical models leading to different “types” of estimators (i.e., “likelihood-oriented” or “weighting-based”) rather than on the forms of the estimators themselves. In this discussion, we hope to complement the presentation of the authors by elaborating on this point of view.

Throughout, we use the same notation as in the paper.

SEMIPARAMETRIC THEORY PERSPECTIVE

As demonstrated by Robins, Rotnitzky and Zhao (1994) and Tsiatis (2006), exploiting the relationship between so-called influence functions and estimators is a fruitful approach to studying and contrasting the (large-sample) properties of estimators for parameters of interest in a statistical model. We remind the reader that a statistical model is a class of densities that could have generated the observed data. Our presentation here is for scalar parameters such as μ, but generalizes readily to vector-valued parameters. If one restricts attention to estimators that are regular (i.e., not “pathological”; see Davidian, Tsiatis and Leon, 2005, page 263 and Tsiatis 2006, pages 26–27), then, for a parameter μ in a parametric or semiparametric statistical model, an estimator μ̂ for μ based on independent and identically distributed observed data z_i, i = 1, …, n, is said to be asymptotically linear if it satisfies

$$n^{1/2} (\hat{\mu} - \mu_{0}) = n^{-1/2} \sum_{i=1}^{n} \phi(z_{i}) + o_{p}(1) \tag{1}$$

for ϕ(z) with E{ϕ(z)} = 0 and E {ϕ²(z)} < ∞, where μ₀ is the true value of μ generating the data, and expectation is with respect to the true distribution of z. The function ϕ(z) is the influence function of the estimator μ̂. A regular, asymptotically linear estimator with influence function ϕ(z) is consistent and asymptotically normal with asymptotic variance E{ϕ²(z)}. Thus, there is an inextricable connection between estimators and influence functions in that the asymptotic behavior of an estimator is fully determined by its influence function, so that it suffices to focus on the influence function when discussing an estimator’s properties. Many of the estimators discussed by KS are regular and asymptotically linear; in the sequel, we refer to regular and asymptotically linear estimators as simply “estimators.”
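As a minimal illustration of definition (1), and outside the missing-data setting of KS, consider the sample mean of fully observed responses y_1, …, y_n with finite variance. This full-data setup is an assumption made only for this example.

```latex
\hat{\mu} = n^{-1} \sum_{i=1}^{n} y_{i}
\quad\Longrightarrow\quad
n^{1/2} (\hat{\mu} - \mu_{0}) = n^{-1/2} \sum_{i=1}^{n} (y_{i} - \mu_{0}),
\qquad
\phi(y) = y - \mu_{0}, \quad
E\{\phi(y)\} = 0, \quad
E\{\phi^{2}(y)\} = \mathrm{var}(y).
```

In this case (1) holds exactly, with no remainder term, and the asymptotic variance E{ϕ²(y)} is the familiar var(y).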

We capitalize on this connection by considering the problem of estimating μ in the setting in KS in terms of statistical models that may be assumed for the observed data, from which influence functions corresponding to estimators valid under the assumed models may be derived. In the situation studied by KS, the “full” data that would ideally be observed are (t, x, y); however, as y is unobserved for some subjects, the observed data available for analysis are z = (t, x, ty). As noted by KS, the MAR assumption states that y and t are conditionally independent given x; for example, P (t = 1|y, x) = P (t = 1|x). Under this assumption, all joint densities for the observed data have the form

$$p(z) = \{p(y \mid x)\}^{I(t = 1)} \, p(t \mid x) \, p(x), \tag{2}$$

where p(y|x) is the density of y given x, p(t|x) is the density of t given x, and p(x) is the marginal density of x. Let p₀(z) be the density in the class of densities of form (2) generating the observed data (the true joint density).

One may posit different statistical models by making different assumptions on the components of (2). We focus on three such models:

I. Make no assumptions on the forms of p(x) or p(t|x), leaving these entirely unspecified. Make a specific assumption on p(y|x), namely, that E(y|x) = m(x, β) for some given function m(x, β) depending on parameters β (p × 1). Denote the class of densities satisfying these assumptions as $M_{I}$.
II. Make no assumptions on the forms of p(x) or p(y|x). Make a specific assumption on p(t|x), namely, that P(t = 1|x) = E(t|x) = π(x, α) for some given function π(x, α) depending on parameters α (s × 1). Here, we also require the assumption that P(t = 1|x) ≥ ε > 0 for all x and some ε. Denote the class of densities satisfying these assumptions as $M_{I I}$.
III. Make no assumptions on the form of p(x), but make specific assumptions on p(y|x) and p(t|x), namely, that E(y|x) = m(x, β) and P(t = 1|x) = E(t|x) = π(x, α) ≥ ε > 0 for all x and some ε, for given functions m(x, β) and π(x, α) depending on parameters β and α. The class of densities satisfying these assumptions is $M_{I} \cap M_{I I}$.

All of I–III are semiparametric statistical models in that some aspects of p(z) are left unspecified. Denote by m₀(x) the true function E(y| x) and by π₀(x) the true function P (t = 1|x) = E(t|x) corresponding to the true density p₀(z).

Semiparametric theory yields the form of all influence functions corresponding to estimators for μ under each of the statistical models I–III. As discussed in Tsiatis (2006, page 52), loosely speaking, a consistent and asymptotically normal estimator for μ in a statistical model has the property that, for all p(z) in the class of densities defined by the model, $n^{1/2} (\hat{\mu} - \mu) \overset{D(p)}{\to} N\{0, \sigma^{2}(p)\}$, where $\overset{D(p)}{\to}$ means convergence in distribution under the density p(z), and σ²(p) is the asymptotic variance of μ̂ under p(z).

If model I is correct, then m₀(x) = m(x, β) for some β, and it may be shown (e.g., Tsiatis, 2006, Section 4.5) that all estimators for μ have influence functions of the form

$$m_{0}(x) - \mu + t \, a(x) \{y - m_{0}(x)\} \tag{3}$$

for arbitrary functions a(x) of x. If model II is correct, then π₀(x) = π(x, α) for some α, and all estimators for μ have influence functions of the form

$$\frac{t y}{\pi_{0}(x)} + \frac{t - \pi_{0}(x)}{\pi_{0}(x)} h(x) - \mu \tag{4}$$

for arbitrary h(x), which is well known from Robins, Rotnitzky and Zhao (1994). If model III is correct, then m₀(x) = m(x, β) and π₀(x) = π(x, α) for some β and α, and influence functions for estimators μ̂ have the form

$$m_{0}(x) - \mu + t \, a(x) \{y - m_{0}(x)\} + \frac{t - \pi_{0}(x)}{\pi_{0}(x)} h(x) \tag{5}$$

for arbitrary a(x) and h(x). Depending on forms of m(x, β) as a function of β and π(x, α) as a function of α, there will be restrictions on the forms of a(x) and h(x); see below.

We now consider estimators discussed by KS from the perspective of influence functions. The regression estimator μ̂_OLS in (7) of KS comes about naturally if one assumes model I is correct. In terms of influence functions, μ̂_OLS may be motivated by considering the influence function (3) with a(x) = 0, as this leads to the estimator $n^{-1} \sum_{i=1}^{n} m(x_{i}, \beta)$. In fact, although KS do not discuss it, the “imputation estimator” $\hat{\mu}_{IMP} = n^{-1} \sum_{i=1}^{n} \{t_{i} y_{i} + (1 - t_{i}) m(x_{i}, \beta)\}$ may be motivated by taking a(x) = 1 in (3). Of course, in practice, β must be estimated. In general, (3) implies that all estimators for μ that are consistent and asymptotically normal if model I is correct must be asymptotically equivalent to an estimator of the form

$$n^{-1} \sum_{i=1}^{n} [m(x_{i}, \hat{\beta}) + t_{i} \tilde{a}(x_{i}) \{y_{i} - m(x_{i}, \hat{\beta})\}], \tag{6}$$

where β is estimated by solving an estimating equation $\sum_{i=1}^{n} t_{i} A(x_{i}, \beta) \{y_{i} - m(x_{i}, \beta)\} = 0$ for some (p × 1) function A(x, β). Because β is estimated, the influence function of the estimator (6) with a particular ã(x) will not be exactly equal to (3) with a(x) = ã(x); instead, it may be shown that the influence function of (6) is of form (3) with a(x) in (3) equal to

$$\tilde{a}(x) - E[\{\pi_{0}(x) \tilde{a}(x) - 1\} m_{\beta}^{T}(x, \beta_{0})] \, [E\{\pi_{0}(x) A(x, \beta_{0}) m_{\beta}^{T}(x, \beta_{0})\}]^{-1} A(x, \beta_{0}), \tag{7}$$

where m_β (x, β) is the vector of partial derivatives of elements of m(x, β) with respect to β, and β₀ is such that m₀(x) = m(x, β₀).
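The general form (6) is easy to compute once fitted values are available. The following is a minimal sketch, not an implementation from KS: the function name `estimator_form6`, the toy data, and the stand-in fitted values `m_hat` are all hypothetical, and the fitting of β is deliberately left out. It only illustrates that ã(x) = 0 gives the average of fitted values and ã(x) = 1 gives the imputation estimator.

```python
def estimator_form6(t, y, m_hat, a_tilde):
    """Average of m_hat_i + t_i * a_tilde_i * (y_i - m_hat_i), as in (6)."""
    total = 0.0
    for ti, yi, mi, ai in zip(t, y, m_hat, a_tilde):
        # y_i is only used when t_i = 1; missing responses are stored as None.
        resid = (yi - mi) if ti == 1 else 0.0
        total += mi + ai * resid
    return total / len(t)


# Hypothetical toy data: t_i = 1 means y_i is observed.
t = [1, 0, 1, 1, 0]
y = [2.0, None, 3.0, 5.0, None]
m_hat = [2.5, 1.0, 3.5, 4.5, 2.0]  # stand-ins for fitted values m(x_i, beta_hat)

# a_tilde = 0: average of fitted values (the regression estimator).
mu_ols = estimator_form6(t, y, m_hat, [0.0] * len(t))
# a_tilde = 1: observed y where available, fitted value otherwise.
mu_imp = estimator_form6(t, y, m_hat, [1.0] * len(t))
```

The sketch covers only the point estimate; the adjustment (7) for estimating β concerns the influence function and is not reflected here.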

The IPW estimator μ̂_IPW-POP in (3) of KS and its variants arise if one assumes model II. In particular, μ̂_IPW-POP can be motivated via the influence function (4) with h(x) = −μ. The estimator μ̂_IPW-NR in (4) of KS follows from (4) with h(x) = −E[y{1 − π(x)}]/E[{1 − π(x)}]. In fact, if one restricts h(x) in (4) to be a constant, then, using the fact that the expectation of the square of (4) is the asymptotic variance of the estimator, one may find the “best” such constant minimizing the variance as h(x) = −E[y{1 − π(x)}/π(x)]/E[{1 − π(x)}/π(x)]. An estimator based on this idea was given in (10) of Lunceford and Davidian (2004, page 2943). In general, as for model I, (4) implies that all estimators for μ that are consistent and asymptotically normal if model II is correct must be asymptotically equivalent to an estimator of the form

$$n^{-1} \sum_{i=1}^{n} \left\{\frac{t_{i} y_{i}}{\pi(x_{i}, \hat{\alpha})} + \frac{t_{i} - \pi(x_{i}, \hat{\alpha})}{\pi(x_{i}, \hat{\alpha})} \tilde{h}(x_{i})\right\}, \tag{8}$$

where α is estimated by solving an equation of the form $\sum_{i=1}^{n} \{t_{i} - \pi(x_{i}, \alpha)\} B(x_{i}, \alpha) = 0$ for some (s × 1) B(x, α), almost always maximum likelihood for binary regression. As above, because α is estimated, the influence function of (8) is equal to (4) with h(x) equal to

$$\tilde{h}(x) - E[\pi_{\alpha}^{T}(x, \alpha_{0}) \{m_{0}(x) + \tilde{h}(x)\} / \pi_{0}(x)] \, [E\{B(x, \alpha_{0}) \pi_{\alpha}^{T}(x, \alpha_{0})\}]^{-1} B(x, \alpha_{0}) \pi_{0}(x), \tag{9}$$

where π_α (x, α) is the vector of partial derivatives of elements of π(x, α) with respect to α, and α₀ satisfies π₀(x) = π(x, α₀).
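The link between μ̂_IPW-POP and the choice h(x) = −μ can be checked at the level of the estimators themselves. The sketch below assumes that μ̂_IPW-POP is the usual ratio form Σ t_i y_i/π̂_i divided by Σ t_i/π̂_i; that is our reading of (3) of KS, not a quotation. The function names, toy data, and stand-in propensities `pi_hat` are hypothetical. Plugging the constant h̃ = −μ̂_IPW-POP into (8) returns μ̂_IPW-POP exactly.

```python
def ipw_pop(t, y, pi_hat):
    """Ratio-form IPW estimator: sum(t*y/pi) / sum(t/pi) (assumed form)."""
    num = sum(yi / p for ti, yi, p in zip(t, y, pi_hat) if ti == 1)
    den = sum(1.0 / p for ti, p in zip(t, pi_hat) if ti == 1)
    return num / den


def form8(t, y, pi_hat, h_tilde):
    """Estimator (8): average of t*y/pi + (t - pi)/pi * h_tilde."""
    total = 0.0
    for ti, yi, p, h in zip(t, y, pi_hat, h_tilde):
        ipw_term = yi / p if ti == 1 else 0.0  # t*y is 0 when y is missing
        total += ipw_term + (ti - p) / p * h
    return total / len(t)


# Hypothetical toy data; pi_hat stands in for pi(x_i, alpha_hat).
t = [1, 0, 1, 1, 0, 1]
y = [2.0, None, 3.0, 5.0, None, 1.5]
pi_hat = [0.8, 0.4, 0.5, 0.9, 0.3, 0.6]

mu_pop = ipw_pop(t, y, pi_hat)
# Use the constant h_tilde = -mu_pop for every subject.
mu_from_8 = form8(t, y, pi_hat, [-mu_pop] * len(t))
```

The two numbers agree up to rounding because the second term of (8) with a constant h̃ is h̃ times (n⁻¹ Σ t_i/π̂_i − 1).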

Doubly robust (DR) estimators are estimators that are consistent and asymptotically normal for models in $M_{I} \cup M_{I I}$, that is, under the assumptions of model I or model II. When the true density $p_{0}(z) \in M_{I} \cap M_{I I}$, then the influence function of any such DR estimator must be equal to (3) with a(x) = 1/π₀(x) or, equivalently, equal to (4) with h(x) = −m₀(x). Accordingly, when $p_{0}(z) \in M_{I} \cap M_{I I}$, that is, both models have been specified correctly, all such DR estimators will have the same asymptotic variance. This also implies that, if both models are correctly specified, the asymptotic properties of the estimator do not depend on the methods used to estimate β and α.
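The equivalence of the two representations is a one-line rearrangement, written out here for completeness using only the symbols already defined in (3) and (4):

```latex
\begin{aligned}
m_{0}(x) - \mu + \frac{t}{\pi_{0}(x)} \{y - m_{0}(x)\}
&= \frac{t y}{\pi_{0}(x)} - \frac{t}{\pi_{0}(x)} m_{0}(x) + m_{0}(x) - \mu \\
&= \frac{t y}{\pi_{0}(x)} - \frac{t - \pi_{0}(x)}{\pi_{0}(x)} m_{0}(x) - \mu .
\end{aligned}
```

The first line is (3) with a(x) = 1/π₀(x); the last line is (4) with h(x) = −m₀(x).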

KS discuss strategies for constructing DR estimators, and they present several specific examples: μ̂_BC-OLS in their equation (8); the estimators below (8) using POP or NR weights, which we denote as μ̂_BC-POP and μ̂_BC-NR, respectively; the estimator μ̂_WLS in their equation (10); μ̂_π-cov in their equation (12); and a version of μ̂_π-cov equal to the estimator proposed by Scharfstein, Rotnitzky and Robins (1999) and Bang and Robins (2005), which we denote as μ̂_SRR. The results for these estimators under the “Correct-Correct” scenarios ($M_{I} \cap M_{I I}$) in Tables 5–8 of KS are consistent with the asymptotic properties above. We note that μ̂_π-cov is not DR under $M_{I} \cup M_{I I}$ because of the additional assumption that the mean of y given π must be equal to a linear combination of basis functions in π. Making this additional assumption may not be unreasonable in practice; however, strictly speaking, it takes μ̂_π-cov outside the class of DR estimators discussed here, and hence we do not consider it in the remainder of this section. However, μ̂_SRR is still in this class.

KS suggest that a characteristic distinguishing the performance of DR estimators is whether or not the estimator is within or outside the augmented inverse-probability weighted (AIPW) class. We find this distinction artificial, as all of the above estimators μ̂_BC-OLS, μ̂_BC-POP, μ̂_BC-NR, μ̂_WLS and μ̂_SRR can be expressed in an AIPW form. Namely, all of these estimators are algebraically exactly of the form (8) with h̃(x_i) replaced by a term −γ̂ − m(x_i, β̂), where γ̂_BC-OLS = γ̂_WLS = γ̂_SRR = 0,

$$\hat{\gamma}_{BC\text{-}POP} = \frac{n^{-1} \sum_{i=1}^{n} (t_{i} / \hat{\pi}_{i}) (y_{i} - \hat{m}_{i})}{n^{-1} \sum_{i=1}^{n} t_{i} / \hat{\pi}_{i}} \quad \text{and} \quad \hat{\gamma}_{BC\text{-}NR} = \frac{n^{-1} \sum_{i=1}^{n} \{t_{i} (1 - \hat{\pi}_{i}) / \hat{\pi}_{i}\} (y_{i} - \hat{m}_{i})}{n^{-1} \sum_{i=1}^{n} t_{i} (1 - \hat{\pi}_{i}) / \hat{\pi}_{i}}, \tag{10}$$

where we write π̂_i = π(x_i, α̂) and m̂_i = m(x_i, β̂) for brevity. For μ̂_WLS and μ̂_SRR, this identity follows from the fact that $\sum_{i=1}^{n} (t_{i} / \hat{\pi}_{i}) (y_{i} - \hat{m}_{i}) = 0$, which for μ̂_WLS holds because KS restrict to m(x, β) = x^T β, with x including a constant term. Thus, we contend that issues of performance under $M_{I} \cup M_{I I}$ are not linked to whether or not a DR estimator is AIPW, but, rather, are a consequence of forms of the influence functions of estimators under $M_{I}$ or $M_{I I}$. In particular, under model II, it follows that the above estimators have influence functions of the form (4) with h(x) equal to (9) with h̃(x) = −{γ* + m(x, β*)}, where γ* and β* are the limits in probability of γ̂ and β̂, respectively. Thus, features determining performance of these estimators when model II is correct are how close γ* + m(x, β*) is to m₀(x) and how α is estimated, where maximum likelihood is the optimal choice. In fact, this perspective reveals that, for fixed m(x, β), using ideas similar to those in Tan (2006), the optimal choice of γ̂ is as in γ̂_BC-NR with $t_{i} (1 - \hat{\pi}_{i}) / \hat{\pi}_{i}$ replaced by $t_{i} (1 - \hat{\pi}_{i}) / \hat{\pi}_{i}^{2}$.
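The algebraic identity for μ̂_BC-POP can be verified numerically. In the sketch below we take μ̂_BC-POP to be the average of the fitted values plus the POP-weighted average of respondent residuals; this is our reading of the description in KS and should be treated as an assumption. The function names, toy data, and the stand-in values `pi_hat` and `m_hat` are hypothetical, and no models are actually fitted.

```python
def bc_pop(t, y, pi_hat, m_hat):
    """Return (estimate, gamma_hat) for the assumed BC-POP form."""
    mu_ols = sum(m_hat) / len(t)
    num = sum((yi - mi) / p
              for ti, yi, p, mi in zip(t, y, pi_hat, m_hat) if ti == 1)
    den = sum(1.0 / p for ti, p in zip(t, pi_hat) if ti == 1)
    gamma_hat = num / den  # the n^{-1} factors in (10) cancel
    return mu_ols + gamma_hat, gamma_hat


def form8(t, y, pi_hat, h_tilde):
    """Estimator (8): average of t*y/pi + (t - pi)/pi * h_tilde."""
    total = 0.0
    for ti, yi, p, h in zip(t, y, pi_hat, h_tilde):
        ipw_term = yi / p if ti == 1 else 0.0  # t*y is 0 when y is missing
        total += ipw_term + (ti - p) / p * h
    return total / len(t)


# Hypothetical toy data and stand-in fitted quantities.
t = [1, 0, 1, 1, 0, 1]
y = [2.0, None, 3.0, 5.0, None, 1.5]
pi_hat = [0.8, 0.4, 0.5, 0.9, 0.3, 0.6]
m_hat = [2.5, 1.0, 3.5, 4.5, 2.0, 1.0]

mu_bc_pop, gamma_hat = bc_pop(t, y, pi_hat, m_hat)
# AIPW form: h_tilde(x_i) = -(gamma_hat + m_hat_i).
h_tilde = [-(gamma_hat + mi) for mi in m_hat]
mu_aipw_form = form8(t, y, pi_hat, h_tilde)
```

Setting `gamma_hat` to zero in `h_tilde` gives the corresponding AIPW form of μ̂_BC-OLS.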

Similarly, under model I, the influence functions of these estimators are of the form (3) with a(x) equal to (7) with ã(x) = ψ₁/π(x, α*) + ψ₂, where α* is the limit in probability of α̂ and ψ₁ = 1 and ψ₂ = 0 for μ̂_BC-OLS, μ̂_WLS and μ̂_SRR; ψ₁ = 1/E{π₀(x)/π(x, α*)} and ψ₂ = 0 for μ̂_BC-POP; and ψ₁ and ψ₂ for μ̂_BC-NR are more complicated expectations involving π₀(x) and π(x, α*). Thus, under model I, features determining performance of these estimators are the form of ã(x) and how β is estimated through the choice of A(x, β).

We may interpret some of the results in Tables 5, 6 and 8 of KS in light of these observations. Under the “π-model Correct–y-model Incorrect” scenario ($M_{I I} \cap M_{I}^{c}$), μ̂_BC-OLS, μ̂_WLS and μ̂_SRR show some nontrivial differences in performance, which, from above, are likely attributable to differences in m(x, β*). Under the “π-model Incorrect–y-model Correct” scenario ($M_{I} \cap M_{I I}^{c}$), all three estimators share the same ã(x) but use different methods to estimate β, so that any differences are dictated entirely by the choice of A(x, β). The poor performance of μ̂_SRR can be understood from this perspective: “β” for this estimator is actually β in the model m(x, β) used by the other two estimators concatenated with an additional element, the coefficient of $\hat{\pi}_{i}^{-1}$. The A(x, β) for μ̂_SRR thus involves a design matrix that is unstable for small π̂_i, consistent with the comment of KS at the end of their Section 3.

In summary, we believe that studying the performance of estimators via their influence functions can provide useful insights. Our preceding remarks refer to large-sample performance, which depends directly on the influence function. Estimators with the same influence function can exhibit different finite-sample properties. It may be possible via higher-order expansions to gain an understanding of some of this behavior; to the best of our knowledge, this is an open question.

BOTH MODELS INCORRECT

The developments in the previous section are relevant in $M_{I} \cup M_{I I}$ . Key themes of KS are performance of DR and other estimators outside this class; that is, when both the models π(x, α) and m(x, β) are incorrectly specified, and choice of estimator under these circumstances.

One way to study performance in this situation is through simulation. KS have devised a very interesting and instructive specific simulation scenario that highlights some important features of various estimators. In particular, the KS scenario emphasizes the difficulties encountered with some of the DR estimators when π(x_i, α̂) is small for some x_i. Indeed, in our experience, poor performance of DR and IPW estimators in practice can result from a few small π(x_i, α̂). When there are small π(x_i, α̂), as noted by KS, responses are not observed for some portion of the x space. Consequently, estimators like μ̂_OLS rely on extrapolation into that part of the x space. KS have constructed a scenario where failure to observe y in a portion of the x space can wreak havoc on some estimators that make use of the π(x_i, α̂) but has minimal impact on the quality of extrapolations for these x based on m(x, β̂). One could equally well build a scenario where the x for which y is unobserved are highly influential for the regression m(x, β) and hence could result in deleterious performance of μ̂_OLS. We thus reiterate the remark of KS that, although simulations can be illuminating, they cannot yield broadly applicable conclusions.

Given this, we offer some thoughts on other strategies for deriving estimators that may have some robustness properties under the foregoing conditions, that is, offer good performance outside $M_{I} \cup M_{I I}$. One approach may be to search outside the class of DR estimators valid under $M_{I} \cup M_{I I}$. For example, as suggested by the simulations of KS, estimators in the spirit of μ̂_π-cov, which impose additional assumptions rendering them DR in the strict sense only in a subset of $M_{I} \cup M_{I I}$, may compensate for this restriction by yielding more robust performance outside $M_{I} \cup M_{I I}$; further study along these lines would be interesting. An alternative tactic for searching outside the class of DR estimators valid under $M_{I} \cup M_{I I}$ may be to consider the form of influence functions (5) for estimators valid under $M_{I} \cap M_{I I}$. For instance, a “hybrid” estimator of the form

$$n^{-1} \sum_{i=1}^{n} \left[ m(x_{i}, \hat{\beta}) \, I\{\pi(x_{i}, \hat{\alpha}) < \delta\} + \left\{\frac{t_{i} y_{i}}{\pi(x_{i}, \hat{\alpha})} + \frac{t_{i} - \pi(x_{i}, \hat{\alpha})}{\pi(x_{i}, \hat{\alpha})} \tilde{h}(x_{i})\right\} I\{\pi(x_{i}, \hat{\alpha}) \geq \delta\} \right],$$

for δ small, may take advantage of the desirable properties of both μ̂_OLS and DR estimators.
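A minimal sketch of such a hybrid estimator follows. It fixes h̃(x_i) = −m(x_i, β̂), the usual DR choice, purely for concreteness; the function name `hybrid`, the toy data, the stand-in values `pi_hat` and `m_hat`, and the thresholds are all hypothetical and carry no recommendation about how δ should be chosen.

```python
def hybrid(t, y, pi_hat, m_hat, delta):
    """Use m_hat_i where pi_hat_i < delta, and the DR summand otherwise."""
    total = 0.0
    for ti, yi, p, mi in zip(t, y, pi_hat, m_hat):
        if p < delta:
            # Small estimated propensity: fall back on the regression prediction.
            total += mi
        else:
            # DR summand, i.e. the (8)-type term with h_tilde = -m_hat_i.
            ipw_term = yi / p if ti == 1 else 0.0  # t*y is 0 when y is missing
            total += ipw_term - (ti - p) / p * mi
    return total / len(t)


# Hypothetical toy data and stand-in fitted quantities.
t = [1, 0, 1, 1, 0, 1]
y = [2.0, None, 3.0, 5.0, None, 1.5]
pi_hat = [0.8, 0.4, 0.5, 0.9, 0.3, 0.6]
m_hat = [2.5, 1.0, 3.5, 4.5, 2.0, 1.0]

mu_all_dr = hybrid(t, y, pi_hat, m_hat, delta=0.0)    # no subject falls below delta
mu_mixed = hybrid(t, y, pi_hat, m_hat, delta=0.35)    # only pi_hat = 0.3 is replaced
mu_all_ols = hybrid(t, y, pi_hat, m_hat, delta=1.1)   # every subject falls below delta
```

The two extreme thresholds recover the DR estimator and the average of fitted values, respectively, so δ interpolates between them.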

A second possible strategy for identifying robust estimators arises from the following observation. Consider the estimator

$$n^{-1} \sum_{i=1}^{n} \left\{\frac{t_{i} y_{i}}{\pi(x_{i})} - \frac{t_{i} - \pi(x_{i})}{\pi(x_{i})} m(x_{i}, \hat{\beta})\right\}. \tag{11}$$

If π(x_i ) = π(x_i, α̂), then (11) yields one form of a DR estimator. If π(x_i ) ≡ 1, then (11) results in the imputation estimator. If π(x_i ) = ∞, (11) reduces to μ̂_OLS. This suggests that it may be possible to develop estimators based on alternative choices of π(x_i ) that may have good robustness properties. For example, a method for obtaining estimators π(x_i, α̂) that shrinks these toward a common value may prove fruitful. The suggestion of KS to move away from logistic regression models for π(x_i, α) is in a similar spirit.
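The special cases of (11) can be checked directly. This is a small sketch with hypothetical names and toy data, and with stand-in fitted values `m_hat`. Because π = ∞ cannot be evaluated in floating point (it produces a not-a-number in the second term), a very large finite value is used to approximate that limit.

```python
def estimator_11(t, y, m_hat, pi):
    """Estimator (11): average of t*y/pi - (t - pi)/pi * m_hat."""
    total = 0.0
    for ti, yi, mi, p in zip(t, y, m_hat, pi):
        ipw_term = yi / p if ti == 1 else 0.0  # t*y is 0 when y is missing
        total += ipw_term - (ti - p) / p * mi
    return total / len(t)


# Hypothetical toy data: t_i = 1 means y_i is observed.
t = [1, 0, 1, 1, 0]
y = [2.0, None, 3.0, 5.0, None]
m_hat = [2.5, 1.0, 3.5, 4.5, 2.0]  # stand-ins for m(x_i, beta_hat)

# pi identically 1: observed y where available, m_hat otherwise (imputation).
mu_pi_one = estimator_11(t, y, m_hat, [1.0] * len(t))
# pi very large: the first term vanishes and (t - pi)/pi tends to -1.
mu_pi_large = estimator_11(t, y, m_hat, [1e12] * len(t))
```

Intermediate choices, for example estimated propensities shrunk toward a common value, can be passed through the same `pi` argument.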

Finally, we note that yet another approach to developing estimators would be to start with the premise that one makes no parametric assumption on the forms of E(y|x) and E(t|x) beyond some mild smoothness conditions. Here, it is likely that first-order asymptotic theory, as in the previous section, may no longer be applicable. It may be necessary to use higher-order asymptotic theory to make progress in this direction; see, for example, Robins and van der Vaart (2006).

CONCLUDING REMARKS

We again compliment the authors for their thoughtful and insightful article, and we appreciate the opportunity to offer our perspectives on this important problem. We look forward to new methodological developments that may overcome some of the challenges brought into focus by KS in their article.

Acknowledgments

This research was supported in part by Grants R01-CA051962, R01-CA085848 and R37-AI031789 from the National Institutes of Health.

Contributor Information

Anastasios A. Tsiatis is Drexel Professor of Statistics at North Carolina State University, Raleigh, North Carolina 27695-8203, USA (e-mail: tsiatis@stat.ncsu.edu).

Marie Davidian is William Neal Reynolds Professor of Statistics at North Carolina State University, Raleigh, North Carolina 27695-8203, USA (e-mail: davidian@stat.ncsu.edu).

References

Bang H, Robins JM. Doubly robust estimation in missing data and causal inference models. Biometrics. 2005;61:962–972. doi: 10.1111/j.1541-0420.2005.00377.x. MR2216189.
Davidian M, Tsiatis AA, Leon S. Semiparametric estimation of treatment effect in a pretest–posttest study with missing data. Statist Sci. 2005;20:261–301. doi: 10.1214/088342305000000151. MR2189002.
Lunceford JK, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study. Statistics in Medicine. 2004;23:2937–2960. doi: 10.1002/sim.1903.
Molenberghs G. Discussion of “Semiparametric estimation of treatment effect in a pretest–posttest study with missing data,” by M. Davidian, A. A. Tsiatis and S. Leon. Statist Sci. 2005;20:289–292. doi: 10.1214/088342305000000151. MR2189002.
Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. J Amer Statist Assoc. 1994;89:846–866. MR1294730.
Robins J, van der Vaart A. Adaptive nonparametric confidence sets. Ann Statist. 2006;34:229–253. MR2275241.
Scharfstein DO, Rotnitzky A, Robins JM. Rejoinder to “Adjusting for nonignorable drop-out using semiparametric nonresponse models”. J Amer Statist Assoc. 1999;94:1135–1146. MR1731478.
Tan Z. A distributional approach for causal inference using propensity scores. J Amer Statist Assoc. 2006;101:1619–1637. MR2279484.
Tsiatis AA. Semiparametric Theory and Missing Data. Springer; New York: 2006. MR2233926.
