On doubly robust estimation in a semiparametric odds ratio model

Eric J Tchetgen Tchetgen; James M Robins; Andrea Rotnitzky

doi:10.1093/biomet/asp062

Biometrika. 2009 Dec 8;97(1):171–180. doi: 10.1093/biomet/asp062

PMCID: PMC3412601 PMID: 23049119

Abstract

We consider the doubly robust estimation of the parameters in a semiparametric conditional odds ratio model. Our estimators are consistent and asymptotically normal in a union model that assumes either of two variation independent baseline functions is correctly modelled but not necessarily both. Furthermore, when either outcome has finite support, our estimators are semiparametric efficient in the union model at the intersection submodel where both nuisance function models are correct. For general outcomes, we obtain doubly robust estimators that are nearly efficient at the intersection submodel. Our methods are easy to implement as they do not require the use of the alternating conditional expectations algorithm of Chen (2007).

Some key words: Doubly robust, Generalized odds ratio, Locally efficient, Semiparametric logistic regression

1. Introduction

Given a random vector O = (Y, A, L) the conditional odds ratio function γ (Y, A, y₀, a₀, L) between A and Y given L at a given base point (a₀, y₀) is

\begin{array}{rcl} γ (Y, A, L) & = & \frac{f (Y | A, L) f (y_{0} | a_{0}, L)}{f (y_{0} | A, L) f (Y | a_{0}, L)} \\ & = & \frac{g (A | Y, L) g (a_{0} | y_{0}, L)}{g (a_{0} | Y, L) g (A | y_{0}, L)}, \end{array}

where the vectors Y and A can take either discrete values, continuous values, or a mixture of both, L is a high-dimensional vector of measured auxiliary covariates, (a₀, y₀) is a user specified point in the sample space and f (Y | A, L), g(A | Y, L) and h(A, Y | L) are, respectively, the conditional densities of Y given A and L, the conditional density of A given Y and L and the joint conditional density of A and Y given L with respect to a dominating measure μ. The odds ratio function is a particularly useful measure of association when Y and A take both discrete and continuous values. For instance, A and Y could each be a mixture of a discrete component encoding, say, the presence or absence of a given bacterium and a continuous component encoding the bacterial counts when it is present. In such a case, as argued by Chen (2007), a complete characterization of the association between bacterium A and bacterium Y given L would require separate comparisons of the probabilities of absence of one bacterium when the other bacterium is either absent or present at a particular concentration, and of the concentration distribution for one bacterium when the other bacterium is either absent or present at a particular concentration. Instead, the direct estimation of the odds ratio function relating bacterium A to bacterium Y given covariates L provides a unified solution to this problem and obviates the need for separate analyses.
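As a small numerical illustration of the definition above, the following sketch checks that the prospective and retrospective forms of the odds ratio function agree. It assumes finite supports for Y and A and a single fixed stratum of L; the joint probabilities and the helper name `odds_ratio` are hypothetical and are not taken from the paper.

```python
import numpy as np

def odds_ratio(table):
    """Cross-ratio of a positive table[y, a] against the base point (index 0, index 0)."""
    return table * table[0, 0] / (table[0:1, :] * table[:, 0:1])

# Hypothetical joint pmf h(y, a | L = l) on a 3 x 4 grid, for one fixed stratum of L.
rng = np.random.default_rng(0)
h = rng.uniform(0.5, 1.5, size=(3, 4))
h /= h.sum()

f = h / h.sum(axis=0, keepdims=True)  # f(y | a, l): each column sums to one
g = h / h.sum(axis=1, keepdims=True)  # g(a | y, l): each row sums to one

gamma_prospective = odds_ratio(f)     # first line of the display, built from f
gamma_retrospective = odds_ratio(g)   # second line of the display, built from g
```

Both arrays coincide with the cross-ratio of the joint table itself, and their first row and first column equal one, reflecting that the odds ratio function is 1 at the base point.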

Given n independent and identically distributed copies of O, Chen (2007) proposed a locally efficient iterative estimator of the parameter ψ₀ in a semiparametric model ℬ that specifies (i) γ (Y, A, L) is equal to a known function γ (Y, A, L; ψ) evaluated at the unknown true p-dimensional parameter vector ψ₀, i.e.

γ (Y, A, L; ψ_{0}) = γ (Y, A, L),

(1)

where γ (Y, A, L; ψ) takes the value 1 if A = a₀, Y = y₀, or ψ = 0, so ψ₀ = 0 encodes the null hypothesis that Y and A are conditionally independent given L, and (ii) either, but not necessarily both, of (a) a given parametric model f (Y | a₀, L; θ) for f (Y | a₀, L) or (b) a parametric model g(A | y₀, L; α) for g(A | y₀, L) is correct. Model ℬ is referred to as a union model because it is the union of the model 𝒞 that assumes that (i) and (iia) are true and the model 𝒟 that assumes that (i) and (iib) are true. An estimator of ψ₀ that is consistent and asymptotically normal under this union model is referred to as doubly robust because, given equation (1), the estimator is consistent and asymptotically normal for ψ₀ if one has succeeded in specifying either a correct model for f (Y | a₀, L) or a correct model for g(A | y₀, L), thus giving the data analyst two chances, rather than one, to obtain valid inference for ψ₀.

An example of a simple parametric model for the odds ratio function is the bilinear log-odds ratio model (Chen, 2003, 2004). It assumes that γ (Y, A, L; ψ₀) = exp{ψ₀(Y − y₀) ⊗ (A − a₀)}, where ⊗ is the direct product. This model includes all of the generalized linear regression models with canonical link functions as special cases. In the case of stratified 2 × 2 tables, it implies homogeneous odds ratios, but is easily extended to the case of nonhomogeneous odds ratios. Other interesting examples of odds ratio models are given by Chen (2007).
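To make the link with canonical-link generalized linear models concrete, the sketch below checks one instance numerically: a Poisson log-linear regression of a count Y on a scalar A, for which the odds ratio function at base point (0, 0) is exp(ψ y a). The coefficient values and function names are hypothetical, and covariates L are omitted for brevity.

```python
import math

BETA0, PSI = 0.3, 0.7  # hypothetical regression coefficients

def poisson_pmf(y, mean):
    return math.exp(-mean) * mean**y / math.factorial(y)

def f(y, a):
    """f(y | a): Poisson with log link and mean exp(BETA0 + PSI * a)."""
    return poisson_pmf(y, math.exp(BETA0 + PSI * a))

def gamma(y, a, y0=0, a0=0.0):
    """Odds ratio function relative to the base point (y0, a0)."""
    return f(y, a) * f(y0, a0) / (f(y0, a) * f(y, a0))

pairs = [(y, a) for y in range(6) for a in (0.0, 0.5, 1.0, 2.3)]
# Largest relative gap between the odds ratio and the bilinear form exp(PSI * y * a).
max_rel_gap = max(
    abs(gamma(y, a) - math.exp(PSI * y * a)) / math.exp(PSI * y * a)
    for y, a in pairs
)
```

The intercept cancels from the odds ratio, which is why only ψ appears in the bilinear form.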

Unfortunately, Chen’s aforementioned locally efficient doubly robust estimator of ψ₀ under model ℬ is computationally very demanding, especially when A and Y have multiple continuous components. The main contribution of our paper is to provide novel and highly efficient doubly robust estimators of ψ₀ that are substantially easier to compute than those of Chen.

2. Preliminaries

Before describing our new approach, we briefly summarize Chen’s results. He considered the following parametric and semiparametric approaches to the estimation of ψ₀: a prospective likelihood approach under the model 𝒞 that assumes that one has correctly modelled the nuisance baseline function f (Y | a₀, L); a retrospective likelihood approach under the model 𝒟 that assumes that one has correctly specified a model for the nuisance baseline function g(A | y₀, L); a joint likelihood approach under the intersection model that assumes that both models 𝒞 and 𝒟 are correct; and a doubly robust locally semiparametric efficient approach under the union model ℬ of § 1.

In his doubly robust approach, Chen establishes that in the semiparametric model 𝒜 characterized by the sole restriction (1), the density h(A, Y | L) can be written as h(A, Y | L; ψ₀), where

h (A, Y | L; ψ) = \frac{γ (Y, A, L; ψ) f (Y | L, A = a_{0}) g (A | Y = y_{0}, L)}{\int γ (y, a, L; ψ) f (y | L, A = a_{0}) g (a | Y = y_{0}, L) d μ (a, y)},

(2)

f (y | L, A = a₀) and g(a | Y = y₀, L) are the unknown conditional densities that generated the data and are solely restricted by ∫ γ (y, a, L) f (y | L, A = a₀)g(a | Y = y₀, L)dμ(a, y) < ∞ almost everywhere. Then, he specifies parametric models f (Y | a₀, L; θ) and g(A | y₀, L; α) for the unknown nuisance baseline functions f (y | a₀, L) and g(a | y₀, L), obtains profile estimates θ̂(ψ) and α̂ (ψ) of the nuisance parameters θ and α and calculates the efficient score Ŝ_eff (ψ) ≡ S_eff {θ̂(ψ), α̂ (ψ), ψ} for ψ in the semiparametric model 𝒜 evaluated at the law [γ (y, a, l; ψ), f {y | a₀, l; θ̂(ψ)}, g{a |y₀,l; α̂(ψ)}] indexed by {θ̂(ψ), α̂ (ψ), ψ}. Next, he estimates ψ₀ with the solution ψ̂_eff to P_n{Ŝ_eff (ψ)} = 0, where P_n(H) = n⁻¹ ∑_i H_i, and proves that ψ̂_eff is regular and asymptotically linear and thus consistent and asymptotically normal under the union model ℬ. Further general results of Robins and Rotnitzky (2001) imply that Ŝ_eff (ψ) is also the efficient score for ψ in model ℬ under the law [γ (y, a, l; ψ), f {y | a₀, l; θ̂(ψ)}, g{a |y₀, l; α̂(ψ)}]. It follows that the estimator ψ̂_eff is locally semiparametric efficient under model ℬ at the intersection submodel with both nuisance models correct; that is, ψ̂_eff attains the semiparametric efficiency bound for the model ℬ when both nuisance models happen to hold.
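The representation (2) can be checked on a finite grid. The sketch below, which assumes discrete Y and A, a single stratum of L, base point (0, 0) and a bilinear odds ratio, builds h from γ and the two baseline densities and then recovers each ingredient from h. All numerical values and the name `joint_density` are illustrative assumptions.

```python
import numpy as np

def joint_density(gamma, f0, g0):
    """Equation (2) on a finite grid: h(y, a) proportional to gamma(y, a) f0(y) g0(a)."""
    unnormalized = gamma * f0[:, None] * g0[None, :]
    return unnormalized / unnormalized.sum()

y_vals = np.array([0.0, 1.0, 2.0])
a_vals = np.array([0.0, 1.0, 2.0, 3.0])
psi = 0.4                                        # hypothetical odds ratio parameter
gamma = np.exp(psi * np.outer(y_vals, a_vals))   # bilinear model, equals 1 when y = 0 or a = 0
f0 = np.array([0.5, 0.3, 0.2])                   # baseline f(y | A = 0)
g0 = np.array([0.4, 0.3, 0.2, 0.1])              # baseline g(a | Y = 0)

h = joint_density(gamma, f0, g0)

f_at_a0 = h[:, 0] / h[:, 0].sum()                # conditional law of Y given A = 0 under h
g_at_y0 = h[0, :] / h[0, :].sum()                # conditional law of A given Y = 0 under h
gamma_implied = h * h[0, 0] / (h[0:1, :] * h[:, 0:1])
```

The three recovered objects match the inputs, which illustrates why (ψ, f, g) with these two baselines is a variation independent parameterization of h.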

By definition, the efficient score S_eff = Π (S_ψ | $Λ_{nuis}^{⊥}$ ) for a parameter ψ in a given model is the projection of the score S_ψ for ψ onto the orthocomplement $Λ_{nuis}^{⊥}$ to the nuisance tangent space Λ_nuis in the Hilbert space L₂ ≡ L₂(F_O) of zero-mean functions of p dimensions, T ≡ t(A, Y, L) = t(O), with inner product E_{F_O} ( $T_{1}^{T}$ T₂) ≡ E( $T_{1}^{T}$ T₂), and corresponding squared norm ‖T‖² = E(T^TT), where F_O is the distribution function that generated the data. Chen proves that for model 𝒜, the set

Λ_{nuis}^{⊥} = \{υ (Y, A, L) : E \{υ (Y, A, L) | A, L\} = E \{υ (Y, A, L) | Y, L\} = 0\} \cap L_{2}

(3)

contains all functions that have zero-mean conditional on both (A, L) and (Y, L). When both A and Y contain continuous components and ψ₀ ≠ 0, Chen (2007) finds that this projection and therefore S_eff do not exist in closed form and must be computed using the iterative alternating conditional expectations algorithm. Each iteration requires the evaluation, by numerical integration, of conditional expectations, which seriously limits the practicality of Chen’s approach, particularly when A and/or Y have two or more continuous components.

The main contribution of our paper is to show that, even though the projection Π(R | $Λ_{nuis}^{⊥}$ ) of a given random variable R = r(Y, A, L) into the orthocomplement $Λ_{nuis}^{⊥}$ does not exist in closed form when both A and Y contain continuous components, the set $Λ_{nuis}^{⊥}$ does have a closed-form representation, which appears to be new. We use our representation to obtain doubly robust estimators, i.e. consistent and asymptotically normal estimators of ψ₀ in the union model ℬ, that are nearly as efficient as ψ̂_eff under the intersection submodel, yet do not require the alternating conditional expectations algorithm. Moreover, our closed-form representation of $Λ_{nuis}^{⊥}$ is of independent interest, with applications beyond the present paper. For example, Vansteelandt et al. (2008) use our representation to construct multiple robust estimators of the parameter encoding the interaction on an additive and multiplicative scale between two exposures A₁ and A₂ in their effects on an outcome Y.

In the special situation where either Y or A has finite support, Bickel et al. (1993) provide a closed-form expression for Π(R | $Λ_{nuis}^{⊥}$ ), which Chen, however, did not use to give a closed-form expression for S_eff. We remedy this oversight and obtain doubly robust locally-efficient closed-form estimating functions when Y and/or A has finite support; some emphasis is given to the important case of dichotomous Y which, incidentally, coincides with the semiparametric logistic regression model.

In the following, for a vector υ we write υ^⊗2 = υυ^T. To simplify notation, we suppose y₀ =0 and a₀ = 0 throughout, so that γ (Y, 0, L; ψ) = γ (0, A, L; ψ) = γ (Y, A, L; 0) = 1. We shall also use the following definition.

Definition 1. Given conditional densities f^†(Y | L) and g^†(A | L), the density h^†(Y, A | L) = f^†(Y | L)g^†(A | L) that makes A and Y conditionally independent given L is an admissible independence density if the joint law of (Y, A) given L under h^†(·, ·| L) is absolutely continuous with respect to the true law of (Y, A) given L with probability one. Furthermore, E^†(· |·, L) denotes conditional expectations with respect to h^†(Y, A | L).

3. Main result

As noted previously, under model 𝒜 characterized by restriction (1), Chen showed that $Λ_{nuis}^{⊥}$ is given by the set (3). We now provide a new closed-form representation of this set. To do so, for a fixed choice of admissible independence density h^†(Y, A | L) = f^†(Y | L)g^†(A | L) and any p-dimensional function d of (Y, A, L), define the random vector U(ψ; d, h^†) as

U (ψ; d, h^{†}) \equiv u (O; ψ; d, h^{†}) \equiv \{d (Y, A, L) - d^{†} (Y, A, L)\} \frac{h^{†} (Y, A | L)}{h (Y, A | L; ψ)}

with h (Y, A | L; ψ) defined in (2) and d^†(Y, A, L) = E^†(D | A, L) + E^†(D | Y, L) − E^†(D | L) for D ≡ d(Y, A, L). The following theorem gives the influence functions of regular asymptotically linear estimators of ψ₀ in model 𝒜 and will form the basis for our doubly robust approach.

Theorem 1. Given an admissible independence density h^†, an alternative representation of the set $Λ_{nuis}^{⊥}$ of (3) is $Λ_{nuis}^{⊥}$ = {U(ψ₀; d, h^†) : d unrestricted} ∩ L₂.

Proof. One can verify by explicit calculation that {U(ψ₀; d, h^†) : d} ∩ L₂ ⊆ $Λ_{nuis}^{⊥}$ . To show the other direction, take any function υ(A, Y, L) in $Λ_{nuis}^{⊥}$ and let d(Y, A, L) = υ(A, Y, L)h(A, Y | L)/ h^†(Y, A | L). Then d^†(Y, A, L) = 0, and hence υ(A, Y, L) = U(ψ₀; d, h^†), since ∫ d(y, A, L) f^†(y | L)dμ(y) = ∫ υ(A, y, L) f (y | A, L)g(A | L)/g^†(A | L)dμ(y) = E{υ(A, Y, L) | A, L}g(A | L)/g^†(A | L) = 0 and ∫ d(Y, a, L)g^†(a | L)dμ(a) = E{υ(A, Y, L) | Y, L} f (Y | L)/ f^†(Y | L) = 0.
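Both directions of the proof can be illustrated numerically. The sketch below assumes finite supports for Y and A and a single stratum of L; the true joint law h, the independence density h^† and the function d are arbitrary random draws, and the helper `u_function` is a hypothetical name rather than anything defined in the paper.

```python
import numpy as np

def u_function(d, f_dag, g_dag, h):
    """U(d, h_dag) = {d - d_dag} h_dag / h on a finite grid indexed by [y, a]."""
    h_dag = np.outer(f_dag, g_dag)
    e_given_a = f_dag @ d             # E_dag(D | A = a), one entry per a
    e_given_y = d @ g_dag             # E_dag(D | Y = y), one entry per y
    e_marginal = f_dag @ d @ g_dag    # E_dag(D)
    d_dag = e_given_a[None, :] + e_given_y[:, None] - e_marginal
    return (d - d_dag) * h_dag / h

rng = np.random.default_rng(1)
ny, na = 3, 4
h = rng.uniform(0.5, 1.5, size=(ny, na))
h /= h.sum()                                            # true joint law of (Y, A)
f_dag = rng.uniform(0.5, 1.5, size=ny); f_dag /= f_dag.sum()
g_dag = rng.uniform(0.5, 1.5, size=na); g_dag /= g_dag.sum()
d = rng.normal(size=(ny, na))                           # arbitrary user-supplied function

u = u_function(d, f_dag, g_dag, h)
mean_given_a = (u * h).sum(axis=0) / h.sum(axis=0)      # E{U | A = a} under the true law
mean_given_y = (u * h).sum(axis=1) / h.sum(axis=1)      # E{U | Y = y} under the true law

# Converse direction: starting from v = u in the set (3), d = v h / h_dag reproduces v.
d_back = u * h / np.outer(f_dag, g_dag)
u_back = u_function(d_back, f_dag, g_dag, h)
```

Here `mean_given_a` and `mean_given_y` vanish up to rounding error, and `u_back` equals `u`, mirroring the two inclusions in the proof.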

Remark. We give an alternative, more abstract, proof of the fact that U(ψ₀; d, h^†) ≡ {d(Y, A, L) − d^†(Y, A, L)}h^†(Y, A | L)/ h(Y, A | L; ψ₀) ∈ $Λ_{nuis}^{⊥}$ . Given an admissible independence density h^†, let $Λ_{nuis}^{⊥, †}$ be the set (3) with expectations taken under h^†. It is well known that when, as under h^†, A and Y are conditionally independent given L, $Λ_{nuis}^{⊥, †}$ admits the representation {d(Y, A, L) − d^†(Y, A, L) : d}. Then d(Y, A, L) − d^†(Y, A, L) ∈ $Λ_{nuis}^{⊥, †}$ implies {d(Y, A, L) − d^†(Y, A, L)}h^†(Y, A | L)/ h(Y, A | L; ψ₀) ∈ $Λ_{nuis}^{⊥}$ , by the Radon–Nikodym theorem.

By standard semiparametric theory (Bickel et al., 1993), Theorem 1 implies that if ψ̂ is a regular and asymptotically linear estimator of ψ₀ in model 𝒜, then given any admissible independence density h^†, there exists a p-dimensional function D ≡ d(O) such that n^1/2(ψ̂ − ψ₀) = n^1/2 P_n[E{∂U(ψ; d, h^†)/∂ψ^T|_{ψ=ψ₀}}⁻¹U(ψ₀; d, h^†)] + o_p(1). Furthermore, this also implies that any regular and asymptotically linear estimator of ψ₀ in model 𝒜 can be obtained, up to asymptotic equivalence, as the solution to an equation $\sum_{i = 1}^{n} U_{i} (ψ; d, h^{†}) = 0$ . However, these solutions are infeasible because h(Y, A | L; ψ) depends on the unknown conditional densities f (y | L, A = 0) and g(a | Y = 0, L), which must be estimated from the data. While a nonparametric smoothing method would, in principle, be the preferred approach to estimate these densities, its finite-sample performance is bound to be poor for continuous L of moderate to high dimension because of the curse of dimensionality. A practical alternative is to proceed as in Chen (2007) and to impose working models of reduced dimension for the unknown baseline functions f (Y | A = 0, L) and g(A | Y = 0, L). Hence, we specify variation independent parametric models g(A | Y = 0, L; α) for g(A | Y = 0, L) and f (Y | A = 0, L; θ) for f (Y | A = 0, L) with unknown finite-dimensional parameters α and θ. Since we cannot be sure that either f (Y | A = 0, L; θ) or g(A | Y = 0, L; α) is correctly specified, we shall construct a doubly robust estimator of ψ₀ that is guaranteed to be consistent and asymptotically normal if either, but not necessarily both, of these working models is correct.

To do so, we adopt the notational convention introduced in § 1 that given a function such as U(ψ₀; d, h^†) which depends on the unknown law h(Y, A | L; ψ₀), we let U (ψ, θ, α; d, h^†) be the function U(ψ₀; d, h^†) evaluated at the law {γ (y, a, l; ψ), f (y | 0, l; θ), g(a | 0, l; α)}. Then, Theorem 2 shows that, under standard regularity conditions, ψ̂ ≡ ψ̂ (d; ĥ^†) is doubly robust, where ψ̂ (d; ĥ^†) is the solution to

P_{n} [U \{ψ, \hat{θ} (ψ), \hat{α} (ψ); d, {\hat{h}}^{†}\}] = 0,

d(Y, A, L) is a user-supplied function,

\hat{α} (ψ) = \underset{α}{arg max} \sum_{i = 1}^{n} log \{g (A_{i} | Y_{i}, L_{i}; ψ, α)\}

is the profile maximum likelihood estimator of α at a fixed ψ, g(A | Y, L; ψ, α) = γ(Y, A, L; ψ)g(A | Y = 0, L; α)/∫g(a | Y = 0, L; α)γ (Y, a, L; ψ) dμ(a),

\hat{θ} (ψ) = \underset{θ}{arg max} \sum_{i = 1}^{n} log \{f (Y_{i} | A_{i}, L_{i}; ψ, θ)\}

is the profile maximum likelihood estimator of θ at a fixed ψ, f (Y | A, L; ψ, θ) = γ(Y, A, L; ψ) f (Y | A = 0, L; θ)/∫γ (y, A, L; ψ) f (y | A = 0, L; θ)dμ(y), and ĥ^†(Y, A | L) ≡ f^†(Y | L; ω̂_f)g^†(A | L; ω̂_g). Here f^†(Y | L; ω̂_f) is a user-specified density when ω̂_f is chosen to be nonrandom; otherwise, it is a user-supplied parametric model f^†(Y | L; ω_f) for the density of Y | L evaluated at the value ω̂_f maximizing $\prod_{i = 1}^{n} f^{†} (Y_{i} | L_{i}; ω_{f})$ . Similarly, g^†(A | L; ω̂_g) is a user-specified density when ω̂_g is chosen to be nonrandom; otherwise, it is a user-supplied parametric model g^†(A | L; ω_g) for the density of A | L evaluated at the value ω̂_g maximizing $\prod_{i = 1}^{n} g^{†} (A_{i} | L_{i}; ω_{g})$ .

Theorem 2. Suppose ĥ^†(Y, A | L) converges in probability to an admissible independence density h^†(Y, A | L). Then subject to the regularity conditions given in the Appendix, under the union model ℬ characterized by (1) and the assumption that either the model f (y | L, A = 0; θ) or g(a | Y = 0, L; α) is correct, n^1/2(ψ̂ − ψ₀) is regular and asymptotically linear, with influence function

E [\frac{\partial}{\partial ψ} M \{ψ, θ^{*} (ψ_{0}), α^{*} (ψ_{0}); d, h^{†}\} |_{ψ = ψ_{0}}]^{- 1} M \{ψ_{0}, θ^{*} (ψ_{0}), α^{*} (ψ_{0}); d, h^{†}\}

(4)

and thus converges in distribution to N(0, Σ), where

Σ = E \{(E [\frac{\partial}{\partial ψ} M \{ψ, θ^{*} (ψ_{0}), α^{*} (ψ_{0}); d, h^{†}\} |_{ψ = ψ_{0}}]^{- 1} M \{ψ_{0}, θ^{*} (ψ_{0}), α^{*} (ψ_{0}); d, h^{†}\})^{\otimes 2}\}

with θ*(ψ) and α*(ψ) denoting the probability limits of θ̂(ψ) and α̂(ψ), respectively, and

\begin{array}{rcl} M (ψ, θ, α; d, h^{†}) & = & U (ψ, θ, α; d, h^{†}) - E \{\frac{\partial}{\partial θ} U (ψ, θ, α; d, h^{†})\} E \{\frac{\partial}{\partial θ} C (ψ, θ)\}^{- 1} C (ψ, θ) \\ & & - E \{\frac{\partial}{\partial α} U (ψ, θ, α; d, h^{†})\} E \{\frac{\partial}{\partial α} B (ψ, α)\}^{- 1} B (ψ, α), \end{array}

(5)

where $C (ψ, θ) = \frac{\partial}{\partial θ} log f (Y | A, L; ψ, θ)$ and $B (ψ, α) = \frac{\partial}{\partial α} log {g (A | Y, L; ψ, α)}$ are the scores for θ and α, respectively.

A consistent estimator of Σ is

\hat{Σ} = n^{- 1} \sum_{i = 1}^{n} ([n^{- 1} \sum_{j = 1}^{n} \frac{\partial}{\partial ψ} {\hat{M}}_{j} \{ψ, \hat{θ} (\hat{ψ}), \hat{α} (\hat{ψ}); d, {\hat{h}}^{†}\} |_{ψ = \hat{ψ}}]^{- 1} {\hat{M}}_{i} \{\hat{ψ}, \hat{θ} (\hat{ψ}), \hat{α} (\hat{ψ}); d, {\hat{h}}^{†}\})^{\otimes 2},

where M̂ is defined as M but with expectations replaced by their empirical versions. Thus, Σ̂ can easily be used to obtain Wald-type confidence intervals for components of ψ₀.

Remark. When ω̂_f and/or ω̂_g are random, the asymptotic distribution of ψ̂ (d; ĥ^†) is equal to that of ψ̂ (d; h^†) with h^† = f^† × g^† = f^†(Y | L) × g^†(A | L) the probability limit of f̂^† × ĝ^† = f^†(Y | L; ω̂_f) × g^†(A | L, ω̂_g). In practice, it will be convenient to use an estimated density ĥ^† rather than a fixed choice h^†.

4. Local efficiency

We first consider the case in which both A and Y contain continuous components. Chen’s estimator ψ̂_eff solving P_n{Ŝ_eff (ψ)} = 0 is locally efficient in model ℬ at the intersection submodel. However, as previously noted, the estimated efficient score Ŝ_eff (ψ) does not exist in closed form when ψ ≠ 0 and the alternating conditional expectations algorithm is needed to compute Ŝ_eff (ψ) and thus ψ̂_eff. In this section, we propose estimators that exploit our representation of the set (3) and thus are easier to compute than ψ̂_eff, and yet are nearly locally efficient, i.e. have asymptotic variance almost equal to that of ψ̂_eff at the intersection submodel. The first estimator ψ̂(d_ind, ${\hat{h}}_{ind}^{†}$ ) is the easiest to compute, although its asymptotic variance is close to that of ψ̂_eff only when all components of ψ₀ are close to zero; nonetheless ψ̂(d_ind, ${\hat{h}}_{ind}^{†}$ ) will be useful in practice, because in many epidemiologic studies, the investigator will know from previously published results that all components of ψ₀ are small. Specifically, we set d_ind(Y, A, L) ≡ [∂ log{γ^T(Y, A, L; ψ)}/∂ψ]|_{ψ = 0} and ${\hat{h}}_{ind}^{†}$ (Y, A | L) = ${\hat{f}}_{ind}^{†}$ (Y | L) ${\hat{g}}_{ind}^{†}$ (A | L), with ${\hat{f}}_{ind}^{†}$ (Y | L) ≡ f {Y | L, A; ψ = 0, θ̂ (ψ = 0)} and ${\hat{g}}_{ind}^{†}$ (A | L) ≡ g{A | Y, L; ψ = 0, α̂ (ψ = 0)}. When the true parameter ψ₀ is 0 and thus A and Y are independent given L, ψ̂_eff and ψ̂(d_ind, ${\hat{h}}_{ind}^{†}$ ) have identical limiting distributions under the union model ℬ. This result follows from the fact that, when ψ₀ = 0, S_eff (ψ) = d_ind(Y, A, L) − $d_{ind}^{†}$ (Y, A, L) with h^†(Y, A | L) equal to the true density h(Y, A | L) (Chen, 2007). By continuity, the asymptotic variances of ψ̂(d_ind, ${\hat{h}}_{ind}^{†}$ ) and ψ̂_eff will be close whenever ψ₀ is nearly zero.

When ψ₀ is not known to be nearly zero, we adopt a general approach proposed by Newey (1993). We take a basis system ϕ_j (A, Y, L) (j = 1, . . .) of functions dense in L₂, such as tensor products of trigonometric, wavelet or polynomial bases when the components of A, Y and L are all continuous. For some finite K > dim(ψ), we form the K-dimensional vector U(ψ; ϕ̃_K, h^†) with ϕ̃_K the vector of the first K basis functions and let Ŵ_K (ψ) ≡ U{ψ, θ̂ (ψ), α̂ (ψ); ϕ̃_K, ĥ^†}, and ${\hat{Γ}}_{K} (\tilde{ψ}) = \sum_{i = 1}^{n} {\hat{W}}_{K, i} (\tilde{ψ}) {\hat{W}}_{K, i}^{T} (\tilde{ψ})$ , where ψ̃ is any preliminary doubly robust estimator of ψ₀. Let ψ̃_{K,eff} ≡ ψ̃_{K,eff} (ϕ̃_K, ĥ^†) be the minimizer of the quadratic form $\{\sum_{i = 1}^{n} {\hat{W}}_{K, i} (ψ)\}^{T} \{{\hat{Γ}}_{K} (\tilde{ψ})\}^{-} \{\sum_{i = 1}^{n} {\hat{W}}_{K, i} (ψ)\}$ with {Γ̂_K (ψ̃)}⁻ a generalized inverse of Γ̂_K (ψ̃). Then ψ̃_{K,eff} is consistent and asymptotically normal in the semiparametric union model ℬ; furthermore, with K chosen sufficiently large, the asymptotic variance of n^1/2(ψ̃_{K,eff} − ψ₀) nearly attains the semiparametric efficiency bound for the union model at the intersection submodel with both nuisance models correct. In particular, the inverse of the asymptotic variance of ψ̃_{K,eff} at the intersection submodel is

\begin{array}{rcl} Ω_{K} & = & [E \{\frac{\partial}{\partial ψ^{T}} U (ψ; {\tilde{φ}}_{K}, h^{†}) |_{ψ = ψ_{0}}\}]^{T} Γ_{K}^{-} E \{\frac{\partial}{\partial ψ^{T}} U (ψ; {\tilde{φ}}_{K}, h^{†}) |_{ψ = ψ_{0}}\} \\ & = & E \{S_{ψ} U^{T} (ψ_{0}; {\tilde{φ}}_{K}, h^{†})\} Γ_{K}^{-} [E \{S_{ψ} U^{T} (ψ_{0}; {\tilde{φ}}_{K}, h^{†})\}]^{T}, \end{array}

where $Γ_{K}^{-}$ is a generalized inverse of Γ_K = E{U(ψ₀; ϕ̃_K, h^†)U^T(ψ₀; ϕ̃_K, h^†)}. Thus, Ω_K is the variance of the population least squares regression of S_ψ on the linear span of U(ψ₀; ϕ̃_K, h^†). Because the basis functions ϕ_j are dense in L₂, as K → ∞ the components of U(ψ₀; ϕ̃_K, h^†) become dense in $Λ_{nuis}^{⊥}$ , so that $Ω_{K} \underset{K \to \infty}{\to} {‖ Π (S_{ψ} | Λ_{nuis}^{⊥}) ‖}^{2} = var (S_{ψ, eff})$ , the semiparametric information bound for estimating ψ₀ under model ℬ.
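The least squares interpretation of Ω_K can be illustrated with a toy analogue. The sketch below assumes a finite sample space with six points, a scalar zero-mean score and an arbitrary random basis, none of which come from the paper; it only shows that the variance explained by the first K basis elements is nondecreasing in K and reaches the full variance once the basis spans the whole space.

```python
import numpy as np

rng = np.random.default_rng(4)
m = 6                                         # number of points in the toy sample space
prob = rng.uniform(0.5, 1.5, size=m)
prob /= prob.sum()                            # probabilities of the m points
score = rng.normal(size=m)
score -= prob @ score                         # scalar score, centred to have mean zero
basis = rng.normal(size=(m, m))               # column k plays the role of the k-th element of U

def omega(k):
    """E{S U^T} Gamma^- E{U S} using the first k basis columns."""
    u = basis[:, :k]
    cross = u.T @ (prob * score)              # E{U S}, a k-vector
    gram = u.T @ (prob[:, None] * u)          # Gamma_K = E{U U^T}, a k x k matrix
    return float(cross @ np.linalg.pinv(gram) @ cross)

omegas = [omega(k) for k in range(1, m + 1)]
var_score = float(prob @ score**2)            # squared norm of the score itself
```

In the paper's setting the limit is the squared norm of the projection onto $Λ_{nuis}^{⊥}$ rather than the full variance of S_ψ, because the functions U(ψ₀; ϕ̃_K, h^†) span only that subspace.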

Neither of these two strategies is needed if Y or A has finite support, as an explicit form for the efficient score in this case was given by Bickel et al. (1993). Without loss of generality, assume Y has finite support, say {y₀, y₁, . . ., y_{M−1}}, with y₀ = 0. In the following, we use the result obtained by Bickel et al. (1993) to construct a doubly robust locally-efficient estimating function in model ℬ. We then demonstrate that this estimating function is in fact a particular member of the class of estimating functions in § 3. For clarity of exposition, this demonstration is restricted to the case of dichotomous Y, but can be easily extended to Y with arbitrary finite support.

Consider the vector {I (Y = y₁), . . ., I (Y = y_{M−1})}, which we again denote by Y. Next, let Ψ(A, L; ψ₀) = E{∊(ψ₀)^⊗2 | A, L} and let k ↦ Ũ (ψ₀; k) = [k(A, L) − Ẽ {k(A, L) | L; ψ₀}] × ∊(ψ₀) be a function that maps the space of p × (M − 1) matrix functions of A and L into L₂, where Ẽ {k(A, L) | L; ψ₀} = E{k(A, L) × Ψ(A, L; ψ₀) | L} × E{Ψ(A, L; ψ₀) | L}⁻¹ and ∊(ψ₀) = Y − E(Y | A, L; ψ₀). Then, by Theorem A.4.5 of Bickel et al. (1993), the closed linear set {Ũ (ψ₀; k) : k = k(A, L) unrestricted} ∩ L₂, as k varies over the set of all p × (M − 1)-dimensional functions of A and L, is equal to the set $Λ_{nuis}^{⊥}$ for model 𝒜.

Furthermore, Bickel et al. (1993) show that Ũ {ψ₀; k_eff (ψ₀)} is the efficient score function of ψ in model 𝒜, where k_eff (ψ₀) = [∂ log{ρ^T(A, L; ψ)}/∂ψ] |_{ψ = ψ₀}, with ρ (A, L; ψ) defined to be the (M − 1) × 1 vector with the jth component equal to γ (y_j, A, L; ψ), j = 1, . . ., M − 1. Robins and Rotnitzky (2001) prove that the efficient scores in models 𝒜 and ℬ are identical at the intersection submodel. Therefore, a doubly robust estimator of ψ₀ in model ℬ that is locally efficient at the intersection submodel is obtained by solving either $\sum_{i = 1}^{n} {\tilde{U}}_{i} \{ψ; k_{eff} (ψ), \hat{θ} (ψ), \hat{α} (ψ)\} = 0$ , or $\sum_{i = 1}^{n} {\tilde{U}}_{i} \{ψ; k_{eff} ({\hat{ψ}}_{mle}), \hat{θ} (ψ), \hat{α} (ψ)\} = 0$ , where Ũ (ψ; k_eff, θ, α) is equal to the function Ũ (ψ; k_eff) evaluated at the law {γ (y, a, l; ψ), f (y | A = 0, l; θ), g(a | Y = 0, l; α)} and (ψ̂_mle, θ̂_mle, α̂_mle) is the maximum likelihood estimator in the parametric model h(A, Y | L; ψ, α, θ) for h(A, Y | L).

We next derive a doubly robust locally-efficient estimating function U(ψ, θ, α; d_eff, h^†) in our class that equals Ũ (ψ; k_eff,θ, α), in the special case where Y is dichotomous. This case is of particular interest as model 𝒜 is then equivalent to the familiar semiparametric logistic regression model

logit \{Pr (Y = 1 | A, L; ψ_{0})\} = log \{γ (1, A, L; ψ_{0})\} + η (L)

where y₁ = 1 and η(L) = log[Pr(Y = 1 | A = 0, L)/{1 − Pr(Y = 1 | A = 0, L)}] is an unrestricted function of L. Since Y is binary, any function d(Y, A, L) may be written as Ym(A, L) + n(A, L) with m(A, L) = d(1, A, L) − d(0, A, L) and n(A, L) = d(0, A, L). Given an admissible independence density h^†(Y, A | L) = f^†(Y | L)g^†(A | L), let r ↦ V (ψ₀; r, h^†) = {r(A, L) − r^†(L)} × (−1)^{1−Y} g^†(A | L)/ h(Y, A | L; ψ₀) be a function that maps the space of p-dimensional functions of A and L into L₂, where r^†(L) ≡ E^†{r(A, L) | L}. For a given choice of h^† and d(Y, A, L), U(ψ₀; d, h^†) simplifies to V (ψ₀; r, h^†) with r(A, L) = m(A, L) f^†(1 | L){1 − f^†(1 | L)}.
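The equivalence with semiparametric logistic regression for dichotomous Y can be verified directly. The following sketch assumes a single stratum of L, a scalar A and a bilinear log odds ratio; the parameter values and function names are hypothetical.

```python
import math

def bilinear_log_gamma(y, a, psi):
    """Hypothetical bilinear log odds ratio with base point (0, 0)."""
    return psi * y * a

def pr_y1_given_a(a, psi, f0_1):
    """Pr(Y = 1 | A = a) when f(y | a) is proportional to gamma(y, a) f(y | A = 0)."""
    weight1 = math.exp(bilinear_log_gamma(1, a, psi)) * f0_1
    weight0 = 1.0 - f0_1              # gamma(0, a; psi) = 1 at the baseline level of Y
    return weight1 / (weight1 + weight0)

def logit(p):
    return math.log(p / (1.0 - p))

psi, f0_1 = 0.8, 0.3                  # f0_1 plays the role of Pr(Y = 1 | A = 0, L)
eta = logit(f0_1)                     # eta(L) for this stratum
gaps = [
    abs(logit(pr_y1_given_a(a, psi, f0_1)) - (bilinear_log_gamma(1, a, psi) + eta))
    for a in (-1.0, 0.0, 0.5, 2.0)
]
```

Every entry of `gaps` is zero up to rounding, so the logit of Pr(Y = 1 | A, L) is the log odds ratio plus the baseline log odds.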

Furthermore, by

\frac{{(- 1)}^{1 - Y}}{h (Y, A | L; ψ_{0})} = \frac{{Y - Pr (Y = 1 | A, L; ψ_{0})}}{var (Y | A, L; ψ_{0}) \int h (y, A | L; ψ_{0}) d μ (y)},

we have that

V (ψ_{0}; r, h^{†}) = \frac{\{r (A, L) - r^{†} (L)\} g^{†} (A | L)}{var (Y | A, L; ψ_{0}) \int h (y, A | L) d μ (y)} \times ∊ (ψ_{0}) .
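The reduction of U to V for binary Y, and the residual form of V in the last display, can be checked numerically. The sketch below assumes a discrete A with four levels and a single stratum of L; the joint law, the independence density and d are arbitrary random draws used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
na = 4
h = rng.uniform(0.5, 1.5, size=(2, na))
h /= h.sum()                                   # true h(y, a); row 0 is y = 0, row 1 is y = 1
f_dag = np.array([0.6, 0.4])                   # f_dag(y), so f_dag(1) = 0.4
g_dag = rng.uniform(0.5, 1.5, size=na)
g_dag /= g_dag.sum()
d = rng.normal(size=(2, na))                   # arbitrary d(y, a)

# U(d, h_dag) = {d - d_dag} h_dag / h, computed from the general definition.
h_dag = np.outer(f_dag, g_dag)
d_dag = (f_dag @ d)[None, :] + (d @ g_dag)[:, None] - f_dag @ d @ g_dag
u = (d - d_dag) * h_dag / h

# V(r, h_dag) with r = m f_dag(1) {1 - f_dag(1)} and m(a) = d(1, a) - d(0, a).
m_fun = d[1] - d[0]
r = m_fun * f_dag[1] * (1.0 - f_dag[1])
r_dag = r @ g_dag                              # E_dag{r(A)}
sign = np.array([-1.0, 1.0])[:, None]          # (-1)^(1 - y)
v = (r - r_dag)[None, :] * sign * g_dag[None, :] / h

# Residual form: {r - r_dag} g_dag / {var(Y | A) g(A)} times {Y - Pr(Y = 1 | A)}.
g_marg = h.sum(axis=0)                         # integral of h(y, a) over y
p1 = h[1] / g_marg                             # Pr(Y = 1 | A = a)
resid = np.array([0.0, 1.0])[:, None] - p1[None, :]
v_resid = ((r - r_dag) * g_dag / (p1 * (1.0 - p1) * g_marg))[None, :] * resid
```

The three arrays `u`, `v` and `v_resid` agree entry by entry.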

Thus, since S_eff = Ũ (k_eff) is the efficient score, we conclude that V {ψ₀; r_eff (h^†; ψ₀), h^†} = S_eff with

\begin{array}{rcl} r_{eff} (h^{†}; ψ_{0}) & \equiv & \frac{g (A | L)}{g^{†} (A | L)} \times var (Y | A, L; ψ_{0}) \times [k_{eff} (ψ_{0}) \\ & & - E \{k_{eff} (ψ_{0}) \times var (Y | A, L; ψ_{0}) | L\} \times E \{var (Y | A, L; ψ_{0}) | L\}^{- 1}] . \end{array}
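The choice of r in the display above can be checked to reproduce Ũ(k) = [k − Ẽ{k | L}] ∊. The sketch below assumes binary Y, a discrete scalar A and one stratum of L; it takes k(A) = A, which is what k_eff would be under a bilinear odds ratio with scalar ψ, although the algebraic identity being tested holds for any k. All numerical inputs are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
a_vals = np.array([0.0, 1.0, 2.0, 3.0])
h = rng.uniform(0.5, 1.5, size=(2, 4))
h /= h.sum()                                   # true h(y, a); row 1 is y = 1
g_dag = rng.uniform(0.5, 1.5, size=4)
g_dag /= g_dag.sum()                           # g_dag(a) from the independence density

g_marg = h.sum(axis=0)                         # g(a | L)
p1 = h[1] / g_marg                             # Pr(Y = 1 | A = a)
var_y = p1 * (1.0 - p1)                        # var(Y | A = a)
resid = np.array([0.0, 1.0])[:, None] - p1[None, :]

k = a_vals                                     # hypothetical choice of k(A)
k_tilde = (g_marg * var_y * k).sum() / (g_marg * var_y).sum()   # E{k var | L} / E{var | L}
u_tilde = (k - k_tilde)[None, :] * resid       # the target [k - k_tilde] times the residual

r_eff = g_marg / g_dag * var_y * (k - k_tilde)
r_dag = r_eff @ g_dag                          # E_dag{r_eff(A)}, which is zero by construction
sign = np.array([-1.0, 1.0])[:, None]          # (-1)^(1 - y)
v = (r_eff - r_dag)[None, :] * sign * g_dag[None, :] / h
```

Here `r_dag` vanishes and `v` equals `u_tilde`, so V with this r coincides with the Bickel et al. form of the estimating function.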

Therefore, the solution to either of the following estimating equations is doubly robust and locally semiparametric efficient: $\sum_{i = 1}^{n} U_{i} [ψ, \hat{θ} (ψ), \hat{α} (ψ); d_{eff} \{ψ, \hat{θ} (ψ), \hat{α} (ψ), {\hat{h}}^{†}\}, {\hat{h}}^{†}] = 0$ , or $\sum_{i = 1}^{n} U_{i} [ψ, \hat{θ} (ψ), \hat{α} (ψ); d_{eff} ({\hat{ψ}}_{mle}, {\hat{θ}}_{mle}, {\hat{α}}_{mle}, {\hat{h}}^{†}), {\hat{h}}^{†}] = 0$ , where d_eff (Y, A, L; ψ, θ, α, ĥ^†) = Yr_eff (ĥ^†; ψ, θ, α), α̂ (ψ) is as defined earlier and $\hat{θ} (ψ) = {arg max}_{θ} \sum_{i = 1}^{n} [Y_{i} log \{b (A_{i}, L_{i}; ψ, θ)\} + (1 - Y_{i}) log \{1 - b (A_{i}, L_{i}; ψ, θ)\}]$ with logit{b(A, L; ψ, θ)} = log{γ (1, A, L; ψ)} + η(L; θ). More precisely, each solution is regular and asymptotically linear under model ℬ and attains the semiparametric efficiency bound for the model at the intersection submodel.

5. Discussion

Although the common variation independent parameterization of h(A, Y | L) f_L (L) under model 𝒜 with Y binary is (ψ, f, g, f_L) with f_L = f_L (l), f = f (y | l, A = 0) and g = g(a | l), we instead used the parameterization of Chen (2007) that has g = g(a | Y = 0, l) rather than g = g(a | l). Our use of Chen’s parameterization was the key to our obtaining the doubly robust estimating functions for ψ and hence doubly robust estimators. Formally, following Robins and Rotnitzky (2001), a function S(ψ, f*, g*) = s(O; ψ, f*, g*) of a single subject’s data O is said to be doubly robust for ψ under a particular parameterization for model 𝒜 if, when either, but not necessarily both, f = f* or g = g*, (i) E_{ψ,f,g,f_L}{S(ψ, f*, g*)} = 0 and var_{ψ,f,g,f_L}{S(ψ, f*, g*)} < ∞ for all ψ and (ii) ∂[E_{ψ*,f,g,f_L}{S(ψ, f*, g*)}]/∂ψ|_{ψ=ψ*} ≠ 0 for all ψ*, f, g, f_L. Part (ii) guarantees power against local alternatives. As shown in the Appendix, U(ψ, θ, α; d, h^†) and Ũ (ψ; k_eff, θ, α) satisfy this definition under Chen’s parameterization with f*(y | l, A = 0) = f (y | l, A = 0; θ) and g*(a | Y = 0, l) = g(a | Y = 0, l; α). In contrast, no doubly robust estimating function for ψ exists under the common parameterization. In fact, the following result holds.

Theorem 3. Under the common parameterization by (ψ, f, g, f_L) with f_L = f_L (l), f = f (l) = f (y | l, A = 0) and g = g(a | l), there does not exist a doubly robust estimating function S(ψ, f*, g*) = s(O; ψ, f*, g*) in model 𝒜 with Y binary characterized by the sole restriction (1).

In the Appendix, we prove this result for discrete A, thereby avoiding technicalities that arise in the continuous case.

Acknowledgments

Andrea Rotnitzky and James Robins were funded by grants from the U.S. National Institutes of Health. The authors wish to thank the reviewers for helpful comments. Andrea Rotnitzky is also affiliated with the Harvard School of Public Health.

Appendix

Proof of Theorem 2. We assume that the regularity conditions of Theorem 1A of Robins et al. (1992) hold for U(ψ₀; θ, α, h^†), C(ψ₀, θ) and B(ψ₀, α) and that E[∂M{ψ, θ*(ψ), α*(ψ); d, h^†}/∂ψ|_{ψ = ψ₀}] is nonsingular. We first show that E{U(ψ₀; θ*(ψ₀), α*(ψ₀), h^†)} = 0 when the data are generated under either f (Y | A, L; ψ₀, θ₀) or g(A | Y, L; ψ₀, α₀). By symmetry, it is enough to consider the case where the data were generated under f (Y | A, L; ψ₀, θ₀). Under standard conditions guaranteeing the consistency of the maximum likelihood estimator, θ*(ψ₀) = θ₀. Now, under f (Y | A, L; ψ₀, θ₀),

\begin{array}{l} E [U \{ψ_{0}; θ_{0}, α^{*} (ψ_{0}), h^{†}\}] \\ = E [f^{†} (Y | L) g^{†} (A | L) \{d (Y, A, L) - d^{†} (Y, A, L)\} / h \{Y, A | L; ψ_{0}, θ_{0}, α^{*} (ψ_{0})\}] \\ = E (\frac{g^{†} (A | L)}{\int h \{u, A | L; ψ_{0}, θ_{0}, α^{*} (ψ_{0})\} d μ (u)} E [\frac{f^{†} (Y | L)}{h (Y | A, L; ψ_{0}, θ_{0})} \{d (Y, A, L) - d^{†} (Y, A, L)\} | A, L]) \\ = E [\frac{g^{†} (A | L)}{\int h \{u, A | L; ψ_{0}, θ_{0}, α^{*} (ψ_{0})\} d μ (u)} \int f^{†} (y | L) \{d (y, A, L) - d^{†} (y, A, L)\} d μ (y)] = 0, \end{array}

since

\begin{array}{l} \int f^{†} (y | L) \{d (y, A, L) - d^{†} (y, A, L)\} d μ (y) \\ = \int f^{†} (y | L) d (y, A, L) d μ (y) - \int d (y, A, L) f^{†} (y | L) d μ (y) \\ - \int d (y, a, L) f^{†} (y | L) g^{†} (a | L) d μ (a, y) + \int d (y, a, L) g^{†} (a | L) f^{†} (y | L) d μ (y, a) = 0. \end{array}

Then, under the assumed regularity conditions the formulae (4) and (5) follow from standard Taylor series arguments, whenever E[∂M{ψ, θ*(ψ), α*(ψ); d, h^†}/∂ψ]|_{ψ = ψ₀} is nonsingular. The asymptotic normality result follows from the standard application of Slutsky’s theorem and the central limit theorem.
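The unbiasedness step of this proof can be illustrated at the population level. The sketch below assumes finite supports for Y and A, one stratum of L and a bilinear odds ratio evaluated at the true ψ₀; it ignores estimation of the nuisance parameters and simply plugs fixed correct or incorrect baseline densities into the working law. All numerical values and helper names are hypothetical.

```python
import numpy as np

def normalized(x):
    return x / x.sum()

def joint_density(gamma, f0, g0):
    """Equation (2) on a grid: h(y, a) proportional to gamma(y, a) f0(y) g0(a)."""
    return normalized(gamma * f0[:, None] * g0[None, :])

def u_function(d, f_dag, g_dag, h_work):
    """U = {d - d_dag} h_dag / h_work, with h_work the working joint law."""
    d_dag = (f_dag @ d)[None, :] + (d @ g_dag)[:, None] - f_dag @ d @ g_dag
    return (d - d_dag) * np.outer(f_dag, g_dag) / h_work

rng = np.random.default_rng(5)
y_vals, a_vals = np.arange(3.0), np.arange(4.0)
gamma0 = np.exp(0.5 * np.outer(y_vals, a_vals))        # odds ratio at the true psi_0 = 0.5

f0, f0_wrong = (normalized(rng.uniform(0.5, 1.5, size=3)) for _ in range(2))
g0, g0_wrong = (normalized(rng.uniform(0.5, 1.5, size=4)) for _ in range(2))
f_dag = normalized(rng.uniform(0.5, 1.5, size=3))
g_dag = normalized(rng.uniform(0.5, 1.5, size=4))
d = rng.normal(size=(3, 4))
h_true = joint_density(gamma0, f0, g0)                  # law that generates the data

def mean_u(f_work, g_work):
    """E{U} under the true law when the working baselines are (f_work, g_work)."""
    h_work = joint_density(gamma0, f_work, g_work)
    return float((h_true * u_function(d, f_dag, g_dag, h_work)).sum())

bias_f_correct = mean_u(f0, g0_wrong)                   # only the f baseline is right
bias_g_correct = mean_u(f0_wrong, g0)                   # only the g baseline is right
bias_both_wrong = mean_u(f0_wrong, g0_wrong)            # neither baseline is right
```

The first two means are zero up to rounding error, whereas `bias_both_wrong` is not constrained to vanish and will generally be nonzero, which is the population-level content of double robustness.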

Proof of Theorem 3. The proof is by contradiction: if S(ψ, f*, g*) were doubly robust, then, for every f*, S(ψ, f*, g*) would be an unbiased estimating function for ψ with power against local alternatives in the submodel 𝒜_g* of model 𝒜 in which g = g* is known a priori. Hence, it suffices to prove that model 𝒜_g* does not admit such unbiased estimating functions. Noting that model 𝒜_g* can be parameterized by (ψ, f, f_L), we need to prove there is no function Q(ψ) = q(O; ψ) such that E_{ψ,f,f_L}{Q(ψ)} = 0, var_{ψ,f,f_L}{Q(ψ)} < ∞ and ∂[E_{ψ*,f,f_L}{Q(ψ)}]/∂ψ|_{ψ = ψ*} ≠ 0 for all ψ*, f, f_L. Now Bickel et al. (1993) proved that an unbiased estimating function for a parameter ψ lies in the orthocomplement to the nuisance tangent space for the model. For model 𝒜_g*, it is straightforward to show that the orthocomplement to the nuisance tangent space at law (ψ, f, f_L), say $Λ_{nuis, g *}^{⊥}$ (ψ, f), does not depend on f_L and is the direct sum of the orthocomplement for model 𝒜 plus the space of functions υ = υ(A, L) of (A, L) with zero-mean given L. Thus,

Λ_{nuis, g*}^{⊥} (ψ, f) = [T (k, υ; ψ, f) = \tilde{U} (ψ, f; k) + υ (A, L); k unrestricted, υ with E {υ (A, L) | L} = 0] \cap L_{2},

where Ũ (ψ, f; k) = ∊(ψ, f)[k(A, L) − Ẽ{k(A, L) | L; ψ, f}] is Ũ (ψ; k) defined in § 4 with the dependence on f now made explicit. Suppose Q(ψ) existed. Then Q(ψ) = T (k_f, υ_f; ψ, f) ∈ Λ_{nuis, g*}^{⊥} (ψ, f) holds for each f, where k_f and υ_f are the particular functions k and υ associated with a given f. Thus, T (k_f, υ_f; ψ, f) = T (k_{f*}, υ_{f*}; ψ, f*) for any f, f*. Noting ∊(ψ, f) ≡ Y − pr(Y = 1 | A, L; ψ, f), the previous equality implies that Y [k_f (A, L) − Ẽ{k_f (A, L) | L; ψ, f} − k_{f*} (A, L) + Ẽ{k_{f*} (A, L) | L; ψ, f*}] is equal to a function that does not depend on Y. Hence, it must be that

k_{f} (A, L) - \tilde{E} {k_{f} (A, L) | L; ψ, f} = k_{f *} (A, L) - \tilde{E} {k_{f *} (A, L) | L; ψ, f *} .

Thus, for a function r(L), k_{f*} (A, L) = k_f (A, L) + r(L). Substituting for k_{f*} (A, L) in the last display we obtain k_f (A, L) − Ẽ{k_f (A, L) | L; ψ, f} = k_f (A, L) + r(L) − Ẽ{k_f (A, L) | L; ψ, f*} − r(L) and hence Ẽ{k_f (A, L) | L; ψ, f} = Ẽ{k_f (A, L) | L; ψ, f*} with probability one for all f, f*, ψ which, as shown in the next paragraph, implies k_f (A, L) is not a function of A, i.e. k_f (A, L) = k̃(L) with probability one for some k̃(L). But k_f (A, L) not a function of A implies ∂[E_{ψ*, f, f_L} {Q(ψ)}]/∂ψ |_{ψ = ψ*} = 0, which is a contradiction. We conclude that no unbiased estimating function Q(ψ) with power against local alternatives exists.

We show that, for A binary, Ẽ{h(A, L) | L; ψ, f} depends on f on a set of nonzero probability whenever the conditional odds ratio function γ (1, 1, L; ψ) ≠ 1 with probability one, and h(A, L) = h₁(L) A + h₀(L) depends on A, i.e. whenever h₁(L) is nonzero with positive probability. Let f (l) denote f (1 | A = 0, l). When γ (1, 1, L; ψ) ≠ 1 with probability one, Ẽ[A | L; ψ, f] = [1 + {g(A = 0 | L)/g(A = 1 | L)}{1 − f (L) + γ (1, 1, L) f (L)}²/γ (1, 1, L)]⁻¹ obviously depends on f (L) with probability one, which implies Ẽ{h(A, L) | L; ψ, f} depends on f on the set where h₁(L) is nonzero. The proof for arbitrary discrete A is identical except that extra bookkeeping is required.
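The dependence of Ẽ[A | L; ψ, f] on f (L) for binary A can be seen by evaluating the closed-form expression above at a fixed value of L. The following sketch is only a numerical illustration of that expression, not part of the proof. The function name and the numerical values chosen for g(A = 1 | L), γ (1, 1, L) and f (L) are hypothetical.

```python
def e_tilde_a(f, gamma, g1):
    """Evaluate [1 + {g0 / g1} {1 - f + gamma * f}^2 / gamma]^(-1) at a fixed L.

    f     : f(1 | A = 0, L), a probability in [0, 1]
    gamma : conditional odds ratio gamma(1, 1, L), strictly positive
    g1    : g(A = 1 | L), strictly between 0 and 1
    """
    g0 = 1.0 - g1
    return 1.0 / (1.0 + (g0 / g1) * (1.0 - f + gamma * f) ** 2 / gamma)


g1 = 0.4  # hypothetical value of g(A = 1 | L)
for gamma in (1.0, 2.0):
    values = [e_tilde_a(f, gamma, g1) for f in (0.2, 0.5, 0.8)]
    print(gamma, values)
```

With γ (1, 1, L) = 1 the factor {1 − f (L) + γ (1, 1, L) f (L)}²/γ (1, 1, L) equals one, so the expression reduces to g(A = 1 | L) for every f (L). With γ (1, 1, L) = 2 the three values differ, in line with the claim that the expression depends on f (L) whenever the odds ratio differs from one.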

References

Bickel P, Klaassen C, Ritov Y, Wellner J. Efficient and Adaptive Estimation for Semiparametric Models. New York: Springer; 1993.
Chen HY. A note on prospective analysis of outcome-dependent samples. J. R. Statist. Soc. B. 2003;65:575–84.
Chen HY. Nonparametric and semiparametric models for missing covariates in parametric regression. J. Am. Statist. Assoc. 2004;99:1176–89.
Chen HY. A semiparametric odds ratio model for measuring association. Biometrics. 2007;63:413–21. doi: 10.1111/j.1541-0420.2006.00701.x.
Newey W. Efficient estimation of models with conditional moment restrictions. In: Maddala GS, Rao CR, Vinod H, editors. Handbook of Statistics, IV. Amsterdam: Elsevier Science; 1993. pp. 427–61.
Robins JM, Mark SD, Newey WK. Estimating exposure effects by modelling the expectation of exposure conditional on confounders. Biometrics. 1992;48:479–95.
Robins JM, Rotnitzky A. Comment on the Bickel and Kwon article, ‘Inference for semiparametric models: some questions and an answer’. Statist. Sinica. 2001;11:920–36.
Vansteelandt S, VanderWeele T, Tchetgen EJ, Robins JM. Semiparametric inference for statistical interactions. J. Am. Statist. Assoc. 2008;103:1693–704. doi: 10.1198/016214508000001084.
