Abstract
We state and prove a limit theorem for estimators of a general, possibly infinite dimensional parameter based on unbiased estimating equations containing estimated nuisance parameters. The theorem corrects a gap in the proof of one of the assertions of our paper ‘Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression’.
Keywords: nuisance parameters, Z-theorem
1. Introduction
Breslow & Wellner (2007) cited a theorem of Pierce (1982) in deriving the asymptotic distribution of weighted likelihood estimators for parameters in semiparametric models fitted to two-phase stratified samples when sampling weights were estimated from the data. Li (2007, personal communication) noticed that one of Pierce’s two key hypotheses had in fact not been established by us. In this note we develop a Z-theorem with estimated nuisance parameters that applies to infinite dimensional parameters and allows us to complete our earlier proof under a slight strengthening of our previous hypotheses. The derivations use empirical process techniques developed in van der Vaart & Wellner (1996) and related articles. In order to keep the exposition as short as possible, we assume familiarity with the notation and results in section 3.3 of van der Vaart & Wellner (1996) and section 6 of Breslow & Wellner (2007).
2. A Z-theorem with estimated nuisance parameters
Following van der Vaart & Wellner (1996, section 3.3), define random and fixed maps Ψn(θ, α): ℋ → ℝ, Ψ(θ, α): ℋ → ℝ for some index set ℋ with Ψn(θ, α), Ψ(θ, α) ∈ ℓ∞(ℋ). In most applications, including that in section 3, Ψn(θ, α)h = ℙnψθ,α,h and Ψ(θ, α)h = Pψθ,α,h for given measurable functions ψθ,α,h indexed by θ ∈ Θ, by α in the nuisance parameter set, and by h ∈ ℋ. We do not insist on this in the general theorem, however.
Suppose that Ψ(θ0, α0) = 0. Here α is to be regarded as a nuisance parameter; in the application, α is a finite-dimensional parameter while θ = (ν, η) where ν is finite-dimensional and η is infinite-dimensional. Suppose we have available estimators α̂ = α̂n of α, and then consider estimators θ̂n of θ satisfying
| (1) |
We would like to establish limit theorems for √n(θ̂n − θ0) which are similar to those in the standard Z-theorem of van der Vaart (1995); see also theorem 3.3.1, p. 310, of van der Vaart & Wellner (1996). As argued in van der Vaart & Wellner (2007, p. 235), we can derive limit distributions of θ̂n based on {Ψn(θ, α̂): θ ∈ Θ} from the corresponding theory for {Ψn(θ, α0): θ ∈ Θ}, if we know that α̂n converges in probability to α0 and if we show that
| (2) |
An alternative goal would be to relate the estimators θ̂n to estimators computed with the nuisance parameter fixed at α0, that is, estimators satisfying the analogue of (1) with α0 in place of α̂n. This is accomplished in the third part of the following theorem, which generalizes theorem 5.31, p. 60, of van der Vaart (1998); see also theorem 6.18, p. 407, of van der Vaart (2002).
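For orientation, the classical Z-theorem without nuisance parameters (theorem 3.3.1 of van der Vaart & Wellner, 1996) can be summarized as follows. The display is a sketch with α suppressed, assuming that Ψ is Fréchet-differentiable at θ0 with continuously invertible derivative; the symbols $\mathbb{Z}$ and $\dot\Psi_{\theta_0}$ are labels chosen here for illustration, and the display is not a transcription of (1) or (2).

```latex
% Sketch of the classical Z-theorem (no nuisance parameter); notation is assumed.
\begin{align*}
&\Psi(\theta_0) = 0, \qquad \Psi_n(\hat\theta_n) = o_p^*(n^{-1/2}), \qquad \hat\theta_n \to_p \theta_0, \\
&\sqrt{n}\,(\Psi_n - \Psi)(\theta_0) \rightsquigarrow \mathbb{Z}, \\
&\bigl\|\sqrt{n}\,(\Psi_n - \Psi)(\hat\theta_n) - \sqrt{n}\,(\Psi_n - \Psi)(\theta_0)\bigr\|
   = o_p^*\bigl(1 + \sqrt{n}\,\|\hat\theta_n - \theta_0\|\bigr) \\
&\Longrightarrow\quad
  \sqrt{n}\,\dot\Psi_{\theta_0}(\hat\theta_n - \theta_0)
   = -\sqrt{n}\,(\Psi_n - \Psi)(\theta_0) + o_p^*(1),
  \qquad
  \sqrt{n}\,(\hat\theta_n - \theta_0) \rightsquigarrow -\dot\Psi_{\theta_0}^{-1}\,\mathbb{Z}.
\end{align*}
```

The theorem below keeps this structure and adds the extra term generated by evaluating the estimating equations at an estimated α.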
Theorem 1
Suppose that θ̂n satisfies (1), that θ ↦ {Ψ(θ, α)h: h ∈ ℋ} is uniformly Fréchet-differentiable in a neighbourhood of α0 with derivative maps Ψ̇(θ0, α) satisfying Ψ̇(θ0, α) → Ψ̇(θ0, α0) ≡ Ψ̇0 as α → α0 with Ψ̇0 continuously invertible. Suppose, moreover, that ℤn ≡ √n(Ψn − Ψ)(θ0, α0) ⇝ ℤ0 in ℓ∞(ℋ), that (2) holds and that
| (3) |
(i) If √n‖Ψ(θ0, α̂n)‖ = Op(1), then √n‖θ̂n − θ0‖ = Op(1) and
| (4) |
(ii) If the map α ↦ {Ψ(θ0, α)h: h ∈ ℋ} is Fréchet-differentiable at α0 with derivative map Ψ̇α and α̂n satisfies (ℤn, √n(α̂n − α0)) ⇝ (ℤ0, ϕ), then
| (5) |
(iii) Under the same hypotheses as in (i),
| (6) |
(iv) Under the same hypotheses as in (ii),
| (7) |
Proof
By the definition of θ̂n,
| (8) |
by using (3) in the last line. By uniform differentiability of θ ↦ {Ψ(θ, α)h: h ∈ ℋ} and uniform non-singularity of Ψ̇(θ0, α), it follows that there is a constant c > 0 such that, for all (θ, α) in a sufficiently small neighbourhood of (θ0, α0), ||Ψ(θ, α) − Ψ(θ0, α)|| ≥ c||θ − θ0||. Combining this with (8) yields
and hence that √n‖θ̂n − θ0‖ = Op(1) if the last term in the preceding display is Op(1). Now
where in the first equation we have again used the uniform differentiability hypothesis and in the second the hypotheses of part (ii) of the theorem. As this converges to the claimed limit, this proves (i) and (ii). To prove (iii) and (iv), note that the standard Z-theorem yields the corresponding linear expansion for the estimator computed with α0 in place of α̂n. The claimed results follow by combining each line in the last display with (4).
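The role of the derivative with respect to the nuisance parameter can be checked numerically in a toy finite-dimensional case. The sketch below rests on assumed ingredients that do not appear in the paper: the estimating function ψ(θ, α) = y − αx − θ, a Gaussian data-generating model, and least squares for α̂. In this toy model ∂Ψ/∂θ = −1 and ∂Ψ/∂α = −E[X], so a first-order expansion of the familiar form gives √n(θ̂(α̂) − θ̂(α0)) ≈ −E[X]·√n(α̂ − α0); the code compares the two sides. The function name `toy_expansion_gap` is hypothetical.

```python
import numpy as np

def toy_expansion_gap(n=200_000, alpha0=2.0, mean_x=1.5, seed=0):
    """Return (lhs, rhs) of a first-order expansion in a toy Z-estimation problem.

    Hypothetical model (not from the paper): Y = 1 + alpha0 * X + noise and
    psi(theta, alpha) = y - alpha * x - theta, so theta0 = E[Y] - alpha0 * E[X] = 1.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(mean_x, 1.0, size=n)
    y = 1.0 + alpha0 * x + rng.normal(0.0, 1.0, size=n)

    # Nuisance estimate: least-squares slope of y on x (with intercept).
    alpha_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

    # Z-estimators solve mean(y - alpha * x - theta) = 0 for theta.
    theta_hat = y.mean() - alpha_hat * x.mean()  # estimated nuisance
    theta_ref = y.mean() - alpha0 * x.mean()     # nuisance fixed at alpha0

    root_n = np.sqrt(n)
    lhs = root_n * (theta_hat - theta_ref)
    # -(dPsi/dtheta)^{-1} (dPsi/dalpha) = -(-1)^{-1} * (-E[X]) = -E[X]
    rhs = -mean_x * root_n * (alpha_hat - alpha0)
    return lhs, rhs

lhs, rhs = toy_expansion_gap()
print(f"sqrt(n) * (theta_hat - theta_ref) = {lhs:.4f}")
print(f"first-order approximation         = {rhs:.4f}")
```

In this example the exact difference between the two sides is (mean of x minus E[X]) times √n(α̂ − α0), which is of order n^(−1/2), so the printed values should nearly coincide for large n.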
Remark
Under the hypotheses of theorem 1, theorem 2.21 of Kato (1976, p. 205) implies that the derivative maps Ψ̇(θ0, α) are continuously invertible for α in a neighbourhood of α0.
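The stability of invertibility invoked in this remark has a simple finite-dimensional analogue, sketched here with matrices standing in for the derivative maps. The bound used is the standard Neumann-series estimate: if ‖A⁻¹‖·‖B − A‖ < 1, then B is invertible and ‖B⁻¹‖ ≤ ‖A⁻¹‖ / (1 − ‖A⁻¹‖·‖B − A‖). The matrices and the helper name `perturbed_inverse_bound` are hypothetical, and the snippet does not reproduce Kato's theorem 2.21 itself.

```python
import numpy as np

def perturbed_inverse_bound(a, b):
    """Neumann-series bound on ||b^{-1}|| in the spectral norm.

    Returns None when ||a^{-1}|| * ||b - a|| >= 1, i.e. when the
    perturbation is too large for the bound to apply.
    """
    inv_norm = np.linalg.norm(np.linalg.inv(a), 2)
    gap = inv_norm * np.linalg.norm(b - a, 2)
    if gap >= 1.0:
        return None
    return inv_norm / (1.0 - gap)

rng = np.random.default_rng(0)
a = np.array([[2.0, 0.5], [0.0, 1.0]])      # stand-in for the derivative at alpha0
b = a + 0.05 * rng.standard_normal((2, 2))  # stand-in for the derivative at a nearby alpha

bound = perturbed_inverse_bound(a, b)
actual = np.linalg.norm(np.linalg.inv(b), 2)
print(f"bound on ||b^-1||: {bound}, actual ||b^-1||: {actual:.4f}")
```

The bound depends on the perturbation only through its norm, which is why convergence of the derivative maps as α → α0 suffices for invertibility near α0.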
3. Completion of the proof of Breslow & Wellner (2007)
In Breslow & Wellner (2007), the semiparametric model {Pθ, η: θ ∈ Θ, η ∈ Ξ} is assumed to satisfy five assumptions A1–A5. Here we slightly strengthen A1, which had already strengthened the hypotheses of van der Vaart (1998, theorem 25.90), to
A1* for (θ, η) in a δ-neighbourhood of (θ0, η0) the functions ℓ̇ θ, η and {Bθ, ηh − Pθ, ηBθ, ηh, h ∈ ℋ} are contained in a P0-Donsker class ℱ and have square-integrable envelope functions F1 and F2 respectively.
We also strengthen A3 to
A3* A3 holds and moreover the derivative maps Ψ̇0 = (Ψ̇11, Ψ̇12, Ψ̇21, Ψ̇22) have representations
in terms of L2(P0) derivatives of ψ1, θ, η, h = ℓ̇ θ, η and ψ2, θ, η, h = Bθ, ηh − Pθ, ηBθ, ηh, i.e.
for i = 1, 2.
Breslow & Wellner (2007) showed in (25) on p. 92 that for π= π0
| (9) |
and thus that the quantity on the left-hand side of (9) has an asymptotic N(0, V) distribution where
Thus Pierce’s (1982) first hypothesis (1.1) is satisfied with θ̂(α0) − θ0 the statistic of interest, Tn in his notation, and α the estimated nuisance parameter, λ in his notation.
Pierce’s (1982) second hypothesis (1.2) is that with T̂n = Tn(λ̂n)
| (10) |
for some matrix B. A further hypothesis is that λ̂n is asymptotically efficient. Under these hypotheses Pierce derives the limiting distribution of T̂n. Breslow & Wellner (2007) also showed in (26) on p. 92 that
| (11) |
However, as pointed out by Li, this does not yet prove that
| (12) |
for some matrix B as is needed to verify (10). This result does, however, follow from part (iv) of theorem 1.
As in section 6 of Breslow & Wellner (2007), suppose that α̂ = α̂N denotes the maximum likelihood estimator of parameters in the model πα(ν) for the sampling probabilities and that θ̂N ≡ θ̂N (α̂), η̂N ≡ η̂N (α̂) solve and for all h ∈ ℋ where
and where π̂i ≡ πα̂(Vi), i = 1, …, N.
Theorem 2
Suppose the semiparametric model satisfies A1* and A3* above and A2, A4 and A5 of Breslow & Wellner (2007) and that the model πα(V) for the conditional distribution of ξ given X, V satisfies the hypotheses of theorem 5.39 of van der Vaart (1998). Suppose moreover that πα satisfies (42) of Breslow & Wellner (2007):
| (13) |
for α in a neighbourhood of α0 where ξ > 0 and ψ satisfies Eψ2(V) < ∞. Then
where, as in (27) of Breslow & Wellner (2007),
Proof
We use theorem 1 with θ there replaced by (θ, η). We proceed by verifying the conditions of the theorem, beginning with (2). Recall that W = (X, U) and V = (X̃, U) where X̃ = X̃ (X). Consider the classes of functions
| (14) |
| (15) |
for θ ∈ Θ, η ∈ Ξ, and h ∈ ℋ. Then showing (2) is equivalent to showing that
| (16) |
| (17) |
Under the condition (13) imposed by Breslow & Wellner (2007), (16) holds by virtue of
Then
where, using (13),
uniformly in θ ∈ Θ, η ∈ Ξ. Here F1 is a square integrable envelope function for the class of functions {ℓ̇θ, η: θ ∈ Θ, η ∈ Ξ}, which exists by A1*.
To handle SN, note that
uniformly in θ ∈ Θ, η ∈ Ξ as the class of functions { , η ∈ Ξ} is a Glivenko–Cantelli class of functions. Here is the argument: as the class {ℓ̇θ, η: θ ∈ Θ, η ∈ Ξ} is P-Donsker, it is P-Glivenko–Cantelli. Furthermore the (vector of) function(s) { } is square-integrable: for 1 ≤ j ≤ q = dim(α)
by our assumptions on the model πα(ν). Thus the Glivenko–Cantelli preservation theorem of van der Vaart & Wellner (2000) applies by taking ϕ (u, ν) = uν, , ℱ2 = ℓ̇θ, η, and noting that ℱ1 · ℱ2 has integrable envelope function ( ) F1. A similar argument works for (17) using the square integrable envelope F2 for {Bθ, ηh: θ ∈ Θ, h ∈ ℋ, η ∈ Ξ}.
Now note that A1*, A2, A3* and A4 imply that 𝔾N ψθ0, η0, α0 ⇝ 𝔾 ψθ0, η0, α0 in ℓ∞(ℋ) and that (3) holds as α0 is fixed in both cases. The hypothesized uniform Fréchet differentiability holds under (13) and A3*: writing θ for (θ, η) in the spirit of theorem 1,
by using the assumed regularity of πα, (13) and (3) of Breslow & Wellner (2007) to bound P0(πα0/πα)2 uniformly in a neighbourhood of α0 and using A3* to bound the second term. The additional hypotheses in (ii) and (iv) also follow from the regularity of πα and (13).
To complete the proof, write ψ θ, η, α, h = (ψ1;θ, η, α, ψ2;θ, η, α, h) as defined in (14) and (15). Then the corresponding components of
are
Consequently, operating with the partitioned (assumption A5, η a measure) version of Ψ̇0 on both left- and right-hand sides of (7) we find
and
Following closely again section 25.12 of van der Vaart (1998) we choose and subtract the first equation from the second to find
Recognizing that is the ordinary information for θ, is the efficient score and is the efficient information, we multiply both sides of the preceding equation by to find
| (18) |
which is the second hypothesis (1.2) of Pierce (1982), equivalent to our (12) above, with . Theorem 2 now follows from theorem 1 via (9) and (18). The resolution of the gap in Breslow & Wellner’s (2007) argument, namely the demonstration that
is obtained from (9), (11) and (18) as a corollary to theorem 2.
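The practical content of this section, namely that substituting estimated sampling weights changes the limiting distribution of the weighted estimator, can be seen in a small simulation. The sketch below is a toy under assumptions that are not taken from the paper: two strata, Bernoulli sampling with known probabilities `pi0`, a weighted-mean estimating equation Σ ξi (Yi − θ)/πi = 0 in place of a semiparametric likelihood, and stratum sampling fractions as the fitted weights. It compares the Monte Carlo variances of the estimator under true and estimated weights; the function name `weighted_means` is hypothetical.

```python
import numpy as np

def weighted_means(rng, n_total=500, pi0=(0.2, 0.8), mu=(0.0, 5.0)):
    """One two-phase sample; returns (theta with true weights, theta with estimated weights)."""
    pi0 = np.asarray(pi0)
    mu = np.asarray(mu)
    v = rng.integers(0, 2, size=n_total)  # phase-one stratum, observed for everyone
    y = rng.normal(mu[v], 1.0)            # outcome, used only when sampled
    xi = rng.random(n_total) < pi0[v]     # phase-two sampling indicator

    # Estimated sampling probabilities: observed sampling fraction in each stratum.
    pi_hat = np.array([xi[v == k].mean() for k in (0, 1)])

    def solve(pi):
        w = xi / pi[v]                    # inverse probability weights
        return np.sum(w * y) / np.sum(w)  # root of sum_i w_i (y_i - theta) = 0

    return solve(pi0), solve(pi_hat)

rng = np.random.default_rng(2008)
draws = np.array([weighted_means(rng) for _ in range(2000)])
var_true, var_est = draws.var(axis=0, ddof=1)
print(f"variance with true weights:      {var_true:.5f}")
print(f"variance with estimated weights: {var_est:.5f}")
```

In this toy setting the estimated-weight version coincides with the post-stratified mean, which uses the phase-one stratum counts in full; that is why it tends to show the smaller variance here. The comparison illustrates the direction of the effect only and says nothing about the form of the limiting covariance in theorem 2.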
Acknowledgments
We thank Zhiguo Li for pointing out the gap in the arguments in Breslow & Wellner (2007).
Contributor Information
NORMAN E. BRESLOW, Department of Biostatistics, University of Washington
JON A. WELLNER, Departments of Statistics and Biostatistics, University of Washington
References
- Breslow NE, Wellner JA. Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression. Scand J Statist. 2007;34:86–102. doi: 10.1111/j.1467-9469.2007.00574.x.
- Kato T. Perturbation theory for linear operators. 2. Springer-Verlag; Berlin: 1976.
- Pierce DA. The asymptotic effect of substituting estimators for parameters in certain types of statistics. Ann Statist. 1982;10:475–478.
- van der Vaart AW. Efficiency of infinite-dimensional M-estimators. Statist Neerlandica. 1995;49:9–30.
- van der Vaart AW. Asymptotic statistics. Cambridge University Press; Cambridge: 1998.
- van der Vaart AW. Semiparametric statistics. In: Bolthausen E, Perkins E, van der Vaart A, editors. Lectures on probability theory and statistics. Ecole d’Eté de Probabilités de Saint-Flour XXIX – 1999. Springer-Verlag; Berlin: 2002. pp. 331–457.
- van der Vaart AW, Wellner JA. Weak convergence and empirical processes. Springer-Verlag; New York: 1996.
- van der Vaart AW, Wellner JA. Preservation theorems for Glivenko–Cantelli and uniform Glivenko–Cantelli classes. In: Giné E, Mason DM, Wellner JA, editors. High dimensional probability II. Birkhäuser; Boston: 2000. pp. 115–133.
- van der Vaart AW, Wellner JA. Empirical processes indexed by estimated functions. IMS Lecture Notes Monogr Ser. 2007;55:234–252.
