Author manuscript; available in PMC: 2010 Mar 9.
Published in final edited form as: Scand Stat Theory Appl. 2008 Mar 1;35(1):186–192. doi: 10.1111/j.1467-9469.2007.00574.x

A Z-theorem with Estimated Nuisance Parameters and Correction Note for ‘Weighted Likelihood for Semiparametric Models and Two-phase Stratified Samples, with Application to Cox Regression’

NORMAN E BRESLOW 1, JON A WELLNER 2
PMCID: PMC2835417  NIHMSID: NIHMS102015  PMID: 20221322

Abstract

We state and prove a limit theorem for estimators of a general, possibly infinite dimensional parameter based on unbiased estimating equations containing estimated nuisance parameters. The theorem corrects a gap in the proof of one of the assertions of our paper ‘Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression’.

Keywords: nuisance parameters, Z-theorem

1. Introduction

Breslow & Wellner (2007) cited a theorem of Pierce (1982) in deriving the asymptotic distribution of weighted likelihood estimators for parameters in semiparametric models fitted to two-phase stratified samples when sampling weights were estimated from the data. Li (2007, personal communication) noticed that one of Pierce’s two key hypotheses had in fact not been established by us. In this note we develop a Z-theorem with estimated nuisance parameters that applies to infinite dimensional parameters and allows us to complete our earlier proof under a slight strengthening of our previous hypotheses. The derivations use empirical process techniques developed in van der Vaart & Wellner (1996) and related articles. In order to keep the exposition as short as possible, we assume familiarity with the notation and results in section 3.3 of van der Vaart & Wellner (1996) and section 6 of Breslow & Wellner (2007).

2. A Z-theorem with estimated nuisance parameters

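Following van der Vaart & Wellner (1996, section 3.3), define random and fixed maps $\Psi_n(\theta, \alpha): \mathcal{H} \to \mathbb{R}$ and $\Psi(\theta, \alpha): \mathcal{H} \to \mathbb{R}$ for some index set $\mathcal{H}$, with $\Psi_n(\theta, \alpha), \Psi(\theta, \alpha) \in \ell^\infty(\mathcal{H})$. In most applications, including that in section 3, $\Psi_n(\theta, \alpha)h = \mathbb{P}_n\psi_{\theta,\alpha,h}$ and $\Psi(\theta, \alpha)h = P\psi_{\theta,\alpha,h}$ for given measurable functions $\psi_{\theta,\alpha,h}$ indexed by $\theta \in \Theta$, $\alpha \in \mathcal{A}$ and $h \in \mathcal{H}$. We do not insist on this in the general theorem, however.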

Suppose that Ψ(θ0, α0) = 0. Here α is to be regarded as a nuisance parameter; in the application, α is a finite-dimensional parameter while θ = (ν, η) where ν is finite-dimensional and η is infinite-dimensional. Suppose we have available estimators α̂ = α̂n of α, and then consider estimators θ̂n of θ satisfying

$$\sup_{h \in \mathcal{H}} |\Psi_n(\hat\theta_n, \hat\alpha_n)h| = o_p(n^{-1/2}). \tag{1}$$

We would like to establish limit theorems for $\sqrt{n}(\hat\theta_n - \theta_0)$ which are similar to those in the standard Z-theorem of van der Vaart (1995); see also theorem 3.3.1, p. 310, of van der Vaart & Wellner (1996). As argued in van der Vaart & Wellner (2007, p. 235), we can derive limit distributions of $\hat\theta_n$ based on $\{\Psi_n(\theta, \hat\alpha): \theta \in \Theta\}$ from the corresponding theory for $\{\Psi_n(\theta, \alpha_0): \theta \in \Theta\}$, if we know that $\sqrt{n}(\hat\alpha_n - \alpha_0) = O_p(1)$ and if we show that

$$\sup_{\theta \in \Theta} \left\| \sqrt{n}(\Psi_n - \Psi)(\theta, \hat\alpha_n) - \sqrt{n}(\Psi_n - \Psi)(\theta, \alpha_0) \right\|_{\mathcal{H}} = o_p(1). \tag{2}$$

An alternative goal would be to relate the estimators $\hat\theta_n$ to estimators $\hat\theta_n^0$ that satisfy $\sup_{h \in \mathcal{H}} |\Psi_n(\hat\theta_n^0, \alpha_0)h| = o_p(n^{-1/2})$. This is accomplished in the third part of the following theorem, which generalizes theorem 5.31, p. 60, of van der Vaart (1998); see also theorem 6.18, p. 407, of van der Vaart (2002).

Theorem 1

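Suppose that $\hat\theta_n$ satisfies (1), that $\theta \mapsto \{\Psi(\theta, \alpha)h: h \in \mathcal{H}\}$ is uniformly Fréchet-differentiable in a neighbourhood of $\alpha_0$ with derivative maps $\dot\Psi(\theta_0, \alpha)$ satisfying $\dot\Psi(\theta_0, \alpha) \to \dot\Psi(\theta_0, \alpha_0) \equiv \dot\Psi_0$ as $\alpha \to \alpha_0$ with $\dot\Psi_0$ continuously invertible. Suppose, moreover, that $\mathbb{Z}_n \equiv \sqrt{n}(\Psi_n - \Psi)(\theta_0, \alpha_0) \rightsquigarrow \mathbb{Z}_0$ in $\ell^\infty(\mathcal{H})$, that (2) holds and that

$$\left\| \sqrt{n}(\Psi_n - \Psi)(\hat\theta_n, \alpha_0) - \sqrt{n}(\Psi_n - \Psi)(\theta_0, \alpha_0) \right\|_{\mathcal{H}} = o_p(1 + \sqrt{n}\|\hat\theta_n - \theta_0\|). \tag{3}$$

  (i) If $\|\sqrt{n}(\Psi(\theta_0, \hat\alpha_n) - \Psi(\theta_0, \alpha_0))\|_{\mathcal{H}} = O_p(1)$, then $\sqrt{n}\|\hat\theta_n - \theta_0\| = O_p(1)$ and

$$\sqrt{n}(\hat\theta_n - \theta_0) = -\dot\Psi_0^{-1}\sqrt{n}(\Psi_n - \Psi)(\theta_0, \alpha_0) - \dot\Psi_0^{-1}\left[\sqrt{n}(\Psi(\theta_0, \hat\alpha_n) - \Psi(\theta_0, \alpha_0))\right] + o_p(1). \tag{4}$$

  (ii) If the map $\alpha \mapsto \{\Psi(\theta_0, \alpha)h: h \in \mathcal{H}\}$ is Fréchet-differentiable at $\alpha_0$ with derivative map $\dot\Psi_\alpha$ and $\sqrt{n}(\hat\alpha_n - \alpha_0) = \mathbb{G}_n\phi + o_p(1)$ satisfies $(\mathbb{Z}_n, \sqrt{n}(\hat\alpha_n - \alpha_0)) \rightsquigarrow (\mathbb{Z}_0, \mathbb{G}\phi)$, then

$$\sqrt{n}(\hat\theta_n - \theta_0) \rightsquigarrow -\dot\Psi_0^{-1}(\mathbb{Z}_0 + \dot\Psi_\alpha \mathbb{G}\phi). \tag{5}$$

  (iii) Under the same hypotheses as in (i),

$$\sqrt{n}(\hat\theta_n - \theta_0) = \sqrt{n}(\hat\theta_n^0 - \theta_0) - \dot\Psi_0^{-1}\sqrt{n}(\Psi(\theta_0, \hat\alpha_n) - \Psi(\theta_0, \alpha_0)) + o_p(1). \tag{6}$$

  (iv) Under the same hypotheses as in (ii),

$$\sqrt{n}(\hat\theta_n - \theta_0) = \sqrt{n}(\hat\theta_n^0 - \theta_0) - \dot\Psi_0^{-1}\dot\Psi_\alpha\sqrt{n}(\hat\alpha_n - \alpha_0) + o_p(1). \tag{7}$$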

Proof

By the definition of θ̂n,

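$$
\begin{aligned}
\sqrt{n}\,(\Psi(\hat\theta_n, \hat\alpha_n) - \Psi(\theta_0, \hat\alpha_n))
&= \sqrt{n}\,(\Psi(\hat\theta_n, \hat\alpha_n) - \Psi_n(\hat\theta_n, \hat\alpha_n)) - \sqrt{n}\,(\Psi(\theta_0, \hat\alpha_n) - \Psi(\theta_0, \alpha_0)) + o_p(1) \quad \text{by (1) and } \Psi(\theta_0, \alpha_0) = 0 \\
&= -\sqrt{n}\,(\Psi_n(\hat\theta_n, \hat\alpha_n) - \Psi(\hat\theta_n, \hat\alpha_n)) + \sqrt{n}\,(\Psi_n(\theta_0, \hat\alpha_n) - \Psi(\theta_0, \hat\alpha_n)) \\
&\quad - \sqrt{n}\,(\Psi_n(\theta_0, \hat\alpha_n) - \Psi(\theta_0, \hat\alpha_n)) - \sqrt{n}\,(\Psi(\theta_0, \hat\alpha_n) - \Psi(\theta_0, \alpha_0)) + o_p(1) \\
&= -\sqrt{n}\,(\Psi_n - \Psi)(\hat\theta_n, \alpha_0) + \sqrt{n}\,(\Psi_n - \Psi)(\theta_0, \alpha_0) \\
&\quad - \sqrt{n}\,(\Psi_n - \Psi)(\theta_0, \alpha_0) - \sqrt{n}\,(\Psi(\theta_0, \hat\alpha_n) - \Psi(\theta_0, \alpha_0)) + o_p(1) \quad \text{by (2)} \\
&= o_p(1 + \sqrt{n}\|\hat\theta_n - \theta_0\|) - \sqrt{n}\,(\Psi_n - \Psi)(\theta_0, \alpha_0) - \sqrt{n}\,(\Psi(\theta_0, \hat\alpha_n) - \Psi(\theta_0, \alpha_0)) + o_p(1)
\end{aligned}
\tag{8}
$$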

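by using (3) in the last line. By uniform differentiability of $\theta \mapsto \{\Psi(\theta, \alpha)h: h \in \mathcal{H}\}$ and uniform non-singularity of $\dot\Psi(\theta_0, \alpha)$, it follows that there is a constant $c > 0$ such that, for all $(\theta, \alpha)$ in a sufficiently small neighbourhood of $(\theta_0, \alpha_0)$, $\|\Psi(\theta, \alpha) - \Psi(\theta_0, \alpha)\| \ge c\|\theta - \theta_0\|$. Combining this with (8) yields

$$c\sqrt{n}\|\hat\theta_n - \theta_0\| \le o_p(1 + \sqrt{n}\|\hat\theta_n - \theta_0\|) + \|\mathbb{Z}_n\| + \|\sqrt{n}(\Psi(\theta_0, \hat\alpha_n) - \Psi(\theta_0, \alpha_0))\|$$

and hence that $\sqrt{n}\|\hat\theta_n - \theta_0\| = O_p(1)$ if the last term in the preceding display is $O_p(1)$. Now

$$
\begin{aligned}
\sqrt{n}(\hat\theta_n - \theta_0) &= -\dot\Psi_0^{-1}\left[\mathbb{Z}_n + \sqrt{n}(\Psi(\theta_0, \hat\alpha_n) - \Psi(\theta_0, \alpha_0))\right] + o_p(1) \\
&= -\dot\Psi_0^{-1}(\mathbb{Z}_n + \dot\Psi_\alpha \mathbb{G}_n\phi) + o_p(1)
\end{aligned}
$$

where in the first equation we have again used the uniform differentiability hypothesis and in the second the hypotheses of part (ii) of the theorem. As this converges to the claimed limit, this proves (i) and (ii). To prove (iii) and (iv), note that the standard Z-theorem yields $\sqrt{n}(\hat\theta_n^0 - \theta_0) = -\dot\Psi_0^{-1}\mathbb{Z}_n + o_p(1)$. The claimed results follow by combining each line in the last display with (4).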

Remark

Under the hypotheses of theorem 1, theorem 2.21 of Kato (1976, p. 205) implies that the derivative maps Ψ̇(θ0, α) are continuously invertible for α in a neighbourhood of α0.
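As a hedged illustration of part (iv), and not an example taken from the paper, consider a toy weighted mean with scalar $\theta$. The function $\psi_{\theta,\alpha}$, the sampling indicator $\xi$ with $E(\xi \mid X, V) = \pi_{\alpha_0}(V)$, and the shorthand $\dot\pi_0 = \partial\pi_\alpha/\partial\alpha|_{\alpha=\alpha_0}$ are assumptions introduced only for this sketch.

```latex
% Toy estimating function (assumed for illustration): inverse probability weighted mean
\psi_{\theta,\alpha}(x, v, \xi) = \frac{\xi}{\pi_\alpha(v)}\,(x - \theta),
\qquad
\Psi(\theta, \alpha) = P_0\!\left[\frac{\pi_{\alpha_0}(V)}{\pi_\alpha(V)}\,(X - \theta)\right],
\qquad
\Psi(\theta_0, \alpha_0) = P_0(X - \theta_0) = 0 .

% Derivatives at (\theta_0, \alpha_0)
\dot\Psi_0 = \frac{\partial}{\partial\theta}\Psi(\theta, \alpha_0)\Big|_{\theta=\theta_0} = -1,
\qquad
\dot\Psi_\alpha = \frac{\partial}{\partial\alpha^T}\Psi(\theta_0, \alpha)\Big|_{\alpha=\alpha_0}
 = -P_0\!\left[(X - \theta_0)\,\frac{\dot\pi_0^T(V)}{\pi_0(V)}\right].

% Substituting into (7): -\dot\Psi_0^{-1}\dot\Psi_\alpha = -P_0[(X-\theta_0)\dot\pi_0^T/\pi_0]
\sqrt{n}(\hat\theta_n - \theta_0)
 = \sqrt{n}(\hat\theta_n^0 - \theta_0)
 - P_0\!\left[(X - \theta_0)\,\frac{\dot\pi_0^T(V)}{\pi_0(V)}\right]\sqrt{n}(\hat\alpha_n - \alpha_0)
 + o_p(1).
```

In this toy case the correction term has the same form as the matrix $B$ appearing in section 3, with $X - \theta_0$ playing the role of the influence function.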

3. Completion of the proof of Breslow & Wellner (2007)

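In Breslow & Wellner (2007), $\mathcal{P} = \{P_{\theta,\eta}: \theta \in \Theta, \eta \in \Xi\}$ is a semiparametric model that satisfies five assumptions A1–A5. Here we slightly strengthen A1, which had already strengthened the hypotheses of van der Vaart (1998, theorem 25.90), to

A1* for $(\theta, \eta)$ in a $\delta$-neighbourhood of $(\theta_0, \eta_0)$ the functions $\dot\ell_{\theta,\eta}$ and $\{B_{\theta,\eta}h - P_{\theta,\eta}B_{\theta,\eta}h, h \in \mathcal{H}\}$ are contained in a $P_0$-Donsker class $\mathcal{F}$ and have square-integrable envelope functions $F_1$ and $F_2$ respectively.

We also strengthen A3 to

A3* A3 holds and moreover the derivative maps $\dot\Psi_0 = (\dot\Psi_{11}, \dot\Psi_{12}, \dot\Psi_{21}, \dot\Psi_{22})$ have representations

$$\dot\Psi_{ij}(\theta_0, \eta_0)h = P_0(\dot\psi_{ij,\theta_0,\eta_0,h}), \qquad i, j \in \{1, 2\},$$

in terms of $L_2(P_0)$ derivatives of $\psi_{1,\theta,\eta,h} = \dot\ell_{\theta,\eta}$ and $\psi_{2,\theta,\eta,h} = B_{\theta,\eta}h - P_{\theta,\eta}B_{\theta,\eta}h$, i.e.

$$
\begin{aligned}
\sup_{h \in \mathcal{H}} \left\{P_0\left(\psi_{i,\theta,\eta_0,h} - \psi_{i,\theta_0,\eta_0,h} - \dot\psi_{i1,\theta_0,\eta_0,h}(\theta - \theta_0)\right)^2\right\}^{1/2} &= o(\|\theta - \theta_0\|), \\
\sup_{h \in \mathcal{H}} \left\{P_0\left(\psi_{i,\theta_0,\eta,h} - \psi_{i,\theta_0,\eta_0,h} - \dot\psi_{i2,\theta_0,\eta_0,h}(\eta - \eta_0)\right)^2\right\}^{1/2} &= o(\|\eta - \eta_0\|),
\end{aligned}
$$

for $i = 1, 2$.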

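Breslow & Wellner (2007) showed in (25) on p. 92 that for $\pi = \pi_0$

$$\sqrt{N}\begin{pmatrix} \hat\theta_N(\alpha_0) - \theta_0 \\ \hat\alpha_N - \alpha_0 \end{pmatrix} = \sqrt{N}\begin{pmatrix} \mathbb{P}_N^{\pi_0}\tilde\ell_0 \\ \mathbb{Q}_N\tilde\ell_0^{\alpha} \end{pmatrix} + o_p(1) \tag{9}$$

and thus that the quantity on the left-hand side has an asymptotic $N(0, V)$ distribution where

$$
V = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix}, \qquad
V_{11} = \tilde{I}_0^{-1} + P_0\left(\frac{1 - \pi_0}{\pi_0}\tilde\ell_0^{\otimes 2}\right), \qquad
V_{22} = \left(P_0 \frac{1_{\mathcal{V}_0^c}\,\dot\pi_0^{\otimes 2}}{\pi_0(1 - \pi_0)}\right)^{-1},
$$

$$
V_{12} = V_{21}^T = P_0\left(\frac{\xi}{\pi_0}\tilde\ell_0\,(\tilde\ell_0^{\alpha})^T\right) = P_0\left(\tilde\ell_0\,(\tilde\ell_0^{\alpha})^T\right).
$$

Thus Pierce’s (1982) first hypothesis (1.1) is satisfied with $\hat\theta(\alpha_0) - \theta_0$ the statistic of interest, $T_n$ in his notation, and $\alpha$ the estimated nuisance parameter, Pierce’s (1982) $\lambda$.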

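Pierce’s (1982) second hypothesis (1.2) is that, with $\hat{T}_n = T_n(\hat\lambda_n)$,

$$\sqrt{n}\,\hat{T}_n = \sqrt{n}\,T_n + B\sqrt{n}(\hat\lambda_n - \lambda) + o_p(1) \tag{10}$$

for some matrix $B$. A further hypothesis is that $\hat\lambda_n$ is efficient; i.e. $V_{22} = I_\lambda^{-1}$. Then Pierce shows that $\sqrt{n}\,\hat{T}_n \rightsquigarrow N(0, V_{11} - BV_{22}B^T)$. Breslow & Wellner (2007) also showed in (26) on p. 92 that

$$\sqrt{N}(\mathbb{P}_N^{\hat\pi} - \mathbb{P}_N^{\pi_0})\tilde\ell_0 = -P_0\left(1_{\mathcal{V}_0^c}\,\tilde\ell_0\,\frac{\dot\pi_0^T}{\pi_0}\right)\sqrt{N}(\hat\alpha_N - \alpha_0) + o_p(1). \tag{11}$$

However, as pointed out by Li, this does not yet prove that

$$\sqrt{N}(\hat\theta_N(\hat\alpha) - \theta_0) = \sqrt{N}(\hat\theta_N(\alpha_0) - \theta_0) + B\sqrt{N}(\hat\alpha_N - \alpha_0) + o_p(1) \tag{12}$$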

for some matrix B as is needed to verify (10). This result does, however, follow from part (iv) of theorem 1.
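The variance reduction $V_{11} - BV_{22}B^T$ promised by Pierce's result can be checked numerically in a toy setting. The following is a minimal sketch under stated assumptions, not the authors' Cox regression application: two strata $V \in \{0, 1\}$, $X \mid V \sim N(\mu_V, 1)$, Bernoulli sampling with stratum probabilities $\alpha = (\pi_0, \pi_1)$, and a weighted mean that solves $\sum_i (\xi_i/\pi(V_i))(X_i - \theta) = 0$. All function names and parameter values are hypothetical.

```python
import random
import statistics


def theory(p1, mu, pi):
    """Asymptotic variances for the toy model: (V11, V11 - B V22 B^T)."""
    p = (1.0 - p1, p1)                      # stratum probabilities P(V = s)
    theta0 = p[0] * mu[0] + p[1] * mu[1]    # target parameter theta0 = E X
    # V11 = E[(X - theta0)^2 / pi(V)], using Var(X | V) = 1 in each stratum
    v11 = sum(p[s] * (1.0 + (mu[s] - theta0) ** 2) / pi[s] for s in (0, 1))
    # B_s = -E[(X - theta0) 1{V = s}] / pi_s
    b = [-p[s] * (mu[s] - theta0) / pi[s] for s in (0, 1)]
    # V22 is diagonal: asymptotic variance of each stratum sampling fraction
    v22 = [pi[s] * (1.0 - pi[s]) / p[s] for s in (0, 1)]
    return v11, v11 - sum(b[s] ** 2 * v22[s] for s in (0, 1))


def weighted_mean(data, probs):
    """Solve sum_i (xi_i / pi(V_i)) (X_i - theta) = 0 for theta."""
    num = sum(xi * x / probs[v] for v, x, xi in data)
    den = sum(xi / probs[v] for v, x, xi in data)
    return num / den


def simulate(reps=800, n=500, p1=0.5, mu=(0.0, 2.0), pi=(0.3, 0.7), seed=1):
    """Return n * Var of the estimator with true and with estimated weights."""
    rng = random.Random(seed)
    est_true, est_hat = [], []
    for _ in range(reps):
        data = []
        for _ in range(n):
            v = 1 if rng.random() < p1 else 0
            x = rng.gauss(mu[v], 1.0)       # only used when xi == 1
            xi = 1 if rng.random() < pi[v] else 0
            data.append((v, x, xi))
        # MLE of alpha = (pi_0, pi_1): observed sampling fraction per stratum
        pi_hat = []
        for s in (0, 1):
            n_s = sum(1 for v, _, _ in data if v == s)
            m_s = sum(xi for v, _, xi in data if v == s)
            pi_hat.append(m_s / n_s)
        est_true.append(weighted_mean(data, pi))
        est_hat.append(weighted_mean(data, pi_hat))
    return n * statistics.variance(est_true), n * statistics.variance(est_hat)


if __name__ == "__main__":
    print("theory   :", theory(0.5, (0.0, 2.0), (0.3, 0.7)))
    print("simulated:", simulate())
```

With the default values, `theory` returns approximately (4.762, 3.381), so in this toy model estimating the sampling probabilities lowers the asymptotic variance by $BV_{22}B^T \approx 1.381$. The simulated values should be close to these up to Monte Carlo error.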

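As in section 6 of Breslow & Wellner (2007), suppose that $\hat\alpha = \hat\alpha_N$ denotes the maximum likelihood estimator of parameters in the model $\pi_\alpha(v)$ for the sampling probabilities and that $\hat\theta_N \equiv \hat\theta_N(\hat\alpha)$, $\hat\eta_N \equiv \hat\eta_N(\hat\alpha)$ solve $\mathbb{P}_N^{\hat\pi}\dot\ell_{\theta,\eta} = 0$ and $\mathbb{P}_N^{\hat\pi}B_{\theta,\eta}h - P_{\theta,\eta}B_{\theta,\eta}h = 0$ for all $h \in \mathcal{H}$, where

$$\mathbb{P}_N^{\hat\pi} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\hat\pi_i}\delta_{X_i},$$

and where $\hat\pi_i \equiv \pi_{\hat\alpha}(V_i)$, $i = 1, \ldots, N$.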

Theorem 2

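Suppose the semiparametric model $\mathcal{P}$ satisfies A1* and A3* above and A2, A4 and A5 of Breslow & Wellner (2007) and that the model $\pi_\alpha(V)$ for the conditional distribution of $\xi$ given $X, V$ satisfies the hypotheses of theorem 5.39 of van der Vaart (1998). Suppose moreover that $\pi_\alpha$ satisfies (42) of Breslow & Wellner (2007):

$$\left|\frac{1}{\pi_\alpha(v)} - \frac{1}{\pi_{\alpha_0}(v)} + \frac{\dot\pi_0^T(v)}{\pi_0^2(v)}(\alpha - \alpha_0)\right| \le \psi(v)\,|\alpha - \alpha_0|^{1+\zeta} \tag{13}$$

for $\alpha$ in a neighbourhood of $\alpha_0$, where $\zeta > 0$ and $\psi$ satisfies $E\psi^2(V) < \infty$. Then

$$\sqrt{N}(\hat\theta_N(\hat\alpha) - \theta_0) \rightsquigarrow Z \sim N_p(0, \Sigma)$$

where, as in (27) of Breslow & Wellner (2007),

$$\Sigma = \operatorname{Var}\left(\frac{\xi}{\pi_0}\tilde\ell_0\right) - P_0\left(1_{\mathcal{V}_0^c}\,\tilde\ell_0\,\frac{\dot\pi_0^T}{\pi_0}\right)\left(P_0\frac{1_{\mathcal{V}_0^c}\,\dot\pi_0^{\otimes 2}}{\pi_0(1 - \pi_0)}\right)^{-1}P_0\left(1_{\mathcal{V}_0^c}\,\frac{\dot\pi_0\,\tilde\ell_0^T}{\pi_0}\right).$$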

Proof

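We use theorem 1 with $\theta$ there replaced by $(\theta, \eta)$. We proceed by verifying the conditions of the theorem, beginning with (2). Recall that $W = (X, U)$ and $V = (\tilde{X}, U)$ where $\tilde{X} = \tilde{X}(X)$. Consider the classes of functions

$$\psi_{1;\theta,\eta,\alpha}(w, \xi) = \frac{\xi}{\pi_\alpha(v)}\dot\ell_{\theta,\eta}(x), \tag{14}$$

$$\psi_{2;\theta,\eta,\alpha,h}(w, \xi) = \frac{\xi}{\pi_\alpha(v)}B_{\theta,\eta}h(x), \tag{15}$$

for $\theta \in \Theta$, $\eta \in \Xi$, and $h \in \mathcal{H}$. Then showing (2) is equivalent to showing that

$$\sup_{\theta \in \Theta, \eta \in \Xi}\left|\mathbb{G}_N(\psi_{1;\theta,\eta,\hat\alpha} - \psi_{1;\theta,\eta,\alpha_0})\right| \to_p 0 \quad \text{and} \tag{16}$$

$$\sup_{\theta \in \Theta, \eta \in \Xi, h \in \mathcal{H}}\left|\mathbb{G}_N(\psi_{2;\theta,\eta,\hat\alpha,h} - \psi_{2;\theta,\eta,\alpha_0,h})\right| \to_p 0. \tag{17}$$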

Under the condition (13) imposed by Breslow & Wellner (2007), (16) holds by virtue of

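$$
\begin{aligned}
\psi_{1;\theta,\eta,\hat\alpha}(w, \xi) - \psi_{1;\theta,\eta,\alpha_0}(w, \xi)
&= \left(\frac{\xi}{\pi_{\hat\alpha}(v)} - \frac{\xi}{\pi_{\alpha_0}(v)}\right)\dot\ell_{\theta,\eta}(x) \\
&= \xi\left(\frac{1}{\pi_{\hat\alpha}(v)} - \frac{1}{\pi_{\alpha_0}(v)} + \frac{\dot\pi_0(v)^T}{\pi_0^2(v)}(\hat\alpha - \alpha_0)\right)\dot\ell_{\theta,\eta} - \xi\,\frac{\dot\pi_0(v)^T}{\pi_0^2(v)}(\hat\alpha - \alpha_0)\,\dot\ell_{\theta,\eta}.
\end{aligned}
$$

Then

$$
\begin{aligned}
\sqrt{N}(\mathbb{P}_N - P)(\psi_{1;\theta,\eta,\hat\alpha} - \psi_{1;\theta,\eta,\alpha_0})
&= \sqrt{N}(\mathbb{P}_N - P)\left(\xi\left(\frac{1}{\pi_{\hat\alpha}(v)} - \frac{1}{\pi_{\alpha_0}(v)} + \frac{\dot\pi_0(v)^T}{\pi_0^2(v)}(\hat\alpha - \alpha_0)\right)\dot\ell_{\theta,\eta}\right) \\
&\quad - \sqrt{N}(\mathbb{P}_N - P)\left(\xi\,\frac{\dot\pi_0(v)^T}{\pi_0^2(v)}(\hat\alpha - \alpha_0)\,\dot\ell_{\theta,\eta}\right) \\
&\equiv R_N - S_N
\end{aligned}
$$

where, using (13) and the Cauchy–Schwarz inequality,

$$
\begin{aligned}
|R_N| &\le \sqrt{N}(\mathbb{P}_N + P)\left\{\left|\frac{1}{\pi_{\hat\alpha}(v)} - \frac{1}{\pi_{\alpha_0}(v)} + \frac{\dot\pi_0(v)^T}{\pi_0^2(v)}(\hat\alpha - \alpha_0)\right|\,|\dot\ell_{\theta,\eta}|\right\} \\
&\le \mathbb{P}_N\left(\psi(V)\,|\dot\ell_{\theta,\eta}|\right)\sqrt{N}\,|\hat\alpha - \alpha_0|^{1+\zeta} + P\left(\psi(V)\,|\dot\ell_{\theta,\eta}|\right)\sqrt{N}\,|\hat\alpha - \alpha_0|^{1+\zeta} \\
&\le \left\{\sqrt{\mathbb{P}_N\psi^2(V)\cdot\mathbb{P}_N F_1^2} + \sqrt{P\psi^2(V)\cdot PF_1^2}\right\}\sqrt{N}\,|\hat\alpha - \alpha_0|^{1+\zeta} = O_p(1)\,o_p(1)
\end{aligned}
$$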

uniformly in θ ∈ Θ, η ∈ Ξ. Here F1 is a square integrable envelope function for the class of functions {ℓ̇θ, η: θ ∈ Θ, η ∈ Ξ}, which exists by A1*.

To handle SN, note that

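$$
S_N = \sqrt{N}(\mathbb{P}_N - P)\left(\xi\,\frac{\dot\pi_0^T}{\pi_0^2}(\hat\alpha - \alpha_0)\,\dot\ell_{\theta,\eta}\right) = (\mathbb{P}_N - P)\left(\xi\,\dot\ell_{\theta,\eta}\,\frac{\dot\pi_0^T}{\pi_0^2}\right)\sqrt{N}(\hat\alpha - \alpha_0),
$$

so that

$$
|S_N| \le \sup_{\theta \in \Theta, \eta \in \Xi}\left\|(\mathbb{P}_N - P)\left(\xi\,\dot\ell_{\theta,\eta}\,\frac{\dot\pi_0^T}{\pi_0^2}\right)\right\|\,\left\|\sqrt{N}(\hat\alpha - \alpha_0)\right\| = o_p(1)\,O_p(1) = o_p(1)
$$

uniformly in $\theta \in \Theta$, $\eta \in \Xi$, as the class of functions $\{\xi\,\dot\ell_{\theta,\eta}\,\dot\pi_0^T/\pi_0^2: \theta \in \Theta, \eta \in \Xi\}$ is a Glivenko–Cantelli class of functions. Here is the argument: as the class $\{\dot\ell_{\theta,\eta}: \theta \in \Theta, \eta \in \Xi\}$ is $P$-Donsker, it is $P$-Glivenko–Cantelli. Furthermore the (vector of) function(s) $\{\xi\,\dot\pi_0^T/\pi_0^2\}$ is square-integrable: for $1 \le j \le q = \dim(\alpha)$

$$
E\left(\frac{\xi\,\dot\pi_{0j}}{\pi_0^2}\right)^2 = E\,\frac{\dot\pi_{0j}^2}{\pi_0^3} \le \frac{1}{\sigma^2}\,E\,\frac{\dot\pi_{0j}^2}{\pi_0(1 - \pi_0)} = \frac{1}{\sigma^2}\,E\,\dot\ell_{\alpha,j}^2 < \infty
$$

by our assumptions on the model $\pi_\alpha(v)$. Thus the Glivenko–Cantelli preservation theorem of van der Vaart & Wellner (2000) applies by taking $\phi(u, v) = uv$, $\mathcal{F}_1 = \{\xi\,\dot\pi_{0j}/\pi_0^2\}$, $\mathcal{F}_2 = \{\dot\ell_{\theta,\eta}\}$, and noting that $\mathcal{F}_1 \cdot \mathcal{F}_2$ has integrable envelope function $(|\dot\pi_{0j}|/\pi_0^2)F_1$. A similar argument works for (17) using the square-integrable envelope $F_2$ for $\{B_{\theta,\eta}h: \theta \in \Theta, h \in \mathcal{H}, \eta \in \Xi\}$.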

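Now note that A1*, A2, A3* and A4 imply that $\mathbb{G}_N\psi_{\theta_0,\eta_0,\alpha_0} \rightsquigarrow \mathbb{G}\psi_{\theta_0,\eta_0,\alpha_0}$ in $\ell^\infty(\mathcal{H})$ and that (3) holds, as $\alpha_0$ is fixed in both cases. The hypothesized uniform Fréchet differentiability holds under (13) and A3*: writing $\theta$ for $(\theta, \eta)$ in the spirit of theorem 1,

$$
\begin{aligned}
\left\|\Psi(\theta, \alpha)h - \Psi(\theta_0, \alpha)h - \dot\Psi(\theta_0, \alpha)(\theta - \theta_0)h\right\|_{\mathcal{H}}
&= \sup_{h \in \mathcal{H}}\left|P_0\left\{\frac{\pi_{\alpha_0}}{\pi_\alpha}\left(\psi_{\theta,h} - \psi_{\theta_0,h} - \dot\psi_{\theta_0,h}(\theta - \theta_0)\right)\right\}\right| \\
&\le \left\{P_0\left(\frac{\pi_{\alpha_0}}{\pi_\alpha}\right)^2\right\}^{1/2}\sup_{h \in \mathcal{H}}\left\{P_0\left(\psi_{\theta,h} - \psi_{\theta_0,h} - \dot\psi_{\theta_0,h}(\theta - \theta_0)\right)^2\right\}^{1/2} \\
&\le K\,o(\|\theta - \theta_0\|)
\end{aligned}
$$

by using the assumed regularity of $\pi_\alpha$, (13) and (3) of Breslow & Wellner (2007) to bound $P_0(\pi_{\alpha_0}/\pi_\alpha)^2$ uniformly in a neighbourhood of $\alpha_0$ and using A3* to bound the second term. The additional hypotheses in (ii) and (iv) also follow from the regularity of $\pi_\alpha$ and (13).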

To complete the proof, write ψ θ, η, α, h = (ψ1;θ, η, α, ψ2;θ, η, α, h) as defined in (14) and (15). Then the corresponding components of

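$$\dot\Psi_\alpha h = \frac{\partial}{\partial\alpha^T}P_0\psi_{\theta,\eta,\alpha,h}\Big|_{\alpha = \alpha_0}$$

are

$$\dot\Psi_{1,\alpha} = -P_0\left(\dot\ell_0\,1_{\mathcal{V}_0^c}\,\frac{\dot\pi_0^T}{\pi_0}\right) \quad \text{and} \quad \dot\Psi_{2;\alpha}h = -P_0\left(B_0h\,1_{\mathcal{V}_0^c}\,\frac{\dot\pi_0^T}{\pi_0}\right), \quad h \in \mathcal{H}.$$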

Consequently, operating with the partitioned (assumption A5, η a measure) version of Ψ̇0 on both left- and right-hand sides of (7) we find

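$$
\begin{aligned}
-I_0\sqrt{N}(\hat\theta_N - \theta_0) - \sqrt{N}(\hat\eta_N - \eta_0)B_0^*\dot\ell_0
&= -I_0\sqrt{N}(\hat\theta_N^0 - \theta_0) - \sqrt{N}(\hat\eta_N^0 - \eta_0)B_0^*\dot\ell_0 \\
&\quad + P_0\left(\dot\ell_0\,1_{\mathcal{V}_0^c}\,\frac{\dot\pi_0^T}{\pi_0}\right)\sqrt{N}(\hat\alpha_N - \alpha_0) + o_p(1)
\end{aligned}
$$

and

$$
\begin{aligned}
-P_0(B_0h)\dot\ell_0^T\sqrt{N}(\hat\theta_N - \theta_0) - \sqrt{N}(\hat\eta_N - \eta_0)B_0^*B_0h
&= -P_0(B_0h)\dot\ell_0^T\sqrt{N}(\hat\theta_N^0 - \theta_0) - \sqrt{N}(\hat\eta_N^0 - \eta_0)B_0^*B_0h \\
&\quad + P_0\left(B_0h\,1_{\mathcal{V}_0^c}\,\frac{\dot\pi_0^T}{\pi_0}\right)\sqrt{N}(\hat\alpha_N - \alpha_0) + o_p(1).
\end{aligned}
$$

Following closely again section 25.12 of van der Vaart (1998) we choose $h = (B_0^*B_0)^{-1}B_0^*\dot\ell_0$ and subtract the first equation from the second to find

$$
\begin{aligned}
P_0\left[(I - B_0(B_0^*B_0)^{-1}B_0^*)\dot\ell_0\,\dot\ell_0^T\right]\sqrt{N}(\hat\theta_N - \theta_0)
&= P_0\left[(I - B_0(B_0^*B_0)^{-1}B_0^*)\dot\ell_0\,\dot\ell_0^T\right]\sqrt{N}(\hat\theta_N^0 - \theta_0) \\
&\quad - P_0\left[(I - B_0(B_0^*B_0)^{-1}B_0^*)\dot\ell_0\,1_{\mathcal{V}_0^c}\,\frac{\dot\pi_0^T}{\pi_0}\right]\sqrt{N}(\hat\alpha_N - \alpha_0) + o_p(1).
\end{aligned}
$$

Recognizing that $I_0 = P_0\dot\ell_0\dot\ell_0^T$ is the ordinary information for $\theta$, $(I - B_0(B_0^*B_0)^{-1}B_0^*)\dot\ell_0$ is the efficient score and $\tilde{I}_0 = P_0[(I - B_0(B_0^*B_0)^{-1}B_0^*)\dot\ell_0\,\dot\ell_0^T]$ is the efficient information, we multiply both sides of the preceding equation by $\tilde{I}_0^{-1}$ to find

$$\sqrt{N}(\hat\theta_N(\hat\alpha) - \theta_0) = \sqrt{N}(\hat\theta_N(\alpha_0) - \theta_0) - P_0\left(\tilde\ell_0\,1_{\mathcal{V}_0^c}\,\frac{\dot\pi_0^T}{\pi_0}\right)\sqrt{N}(\hat\alpha_N - \alpha_0) + o_p(1) \tag{18}$$

which is the second hypothesis (1.2) of Pierce (1982), equivalent to our (12) above, with $B = -P_0(\tilde\ell_0\,1_{\mathcal{V}_0^c}\,\dot\pi_0^T/\pi_0)$. Theorem 2 now follows from theorem 1 via (9) and (18). The resolution of the gap in Breslow & Wellner’s (2007) argument, namely the demonstration that

$$\sqrt{N}(\hat\theta_N(\hat\alpha) - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\hat\pi}\tilde\ell_0 + o_p(1),$$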

is obtained from (9), (11) and (18) as a corollary to theorem 2.
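The way the three displays combine can be spelled out as follows. This is a sketch that assumes the sign conventions as we read them here, namely that (11) and (18) both carry the factor $-P_0(\tilde\ell_0\,1_{\mathcal{V}_0^c}\,\dot\pi_0^T/\pi_0)$, and that $\tilde\ell_0$ denotes the efficient influence function for $\theta$.

```latex
\begin{aligned}
\sqrt{N}(\hat\theta_N(\hat\alpha) - \theta_0)
&= \sqrt{N}(\hat\theta_N(\alpha_0) - \theta_0)
   - P_0\!\left(\tilde\ell_0\,1_{\mathcal{V}_0^c}\,\frac{\dot\pi_0^T}{\pi_0}\right)
     \sqrt{N}(\hat\alpha_N - \alpha_0) + o_p(1)
   && \text{by (18)} \\
&= \sqrt{N}\,\mathbb{P}_N^{\pi_0}\tilde\ell_0
   - P_0\!\left(\tilde\ell_0\,1_{\mathcal{V}_0^c}\,\frac{\dot\pi_0^T}{\pi_0}\right)
     \sqrt{N}(\hat\alpha_N - \alpha_0) + o_p(1)
   && \text{by the first row of (9)} \\
&= \sqrt{N}\,\mathbb{P}_N^{\pi_0}\tilde\ell_0
   + \sqrt{N}(\mathbb{P}_N^{\hat\pi} - \mathbb{P}_N^{\pi_0})\tilde\ell_0 + o_p(1)
   && \text{by (11)} \\
&= \sqrt{N}\,\mathbb{P}_N^{\hat\pi}\tilde\ell_0 + o_p(1).
\end{aligned}
```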

Acknowledgments

We thank Zhiguo Li for pointing out the gap in the arguments in Breslow & Wellner (2007).

Contributor Information

NORMAN E. BRESLOW, Department of Biostatistics, University of Washington

JON A. WELLNER, Departments of Statistics and Biostatistics, University of Washington

References

  1. Breslow NE, Wellner JA. Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression. Scand J Statist. 2007;34:86–102. doi: 10.1111/j.1467-9469.2007.00574.x.
  2. Kato T. Perturbation theory for linear operators. 2. Springer-Verlag; Berlin: 1976.
  3. Pierce DA. The asymptotic effect of substituting estimators for parameters in certain types of statistics. Ann Statist. 1982;10:475–478.
  4. van der Vaart AW. Efficiency of infinite-dimensional M-estimators. Statist Neerlandica. 1995;49:9–30.
  5. van der Vaart AW. Asymptotic statistics. Cambridge University Press; Cambridge: 1998.
  6. van der Vaart AW. Semiparametric statistics. In: Bolthausen E, Perkins E, van der Vaart A, editors. Lectures on probability theory and statistics. Ecole d’Eté de Probabilités de Saint-Flour XXIX – 1999. Springer-Verlag; Berlin: 2002. pp. 331–457.
  7. van der Vaart AW, Wellner JA. Weak convergence and empirical processes. Springer-Verlag; New York: 1996.
  8. van der Vaart AW, Wellner JA. Preservation theorems for Glivenko–Cantelli and uniform Glivenko–Cantelli classes. In: Giné E, Mason DM, Wellner JA, editors. High dimensional probability II. Birkhäuser; Boston: 2000. pp. 115–133.
  9. van der Vaart AW, Wellner JA. Empirical processes indexed by estimated functions. IMS Lecture Notes Monogr Ser. 2007;55:234–252.
