Author manuscript; available in PMC: 2014 Feb 19.
Published in final edited form as: Ann Stat. 2013 Feb 1;41(1):269–295. doi: 10.1214/12-AOS1073

WEIGHTED LIKELIHOOD ESTIMATION UNDER TWO-PHASE SAMPLING

Takumi Saegusa 1,2, Jon A Wellner 1,2
PMCID: PMC3929280  NIHMSID: NIHMS496004  PMID: 24563559

Abstract

We develop asymptotic theory for weighted likelihood estimators (WLE) under two-phase stratified sampling without replacement. We also consider several variants of WLEs involving estimated weights and calibration. A set of empirical process tools is developed, including a Glivenko–Cantelli theorem, a theorem for rates of convergence of M-estimators, and a Donsker theorem for the inverse probability weighted empirical processes under two-phase sampling and sampling without replacement at the second phase. Using these general results, we derive asymptotic distributions of the WLE of a finite-dimensional parameter in a general semiparametric model where an estimator of a nuisance parameter is estimable either at regular or nonregular rates. We illustrate these results and methods in the Cox model with right censoring and interval censoring. We compare the methods via their asymptotic variances under both sampling without replacement and the more usual (and easier to analyze) assumption of Bernoulli sampling at the second phase.

Key words and phrases: Calibration, estimated weights, weighted likelihood, semiparametric model, regular, nonregular

1. Introduction

Two-phase sampling is a sampling technique that aims at cost reduction and improved efficiency of estimation. At phase I, a large sample is drawn from a population, and information on variables that are easier to measure is collected. These phase I variables may be important variables such as exposure in a regression model, or simply may be auxiliary variables correlated with variables unavailable at phase I. The sample space is then stratified based on these phase I variables. At phase II, a subsample is drawn without replacement from each stratum to obtain phase II variables that are costly or difficult to measure. Strata formation seeks either to oversample subjects with important phase I variables, or to effectively sample subjects with targeted phase II variables, or both. In this way, two-phase sampling achieves effective access to important variables at lower cost.

While two-phase sampling was originally introduced in survey sampling by Neyman [20] for estimation of the “finite population mean” of some variable, it has become increasingly important in a variety of areas of statistics, biostatistics and epidemiology, especially since [22, 33] and [27].

The setting treated here is as follows:

  • We begin with a semiparametric model 𝒫 for a vector of variables X with values in 𝒳. [The prime examples which we treat in detail in Section 4 are the Cox proportional hazards regression model with (a) right censoring, and (b) interval censoring.]

  • Let W = (X, U) ∈ 𝒳 × 𝒰 ≡ 𝒲 where U is a vector of “auxiliary variables,” not involved in the model 𝒫. Suppose that W ~ P̃0 and X ~ P0. Now suppose that V ≡ (X̃, U) ∈ 𝒱 where X̃ = X̃(X) is a coarsening of X.

  • At phase I we observe V1, …, VN i.i.d. as V, and then use the phase I data to form strata, that is, disjoint subsets 𝒱1, …, 𝒱J of 𝒱 with 𝒱1 ∪ ⋯ ∪ 𝒱J = 𝒱. We let Nj ≡ #{i ≤ N : Vi ∈ 𝒱j}.

  • Next, a phase II sample is drawn by sampling without replacement nj ≤ Nj items from stratum j. For the items selected we observe Xi. Thus for the selection indicators ξi we have P̃0(ξi = 1 | Vi) = (nj/Nj)1𝒱j(Vi) ≡ π0(Vi).

  • Finally weighted likelihood (or inverse probability weighted) estimation methods based on all the observed data are used to estimate the parameters of the model 𝒫 and to make further inferences about the model.
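The design described in the bullets above can be sketched in a short simulation; the stratum cut-points, sampling fractions, and sample sizes below are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Phase I: draw N i.i.d. copies of V; here V is a scalar auxiliary variable
# (a stand-in for (X~, U)) and J = 3 strata are formed by cutting its range.
N, J = 1000, 3
V = rng.normal(size=N)
strata = np.digitize(V, bins=[-0.5, 0.5])       # stratum label in {0, 1, 2}
N_j = np.array([(strata == j).sum() for j in range(J)])

# Phase II: sample n_j = [N_j p_j] without replacement from stratum j.
p = np.array([0.1, 0.5, 0.9])                   # design sampling fractions p_j
n_j = np.floor(N_j * p).astype(int)
xi = np.zeros(N, dtype=bool)                    # selection indicators xi_i
for j in range(J):
    idx = np.flatnonzero(strata == j)
    xi[rng.choice(idx, size=n_j[j], replace=False)] = True

# pi_0(V_i) = n_j/N_j for V_i in stratum j, known by design.
pi0 = (n_j / N_j)[strata]
```

Phase II variables X would then be collected only for the subjects with ξi = 1 and weighted by 1/π0(Vi).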

It is now well known that the classical Horvitz–Thompson estimators [9] use only the phase II data and are inefficient, sometimes quite severely so; see, for example, [2, 3, 14, 23] and [34]. Improvements in efficiency of estimation can be achieved by “adjusting” the weights by use of the phase I data (even though the sampling probabilities are known). Two basic methods of adjustment are:

  1. Estimated weights, a method originating in the missing data literature [23], and with significant further developments since in connection with many models in which the missingness mechanism is not known, in contrast to our current setting in which the missingness is by design.

  2. Calibration, a method originating in the sample survey literature [8]; see also [13, 14].

One of our goals here is to study existing methods for adjustment of the weights of weighted likelihood methods and to introduce several new methods: modified calibration as suggested by Chan [6] and centered calibration as proposed here in Section 2.

A second goal is to give a systematic treatment of estimators based on sampling without replacement at phase II in the setting of general semiparametric models and to make comparisons with the behavior of estimators based on Bernoulli (or independent) sampling at phase II, thus continuing and strengthening the comparisons made in [4, 5], and [2, 3] for a particular sub-class of semiparametric models and adjustments via estimated weights and ordinary calibration. Many studies of the theoretical properties of procedures based on two-phase design data have been made for the case of Bernoulli sampling; see, for example, [11] and the review of case-cohort sampling given there. On the other hand, while statistical practice continues to involve phase II data sampled without replacement, most available theory in this case (other than [4, 5]) has developed on a model-by-model basis. As has become clear from [4, 5], sampling without replacement results in smaller asymptotic variances, and hence inference based on asymptotic variances derived from Bernoulli sampling will often be conservative. Our treatment here provides theory and tools for dealing directly with the sampling without replacement design. We do this by providing the relevant theory both for semiparametric models in which the infinite-dimensional nuisance parameters can be estimated at a regular rate (√n) with complete data, and semiparametric models in which the infinite-dimensional nuisance parameters can only be estimated at slower (nonregular) rates.

The main contributions of our paper are three-fold: First, we establish two Z-theorems giving weak sufficient conditions for asymptotic distributions of the WLEs in general semiparametric models. The first theorem covers the case where the nuisance parameter is estimable at a regular rate; this yields rigorous justification of [2, 3] under weaker conditions. The second theorem covers the case of general semiparametric models with nonregular rates for estimators of the nuisance parameters. The conditions of our theorems, formulated in terms of complete data, are almost identical to those for the MLE with complete data. This formulation allows us to use tools from empirical process theory together with the new tools developed here in a straightforward way. Second, we propose centered calibration, a new calibration method. This new calibration method is the only one guaranteed to yield improved efficiency over the plain WLE under both Bernoulli sampling and sampling without replacement, while other methods are warranted only for Bernoulli sampling. Third, we establish general results for the inverse probability weighted (IPW) empirical process. Some results such as a Glivenko–Cantelli theorem (Theorem 5.1) and a Donsker theorem (Theorem 5.3) are of interest in their own right. These results accounting for dependence due to the sampling design are useful in verifying the conditions of Z-theorems in applications. For instance, Theorems 5.1 and 5.2 easily establish consistency and rates of convergence under our “without replacement” sampling scheme. We illustrate application of the general results with examples in Section 4.

The rest of the paper is organized as follows. In Section 2, we introduce our estimation procedures in the context of a general semiparametric model. The WLE and methods involving adjusted weights are discussed. Two Z-theorems are presented in Section 3; these yield asymptotic distributions of the WLEs of finite-dimensional parameters of the model. All estimators are compared under Bernoulli sampling and sampling without replacement with different methods of adjusting weights. In Section 4 we apply our Z-theorems to the Cox model with both right censoring and interval censoring. Section 5 consists of general results for IPW empirical processes. Several open problems are briefly discussed in Section 6. All proofs, except those in Section 4 and auxiliary results, are collected in [25].

2. Sampling, models and estimators

We use the basic notation introduced in the previous section. After stratified sampling, X is fully observed for nj subjects in the jth stratum at phase II. The observed data is (V, Xξ, ξ) where ξ is the indicator of being sampled at phase II. We use a doubly subscripted notation: for example, Vj,i denotes V for the ith subject in stratum j. We denote the stratum probability for the jth stratum by νj ≡ P̃0(V ∈ 𝒱j), and the conditional expectation given membership in the jth stratum by P0|j(·) ≡ P̃0(· | V ∈ 𝒱j).

The sampling probability is P(ξ = 1 | Vi) = π0(Vi) = nj/Nj for Vi ∈ 𝒱j. These sampling probabilities are assumed to be strictly positive; that is, there is a constant σ > 0 such that 0 < σ ≤ π0(υ) ≤ 1 for υ ∈ 𝒱. We assume that nj/Nj → pj > 0 for j = 1, …, J as N → ∞. Although dependence is induced among the observations (Vi, ξiXi, ξi) by the sampling indicators, the vector of sampling indicators (ξj,1, …, ξj,Nj) within stratum j is exchangeable for each j = 1, …, J, and the J random vectors (ξj,1, …, ξj,Nj) are independent.

The empirical measure is one of the most useful tools in empirical process theory. Because the Xi’s are observed only for a sub-sample at phase II, we define, instead, the IPW empirical measure Nπ by

\[ \mathbb{P}_N^{\pi} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_0(V_i)}\delta_{X_i} = \frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\frac{\xi_{j,i}}{n_j/N_j}\delta_{X_{j,i}}, \]

where δXi denotes a Dirac measure placing unit mass on Xi. The identity in the last display is justified by the arguments in Appendix A of [4]. We also define the IPW empirical process by 𝔾Nπ ≡ √N(ℙNπ − P0) and the phase II empirical process for the jth stratum by 𝔾j,Njξ ≡ √Nj(ℙj,Njξ − (nj/Nj)ℙj,Nj), j = 1, …, J, where ℙj,Njξ ≡ Nj−1 Σi=1Nj ξj,i δXj,i is the phase II empirical measure for the jth stratum, and ℙj,Nj ≡ Nj−1 Σi=1Nj δXj,i is the empirical measure for all the data in the jth stratum; note that the latter empirical measure is not observed. Then, following [4], we decompose 𝔾Nπ as follows:

\[ \mathbb{G}_N^{\pi} = \mathbb{G}_N + \sum_{j=1}^{J}\sqrt{\frac{N_j}{N}}\,\frac{N_j}{n_j}\,\mathbb{G}_{j,N_j}^{\xi}, \] (2.1)

where ℙN = N−1 Σj=1J Nj ℙj,Nj and 𝔾N = √N(ℙN − P0). Notice that the 𝔾j,Njξ correspond to “exchangeably weighted bootstrap” versions of the stratum-wise complete data empirical processes 𝔾j,Nj ≡ √Nj(ℙj,Nj − P0|j). This observation allows application of the “exchangeably weighted bootstrap” theory of [21] and [32], Section 3.6.
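As a toy numerical check of the definitions above (with illustrative strata and sampling fractions), note that ℙNπ applied to the constant function 1 equals 1 exactly under sampling without replacement, since stratum j contributes nj · (Nj/nj)/N = Nj/N:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-phase data: stratum labels, then an exact without-replacement draw.
N, J = 600, 3
strata = rng.integers(0, J, size=N)
X = rng.normal(loc=strata, size=N)              # in practice X is seen only when xi = 1
N_j = np.bincount(strata, minlength=J)
n_j = np.floor(N_j * np.array([0.2, 0.5, 0.8])).astype(int)
xi = np.zeros(N, dtype=bool)
for j in range(J):
    idx = np.flatnonzero(strata == j)
    xi[rng.choice(idx, size=n_j[j], replace=False)] = True
pi0 = (n_j / N_j)[strata]

def Pn_pi(f):
    """IPW empirical measure: P_N^pi f = N^{-1} sum_i xi_i f(X_i) / pi_0(V_i)."""
    return np.where(xi, f(X) / pi0, 0.0).mean()

total_mass = Pn_pi(lambda x: np.ones_like(x))   # equals 1 up to rounding error
ipw_mean = Pn_pi(lambda x: x)                   # IPW estimate of P_0 X
```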

2.1. Improving efficiency by adjusting weights

Efficiency of estimators based on IPW empirical processes can be improved by adjusting weights, either by estimated weights [23] or by calibration [8] via use of the phase I data; see also [14]. Besides these, we discuss two variants of calibration, modified calibration [6], and our proposed new method, centered calibration.

Let Zi ≡ g(Vi) be the auxiliary variables for the ith subject for a known transformation g. For estimated weights with binary regression, Zi contains the membership indicators for the strata, 1𝒱j(Vi), j = 1, …, J. Observations with π0(V) = 1 are dropped from the binary regression, and the original weight 1 is used. For notational simplicity, we write Zi for either method, and assume that sampling probabilities are strictly less than 1 for all strata.

2.1.1. Estimated weights

The method of estimated weights adjusts weights through binary regression on the phase I variables. The sampling probability for the ith subject is modeled by pα(ξi | Zi) = Ge(ZiTα)ξi(1 − Ge(ZiTα))1−ξi ≡ πα(Vi)ξi{1 − πα(Vi)}1−ξi, where α ∈ 𝒜e ⊂ ℝJ+k is a regression parameter and Ge : ℝ ↦ [0, 1] is a known function. If Ge(x) = ex/(1 + ex), for instance, then the adjustment simply involves logistic regression. Let α̂N be the estimator of α that maximizes the pseudo- (or composite) likelihood

\[ \prod_{i=1}^{N} p_\alpha(\xi_i \mid Z_i) = \prod_{i=1}^{N} G_e(Z_i^T\alpha)^{\xi_i}\{1 - G_e(Z_i^T\alpha)\}^{1-\xi_i}. \] (2.2)

We define the IPW empirical measure with estimated weights by

\[ \mathbb{P}_N^{\pi,e} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_{\hat\alpha_N}(V_i)}\delta_{X_i} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_0(V_i)}\,\frac{\pi_0(V_i)}{G_e(Z_i^T\hat\alpha_N)}\,\delta_{X_i}, \]

and the IPW empirical process with estimated weights by 𝔾Nπ,e ≡ √N(ℙNπ,e − P0).
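The pseudo-likelihood (2.2) with logistic Ge is just a binary regression of ξ on Z, so α̂N can be computed by Newton–Raphson. The data-generating step below is an illustrative stand-in (Bernoulli ξ rather than the without-replacement design), since the fitting step is the same pseudo-likelihood either way.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative phase I data: an intercept plus one auxiliary variable.
# (Toy setup; in the paper's design xi comes from without-replacement sampling.)
N = 2000
Z = np.column_stack([np.ones(N), rng.normal(size=N)])
alpha_true = np.array([-0.5, 1.0])
xi = rng.random(N) < 1.0 / (1.0 + np.exp(-Z @ alpha_true))

def fit_pseudo_likelihood(Z, xi, iters=25):
    """Newton-Raphson maximizer of (2.2) with logistic G_e(x) = e^x/(1 + e^x)."""
    alpha = np.zeros(Z.shape[1])
    for _ in range(iters):
        pr = 1.0 / (1.0 + np.exp(-Z @ alpha))
        grad = Z.T @ (xi - pr)                  # score of the log pseudo-likelihood
        hess = (Z * (pr * (1.0 - pr))[:, None]).T @ Z
        alpha += np.linalg.solve(hess, grad)
    return alpha

alpha_hat = fit_pseudo_likelihood(Z, xi)
pi_hat = 1.0 / (1.0 + np.exp(-Z @ alpha_hat))   # pi_alpha_hat(V_i) = G_e(Z_i^T alpha_hat)
w_e = xi / pi_hat                               # estimated weights entering P_N^{pi,e}
```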

2.1.2. Calibration

Calibration adjusts weights so that the inverse probability weighted average from the phase II sample is equated to the phase I average, whereby the phase I information is taken into account for estimation. Specifically, we find an estimator α̂N that is the solution for α ∈ 𝒜c ⊂ ℝk of the following calibration equation:

\[ \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i\,G_c(V_i;\alpha)}{\pi_0(V_i)}Z_i = \frac{1}{N}\sum_{i=1}^{N}Z_i, \] (2.3)

where Gc(V ; α) ≡ G(g(V)T α) = G(ZT α) for known G with G(0) = 1 and Ġ(0) > 0. We call πα(V) ≡ π0(V)/Gc(V ; α) the calibrated sampling probability. We define the calibrated IPW empirical measure by

\[ \mathbb{P}_N^{\pi,c} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_{\hat\alpha_N}(V_i)}\delta_{X_i} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_0(V_i)}G(Z_i^T\hat\alpha_N)\,\delta_{X_i} \]

and the calibrated IPW empirical process by 𝔾Nπ,c ≡ √N(ℙNπ,c − P0).

Examples for G in the definition of Gc are listed in [8] (F in their notation). For G(x) = 1 + x, ℙNπ,cX is a well-known regression estimator of the mean of X. Since we assume boundedness of G later, we may want to consider truncated versions of these examples instead. Note that the choice of G in (variants of) calibration does not affect asymptotic results on WLEs.

As noted in [13], there are several different approaches to calibration. Here, and in introducing variants of calibration below, we adopt the view that calibration proceeds by making the smallest possible change in the weights in order to match an estimated phase II average with the corresponding phase I average. Another approach proceeds via regression modeling of the variable X of interest on the auxiliary variables V, leading to a discussion of robustness: the effect of the validity of that model on estimation for X. We prefer the former view because we do not assume a model for X and V anywhere in this paper. In fact, our results are independent of such a modeling assumption.
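For the choice G(x) = 1 + x mentioned above, the calibration equation (2.3) is linear in α and can be solved in closed form; the data below are an illustrative stand-in with known π0 and Bernoulli ξ.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative data: known sampling probabilities pi_0 and auxiliary Z = g(V).
N = 1000
Z = np.column_stack([rng.normal(size=N), rng.normal(size=N) + 1.0])
pi0 = np.where(Z[:, 1] > 1.0, 0.8, 0.3)         # known by design (toy choice)
xi = rng.random(N) < pi0

# With G(x) = 1 + x the calibration equation (2.3) reads
#   (1/N) sum_i xi_i (1 + Z_i^T alpha) Z_i / pi_0(V_i) = (1/N) sum_i Z_i,
# which is linear in alpha: A alpha = b.
A = (Z * (xi / pi0)[:, None]).T @ Z / N
b = Z.mean(axis=0) - (Z * (xi / pi0)[:, None]).sum(axis=0) / N
alpha_hat = np.linalg.solve(A, b)

w_c = xi * (1.0 + Z @ alpha_hat) / pi0          # calibrated weights in P_N^{pi,c}
```

By construction the calibrated weighted average of Z matches the phase I average exactly.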

2.1.3. Modified calibration

Modifying the function Gc in calibration so that individuals with higher sampling probabilities π(Vi) receive less weight was proposed by [6] in a missing response problem where observations are i.i.d. (see, e.g., [28] for recent developments in this area and [14] for their connections with calibration methods). An interpretation of this method within the framework of [8] is discussed in [26]. In modified calibration, we find the estimator α̂N that is the solution for α ∈ 𝒜mc ⊂ ℝk of the following calibration equation:

\[ \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i\,G_{mc}(V_i;\alpha)}{\pi_0(V_i)}Z_i = \frac{1}{N}\sum_{i=1}^{N}Z_i, \] (2.4)

where Gmc(V ; α) ≡ G((π0(V)−1 − 1) ZT α) for known G with G(0) = 1 and Ġ(0) > 0. We call πα(V) ≡ π0(V)/Gmc(V ; α) the calibrated sampling probability with modified calibration. We define the IPW empirical measure with modified calibration by

\[ \mathbb{P}_N^{\pi,mc} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_{\hat\alpha_N}(V_i)}\delta_{X_i} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_0(V_i)}\,G\!\left(\frac{1-\pi_0(V_i)}{\pi_0(V_i)}Z_i^T\hat\alpha_N\right)\delta_{X_i} \]

and the corresponding IPW empirical process by 𝔾Nπ,mc ≡ √N(ℙNπ,mc − P0).

2.1.4. Centered calibration

We propose a new method, centered calibration, that calibrates on centered auxiliary variables with modified calibration. This method improves on the plain WLE under our sampling scheme, while retaining the good properties of modified calibration. See Section 3.4 for a discussion of its advantages and connections to the other methods.

In centered calibration, we find the estimator α̂N that is the solution for α ∈ 𝒜cc ⊂ ℝk of the following calibration equation:

\[ \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i\,G_{cc}(V_i;\alpha)}{\pi_0(V_i)}(Z_i - \bar{Z}_N) = 0, \] (2.5)

where Gcc(V; α) ≡ G((π0(V)−1 − 1)(Z − Z̄N)Tα) for known G with G(0) = 1 and Ġ(0) > 0, and Z̄N ≡ N−1 Σi=1N Zi. We call πα(V) ≡ π0(V)/Gcc(V; α) the calibrated sampling probability with centered calibration. We define the IPW empirical measure with centered calibration by

\[ \mathbb{P}_N^{\pi,cc} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_{\hat\alpha_N}(V_i)}\delta_{X_i} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_0(V_i)}\,G_{cc}(V_i;\hat\alpha_N)\,\delta_{X_i} \]

and the corresponding IPW empirical process by 𝔾Nπ,cc ≡ √N(ℙNπ,cc − P0).
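With G(x) = 1 + x, the centered calibration equation (2.5) is again linear in α; a minimal sketch on illustrative data (known π0, Bernoulli ξ for brevity):

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative data with known pi_0 and auxiliary Z = g(V).
N = 1000
Z = rng.normal(size=(N, 2)) + np.array([0.0, 2.0])
pi0 = np.where(Z[:, 0] > 0.0, 0.7, 0.4)
xi = rng.random(N) < pi0
Zc = Z - Z.mean(axis=0)                          # centered auxiliaries Z_i - Zbar_N

# With G(x) = 1 + x, G_cc(V; alpha) = 1 + (pi_0(V)^{-1} - 1)(Z - Zbar_N)^T alpha,
# and the centered calibration equation (2.5) is linear: A alpha = -b.
c = (1.0 / pi0 - 1.0) * (xi / pi0)
A = (Zc * c[:, None]).T @ Zc / N
b = (Zc * (xi / pi0)[:, None]).sum(axis=0) / N
alpha_hat = np.linalg.solve(A, -b)

G_cc = 1.0 + (1.0 / pi0 - 1.0) * (Zc @ alpha_hat)
w_cc = xi * G_cc / pi0                           # centered-calibrated weights in P_N^{pi,cc}
```

The resulting weights satisfy (2.5) exactly: the weighted average of the centered auxiliaries is zero.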

2.2. Estimators for a semiparametric model 𝒫

We study the asymptotic distribution of the weighted likelihood estimator of a finite-dimensional parameter θ in a general semiparametric model 𝒫 = {Pθ,η : θ ∈ Θ, η ∈ H} where Θ ⊂ ℝp and the nuisance parameter space H is a subset of some Banach space ℬ. Let P0 = Pθ0,η0 denote the true distribution.

The MLE for complete data is often obtained as a solution to the infinite-dimensional likelihood equations. In such models, the WLE under two-phase sampling is obtained by solving the corresponding infinite-dimensional inverse probability weighted likelihood equations. Specifically, the WLE (θ̂N, η̂N) is a solution to the following weighted likelihood equations:

\[ \Psi_{N,1}^{\pi}(\theta,\eta) = \mathbb{P}_N^{\pi}\dot{\ell}_{\theta,\eta} = o_{P^*}(N^{-1/2}),\qquad \Psi_{N,2}^{\pi}(\theta,\eta)h = \mathbb{P}_N^{\pi}(B_{\theta,\eta}h - P_{\theta,\eta}B_{\theta,\eta}h) = o_{P^*}(N^{-1/2}), \] (2.6)

where ℓ̇θ,η ∈ L20(Pθ,η)p is the score function for θ, and the score operator Bθ,η : ℋ → L20(Pθ,η) is the bounded linear operator mapping a direction h in some Hilbert space ℋ of one-dimensional submodels for η along which η → η0. The WLE with estimated weights (θ̂N,e, η̂N,e), the calibrated WLE (θ̂N,c, η̂N,c), the WLE with modified calibration (θ̂N,mc, η̂N,mc) and the WLE with centered calibration (θ̂N,cc, η̂N,cc) are obtained by replacing ℙNπ by ℙNπ,# with # ∈ {e, c, mc, cc} in (2.6), respectively. Let ℓ̇0 = ℓ̇θ0,η0 and B0 = Bθ0,η0.

3. Asymptotics for the WLE in general semiparametric models

We consider two cases: in the first case the nuisance parameter η is estimable at a regular (i.e., √n) rate and, for ease of exposition, η is assumed to be a measure. In the second case η is only estimable at a nonregular (slower than √n) rate. Our theorem (Theorem 3.2) concerning the second case nearly covers the former case, but requires slightly more smoothness and a separate proof of the rate of convergence for an estimator of η. On the other hand, our theorem (Theorem 3.1) concerning the former case includes a proof of the regular √n rate of convergence, and hence is of interest in itself.

3.1. Conditions for adjusting weights

We assume the following conditions for the estimators of α used in the adjusted weights. Throughout this paper, Conditions 3.1 and 3.2 may be assumed simultaneously, but it should be understood that the former is used exclusively for estimators based on estimated weights and the latter is imposed only for estimators based on (variants of) calibration. Also, it should be understood that Conditions 3.2(a)(i) and (d)(i), Conditions 3.2(a)(ii) and (d)(ii), and Conditions 3.2(a)(iii) and (d)(iii) are assumed for estimators defined via calibration, modified calibration and centered calibration, respectively.

Condition 3.1 (Estimated weights). (a) The estimator α̂N is a maximizer of the pseudo-likelihood (2.2).

  • (b)

    Z ∈ ℝJ+k is not concentrated on a (J + k)-dimensional affine space of ℝJ+k and has bounded support.

  • (c)

    Ge : ℝ ↦ [0, 1] is a twice continuously differentiable, monotone function.

  • (d)

    S0 ≡ P0[{Ġe(ZTα0)}2{π0(V)(1 − π0(V))}−1Z⊗2] is finite and nonsingular, where Ġe is the derivative of Ge.

  • (e)

    The “true” parameter α0 = (α0,1, …, α0,J+k) is given by α0,j = Ge−1(pj) for j = 1, …, J and α0,j = 0 for j = J + 1, …, J + k. The parameter α is identifiable; that is, pα = pα0 almost surely implies α = α0.

  • (f)

    For a fixed pj ∈ (0, 1), nj satisfies nj = [Njpj] for j = 1, …, J.

Condition 3.2 (Calibrations). (a) (i) The estimator α̂N = α̂Nc is a solution of calibration equation (2.3). (ii) The estimator α̂N = α̂Nmc is a solution of calibration equation (2.4). (iii) The estimator α̂N = α̂Ncc is a solution of calibration equation (2.5).

  • (b)

    Z ∈ ℝk is not concentrated at 0 and has bounded support.

  • (c)

    G is a strictly increasing continuously differentiable function on ℝ such that G(0) = 1 and, for all x, −∞ < m1 ≤ G(x) ≤ M1 < ∞ and 0 < Ġ(x) ≤ M2 < ∞, where Ġ is the derivative of G.

  • (d)

    (i) P0Z⊗2 is finite and positive definite. (ii) P0[π0(V)−1(1 − π0(V))Z⊗2] is finite and positive definite. (iii) P0[π0(V)−1(1 − π0(V))(Z − μZ)⊗2] is finite and positive definite, where μZ = P0Z.

  • (e)

    The “true” parameter α0 = 0.

Condition 3.1(f) may seem unnatural at first, but in practice the phase II sample sizes nj are chosen by the investigator, so the sampling probabilities pj can be regarded as automatically chosen to satisfy nj = [Njpj]. The other parts of Condition 3.1 are standard in binary regression, and Condition 3.2 is similar to Condition 3.1.

Asymptotic properties of α̂N for all methods are proved in [25].

3.2. Regular rate for a nuisance parameter

We assume the following conditions.

Condition 3.3 (Consistency). The estimator (θ̂N, η̂N) is consistent for (θ0, η0) and solves the weighted likelihood equations (2.6), where ℙNπ may be replaced by ℙNπ,# with # ∈ {e, c, mc, cc} for the estimators with adjusted weights.

Condition 3.4 (Asymptotic equicontinuity). Let ℱ1(δ) = {ℓ̇θ,η : |θ − θ0| + ∥η − η0∥ < δ} and ℱ2(δ) = {Bθ,ηh − Pθ,ηBθ,ηh : h ∈ ℋ, |θ − θ0| + ∥η − η0∥ < δ}. There exists a δ0 > 0 such that (1) ℱk(δ0), k = 1, 2, are P0-Donsker and suph∈ℋ P0|fj − f0,j|2 → 0 as |θ − θ0| + ∥η − η0∥ → 0, for every fj ∈ ℱj(δ0), j = 1, 2, where f0,1 = ℓ̇θ0,η0 and f0,2 = B0h − P0B0h; (2) ℱk(δ0), k = 1, 2, have integrable envelopes.

Condition 3.5. The map Ψ = (Ψ1, Ψ2) : Θ × H ↦ ℝp × ℓ∞(ℋ) with components

\[ \Psi_1(\theta,\eta) \equiv P_0\Psi_{N,1}(\theta,\eta) = P_0\dot{\ell}_{\theta,\eta},\qquad \Psi_2(\theta,\eta)h \equiv P_0\Psi_{N,2}(\theta,\eta) = P_0(B_{\theta,\eta}h - P_{\theta,\eta}B_{\theta,\eta}h),\quad h \in \mathcal{H}, \]

has a continuously invertible Fréchet derivative map Ψ̇0 = (Ψ̇11, Ψ̇12, Ψ̇21, Ψ̇22) at (θ0, η0), given by Ψ̇ij(θ0, η0)h = P0ψ̇ij,θ0,η0,h, i, j ∈ {1, 2}, in terms of L2(P0) derivatives of ψ1,θ,η,h = ℓ̇θ,η and ψ2,θ,η,h = Bθ,ηh − Pθ,ηBθ,ηh; that is,

\[
\begin{aligned}
\sup_{h}\bigl[P_0\{\psi_{i,\theta,\eta_0,h} - \psi_{i,\theta_0,\eta_0,h} - \dot{\psi}_{i1,\theta_0,\eta_0,h}(\theta - \theta_0)\}^2\bigr]^{1/2} &= o(|\theta - \theta_0|),\\
\sup_{h}\bigl[P_0\{\psi_{i,\theta_0,\eta,h} - \psi_{i,\theta_0,\eta_0,h} - \dot{\psi}_{i2,\theta_0,\eta_0,h}(\eta - \eta_0)\}^2\bigr]^{1/2} &= o(\|\eta - \eta_0\|).
\end{aligned}
\]

Furthermore, Ψ̇0 admits a partition

\[ (\theta - \theta_0,\ \eta - \eta_0) \mapsto \begin{pmatrix}\dot{\Psi}_{11} & \dot{\Psi}_{12}\\ \dot{\Psi}_{21} & \dot{\Psi}_{22}\end{pmatrix}\begin{pmatrix}\theta - \theta_0\\ \eta - \eta_0\end{pmatrix}, \]

where

\[
\begin{aligned}
\dot{\Psi}_{11}(\theta - \theta_0) &= -P_{\theta_0,\eta_0}\dot{\ell}_{\theta_0,\eta_0}\dot{\ell}_{\theta_0,\eta_0}^T(\theta - \theta_0),\\
\dot{\Psi}_{12}(\eta - \eta_0) &= -\int B_{\theta_0,\eta_0}^*\dot{\ell}_{\theta_0,\eta_0}\,d(\eta - \eta_0),\\
\dot{\Psi}_{21}(\theta - \theta_0)h &= -P_{\theta_0,\eta_0}B_{\theta_0,\eta_0}h\,\dot{\ell}_{\theta_0,\eta_0}^T(\theta - \theta_0),\\
\dot{\Psi}_{22}(\eta - \eta_0)h &= -\int B_{\theta_0,\eta_0}^*B_{\theta_0,\eta_0}h\,d(\eta - \eta_0)
\end{aligned}
\]

and Bθ0,η0*Bθ0,η0 is continuously invertible.

Let Ĩ0 = P0[(I − B0(B0*B0)−1B0*)ℓ̇0ℓ̇0T] be the efficient information for θ and ℓ̃0 = Ĩ0−1(I − B0(B0*B0)−1B0*)ℓ̇0 be the efficient influence function for θ for the semiparametric model with complete data.

Theorem 3.1. Under Conditions 3.1–3.5,

\[
\begin{aligned}
\sqrt{N}(\hat\theta_N - \theta_0) &= \sqrt{N}\,\mathbb{P}_N^{\pi}\tilde{\ell}_0 + o_{P^*}(1) \rightsquigarrow Z \sim N_p(0, \Sigma),\\
\sqrt{N}(\hat\theta_{N,\#} - \theta_0) &= \sqrt{N}\,\mathbb{P}_N^{\pi,\#}\tilde{\ell}_0 + o_{P^*}(1) \rightsquigarrow Z_{\#} \sim N_p(0, \Sigma_{\#}),
\end{aligned}
\]

where # ∈ {e, c, mc, cc},

\[ \Sigma \equiv I_0^{-1} + \sum_{j=1}^{J}\nu_j\frac{1 - p_j}{p_j}\operatorname{Var}_{0|j}(\tilde{\ell}_0), \] (3.1)
\[ \Sigma_{\#} \equiv I_0^{-1} + \sum_{j=1}^{J}\nu_j\frac{1 - p_j}{p_j}\operatorname{Var}_{0|j}\bigl((I - Q_{\#})\tilde{\ell}_0\bigr) \] (3.2)

and (recall Conditions 3.1 and 3.2)

\[
\begin{aligned}
Q_e f &\equiv P_0\bigl[\pi_0^{-1}(V)f\,\dot{G}_e(Z^T\alpha_0)Z^T\bigr]S_0^{-1}\{1 - \pi_0(V)\}^{-1}\dot{G}_e(Z^T\alpha_0)Z,\\
Q_c f &\equiv P_0[fZ^T]\{P_0Z^{\otimes 2}\}^{-1}Z,\\
Q_{mc} f &\equiv P_0\bigl[(\pi_0^{-1}(V) - 1)fZ^T\bigr]\bigl\{P_0\bigl[(\pi_0^{-1}(V) - 1)Z^{\otimes 2}\bigr]\bigr\}^{-1}Z,\\
Q_{cc} f &\equiv P_0\bigl[(\pi_0^{-1}(V) - 1)f(Z - \mu_Z)^T\bigr]\bigl\{P_0\bigl[(\pi_0^{-1}(V) - 1)(Z - \mu_Z)^{\otimes 2}\bigr]\bigr\}^{-1}(Z - \mu_Z).
\end{aligned}
\]

Remark 3.1. Our conditions in Theorem 3.1 are the same as those in [5] except the integrability condition. Our Condition 3.4(2) requires existence of integrable envelopes for the classes of scores, while condition (A1*) in [5] requires square-integrable envelopes. Note that this integrability condition is required only for the WLE with adjusted weights, as in [4].

Remark 3.2. As can be seen from the definition of Q#, the choice of G in calibration does not affect the asymptotic variances while Ge in the method of estimated weights does affect the asymptotic variance.

3.3. Nonregular rate for a nuisance parameter

For h = (h1, …, hp)T with hk ∈ H, k = 1, …, p, let Bθ,η[h] = (Bθ,ηh1, …, Bθ,ηhp)T. We assume the following conditions.

Condition 3.6 (Consistency and rate of convergence). An estimator (θ̂N,η̂N) of (θ00) satisfies |θ̂N − θ0| = oP (1), and ∥η̂N − η0∥ = OP (N−β) for some β > 0.

Condition 3.7 (Positive information). There is an h̄* = (h1*, …, hp*), where hk* ∈ H for k = 1, …, p, such that

\[ P_0\{(\dot{\ell}_0 - B_0[\bar{h}^*])B_0 h\} = 0\qquad \text{for all } h \in H. \]

The efficient information I0 ≡ P0(ℓ̇0 − B0[h̄*])⊗2 for θ for the semiparametric model with complete data is finite and nonsingular. Denote the efficient influence function for the semiparametric model with complete data by ℓ̃0 ≡ I0−1(ℓ̇0 − B0[h̄*]).

Condition 3.8 (Asymptotic equicontinuity). (1) For any δN ↓ 0 and C > 0,

\[
\begin{aligned}
\sup_{|\theta - \theta_0| \le \delta_N,\ \|\eta - \eta_0\| \le C N^{-\beta}}\bigl|\mathbb{G}_N(\dot{\ell}_{\theta,\eta} - \dot{\ell}_0)\bigr| &= o_P(1),\\
\sup_{|\theta - \theta_0| \le \delta_N,\ \|\eta - \eta_0\| \le C N^{-\beta}}\bigl|\mathbb{G}_N(B_{\theta,\eta} - B_0)[\bar{h}^*]\bigr| &= o_P(1).
\end{aligned}
\]

(2) There exists a δ > 0 such that the classes {ℓ̇θ,η : |θ − θ0| + ∥η − η0∥ ≤ δ} and {Bθ,η[h̄*] : |θ − θ0| + ∥η − η0∥ ≤ δ} are P0-Glivenko–Cantelli and have integrable envelopes. Moreover, ℓ̇θ,η and Bθ,η[h̄*] are continuous with respect to (θ, η) either pointwise or in L1(P0).

Condition 3.9 (Smoothness of the model). For some α > 1 satisfying αβ > 1/2 and for (θ, η) in the neighborhood {(θ, η) : |θ − θ0| ≤ δN, ∥η − η0∥ ≤ CN−β},

\[
\begin{aligned}
\bigl|P_0\{\dot{\ell}_{\theta,\eta} - \dot{\ell}_0 + \dot{\ell}_0(\dot{\ell}_0^T(\theta - \theta_0) + B_0[\eta - \eta_0])\}\bigr| &= o(|\theta - \theta_0|) + O(\|\eta - \eta_0\|^{\alpha}),\\
\bigl|P_0\{(B_{\theta,\eta} - B_0)[\bar{h}^*] + B_0[\bar{h}^*](\dot{\ell}_0^T(\theta - \theta_0) + B_0[\eta - \eta_0])\}\bigr| &= o(|\theta - \theta_0|) + O(\|\eta - \eta_0\|^{\alpha}).
\end{aligned}
\]

In the previous section, we required that the WLE solves the weighted likelihood equations (2.6) for all h ∈ ℋ. Here, we only assume that the WLE (θ̂N, η̂N) satisfies the weighted likelihood equations

\[ \Psi_{N,1}^{\pi}(\theta,\eta,\alpha) = \mathbb{P}_N^{\pi}\dot{\ell}_{\theta,\eta} = o_{P^*}(N^{-1/2}),\qquad \Psi_{N,2}^{\pi}(\theta,\eta,\alpha)[\bar{h}^*] = \mathbb{P}_N^{\pi}B_{\theta,\eta}[\bar{h}^*] = o_{P^*}(N^{-1/2}). \] (3.3)

The corresponding WLEs with adjusted weights, (θ̂N,#, η̂N,#) with # ∈ {e, c, mc, cc}, satisfy (3.3) with ℙNπ replaced by ℙNπ,#.

Theorem 3.2. Suppose that the WLE is a solution of (3.3), where ℙNπ may be replaced by ℙNπ,# with # ∈ {e, c, mc, cc} for the estimators with adjusted weights. Under Conditions 3.1, 3.2 and 3.6–3.9,

\[
\begin{aligned}
\sqrt{N}(\hat\theta_N - \theta_0) &= \sqrt{N}\,\mathbb{P}_N^{\pi}\tilde{\ell}_0 + o_{P^*}(1) \rightsquigarrow Z \sim N_p(0, \Sigma),\\
\sqrt{N}(\hat\theta_{N,\#} - \theta_0) &= \sqrt{N}\,\mathbb{P}_N^{\pi,\#}\tilde{\ell}_0 + o_{P^*}(1) \rightsquigarrow Z_{\#} \sim N_p(0, \Sigma_{\#}),
\end{aligned}
\]

where Σ and Σ# are as defined in (3.1) and (3.2) of Theorem 3.1, but now I0 and ℓ̃0 are defined in Condition 3.7, and Q# are defined in Theorem 3.1.

Remark 3.3. Our conditions are identical to those of the Z-theorem of [10] except Condition 3.8(2). This additional condition is not stringent for the following reasons. First, the Glivenko–Cantelli condition is usually assumed to prove consistency of estimators before deriving asymptotic distributions. Second, a stronger L2(P0)-continuity condition is standard as is seen in Condition 3.4 (see also Section 25.8 of [31]). Note that the L1(P0)-continuity condition is only required for the WLEs with adjusted weights.

3.4. Comparisons of methods

We compare asymptotic variances of five WLEs in view of improvement by adjusting weights and change of designs. We also include in comparison special cases of adjusting weights involving stratum-wise adjustment.

3.4.1. Stratified Bernoulli sampling

We first state the result corresponding to Theorem 3.1 for stratified Bernoulli sampling, where all subjects are sampled independently with probability pj when V ∈ 𝒱j. Let θ̂NBern and θ̂N,#Bern with # ∈ {e, c, mc, cc} denote the corresponding WLE and WLEs with adjusted weights.

Theorem 3.3. Suppose Conditions 3.1 [except Condition 3.1(f)] and 3.2 hold. Let ξ1, …, ξN ∈ {0, 1} be i.i.d. with E[ξ | V] = π0(V) = Σj=1J pjI(V ∈ 𝒱j).

  1. Suppose that the WLE is a solution of (3.3), where ℙNπ may be replaced by ℙNπ,# with # ∈ {e, c, mc, cc} for the estimators with adjusted weights. Under the same conditions as in Theorem 3.1,
    \[
    \begin{aligned}
    \sqrt{N}(\hat\theta_N^{\mathrm{Bern}} - \theta_0) &= \sqrt{N}\,\mathbb{P}_N^{\pi}\tilde{\ell}_0 + o_{P^*}(1) \rightsquigarrow Z^{\mathrm{Bern}} \sim N_p(0, \Sigma^{\mathrm{Bern}}),\\
    \sqrt{N}(\hat\theta_{N,\#}^{\mathrm{Bern}} - \theta_0) &= \sqrt{N}\,\mathbb{P}_N^{\pi,\#}\tilde{\ell}_0 + o_{P^*}(1) \rightsquigarrow Z_{\#}^{\mathrm{Bern}} \sim N_p(0, \Sigma_{\#}^{\mathrm{Bern}}),
    \end{aligned}
    \]
    where
    \[ \Sigma^{\mathrm{Bern}} \equiv I_0^{-1} + \sum_{j=1}^{J}\nu_j\frac{1 - p_j}{p_j}P_{0|j}(\tilde{\ell}_0)^{\otimes 2}, \] (3.4)
    \[ \Sigma_{\#}^{\mathrm{Bern}} \equiv I_0^{-1} + \sum_{j=1}^{J}\nu_j\frac{1 - p_j}{p_j}P_{0|j}\bigl((I - Q_{\#})\tilde{\ell}_0\bigr)^{\otimes 2}, \] (3.5)
    where Q# with # ∈ {e, c, mc, cc} are defined in Theorem 3.1.
  2. Under the same conditions as in Theorem 3.2, the same conclusions as in (1) hold with I0 and ℓ̃0 replaced by those defined in Condition 3.7.

Comparing the variance–covariance matrices in Theorem 3.3 to those in Theorems 3.1 and 3.2, we obtain the following corollary comparing designs. All estimators have smaller variances under sampling without replacement.

Corollary 3.1. Under the same conditions as in Theorem 3.3,

\[
\begin{aligned}
\Sigma &= \Sigma^{\mathrm{Bern}} - \sum_{j=1}^{J}\nu_j\frac{1 - p_j}{p_j}\{P_{0|j}\tilde{\ell}_0\}^{\otimes 2},\\
\Sigma_{\#} &= \Sigma_{\#}^{\mathrm{Bern}} - \sum_{j=1}^{J}\nu_j\frac{1 - p_j}{p_j}\{P_{0|j}(I - Q_{\#})\tilde{\ell}_0\}^{\otimes 2},\qquad \# \in \{e, c, mc, cc\}.
\end{aligned}
\]

Variance formulas (3.5) with # ∈ {e, mc, cc}, that is, for all adjustments except ordinary calibration, have the following alternative representations, which show the efficiency gains over the plain WLE under Bernoulli sampling.

Corollary 3.2. Under the same conditions as in Theorem 3.3,

\[ \Sigma_{\#}^{\mathrm{Bern}} = \Sigma^{\mathrm{Bern}} - \operatorname{Var}\Bigl(\frac{\xi - \pi_0(V)}{\pi_0(V)}\,Q_{\#}\tilde{\ell}_0\Bigr),\qquad \# \in \{e, mc, cc\}. \]
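Corollary 3.2 can be illustrated by Monte Carlo for the simplest functional, the mean of X, under Bernoulli sampling: calibrating on an auxiliary variable correlated with X reduces the variance of the IPW estimator. The population, design, and degree of correlation below are arbitrary illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

# Monte Carlo sketch: plain IPW mean vs. calibrated IPW mean under Bernoulli
# sampling, with auxiliary Z strongly correlated with X.
N, reps = 800, 2000
plain, calib = [], []
for _ in range(reps):
    Z = rng.normal(size=N)
    X = Z + 0.3 * rng.normal(size=N)            # X strongly correlated with Z
    pi0 = np.where(Z > 0.0, 0.7, 0.3)           # Bernoulli sampling probabilities
    xi = rng.random(N) < pi0
    w = xi / pi0
    plain.append((w * X).mean())                # plain IPW (Horvitz-Thompson) mean
    # Calibration on D = (1, Z) with G(x) = 1 + x: solve the linear form of (2.3).
    D = np.column_stack([np.ones(N), Z])
    A = (D * w[:, None]).T @ D / N
    b = D.mean(axis=0) - (D * w[:, None]).sum(axis=0) / N
    wc = w * (1.0 + D @ np.linalg.solve(A, b))
    calib.append((wc * X).mean())

var_plain, var_calib = np.var(plain), np.var(calib)
```

Across replications the calibrated estimator shows the markedly smaller variance predicted by the corollary.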

3.4.2. Within-stratum adjustment of weights

Adjusting weights can be carried out within each stratum. This was proposed by Breslow et al. [2, 3] for ordinary calibration. Consider calibration on Z̃ ≡ (Z̃(1), …, Z̃(J))T with Z̃(j) ≡ I(V ∈ 𝒱j)ZT. The calibration equation (2.3) becomes

\[ \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i\,G_c(\tilde{Z}_i;\alpha)}{\pi_0(V_i)}Z_iI(V_i \in \mathcal{V}_j) = \frac{1}{N}\sum_{i=1}^{N}Z_iI(V_i \in \mathcal{V}_j),\qquad j = 1, \ldots, J, \]

where α ∈ ℝJk. We call this special case within-stratum calibration. We define within-stratum modified and centered calibration analogously.

We also call estimated weights carried out within each stratum within-stratum estimated weights. Recall that Z in estimated weights contains the membership indicators for the strata, and the rest are other auxiliary variables, say Z[2]. Within-stratum estimated weights uses Z̃ ≡ (Z̃(1), …, Z̃(J))T where Z̃(j) ≡ I(V ∈ 𝒱j)(Z[2])T with 1 included in Z[2]. The “true” parameter α̃0 has zero for all elements except Ge−1(pj) for the element corresponding to I(V ∈ 𝒱j), j = 1, …, J.

The following corollary summarizes within-stratum adjustment of weights under stratified Bernoulli sampling and sampling without replacement. All methods achieve improved efficiency over the plain WLE under Bernoulli sampling while centered calibration is the only method to yield a guaranteed improvement under sampling without replacement. This is because centering yields the L20(P0|j)-projection suitable for the conditional variances in (3.2) while noncentering results in the L2(P0|j)-projection for the conditional expectations in (3.5).

Corollary 3.3. (1) (Bernoulli) Under the same conditions as in Theorem 3.3 with Z replaced by Z̃ and α0 replaced by α̃0 for within-stratum estimated weights,

\[ \Sigma_{\#}^{\mathrm{Bern}} = \Sigma^{\mathrm{Bern}} - \sum_{j=1}^{J}\nu_j\frac{1 - p_j}{p_j}P_{0|j}\bigl(Q_{\#}^{(j)}\tilde{\ell}_0\bigr)^{\otimes 2}, \] (3.6)

where # ∈ {e, c, mc, cc} and

\[
\begin{aligned}
Q_e^{(j)} f &\equiv P_{0|j}\bigl[f\,\dot{G}_e(\tilde{Z}^T\tilde\alpha_0)(Z^{[2]})^T\bigr]\bigl\{P_{0|j}\dot{G}_e^2(\tilde{Z}^T\tilde\alpha_0)(Z^{[2]})^{\otimes 2}\bigr\}^{-1}\dot{G}_e(\tilde{Z}^T\tilde\alpha_0)I(V \in \mathcal{V}_j)Z^{[2]},\\
Q_c^{(j)} f &\equiv P_{0|j}[fZ^T]\{P_{0|j}[Z^{\otimes 2}]\}^{-1}I(V \in \mathcal{V}_j)Z,\\
Q_{mc}^{(j)} f &\equiv Q_c^{(j)} f,\\
Q_{cc}^{(j)} f &\equiv P_{0|j}\bigl[f(Z - \mu_{Z,j})^T\bigr]\bigl\{P_{0|j}\bigl[(Z - \mu_{Z,j})^{\otimes 2}\bigr]\bigr\}^{-1}I(V \in \mathcal{V}_j)(Z - \mu_{Z,j})
\end{aligned}
\]

with μZ,j ≡ E[I(V ∈ 𝒱j)Z] for j = 1, …, J.

(2) (Without replacement) Under the same conditions as in Theorem 3.1 or 3.2 with Z replaced by Z̃,

\[ \Sigma_{cc} = \Sigma - \sum_{j=1}^{J}\nu_j\frac{1 - p_j}{p_j}\operatorname{Var}_{0|j}\bigl(Q_{cc}^{(j)}\tilde{\ell}_0\bigr). \] (3.7)

3.4.3. Comparisons

We summarize Corollaries 3.1–3.3. Every method of adjusting weights improves efficiency over the plain WLE in a certain design and with a certain range of adjustment of weights (within-stratum or “across-strata” adjustment). However, particularly notable among all methods is centered calibration. While other methods gain efficiency only under Bernoulli sampling, centered calibration improves efficiency over the plain WLE under both sampling schemes. There is no known method of “across-strata” adjustment that is guaranteed to gain efficiency over the plain WLE under stratified sampling without replacement.

There are close connections among all the methods. When the auxiliary variables have mean zero, centered and modified calibration are essentially the same. Ordinary and modified calibration give the same asymptotic variance when carried out stratum-wise. For Z and α0 defined for estimated weights, estimated weights and modified calibration based on (1 − π0(V))−1Ġe(ZTα0)Z perform in the same way. Similarly, within-stratum estimated weights with Z̃ and α̃0 is as good as within-stratum calibration based on Ġe(Z̃Tα̃0)Z̃.

As seen from these relationships among the methods, no single method is superior to the others in every situation. In fact, performance depends on the choice and transformation of auxiliary variables, the true distribution P0 and the design. For our “without replacement” sampling scheme, within-stratum centered calibration is the only method guaranteed to gain efficiency, while the other methods may perform even worse than the plain WLE.

4. Examples

For asymptotic normality of the WLEs, consistency and rates of convergence need to be established first in order to apply our Z-theorems of Section 3. To this end, the general results on IPW empirical processes discussed in the next section are useful. We illustrate this in the Cox model with right censoring and with interval censoring under two-phase sampling.

Let T ~ F be a failure time, and let X be a vector of covariates with bounded support in the regression model. The Cox proportional hazards model [7] specifies the relationship

Λ(t|x)=exp(θTx)Λ(t),

where θ ∈ Θ ⊂ ℝp is the regression parameter and Λ ∈ H is the (baseline) cumulative hazard function. Here the space H for the nuisance parameter Λ is the set of nonnegative, nondecreasing càdlàg functions on the positive half-line. The true parameters are θ0 and Λ0.
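For intuition, the model above is easy to simulate. The sketch below (hypothetical code; for concreteness it assumes the exponential baseline Λ0(t) = t) draws failure times with cumulative hazard exp(θᵀx)Λ(t) by inverting the conditional survival function.

```python
import math, random

random.seed(0)

def draw_failure_time(theta, x):
    """Draw T with cumulative hazard exp(theta'x) * Lambda(t), Lambda(t) = t:
    if E ~ Exp(1), then T = E / exp(theta'x) has survival exp(-exp(theta'x) * t)."""
    e = -math.log(1.0 - random.random())   # standard exponential draw
    return e / math.exp(sum(t * xi for t, xi in zip(theta, x)))

theta0 = [0.5, -1.0]
# A larger linear predictor means a larger hazard, hence stochastically smaller T.
sample_hi = [draw_failure_time(theta0, [2.0, 0.0]) for _ in range(4000)]  # theta'x = 1
sample_lo = [draw_failure_time(theta0, [0.0, 0.0]) for _ in range(4000)]  # theta'x = 0
mean_hi = sum(sample_hi) / len(sample_hi)   # close to exp(-1), about 0.37
mean_lo = sum(sample_lo) / len(sample_lo)   # close to 1
```

The proportional-hazards structure shows up directly: multiplying the linear predictor scales the hazard, not the failure time distribution's shape.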

In addition to X, let U be a vector of auxiliary variables collected at phase I that are correlated with the covariates X. For notational simplicity, we assume that the covariates X are observed only for subjects sampled at phase II. Thus, if some coordinates of X are available at phase I, we include identical copies of those coordinates in the vector U.

4.1. Cox model with right censored data

Under right censoring, we observe only the minimum of the failure time T and the censoring time C ~ G. Define the observed time Y = T ∧ C and the censoring indicator Δ = I(T ≤ C). The phase I data are V = (Y, Δ, U), and the observed data are (Y, Δ, ξX, U, ξ), where ξ is the sampling indicator. The theory of maximum likelihood estimation in the Cox model with complete right censored data has received several treatments; the one we follow most closely here is that of [31]. For the Cox model with case-cohort data, see [27]; for treatments of even more general designs, see [1] and [12]. Here, for both sampling without replacement and Bernoulli sampling, we continue the developments of [4, 5]. We assume the following conditions:

Condition 4.1. The finite-dimensional parameter space Θ is compact and contains the true parameter θ0 as an interior point.

Condition 4.2. The failure time T and the censoring time C are conditionally independent given X, and there is τ > 0 such that P(T > τ) > 0 and P(C ≥ τ) = P(C = τ) > 0. Both T and C have continuous conditional densities given the covariates X = x.

Condition 4.3. The covariate X has bounded support. For any measurable function h, P(X ≠ h(Y)) > 0.

Let λ(t) = (d/dt)Λ(t) be the baseline hazard function. With complete data, the density of (Y, Δ, X) is

pθ,Λ(y, δ, x) = {λ(y)eθTx exp(−Λ(y)eθTx)(1 − G)(y|x)}δ {exp(−Λ(y)eθTx)g(y|x)}1−δ pX(x),

where pX is the marginal density of X and g(·|x) is the conditional density of C given X = x. The score for θ is given by ℓ̇θ,Λ(y, δ, x) = x{δ − eθTxΛ(y)}, and the score operator Bθ,Λ : ℋ ↦ L2(Pθ,Λ), defined on the unit ball ℋ in the space BV[0, τ], is Bθ,Λh(y, δ, x) = δh(y) − eθTx ∫[0,y] h dΛ. Because the likelihood based on the density above does not yield the MLE for complete data, we define the log likelihood for one observation for complete data by ℓθ,Λ(y, δ, x) = log{(eθTxΛ{y})δ exp(−Λ(y)eθTx)}, where Λ{t} is the (point) mass of Λ at t. Then maximizing the weighted log likelihood ℙNπℓθ,Λ reduces to solving the system of equations ℙNπℓ̇θ,Λ = 0 and ℙNπBθ,Λh = 0 for every h ∈ ℋ. The efficient score for θ for complete data is given by

ℓ*θ0,Λ0(y, δ, x) = δ(x − (M1/M0)(y)) − eθ0Tx ∫[0,y] (x − (M1/M0)(t)) dΛ0(t),

and the efficient information for θ for complete data is

Ĩθ0,Λ0 = E[(ℓ*θ0,Λ0)⊗2] = E[eθ0TX ∫0τ (X − (M1/M0)(y))⊗2 (1 − G)(y|X) dΛ0(y)],

where Mk(s) = Pθ0,Λ0[Xk eθ0TX I(Y ≥ s)], k = 0, 1.
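The weighted estimating equations above can be profiled: solving ℙNπBθ,Λh = 0 for all h at fixed θ gives an IPW (Breslow-type) estimator of Λ, whose jump at each observed failure time is the weighted number of failures divided by the weighted at-risk sum. A minimal, hypothetical sketch for a scalar covariate (not the paper's code):

```python
import math

def ipw_breslow(times, deaths, x, theta, weights):
    """IPW Breslow-type baseline cumulative hazard for fixed theta:
    jump at failure time t is
        sum_i w_i * d_i * 1{y_i = t} / sum_j w_j * exp(theta * x_j) * 1{y_j >= t}.
    Returns the cumulative hazard evaluated at each distinct failure time."""
    def risk(t):
        return sum(w * math.exp(theta * xj)
                   for yj, xj, w in zip(times, x, weights) if yj >= t)

    jumps = {}
    for yi, di, wi in zip(times, deaths, weights):
        if di:
            jumps[yi] = jumps.get(yi, 0.0) + wi

    cum, out = 0.0, {}
    for t in sorted(jumps):
        cum += jumps[t] / risk(t)
        out[t] = cum
    return out

# Toy data, theta = 0, unit weights: jump 1/3 at t = 1, jump 1/2 at t = 2.
Lam = ipw_breslow([1.0, 2.0, 3.0], [1, 1, 0], [0.0, 0.0, 0.0], 0.0, [1, 1, 1])
```

With unit weights this reduces to the classical Breslow estimator; under two-phase sampling the weights are the inverse inclusion probabilities (possibly adjusted as in Section 2).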

Theorem 4.1 (Consistency). Under Conditions 3.1, 3.2, 4.1–4.3, the WLEs are consistent for0, Λ0).

Proof. This proof follows along the lines of the proof given by [29], but with the usual empirical measure replaced by the IPW empirical measure (with adjusted weights), and by use of Theorem 5.1. For details see [25].

Our Z-theorem (Theorem 3.1) yields asymptotic normality of the WLEs.

Theorem 4.2 (Asymptotic normality). Under Conditions 3.1, 3.2, 4.1–4.3,

√N(θ̂N − θ0) = √N ℙNπℓ̃θ0,Λ0 + oP*(1) →d N(0, Σ),
√N(θ̂N,# − θ0) = √N ℙNπ,#ℓ̃θ0,Λ0 + oP*(1) →d N(0, Σ#),

where # ∈ {e, c, mc, cc}, ℓ̃θ0,Λ0 = Ĩθ0,Λ0−1 ℓ*θ0,Λ0 is the efficient influence function for θ for complete data, and Σ and Σ# are given in Theorem 3.1.

Proof. We verify the conditions of Theorem 3.1. Condition 3.3 holds by Theorem 4.1. Conditions 3.4 and 3.5 hold under the present hypotheses as was shown in [31], Section 25.12.

For variance estimation regarding θ̂N, ÎN ≡ ℙNπ(ℓ*θ̂N,Λ̂N)⊗2 can be used to estimate Ĩ0. Letting ℓ̃̂0 ≡ ÎN−1 ℓ*θ̂N,Λ̂N, we can estimate Var0|jℓ̃0 by P̂jℓ̃0⊗2 − {P̂jℓ̃0}⊗2, where P̂jℓ̃0 ≡ ℙNπℓ̃̂0 I(V ∈ 𝒱j) and P̂jℓ̃0⊗2 ≡ ℙNπℓ̃̂0⊗2 I(V ∈ 𝒱j). The other four cases are similar.
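The plug-in recipe above can be assembled numerically. The sketch below is a hypothetical scalar version: it identifies the complete-data term with the IPW variance of the estimated influence values (which agrees with Ĩ0−1 when the influence function has mean zero) and adds the stratum terms ν̂j(1 − pj)/pj Var̂0|j; all inputs are made-up numbers.

```python
def plugin_variance(infl, stratum, p, N):
    """Scalar plug-in for Sigma = Var(l~) + sum_j nu_j (1 - p_j)/p_j Var_{0|j}(l~).

    infl[i]    : estimated efficient influence value of sampled unit i
    stratum[i] : its stratum label j
    p[j]       : phase II sampling fraction n_j / N_j in stratum j
    N          : phase I sample size
    """
    w = [1.0 / p[s] for s in stratum]                  # inverse-probability weights
    m1 = sum(wi * fi for wi, fi in zip(w, infl)) / N   # IPW first moment
    m2 = sum(wi * fi * fi for wi, fi in zip(w, infl)) / N
    sigma = m2 - m1 * m1                               # complete-data (phase I) part
    for j in set(stratum):
        wj = [wi for wi, s in zip(w, stratum) if s == j]
        fj = [fi for fi, s in zip(infl, stratum) if s == j]
        tot = sum(wj)
        mean_j = sum(wi * fi for wi, fi in zip(wj, fj)) / tot
        var_j = sum(wi * (fi - mean_j) ** 2 for wi, fi in zip(wj, fj)) / tot
        nu_j = tot / N                                 # estimated stratum frequency
        sigma += nu_j * (1.0 - p[j]) / p[j] * var_j    # phase II penalty
    return sigma

# Made-up influence values: stratum 0 fully sampled, stratum 1 sampled at 1/2.
sigma_hat = plugin_variance([1.0, -1.0, 2.0, 0.0], [0, 0, 1, 1],
                            {0: 1.0, 1: 0.5}, N=6)
```

Note that a fully sampled stratum (pj = 1) contributes nothing to the phase II penalty, consistent with the variance formulas of Theorem 3.1.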

4.2. Cox model with interval censored data

Let Y be a censoring time that is assumed to be conditionally independent of the failure time T given the covariate vector X. Under case 1 interval censoring we do not observe T, but rather (Y, Δ) where Δ ≡ I(T ≤ Y). The phase I data are V = (Y, Δ, U), and the observed data are (Y, Δ, ξX, U, ξ), where ξ is the sampling indicator. In the case of complete data, maximum likelihood estimation for this model was studied by Huang [10]. For a generalized version of this model and two-phase data with Bernoulli sampling, weighted likelihood estimation with and without estimated weights has recently been studied by Li and Nan [11]. Here we treat two-phase data under sampling without replacement at phase II, with both estimated weights and calibration.

With complete data, the log likelihood for one observation is given by

ℓ(θ, F) ≡ δ log{1 − F̄(y)exp(θTx)} + (1 − δ) log F̄(y)exp(θTx) = δ log{1 − e−Λ(y)exp(θTx)} − (1 − δ)eθTxΛ(y) ≡ ℓ(θ, Λ),

where F̅ ≡ 1 − F = e−Λ. The score for θ and the score operator Bθ,Λ for Λ for complete data are ℓ̇θ,Λ = x exp(θT x)Λ(y)(δr(y, x; θ, Λ) − (1 − δ)) and Bθ,Λ[h] = exp(θT x)h(y){δr(y, x; θ, Λ) − (1 − δ)} where r(y, x; θ, Λ) = exp(−eθT x Λ (y))/{1 − exp(−eθT x Λ (y))}. The efficient score for θ for complete data is given by

ℓ*θ0,Λ0 = eθ0Tx Q(y, δ, x; θ0, Λ0) Λ0(y){x − E[Xe2θ0TX O(Y|X)|Y = y]/E[e2θ0TX O(Y|X)|Y = y]},

where Q(y, δ, x; θ, Λ) = δr(y, x; θ, Λ) − (1 − δ) and O(y|x) = r(y, x; θ0, Λ0). The efficient information for θ for complete data, Ĩθ0,Λ0 = E[(ℓ*θ0,Λ0)⊗2], is given by Ĩθ0,Λ0 = E[R(Y, X){X − E[XR(Y, X)|Y]/E[R(Y, X)|Y]}⊗2] where R(Y, X) = e2θ0TX Λ02(Y) O(Y|X). See [10] for further details.
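The complete-data building blocks ℓ(θ, Λ), r and Q above are elementary to evaluate; the following hypothetical sketch (scalar covariate) simply transcribes the formulas.

```python
import math

def ell(theta, Lam, delta, x):
    """Case 1 interval-censoring log likelihood:
    l(theta, Lambda) = delta * log(1 - exp(-Lam * e^{theta x}))
                       - (1 - delta) * e^{theta x} * Lam."""
    s = math.exp(theta * x) * Lam
    return delta * math.log(1.0 - math.exp(-s)) - (1 - delta) * s

def r_fun(theta, Lam, x):
    """r(y, x; theta, Lambda) = exp(-e^{theta x} Lam) / (1 - exp(-e^{theta x} Lam))."""
    s = math.exp(theta * x) * Lam
    return math.exp(-s) / (1.0 - math.exp(-s))

def Q_fun(theta, Lam, delta, x):
    """Q = delta * r - (1 - delta)."""
    return delta * r_fun(theta, Lam, x) - (1 - delta)

# With theta = 0, x = 0 and Lambda(y) = log 2, survival is exp(-Lam) = 1/2,
# so r = 1 and an observed event (delta = 1) has log likelihood log(1/2).
val = ell(0.0, math.log(2.0), 1, 0.0)
q = Q_fun(0.0, math.log(2.0), 1, 0.0)
```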

We impose the same assumptions made for complete data in [10].

Condition 4.4. The finite-dimensional parameter space Θ is compact and contains the true parameter θ0 as an interior point.

Condition 4.5. (a) The covariate X has bounded support; that is, there exists x0 such that |X| ≤ x0 with probability 1. (b) For any θ ≠ θ0, P(θTX ≠ θ0TX) > 0.

Condition 4.6. F0(0) = 0. Let τF0 = inf{t : F0(t) = 1}. The support of Y is an interval S[Y] = [lY, uY] with 0 < lY < uY < τF0.

Condition 4.7. The cumulative hazard function Λ0 has a strictly positive derivative on S[Y], and the joint distribution function G(y, x) of (Y, X) has bounded second-order (partial) derivative with respect to y.

4.2.1. Consistency

The characterization of the WLEs (θ̂N, Λ̂N) and (θ̂N,#, Λ̂N,#), # ∈ {e, c, mc, cc}, maximizing ℙNπℓ(θ, Λ) or ℙNπ,#ℓ(θ, Λ) is given in [25], Lemma A.5. We prove consistency of the WLEs in the metric d((θ1, Λ1), (θ2, Λ2)) ≡ ∥θ1 − θ2∥ + ∥Λ1 − Λ2∥PY, where ∥ · ∥ is the Euclidean metric, ∥Λ1 − Λ2∥PY2 = ∫(Λ1(y) − Λ2(y))2 dPY(y), and PY is the marginal probability measure of the censoring variable Y.

Theorem 4.3 (Consistency). Under Conditions 3.1, 3.2, 4.4–4.7, the WLEs are consistent in the metric d.

Proof. We only prove consistency for the WLE. Proofs for the other four estimators are similar.

Let H̃ be the set of all subdistribution functions defined on [0, ∞]. We denote the WLE of F by F̂N = 1 − e−Λ̂N. Define the set ℱ of functions by

ℱ ≡ {f(θ, F) = δ(1 − F̄(y)exp(θTx)) + (1 − δ)F̄(y)exp(θTx) : θ ∈ Θ, F ∈ H̃}.

Boundedness of X and compactness of Θ ⊂ ℝp imply that the set {eθTx : θ ∈ Θ} is Glivenko–Cantelli. The set H̃ is also Glivenko–Cantelli, since it is a subset of the set of bounded monotone functions. Thus, it follows from the boundedness of the functions in ℱ and the Glivenko–Cantelli preservation theorem [30] that ℱ is Glivenko–Cantelli.

Let 0 < α < 1 be a fixed constant. It follows by concavity of the function u ↦ log u and Jensen’s inequality that

P0[log{1 + α(f(θ, F)/f(θ0, F0) − 1)}] ≤ log(P0[1 + α(f(θ, F)/f(θ0, F0) − 1)]) = log(1 − α + αP0[f(θ, F)/f(θ0, F0)]) ≤ 0,

where the first inequality is an equality if and only if 1 + α(f(θ, F)/f(θ0, F0) − 1) is constant on S[Y], in other words (θ, F) = (θ0, F0) on S[Y] by the identifiability Condition 4.5. Note also that by monotonicity of the logarithm

P0[log{1 + α(f(θ, F)/f(θ0, F0) − 1)}] ≥ P0[log{1 + α(0 − 1)}] = log(1 − α).

Thus, the set 𝒢 = {log{1 + α(f (θ, F)/f0, F0) − 1)} : f (θ, F) ∈ ℱ} has an integrable envelope. To see this, form a sequence (θn, Fn) such that

gn ≡ log{1 + α(f(θn, Fn)/f(θ0, F0) − 1)} ↑ supθ∈Θ,F∈H̃ log{1 + α(f(θ, F)/f(θ0, F0) − 1)} ≡ G.

Then {gn − log(1 − α)}n∈ℕ is a monotone increasing sequence of nonnegative functions. By the monotone convergence theorem, P0gn − log(1 − α) → P0G − log(1 − α) ≤ −log(1 − α). Thus we may choose G ∨ (−log(1 − α)) as an integrable envelope. The set 𝒢 is also Glivenko–Cantelli by a Glivenko–Cantelli preservation theorem [30].

Now, by the concavity of the map u ↦ log u, and the definition of the WLE, we have

ℙNπ log{1 + α(f(θ̂N, F̂N)/f(θ0, F0) − 1)} ≥ ℙNπ{(1 − α) log 1 + α log{f(θ̂N, F̂N)/f(θ0, F0)}} = α{ℙNπ log f(θ̂N, F̂N) − ℙNπ log f(θ0, F0)} ≥ 0.

Since Θ and H̃ are compact, there is a subsequence of (θ̂N, F̂N) converging to some (θ, F) ∈ Θ × H̃. Along this subsequence, Theorem 5.1 implies that

0 ≤ ℙNπ log{1 + α(f(θ̂N, F̂N)/f(θ0, F0) − 1)} →P* Pθ0,F0[log{1 + α(f(θ, F)/f(θ0, F0) − 1)}] ≤ 0,

so that Pθ0,F0 log{1 + α(f(θ, F)/f(θ0, F0) − 1)} = 0. This is possible only when (θ, F) = (θ0, F0), because (θ, F) ↦ P0[log{1 + α(f(θ, F)/f(θ0, F0) − 1)}] attains its maximum only at (θ0, F0). Hence we conclude that (θ̂N, F̂N) converges to (θ0, F0) in the sense of Kullback–Leibler divergence. Since the Kullback–Leibler divergence bounds the Hellinger distance, it follows by Lemma A5 of [17] that d((θ̂N, Λ̂N), (θ0, Λ0)) = oP*(1).

4.2.2. Rate of convergence

We prove that the rate of convergence of the WLE is N1/3 by applying the rate theorem (Theorem 5.2) of Section 5. Since we have proved consistency of (θ̂N, Λ̂N) for (θ0, Λ0) on S[Y], under Condition 4.6 we can restrict the parameter space for Λ to HM ≡ {Λ ∈ H : M−1 ≤ Λ ≤ M on S[Y]}, where M is a positive constant such that M−1 ≤ Λ0 ≤ M on S[Y]. Define ℳ ≡ {ℓ(θ, Λ) : θ ∈ Θ, Λ ∈ HM}.

Theorem 4.4 (Rate of convergence). Under Conditions 4.4–4.7,

d((θ̂N, Λ̂N), (θ0, Λ0)) = OP*(N−1/3).

This holds if we replace the WLE by the WLEs with adjusted weights assuming Conditions 3.1 and 3.2.

Proof. Since the rate of convergence for the plain WLE is easier to verify than for the other four estimators, we prove the theorem only for the WLE with modified calibration; the cases of the other WLEs with adjusted weights are similar.

We proceed by verifying the conditions of Theorem 5.2. Bound (5.4) follows by Lemma 5.2 in Section 5 and Lemma A5 of [17]. For bound (5.5), we follow the proof of (5.3) in [10]. Since α̂N is consistent, we can specify a small neighborhood 𝒜mc,0 of the zero vector such that Gmc(z; α) is contained in a small interval that contains 1 and consists of strictly positive numbers. Thus, multiplying the log likelihood by the uniformly bounded quantity Gmc(z; α) requires only a slight modification of Huang's proof of his Lemma 3.1 to obtain supQ log N[·](ε, Gℳ, L2(Q)) ≲ ε−1 for ε small enough, where the supremum is taken over all discrete probability measures and Gℳ ≡ {Gmc(·; α)ℓ(θ, Λ) : α ∈ 𝒜mc,0, ℓ(θ, Λ) ∈ ℳ}. Let Gℳδ ≡ {m(θ, Λ, α) − m(θ0, Λ0, α) : m(θ, Λ, α) ∈ Gℳ, d((θ, Λ), (θ0, Λ0)) ≤ δ}. It follows by Lemma 3.2.2 of [32] that E*∥𝔾N∥Gℳδ ≲ δ1/2{1 + (δ1/2/(δ2√N))M} ≡ ϕN(δ). Apply Theorem 5.2 to conclude rN = N1/3.
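The final step is a small calculation worth making explicit: with ϕN(δ) ≍ δ1/2, the requirement rN2ϕN(1/rN) ≤ √N reads rN3/2 ≤ N1/2, that is, rN = N1/3. A throwaway numerical check (hypothetical code):

```python
def rate_ok(r, N):
    """Check r**2 * phi(1/r) <= sqrt(N) for phi(d) = d**0.5,
    which simplifies to r**1.5 <= N**0.5."""
    phi = lambda d: d ** 0.5
    return r ** 2 * phi(1.0 / r) <= N ** 0.5 * (1.0 + 1e-12)

# r_N = N^{1/3} satisfies the inequality (with equality), while the faster
# polynomial rate N^{0.4} eventually violates it.
checks = [rate_ok(N ** (1.0 / 3.0), N) for N in (10, 10**3, 10**6)]
fails = [not rate_ok(N ** 0.4, N) for N in (10**3, 10**6)]
```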

4.2.3. Asymptotic normality of the estimators

We apply Theorem 3.2 to derive the asymptotic distributions of the WLEs.

Theorem 4.5 (Asymptotic normality). Under Conditions 3.1, 3.2, 4.4–4.7,

√N(θ̂N − θ0) = √N ℙNπℓ̃θ0,Λ0 + oP*(1) →d N(0, Σ),
√N(θ̂N,# − θ0) = √N ℙNπ,#ℓ̃θ0,Λ0 + oP*(1) →d N(0, Σ#),

where # ∈ {e, c, mc, cc}, ℓ̃θ0,Λ0 = Ĩθ0,Λ0−1 ℓ*θ0,Λ0 is the efficient influence function for complete data, and Σ and Σ# are given in Theorem 3.2.

Proof. We proceed by verifying the conditions of Theorem 3.2 for the WLE with modified calibration. The other four cases are similar.

Condition 3.6 is satisfied with β = 1/3 by Theorems 4.3 and 4.4. Conditions 3.7–3.9 are verified by [10] with

h̄*(y) ≡ Λ0(y)E[Xe2θ0TX O(Y|X)|Y = y]/E[e2θ0TX O(Y|X)|Y = y].

Since ℙNπ,mcℓ̇θ̂N,mc,Λ̂N,mc = 0 by Lemma A.5, it remains to show that

ℙNπ,mcBθ̂N,mc,Λ̂N,mc[h̄*] = oP*(N−1/2).

Let g0 ≡ h̄* ∘ Λ0−1 be the composition of h̄* and the inverse of Λ0. Note that Λ0 is a strictly increasing continuous function by our assumptions. Since g0(Λ̂N,mc(y)) is a right continuous function with exactly the same jump points as Λ̂N,mc(y), Lemma A.5 yields ℙNπ,mc g0(Λ̂N,mc(Y)) eθ̂N,mcTX Q(Y, Δ, X; θ̂N,mc, Λ̂N,mc) = 0. By Conditions 4.5–4.7, h̄* has a bounded derivative. This and the fact that Λ0 has a strictly positive derivative by Condition 4.7 imply that g0 has a bounded derivative, too. So, noting that h̄* = g0 ∘ Λ0, we have

ℙNπ,mcBθ̂N,mc,Λ̂N,mc[h̄*]
= ℙNπ,mc h̄*(Y) eθ̂N,mcTX Q(Y, Δ, X; θ̂N,mc, Λ̂N,mc)
= ℙNπ,mc {g0 ∘ Λ0(Y) − g0(Λ̂N,mc(Y))} eθ̂N,mcTX Q(Y, Δ, X; θ̂N,mc, Λ̂N,mc)
= (ℙNπ,mc − Pθ0,Λ0){g0 ∘ Λ0(Y) − g0(Λ̂N,mc(Y))} eθ̂N,mcTX Q(Y, Δ, X; θ̂N,mc, Λ̂N,mc)
+ Pθ0,Λ0{g0 ∘ Λ0(Y) − g0(Λ̂N,mc(Y))} eθ̂N,mcTX Q(Y, Δ, X; θ̂N,mc, Λ̂N,mc).

Huang [10] showed that the second term in the display is oP*(N−1/2). We show that the first term is also oP*(N−1/2). Let C > 0 be an arbitrary constant. Define, for a fixed constant η > 0, 𝒟(η) ≡ {ψ(y, δ, x; θ, Λ) : d((θ, Λ), (θ0, Λ0)) ≤ η, Λ ∈ HM}, where ψ(y, δ, x; θ, Λ) ≡ {g0 ∘ Λ0(y) − g0(Λ(y))} eθTx Q(y, δ, x; θ, Λ). Because Huang [10] showed that 𝒟(η) is Donsker for every η > 0 and that ∥𝔾N∥𝒟(CN−1/3) = oP*(1), it follows by Lemma 5.4 with ℱN replaced by 𝒟(CN−1/3) that ∥𝔾Nπ,mc∥𝒟(CN−1/3) = oP*(1). This completes the proof.

Unlike in the previous example, ℓ*θ,Λ depends on additional unknown functions, and the method of variance estimation used there does not apply to the present case. See the discussion in Section 6.

5. General results for IPW empirical processes

The IPW empirical measure and IPW empirical process inherit important properties from the empirical measure and empirical process, respectively. We emphasize the similarity between empirical processes and IPW empirical processes.

5.1. Glivenko–Cantelli theorem

The next theorem states that the Glivenko–Cantelli property for complete data is preserved under two-phase sampling.

Theorem 5.1. Suppose that ℱ is P0-Glivenko–Cantelli. Then

∥ℙNπ − P0∥ℱ →P* 0, (5.1)

where ∥·∥ℱ is the supremum norm over ℱ. This also holds if we replace ℙNπ by ℙNπ,# with # ∈ {e, c, mc, cc}, assuming Conditions 3.1 and 3.2.
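Recall that ℙNπf = N−1 ∑i (ξi/π0(Vi))f(Xi). A useful sanity check, sketched below in hypothetical code: under stratified sampling without replacement the inverse-probability weights sum to N exactly, so ℙNπ1 = 1 and the stratum frequencies νj are reproduced exactly, whatever the phase II draw.

```python
import random

random.seed(1)

def ipw_mean(f_vals, sampled_sets, p):
    """P_N^pi f = N^{-1} * sum over sampled units i of f(x_i) / p_{j(i)}."""
    N = len(f_vals)
    return sum(sum(f_vals[i] for i in idx) / p[j]
               for j, idx in sampled_sets.items()) / N

# Two strata of phase I sizes 6 and 4; draw n_0 = 3 and n_1 = 2 without replacement.
p = {0: 3 / 6, 1: 2 / 4}
sampled = {0: random.sample(range(0, 6), 3), 1: random.sample(range(6, 10), 2)}

one = ipw_mean([1.0] * 10, sampled, p)              # exactly 1, for any draw
nu1 = ipw_mean([0.0] * 6 + [1.0] * 4, sampled, p)   # exactly N_1 / N
```

This exact reproduction of constants and stratum indicators is what makes the IPW empirical measure behave so much like the ordinary empirical measure in the uniform limit theorems of this section.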

5.2. Rate of convergence

The rate of convergence of an M-estimator for complete data is often established via maximal inequalities for empirical processes. If we follow the same line of reasoning, it is natural to derive maximal inequalities for IPW empirical processes, though this may require some effort. Fortunately, the maximal inequalities for empirical processes (or slight modifications of them) suffice to establish the same rate of convergence under two-phase sampling.

Theorem 5.2. Let ℳ = {mθ : θ ∈ Θ} be a set of criterion functions, and define ℳδ = {mθ − mθ0 : d(θ, θ0) < δ} for some fixed δ > 0, where d is a semimetric on the parameter space Θ.

(1) Suppose that for every θ in a neighborhood of θ0,

P0(mθ − mθ0) ≲ −d2(θ, θ0); (5.2)

here a ≲ b means a ≤ Kb for some constant K ∈ (0, ∞). Assume that there exists a function ϕN such that δ ↦ ϕN(δ)/δα is decreasing for some α < 2 (not depending on N) and that, for every N,

E*∥𝔾N∥ℳδ ≲ ϕN(δ), (5.3)

where 𝔾N is the empirical process. If an estimator θ̂N satisfying ℙNπmθ̂N ≥ ℙNπmθ0 − OP*(rN−2) converges in outer probability to θ0, then rNd(θ̂N, θ0) = OP*(1) for every sequence rN such that rN2ϕN(1/rN) ≤ √N for every N.

(2) Let # ∈ {e, c, mc, cc} be fixed. Suppose Condition 3.2 holds. Suppose also that for every θ ∈ Θ in a neighborhood of θ0,

P0{G̃#(V; α)(mθ − mθ0)} ≲ −d2(θ, θ0) + |α − α0|2, (5.4)

where G̃e = π0(V)/Ge or G̃# = G# with # ∈ {c, mc, cc}. Assume that

E*∥𝔾N∥G̃#ℳδ ≲ ϕN(δ), (5.5)

where G̃#δ ≡ {#(·; α)f : |α| ≤ δ, α ∈ 𝒜N, f ∈ ℳδ} for some 𝒜N ⊂ 𝒜#. Then an estimator θ̂N,# satisfying Nπ,#mθ^N,#Nπ,#mθ0OP*(rN2) has the same rate of convergence as θ̂N in part (1) if it is consistent.

Remark 5.1. The key to establishing a general theorem for the rate of convergence is to exploit the boundedness of the weights in the IPW empirical process while dealing with their dependence. In treating independent bootstrap weights in the weighted bootstrap, [15] (Lemmas 1–3) requires bounded bootstrap weights, because the product of an unbounded weight and a bounded function is no longer bounded. Our theorem exploits the boundedness of the sampling indicators in the IPW empirical processes by applying a multiplier inequality for bounded weights (Lemma 5.1) to cover more general cases.

The following is a multiplier inequality for bounded exchangeable weights. Note that the sum of stochastic processes in the second term is divided by n1/2 rather than k1/2.

Lemma 5.1. For i.i.d. stochastic processes Z1, …, Zn, every bounded, exchangeable random vector (ξ1, …, ξn) with each ξi ∈ [l, u] that is independent of Z1, …, Zn, and any 1 ≤ n0 ≤ n,

E*∥(1/√n)∑i=1n ξiZi∥ ≤ (2(n0 − 1)/n)∑i=1n E*∥Zi∥ · E max1≤i≤n|ξi|/√n + 2(u − l) maxn0≤k≤n E*∥(1/√n)∑i=n0k Zi∥.

Bound (5.5) is not difficult to verify in the presence of bound (5.3) since G#(· ; α) is a bounded monotone function indexed by a finite-dimensional parameter. Bound (5.4) may be verified through the lemma below for some applications including the Cox model with interval censoring.

Lemma 5.2. Suppose Conditions 3.1 and 3.2 hold. Let mθ be the log likelihood log pθ where pθ is the density with dominating measure μ, and d is the Hellinger distance. Then the bound (5.4) holds.

5.3. Donsker theorem

The next theorem yields weak convergence of the IPW empirical processes under sampling without replacement.

Theorem 5.3. Suppose that ℱ with ∥P0∥ℱ < ∞ is P0-Donsker and that Conditions 3.1 and 3.2 hold. Then

𝔾Nπ ⇝ 𝔾π ≡ 𝔾 + ∑j=1J √(νj(1 − pj)/pj) 𝔾j, (5.6)
𝔾Nπ,# ⇝ 𝔾π,# ≡ 𝔾 + ∑j=1J √(νj(1 − pj)/pj) 𝔾j(· − Q#·), (5.7)

in ℓ∞(ℱ), where # ∈ {e, c, mc, cc}, and the P0-Brownian bridge process 𝔾 and the P0|j-Brownian bridge processes 𝔾j, j = 1, …, J, all indexed by ℱ, are independent.

Remark 5.2. The integrability hypothesis ∥P0∥ℱ < ∞ is required only for the IPW empirical processes with adjusted weights.
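The limit in (5.6) has variance Var0(f) + ∑j νj(1 − pj)/pj Var0|j(f): an i.i.d. phase I part plus an independent penalty for phase II subsampling, which vanishes in any fully sampled stratum (pj = 1). A small deterministic computation of this formula with made-up inputs:

```python
def limit_variance(var0, nu, p, var_within):
    """Var(G^pi f) = Var_0(f) + sum_j nu_j * (1 - p_j) / p_j * Var_{0|j}(f),
    cf. (5.6).  Returns (total variance, phase II penalty)."""
    extra = sum(nj * (1.0 - pj) / pj * vj
                for nj, pj, vj in zip(nu, p, var_within))
    return var0 + extra, extra

# Two strata sampled at 100% and 50%: only the undersampled stratum
# contributes to the penalty.
total, extra = limit_variance(var0=2.0, nu=[0.5, 0.5], p=[1.0, 0.5],
                              var_within=[3.0, 1.2])
```

The penalty term is what separates the sampling-without-replacement limit from the classical Donsker limit, and it is exactly the term the calibration methods of Section 2 try to shrink.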

For a Donsker class ℱ, Theorem 5.3 and Lemma 2.3.11 of [32] yield asymptotic equicontinuity, in probability and in mean, for a metric that depends on the limit process. In applications it is of interest to have these results for the original metric ρP0(f, g) = σP0(f − g).

Theorem 5.4. Let ℱ be Donsker and define ℱδ = {f − g : f, g ∈ ℱ, ρP0(f, g) < δ} for some fixed δ > 0. Then, for every sequence δN ↓ 0,

E*∥𝔾Nπ∥ℱδN → 0,

and consequently ∥𝔾Nπ∥ℱδN = oP*(1). Moreover, ∥𝔾Nπ,#∥ℱδN = oP*(1) for # ∈ {e, c, mc, cc}, assuming Conditions 3.1 and 3.2.

We end this section with two important lemmas. The first lemma is an extension of Lemma 3.3.5 of [32] and will be used in our proof of Theorem 3.1 to verify asymptotic equicontinuity.

Lemma 5.3. Suppose ℱ = {ψθ,h − ψθ0,h : ∥θ − θ0∥ < δ, h ∈ ℋ} is P0-Donsker for some δ > 0 and that suph∈ℋ P0θ,h − ψθ0,h)2 → 0, as θ →θ0. If θ̂N converges in outer probability to θ0, then

suph∈ℋ |𝔾Nπ(ψθ̂N,h − ψθ0,h)| = oP*(1).

This also holds if we replace 𝔾Nπ by 𝔾Nπ,# with # ∈ {e, c, mc, cc}, assuming Conditions 3.1 and 3.2 hold and ∥P0∥ℱ < ∞.

The second lemma is used to verify asymptotic equicontinuity in the proof of Theorem 3.2, the first part for the IPW empirical process and the second part for the other four IPW empirical processes with adjusted weights.

Lemma 5.4. Let {ℱN} be a decreasing sequence of classes of functions such that ∥𝔾N∥ℱN = oP*(1). Assume that there exists an integrable envelope for ℱN0 for some N0. Then E*∥𝔾N∥ℱN → 0 as N → ∞. As a consequence, ∥𝔾Nπ∥ℱN = oP*(1).

Suppose, moreover, that ℱN1 is P0-Glivenko–Cantelli with ∥P0∥ℱN1 < ∞ for some N1, and that every f = fN ∈ ℱN converges to zero either pointwise or in L1(P0) as N → ∞. Then ∥𝔾Nπ,e∥ℱN = oP*(1), ∥𝔾Nπ,c∥ℱN = oP*(1), ∥𝔾Nπ,mc∥ℱN = oP*(1) and ∥𝔾Nπ,cc∥ℱN = oP*(1), assuming Conditions 3.1 and 3.2.

6. Discussion

We developed asymptotic theory for weighted likelihood estimation under two-phase sampling, introduced and studied a new calibration method, centered calibration, and compared several WLE estimation methods involving adjusted weights. The methods of proof and general results for the IPW empirical process are applicable to other estimation procedures. For example, the weighted Kaplan–Meier estimator can be shown to be asymptotically Gaussian via our Donsker theorem (Theorem 5.3) together with the functional delta method. A particularly interesting application is to study asymptotic properties of estimators that are known to be efficient under Bernoulli sampling (e.g., estimator of [19]). Whether or not these estimators are “efficient” under our sampling scheme is an open problem; see [16] for a definition of efficiency with non-i.i.d. data.

There are several other open problems. Variance estimation under two-phase sampling has been restricted to the case where the asymptotic variance is a known function up to parameters as discussed in Section 4, while there are several methods available for complete data in a general case (e.g., [18]). In [24] the first author has proposed and studied nonparametric bootstrap variance estimation methods which remain valid even under model misspecification; these results will appear elsewhere. Another direction of research is to study (local and global) model misspecification under two-phase sampling where missingness is by design. An interesting open problem beyond our sampling scheme is to study other complex survey designs. Stratified sampling without replacement is sufficiently simple for the existing bootstrap empirical process theory to apply. Other complex designs may provide interesting theoretical challenges, perhaps in connection with extensions of bootstrap empirical process theory.


Acknowledgements

We owe thanks to Kwun Chuen Gary Chan for suggesting the modified calibration method introduced in Section 2.1.3. We also thank Norman Breslow for many helpful conversations concerning two-phase sampling, and two referees for their constructive comments and suggestions.

Footnotes

1. Supported by NIH/NIAID Grant R01 AI089341.

2. Supported in part by NSF Grant DMS-11-04832, NIAID Grant 2R01 AI291968-04 and the Alexander von Humboldt Foundation.

Supplementary material for “Weighted likelihood estimation under two-phase sampling” (DOI: 10.1214/12-AOS1073SUPP;.pdf). Due to space constraints, the proofs and technical details have been given in the supplementary document [25]. References here beginning with “A.” refer to [25].

REFERENCES

  • 1. Binder DA. Fitting Cox's proportional hazards models from survey data. Biometrika. 1992;79:139–147. MR1158522.
  • 2. Breslow NE, Lumley T, Ballantyne C, Chambless L, Kulich M. Improved Horvitz–Thompson estimation of model parameters from two-phase stratified samples: Applications in epidemiology. Stat. Biosci. 2009;1:32–49. doi:10.1007/s12561-009-9001-6.
  • 3. Breslow NE, Lumley T, Ballantyne C, Chambless L, Kulich M. Using the whole cohort in the analysis of case-cohort data. Am. J. Epidemiol. 2009;169:1398–1405. doi:10.1093/aje/kwp055.
  • 4. Breslow NE, Wellner JA. Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression. Scand. J. Stat. 2007;34:86–102. doi:10.1111/j.1467-9469.2007.00574.x. MR2325244.
  • 5. Breslow NE, Wellner JA. A Z-theorem with estimated nuisance parameters and correction note for: "Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression" [Scand. J. Stat. 34 (2007) 86–102; MR2325244]. Scand. J. Stat. 2008;35:186–192. MR2391566.
  • 6. Chan KCG. Uniform improvement of empirical likelihood for missing response problem. Electron. J. Stat. 2012;6:289–302.
  • 7. Cox DR. Regression models and life-tables (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 1972;34:187–220. MR0341758.
  • 8. Deville J-C, Särndal C-E. Calibration estimators in survey sampling. J. Amer. Statist. Assoc. 1992;87:376–382. MR1173804.
  • 9. Horvitz DG, Thompson DJ. A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc. 1952;47:663–685. MR0053460.
  • 10. Huang J. Efficient estimation for the proportional hazards model with interval censoring. Ann. Statist. 1996;24:540–568. MR1394975.
  • 11. Li Z, Nan B. Relative risk regression for current status data in case-cohort studies. Canad. J. Statist. 2011;39:557–577. MR2860827.
  • 12. Lin DY. On fitting Cox's proportional hazards models to survey data. Biometrika. 2000;87:37–47. MR1766826.
  • 13. Lumley T. Complex Surveys: A Guide to Analysis Using R. Hoboken, NJ: Wiley; 2010.
  • 14. Lumley T, Shaw PA, Dai JY. Connections between survey calibration estimators and semiparametric models for incomplete data. Int. Stat. Rev. 2011;79:200–232. doi:10.1111/j.1751-5823.2011.00138.x.
  • 15. Ma S, Kosorok MR. Robust semiparametric M-estimation and the weighted bootstrap. J. Multivariate Anal. 2005;96:190–217. MR2202406.
  • 16. McNeney B, Wellner JA. Application of convolution theorems in semiparametric models with non-i.i.d. data. J. Statist. Plann. Inference. 2000;91:441–480. MR1814795.
  • 17. Murphy SA, van der Vaart AW. Semiparametric likelihood ratio inference. Ann. Statist. 1997;25:1471–1509. MR1463562.
  • 18. Murphy SA, van der Vaart AW. Observed information in semi-parametric models. Bernoulli. 1999;5:381–412. MR1693616.
  • 19. Nan B. Efficient estimation for case-cohort studies. Canad. J. Statist. 2004;32:403–419. MR2125853.
  • 20. Neyman J. Contribution to the theory of sampling human populations. J. Amer. Statist. Assoc. 1938;33:101–116.
  • 21. Præstgaard J, Wellner JA. Exchangeably weighted bootstraps of the general empirical process. Ann. Probab. 1993;21:2053–2086. MR1245301.
  • 22. Prentice RL. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika. 1986;73:1–11.
  • 23. Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. J. Amer. Statist. Assoc. 1994;89:846–866. MR1294730.
  • 24. Saegusa T. Weighted likelihood estimation under two-phase sampling. Ph.D. thesis. Seattle, WA: Univ. Washington; 2012.
  • 25. Saegusa T, Wellner JA. Supplement to "Weighted likelihood estimation under two-phase sampling". 2012. doi:10.1214/12-AOS1073.
  • 26. Saegusa T, Wellner JA. Weighted likelihood estimation under two-phase sampling. Technical Report 592. Seattle, WA: Dept. Statistics, Univ. Washington; 2012. Available at arXiv:1112.4951.
  • 27. Self SG, Prentice RL. Asymptotic distribution theory and efficiency results for case-cohort studies. Ann. Statist. 1988;16:64–81. MR0924857.
  • 28. Tan Z. Efficient restricted estimators for conditional mean models with missing data. Biometrika. 2011;98:663–684. MR2836413.
  • 29. van der Vaart A. Semiparametric statistics. In: Lectures on Probability Theory and Statistics (Saint-Flour, 1999). Lecture Notes in Math. Vol. 1781. Berlin: Springer; 2002. pp. 331–457. MR1915446.
  • 30. van der Vaart A, Wellner JA. Preservation theorems for Glivenko–Cantelli and uniform Glivenko–Cantelli classes. In: High Dimensional Probability II (Seattle, WA, 1999). Progress in Probability. Vol. 47. Boston, MA: Birkhäuser; 2000. pp. 115–133. MR1857319.
  • 31. van der Vaart AW. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Vol. 3. Cambridge: Cambridge Univ. Press; 1998. MR1652247.
  • 32. van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes: With Applications to Statistics. New York: Springer; 1996. MR1385671.
  • 33. White JE. A two-stage design for the study of the relationship between a rare exposure and a rare disease. Am. J. Epidemiol. 1982;115:119–128. doi:10.1093/oxfordjournals.aje.a113266.
  • 34. Zheng H, Little RJA. Penalized spline nonparametric mixed models for inference about a finite population mean from two-stage samples. Survey Methodology. 2004;30:209–218.
