Abstract
We develop asymptotic theory for weighted likelihood estimators (WLEs) under two-phase stratified sampling without replacement. We also consider several variants of WLEs involving estimated weights and calibration. A set of empirical process tools is developed, including a Glivenko–Cantelli theorem, a theorem for rates of convergence of M-estimators, and a Donsker theorem for the inverse probability weighted empirical processes under two-phase sampling with sampling without replacement at the second phase. Using these general results, we derive asymptotic distributions of the WLE of a finite-dimensional parameter in a general semiparametric model where the nuisance parameter is estimable at either a regular or a nonregular rate. We illustrate these results and methods in the Cox model with right censoring and interval censoring. We compare the methods via their asymptotic variances under both sampling without replacement and the more usual (and easier to analyze) assumption of Bernoulli sampling at the second phase.
Key words and phrases: Calibration, estimated weights, weighted likelihood, semiparametric model, regular, nonregular
1. Introduction
Two-phase sampling is a sampling technique that aims at cost reduction and improved efficiency of estimation. At phase I, a large sample is drawn from a population, and information on variables that are easier to measure is collected. These phase I variables may be important variables such as exposure in a regression model, or may simply be auxiliary variables that are correlated with variables unavailable at phase I. The sample space is then stratified based on these phase I variables. At phase II, a subsample is drawn without replacement from each stratum to obtain phase II variables that are costly or difficult to measure. Strata are formed either to oversample subjects with important phase I variables, or to sample efficiently the subjects with targeted phase II variables, or both. In this way, two-phase sampling provides access to important variables at reduced cost.
While two-phase sampling was originally introduced in survey sampling by Neyman [20] for estimation of the “finite population mean” of some variable, it has become increasingly important in a variety of areas of statistics, biostatistics and epidemiology, especially since [22, 33] and [27].
The setting treated here is as follows:
We begin with a semiparametric model 𝒫 for a vector of variables X with values in 𝒳. [The prime examples which we treat in detail in Section 4 are the Cox proportional hazards regression model with (a) right censoring, and (b) interval censoring.]
Let W = (X, U) ∈ 𝒳 × 𝒰 ≡ 𝒲 where U is a vector of “auxiliary variables,” not involved in the model 𝒫. Suppose that W ~ P̃0 and X ~ P0. Now suppose that V ≡ (X̃, U) ∈ 𝒱 where X̃ ≡ X̃(X) is a coarsening of X.
At phase I we observe V1, …, VN i.i.d. as V, and then use the phase I data to form strata, that is, disjoint subsets 𝒱1, …, 𝒱J of 𝒱 with $\bigcup_{j=1}^J \mathcal{V}_j = \mathcal{V}$. We let Nj = #{i ≤ N : Vi ∈ 𝒱j}.
Next, a phase II sample is drawn by sampling without replacement nj ≤ Nj items from stratum j. For the items selected we observe Xi. Thus for the selection indicators ξi we have $\tilde P_0(\xi_i = 1 \mid V_i) = \sum_{j=1}^J (n_j/N_j)\,1_{\mathcal{V}_j}(V_i) \equiv \pi_0(V_i)$.
Finally, weighted likelihood (or inverse probability weighted) estimation methods based on all the observed data are used to estimate the parameters of the model 𝒫 and to make further inferences about the model.
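The phase II design just described can be simulated in a few lines. The Python sketch below is illustrative only: the helper `phase_two_sample` and its arguments are hypothetical names, not part of the paper. It draws a simple random sample of nj units without replacement within each stratum and returns the selection indicators ξi together with the design probabilities π0(Vi) = nj/Nj.

```python
import random
from collections import Counter


def phase_two_sample(strata, n_per_stratum, seed=None):
    """Stratified phase II sampling without replacement (hypothetical helper).

    strata: stratum label of each phase I subject, e.g. [0, 0, 1, ...].
    n_per_stratum: dict mapping stratum label j to n_j, with n_j <= N_j.
    Returns (xi, pi0): selection indicators and design probabilities n_j / N_j.
    """
    rng = random.Random(seed)
    counts = Counter(strata)  # N_j for each stratum
    xi = [0] * len(strata)
    for j in counts:
        members = [i for i, s in enumerate(strata) if s == j]
        # Simple random sample of size n_j drawn without replacement within stratum j
        for i in rng.sample(members, n_per_stratum[j]):
            xi[i] = 1
    pi0 = [n_per_stratum[s] / counts[s] for s in strata]
    return xi, pi0
```

Replacing `rng.sample` by independent draws `rng.random() < p_j` would give stratified Bernoulli sampling instead, under which the phase II sample size in each stratum is random; that difference underlies the comparisons of Section 3.4.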
It is now well known that the classical Horvitz–Thompson estimators [9] use only the phase II data and are inefficient, sometimes quite severely so; see, for example, [2, 3, 14, 23] and [34]. Improvements in efficiency of estimation can be achieved by “adjusting” the weights by use of the phase I data (even though the sampling probabilities are known). Two basic methods of adjustment are:
Estimated weights, a method originating in the missing data literature [23], with significant further developments since then for many models in which the missingness mechanism is unknown, in contrast to our setting, in which missingness is by design.
Calibration, a method originating in the sample survey literature [8]; see also [13, 14].
One of our goals here is to study existing methods for adjusting the weights in weighted likelihood methods and to introduce two new methods: modified calibration, as suggested by Chan [6], and centered calibration, proposed here in Section 2.
A second goal is to give a systematic treatment of estimators based on sampling without replacement at phase II in the setting of general semiparametric models, and to compare them with estimators based on Bernoulli (or independent) sampling at phase II, thus continuing and strengthening the comparisons made in [4, 5] and [2, 3] for a particular subclass of semiparametric models and adjustments via estimated weights and ordinary calibration. Many theoretical studies of procedures based on two-phase design data have treated the case of Bernoulli sampling; see, for example, [11] and the review of case-cohort sampling given there. On the other hand, while statistical practice continues to involve phase II data sampled without replacement, most available theory for this case (other than [4, 5]) has been developed on a model-by-model basis. As has become clear from [4, 5], sampling without replacement results in smaller asymptotic variances, and hence inference based on asymptotic variances derived under Bernoulli sampling will often be conservative. Our treatment here provides theory and tools for dealing directly with the sampling without replacement design. We do this by providing the relevant theory both for semiparametric models in which the infinite-dimensional nuisance parameters can be estimated at a regular rate (√N) with complete data, and for semiparametric models in which the infinite-dimensional nuisance parameters can only be estimated at slower (nonregular) rates.
The main contributions of our paper are threefold. First, we establish two Z-theorems giving weak sufficient conditions for asymptotic distributions of the WLEs in general semiparametric models. The first theorem covers the case where the nuisance parameter is estimable at a regular rate; this yields a rigorous justification of [2, 3] under weaker conditions. The second theorem covers general semiparametric models with nonregular rates for estimators of the nuisance parameters. The conditions of our theorems, formulated in terms of complete data, are almost identical to those for the MLE with complete data. This formulation allows us to use tools from empirical process theory together with the new tools developed here in a straightforward way. Second, we propose centered calibration, a new calibration method. It is the only method guaranteed to yield improved efficiency over the plain WLE under both Bernoulli sampling and sampling without replacement, while the other methods are guaranteed to do so only under Bernoulli sampling. Third, we establish general results for the inverse probability weighted (IPW) empirical process. Some of these results, such as a Glivenko–Cantelli theorem (Theorem 5.1) and a Donsker theorem (Theorem 5.3), are of interest in their own right. Because they account for the dependence induced by the sampling design, these results are useful for verifying the conditions of the Z-theorems in applications. For instance, Theorems 5.1 and 5.2 readily establish consistency and rates of convergence under our “without replacement” sampling scheme. We illustrate the application of the general results with examples in Section 4.
The rest of the paper is organized as follows. In Section 2, we introduce our estimation procedures in the context of a general semiparametric model. The WLE and methods involving adjusted weights are discussed. Two Z-theorems are presented in Section 3; these yield asymptotic distributions of the WLEs of finite-dimensional parameters of the model. All estimators are compared under Bernoulli sampling and sampling without replacement with different methods of adjusting weights. In Section 4 we apply our Z-theorems to the Cox model with both right censoring and interval censoring. Section 5 consists of general results for IPW empirical processes. Several open problems are briefly discussed in Section 6. All proofs, except those in Section 4 and auxiliary results, are collected in [25].
2. Sampling, models and estimators
We use the basic notation introduced in the previous section. After stratified sampling, X is fully observed for nj subjects in the jth stratum at phase II. The observed data is (V, Xξ, ξ) where ξ is the indicator of being sampled at phase II. We use a doubly subscripted notation: for example, Vj,i denotes V for the ith subject in stratum j. We denote the stratum probability for the jth stratum by νj ≡ P̃0(V ∈ 𝒱j), and the conditional expectation given membership in the jth stratum by P0|j (·) ≡ P̃0(·|V ∈ 𝒱j).
The sampling probability is P(ξ = 1|Vi) = π0(Vi) = nj/Nj for Vi ∈ 𝒱j. These sampling probabilities are assumed to be strictly positive; that is, there is a constant σ > 0 such that 0 < σ ≤ π0(υ) ≤ 1 for υ ∈ 𝒱. We assume that nj/Nj → pj > 0 for j = 1, …, J as N → ∞. Although the sampling indicators induce dependence among the observations (Vi, ξiXi, ξi), the vector of sampling indicators (ξj,1, …, ξj,Nj) within stratum j is exchangeable for each j = 1, …, J, and the J random vectors (ξj,1, …, ξj,Nj) are independent.
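The empirical measure is one of the most useful tools in empirical process theory. Because the Xi’s are observed only for a subsample at phase II, we define, instead, the IPW empirical measure by

$$\mathbb{P}_N^\pi \equiv \frac{1}{N}\sum_{i=1}^N \frac{\xi_i}{\pi_0(V_i)}\,\delta_{X_i} = \frac{1}{N}\sum_{j=1}^J\sum_{i=1}^{N_j}\frac{\xi_{j,i}}{n_j/N_j}\,\delta_{X_{j,i}},$$

where δXi denotes a Dirac measure placing unit mass on Xi. The identity in the last display is justified by the arguments in Appendix A of [4]. We also define the IPW empirical process by $\mathbb{G}_N^\pi \equiv \sqrt{N}(\mathbb{P}_N^\pi - P_0)$ and the phase II empirical process for the jth stratum by $\mathbb{G}_{j,N_j}^\pi \equiv \sqrt{N_j}(\mathbb{P}_{j,N_j}^\pi - \mathbb{P}_{j,N_j})$, j = 1, …, J, where $\mathbb{P}_{j,N_j}^\pi \equiv n_j^{-1}\sum_{i=1}^{N_j}\xi_{j,i}\delta_{X_{j,i}}$ is the phase II empirical measure for the jth stratum, and $\mathbb{P}_{j,N_j} \equiv N_j^{-1}\sum_{i=1}^{N_j}\delta_{X_{j,i}}$ is the empirical measure for all the data in the jth stratum; note that the latter empirical measure is not observed. Then, following [4], we decompose $\mathbb{G}_N^\pi$ as follows:

$$\mathbb{G}_N^\pi = \mathbb{G}_N + \sum_{j=1}^J\Big(\frac{N_j}{N}\Big)^{1/2}\mathbb{G}_{j,N_j}^\pi, \tag{2.1}$$

where $\mathbb{G}_N \equiv \sqrt{N}(\mathbb{P}_N - P_0)$ is the (unobserved) phase I empirical process of the complete data, with $\mathbb{P}_N \equiv N^{-1}\sum_{i=1}^N\delta_{X_i}$. Notice that the $\mathbb{G}_{j,N_j}^\pi$, j = 1, …, J, correspond to “exchangeably weighted bootstrap” versions of the stratum-wise complete data empirical processes $\sqrt{N_j}(\mathbb{P}_{j,N_j} - P_{0|j})$. This observation allows application of the “exchangeably weighted bootstrap” theory of [21] and [32], Section 3.6.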
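Because the IPW empirical measure averages the phase II observations with weights Nj/nj, it differs from the unobservable complete-data empirical measure by a stratum-weighted sum of (phase II average minus full-stratum average); multiplying this identity by √N gives a decomposition of the form (2.1). The Python sketch below checks the identity numerically for a single function f. The helper names are hypothetical, and the complete-data values f(Xi) are simulated so that the unobserved averages can be computed.

```python
import random


def ipw_mean(f_vals, xi, pi0):
    """IPW empirical measure applied to f: N^{-1} sum_i xi_i / pi0(V_i) * f(X_i)."""
    return sum(x / p * f for f, x, p in zip(f_vals, xi, pi0) if x) / len(f_vals)


def stratum_decomposition(f_vals, strata, xi):
    """Return (lhs, rhs) with lhs = P_N^pi f - P_N f and
    rhs = sum_j (N_j / N) * (phase II mean of f in stratum j - full stratum mean of f)."""
    N = len(f_vals)
    labels = sorted(set(strata))
    counts = {j: strata.count(j) for j in labels}
    sampled = {j: sum(x for x, s in zip(xi, strata) if s == j) for j in labels}
    pi0 = [sampled[s] / counts[s] for s in strata]  # design probabilities n_j / N_j
    lhs = ipw_mean(f_vals, xi, pi0) - sum(f_vals) / N  # P_N uses the complete data
    rhs = 0.0
    for j in labels:
        idx = [i for i, s in enumerate(strata) if s == j]
        phase2 = sum(f_vals[i] for i in idx if xi[i]) / sampled[j]
        full = sum(f_vals[i] for i in idx) / counts[j]
        rhs += counts[j] / N * (phase2 - full)
    return lhs, rhs


rng = random.Random(0)
strata = [0] * 40 + [1] * 60
f_vals = [rng.gauss(1.0 + s, 1.0) for s in strata]  # simulated f(X_i), incl. unobserved ones
xi = [0] * 100
for i in rng.sample(range(40), 10) + rng.sample(range(40, 100), 30):
    xi[i] = 1
lhs, rhs = stratum_decomposition(f_vals, strata, xi)
print(lhs, rhs)
```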
2.1. Improving efficiency by adjusting weights
Efficiency of estimators based on IPW empirical processes can be improved by adjusting weights via the phase I data, either by estimated weights [23] or by calibration [8]; see also [14]. Besides these, we discuss two variants of calibration: modified calibration [6] and our proposed new method, centered calibration.
Let Zi ≡ g(Vi) be the auxiliary variables for the ith subject for a known transformation g. For estimated weights with binary regression, Zi contains the membership indicators for the strata I𝒱j (Vi), j = 1, …, J. Observations with π0(V) = 1 are dropped from binary regression, and the original weight 1 is used. For notational simplicity, we write Zi for either method, and assume that sampling probabilities are strictly less than 1 for all strata.
2.1.1. Estimated weights
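The method of estimated weights adjusts weights through binary regression on the phase I variables. The sampling probability for the ith subject is modeled by $p_\alpha(V_i) \equiv G_e(Z_i^T\alpha)$, where α ∈ 𝒜e ⊂ ℝJ+k is a regression parameter and Ge : ℝ ↦ [0, 1] is a known function. If $G_e(x) = e^x/(1+e^x)$, for instance, then the adjustment simply involves logistic regression. Let α̂N be the estimator of α that maximizes the pseudo- (or composite) likelihood

$$\prod_{i=1}^N p_\alpha(V_i)^{\xi_i}\{1-p_\alpha(V_i)\}^{1-\xi_i}. \tag{2.2}$$

We define the IPW empirical measure with estimated weights by

$$\mathbb{P}_N^{\pi,e} \equiv \frac{1}{N}\sum_{i=1}^N \frac{\xi_i}{p_{\hat\alpha_N}(V_i)}\,\delta_{X_i},$$

and the IPW empirical process with estimated weights by $\mathbb{G}_N^{\pi,e} \equiv \sqrt{N}(\mathbb{P}_N^{\pi,e} - P_0)$.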
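As a sketch of the estimated-weights step, assuming Ge is the logistic function and Z consists of the stratum indicators plus one auxiliary variable, the pseudo-likelihood (2.2) can be maximized by Newton–Raphson. The Python code below is a minimal, dependency-free illustration with hypothetical function names. Because Z contains the stratum indicators, the fitted probabilities sum to nj within each stratum; with the indicators alone, the fit reproduces the logit of nj/Nj, which is what one expects at the “true” parameter of Condition 3.1(e).

```python
import math


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            fac = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= fac * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def _expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(Z, xi, iters=100, tol=1e-12):
    """Maximize prod_i p^xi (1 - p)^(1 - xi), p = expit(Z_i^T alpha), by Newton-Raphson."""
    d = len(Z[0])
    alpha = [0.0] * d
    for _ in range(iters):
        score = [0.0] * d
        info = [[0.0] * d for _ in range(d)]
        for z, x in zip(Z, xi):
            p = _expit(sum(a * zz for a, zz in zip(alpha, z)))
            for r in range(d):
                score[r] += (x - p) * z[r]
                for c in range(d):
                    info[r][c] += p * (1.0 - p) * z[r] * z[c]
        step = _solve(info, score)  # Newton step: info^{-1} score
        alpha = [a + s for a, s in zip(alpha, step)]
        if max(abs(s) for s in step) < tol:
            break
    return alpha


def estimated_weights(Z, xi):
    """Weights xi_i / p_alpha_hat(V_i) for the IPW empirical measure with estimated weights."""
    alpha = fit_logistic(Z, xi)
    probs = [_expit(sum(a * zz for a, zz in zip(alpha, z))) for z in Z]
    return [x / p for x, p in zip(xi, probs)], probs, alpha
```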
2.1.2. Calibration
Calibration adjusts weights so that the inverse probability weighted average from the phase II sample is equated to the phase I average, whereby the phase I information is taken into account for estimation. Specifically, we find an estimator α̂N that is the solution for α ∈ 𝒜c ⊂ ℝk of the following calibration equation:

$$\frac{1}{N}\sum_{i=1}^N \frac{\xi_i}{\pi_0(V_i)}G_c(V_i;\alpha)Z_i = \frac{1}{N}\sum_{i=1}^N Z_i, \tag{2.3}$$

where Gc(V; α) ≡ G(g(V)Tα) = G(ZTα) for known G with G(0) = 1 and Ġ(0) > 0. We call πα(V) ≡ π0(V)/Gc(V; α) the calibrated sampling probability. We define the calibrated IPW empirical measure by

$$\mathbb{P}_N^{\pi,c} \equiv \frac{1}{N}\sum_{i=1}^N \frac{\xi_i}{\pi_{\hat\alpha_N}(V_i)}\,\delta_{X_i},$$

and the calibrated IPW empirical process by $\mathbb{G}_N^{\pi,c} \equiv \sqrt{N}(\mathbb{P}_N^{\pi,c} - P_0)$.
Examples for G in the definition of Gc are listed in [8] (F in their notation). For G(x) = 1 + x, the calibrated estimator $N^{-1}\sum_{i=1}^N \xi_i X_i/\pi_{\hat\alpha_N}(V_i)$ is a well-known regression estimator of the mean of X. Since we assume boundedness of G later, we may want to consider truncated versions of these examples instead. Note that the choice of G in (variants of) calibration does not affect asymptotic results on WLEs.
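For the linear choice G(x) = 1 + x, the calibration equation (2.3) is linear in α and can be solved in closed form. The Python sketch below assumes Z = (1, U) for a scalar auxiliary variable U (an illustrative choice, not prescribed by the text); the function name is hypothetical. It returns the calibrated weights ξi/πα̂N(Vi) = ξiGc(Vi; α̂N)/π0(Vi), with which the weighted phase II totals of 1 and U match the phase I totals exactly.

```python
def linear_calibration(U, xi, pi0):
    """Calibration with G(x) = 1 + x and Z = (1, U): closed-form alpha in R^2.

    Solves sum_i xi_i/pi0_i * (1 + Z_i^T alpha) Z_i = sum_i Z_i, i.e. A alpha = b with
    A = sum_i w_i Z_i Z_i^T and b = sum_i Z_i - sum_i w_i Z_i, where w_i = xi_i / pi0_i.
    """
    w = [x / p for x, p in zip(xi, pi0)]  # design weights
    Z = [(1.0, u) for u in U]
    A = [[sum(wi * z[r] * z[c] for wi, z in zip(w, Z)) for c in range(2)] for r in range(2)]
    b = [sum(z[r] for z in Z) - sum(wi * z[r] for wi, z in zip(w, Z)) for r in range(2)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    # Cramer's rule for the 2 x 2 system
    alpha = [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
             (A[0][0] * b[1] - A[1][0] * b[0]) / det]
    # Calibrated weights w_i * G(Z_i^T alpha)
    cal_w = [wi * (1.0 + alpha[0] + alpha[1] * z[1]) for wi, z in zip(w, Z)]
    return alpha, cal_w
```

The weighted phase II mean of any other variable computed with `cal_w` is then a regression-type estimator. Nothing prevents some linear-calibration weights from being negative, which is one practical reason to prefer bounded (for example, truncated) choices of G.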
As noted in [13], there are several different approaches to calibration. Here, and in introducing variants of calibration below, we adopt the view that calibration makes the smallest possible change in the weights needed to match an estimated phase II average with the corresponding phase I average. Another approach proceeds via regression modeling of the variable X of interest on the auxiliary variables V, which leads to a discussion of robustness against misspecification of that model. We prefer the former view because we do not assume any model relating X and V in this paper; indeed, our results do not depend on such a modeling assumption.
2.1.3. Modified calibration
Modifying the function Gc in calibration so that individuals with higher sampling probabilities π0(Vi) receive less weight was proposed by [6] in a missing response problem where observations are i.i.d. (see, e.g., [28] for recent developments in this area and [14] for their connections with calibration methods). An interpretation of this method within the framework of [8] is discussed in [26]. In modified calibration, we find the estimator α̂N that is the solution for α ∈ 𝒜mc ⊂ ℝk of the following calibration equation:

$$\frac{1}{N}\sum_{i=1}^N \frac{\xi_i}{\pi_0(V_i)}G_{mc}(V_i;\alpha)Z_i = \frac{1}{N}\sum_{i=1}^N Z_i, \tag{2.4}$$

where Gmc(V; α) ≡ G((π0(V)−1 − 1)ZTα) for known G with G(0) = 1 and Ġ(0) > 0. We call πα(V) ≡ π0(V)/Gmc(V; α) the calibrated sampling probability with modified calibration. We define the IPW empirical measure with modified calibration by

$$\mathbb{P}_N^{\pi,mc} \equiv \frac{1}{N}\sum_{i=1}^N \frac{\xi_i}{\pi_{\hat\alpha_N}(V_i)}\,\delta_{X_i},$$

and the corresponding IPW empirical process by $\mathbb{G}_N^{\pi,mc} \equiv \sqrt{N}(\mathbb{P}_N^{\pi,mc} - P_0)$.
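The modified calibration equation is typically nonlinear in α. As an illustration only, the Python sketch below treats scalar Z (k = 1), assumes that the equation equates the Gmc-weighted phase II average of Z to its phase I average (the same form as calibration with Gc replaced by Gmc), and takes the bounded choice G(x) = 2/(1 + e−x), which satisfies G(0) = 1 and Ġ(0) = 1/2 > 0. For scalar Z and π0 < 1 the left side is strictly increasing in α, so bisection on a bracket is a simple, robust solver; the function names and the default bracket are hypothetical.

```python
import math


def G_bounded(x):
    """G(x) = 2 / (1 + exp(-x)): bounded, increasing, G(0) = 1, G'(0) = 1/2."""
    if x >= 0:
        return 2.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # evaluated this way to avoid overflow for very negative x
    return 2.0 * e / (1.0 + e)


def modified_calibration(Z, xi, pi0, G=G_bounded, lo=-50.0, hi=50.0, iters=200):
    """Solve sum_i xi_i/pi0_i * G((1/pi0_i - 1) Z_i alpha) Z_i = sum_i Z_i for scalar alpha."""
    target = sum(Z)

    def excess(alpha):
        return sum(x / p * G((1.0 / p - 1.0) * z * alpha) * z
                   for z, x, p in zip(Z, xi, pi0) if x) - target

    if not excess(lo) < 0.0 < excess(hi):
        raise ValueError("no sign change on the bracket; calibration may be infeasible")
    for _ in range(iters):  # bisection: excess is increasing in alpha
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    # Calibrated weights xi_i / pi_alpha(V_i) = xi_i * G_mc(V_i; alpha) / pi0(V_i)
    weights = [x / p * G((1.0 / p - 1.0) * z * alpha) for z, x, p in zip(Z, xi, pi0)]
    return alpha, weights
```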
2.1.4. Centered calibration
We propose a new method, centered calibration, that applies modified calibration to centered auxiliary variables. This method improves on the plain WLE under our sampling scheme while retaining the good properties of modified calibration. See Section 3.4 for a discussion of its advantages and connections to other methods.
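In centered calibration, we find the estimator α̂N that is the solution for α ∈ 𝒜cc ⊂ ℝk of the following calibration equation:

$$\frac{1}{N}\sum_{i=1}^N \frac{\xi_i}{\pi_0(V_i)}G_{cc}(V_i;\alpha)(Z_i-\bar Z_N) = \frac{1}{N}\sum_{i=1}^N (Z_i-\bar Z_N) = 0, \tag{2.5}$$

where Gcc(V; α) ≡ G((π0(V)−1 − 1)(Z − Z̅N)Tα) for known G with G(0) = 1 and Ġ(0) > 0, and $\bar Z_N \equiv N^{-1}\sum_{i=1}^N Z_i$. We call πα(V) ≡ π0(V)/Gcc(V; α) the calibrated sampling probability with centered calibration. We define the IPW empirical measure with centered calibration by

$$\mathbb{P}_N^{\pi,cc} \equiv \frac{1}{N}\sum_{i=1}^N \frac{\xi_i}{\pi_{\hat\alpha_N}(V_i)}\,\delta_{X_i},$$

and the corresponding IPW empirical process by $\mathbb{G}_N^{\pi,cc} \equiv \sqrt{N}(\mathbb{P}_N^{\pi,cc} - P_0)$.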
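Centered calibration centers the auxiliary variable at its phase I mean Z̄N, which is computable because Z = g(V) is observed for everyone, so the phase I side of the calibration equation becomes zero. The Python sketch below is illustrative only, under the assumptions of scalar Z (k = 1), the bounded choice G(x) = 2/(1 + e−x), and a calibration equation that equates the Gcc-weighted phase II average of Z − Z̄N to zero. For scalar Z and π0 < 1 the left side is increasing in α, so it is solved by bisection on a hypothetical default bracket; the function names are hypothetical.

```python
import math


def G_bounded(x):
    """G(x) = 2 / (1 + exp(-x)): bounded, increasing, G(0) = 1, G'(0) = 1/2."""
    if x >= 0:
        return 2.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # evaluated this way to avoid overflow for very negative x
    return 2.0 * e / (1.0 + e)


def centered_calibration(Z, xi, pi0, G=G_bounded, lo=-50.0, hi=50.0, iters=200):
    """Solve sum_i xi_i/pi0_i * G((1/pi0_i - 1)(Z_i - Zbar) alpha)(Z_i - Zbar) = 0."""
    zbar = sum(Z) / len(Z)  # phase I mean of Z
    Zc = [z - zbar for z in Z]

    def lhs(alpha):
        return sum(x / p * G((1.0 / p - 1.0) * zc * alpha) * zc
                   for zc, x, p in zip(Zc, xi, pi0) if x)

    if not lhs(lo) < 0.0 < lhs(hi):
        raise ValueError("no sign change on the bracket; calibration may be infeasible")
    for _ in range(iters):  # bisection: lhs is increasing in alpha
        mid = 0.5 * (lo + hi)
        if lhs(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    weights = [x / p * G((1.0 / p - 1.0) * zc * alpha) for zc, x, p in zip(Zc, xi, pi0)]
    return alpha, weights, zbar
```

If Z already has phase I mean zero, the centering step does nothing and the sketch coincides with modified calibration, in line with the remark in Section 3.4.3.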
2.2. Estimators for a semiparametric model 𝒫
We study the asymptotic distribution of the weighted likelihood estimator of a finite-dimensional parameter θ in a general semiparametric model 𝒫 = {Pθ,η : θ ∈ Θ, η ∈ H} where Θ ⊂ ℝp and the nuisance parameter space H is a subset of some Banach space ℬ. Let P0 = Pθ0,η0 denote the true distribution.
The MLE for complete data is often obtained as a solution to the infinite-dimensional likelihood equations. In such models, the WLE under two-phase sampling is obtained by solving the corresponding infinite-dimensional inverse probability weighted likelihood equations. Specifically, the WLE (θ̂N, η̂N) is a solution to the following weighted likelihood equations:
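$$\mathbb{P}_N^\pi\dot\ell_{\hat\theta_N,\hat\eta_N} = 0, \qquad \mathbb{P}_N^\pi\big(B_{\hat\theta_N,\hat\eta_N}h - P_{\hat\theta_N,\hat\eta_N}B_{\hat\theta_N,\hat\eta_N}h\big) = 0 \quad\text{for all } h\in\mathcal{H}, \tag{2.6}$$

where ℓ̇θ,η is the score function for θ, and the score operator Bθ,η : ℋ ↦ L2(Pθ,η) is the bounded linear operator mapping a direction h in some Hilbert space ℋ of one-dimensional submodels for η along which η → η0 to the score of the corresponding submodel. The WLE with estimated weights (θ̂N,e, η̂N,e), the calibrated WLE (θ̂N,c, η̂N,c), the WLE with modified calibration (θ̂N,mc, η̂N,mc) and the WLE with centered calibration (θ̂N,cc, η̂N,cc) are obtained by replacing $\mathbb{P}_N^\pi$ in (2.6) with $\mathbb{P}_N^{\pi,e}$, $\mathbb{P}_N^{\pi,c}$, $\mathbb{P}_N^{\pi,mc}$ and $\mathbb{P}_N^{\pi,cc}$, respectively. Let ℓ̇0 = ℓ̇θ0,η0 and B0 = Bθ0,η0.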
3. Asymptotics for the WLE in general semiparametric models
We consider two cases: in the first case the nuisance parameter η is estimable at a regular (i.e., √N) rate, and for ease of exposition, η is assumed to be a measure. In the second case η is only estimable at a nonregular (slower than √N) rate. Our theorem concerning the second case (Theorem 3.2) nearly covers the first case, but requires slightly more smoothness and a separate proof of the rate of convergence for an estimator of η. On the other hand, our theorem concerning the first case (Theorem 3.1) includes a proof of the regular (√N) rate of convergence, and hence is of interest in its own right.
3.1. Conditions for adjusting weights
We assume the following conditions for estimators of α for adjusted weights. Throughout this paper, we may assume both Conditions 3.1 and 3.2 at the same time, but it should be understood that the former condition is used exclusively for the estimators regarding estimated weights and the latter condition is imposed only for estimators regarding (variants of) calibration. Also, it should be understood that Conditions 3.2(a)(i) and (d)(i), Conditions 3.2(a)(ii) and (d)(ii) and Conditions 3.2(a)(iii) and (d)(iii) are assumed for estimators defined via calibration, modified calibration and centered calibration, respectively.
Condition 3.1 (Estimated weights). (a) The estimator α̂N is a maximizer of the pseudo-likelihood (2.2).
(b) Z ∈ ℝJ+k is not concentrated on a (J + k)-dimensional affine space of ℝJ+k and has bounded support.
(c) Ge : ℝ ↦ [0, 1] is a twice continuously differentiable, monotone function.
(d) S0 ≡ P0[{Ġe(ZTα0)}2{π0(V)(1 − π0(V))}−1Z⊗2] is finite and nonsingular, where Ġe is the derivative of Ge.
(e) The “true” parameter α0 = (α0,1, …, α0,J+k) is given by $\alpha_{0,j} = G_e^{-1}(p_j)$ for j = 1, …, J and α0,j = 0 for j = J + 1, …, J + k. The parameter α is identifiable, that is, pα = pα0 almost surely implies α = α0.
(f) For a fixed pj ∈ (0, 1), nj satisfies nj = [Njpj] for j = 1, …, J.
Condition 3.2 (Calibrations). (a) (i) The estimator α̂N is a solution of calibration equation (2.3). (ii) The estimator α̂N is a solution of calibration equation (2.4). (iii) The estimator α̂N is a solution of calibration equation (2.5).
(b) Z ∈ ℝk is not concentrated at 0 and has bounded support.
(c) G is a strictly increasing continuously differentiable function on ℝ such that G(0) = 1 and, for all x, −∞ < m1 ≤ G(x) ≤ M1 < ∞ and 0 < Ġ(x) ≤ M2 < ∞, where Ġ is the derivative of G.
(d) (i) P0Z⊗2 is finite and positive definite. (ii) P0[π0(V)−1(1 − π0(V))Z⊗2] is finite and positive definite. (iii) P0[π0(V)−1(1 − π0(V))(Z − μZ)⊗2] is finite and positive definite, where μZ = P0Z.
(e) The “true” parameter α0 = 0.
Condition 3.1(f) may seem unnatural at first, but in practice the investigator chooses the phase II sample sizes nj, so the sampling probabilities pj can be regarded as defined implicitly through nj = [Njpj]. The other parts of Condition 3.1 are standard in binary regression, and Condition 3.2 is similar to Condition 3.1.
Asymptotic properties of α̂N for all methods are proved in [25].
3.2. Regular rate for a nuisance parameter
We assume the following conditions.
Condition 3.3 (Consistency). The estimator (θ̂N, η̂N) is consistent for (θ0, η0) and solves the weighted likelihood equations (2.6), where $\mathbb{P}_N^{\pi}$ may be replaced by $\mathbb{P}_N^{\pi,\#}$ with # ∈ {e, c, mc, cc} for the estimators with adjusted weights.
Condition 3.4 (Asymptotic equicontinuity). Let ℱ1(δ) = {ℓ̇θ,η : |θ − θ0| + ∥η − η0∥ < δ} and ℱ2(δ) = {Bθ,ηh − Pθ,ηBθ,ηh : h ∈ ℋ, |θ − θ0| + ∥η − η0∥ < δ}. There exists a δ0 > 0 such that (1) ℱk(δ0), k = 1, 2, are P0-Donsker and suph∈ℋ P0|fj − f0, j|2 → 0, as |θ − θ0| + ∥η − η0∥ → 0, for every fj ∈ ℱj(δ0), j = 1, 2, where f0,1 = ℓ̇θ0,η0 and f0,2 = B0h − P0B0h, (2) ℱk(δ0), k = 1, 2, have integrable envelopes.
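Condition 3.5. The map Ψ = (Ψ1, Ψ2) : Θ × H ↦ ℝp × ℓ∞(ℋ) with components

$$\Psi_1(\theta,\eta) = P_0\dot\ell_{\theta,\eta}, \qquad \Psi_2(\theta,\eta)h = P_0\big(B_{\theta,\eta}h - P_{\theta,\eta}B_{\theta,\eta}h\big), \quad h\in\mathcal{H},$$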
has a continuously invertible Fréchet derivative map Ψ̇0 = (Ψ̇11, Ψ̇12, Ψ̇21, Ψ̇22) at (θ0, η0) given by Ψ̇ij (θ0, η0)h = P0(ψ̇i,j, θ0, η0,h), i, j ∈ {1, 2} in terms of L2(P0) derivatives of ψ1,θ,η,h = ℓ̇θ,η and ψ2,θ,η,h = Bθ,ηh − Pθ,ηBθ,ηh; that is,
Furthermore, Ψ̇0 admits a partition
where
and is continuously invertible.
Let I0 be the efficient information for θ and ℓ̃0 be the efficient influence function for θ for the semiparametric model with complete data.
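Theorem 3.1. Under Conditions 3.1–3.5,

$$\sqrt{N}(\hat\theta_N - \theta_0) \rightsquigarrow N_p(0,\Sigma), \qquad \sqrt{N}(\hat\theta_{N,\#} - \theta_0) \rightsquigarrow N_p(0,\Sigma_\#),$$

where # ∈ {e, c, mc, cc},

$$\Sigma \equiv I_0^{-1} + \sum_{j=1}^J \nu_j\frac{1-p_j}{p_j}\operatorname{Var}_{0|j}(\tilde\ell_0), \tag{3.1}$$

$$\Sigma_\# \equiv I_0^{-1} + \sum_{j=1}^J \nu_j\frac{1-p_j}{p_j}\operatorname{Var}_{0|j}(\tilde\ell_0 - Q_\#\tilde\ell_0), \tag{3.2}$$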
and (recall Conditions 3.1 and 3.2)
Remark 3.1. Our conditions in Theorem 3.1 are the same as those in [5] except for the integrability condition. Our Condition 3.4(2) requires the existence of integrable envelopes for the classes of scores, while condition (A1*) in [5] requires square integrable envelopes. Note that this integrability condition is required only for the WLEs with adjusted weights, as in [4].
Remark 3.2. As can be seen from the definition of Q#, the choice of G in calibration does not affect the asymptotic variances while Ge in the method of estimated weights does affect the asymptotic variance.
3.3. Nonregular rate for a nuisance parameter
For h = (h1, …, hp)T with hk ∈ H, k = 1, …, p, let Bθ,η[h] = (Bθ,ηh1, …, Bθ,ηhp)T. We assume the following conditions.
Condition 3.6 (Consistency and rate of convergence). An estimator (θ̂N,η̂N) of (θ0,η0) satisfies |θ̂N − θ0| = oP (1), and ∥η̂N − η0∥ = OP (N−β) for some β > 0.
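Condition 3.7 (Positive information). There is an h* = (h1*, …, hp*)T, where hk* ∈ ℋ for k = 1, …, p, such that

$$P_0\big\{(\dot\ell_0 - B_0[h^*])\,B_0h\big\} = 0 \quad\text{for all } h\in\mathcal{H}.$$

The efficient information I0 ≡ P0(ℓ̇0 − B0[h*])⊗2 for θ for the semiparametric model with complete data is finite and nonsingular. Denote the efficient influence function for the semiparametric model with complete data by $\tilde\ell_0 \equiv I_0^{-1}(\dot\ell_0 - B_0[h^*])$.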
Condition 3.8 (Asymptotic equicontinuity). (1) For any δN ↓ 0 and C > 0,
(2) There exists a δ > 0 such that the classes {ℓ̇θ,η : |θ − θ0| + ∥η − η0∥ ≤ δ} and {Bθ,η[h*] : |θ − θ0| + ∥η − η0∥ ≤ δ} are P0-Glivenko–Cantelli and have integrable envelopes. Moreover, ℓ̇θ,η and Bθ,η[h*] are continuous with respect to (θ, η) either pointwise or in L1(P0).
Condition 3.9 (Smoothness of the model). For some α > 1 satisfying αβ > 1/2 and for (θ, η) in the neighborhood {(θ, η) : |θ − θ0| ≤ δN, ∥η − η0∥ ≤ CN−β},
In the previous section, we required that the WLE solves the weighted likelihood equations (2.6) for all h ∈ ℋ. Here, we only assume that the WLE (θ̂N, η̂N) satisfies the weighted likelihood equations
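$$\mathbb{P}_N^\pi\dot\ell_{\hat\theta_N,\hat\eta_N} = o_P(N^{-1/2}), \qquad \mathbb{P}_N^\pi B_{\hat\theta_N,\hat\eta_N}[h^*] = o_P(N^{-1/2}). \tag{3.3}$$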
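The corresponding WLEs with adjusted weights, (θ̂N,#, η̂N,#) with # ∈ {e, c, mc, cc}, satisfy (3.3) with $\mathbb{P}_N^{\pi}$ replaced by $\mathbb{P}_N^{\pi,\#}$.
Theorem 3.2. Suppose that the WLE is a solution of (3.3), where $\mathbb{P}_N^{\pi}$ may be replaced by $\mathbb{P}_N^{\pi,\#}$ with # ∈ {e, c, mc, cc} for the estimators with adjusted weights. Under Conditions 3.1, 3.2 and 3.6–3.9,

$$\sqrt{N}(\hat\theta_N - \theta_0) \rightsquigarrow N_p(0,\Sigma), \qquad \sqrt{N}(\hat\theta_{N,\#} - \theta_0) \rightsquigarrow N_p(0,\Sigma_\#),$$

where Σ and Σ# are as defined in (3.1) and (3.2) of Theorem 3.1, but now I0 and ℓ̃0 are as defined in Condition 3.7, and Q# is as defined in Theorem 3.1.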
Remark 3.3. Our conditions are identical to those of the Z-theorem of [10] except Condition 3.8(2). This additional condition is not stringent for the following reasons. First, the Glivenko–Cantelli condition is usually assumed to prove consistency of estimators before deriving asymptotic distributions. Second, a stronger L2(P0)-continuity condition is standard as is seen in Condition 3.4 (see also Section 25.8 of [31]). Note that the L1(P0)-continuity condition is only required for the WLEs with adjusted weights.
3.4. Comparisons of methods
We compare the asymptotic variances of the five WLEs with regard to both the efficiency gains from adjusting weights and the change of design. We also include in the comparison special cases of adjusting weights that involve stratum-wise adjustment.
3.4.1. Stratified Bernoulli sampling
We first state the result corresponding to Theorem 3.1 for stratified Bernoulli sampling, where all subjects are sampled independently, with sampling probability pj if V ∈ 𝒱j; the WLE and the WLEs with adjusted weights (# ∈ {e, c, mc, cc}) are defined as before, with the phase II sample drawn under this design.
Theorem 3.3. Suppose Conditions 3.1 [except Condition 3.1(f)] and 3.2 hold. Let ξi ∈ {0, 1}, i = 1, …, N, and ξ be i.i.d. with $P(\xi = 1 \mid V) = \sum_{j=1}^J p_j 1_{\mathcal{V}_j}(V)$.
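(1) Suppose that the WLE is a solution of (3.3), where $\mathbb{P}_N^{\pi}$ may be replaced by $\mathbb{P}_N^{\pi,\#}$ with # ∈ {e, c, mc, cc} for the estimators with adjusted weights. Under the same conditions as in Theorem 3.1,

$$\sqrt{N}(\hat\theta_N - \theta_0) \rightsquigarrow N_p(0,\Sigma^B), \qquad \sqrt{N}(\hat\theta_{N,\#} - \theta_0) \rightsquigarrow N_p(0,\Sigma^B_\#),$$

where

$$\Sigma^B \equiv I_0^{-1} + \sum_{j=1}^J \nu_j\frac{1-p_j}{p_j}P_{0|j}\tilde\ell_0^{\otimes 2}, \tag{3.4}$$

$$\Sigma^B_\# \equiv I_0^{-1} + \sum_{j=1}^J \nu_j\frac{1-p_j}{p_j}P_{0|j}(\tilde\ell_0 - Q_\#\tilde\ell_0)^{\otimes 2}, \tag{3.5}$$

and Q# with # ∈ {e, c, mc, cc} are defined in Theorem 3.1.
(2) Under the same conditions as in Theorem 3.2, the same conclusions as in (1) hold with I0 and ℓ̃0 replaced by those defined in Condition 3.7.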
Comparing the variance–covariance matrices in Theorem 3.3 to those in Theorems 3.1 and 3.2, we obtain the following corollary comparing designs. All estimators have smaller variances under sampling without replacement.
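A small simulation can make the design comparison of Corollary 3.1 below tangible. The Python sketch is a toy illustration, not a proof, and its function name and inputs are hypothetical. It holds phase I values of some function f fixed, repeatedly redraws the phase II indicators under either stratified Bernoulli sampling (each unit selected independently with probability pj, weight 1/pj) or stratified sampling without replacement (exactly nj = Njpj units per stratum, weight Nj/nj), and reports the Monte Carlo variance of the IPW mean. Without replacement the stratum sample sizes are fixed, so only within-stratum variability of f contributes, whereas Bernoulli sampling also propagates the random stratum sample sizes; the gap is largest when f has large stratum means.

```python
import random
import statistics


def ipw_mean_variance(f_by_stratum, p_by_stratum, design, reps=2000, seed=0):
    """Monte Carlo variance of N^{-1} sum_i xi_i / pi_i f_i over phase II redraws.

    f_by_stratum: list of lists of (fixed) phase I values f(X_i), one list per stratum.
    p_by_stratum: sampling fraction p_j for each stratum; N_j * p_j should be an integer.
    design: "bernoulli" or "without_replacement".
    """
    rng = random.Random(seed)
    N = sum(len(f) for f in f_by_stratum)
    estimates = []
    for _ in range(reps):
        total = 0.0
        for f, p in zip(f_by_stratum, p_by_stratum):
            if design == "bernoulli":
                # Independent selection with probability p_j; weight 1 / p_j
                total += sum(fi / p for fi in f if rng.random() < p)
            else:
                # Simple random sample of n_j = N_j p_j units; weight N_j / n_j
                n_j = round(len(f) * p)
                total += sum(fi * len(f) / n_j for fi in rng.sample(f, n_j))
        estimates.append(total / N)
    return statistics.variance(estimates)


f_by_stratum = [[5.0 + 0.1 * k for k in range(50)], [2.0 + 0.05 * k for k in range(50)]]
p_by_stratum = [0.2, 0.5]
v_bern = ipw_mean_variance(f_by_stratum, p_by_stratum, "bernoulli")
v_wor = ipw_mean_variance(f_by_stratum, p_by_stratum, "without_replacement")
print(v_bern, v_wor)
```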
Corollary 3.1. Under the same conditions as in Theorem 3.3,
The variance formulas (3.5) for # ∈ {e, mc, cc}, that is, for all adjustments except ordinary calibration, have the following alternative representations, which show the efficiency gains over the plain WLE under Bernoulli sampling.
Corollary 3.2. Under the same conditions as in Theorem 3.3,
3.4.2. Within-stratum adjustment of weights
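Adjusting weights can be carried out within every stratum. This was proposed by Breslow et al. [2, 3] for ordinary calibration. Consider calibration on Z̃, where Z̃ ≡ (Z(1), …, Z(J))T with Z(j) ≡ I(V ∈ 𝒱j)ZT. The calibration equation (2.3) becomes

$$\frac{1}{N}\sum_{i=1}^N \frac{\xi_i}{\pi_0(V_i)}G(\tilde Z_i^T\alpha)\tilde Z_i = \frac{1}{N}\sum_{i=1}^N \tilde Z_i,$$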
where α ∈ ℝJk. We call this special case within-stratum calibration. We define within-stratum modified and centered calibration analogously.
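Because each block Z(j) of Z̃ vanishes outside stratum j, the within-stratum calibration equation splits into J separate equations, one per stratum, for any G. The Python sketch below assumes the linear choice G(x) = 1 + x and Z = (1, U) with a scalar auxiliary variable U (illustrative assumptions; the function name is hypothetical) and solves each stratum's 2 × 2 system in closed form. The resulting weights reproduce, within every stratum, the phase I count Nj and the phase I total of U.

```python
from collections import defaultdict


def within_stratum_linear_calibration(strata, U, xi, pi0):
    """Within-stratum calibration with G(x) = 1 + x and Z = (1, U) in each stratum."""
    groups = defaultdict(list)
    for i, s in enumerate(strata):
        groups[s].append(i)
    cal_w = [0.0] * len(U)
    alphas = {}
    for j, idx in groups.items():
        w = {i: xi[i] / pi0[i] for i in idx}  # design weights in stratum j
        A = [[0.0, 0.0], [0.0, 0.0]]
        b = [0.0, 0.0]
        for i in idx:
            z = (1.0, U[i])
            for r in range(2):
                b[r] += z[r] - w[i] * z[r]  # phase I total minus weighted phase II total
                for c in range(2):
                    A[r][c] += w[i] * z[r] * z[c]
        det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
        a0 = (b[0] * A[1][1] - A[0][1] * b[1]) / det  # Cramer's rule
        a1 = (A[0][0] * b[1] - A[1][0] * b[0]) / det
        alphas[j] = (a0, a1)
        for i in idx:
            cal_w[i] = w[i] * (1.0 + a0 + a1 * U[i])
    return alphas, cal_w
```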
Similarly, we call estimated weights computed separately within each stratum within-stratum estimated weights. Recall that Z in estimated weights contains the membership indicators for the strata, and the remaining components are other auxiliary variables, say Z[2]. Within-stratum estimated weights use Z̃ ≡ (Z(1), …, Z(J))T, where Z(j) ≡ I(V ∈ 𝒱j)(Z[2])T with 1 included in Z[2]. The “true” parameter α̃0 has all elements equal to zero except the element corresponding to I(V ∈ 𝒱j), which equals $G_e^{-1}(p_j)$, j = 1, …, J.
The following corollary summarizes within-stratum adjustment of weights under stratified Bernoulli sampling and sampling without replacement. All methods achieve improved efficiency over the plain WLE under Bernoulli sampling, while centered calibration is the only method that yields a guaranteed improvement under sampling without replacement. This is because centering yields the projection in the mean-zero subspace of L2(P0|j), which is suited to the conditional variances in (3.2), whereas without centering one obtains the L2(P0|j)-projection, which is suited to the conditional expectations in (3.5).
Corollary 3.3. (1) (Bernoulli) Under the same conditions as in Theorem 3.3 with Z replaced by Z̃ and α0 replaced by α̃0 for within-stratum estimated weights,
(3.6) |
where # ∈ {e, c, mc, cc} and
with μZ,j ≡ E[I (V ∈ 𝒱j)Z] for j = 1, …, J.
(2) (Without replacement) Under the same conditions as in Theorem 3.1 or 3.2 with Z replaced by Z̃,
(3.7) |
.
3.4.3. Comparisons
We summarize Corollaries 3.1–3.3. Every method of adjusting weights improves efficiency over the plain WLE for a certain design and a certain range of adjustment of weights (within-stratum or “across-strata” adjustment). Particularly notable among all methods, however, is centered calibration. While the other methods are guaranteed to gain efficiency only under Bernoulli sampling, centered calibration carried out within strata improves efficiency over the plain WLE under both sampling schemes. There is no known method of “across-strata” adjustment that is guaranteed to gain efficiency over the plain WLE under stratified sampling without replacement.
There are close connections among all methods. When the auxiliary variables have mean zero, centered and modified calibration are essentially the same. Ordinary and modified calibration give the same asymptotic variance when carried out stratum-wise. For Z and α0 defined for estimated weights, estimated weights and modified calibration based on (1 − π0(V))−1Ġe(ZTα0)Z perform the same way. Similarly, within-stratum estimated weights with Z̃ and α̃0 are as good as within-stratum calibration based on Ġe(Z̃Tα̃0)Z̃.
As these relationships show, no single method is superior to the others in every situation. In fact, performance depends on the choice and transformation of the auxiliary variables, the true distribution P0 and the design. For our “without replacement” sampling scheme, within-stratum centered calibration is the only method guaranteed to gain efficiency, while other methods may perform even worse than the plain WLE.
4. Examples
For asymptotic normality of WLEs, consistency and rates of convergence must be established first in order to apply our Z-theorems in Section 3. To this end, the general results on IPW empirical processes discussed in the next section will be useful. We illustrate this in the Cox model with right censoring and interval censoring under two-phase sampling.
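Let T ~ F be a failure time, and let X be a vector of covariates with bounded support in the regression model. The Cox proportional hazards model [7] specifies the relationship

$$\Lambda(t\mid x) = e^{\theta^T x}\Lambda(t), \qquad\text{equivalently}\qquad 1 - F(t\mid x) = \exp\{-e^{\theta^T x}\Lambda(t)\},$$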
where θ ∈ Θ ⊂ ℝp is the regression parameter, Λ ∈ H is the (baseline) cumulative hazard function. Here the space H for the nuisance parameter Λ is the set of nonnegative, nondecreasing cadlag functions defined on the positive line. The true parameters are θ0 and Λ0.
In addition to X, let U be a vector of auxiliary variables collected at phase I that are correlated with the covariate X. For simplicity of notation, we assume that the covariates X are observed only for the subjects sampled at phase II. Thus, if some of the coordinates of X are available at phase I, then we include an identical copy of those coordinates of X in the vector U.
4.1. Cox model with right censored data
Under right censoring, we only observe the minimum of the failure time T and the censoring time C ~ G. Define the observed time Y = T ∧ C and the censoring indicator Δ = I(T ≤ C). The phase I data is V = (Y, Δ, U), and the observed data is (Y, Δ, ξX, U, ξ), where ξ is the sampling indicator. For complete right-censored data, the theory of maximum likelihood estimators in the Cox model has received several treatments; the one we follow most closely here is that of [31]. For the Cox model with case-cohort data, see [27], and for treatments with even more general designs, see [1] and [12]. Here, for both sampling without replacement and Bernoulli sampling, we continue the developments of [4, 5]. We assume the following conditions:
Condition 4.1. The finite-dimensional parameter space Θ is compact and contains the true parameter θ0 as an interior point.
Condition 4.2. The failure time T and the censoring time C are conditionally independent given X, and there is τ > 0 such that P(T > τ) > 0 and P(C ≥ τ) = P(C = τ) > 0. Both T and C have continuous conditional densities given the covariates X = x.
Condition 4.3. The covariate X has bounded support. For any measurable function h, P(X ≠ h(Y)) > 0.
Let λ(t) = (d/dt)Λ(t) be the baseline hazard function. With complete data, the density of (Y, Δ, X) is
where pX is the marginal density of X and g(·|x) is the conditional density of C given X = x. The score for θ is given by ℓ̇θ,Λ(y, δ, x) = x{δ − exp(θTx)Λ(y)}, and the score operator Bθ,Λ : ℋ ↦ L2(Pθ,Λ) is defined on the unit ball ℋ in the space BV[0, τ] by Bθ,Λh(y, δ, x) = δh(y) − exp(θTx)∫[0,y] h dΛ. Because the likelihood based on the density above does not yield the MLE for complete data, we define the log likelihood for one observation for complete data by ℓθ,Λ(y, δ, x) = log{(exp(θTx)Λ{y})δ exp(−Λ(y)exp(θTx))}, where Λ{t} is the (point) mass of Λ at t. Then maximizing the weighted log likelihood reduces to solving the system of equations $\mathbb{P}_N^\pi\dot\ell_{\theta,\Lambda} = 0$ and $\mathbb{P}_N^\pi B_{\theta,\Lambda}h = 0$ for every h ∈ ℋ. The efficient score for θ for complete data is given by
, and the efficient information for θ for complete data is
where .
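To make the weighted likelihood concrete, here is a minimal sketch for a scalar covariate. It assumes distinct observed times and IPW weights w_i = ξ_i/π_i (zero for subjects not sampled at phase II). If Λ is restricted to jump only at observed failure times, maximizing the weighted log likelihood over the jumps gives a weighted Breslow estimator, Λ{t_k} = w_k / Σ_{j: Y_j ≥ t_k} w_j e^{θX_j}. Substituting this back leaves a weighted partial likelihood in θ, which the sketch solves by Newton–Raphson. The function name `weighted_cox` and its arguments are hypothetical, and the sketch only illustrates the structure of the estimating equations, not the authors' implementation.

```python
import numpy as np


def weighted_cox(y, delta, x, w, max_iter=50, tol=1e-10):
    """IPW-weighted Cox fit for a scalar covariate (illustrative sketch).

    y, delta : observed times (assumed distinct) and event indicators
    x        : covariate values (any finite placeholder where w == 0)
    w        : inverse probability weights xi / pi (0 for unsampled subjects)
    Returns theta_hat, the event times, and the weighted Breslow jumps there.
    """
    y, x, w = (np.asarray(a, dtype=float) for a in (y, x, w))
    delta = np.asarray(delta)
    order = np.argsort(-y)  # decreasing observed time

    def risk_sum(v):
        # For each i, the sum of v_j over the risk set {j : Y_j >= Y_i}.
        out = np.empty_like(v)
        out[order] = np.cumsum(v[order])
        return out

    ev = (delta == 1) & (w > 0)  # observed failures with positive weight
    theta = 0.0
    for _ in range(max_iter):
        r = w * np.exp(theta * x)
        s0, s1, s2 = (risk_sum(r * x**k) for k in range(3))
        xbar = s1[ev] / s0[ev]
        score = np.sum(w[ev] * (x[ev] - xbar))              # weighted partial-likelihood score
        info = np.sum(w[ev] * (s2[ev] / s0[ev] - xbar**2))  # minus the score's derivative
        step = score / info
        theta += step
        if abs(step) < tol:
            break
    s0 = risk_sum(w * np.exp(theta * x))
    return theta, y[ev], w[ev] / s0[ev]  # Lambda_hat{t_k} = w_k / S0(t_k)
```

The estimated cumulative hazard at t is the sum of the returned jumps at event times ≤ t. Sorting once and taking cumulative sums keeps each Newton step at O(N log N) rather than forming an N × N risk-set matrix.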
Theorem 4.1 (Consistency). Under Conditions 3.1, 3.2, 4.1–4.3, the WLEs are consistent for (θ0, Λ0).
Proof. This proof follows along the lines of the proof given by [29], but with the usual empirical measure replaced by the IPW empirical measure (with adjusted weights), and by use of Theorem 5.1. For details see [25].
Our Z-theorem (Theorem 3.1) yields asymptotic normality of the WLEs.
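Theorem 4.2 (Asymptotic normality). Under Conditions 3.1, 3.2, 4.1–4.3,

$$\sqrt N(\hat\theta_N - \theta_0) \rightsquigarrow N_p\big(0,\ \tilde I_{\theta_0,\Lambda_0}^{-1} + \Sigma\big), \qquad \sqrt N(\hat\theta_{N,\#} - \theta_0) \rightsquigarrow N_p\big(0,\ \tilde I_{\theta_0,\Lambda_0}^{-1} + \Sigma_{\#}\big),$$

where # ∈ {e, c, mc, cc}, $\tilde I_{\theta_0,\Lambda_0}^{-1}\tilde\ell_{\theta_0,\Lambda_0}$ is the efficient influence function for θ for complete data, and Σ and Σ# are given in Theorem 3.1.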
Proof. We verify the conditions of Theorem 3.1. Condition 3.3 holds by Theorem 4.1. Conditions 3.4 and 3.5 hold under the present hypotheses as was shown in [31], Section 25.12.
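For variance estimation regarding θ̂N, $\hat I_N \equiv \mathbb P_N^{\pi}\tilde\ell_{\hat\theta_N,\hat\Lambda_N}\tilde\ell_{\hat\theta_N,\hat\Lambda_N}^T$ (with $P_{\theta,\Lambda}$ in $M_k$ replaced by $\mathbb P_N^{\pi}$) can be used to estimate $\tilde I_{\theta_0,\Lambda_0}$. Letting $\hat{\tilde\ell}_N \equiv \hat I_N^{-1}\tilde\ell_{\hat\theta_N,\hat\Lambda_N}$, we can estimate Σ by $\hat\Sigma_N = \sum_{j=1}^J (N_j/N)\{(1-\hat p_j)/\hat p_j\}\hat S_j$, where $\hat p_j = n_j/N_j$ and $\hat S_j$ is the sample covariance matrix of $\hat{\tilde\ell}_N$ over the phase II subjects in stratum j. The other four cases are similar.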
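A plug-in estimate of the phase II variance component can be computed directly from per-subject influence values. The sketch below assumes that Σ has the form Σ_j ν_j (1 − p_j)/p_j Var_{0|j} of the efficient influence function, with ν_j estimated by N_j/N, p_j by n_j/N_j, and the conditional variance by the within-stratum sample covariance of the sampled subjects. The function name `sigma_hat` is hypothetical, and computing the influence values themselves (the ℓ̃ evaluated at the WLE) is left to the caller.

```python
import numpy as np


def sigma_hat(infl, strata, xi):
    """Plug-in estimate of sum_j nu_j (1 - p_j) / p_j Var_{0|j}(infl).

    infl   : (N,) or (N, p) array of estimated influence values; rows with
             xi == 0 are never used (they may hold NaN placeholders)
    strata : phase I stratum label of each subject
    xi     : phase II sampling indicators
    Requires at least two sampled subjects in every stratum.
    """
    infl = np.asarray(infl, dtype=float)
    if infl.ndim == 1:
        infl = infl[:, None]
    strata = np.asarray(strata)
    xi = np.asarray(xi, dtype=bool)
    N, p = infl.shape
    sigma = np.zeros((p, p))
    for j in np.unique(strata):
        in_j = strata == j
        N_j = in_j.sum()
        sampled = infl[in_j & xi]          # phase II subjects in stratum j
        n_j = sampled.shape[0]
        nu_j, p_j = N_j / N, n_j / N_j     # plug-in nu_j and p_j
        S_j = np.cov(sampled, rowvar=False, ddof=1).reshape(p, p)
        sigma += nu_j * (1.0 - p_j) / p_j * S_j
    return sigma
```

A stratum that is fully sampled (p_j = 1) contributes nothing, which matches the intuition that phase II adds no variability there.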
4.2. Cox model with interval censored data
Let Y be a censoring time that is assumed to be conditionally independent of a failure time T given a covariate vector X. Under case 1 interval censoring, we do not observe T but only (Y, Δ), where Δ ≡ I(T ≤ Y). The phase I data is V = (Y, Δ, U) and the observed data is (Y, Δ, ξX, U, ξ), where ξ is the sampling indicator. In the case of complete data, maximum likelihood estimates for this model were studied by Huang [10]. For a generalized version of this model and two-phase data with Bernoulli sampling, weighted likelihood estimates with and without estimated weights have recently been studied by Li and Nan [11]. Here we treat two-phase data under sampling without replacement at phase II and with both estimated weights and calibration.
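With complete data, the log likelihood for one observation is given by

$$\ell_{\theta,\Lambda}(y,\delta,x) = \delta\log\big\{1 - \bar F(y)^{\exp(\theta^T x)}\big\} + (1-\delta)\exp(\theta^T x)\log\bar F(y),$$

where F̅ ≡ 1 − F = e^{−Λ}. The score for θ and the score operator Bθ,Λ for Λ for complete data are $\dot\ell_{\theta,\Lambda} = x\exp(\theta^T x)\Lambda(y)\{\delta r(y,x;\theta,\Lambda) - (1-\delta)\}$ and $B_{\theta,\Lambda}[h] = \exp(\theta^T x)h(y)\{\delta r(y,x;\theta,\Lambda) - (1-\delta)\}$, where $r(y,x;\theta,\Lambda) = \exp(-e^{\theta^T x}\Lambda(y))/\{1 - \exp(-e^{\theta^T x}\Lambda(y))\}$. The efficient score for θ for complete data is given by

$$\tilde\ell_{\theta_0,\Lambda_0}(y,\delta,x) = e^{\theta_0^T x}\Lambda_0(y)\,Q(y,\delta,x;\theta_0,\Lambda_0)\Big\{x - \frac{E[Xe^{2\theta_0^T X}O(Y\mid X)\mid Y = y]}{E[e^{2\theta_0^T X}O(Y\mid X)\mid Y = y]}\Big\},$$

where Q(y, δ, x; θ, Λ) = δr(y, x; θ, Λ) − (1 − δ) and O(y|x) = r(y, x; θ0, Λ0). The efficient information for θ for complete data is given by

$$\tilde I_{\theta_0,\Lambda_0} = E\Big[R(Y,X)\Big\{X - \frac{E[XR(Y,X)\mid Y]}{E[R(Y,X)\mid Y]}\Big\}^{\otimes 2}\Big],$$

where $R(y,x) = \Lambda_0^2(y)e^{2\theta_0^T x}O(y\mid x)$ and $a^{\otimes 2} = aa^T$. See [10] for further details.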
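The score formulas above can be checked numerically. The sketch assumes the complete-data log likelihood for one observation is ℓ = δ log{1 − exp(−e^{θx}Λ(y))} − (1 − δ)e^{θx}Λ(y), which is consistent with F̄ = e^{−Λ} and reproduces the stated scores. It treats a scalar covariate and represents Λ only through its value lam = Λ(y) and a direction value h_y = h(y). All function names are hypothetical. The θ-score should match a finite difference in θ, and B[h] should match a finite difference along Λ + t h.

```python
import math


def q_value(theta, lam, delta, x):
    """Q(y, delta, x; theta, Lambda) = delta * r - (1 - delta), with lam = Lambda(y)."""
    m = math.exp(theta * x) * lam
    r = math.exp(-m) / -math.expm1(-m)  # r = e^{-m} / (1 - e^{-m})
    return delta * r - (1 - delta)


def loglik(theta, lam, delta, x):
    """Assumed complete-data log likelihood of one current status observation."""
    m = math.exp(theta * x) * lam
    return delta * math.log(-math.expm1(-m)) - (1 - delta) * m


def score_theta(theta, lam, delta, x):
    """x exp(theta x) Lambda(y) Q(y, delta, x; theta, Lambda)."""
    return x * math.exp(theta * x) * lam * q_value(theta, lam, delta, x)


def score_lambda(theta, lam, h_y, delta, x):
    """B_{theta,Lambda}[h] = exp(theta x) h(y) Q: derivative along Lambda + t h."""
    return math.exp(theta * x) * h_y * q_value(theta, lam, delta, x)
```

Using `math.expm1` keeps 1 − e^{−m} accurate when e^{θx}Λ(y) is small, which is where r(y, x; θ, Λ) blows up.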
We impose the same assumptions made for complete data in [10].
Condition 4.4. The finite-dimensional parameter space Θ is compact and contains the true parameter θ0 as an interior point.
Condition 4.5. (a) The covariate X has bounded support; that is, there exists x0 such that |X| ≤ x0 with probability 1. (b) For any θ ≠ θ0, the probability $P(\theta^T X \ne \theta_0^T X) > 0$.
Condition 4.6. F0(0) = 0. Let τF0 = inf{t : F0(t) = 1}. The support of Y is an interval S[Y] = [lY, uY] and 0 < lY ≤ uY < τF0.
Condition 4.7. The cumulative hazard function Λ0 has strictly positive derivative on S[Y], and the joint distribution function G(y, x) of (Y, X) has bounded second-order (partial) derivative with respect to y.
4.2.1. Consistency
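The characterization of the WLEs $(\hat\theta_N,\hat\Lambda_N)$ and $(\hat\theta_{N,\#},\hat\Lambda_{N,\#})$, # ∈ {e, c, mc, cc}, maximizing $\mathbb P_N^{\pi}\ell_{\theta,\Lambda}$ and $\mathbb P_N^{\pi,\#}\ell_{\theta,\Lambda}$, respectively, is given in [25], Lemma A.5. We prove consistency of the WLEs in the metric given by d((θ1, Λ1), (θ2, Λ2)) ≡ ∥θ1 − θ2∥ + ∥Λ1 − Λ2∥_{P_Y}, where ∥·∥ is the Euclidean norm, $\|\Lambda\|_{P_Y}^2 = \int\Lambda^2\,dP_Y$, and P_Y is the marginal probability measure of the censoring variable Y.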
Theorem 4.3 (Consistency). Under Conditions 3.1, 3.2, 4.4–4.7, the WLEs are consistent in the metric d.
Proof. We only prove consistency for the WLE. Proofs for the other four estimators are similar.
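Let H̃ be the set of all subdistribution functions defined on [0, ∞]. We denote the WLE of F as $\hat F_N = 1 - e^{-\hat\Lambda_N}$. Define the set ℱ of functions by

$$\mathcal F = \{f(\theta,F) : \theta\in\Theta,\ F\in\tilde H\}, \qquad f(\theta,F)(y,\delta,x) = \delta\big[1 - \{1-F(y)\}^{\exp(\theta^T x)}\big] + (1-\delta)\{1-F(y)\}^{\exp(\theta^T x)}.$$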
Boundedness of X and compactness of Θ ⊂ ℝp imply that the set {eθT x : θ ∈ Θ} is Glivenko–Cantelli. The set H̃ is also Glivenko–Cantelli since it is a subset of the set of bounded monotone functions. Thus, it follows from boundedness of functions in ℱ and the Glivenko–Cantelli preservation theorem [30] that ℱ is Glivenko–Cantelli.
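Let 0 < α < 1 be a fixed constant. It follows by concavity of the function u ↦ log u and Jensen’s inequality that

$$P_{\theta_0,F_0}\log\Big\{1 + \alpha\Big(\frac{f(\theta,F)}{f(\theta_0,F_0)} - 1\Big)\Big\} \le \log P_{\theta_0,F_0}\Big\{1 + \alpha\Big(\frac{f(\theta,F)}{f(\theta_0,F_0)} - 1\Big)\Big\} \le 0,$$

where equality in the first inequality holds if and only if 1 + α(f(θ, F)/f(θ0, F0) − 1) is constant on S[Y], in other words, (θ, F) = (θ0, F0) on S[Y] by the identifiability Condition 4.5. Note also that by monotonicity of the logarithm

$$\log\Big\{1 + \alpha\Big(\frac{f(\theta,F)}{f(\theta_0,F_0)} - 1\Big)\Big\} \ge \log(1-\alpha).$$

Thus, the set 𝒢 = {log{1 + α(f(θ, F)/f(θ0, F0) − 1)} : f(θ, F) ∈ ℱ} has an integrable envelope. To see this, form a sequence (θn, Fn) such that

$$g_n \equiv \log\Big\{1 + \alpha\Big(\frac{f(\theta_n,F_n)}{f(\theta_0,F_0)} - 1\Big)\Big\} \uparrow G \equiv \sup_{g\in\mathcal G} g.$$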
Then {gn − log(1 − α)}n∈ℕ is a monotone increasing sequence of nonnegative functions. By the monotone convergence theorem, P0gn − log(1 − α) → P0G − log(1 − α) ≤ −log(1 − α). Thus we choose G ∨ − log(1 − α) as an integrable envelope. Also, the set 𝒢 is Glivenko–Cantelli by a Glivenko–Cantelli preservation theorem [30].
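Now, by the concavity of the map u ↦ log u and the definition of the WLE, we have

$$\mathbb P_N^{\pi}\log\Big\{1 + \alpha\Big(\frac{f(\hat\theta_N,\hat F_N)}{f(\theta_0,F_0)} - 1\Big)\Big\} \ge \alpha\,\mathbb P_N^{\pi}\log\frac{f(\hat\theta_N,\hat F_N)}{f(\theta_0,F_0)} \ge 0.$$

Since Θ and H̃ are compact, there is a subsequence of (θ̂N, F̂N) converging to (θ∞, F∞) ∈ Θ × H̃. Along this subsequence, Theorem 5.1 implies that

$$0 \le \mathbb P_N^{\pi}\log\Big\{1 + \alpha\Big(\frac{f(\hat\theta_N,\hat F_N)}{f(\theta_0,F_0)} - 1\Big)\Big\} \to P_{\theta_0,F_0}\log\Big\{1 + \alpha\Big(\frac{f(\theta_\infty,F_\infty)}{f(\theta_0,F_0)} - 1\Big)\Big\} \le 0,$$

so that Pθ0,F0 log{1 + α(f(θ∞, F∞)/f(θ0, F0) − 1)} = 0. This is possible only when (θ∞, F∞) = (θ0, F0), because (θ, F) ↦ P[log{1 + α(f(θ, F)/f(θ0, F0) − 1)}] attains its maximum only at (θ0, F0). Hence we conclude that (θ̂N, F̂N) converges to (θ0, F0) in the sense of Kullback–Leibler divergence. Since the Kullback–Leibler divergence bounds the squared Hellinger distance from above, it follows by Lemma A5 of [17] that $d((\hat\theta_N,\hat\Lambda_N),(\theta_0,\Lambda_0)) = o_{P^*}(1)$.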
4.2.2. Rate of convergence
We prove that the rate of convergence of the WLE is N^{1/3} by applying the rate theorem (Theorem 5.2) of Section 5. Since we proved the consistency of (θ̂N, Λ̂N) to (θ0, Λ0) on S[Y], under Condition 4.6 we can restrict the parameter space of Λ to H_M ≡ {Λ ∈ H : M^{−1} ≤ Λ ≤ M on S[Y]}, where M is a positive constant such that M^{−1} ≤ Λ0 ≤ M on S[Y]. Define ℳ ≡ {ℓ(θ, Λ) : θ ∈ Θ, Λ ∈ H_M}.
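Theorem 4.4 (Rate of convergence). Under Conditions 4.4–4.7,

$$N^{1/3}\,d\big((\hat\theta_N,\hat\Lambda_N),(\theta_0,\Lambda_0)\big) = O_{P^*}(1).$$

The same holds if the WLE is replaced by any of the WLEs with adjusted weights, assuming Conditions 3.1 and 3.2.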
Proof. Since the rate of convergence is easier to verify for the WLE than for the other four estimators, we only prove the theorem for the WLE with modified calibration. The cases of the other WLEs with adjusted weights are similar.
We proceed by verifying the conditions in Theorem 5.2. Bound (5.4) follows by Lemma 5.2 in Section 5 and Lemma A5 of [17]. For bound (5.5), we follow the proof of (5.3) in [10]. Since α̂N is consistent, we can choose a small neighborhood 𝒜_{mc,0} of the zero vector such that G_mc(z; α) lies in a small interval of strictly positive numbers containing 1. Thus, multiplying the log likelihood by the uniformly bounded quantity G_mc(z; α) requires only a slight modification of Huang’s proof of his Lemma 3.1 to obtain sup_Q log N_{[ ]}(ε, Gℳ, L2(Q)) ≲ ε^{−1} for ε small enough, where the supremum is taken over all discrete probability measures and Gℳ = {G_mc(·; α)ℓ(θ, Λ) : α ∈ 𝒜_{mc,0}, ℓ(θ, Λ) ∈ ℳ}. Let Gℳ_δ = {m(θ, Λ, α) − m(θ0, Λ0, α) : m(θ, Λ, α) ∈ Gℳ, d((θ, Λ), (θ0, Λ0)) ≤ δ}. It follows by Lemma 3.2.2 of [32] that

$$E^*\|\mathbb G_N\|_{G\mathcal M_\delta} \lesssim \phi_N(\delta) \equiv \delta^{1/2}\Big(1 + \frac{\delta^{1/2}}{\delta^2N^{1/2}}\Big).$$

Apply Theorem 5.2 to conclude r_N = N^{1/3}.
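The arithmetic behind r_N = N^{1/3} can be written out. This sketch assumes that the uniform entropy bound ≲ ε^{−1} feeds a bracketing-type maximal inequality of the form φ_N(δ) = J(δ){1 + J(δ)/(δ²N^{1/2})}. It only checks the two requirements of Theorem 5.2: φ_N(δ)/δ^α is decreasing for some α < 2, and r_N²φ_N(1/r_N) ≲ N^{1/2}.

```latex
J(\delta) \lesssim \int_0^\delta \sqrt{1 + \varepsilon^{-1}}\,d\varepsilon \le \delta + 2\delta^{1/2} \lesssim \delta^{1/2}
\quad (\delta \le 1),
\qquad
\phi_N(\delta) = \delta^{1/2}\Big(1 + \frac{\delta^{1/2}}{\delta^{2}N^{1/2}}\Big) = \delta^{1/2} + \delta^{-1}N^{-1/2}.

% With alpha = 1 < 2: phi_N(delta)/delta = delta^{-1/2} + delta^{-2} N^{-1/2} is decreasing in delta.

r_N^2\,\phi_N(1/r_N) = r_N^{3/2} + r_N^{3}N^{-1/2},
\qquad
r_N = N^{1/3} \;\Longrightarrow\; r_N^2\,\phi_N(1/r_N) = N^{1/2} + N^{1/2} = 2N^{1/2} \lesssim N^{1/2}.
```

The δ^{−1}N^{−1/2} term is what pins the rate. Any r_N growing faster than N^{1/3} makes r_N³N^{−1/2} dominate N^{1/2}.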
4.2.3. Asymptotic normality of the estimators
We apply Theorem 3.2 to derive the asymptotic distributions of the WLEs.
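Theorem 4.5 (Asymptotic normality). Under Conditions 3.1, 3.2, 4.4–4.7,

$$\sqrt N(\hat\theta_N - \theta_0) \rightsquigarrow N_p\big(0,\ \tilde I_{\theta_0,\Lambda_0}^{-1} + \Sigma\big), \qquad \sqrt N(\hat\theta_{N,\#} - \theta_0) \rightsquigarrow N_p\big(0,\ \tilde I_{\theta_0,\Lambda_0}^{-1} + \Sigma_{\#}\big),$$

where # ∈ {e, c, mc, cc}, $\tilde I_{\theta_0,\Lambda_0}^{-1}\tilde\ell_{\theta_0,\Lambda_0}$ is the efficient influence function for complete data, and Σ and Σ# are given in Theorem 3.2.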
Proof. We proceed by verifying the conditions of Theorem 3.2 for the WLE with modified calibration. The other four cases are similar.
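Condition 3.6 is satisfied with β = 1/3 by Theorems 4.3 and 4.4. Conditions 3.7–3.9 are verified by [10] with

$$h^*(y) = \Lambda_0(y)\,\frac{E[Xe^{2\theta_0^T X}O(Y\mid X)\mid Y = y]}{E[e^{2\theta_0^T X}O(Y\mid X)\mid Y = y]}.$$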
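Since $\mathbb P_N^{\pi,mc}\dot\ell_{\hat\theta_{N,mc},\hat\Lambda_{N,mc}} = 0$ by Lemma A.5, it remains to show that $\mathbb P_N^{\pi,mc}B_{\hat\theta_{N,mc},\hat\Lambda_{N,mc}}[h^*] = o_{P^*}(N^{-1/2})$.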
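Let $g_0 \equiv h^*\circ\Lambda_0^{-1}$ be the composition of h* and the inverse of Λ0. Note that Λ0 is a strictly increasing continuous function by our assumption. Since g0(Λ̂N,mc(y)) is a right continuous function and has exactly the same jump points as Λ̂N,mc(y), Lemma A.5 gives $\mathbb P_N^{\pi,mc}B_{\hat\theta_{N,mc},\hat\Lambda_{N,mc}}[g_0(\hat\Lambda_{N,mc})] = 0$. By Conditions 4.5–4.7, h* has bounded derivative. This and the assumption that Λ0 has strictly positive derivative by Condition 4.7 imply that g0 has bounded derivative, too. So, noting that h* = g0 ∘ Λ0, we have

$$\mathbb P_N^{\pi,mc}B_{\hat\theta_{N,mc},\hat\Lambda_{N,mc}}[h^*] = (\mathbb P_N^{\pi,mc} - P_0)\psi(\cdot;\hat\theta_{N,mc},\hat\Lambda_{N,mc}) + P_0\psi(\cdot;\hat\theta_{N,mc},\hat\Lambda_{N,mc}),$$

with ψ as defined in the next paragraph.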
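Huang [10] showed that the second term in the display is $o_{P^*}(N^{-1/2})$. We show that the first term in the display is also $o_{P^*}(N^{-1/2})$. Let C > 0 be an arbitrary constant. For a fixed constant η > 0, define 𝒟(η) ≡ {ψ(y, δ, x; θ, Λ) : d((θ, Λ), (θ0, Λ0)) ≤ η, Λ ∈ H_M}, where $\psi(y,\delta,x;\theta,\Lambda) \equiv \{g_0\circ\Lambda_0(y) - g_0(\Lambda(y))\}\,e^{\theta^T x}Q(y,\delta,x;\theta,\Lambda)$. Because Huang [10] showed that 𝒟(η) is Donsker for every η > 0 and that $\|\mathbb G_N\|_{\mathcal D(CN^{-1/3})} = o_{P^*}(1)$, it follows by Lemma 5.4 with ℱN replaced by 𝒟(CN^{−1/3}) that $\|\mathbb G_N^{\pi,mc}\|_{\mathcal D(CN^{-1/3})} = o_{P^*}(1)$, where $\mathbb G_N^{\pi,mc} = \sqrt N(\mathbb P_N^{\pi,mc} - P_0)$. By Theorem 4.4, ψ(·; θ̂N,mc, Λ̂N,mc) lies in 𝒟(CN^{−1/3}) with probability arbitrarily close to one for C large enough, so the first term is $o_{P^*}(N^{-1/2})$. This completes the proof.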
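Unlike the previous example, the efficient score $\tilde\ell_{\theta_0,\Lambda_0}$ depends on additional unknown functions, namely the conditional expectations given Y = y that appear in it. The method of variance estimation used in the previous example therefore does not apply to the present case. See the discussion in Section 6.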
5. General results for IPW empirical processes
The IPW empirical measure and IPW empirical process inherit important properties from the empirical measure and empirical process, respectively. We emphasize the similarity between empirical processes and IPW empirical processes.
5.1. Glivenko–Cantelli theorem
The next theorem states that the Glivenko–Cantelli property for complete data is preserved under two-phase sampling.
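Theorem 5.1. Suppose that ℱ is P0-Glivenko–Cantelli. Then

$$\|\mathbb P_N^{\pi} - P_0\|_{\mathcal F} \to_{P^*} 0, \tag{5.1}$$

where ∥·∥ℱ is the supremum norm over ℱ. This also holds if we replace $\mathbb P_N^{\pi}$ with $\mathbb P_N^{\pi,\#}$, # ∈ {e, c, mc, cc}, assuming Conditions 3.1 and 3.2.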
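Theorem 5.1 can be seen at work in a small simulation. The sketch below assumes stratified simple random sampling without replacement at phase II with Horvitz–Thompson weights N_j/n_j, so that ℙ_N^π puts mass w_i/N on each sampled observation. The helper names `two_phase_sample` and `ipw_ecdf`, the uniform covariate, and the noisy stratifier are illustrative choices, not the paper's. The class here is the indicator class {1(−∞, t]}, which is Glivenko–Cantelli, so the sup distance between the IPW empirical distribution function and F(t) = t should shrink as N grows.

```python
import numpy as np


def two_phase_sample(strata, fractions, rng):
    """Stratified sampling without replacement at phase II.

    strata    : phase I stratum label of each of the N subjects
    fractions : dict {stratum label: sampling fraction p_j}
    Returns indicators xi and IPW weights N_j / n_j (0 if not sampled).
    """
    strata = np.asarray(strata)
    xi = np.zeros(strata.size, dtype=bool)
    w = np.zeros(strata.size)
    for j, p_j in fractions.items():
        idx = np.flatnonzero(strata == j)
        if idx.size == 0:
            continue
        n_j = max(1, int(round(p_j * idx.size)))
        chosen = rng.choice(idx, size=n_j, replace=False)
        xi[chosen] = True
        w[chosen] = idx.size / n_j
    return xi, w


def ipw_ecdf(x, w, grid):
    """(1/N) sum_i w_i 1{X_i <= t} on a grid (x must be finite; w = 0 if unsampled)."""
    order = np.argsort(x)
    cum_w = np.concatenate(([0.0], np.cumsum(w[order])))
    k = np.searchsorted(x[order], grid, side="right")  # number of X_i <= t
    return cum_w[k] / x.size


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.linspace(0.0, 1.0, 201)
    for N in (500, 5_000, 50_000):
        x = rng.uniform(size=N)                                     # true F(t) = t
        v = (x + rng.normal(scale=0.2, size=N) > 0.5).astype(int)  # phase I strata
        xi, w = two_phase_sample(v, {0: 0.1, 1: 0.5}, rng)
        print(N, np.abs(ipw_ecdf(x, w, grid) - grid).max())
```

Because Σ_j n_j · N_j/n_j = N, the weights always sum to N, so the IPW empirical distribution function reaches exactly 1.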
5.2. Rate of convergence
The rate of convergence of an M-estimator for complete data is often established via maximal inequalities for empirical processes. Following the same line of reasoning, it would be natural to derive maximal inequalities for IPW empirical processes, though this may require some effort. Fortunately, the maximal inequalities for empirical processes (or slight modifications of them) suffice to establish the same rate of convergence under two-phase sampling.
Theorem 5.2. Let ℳ = {mθ : θ ∈ Θ} be the set of criterion functions and define ℳδ = {mθ − mθ0 : d(θ, θ0) < δ} for some fixed δ > 0 where d is a semimetric on the parameter space Θ.
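(1) Suppose that for every θ in a neighborhood of θ0,

$$P_0(m_\theta - m_{\theta_0}) \lesssim -d^2(\theta,\theta_0), \tag{5.2}$$

where a ≲ b means a ≤ Kb for some constant K ∈ (0, ∞). Assume that there exists a function φN such that δ ↦ φN(δ)/δ^α is decreasing for some α < 2 (not depending on N), and for every N,

$$E^*\|\mathbb G_N\|_{\mathcal M_\delta} \lesssim \phi_N(\delta), \tag{5.3}$$

where 𝔾N is the empirical process. If an estimator θ̂N satisfying $\mathbb P_N^{\pi}m_{\hat\theta_N} \ge \mathbb P_N^{\pi}m_{\theta_0} - O_{P^*}(r_N^{-2})$ converges in outer probability to θ0, then $r_N d(\hat\theta_N,\theta_0) = O_{P^*}(1)$ for every sequence r_N such that $r_N^2\phi_N(1/r_N) \le \sqrt N$ for every N.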
(2) Let # ∈ {e, c, mc, cc} be fixed. Suppose Condition 3.2 holds. Suppose also that for every θ ∈ Θ in a neighborhood of θ0,

$$P_0\,\tilde G_{\#}(\cdot;\alpha)(m_\theta - m_{\theta_0}) \lesssim -d^2(\theta,\theta_0), \tag{5.4}$$
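where G̃e = π0(V)/Ge or G̃# = G# with # ∈ {c, mc, cc}. Assume that

$$E^*\|\mathbb G_N\|_{\tilde G_{\#}\mathcal M_\delta} \lesssim \phi_N(\delta), \tag{5.5}$$

where G̃#ℳδ ≡ {G̃#(·; α)f : |α| ≤ δ, α ∈ 𝒜N, f ∈ ℳδ} for some 𝒜N ⊂ 𝒜#. Then an estimator θ̂N,# satisfying $\mathbb P_N^{\pi,\#}m_{\hat\theta_{N,\#}} \ge \mathbb P_N^{\pi,\#}m_{\theta_0} - O_{P^*}(r_N^{-2})$ has the same rate of convergence as θ̂N in part (1) if it is consistent.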
Remark 5.1. The key to establishing a general theorem for the rate of convergence is to make use of the boundedness of the weights in the IPW empirical process while also dealing with the dependence among the weights. In treating independent bootstrap weights in the weighted bootstrap, Ma and Kosorok ([15], Lemmas 1–3) require boundedness of the bootstrap weights, because the product of an unbounded weight and a bounded function is no longer bounded. Our theorem exploits the boundedness of the sampling indicators in the IPW empirical processes by applying a multiplier inequality for bounded weights (Lemma 5.1) to cover more general cases.
The following is a multiplier inequality for bounded exchangeable weights. Note that the sum of stochastic processes in the second term is divided by n^{1/2} rather than k^{1/2}.
Lemma 5.1. For i.i.d. stochastic processes Z1, …, Zn, every bounded, exchangeable random vector (ξ1, …, ξn) with each ξi ∈ [l, u] that is independent of Z1, …, Zn, and any 1 ≤ n0 ≤ n,
Bound (5.5) is not difficult to verify in the presence of bound (5.3) since G#(· ; α) is a bounded monotone function indexed by a finite-dimensional parameter. Bound (5.4) may be verified through the lemma below for some applications including the Cox model with interval censoring.
Lemma 5.2. Suppose Conditions 3.1 and 3.2 hold. Let mθ be the log likelihood log pθ where pθ is the density with dominating measure μ, and d is the Hellinger distance. Then the bound (5.4) holds.
5.3. Donsker theorem
The next theorem yields weak convergence of the IPW empirical processes under sampling without replacement.
Theorem 5.3. Suppose that ℱ with ∥P0∥ℱ < ∞ is P0-Donsker and Conditions 3.1 and 3.2 hold. Then
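$$\mathbb G_N^{\pi} \equiv \sqrt N(\mathbb P_N^{\pi} - P_0) \rightsquigarrow \mathbb G + \sum_{j=1}^J\sqrt{\nu_j}\sqrt{\frac{1-p_j}{p_j}}\,\mathbb G_j, \tag{5.6}$$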
(5.7) |
in ℓ∞(ℱ) where # ∈ {e, c, mc, cc}, the P0-Brownian bridge process, 𝔾, indexed by ℱ and the P0|j-Brownian bridge processes, 𝔾j, indexed by ℱ are all independent.
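The independence of 𝔾 and the 𝔾_j has a concrete consequence for a single f. As we read Theorem 5.3, the limit of √N(ℙ_N^π − P0)f under stratified sampling without replacement has variance Var_0(f) + Σ_j ν_j (1 − p_j)/p_j Var_{0|j}(f). Under Bernoulli sampling at phase II, the conditional variance Var_{0|j}(f) is replaced by the larger E_{0|j}(f²). The Monte Carlo sketch below checks both for f(x) = x in a two-stratum design. The stratum probabilities, means, and sampling fractions are illustrative assumptions, as are the helper names.

```python
import numpy as np

# Illustrative two-stratum design (all values are assumptions for the demo).
NU = {0: 0.7, 1: 0.3}    # nu_j = P(V in stratum j)
MEAN = {0: 0.0, 1: 2.0}  # E[X | stratum j]; Var[X | stratum j] = 1
P = {0: 0.2, 1: 0.6}     # phase II sampling fractions p_j


def one_draw(N, scheme, rng):
    """One realization of sqrt(N) (P_N^pi f - P_0 f) for f(x) = x."""
    v = (rng.uniform(size=N) < NU[1]).astype(int)        # phase I strata
    x = np.array([MEAN[0], MEAN[1]])[v] + rng.normal(size=N)
    w = np.zeros(N)
    for j in NU:
        idx = np.flatnonzero(v == j)
        if scheme == "srswor":                             # exactly n_j drawn, weight N_j / n_j
            n_j = max(1, int(round(P[j] * idx.size)))
            w[rng.choice(idx, size=n_j, replace=False)] = idx.size / n_j
        else:                                              # independent Bernoulli(p_j), weight 1 / p_j
            w[idx[rng.uniform(size=idx.size) < P[j]]] = 1.0 / P[j]
    mean0 = sum(NU[j] * MEAN[j] for j in NU)
    return np.sqrt(N) * (np.mean(w * x) - mean0)


def limit_variances():
    """Limiting variances under SRSWOR and under Bernoulli sampling at phase II."""
    mean0 = sum(NU[j] * MEAN[j] for j in NU)
    var0 = sum(NU[j] * (1.0 + MEAN[j] ** 2) for j in NU) - mean0**2
    extra_srs = sum(NU[j] * (1 - P[j]) / P[j] * 1.0 for j in NU)                   # Var_{0|j}(X) = 1
    extra_bern = sum(NU[j] * (1 - P[j]) / P[j] * (1.0 + MEAN[j] ** 2) for j in NU)  # E_{0|j}(X^2)
    return var0 + extra_srs, var0 + extra_bern
```

With these values the two limits differ by Σ_j ν_j (1 − p_j)/p_j (E_{0|j}X)², which is the price of not fixing the phase II sample sizes.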
Remark 5.2. The integrability hypothesis ∥P0∥ℱ < ∞ is only required for the IPW empirical processes with adjusted weights.
For a Donsker set ℱ, it follows by Theorem 5.3 and Lemma 2.3.11 of [32] that asymptotic equicontinuity in probability and in mean holds for the metric that depends on the limit process. In applications, it is of interest to have these results for the original metric ρP0(f, g) = σP0(f − g), where σP0 denotes the standard deviation under P0.
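Theorem 5.4. Let ℱ be Donsker and define ℱδ = {f − g : f, g ∈ ℱ, ρP0(f, g) < δ} for some fixed δ > 0. Then, for every sequence δN ↓ 0,

$$E^*\|\mathbb G_N^{\pi}\|_{\mathcal F_{\delta_N}} \to 0,$$

and consequently, $\|\mathbb G_N^{\pi}\|_{\mathcal F_{\delta_N}} = o_{P^*}(1)$. Moreover, $\|\mathbb G_N^{\pi,\#}\|_{\mathcal F_{\delta_N}} = o_{P^*}(1)$ for # ∈ {e, c, mc, cc}, assuming Conditions 3.1 and 3.2.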
We end this section with two important lemmas. The first lemma is an extension of Lemma 3.3.5 of [32] and will be used in our proof of Theorem 3.1 to verify asymptotic equicontinuity.
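Lemma 5.3. Suppose ℱ = {ψθ,h − ψθ0,h : ∥θ − θ0∥ < δ, h ∈ ℋ} is P0-Donsker for some δ > 0 and that sup_{h∈ℋ} P0(ψθ,h − ψθ0,h)² → 0 as θ → θ0. If θ̂N converges in outer probability to θ0, then

$$\big\|\mathbb G_N^{\pi}(\psi_{\hat\theta_N,h} - \psi_{\theta_0,h})\big\|_{\mathcal H} = o_{P^*}(1).$$

This also holds if we replace $\mathbb G_N^{\pi}$ with $\mathbb G_N^{\pi,\#}$, # ∈ {e, c, mc, cc}, assuming Conditions 3.1 and 3.2 hold and ∥P0∥ℱ < ∞.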
The second lemma is used to verify asymptotic equicontinuity in the proof of Theorem 3.2, the first part for the IPW empirical process and the second part for the other four IPW empirical processes with adjusted weights.
Lemma 5.4. Let ℱN be a decreasing sequence of classes of functions such that ∥𝔾N∥ℱN = o_{P*}(1). Assume that there exists an integrable envelope for ℱ_{N0} for some N0. Then E∥𝔾N∥ℱN → 0 as N → ∞. As a consequence, $\|\mathbb G_N^{\pi}\|_{\mathcal F_N} = o_{P^*}(1)$.
Suppose, moreover, that ℱN is P0-Glivenko–Cantelli with ∥P0∥_{ℱ_{N1}} < ∞ for some N1, and that every f = fN ∈ ℱN converges to zero either pointwise or in L1(P0) as N → ∞. Then $\|\mathbb G_N^{\pi,\#}\|_{\mathcal F_N} = o_{P^*}(1)$ for # ∈ {e, c, mc, cc}, assuming Conditions 3.1 and 3.2.
6. Discussion
We developed asymptotic theory for weighted likelihood estimation under two-phase sampling, introduced and studied a new calibration method, centered calibration, and compared several WLE estimation methods involving adjusted weights. The methods of proof and general results for the IPW empirical process are applicable to other estimation procedures. For example, the weighted Kaplan–Meier estimator can be shown to be asymptotically Gaussian via our Donsker theorem (Theorem 5.3) together with the functional delta method. A particularly interesting application is to study asymptotic properties of estimators that are known to be efficient under Bernoulli sampling (e.g., the estimator of [19]). Whether or not these estimators are “efficient” under our sampling scheme is an open problem; see [16] for a definition of efficiency with non-i.i.d. data.
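For concreteness, here is one common form of a weighted Kaplan–Meier estimator. Each subject contributes its IPW weight (for example ξ_i/π_i, zero if not sampled) both to the event count and to the risk set, and S(t) is the product of 1 − (weighted events)/(weighted number at risk) over event times ≤ t. That this is the exact form the authors have in mind is our assumption, and the function name is hypothetical. Since the estimator is a product-integral of a ratio of IPW empirical quantities, the Donsker-plus-delta-method argument described above applies to it.

```python
import numpy as np


def weighted_kaplan_meier(y, delta, w):
    """IPW-weighted Kaplan-Meier estimator (illustrative sketch).

    Each subject contributes weight w_i to both the event counts and the
    risk sets. Returns the distinct event times and S_hat at each of them.
    """
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta)
    w = np.asarray(w, dtype=float)
    times = np.unique(y[(delta == 1) & (w > 0)])
    surv = np.empty(times.size)
    s = 1.0
    for k, t in enumerate(times):
        at_risk = w[y >= t].sum()                  # weighted number at risk at t
        events = w[(y == t) & (delta == 1)].sum()  # weighted number of failures at t
        s *= 1.0 - events / at_risk
        surv[k] = s
    return times, surv
```

With all weights equal to one this reduces to the ordinary Kaplan–Meier estimator. For example, times (1, 2, 3, 4) with event indicators (1, 0, 1, 1) give survival 0.75, 0.375, and 0 at the event times 1, 3, and 4.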
There are several other open problems. Variance estimation under two-phase sampling has so far been restricted to the case where the asymptotic variance is a known function of the model parameters, as discussed in Section 4, while several methods are available for complete data in the general case (e.g., [18]). In [24] the first author has proposed and studied nonparametric bootstrap variance estimation methods which remain valid even under model misspecification; these results will appear elsewhere. Another direction of research is to study (local and global) model misspecification under two-phase sampling where missingness is by design. An interesting open problem beyond our sampling scheme is to study other complex survey designs. Stratified sampling without replacement is sufficiently simple for the existing bootstrap empirical process theory to apply. Other complex designs may provide interesting theoretical challenges, perhaps in connection with extensions of bootstrap empirical process theory.
Supplementary Material
Acknowledgements
We owe thanks to Kwun Chuen Gary Chan for suggesting the modified calibration method introduced in Section 2.1.3. We also thank Norman Breslow for many helpful conversations concerning two-phase sampling, and two referees for their constructive comments and suggestions.
Footnotes
Supported by NIH/NIAID R01 AI089341.
Supported in part by NSF Grant DMS-11-04832, NIAID Grant 2R01 AI291968-04 and the Alexander von Humboldt Foundation.
Supplementary material for “Weighted likelihood estimation under two-phase sampling” (DOI: 10.1214/12-AOS1073SUPP). Due to space constraints, the proofs and technical details have been given in the supplementary document [25]. References here beginning with “A.” refer to [25].
REFERENCES
- 1. Binder DA. Fitting Cox’s proportional hazards models from survey data. Biometrika. 1992;79:139–147. MR1158522.
- 2. Breslow NE, Lumley T, Ballantyne C, Chambless L, Kulich M. Improved Horvitz–Thompson estimation of model parameters from two-phase stratified samples: Applications in epidemiology. Stat. Biosci. 2009;1:32–49. doi: 10.1007/s12561-009-9001-6.
- 3. Breslow NE, Lumley T, Ballantyne C, Chambless L, Kulich M. Using the whole cohort in the analysis of case-cohort data. Am. J. Epidemiol. 2009;169:1398–1405. doi: 10.1093/aje/kwp055.
- 4. Breslow NE, Wellner JA. Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression. Scand. J. Stat. 2007;34:86–102. doi: 10.1111/j.1467-9469.2007.00574.x. MR2325244.
- 5. Breslow NE, Wellner JA. A Z-theorem with estimated nuisance parameters and correction note for: “Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression” [Scand. J. Statist. 34 (2007), no. 1, 86–102; MR2325244]. Scand. J. Stat. 2008;35:186–192. doi: 10.1111/j.1467-9469.2007.00574.x. MR2391566.
- 6. Chan KCG. Uniform improvement of empirical likelihood for missing response problem. Electron. J. Stat. 2012;6:289–302.
- 7. Cox DR. Regression models and life-tables (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 1972;34:187–220. MR0341758.
- 8. Deville J-C, Särndal C-E. Calibration estimators in survey sampling. J. Amer. Statist. Assoc. 1992;87:376–382. MR1173804.
- 9. Horvitz DG, Thompson DJ. A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc. 1952;47:663–685. MR0053460.
- 10. Huang J. Efficient estimation for the proportional hazards model with interval censoring. Ann. Statist. 1996;24:540–568. MR1394975.
- 11. Li Z, Nan B. Relative risk regression for current status data in case-cohort studies. Canad. J. Statist. 2011;39:557–577. MR2860827.
- 12. Lin DY. On fitting Cox’s proportional hazards models to survey data. Biometrika. 2000;87:37–47. MR1766826.
- 13. Lumley T. Complex Surveys: A Guide to Analysis Using R. Hoboken, NJ: Wiley; 2010.
- 14. Lumley T, Shaw PA, Dai JY. Connections between survey calibration estimators and semiparametric models for incomplete data. Int. Stat. Rev. 2011;79:200–232. doi: 10.1111/j.1751-5823.2011.00138.x.
- 15. Ma S, Kosorok MR. Robust semiparametric M-estimation and the weighted bootstrap. J. Multivariate Anal. 2005;96:190–217. MR2202406.
- 16. McNeney B, Wellner JA. Application of convolution theorems in semiparametric models with non-i.i.d. data. J. Statist. Plann. Inference. 2000;91:441–480. Prague Workshop on Perspectives in Modern Statistical Inference: Parametrics, Semi-parametrics, Non-parametrics (1998). MR1814795.
- 17. Murphy SA, van der Vaart AW. Semiparametric likelihood ratio inference. Ann. Statist. 1997;25:1471–1509. MR1463562.
- 18. Murphy SA, van der Vaart AW. Observed information in semi-parametric models. Bernoulli. 1999;5:381–412. MR1693616.
- 19. Nan B. Efficient estimation for case-cohort studies. Canad. J. Statist. 2004;32:403–419. MR2125853.
- 20. Neyman J. Contribution to the theory of sampling human populations. J. Amer. Statist. Assoc. 1938;33:101–116.
- 21. Præstgaard J, Wellner JA. Exchangeably weighted bootstraps of the general empirical process. Ann. Probab. 1993;21:2053–2086. MR1245301.
- 22. Prentice RL. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika. 1986;73:1–11.
- 23. Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. J. Amer. Statist. Assoc. 1994;89:846–866. MR1294730.
- 24. Saegusa T. Weighted likelihood estimation under two-phase sampling. Ph.D. thesis. Seattle, WA: Univ. Washington; 2012.
- 25. Saegusa T, Wellner JA. Supplement to “Weighted likelihood estimation under two-phase sampling.” 2012. doi: 10.1214/12-AOS1073.
- 26. Saegusa T, Wellner JA. Weighted likelihood estimation under two-phase sampling. Technical Report 592. Seattle, WA: Dept. Statistics, Univ. Washington; 2012. Available at arXiv:1112.4951.
- 27. Self SG, Prentice RL. Asymptotic distribution theory and efficiency results for case-cohort studies. Ann. Statist. 1988;16:64–81. MR0924857.
- 28. Tan Z. Efficient restricted estimators for conditional mean models with missing data. Biometrika. 2011;98:663–684. MR2836413.
- 29. van der Vaart A. Semiparametric statistics. In: Lectures on Probability Theory and Statistics (Saint-Flour, 1999). Lecture Notes in Math. Vol. 1781. Berlin: Springer; 2002. pp. 331–457. MR1915446.
- 30. van der Vaart A, Wellner JA. Preservation theorems for Glivenko–Cantelli and uniform Glivenko–Cantelli classes. In: High Dimensional Probability. II (Seattle, WA, 1999). Progress in Probability. Vol. 47. Boston, MA: Birkhäuser; 2000. pp. 115–133. MR1857319.
- 31. van der Vaart AW. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Vol. 3. Cambridge: Cambridge Univ. Press; 1998. MR1652247.
- 32. van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes: With Applications to Statistics. New York: Springer; 1996. MR1385671.
- 33. White JE. A two stage design for the study of the relationship between a rare exposure and a rare disease. Am. J. Epidemiol. 1982;115:119–128. doi: 10.1093/oxfordjournals.aje.a113266.
- 34. Zheng H, Little RJA. Penalized spline nonparametric mixed models for inference about a finite population mean from two-stage samples. Survey Methodology. 2004;30:209–218.