Author manuscript; available in PMC: 2020 Oct 7.
Published in final edited form as: J Am Stat Assoc. 2017 Apr 25;112(519):1221–1235. doi: 10.1080/01621459.2016.1205500

Estimation in the semiparametric accelerated failure time model with missing covariates: improving efficiency through augmentation

JON ARNI STEINGRIMSSON, ROBERT L STRAWDERMAN
PMCID: PMC7540935  NIHMSID: NIHMS1502777  PMID: 33033419

Abstract

This paper considers linear regression with missing covariates and a right censored outcome. We first consider a general two-phase outcome sampling design, where full covariate information is only ascertained for subjects in phase two and sampling occurs under an independent Bernoulli sampling scheme with known subject-specific sampling probabilities that depend on phase one information (e.g., survival time, failure status and covariates). The semiparametric information bound is derived for estimating the regression parameter in this setting. We also introduce a more practical class of augmented estimators that is shown to improve asymptotic efficiency over simple but inefficient inverse probability of sampling weighted estimators. Estimation for known sampling weights and extensions to the case of estimated sampling weights are both considered. The allowance for estimated sampling weights permits covariates to be missing at random according to a monotone but unknown mechanism. The asymptotic properties of the augmented estimators are derived and simulation results demonstrate substantial efficiency improvements over simpler inverse probability of sampling weighted estimators in the indicated settings. With suitable modification, the proposed methodology can also be used to improve augmented estimators previously used for missing covariates in a Cox regression model.

1. Introduction

The semiparametric accelerated failure time (AFT) model is an interesting and useful alternative to the Cox regression model (Cox, 1972). The standard form of the AFT model relates a failure time T to a fully observed p-dimensional covariate vector W through the log-linear model

$$\log(T) = W'\beta_0 + \epsilon, \qquad (1)$$

where ϵ has an unspecified absolutely continuous distribution function. Estimation of the regression parameter β0 in model (1) from a cohort sample consisting of n independent and identically distributed (i.i.d.) observations is of primary interest. Numerous authors, beginning with Miller (1976) and Buckley and James (1979), have proposed estimation procedures for this model in settings where covariates do not vary with time. Tsiatis (1990) proposed using a weighted class of linear-rank estimating equations and rigorously developed the asymptotic properties of the corresponding regression parameter estimate; see also Ying (1993). Ritov (1990) studied the class of generalized Buckley-James estimators and established important connections between the two approaches. Fast, computationally efficient procedures for parameter and variance estimation are studied in Johnson and Strawderman (2009), and statistically efficient estimation of β0 is considered in Zeng and Lin (2007), Ding and Nan (2011) and Lin and Chen (2013).

In this article, estimation and inference for β0 are considered when some of the components of W may be missing at random (MAR). In particular, we assume the covariate W can be partitioned as $(W_{obs}, W_{miss})$, where $W_{obs}$ is always available and $W_{miss}$ may be missing on some fraction of subjects. We consider estimation and inference for β0 when $W_{miss}$ is missing by design and also when it is missing according to an unknown probabilistic mechanism that must be modeled and estimated from the observed data. These problems are considered in Nan et al. (2009, Ex. (v), p. 2354), who extended the linear rank estimating equation of Tsiatis (1990) by inversely weighting the contribution of each subject with complete covariate information according to the probability of observing $W_{miss}$ for that subject. The resulting inverse probability weighted (IPW) estimating equation leads to a simple but inefficient estimator for β0. The principal goal of this paper is to develop estimators for β0 with substantially greater efficiency.

Two important examples of problems with right-censored outcomes where covariates are missing by design are the classical case-cohort design (Prentice, 1986; Self and Prentice, 1988) and the exposure-stratified case-cohort design (Borgan et al., 2000). Such designs, originally introduced in connection with estimation of the regression parameter in a Cox regression model for the underlying failure times, are intended to estimate the effects of risk factors on survival that cannot feasibly be measured on the entire cohort, e.g., for economic reasons. The typical form of this design entails collecting full covariate information on all cases (i.e., failures) and also on a subcohort sampled from the full cohort without regard to failure status. Inference for β0 under model (1) for a case-cohort design is considered in Nan et al. (2006), Nan et al. (2009) and Kong and Cai (2009). In each case, the authors consider IPW-based extensions of the linear rank estimating equation introduced in Tsiatis (1990). The asymptotic results for solutions to the IPW estimating equations considered in Nan et al. (2009, Ex. (v), p. 2354) also apply to general two-phase sampling designs in which phase one constitutes the assembly of the cohort (i.e., including baseline covariates available on all cohort members) and phase two involves the collection of new covariate information on a sample of subjects selected according to some known, possibly outcome-dependent, independent Bernoulli sampling design in which both failures and non-failures may be sampled. Hereafter, we refer to such independent Bernoulli sampling designs as general two-phase sampling designs.

For estimating β0 in model (1) with missing covariate data, the IPW estimating equation of Nan et al. (2009, Ex. (v), p. 2354) represents the current state of the art. In contrast, the statistical literature contains numerous methodological developments focused on improving the estimation efficiency of analogous estimators when failure times follow a Cox regression model. Much of this work relates directly or indirectly to the semiparametric efficiency theory for missing data problems developed in Robins et al. (1994) and, in particular, the use of augmented versions of IPW estimating equations (e.g., Tsiatis, 2006, Ch. 9 and 10). For example, for a general two-phase sampling design, Nan et al. (2004) obtained the semiparametric efficiency bound for estimating the Cox regression parameter, and Nan (2004) obtained a computationally feasible version of the efficient augmented estimating equation in the case where all covariates are discrete. The resulting estimator requires the number of “strata” formed by the unique covariate patterns to be small relative to the sample size, limiting its applicability. Kulich and Lin (2004) considered methods for improving efficiency for estimating the Cox model regression parameter when covariates are missing under a general two-phase sampling design. The motivation for their estimating equation stems from attempts to improve the efficiency of the observed data estimator of the time-dependent centering term that appears in the usual partial likelihood score equation. They further show that the resulting estimating equation (i) is asymptotically related to a corresponding restricted class of augmented IPW estimating equations that uses the Cox partial likelihood score as the full data estimating equation; and (ii) can be chosen such that it is asymptotically equivalent to the most efficient augmented estimating equation within this restricted class.
In more general problems where monotone missingness in the covariate vector does not necessarily occur by design, Wang and Chen (2001) propose to augment an IPW estimating equation constructed from the partial likelihood score equation. Closely related subsequent papers that use different methods of augmenting the same IPW estimating equation include Qi et al. (2005), Luo et al. (2009), Xu et al. (2009), and Qi et al. (2010). Whether or not covariate information is missing by design, a feature common to each of these works is that the most efficient choice of augmentation term depends on the conditional distribution of $W_{miss}$ given the observed data. These works can largely be differentiated by the techniques used to model or otherwise estimate the various quantities in the augmentation term that depend on this unknown conditional distribution.

The IPW estimating function of Nan et al. (2009, Ex. (v), p. 2354) is general enough to handle estimation in monotone missing covariate problems whether or not missingness occurs by design, giving a consistent estimator for β0 provided that the missingness mechanism is either known or consistently estimated. This paper introduces an appropriate class of augmented IPW (AIPW) estimating equations designed to improve estimation efficiency. The remainder of this paper is organized as follows. Section 2 introduces required notation as well as some important background information. In Section 3, the relevant class of AIPW estimating equations is introduced and the asymptotic properties of the corresponding estimators are obtained. Sections 3.1–3.3 consider the setting in which covariates are missing according to a general two-phase design with known sampling weights; the corresponding semiparametric information bound for estimating β0 in this same setting is also obtained. Section 3.4 extends the theory from Section 3 to the case of estimated sampling weights. These results demonstrate asymptotic efficiency improvements compared to assuming sampling weights are known and, importantly, permit treatment of problems involving monotone missing covariates where the missingness mechanism is unknown and must be modeled using the observed data. Section 3.5 discusses double robustness and the associated challenges in devising estimators that share this property. Section 4 provides methods for calculating the AIPW estimators in practice, including a novel approach to estimating the augmentation term. The methods introduced here generalize substantially, providing a new method of handling missing covariate problems in other censored data problems (e.g., Cox regression). In Section 5, we provide simulation results showing that the proposed AIPW estimators reliably estimate the regression parameters with strong efficiency gains in situations of practical interest.
In Section 6, we re-analyze the Wilms tumor data (e.g., Kulich and Lin, 2004) from the perspective of the AFT model. Section 7 closes the paper with a few remarks on extensions. There, we also demonstrate how the approach to augmentation extends to, and improves upon, methods previously developed for missing covariates under a Cox regression model in Wang and Chen (2001), Qi et al. (2005), Luo et al. (2009), Xu et al. (2009), and Qi et al. (2010). Further details on computational algorithms, simulation and data analysis results, and detailed proofs of the asymptotic properties of the proposed AIPW estimators with both known and estimated weights are provided in a Supplementary Web Appendix.

2. Notation and Background

In model (1), and in the absence of missing covariate data, it is of primary interest to estimate β0 from a cohort sample consisting of n i.i.d. observations $(\tilde T_i = \min(T_i, C_i), \Delta_i = I(T_i \le C_i), W_i)$, i = 1, …, n, where $C_i$ denotes a censoring time for subject i. The errors ϵ1, …, ϵn are assumed to be i.i.d., and ϵi is assumed to be independent of $(C_i, W_i)$ for every i.

For later use, define the (unobserved) residuals $\epsilon_i(\beta) = \log T_i - W_i'\beta$, i = 1, …, n, and let F(·|β), λ(·|β) and $\Lambda(t|\beta) = \int_{-\infty}^{t}\lambda(s|\beta)\,ds$ respectively denote the marginal (i.e., not conditional on $W_1$) cumulative distribution, hazard, and cumulative hazard functions of ϵ1(β). Importantly, F(·|β), λ(·|β) and Λ(·|β) depend on both the underlying true value β0 and β. For β = β0, let $F(t|\beta_0) = \int_{-\infty}^{t} f(u)\,du$, $\lambda(t|\beta_0) = \lambda(t)$ and $\Lambda(t|\beta_0) = \Lambda(t)$ for every t ∈ ℝ.

For each β, define the observed residuals as $\varepsilon_i(\beta) = \log\tilde T_i - W_i'\beta$, i = 1, …, n; observe that $\varepsilon_i(\beta) = \min\{\epsilon_i(\beta), \log C_i - W_i'\beta\}$ for every i. For i = 1, …, n and u ∈ ℝ, define the counting process for the residuals as $N_i(u|\beta) = I(\varepsilon_i(\beta) \le u, \Delta_i = 1)$, the at-risk process as $Y_i(u|\beta) = I(\varepsilon_i(\beta) \ge u)$, and the recentered counting process $M_i(u|\beta) = N_i(u|\beta) - \int_{-\infty}^{u} Y_i(s|\beta)\,d\Lambda(s|\beta)$. Define $M_i(u) = M_i(u|\beta_0)$, $N_i(u) = N_i(u|\beta_0)$ and $Y_i(u) = Y_i(u|\beta_0)$; the process $M_i(\cdot|\beta_0)$ is a martingale under the usual (full covariate) filtration (Tsiatis, 1990).
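A minimal Python/NumPy sketch of the residual processes just defined, assuming fully observed covariates stored as arrays; the function name `residual_processes` and its arguments are illustrative and not from the paper. It evaluates $\varepsilon_i(\beta)$, $N_i(u|\beta)$ and $Y_i(u|\beta)$ at a single residual time u, and the usage simulates model (1) with normal errors and uniform log censoring times (both arbitrary choices).

```python
import numpy as np


def residual_processes(log_t_obs, delta, W, beta, u):
    """Observed residuals and the residual counting / at-risk processes at u.

    log_t_obs : (n,) array of log(T~_i), where T~_i = min(T_i, C_i)
    delta     : (n,) array of failure indicators Delta_i = I(T_i <= C_i)
    W         : (n, p) covariate matrix (fully observed in this sketch)
    beta      : (p,) candidate regression parameter
    u         : scalar residual time
    """
    eps = log_t_obs - W @ beta                      # eps_i(beta) = log T~_i - W_i' beta
    N = ((eps <= u) & (delta == 1)).astype(float)   # N_i(u|beta) = I(eps_i <= u, Delta_i = 1)
    Y = (eps >= u).astype(float)                    # Y_i(u|beta) = I(eps_i >= u)
    return eps, N, Y


# Usage: simulate from model (1) with independent censoring.
rng = np.random.default_rng(0)
n, beta0 = 200, np.array([1.0, -0.5])
W = rng.normal(size=(n, 2))
err = rng.normal(size=n)                            # unobserved errors epsilon_i
log_c = rng.uniform(-1.0, 3.0, size=n)              # log censoring times
log_t_obs = np.minimum(W @ beta0 + err, log_c)
delta = (W @ beta0 + err <= log_c).astype(int)
eps, N, Y = residual_processes(log_t_obs, delta, W, beta0, u=0.0)
```

At β = β0 the observed residual equals the minimum of the error and the log censoring residual, matching the identity stated above.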

The setting where $W_1, \dots, W_n$ are fully observed will be referred to as the “full covariate” setting. Let $S^{(j)}(u|\beta) = n^{-1}\sum_{i=1}^{n} W_i^{\otimes j} Y_i(u|\beta)$ for j = 0, 1, 2, where $v^{\otimes 0} = 1$, $v^{\otimes 1} = v$ and $v^{\otimes 2} = vv'$ for any vector v, and define $\hat\zeta(t,\beta) = S^{(1)}(t|\beta)/S^{(0)}(t|\beta)$. In the full covariate setting, Tsiatis (1990) introduced the linear-rank estimating function

$$\hat\Psi_F(\beta,\hat\rho) = \frac{1}{n}\sum_{i=1}^{n}\int \hat\rho(u,\beta)\{W_i - \hat\zeta(u,\beta)\}\,dN_i(u|\beta) \qquad (2)$$

for β0. Here, $\hat\zeta(\cdot,\cdot)$ and (in many cases) the weight function $\hat\rho(\cdot,\cdot)$ are empirical estimates of unknown functions based on the entire dataset. Tsiatis (1990) showed that efficient estimation of β0 is achieved if $\hat\rho(\cdot,\cdot)$ in (2) is set equal to $Q(t) = \dot\lambda(t)/\lambda(t)$, where $\dot\lambda(\cdot)$ is the first derivative of the unknown hazard function λ(·) of ϵ1 = ϵ1(β0). The difficulty in estimating Q(·), and hence in achieving efficiency, is that Q(·) involves the derivative of the hazard, whereas the usual nonparametric estimator of Λ(t) is a step function and therefore highly non-smooth. Selecting a particular fixed weight instead leads to locally efficient estimators. Two popular choices for $\hat\rho(\cdot,\cdot)$ in (2) are the Log-Rank weight (i.e., $\hat\rho(t,\beta) = 1$) and the Gehan weight (i.e., $\hat\rho(t,\beta) = S^{(0)}(t|\beta)$), both defined for all $(t,\beta) \in \mathbb{R}^{p+1}$. The Log-Rank weight is efficient when ϵ1 has an extreme value distribution. The Gehan weight is not known to be efficient in any realistic situation, but has the important advantage of producing a monotone estimating equation, a useful feature not shared by the Log-Rank weight and most other weight functions (Fygenson and Ritov, 1994).
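Because $dN_i(u|\beta)$ places unit mass at $u = \varepsilon_i(\beta)$ when $\Delta_i = 1$, the integral in (2) reduces to a sum over observed failures. The sketch below, with the hypothetical name `linear_rank_score`, evaluates (2) for the Log-Rank and Gehan weights. For the Gehan weight the same quantity can be written as the double sum $n^{-2}\sum_i\sum_j \Delta_i (W_i - W_j) I(\varepsilon_j \ge \varepsilon_i)$, which is a convenient consistency check. The simulation uses log standard exponential errors, an extreme value case in which the Log-Rank weight is efficient.

```python
import numpy as np


def linear_rank_score(log_t_obs, delta, W, beta, weight="logrank"):
    """Full covariate linear-rank estimating function (2) with Log-Rank or Gehan weight."""
    if weight not in ("logrank", "gehan"):
        raise ValueError("weight must be 'logrank' or 'gehan'")
    n = len(log_t_obs)
    eps = log_t_obs - W @ beta
    # at_risk[i, j] = Y_j(eps_i | beta) = I(eps_j >= eps_i)
    at_risk = (eps[None, :] >= eps[:, None]).astype(float)
    S0 = at_risk.mean(axis=1)          # S^(0)(eps_i | beta) > 0, since subject i is at risk
    S1 = at_risk @ W / n               # S^(1)(eps_i | beta), shape (n, p)
    zeta = S1 / S0[:, None]            # zeta-hat(eps_i, beta)
    rho = np.ones(n) if weight == "logrank" else S0
    # Sum over failures of rho-hat(eps_i) {W_i - zeta-hat(eps_i)}, divided by n.
    return (delta * rho) @ (W - zeta) / n


rng = np.random.default_rng(1)
n, beta0 = 300, np.array([1.0])
W = rng.normal(size=(n, 1))
log_t_true = W @ beta0 + np.log(rng.exponential(size=n))   # extreme value errors
log_c = np.log(rng.exponential(scale=3.0, size=n))
log_t_obs = np.minimum(log_t_true, log_c)
delta = (log_t_true <= log_c).astype(int)
print(linear_rank_score(log_t_obs, delta, W, beta0, "logrank"),
      linear_rank_score(log_t_obs, delta, W, beta0, "gehan"))
```

The O(n²) at-risk matrix keeps the sketch short; sorting the residuals would be the usual choice for large n.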

Suppose now that any $W_i$ can be partitioned as $(W_{obs,i}, W_{miss,i})$, where $W_{obs,i}$ is always available and $W_{miss,i}$ may be missing. For i = 1, …, n, let $U_{0i} = (W_{obs,i}, \Delta_i, \tilde T_i)$ denote the always-available data on subject i and let $R_i$ be the indicator that $W_{miss,i}$ is available on subject i. The observed data for every subject i can then be written as $O_i = (\tilde T_i, \Delta_i, W_{obs,i}, R_iW_{miss,i})$. It is assumed that $P(R_1 = 1\mid W_1, \Delta_1, \tilde T_1) = \pi(U_{01})$, that is, covariate information is MAR. In addition, $R_1, \dots, R_n$ are assumed to be mutually independent. In a general two-phase design, $\{U_{0i}, i = 1, \dots, n\}$ corresponds to data available on the entire phase one sample; the phase two sample then consists of the missing covariate information collected on a subsample of failures and censored observations selected from the cohort with sampling probabilities $\pi(U_{0i})$, i = 1, …, n. Here, the MAR assumption is satisfied, and the independence assumption on $R_1, \dots, R_n$ is characteristic of an independent Bernoulli sampling scheme in the second phase (e.g., Breslow and Wellner, 2007). In the classical case-cohort design, $R_i = \pi(U_{0i}) = 1$ whenever $\Delta_i = 1$ (i.e., all failures are sampled), but $R_i \in \{0, 1\}$ with $P(R_i = 1\mid W_{obs,i}, \Delta_i, \tilde T_i) = \pi(U_{0i}) = p \in (0, 1)$ whenever $\Delta_i = 0$ (i.e., a fixed fraction of censored observations is subsampled). Sampling all failures is beneficial when failure is comparatively rare (i.e., censoring rates are high). However, this is not required, and benefits can also be realized when events are not especially rare (e.g., Cai and Zeng, 2007; Breslow et al., 2015).
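The classical case-cohort scheme described above can be sketched as independent Bernoulli draws whose probabilities depend on $U_{0i}$ only through $\Delta_i$. The function name `case_cohort_sample` and the subcohort fraction p = 0.3 are hypothetical choices for illustration.

```python
import numpy as np


def case_cohort_sample(delta, p, rng):
    """Phase-two Bernoulli sampling for a classical case-cohort design.

    All failures (Delta_i = 1) are sampled; each censored subject is sampled
    independently with probability p, so pi(U_0i) depends on U_0i only via Delta_i.
    """
    pi = np.where(delta == 1, 1.0, p)       # pi(U_0i)
    R = rng.binomial(1, pi)                 # independent R_i ~ Bernoulli(pi(U_0i))
    return pi, R


rng = np.random.default_rng(2)
delta = rng.binomial(1, 0.1, size=20000)    # roughly 10% failures (rare events)
pi, R = case_cohort_sample(delta, p=0.3, rng=rng)
```

A general two-phase design only changes how `pi` is computed from $U_{0i}$; the independent Bernoulli draw is the same.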

Nan et al. (2009) study “doubly weighted” estimators of β0 under the AFT model (1) with monotone missing covariates. Under a general two-phase design, or equivalently when covariates exhibit monotone missingness and the missing-at-random mechanism is known, Nan et al. (2009, Ex. (v), p. 2354) consider the IPW estimating function

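$$\hat\Psi_{HT}(\beta,\hat\rho;\pi) = \frac{1}{n}\sum_{i=1}^{n}\frac{\Delta_iR_i}{\pi(U_{0i})}\hat\rho(\varepsilon_i(\beta),\beta;\pi)\{W_i - \hat\zeta(\varepsilon_i(\beta),\beta;\pi)\}, \qquad (3)$$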

where

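$$\hat\zeta(u,\beta;\pi) = \frac{S^{(1)}(u|\beta;\pi)}{S^{(0)}(u|\beta;\pi)}, \qquad S^{(j)}(u|\beta;\pi) = \frac{1}{n}\sum_{i=1}^{n}\frac{R_i}{\pi(U_{0i})}W_i^{\otimes j}Y_i(u|\beta), \quad j = 0, 1,$$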

$\hat\rho(\cdot,\beta;\pi)$ is a weight function that may depend on both β and the sampling scheme, and the sampling weights $\pi(U_{0i})$, i = 1, …, n, are known functions of the phase one sample. This estimating function does not assume that all failures are sampled; observe that (3) reduces to (2) when $R_i = \pi(U_{0i}) = 1$ for all i = 1, …, n (i.e., the full covariate setting). Large sample results for estimating β0 using (3) are obtained in Nan et al. (2009), who also extend them to the setting in which $\pi(U_{0i})$ is modeled (i.e., $\pi(U_{0i}) = \pi(U_{0i},\gamma)$ for some γ to be estimated from the data). This extension is relevant to two-phase sampling designs where stratified finite population sampling is used in place of independent Bernoulli sampling in the second phase (e.g., Breslow and Wellner, 2007); in addition, it allows consideration of settings in which $W_{miss}$ is not necessarily missing by design and sampling weights must be modeled and estimated from the observed data.
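The IPW estimating function (3) can be sketched as follows, assuming known sampling probabilities and a covariate matrix whose missing entries (for $R_i = 0$) are stored as NaN. Since every term carries a factor $R_i/\pi(U_{0i})$, the sketch restricts all sums to phase-two subjects, so the missing rows are never touched. The name `ipw_score` and the simulated design are hypothetical.

```python
import numpy as np


def ipw_score(log_t_obs, delta, W, R, pi, beta, weight="logrank"):
    """IPW linear-rank estimating function (3) with known sampling weights.

    Rows of W for unsampled subjects (R_i = 0) are never used, so they may hold NaN.
    """
    if weight not in ("logrank", "gehan"):
        raise ValueError("weight must be 'logrank' or 'gehan'")
    n = len(log_t_obs)
    s = R == 1                              # only phase-two subjects contribute
    Ws, w = W[s], 1.0 / pi[s]               # covariates and weights R_i / pi(U_0i)
    eps = log_t_obs[s] - Ws @ beta
    at_risk = (eps[None, :] >= eps[:, None]).astype(float)   # [i, j] = Y_j(eps_i | beta)
    S0 = at_risk @ w / n                    # S^(0)(eps_i | beta; pi)
    S1 = at_risk @ (w[:, None] * Ws) / n    # S^(1)(eps_i | beta; pi)
    zeta = S1 / S0[:, None]                 # zeta-hat(eps_i, beta; pi)
    rho = np.ones_like(S0) if weight == "logrank" else S0   # Gehan: rho = S^(0)(.|beta;pi)
    return (w * delta[s] * rho) @ (Ws - zeta) / n


rng = np.random.default_rng(3)
n, beta0 = 400, np.array([0.5, 1.0])
W = rng.normal(size=(n, 2))
log_t_true = W @ beta0 + rng.normal(size=n)
log_c = rng.normal(1.0, 1.0, size=n)
log_t_obs = np.minimum(log_t_true, log_c)
delta = (log_t_true <= log_c).astype(int)
pi = np.where(delta == 1, 1.0, 0.25)        # case-cohort sampling probabilities
R = rng.binomial(1, pi)
W_phase2 = W.copy()
W_phase2[R == 0, 1] = np.nan                # second covariate missing outside phase two
psi = ipw_score(log_t_obs, delta, W_phase2, R, pi, beta0)
```

Passing `R` and `pi` equal to one recovers the full covariate function (2), as noted in the text.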

3. Augmented estimating functions: derivation, asymptotics, and efficiency

It is well known that the use of an IPW estimating function like (3) results in an inefficient estimator (e.g., Robins et al., 1994; Tsiatis, 2006). The main goal of this paper is to improve the efficiency of estimators for β0 derived from (3) and its extension to the case where the sampling weights $\pi(U_{0i})$, i = 1, …, n, are modeled and estimated. This will be accomplished using augmentation (e.g., Tsiatis, 2006, Ch. 9 and 10).

In a general two-phase sampling design, where the failure time $T_i$ follows some appropriate failure time model (e.g., AFT or Cox regression model), the observed covariate information is either $W_{obs,i}$ or $(W_{obs,i}, W_{miss,i})$. Assuming $(\tilde T_i, \Delta_i, W_i)$, i = 1, …, n, represent the fully observed data, results in Tsiatis (2006, Ch. 10.2) show that all regular and asymptotically linear observed data estimating functions for β can be written as

$$\frac{1}{n}\sum_{i=1}^{n}\left(\frac{R_i}{\pi(U_{0i})}\mathcal{R}(\beta,\tilde T_i,\Delta_i,W_i) + L_2(\beta,O_i)\right), \qquad (4)$$

where $\mathcal{R}(\beta,\tilde T_1,\Delta_1,W_1), \dots, \mathcal{R}(\beta,\tilde T_n,\Delta_n,W_n)$ are i.i.d., $\mathcal{R}(\beta,\tilde T,\Delta,W)$ has mean zero and finite variance at β = β0, and $L_2(\beta,O_i)$, i = 1, …, n, each lie in the augmentation space $\{L_2(\beta,O): E[L_2(\beta,O)\mid\tilde T,\Delta,W] = 0\}$. Equation (4) shows that the derivation of an observed data estimating equation can be viewed as a two-stage process: (i) choose an appropriate full covariate estimating equation; then, (ii) choose an augmentation term. For a given full covariate estimating function $\mathcal{R}(\beta,\tilde T,\Delta,W)$, the most efficient choice of $L_2(\beta,O_i)$ is (e.g., Tsiatis, 2006, Ch. 10.2)

$$-\frac{R_i - \pi(U_{0i})}{\pi(U_{0i})}E[\mathcal{R}(\beta,\tilde T_i,\Delta_i,W_i)\mid U_{0i}]. \qquad (5)$$
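The efficiency gain promised by (5) can be seen in a toy MAR problem that is unrelated to the AFT model; every ingredient below is a hypothetical choice for illustration. Take $U_0 \sim N(0,1)$ always observed, $W = U_0 + 0.3Z$ observed only when R = 1, a full data term φ(W) = W with mean zero, and π(U0) equal to 0.3 or 0.7 according to the sign of U0, so that E[φ(W) | U0] = U0. The sketch compares the per-subject IPW term $R\phi/\pi$ with the augmented term obtained by adding (5).

```python
import numpy as np


def ipw_vs_aipw(n, rng):
    """Means and variances of per-subject IPW and AIPW terms in a toy MAR problem.

    Hypothetical setup: U0 ~ N(0, 1) always observed, W = U0 + 0.3 Z observed
    only when R = 1, full data term phi(W) = W (mean zero), and
    pi(U0) = 0.3 if U0 <= 0 else 0.7, so E[phi(W) | U0] = U0.
    """
    u0 = rng.normal(size=n)
    w = u0 + 0.3 * rng.normal(size=n)
    pi = np.where(u0 > 0, 0.7, 0.3)
    R = rng.binomial(1, pi)
    ipw = R / pi * w
    aipw = ipw - (R - pi) / pi * u0        # add augmentation (5) with E[phi | U0] = U0
    return ipw.mean(), aipw.mean(), ipw.var(), aipw.var()


m_ipw, m_aipw, v_ipw, v_aipw = ipw_vs_aipw(200_000, np.random.default_rng(4))
# Theory for this setup: Var(IPW) = 2.595 and Var(AIPW) = 1.214 (approximately).
```

Both terms have mean zero, but the augmented term has roughly half the variance here because E[φ | U0] explains most of φ.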

In Supplementary Web Appendix S.7, we derive the semiparametric information bound for estimating β0 under a general two-phase sampling design when Ti follows the AFT model (1). Theorem 3.1 considers the special case where all failures are sampled and shows that finding the most efficient observed data estimating equation essentially requires minimizing the variance of (4) within a certain class of full covariate estimating functions that use (5) as the augmentation term. For convenience, the subject-level index is dropped from the theorem statement.

Theorem 3.1. Given W, let $(T^\dagger, C^\dagger)$ be an independent copy of (T, C) and define $\tilde T^\dagger = \min(T^\dagger, C^\dagger)$. Consider a general two-phase design where, in phase two, all failures are sampled and non-failures are sampled using a Bernoulli (i.e., independent) design. Define

$$D_u(\tilde T, W, \Delta) = \Delta\,u(\tilde T e^{-W'\beta_0}, W) - \int_0^{\tilde T e^{-W'\beta_0}} u(t, W)\,\tilde\lambda(t)\,dt,$$

where $\tilde\lambda(\cdot)$ denotes the hazard function of $Te^{-W'\beta_0} = e^{\epsilon(\beta_0)}$; here, the subscript u indicates that the operator D takes in a function u(·), and $(\tilde T, \Delta, W)$ has the same joint distribution as $(\tilde T_1, \Delta_1, W_1)$. The information bound for estimating β0 is given by $I_{\beta_0}^{-1}$, where $I_{\beta_0} = E[(s^{\mathrm{eff}}_{\beta_0})^{\otimes 2}]$ and the efficient score is given by

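$$s^{\mathrm{eff}}_{\beta_0} = \frac{R}{\pi(U_0)}D_k(\tilde T e^{-\beta_0'W}, W, \Delta) - \frac{R - \pi(U_0)}{\pi(U_0)}E[D_k(\tilde T e^{-\beta_0'W}, W, \Delta)\mid\tilde T, \Delta, W_{obs}].$$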

Here, the function k(·) is the unique solution to

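$$k(\tilde T e^{-\beta_0'W}, W) - \{K_k(\tilde T, W) - E[K_k(\tilde T, W)\mid\tilde T e^{-\beta_0'W}, \Delta = 1]\} = \tilde Q(\tilde T e^{-\beta_0'W})\{W - E[W\mid\tilde T e^{-\beta_0'W}, \Delta = 1]\}, \qquad (6)$$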

where $\tilde Q(u) = 1 + u\,d\log\tilde\lambda(u)/du$, and $K_k(\tilde T^\dagger, W) = E[b_k(\tilde T, \Delta, W)\mid\tilde T > \tilde T^\dagger, W]$ with

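$$b_k(\tilde T, \Delta, W) = \frac{1 - \pi(U_0)}{\pi(U_0)}\left(D_k(\tilde T e^{-\beta_0'W}, W, \Delta) - E[D_k(\tilde T e^{-\beta_0'W}, W, \Delta)\mid\Delta, W_{obs}]\right).$$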

Equation (6) shows that computing the efficient estimator requires solving a complicated integral equation. The solution to (6) depends on the form of the most efficient full covariate estimating function, but the resulting efficient observed data estimating function is usually not equivalent to that obtained by (i) choosing the most efficient full covariate estimating function; and, (ii) calculating (5) for this particular choice. The challenges are only compounded when failures as well as non-failures are sampled; see Theorem S.7.8 in the Supplementary Web Appendix for the efficient observed data score (i.e., with known sampling weights) in this case.

The most efficient observed data estimating equation is typically intractable in missing data problems (e.g., Yu and Nan, 2006; Tsiatis, 2006, Ch. 10); see Nan et al. (2004, Prop. 3.3, Cor. 3.1) and also Nan (2004, Eqn. 4) for developments specific to the Cox regression model, respectively for covariates missing at random and for case-cohort designs. A challenge specific to (1) is the presence of the difficult-to-estimate efficient weight function $\tilde Q(\cdot)$ that appears in (6). In particular, efficient estimation in the observed data problem is generally harder than in the full data setting, compounding the challenges one already faces in constructing efficient estimators for the regression parameter when covariates are missing at random (e.g., Nan et al., 2004, Section 5).

The characterization of the most efficient observed data estimating equation is nevertheless useful because it suggests a roadmap for improving the efficiency of simple weighted estimators like (3) using augmentation. The developments described above, and in particular the derivation of (5) as the most efficient choice of augmentation term, rely on (4) being a sum of i.i.d. terms. However, under the AFT model (1) and with a finite sample, the estimating function (3) cannot be written in the form (4) with $L_2(\beta,O_i) = 0$, i = 1, …, n. The desired i.i.d. representation fails in this instance because each term in (3) depends on $\hat\zeta(\cdot,\cdot)$ and $\hat\rho(\cdot,\cdot)$, functions that are estimated using the entire dataset. In order to derive the appropriate form for an AIPW estimating equation, we will therefore proceed as follows: (i) derive an asymptotically equivalent i.i.d. representation for the full covariate estimating equation; and, (ii) calculate the optimal augmentation term in (5) for this equivalent i.i.d. representation. This approach will give an “idealized” AIPW linear rank estimating equation (Section 3.1); substituting appropriate estimators for all unknowns in a way that preserves asymptotic equivalence of the corresponding estimator sequences yields the AIPW linear rank estimator (Section 3.2). The asymptotic properties of the AIPW estimating equation and its solution are established in Section 3.3.

The results in Sections 3.1–3.3 below are derived assuming that the sampling probabilities $\pi(U_{0i})$, i = 1, …, n, are known (i.e., given $U_{0i}$, i = 1, …, n) and apply to general forms of the case-cohort design. In Section 3.4, these results are extended to the case where $\pi(U_{0i})$, i = 1, …, n, are modeled and estimated; this extension expands the applicability to more general missing covariate problems (i.e., assuming a monotone missingness mechanism).

3.1. Augmenting the idealized estimating function $\Psi_{HT}(\beta,\rho_0;\pi)$

For each fixed t and β, let $\zeta_0(t,\beta) = E[W_1Y_1(t|\beta)]/E[Y_1(t|\beta)]$ denote the almost-sure limit of $\hat\zeta(t,\beta;\pi)$ and, assuming it exists, let ρ0(t, β) be the almost-sure limit of $\hat\rho(t,\beta;\pi)$. Then, under mild regularity conditions, the results of Tsiatis (1990) imply that (2) is asymptotically equivalent to the full covariate estimating function $n^{-1}\sum_{i=1}^{n}\int\rho_0(u,\beta)\{W_i - \zeta_0(u,\beta)\}\,dM_i(u|\beta)$, where asymptotic equivalence means that the normalized sequences of solutions obtained from (2) and from this function have the same asymptotic distribution. As this estimating function is a sum of i.i.d. terms and has mean zero at β = β0, one can set

$$\mathcal{R}(\beta,\tilde T_i,\Delta_i,W_i) = \int\rho_0(u,\beta)\{W_i - \zeta_0(u,\beta)\}\,dM_i(u|\beta) \qquad (7)$$

in the class (4). Selecting $L_2(\beta,O_i) = 0$, i = 1, …, n, gives the mean zero IPW estimating function

$$\Psi_{HT}(\beta,\rho_0;\pi) = \frac{1}{n}\sum_{i=1}^{n}\frac{R_i}{\pi(U_{0i})}\int\rho_0(u,\beta)\{W_i - \zeta_0(u,\beta)\}\,dM_i(u|\beta); \qquad (8)$$

it is shown later how to obtain (3) from (8). The lack of predictability of the sampling weights $R_i/\pi(U_{0i})$, i = 1, …, n (e.g., Nan et al., 2009), means that the individual summands appearing in (8) are no longer martingales, so martingale theory cannot be applied in establishing its asymptotic properties. However, under the assumptions of this paper, all integrals here and elsewhere in this paper continue to be well defined as (pathwise) Lebesgue-Stieltjes integrals; see Section S.6 of the Supplementary Web Appendix for additional discussion.

Setting $L_2(\beta,O_i)$ equal to (5), calculated using (7), leads to the most efficient choice of augmentation term for (8) that is possible using the weight function ρ0(·, β). Using the definition of $M_i(\cdot|\beta)$ given earlier and the fact that $\Delta_i$ is a component of $U_{0i}$, we thus propose to augment (8) with the term

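$$-\frac{1}{n}\sum_{i=1}^{n}\frac{R_i - \pi(U_{0i})}{\pi(U_{0i})}\Delta_iE[\rho_0(\varepsilon_i(\beta),\beta)\{W_i - \zeta_0(\varepsilon_i(\beta),\beta)\}\mid U_{0i}] + \int\rho_0(u,\beta)\bar H(u|\beta;\pi)\,d\Lambda(u|\beta) \qquad (9)$$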

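for $\bar H(u|\beta;\pi) = \{\bar S^{(1)}(u|\beta;\pi) - \bar S^{(1)}(u|\beta)\} - \zeta_0(u,\beta)\{\bar S^{(0)}(u|\beta;\pi) - \bar S^{(0)}(u|\beta)\}$, where $\bar S^{(j)}(u|\beta;\pi) = n^{-1}\sum_{i=1}^{n}\frac{R_i}{\pi(U_{0i})}E[W_i^{\otimes j}Y_i(u|\beta)\mid U_{0i}]$ and $\bar S^{(j)}(u|\beta) = n^{-1}\sum_{i=1}^{n}E[W_i^{\otimes j}Y_i(u|\beta)\mid U_{0i}]$, j = 0, 1. The efficient idealized AIPW estimating function is now given by the sum of (8) and (9). In designs where all failures are sampled (i.e., $R_i = \pi(U_{0i}) = 1$ whenever $\Delta_i = 1$; e.g., a classical case-cohort design), the first term of (9) vanishes because $\{R_i - \pi(U_{0i})\}\Delta_i = 0$, i = 1, …, n.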

In deriving the efficient idealized AIPW estimating equation, both $E[W_i^{\otimes j}Y_i(u|\beta)\mid U_{0i}]$ and $E[\rho_0(\varepsilon_i(\beta),\beta)\{W_i - \zeta_0(\varepsilon_i(\beta),\beta)\}\mid U_{0i}]$ are treated as known functions of u, β and $U_{0i}$ only. In practice, these quantities typically need to be modeled. Letting the (possibly infinite dimensional) parameter α index the missing data model for $W_{miss,i}\mid U_{0i}$, the models induced for $E[W_i^{\otimes j}Y_i(u|\beta)\mid U_{0i}]$ and $E[\rho_0(\varepsilon_i(\beta),\beta)\{W_i - \zeta_0(\varepsilon_i(\beta),\beta)\}\mid U_{0i}]$ can respectively be denoted by

$$E_i^{(j)}[u,\beta,\alpha] = E[W_i^{\otimes j}Y_i(u|\beta)\mid U_{0i},\alpha], \quad i = 1,\dots,n, \;\; j = 0, 1 \qquad (10)$$

and

$$E_i[\rho_0,\zeta_0,\beta,\alpha] = E[\rho_0(\varepsilon_i(\beta),\beta)\{W_i - \zeta_0(\varepsilon_i(\beta),\beta)\}\mid U_{0i},\alpha]. \qquad (11)$$

Define the corresponding model-dependent versions of $\bar S^{(j)}(u|\beta;\pi)$ and $\bar S^{(j)}(u|\beta)$ as

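$$\bar S^{(j)}(u|\beta,\alpha;\pi) = \frac{1}{n}\sum_{i=1}^{n}\frac{R_i}{\pi(U_{0i})}E_i^{(j)}[u,\beta,\alpha] \quad\text{and}\quad \bar S^{(j)}(u|\beta,\alpha) = \frac{1}{n}\sum_{i=1}^{n}E_i^{(j)}[u,\beta,\alpha], \quad j = 0, 1;$$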

similarly, let $\bar H(u|\beta,\alpha;\pi)$ be defined in the same way as $\bar H(u|\beta;\pi)$, with $\bar S^{(j)}(u|\beta;\pi)$ and $\bar S^{(j)}(u|\beta)$ replaced by $\bar S^{(j)}(u|\beta,\alpha;\pi)$ and $\bar S^{(j)}(u|\beta,\alpha)$, respectively. Then, the modeled version of the efficient idealized augmentation term (9) may be rewritten as

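$$\int\rho_0(u,\beta)\bar H(u|\beta,\alpha;\pi)\,d\Lambda(u|\beta) - \frac{1}{n}\sum_{i=1}^{n}\frac{R_i - \pi(U_{0i})}{\pi(U_{0i})}\Delta_iE_i[\rho_0,\zeta_0,\beta,\alpha] \qquad (12)$$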

and the model-based efficient AIPW estimating equation for β0 is derived from the sum of (8) and (12). Section 4 proposes one novel approach to this modeling problem.
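To make (10) and the model-based $\bar S^{(j)}$ and $\bar H$ concrete, the sketch below assumes a deliberately simple, hypothetical setting: $W_i = (W_{obs,i}, W_{miss,i})$ with scalar $W_{obs,i}$, binary $W_{miss,i}$, and $q_i = P(W_{miss,i} = 1\mid U_{0i};\alpha)$ supplied by some fitted model. Then $E_i^{(j)}[u,\beta,\alpha]$ is a two-point average over the possible completions of $W_i$. This is only an illustration of the definitions, not the estimation approach proposed in Section 4; the function names are hypothetical.

```python
import numpy as np


def conditional_moments(log_t_obs, w_obs, q, beta, u):
    """E_i^(j)[u, beta, alpha] for j = 0, 1 with a binary missing covariate.

    Hypothetical setup: W_i = (w_obs[i], W_miss,i), W_miss,i in {0, 1},
    and q[i] = P(W_miss,i = 1 | U_0i; alpha).
    """
    n = len(log_t_obs)
    E0, E1 = np.zeros(n), np.zeros((n, 2))
    for w_miss, prob in ((0.0, 1.0 - q), (1.0, q)):
        W_full = np.column_stack([w_obs, np.full(n, w_miss)])   # completed W_i
        Y = (log_t_obs - W_full @ beta >= u).astype(float)      # Y_i(u|beta) under this completion
        E0 += prob * Y                                          # E[Y_i(u|beta) | U_0i]
        E1 += (prob * Y)[:, None] * W_full                      # E[W_i Y_i(u|beta) | U_0i]
    return E0, E1


def H_bar(log_t_obs, w_obs, q, R, pi, beta, u, zeta):
    """Model-based H-bar(u|beta, alpha; pi); zeta is the centering vector at u."""
    n = len(log_t_obs)
    E0, E1 = conditional_moments(log_t_obs, w_obs, q, beta, u)
    wts = R / pi
    S0_pi, S1_pi = wts @ E0 / n, wts @ E1 / n     # bar S^(j)(u|beta,alpha;pi)
    S0, S1 = E0.mean(), E1.mean(axis=0)           # bar S^(j)(u|beta,alpha)
    return (S1_pi - S1) - zeta * (S0_pi - S0)


rng = np.random.default_rng(5)
n, beta = 500, np.array([1.0, -1.0])
w_obs = rng.normal(size=n)
w_miss = rng.binomial(1, 0.4, size=n).astype(float)
log_t_obs = w_obs - w_miss + rng.normal(size=n)
q = np.full(n, 0.4)                       # hypothetical fitted P(W_miss = 1 | U_0)
pi = np.where(w_obs > 0, 0.8, 0.4)        # hypothetical known sampling probabilities
R = rng.binomial(1, pi)
H = H_bar(log_t_obs, w_obs, q, R, pi, beta, u=0.2, zeta=np.array([0.1, 0.4]))
```

With no missingness ($R_i = \pi(U_{0i}) = 1$) the two averages coincide and $\bar H$ is zero, so the first term of (12) disappears, as it should.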

3.2. Augmentation in practice: improving on $\hat\Psi_{HT}(\beta,\hat\rho;\pi)$

Straightforward algebra shows that (3) can be obtained from the idealized estimating function (8) upon (i) substituting the observed data estimators $\hat\rho(\cdot,\beta;\pi)$ and $\hat\zeta(\cdot,\beta;\pi)$ in for $\rho_0(\cdot,\beta)$ and $\zeta_0(\cdot,\beta)$; and, (ii) substituting in the weighted cumulative hazard estimator

$$\hat\Lambda(s|\beta;\pi) = \frac{1}{n}\sum_{i=1}^{n}\frac{R_i}{\pi(U_{0i})}\int_{-\infty}^{s}\frac{dN_i(u|\beta)}{S^{(0)}(u|\beta;\pi)}$$

in for Λ(·|β). We propose to derive the corresponding AIPW estimator following a similar plug-in principle: substitute the observed data quantities $\hat\rho(\cdot,\beta;\pi)$, $\hat\zeta(\cdot,\beta;\pi)$ and $\hat\Lambda(\cdot|\beta;\pi)$ in for $\rho_0(\cdot,\cdot)$, $\zeta_0(\cdot,\cdot)$ and Λ(·|β) in (12). Using the definition of $\hat\Lambda(u|\beta;\pi)$, one may rewrite the resulting plug-in estimator for (12) as

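$$\frac{1}{n}\sum_{i=1}^{n}\frac{\Delta_iR_i}{\pi(U_{0i})}\frac{\hat\rho(\varepsilon_i(\beta),\beta;\pi)\bar H(\varepsilon_i(\beta)|\beta,\alpha;\pi)}{S^{(0)}(\varepsilon_i(\beta)|\beta;\pi)} - \frac{1}{n}\sum_{i=1}^{n}\frac{R_i - \pi(U_{0i})}{\pi(U_{0i})}\Delta_iE_i[\hat\rho,\hat\zeta,\beta,\alpha], \qquad (13)$$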

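where now, with $\hat\zeta(u,\beta;\pi)$ in place of $\zeta_0(u,\beta)$, $\bar H(u|\beta,\alpha;\pi) = \{\bar S^{(1)}(u|\beta,\alpha;\pi) - \bar S^{(1)}(u|\beta,\alpha)\} - \hat\zeta(u,\beta;\pi)\{\bar S^{(0)}(u|\beta,\alpha;\pi) - \bar S^{(0)}(u|\beta,\alpha)\}$ depends on $E_i^{(j)}[u,\beta,\alpha]$, i = 1, …, n (i.e., on the model selected for $E[W^{\otimes j}Y(u|\beta)\mid U_0]$, j = 0, 1), and $E_i[\hat\rho,\hat\zeta,\beta,\alpha]$ is (11) with the unknown quantities $\rho_0(\cdot,\cdot)$ and $\zeta_0(\cdot,\cdot)$ replaced by their corresponding estimators. For further discussion on estimation of these conditional expectations, see Section 4.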
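The weighted cumulative hazard estimator $\hat\Lambda(s|\beta;\pi)$ used in the plug-in step places a jump of size $R_i/\{\pi(U_{0i})\,n\,S^{(0)}(\varepsilon_i(\beta)|\beta;\pi)\}$ at each sampled failure residual, which is exactly what turns the integral in (12) into the first sum in (13). A minimal sketch, assuming continuous (untied) residuals; the name `weighted_cumhaz` and the simulated inputs are hypothetical.

```python
import numpy as np


def weighted_cumhaz(eps, delta, R, pi, s_grid):
    """Weighted Nelson-Aalen estimator Lambda-hat(s|beta; pi) of the residual cumulative hazard.

    eps : (n,) observed residuals eps_i(beta); entries with R_i = 0 receive zero
          weight but are assumed finite in this sketch.
    """
    n = len(eps)
    w = R / pi                                                # R_i / pi(U_0i)
    at_risk = (eps[None, :] >= eps[:, None]).astype(float)    # [i, j] = Y_j(eps_i | beta)
    S0 = at_risk @ w / n                                      # S^(0)(eps_i | beta; pi)
    jump = np.zeros(n)
    ok = (delta == 1) & (R == 1)                              # sampled failures carry the jumps
    jump[ok] = w[ok] / (n * S0[ok])
    return np.array([jump[eps <= s].sum() for s in s_grid])


rng = np.random.default_rng(6)
eps = rng.normal(size=300)
delta = rng.binomial(1, 0.7, size=300)
pi = np.where(delta == 1, 1.0, 0.4)
R = rng.binomial(1, pi)
grid = np.linspace(-2.0, 2.0, 9)
Lam = weighted_cumhaz(eps, delta, R, pi, grid)
```

With $R_i = \pi(U_{0i}) = 1$ the estimator reduces to the ordinary Nelson-Aalen estimator computed from the residuals.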

Using (3) with (13) as the augmentation term now gives the AIPW estimating function

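$$\hat\Psi_{ALR}(\beta,\hat\rho,\hat\alpha;\pi) = \hat\Psi_{HT}(\beta,\hat\rho;\pi) + \frac{1}{n}\sum_{i=1}^{n}\left(\frac{\Delta_iR_i}{\pi(U_{0i})}\frac{\hat\rho(\varepsilon_i(\beta),\beta;\pi)\bar H(\varepsilon_i(\beta)|\beta,\hat\alpha;\pi)}{S^{(0)}(\varepsilon_i(\beta)|\beta;\pi)} - \frac{R_i - \pi(U_{0i})}{\pi(U_{0i})}\Delta_iE_i[\hat\rho,\hat\zeta,\beta,\hat\alpha]\right), \qquad (14)$$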

where $\hat\alpha$ is any suitable estimator for α. The AIPW estimator $\hat\beta_{ALR}$ is defined as any “solution” to (14) that satisfies $\hat\Psi_{ALR}(\hat\beta_{ALR},\hat\rho,\hat\alpha;\pi) = o_p(n^{-1/2})$. Section 3.3 establishes the large sample properties of $\hat\beta_{ALR}$, demonstrating that the use of plug-in estimators does not alter the asymptotic distribution of the estimator that would be obtained using the idealized estimating equation of the previous section.
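The pieces of (14) can be assembled into one function. The sketch below again assumes the hypothetical binary-$W_{miss}$ setting with model probabilities $q_i = P(W_{miss,i} = 1\mid U_{0i};\hat\alpha)$, known π, and Log-Rank or Gehan weights; it returns $(\hat\Psi_{HT}, \hat\Psi_{ALR})$ at a given β. Where $S^{(0)}(u|\beta;\pi) = 0$ (a completed residual beyond every sampled residual), $\hat\rho$ and $\hat\zeta$ are set to zero, a choice made here only to keep the sketch well defined; it loosely mirrors restricting integrals to a bounded residual range. Solving (14) for β is not attempted.

```python
import numpy as np


def aipw_score(log_t_obs, delta, w_obs, w_miss, R, pi, q, beta, weight="logrank"):
    """Sketch of the AIPW linear-rank estimating function (14) with known pi.

    Hypothetical setup: W_i = (w_obs[i], w_miss[i]) with w_miss[i] in {0, 1}
    used only when R[i] == 1, and q[i] = P(W_miss,i = 1 | U_0i; alpha-hat).
    Returns (Psi_HT, Psi_ALR) evaluated at beta.
    """
    if weight not in ("logrank", "gehan"):
        raise ValueError("weight must be 'logrank' or 'gehan'")
    n = len(log_t_obs)
    wts = np.where(R == 1, 1.0 / pi, 0.0)                     # R_i / pi(U_0i)
    # Unsampled rows get a placeholder W_miss = 0; they always enter with weight 0.
    W = np.column_stack([w_obs, np.where(R == 1, w_miss, 0.0)])
    eps = log_t_obs - W @ beta                                # eps_i(beta)

    def rho_zeta(u):
        """rho-hat(u) and zeta-hat(u) at each point of u (set to 0 where S^(0)(u) = 0)."""
        Y = (eps[None, :] >= u[:, None]) * wts[None, :]
        S0, S1 = Y.sum(axis=1) / n, Y @ W / n                 # S^(j)(u | beta; pi)
        ok = S0 > 0
        zeta = np.zeros_like(S1)
        zeta[ok] = S1[ok] / S0[ok, None]
        rho = np.where(ok, 1.0 if weight == "logrank" else S0, 0.0)
        return rho, zeta, S0

    # IPW part (3): only sampled failures contribute.
    rho_e, zeta_e, S0_e = rho_zeta(eps)
    c = wts * delta * rho_e
    psi_ht = c @ (W - zeta_e) / n

    # Completions W_i(w) = (w_obs[i], w) and their model probabilities.
    comps = [(np.column_stack([w_obs, np.full(n, w)]), p_w) for w, p_w in ((0.0, 1.0 - q), (1.0, q))]

    # First term of (13): H-bar at each eps_k, using bar S(.;pi) - bar S(.) = n^-1 sum (R/pi - 1) E.
    d0, d1 = np.zeros(n), np.zeros((n, 2))
    for Wc, p_w in comps:
        E0 = ((log_t_obs - Wc @ beta)[None, :] >= eps[:, None]) * p_w[None, :]   # [k, i]
        d0 += E0 @ (wts - 1.0) / n
        d1 += (E0 * (wts - 1.0)[None, :]) @ Wc / n
    H = d1 - zeta_e * d0[:, None]
    term1 = (c / np.where(S0_e > 0, S0_e, 1.0)) @ H / n

    # Second term of (13): model-based E_i[rho-hat, zeta-hat, beta, alpha-hat].
    E_i = np.zeros((n, 2))
    for Wc, p_w in comps:
        rho_c, zeta_c, _ = rho_zeta(log_t_obs - Wc @ beta)
        E_i += (p_w * rho_c)[:, None] * (Wc - zeta_c)
    term2 = ((R - pi) / pi * delta) @ E_i / n
    return psi_ht, psi_ht + term1 - term2


rng = np.random.default_rng(7)
n, beta0 = 300, np.array([1.0, -0.5])
w_obs = rng.normal(size=n)
w_miss = rng.binomial(1, 0.5, size=n).astype(float)
log_t_true = beta0[0] * w_obs + beta0[1] * w_miss + rng.normal(size=n)
log_c = rng.normal(1.0, 1.0, size=n)
log_t_obs = np.minimum(log_t_true, log_c)
delta = (log_t_true <= log_c).astype(int)
q = np.full(n, 0.5)                                  # hypothetical fitted model for W_miss | U_0
pi = np.where(delta == 1, 1.0, 0.3)                  # case-cohort design
R = rng.binomial(1, pi)
ht, alr = aipw_score(log_t_obs, delta, w_obs, np.where(R == 1, w_miss, np.nan), R, pi, q, beta0)
```

In this case-cohort example every failure is sampled, so the second term vanishes and only the $\bar H$ term changes the IPW function.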

3.3. Asymptotic results: known sampling weights

Nan et al. (2009) used empirical process theory to derive the asymptotic properties of estimators for β derived from (3). We first extend that theory to develop the asymptotic properties of the estimators obtained from (14), assuming that $\pi(U_{0i})$ is a known function of $U_{0i}$ for i = 1, …, n. Adapting notation in van der Vaart and Wellner (1996), let $\mathbb{P}_n g = n^{-1}\sum_{i=1}^{n} g(X_i)$ and $\mathbb{G}_n(g) = n^{1/2}(\mathbb{P}_n - P)(g)$ for a function g of a random variable X, where $X_1, \dots, X_n$ are i.i.d. copies of X with distribution P. Also, let $\|\cdot\|_\infty$ denote the supremum norm.

Theorem 3.2, given below, establishes the consistency and asymptotic normality of the augmented linear-rank estimator for both the Gehan and the Log-Rank weight functions. Each choice ensures that $\hat\rho(t,\beta;\pi)$ converges uniformly in (t, β) to a deterministic function ρ0(t, β) almost surely (e.g., Nan et al., 2009, Thm. 3.2) under mild regularity conditions. In particular, when $\hat\rho(t,\beta;\pi) = 1$ for all β and t, we have ρ0(t, β) = 1; when $\hat\rho(t,\beta;\pi) = S^{(0)}(t|\beta;\pi)$ (i.e., the Gehan weight function), $\rho_0(t,\beta) = E[Y_1(t|\beta)]$. The statement of this theorem imposes the same regularity conditions as in Nan et al. (2009) in order to ensure the desired behavior of the IPW estimator (see (C.1)–(C.5) in Section S.6 of the Supplementary Web Appendix), along with additional conditions required to manage the inclusion of the augmentation term (see (C.6)–(C.9) in Section S.6 of the Supplementary Web Appendix). Unless otherwise stated, all integrals and the supremum over the residual time t are taken over the interval $(-\infty, \tau_1]$, where $\tau_1$ is given in (C.4) (Section S.6, Supplementary Web Appendix).

Theorem 3.2. Let $\hat\rho(t,\beta;\pi)$ be either the Gehan or the Log-Rank weight function. Let $\Psi(\beta,\rho_0) = E[\Delta\rho_0(\varepsilon(\beta),\beta)\{W - \zeta_0(\varepsilon(\beta),\beta)\}]$. Suppose the parameter space for β, denoted by Θ0, is compact and that β0 is the only solution to Ψ(β, ρ0) = 0 in Θ0. Suppose also that, with probability one, $\pi(U_{01}) \ge \pi^* > 0$ for some constant $\pi^*$. Then, under (C.1)–(C.9) given in Supplementary Web Appendix S.6, the AIPW linear-rank estimator $\hat\beta_{ALR}$ of β0 has the following properties:

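  1. $\sqrt{n}\,\|\hat\beta_{ALR} - \beta_0\| = O_p(1)$ (i.e., $\sqrt{n}$-consistent);

  2. $\sqrt{n}(\hat\beta_{ALR} - \beta_0) = -\dot\Psi_\beta(\beta_0,\rho_0)^{-1}\mathbb{G}_n(Z_0(R,\tilde T,\Delta,W,\epsilon,\beta_0)) + o_p(1)$, where $(R,\tilde T,\Delta,W,\epsilon)$ has the same joint distribution as $(R_1,\tilde T_1,\Delta_1,W_1,\epsilon_1)$,
    $$Z_0(R,\tilde T,\Delta,W,\epsilon,\beta_0) = \frac{R}{\pi(U_0)}\Delta\rho_0(\varepsilon(\beta_0),\beta_0)\{W - \zeta_0(\varepsilon(\beta_0),\beta_0)\} - \frac{R}{\pi(U_0)}\int\rho_0(u,\beta_0)\{W - \zeta_0(u,\beta_0)\}Y(u)\,d\Lambda(u) - \frac{R - \pi(U_0)}{\pi(U_0)}\Delta E[\rho_0,\beta_0,\alpha^*] + \frac{R - \pi(U_0)}{\pi(U_0)}\int\rho_0(u,\beta_0)\{E^{(1)}[u,\beta_0,\alpha^*] - \zeta_0(u,\beta_0)E^{(0)}[u,\beta_0,\alpha^*]\}\,d\Lambda(u),$$
    $\dot\Psi_\beta(\beta_0,\rho_0)$ (see (C.5)) is the derivative of Ψ(β, ρ0) with respect to β evaluated at β0, and α* is such that $\sqrt{n}\,\|\hat\alpha - \alpha^*\| = O_P(1)$ (see (C.8)).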
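The linear representation in part 2 implies the sandwich form $\dot\Psi_\beta(\beta_0,\rho_0)^{-1}\,\mathrm{Var}(Z_0)\,\{\dot\Psi_\beta(\beta_0,\rho_0)^{-1}\}'$ for the asymptotic variance. The sketch below computes that product from a slope matrix and a matrix of per-subject influence terms; how to estimate $\dot\Psi_\beta$ for a non-smooth rank-based estimating function is a separate problem that this sketch does not address, and the name `sandwich_variance` is hypothetical.

```python
import numpy as np


def sandwich_variance(Psi_dot, Z):
    """Asymptotic variance Psi_dot^{-1} Upsilon {Psi_dot^{-1}}' of sqrt(n)(beta-hat - beta0).

    Psi_dot : (p, p) slope of the limiting estimating function at beta0
              (its estimation is not shown here)
    Z       : (n, p) rows are (estimated) per-subject influence terms Z0
    """
    Upsilon = np.atleast_2d(np.cov(Z, rowvar=False, bias=True))   # empirical Var(Z0)
    A_inv = np.linalg.inv(Psi_dot)
    return A_inv @ Upsilon @ A_inv.T


Z = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
V = sandwich_variance(2.0 * np.eye(2), Z)   # Upsilon = 0.5 I here, so V = 0.125 I
```

Dividing V by n gives an approximate covariance matrix for $\hat\beta_{ALR}$ itself.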

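Defining $\Upsilon(\beta_0) = \mathrm{Var}(Z_0(R_1,\tilde T_1,\Delta_1,W_1,\epsilon_1,\beta_0))$, the asymptotic variance of $\sqrt{n}(\hat\beta_{ALR} - \beta_0)$ is $\dot\Psi_\beta(\beta_0,\rho_0)^{-1}\Upsilon(\beta_0)\{\dot\Psi_\beta(\beta_0,\rho_0)^{-1}\}'$. A detailed proof of Theorem 3.2 can be found in Section S.6 of the Supplementary Web Appendix. Let $\Omega(\beta_0) = \int\rho_0(u,\beta_0)\{W_1 - \zeta_0(u,\beta_0)\}\,dM_1(u)$; then, calculations in the Supplementary Web Appendix (Section S.1) further show that $\Upsilon(\beta_0) = \Upsilon_1(\beta_0) + \Upsilon_2(\beta_0,\mu_1^*(\beta_0,\rho_0))$, where $\Upsilon_1(\beta_0) = \mathrm{Var}(\Omega(\beta_0))$,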

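$$\Upsilon_2(\beta_0,\mu_1^*(\beta_0,\rho_0)) = E\left[\frac{1 - \pi(U_{01})}{\pi(U_{01})}\{\Omega(\beta_0) - \mu_1^*(\beta_0,\rho_0)\}^{\otimes 2}\right], \qquad (15)$$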

and $\mu_1^*(\beta_0,\rho_0) = \Delta_1E_1[\rho_0,\beta_0,\alpha^*] - \int\rho_0(u,\beta_0)\{E_1^{(1)}[u,\beta_0,\alpha^*] - \zeta_0(u,\beta_0)E_1^{(0)}[u,\beta_0,\alpha^*]\}\,d\Lambda(u)$. This additive decomposition shows that the asymptotic variance consists of two terms: the first, $\Upsilon_1(\beta_0)$, is the asymptotic variance of the full covariate estimator (i.e., where full covariate information is available on all subjects), and the second, $\Upsilon_2(\beta_0,\mu_1^*(\beta_0,\rho_0))$, captures the increase in variance due to the missing covariate information. Consequently, the choice of augmentation term affects efficiency only through the second term.
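The decomposition can be checked by simulation in a scalar toy version of $Z_0 = (R/\pi)\Omega - \{(R-\pi)/\pi\}\mu$, where μ is any function of $U_0$. All distributions below are hypothetical choices: $U_0 \sim N(0,1)$, $\Omega = U_0 + 0.5Z$, π(U0) equal to 0.25 or 0.75 according to the sign of U0, and μ = c·U0, so that c = 1 corresponds to μ = E[Ω | U0] and c = 0 to no augmentation (IPW).

```python
import numpy as np


def variance_decomposition(n, c, rng):
    """Monte Carlo check of Var(Z0) = Var(Omega) + E[(1 - pi)/pi (Omega - mu)^2].

    Hypothetical scalar setup: U0 ~ N(0, 1), Omega = U0 + 0.5 Z (mean zero),
    pi(U0) = 0.25 if U0 <= 0 else 0.75, and augmentation target mu = c * U0.
    """
    u0 = rng.normal(size=n)
    omega = u0 + 0.5 * rng.normal(size=n)
    pi = np.where(u0 > 0, 0.75, 0.25)
    R = rng.binomial(1, pi)
    mu = c * u0
    z0 = R / pi * omega - (R - pi) / pi * mu     # same structure as Z0 in Theorem 3.2
    lhs = z0.var()
    rhs = omega.var() + np.mean((1 - pi) / pi * (omega - mu) ** 2)
    return lhs, rhs


rng = np.random.default_rng(8)
for c in (0.0, 0.5, 1.0):
    print(c, variance_decomposition(400_000, c, rng))
```

For this setup the exact values are 3.333 (c = 0), 2.083 (c = 0.5) and 1.667 (c = 1): the first term Var(Ω) = 1.25 is fixed and only the second term moves with the augmentation.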

Theorem 3.2 does not assume that the model for $W_{\text{miss},i}\mid U_{0i}$ is correctly specified. When this model is correctly specified, calculations in Supplementary Web Appendix S.1 show that $\Upsilon(\beta_0)=\Upsilon_1(\beta_0)+\Upsilon_2(\beta_0,E[\Omega(\beta_0)\mid U_{01}])$. Because $E[\Omega(\beta_0)\mid U_{01}]$ minimizes (15) over all functions of $U_{01}$, it follows that the augmentation term (13) is the most efficient choice (for a given weight function $\rho$ and known sampling weights) when the full covariate estimating equation is derived from (2); see, for example, Tsiatis (2006, Ch. 7.4). Under an incorrectly specified model for $W_{\text{miss},i}\mid U_{0i}$, the estimator $\hat\beta_{ALR}$ remains consistent and asymptotically normal but may have inflated variance. However, as long as the model for the conditional expectation is a reasonable approximation, the above results suggest that the AIPW estimator can still be expected to be more efficient than the simple IPW estimator.

3.4. Extension of methods and results to estimated sampling weights

Extending the theory to the case where the sampling weights $\pi(U_{0i})$ are estimated is important for two main reasons: (i) even when $\pi(U_{0i})$ is known, estimating it can improve asymptotic efficiency (e.g., Breslow and Wellner, 2007; Robins et al., 1994); and (ii) when $\pi(U_{0i})$ is unknown, modeling and estimating the sampling weights allows the proposed methodology to be applied in settings where covariates are missing at random. Nan et al. (2009, Sec. 3.2) propose an IPW estimator of $\beta_0$ in this setting and derive its asymptotic properties.

As in Nan et al. (2009), let $\pi(U_{0i})$ be modeled using a finite-dimensional parametric model $\pi(U_{0i},\gamma)$, $\gamma\in A_0$, with true value $\gamma_0$. In a general two-phase sampling design, it can be assumed that $\pi(U_{0i})=\pi(U_{0i},\gamma_0)$. The IPW estimating function with estimated sampling weights is

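$$\hat\Psi_{HT}(\beta,\hat\rho,\hat\gamma;\pi)=\frac{1}{n}\sum_{i=1}^n\frac{\Delta_iR_i}{\pi(U_{0i},\hat\gamma)}\,\hat\rho(\varepsilon_i(\beta),\beta,\hat\gamma;\pi)\{W_i-\hat\zeta(\varepsilon_i(\beta),\beta,\hat\gamma;\pi)\},$$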

where $\hat\rho(u,\beta,\gamma;\pi)$ is $\hat\rho(u,\beta;\pi)$ with $\pi(U_{0i})$ replaced by $\pi(U_{0i},\gamma)$ for each $i$, and

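$$\hat\zeta(u,\beta,\gamma;\pi)=\frac{S^{(1)}(u\mid\beta,\gamma;\pi)}{S^{(0)}(u\mid\beta,\gamma;\pi)},\qquad S^{(j)}(u\mid\beta,\gamma;\pi)=\frac{1}{n}\sum_{i=1}^n\frac{R_i}{\pi(U_{0i},\gamma)}W_i^{j}Y_i(u\mid\beta),\quad j=0,1.$$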
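As a concrete illustration of $\hat\Psi_{HT}$, the following sketch evaluates the IPW linear-rank estimating function for either the Gehan weight ($\hat\rho=S^{(0)}$) or the Log-Rank weight ($\hat\rho=1$) using an $O(n^2)$ at-risk matrix. It assumes the usual residual $\varepsilon_i(\beta)=\log\tilde T_i-W_i^{\top}\beta$; the helper name and argument layout are hypothetical, and this is not the authors' implementation. With estimated weights one simply passes $\pi(U_{0i},\hat\gamma)$ as `pi`.

```python
import numpy as np

def ipw_linear_rank_score(beta, log_t, delta, w, r, pi, weight="gehan"):
    """Hypothetical helper: IPW linear-rank estimating function at beta.

    beta   : (p,) regression parameter
    log_t  : (n,) observed log times
    delta  : (n,) failure indicators
    w      : (n, p) covariates; rows with r == 0 need only finite placeholders
    r      : (n,) phase-two indicators R_i
    pi     : (n,) sampling probabilities (known, or pi(U_0i, gamma_hat))
    weight : "gehan" (rho = S^(0)) or "logrank" (rho = 1)
    """
    log_t, delta, r, pi = (np.asarray(x, dtype=float) for x in (log_t, delta, r, pi))
    w = np.asarray(w, dtype=float)
    n = len(log_t)
    a = r / pi                                             # R_i / pi_i
    e = log_t - w @ np.asarray(beta, dtype=float)          # residuals eps_i(beta)
    at_risk = (e[None, :] >= e[:, None]).astype(float)     # at_risk[i, j] = Y_j(e_i | beta)
    s0 = at_risk @ a / n                                   # S^(0)(e_i)
    s1 = at_risk @ (a[:, None] * w) / n                    # S^(1)(e_i), shape (n, p)
    zeta = np.divide(s1, s0[:, None], out=np.zeros_like(s1), where=s0[:, None] > 0)
    rho = s0 if weight == "gehan" else np.ones(n)
    return (delta * a * rho) @ (w - zeta) / n
```

For the Gehan weight the result equals the pairwise form $n^{-2}\sum_i\sum_j\Delta_ia_ia_j(W_i-W_j)I\{\varepsilon_j(\beta)\ge\varepsilon_i(\beta)\}$ with $a_i=R_i/\pi(U_{0i})$, which gives a convenient correctness check.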

Similarly to before, define the cumulative hazard estimator

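$$\hat\Lambda(s\mid\beta,\gamma;\pi)=\frac{1}{n}\sum_{i=1}^n\frac{R_i}{\pi(U_{0i},\gamma)}\int_{-\infty}^{s}\frac{dN_i(u\mid\beta)}{S^{(0)}(u\mid\beta,\gamma;\pi)}.$$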
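Because each $N_i(\cdot\mid\beta)$ jumps only at $\varepsilon_i(\beta)$ when $\Delta_i=1$, the estimator above is a weighted Nelson–Aalen estimator on the residual scale: each sampled failure contributes a jump of $R_i/\{\pi_i\,nS^{(0)}(\varepsilon_i(\beta))\}$. A minimal sketch under that reading, with a hypothetical helper name and residuals supplied by the caller:

```python
import numpy as np

def ipw_cumulative_hazard(s, e, delta, r, pi):
    """Hypothetical helper: IPW cumulative hazard Lambda_hat(s | beta, gamma; pi)
    on the residual scale, evaluated at each point in s.

    e : (n,) residuals eps_i(beta); delta, r, pi : (n,) arrays as in the text.
    """
    e, delta, r, pi = (np.asarray(x, dtype=float) for x in (e, delta, r, pi))
    n = len(e)
    a = r / pi
    s0 = (e[None, :] >= e[:, None]).astype(float) @ a / n     # S^(0)(e_i)
    # Each sampled failure contributes a jump a_i / (n S^(0)(e_i)) at e_i.
    jump = np.divide(delta * a, n * s0, out=np.zeros(n), where=s0 > 0)
    s = np.atleast_1d(np.asarray(s, dtype=float))
    return np.array([jump[e <= si].sum() for si in s])
```

With unit weights (all $R_i=1$ and $\pi\equiv 1$) this reduces to the ordinary Nelson–Aalen estimator computed from the residuals.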

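Let $\bar H(u\mid\beta,\alpha,\gamma;\pi)=\{\bar S^{(1)}(u\mid\beta,\alpha,\gamma;\pi)-\bar S^{(1)}(u\mid\beta,\alpha)\}-\hat\zeta(u,\beta,\gamma;\pi)\{\bar S^{(0)}(u\mid\beta,\alpha,\gamma;\pi)-\bar S^{(0)}(u\mid\beta,\alpha)\}$, where we have defined $\bar S^{(j)}(u\mid\beta,\alpha,\gamma;\pi)=n^{-1}\sum_{i=1}^n\frac{R_i}{\pi(U_{0i},\gamma)}E_i^{(j)}[u,\beta,\alpha]$, $j=0,1$.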

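Suppose $\hat\gamma$ estimates $\gamma_0$, and let $\hat\beta_{HT,\hat\gamma}$ be any "solution" that satisfies $\hat\Psi_{HT}(\hat\beta_{HT,\hat\gamma},\hat\rho,\hat\gamma;\pi)=o_P(n^{-1/2})$. Replacing the known sampling weights $\pi(U_{0i})=\pi(U_{0i},\gamma_0)$ in (13) with $\pi(U_{0i},\hat\gamma)$, we obtain the estimated augmentation term $\bar\chi(\beta,\hat\rho,\hat\alpha,\hat\gamma;\pi)=n^{-1}\sum_{i=1}^n\chi_i(\beta,\hat\rho,\hat\alpha,\hat\gamma;\pi)$, where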

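$$\chi_i(\beta,\hat\rho,\hat\alpha,\hat\gamma;\pi)=\frac{\Delta_iR_i}{\pi(U_{0i},\hat\gamma)}\,\frac{\hat\rho(\varepsilon_i(\beta),\beta,\hat\gamma;\pi)\,\bar H(\varepsilon_i(\beta)\mid\beta,\hat\alpha,\hat\gamma;\pi)}{S^{(0)}(\varepsilon_i(\beta)\mid\beta,\hat\gamma;\pi)}-\Delta_i\frac{R_i-\pi(U_{0i},\hat\gamma)}{\pi(U_{0i},\hat\gamma)}\,E_i[\hat\rho,\hat\zeta,\beta,\hat\alpha].$$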
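To make the structure of $\chi_i$ concrete, the sketch below evaluates $\bar\chi$ at a fixed $\beta$, assuming the modeled conditional expectations have already been computed (e.g., via (18)) and evaluated at every residual; the array layout is a hypothetical choice made here. It also assumes, as the notation suggests, that $\bar S^{(j)}(u\mid\beta,\alpha)=n^{-1}\sum_kE_k^{(j)}[u,\beta,\alpha]$, so that each difference in $\bar H$ equals $n^{-1}\sum_k(R_k/\pi_k-1)E_k^{(j)}[u,\beta,\alpha]$.

```python
import numpy as np

def augmentation_term(delta, r, pi, rho, zeta, s0, e0, e1, cond_score):
    """Hypothetical helper evaluating chi_bar = n^{-1} sum_i chi_i at a fixed beta.

    delta, r, pi : (n,) failure indicators, phase-two indicators, sampling probabilities
    rho, s0      : (n,) rho_hat(e_i) and S^(0)(e_i) at the residuals e_i = eps_i(beta)
    zeta         : (n, p) zeta_hat(e_i)
    e0           : (n, n) with e0[k, i] = modeled E[Y_k(e_i | beta) | U_0k]        (E_k^(0))
    e1           : (n, n, p) with e1[k, i] = modeled E[W_k Y_k(e_i | beta) | U_0k] (E_k^(1))
    cond_score   : (n, p) modeled E[rho(eps_k)(W_k - zeta(eps_k)) | U_0k]         (E_k[...])
    """
    delta, r, pi, rho, s0 = (np.asarray(x, dtype=float) for x in (delta, r, pi, rho, s0))
    zeta, e0, e1, cond_score = (np.asarray(x, dtype=float) for x in (zeta, e0, e1, cond_score))
    n = len(delta)
    a_minus_1 = r / pi - 1.0
    d0 = a_minus_1 @ e0 / n                           # Sbar^(0)(.; pi) - Sbar^(0) at each e_i
    d1 = np.einsum("k,kip->ip", a_minus_1, e1) / n    # Sbar^(1)(.; pi) - Sbar^(1) at each e_i
    h_bar = d1 - zeta * d0[:, None]                   # Hbar(e_i)
    coef = np.divide(delta * r / pi * rho, s0, out=np.zeros(n), where=s0 > 0)
    chi = coef[:, None] * h_bar - (delta * (r - pi) / pi)[:, None] * cond_score
    return chi.mean(axis=0)
```

Two quick sanity checks follow from the formula: with full phase-two sampling and $\pi\equiv 1$ every contribution vanishes, and with zero modeled $E_k^{(j)}$ only the second term of $\chi_i$ remains.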

Define the AIPW estimating function with estimated sampling weights as

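$$\hat\Psi_{ALR}(\beta,\hat\rho,\hat\alpha,\hat\gamma;\pi)=\hat\Psi_{HT}(\beta,\hat\rho,\hat\gamma;\pi)+\bar\chi(\beta,\hat\rho,\hat\alpha,\hat\gamma;\pi). \tag{16}$$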

Now define $\hat\beta_{ALR,\hat\gamma}$ as any "solution" that satisfies $\hat\Psi_{ALR}(\hat\beta_{ALR,\hat\gamma},\hat\rho,\hat\alpha,\hat\gamma;\pi)=o_P(n^{-1/2})$. As in Nan et al. (2009), we focus on the Gehan weight when deriving the asymptotic properties of $\hat\beta_{ALR,\hat\gamma}$, which are given in the following theorem. With the exception of Conditions (C.6)-(C.9), the conditions imposed are the same as in Nan et al. (2009, Sec. 3.2).

Theorem 3.3. Suppose that, with probability one, $\pi(U_0,\gamma)$ is uniformly bounded away from zero for $\gamma\in A_0$, where $A_0$ is compact, and that $\pi(U_0,\gamma)$ is twice differentiable with respect to $\gamma\in A_0$ with continuous and bounded derivatives. Suppose that $\hat\gamma$ is an asymptotically efficient and $\sqrt{n}$-consistent estimator of $\gamma_0$ with a bounded influence function at $\gamma_0$. Then, under the same conditions as Theorem 3.2, the estimator $\hat\beta_{ALR,\hat\gamma}$ calculated using the Gehan weight has the following properties:

  1. $\sqrt{n}\,\|\hat\beta_{ALR,\hat\gamma}-\beta_0\|=O_p(1)$ (i.e., $\sqrt{n}$-consistency);

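  2. $\sqrt{n}(\hat\beta_{ALR,\hat\gamma}-\beta_0)$ is asymptotically normal with variance $\Gamma_0-\dot\Psi_\beta(\beta_0,\rho_0)^{-1}BV_0B^{\top}\{\dot\Psi_\beta(\beta_0,\rho_0)^{-1}\}^{\top}$, where $\Gamma_0$ is the asymptotic variance of $\sqrt{n}(\hat\beta_{ALR}-\beta_0)$ given in Theorem 3.2 and $V_0$ is the asymptotic variance of $\sqrt{n}(\hat\gamma-\gamma_0)$. Here,
    $$B=E[\rho_0(\varepsilon(\beta_0),\beta_0)A_2(\varepsilon(\beta_0),\beta_0)\Delta]-E[\rho_0(\varepsilon(\beta_0),\beta_0)\{W-\zeta_0(\varepsilon(\beta_0),\beta_0)\}\{\dot\pi(U_0,\gamma_0)\}^{\top}\Delta],$$
    with $\dot\pi(U_0,\gamma_0)$ being the derivative of $\pi(U_0,\gamma)$ with respect to $\gamma$ evaluated at $\gamma=\gamma_0$, and
    $$A_2(t,\beta_0)=E[I\{\varepsilon(\beta_0)\ge t\}\,W\{\dot\pi(U_0,\gamma_0)\}^{\top}]-\zeta_0(\varepsilon(\beta_0),\beta_0)\,E[\{\dot\pi(U_0,\gamma_0)\}^{\top}I\{\varepsilon(\beta_0)\ge t\}]\,\rho_0(\varepsilon(\beta_0),\beta_0).$$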
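The variance in part 2 is $\Gamma_0$ minus a term of the form $MV_0M^{\top}$ with $M=\dot\Psi_\beta(\beta_0,\rho_0)^{-1}B$, which is positive semidefinite whenever $V_0$ is. A minimal sketch, assuming plug-in estimates of $\Gamma_0$, $\dot\Psi_\beta$, $B$ and $V_0$ are available (the helper name and the numbers in the check are hypothetical):

```python
import numpy as np

def variance_estimated_weights(gamma0, psi_dot, b, v0):
    """Hypothetical helper: Gamma_0 - Psi_dot^{-1} B V_0 B^T Psi_dot^{-T}.

    gamma0  : (p, p) asymptotic variance with known weights (Theorem 3.2)
    psi_dot : (p, p) derivative of Psi(beta, rho_0) at beta_0
    b       : (p, q) matrix B; v0 : (q, q) asymptotic variance of sqrt(n)(gamma_hat - gamma_0)
    """
    m = np.linalg.solve(np.asarray(psi_dot, float), np.asarray(b, float))   # Psi_dot^{-1} B
    return np.asarray(gamma0, float) - m @ np.asarray(v0, float) @ m.T
```

Because the subtracted matrix is positive semidefinite, no diagonal entry of the result can exceed the corresponding entry of $\Gamma_0$.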

The proof of Theorem 3.3 is given in Supplementary Web Appendix S.6.2. As in Nan et al. (2009), the proof can be adjusted in a straightforward manner to handle the Log-Rank weight, the only change in the theorem statement being the form of $A_2(t,\beta_0)$. The asymptotic variance of $\hat\beta_{ALR,\hat\gamma}$ depends on the augmentation term only through $\Gamma_0$, the asymptotic variance of $\hat\beta_{ALR}$. Because the subtracted term $\dot\Psi_\beta^{-1}BV_0B^{\top}\dot\Psi_\beta^{-\top}$ is positive semidefinite, the above result shows that efficiency improvements can be expected for both $\hat\beta_{ALR}$ and $\hat\beta_{HT}$ when the sampling weights are estimated using the observed data. The results in Section 3.3 further imply that if the models used in calculating $\bar\chi(\beta,\hat\rho,\hat\alpha,\hat\gamma;\pi)$ are correctly specified, then this augmentation term is the asymptotically efficient choice (i.e., for the given choice of full covariate estimating equation).

3.5. Augmentation and double robustness

For a general two-phase design, and more generally in settings where covariates are subject to monotone missingness under a known mechanism, Theorem 3.2 shows that the AIPW estimator $\hat\beta_{ALR}$ is consistent even if the missing covariate model used to derive the augmentation term is misspecified. In such cases one retains consistency and can still hope for improved efficiency. Provided that the model for the sampling weights is not only correctly specified but also efficiently estimated, Theorem 3.3 demonstrates that $\hat\beta_{ALR,\hat\gamma}$ is asymptotically at least as efficient as $\hat\beta_{ALR}$.

In settings where the missingness mechanism is unknown (e.g., not determined by design) and the sampling weights must be modeled and estimated, the AIPW estimating function may be considered doubly robust if it consistently estimates zero when at least one of the models for $\pi(U_{0i},\gamma)$ and $W_{\text{miss},i}\mid U_{0i}$ (but not necessarily both) is correctly specified (e.g., Scharfstein et al., 1999). It is shown in Section S.2 of the Supplementary Web Appendix that the idealized AIPW estimating function given as the sum of (8) and (9) is doubly robust in this sense. Importantly, these calculations assume that (7) has mean zero at $\beta=\beta_0$; this assumption holds provided that model (1), and hence $\zeta_0(\cdot,\beta)$ and $\Lambda(\cdot\mid\beta)$, are correctly specified.

In practice, each of the functions $\rho_0(\cdot,\beta)$, $\zeta_0(\cdot,\beta)$, $\Lambda(\cdot\mid\beta)$, $E[W_1^{j}Y_1(u\mid\beta)\mid U_{01},\alpha]$, $j=0,1$, and $E[\rho_0(\varepsilon_1(\beta),\beta)\{W_1-\zeta_0(\varepsilon_1(\beta),\beta)\}\mid U_{01},\alpha]$ must be modeled and/or estimated. The results of Section 3.4 show that $\hat\beta_{ALR,\hat\gamma}$ is consistent for $\beta_0$ provided that the model for the sampling weights is correctly specified (i.e., $\pi(U_{0i},\gamma_0)=\pi(U_{0i})$ for every $i$). Under this condition, the estimators used for $\zeta_0(\cdot,\beta)$ and $\Lambda(\cdot\mid\beta)$ in constructing $\hat\Psi_{ALR}(\beta,\hat\rho,\hat\alpha,\hat\gamma;\pi)$ are consistent. These results do not additionally require the models used for $E[W_1^{j}Y_1(u\mid\beta)\mid U_{01}]$, $j=0,1$, and $E[\rho_0(\varepsilon_1(\beta),\beta)\{W_1-\zeta_0(\varepsilon_1(\beta),\beta)\}\mid U_{01}]$ to be correctly specified; moreover, beyond general convergence conditions, neither $\rho_0(\cdot,\beta_0)$ nor the way it is estimated affects consistency. For the same reason, $\hat\beta_{ALR,\hat\gamma}$ is not expected to be consistent when $\pi(U_{0i},\gamma)$ is misspecified. Consistency and double robustness fail here mainly because the estimators used for $\zeta_0(\cdot,\beta)$ and $\Lambda(\cdot\mid\beta)$ rely on the IPW estimators $S^{(j)}(\cdot\mid\beta,\hat\gamma;\pi)$, $j=0,1$, which typically will not be consistent for $E[W_1^{j}Y_1(u\mid\beta)]$, $j=0,1$, when $\pi(U_{0i},\gamma)$ is misspecified.

The AIPW estimating function (16) is derived as a plug-in estimator of the sum of (8) and (9); its form in Section 3.4 arises from using the IPW estimators $S^{(j)}(\cdot\mid\beta,\gamma;\pi)$, $j=0,1$, to construct estimators of $\zeta_0(\cdot,\beta)$ and $\Lambda(\cdot\mid\beta)$. In view of the arguments above, preserving double robustness requires consistent estimators of $\zeta_0(\cdot,\beta)$ and $\Lambda(\cdot\mid\beta)$. In Supplementary Web Appendix S.2, we give conditions on the various plug-in estimators that ensure double robustness, though further regularity conditions are needed to make these statements fully rigorous. The required conditions depend on the plug-in estimators used, each of which relates the behavior of observed data estimators to quantities whose calculation requires knowledge of the true relationship between $W_{\text{miss}}$ and $U_0$. Because the IPW estimators $S^{(j)}(\cdot\mid\beta,\gamma;\pi)$, $j=0,1$, are enough to ensure consistency in Conditions 2 and 3 when $\hat\gamma$ is consistent for $\gamma_0$ (hence $\pi(U_{0i},\gamma)$ is correctly specified), replacing these IPW estimators by suitable AIPW estimators should be sufficient to achieve double robustness. In particular, doubly robust estimation is expected to be feasible when $W$ is of low dimension or has enough structure for nonparametric estimation to be practical. Examples of approaches developed for settings where (1) is replaced by a Cox regression model include Nan (2004), Qi et al. (2005) and Luo et al. (2009); examples that impose more parametric modeling assumptions include Wang and Chen (2001) and Xu et al. (2009).

4. Calculating $\hat\beta_{ALR}$ and $\hat\beta_{ALR,\hat\gamma}$ in practice

As in the case where covariates are fully observed, the IPW estimating function (3) is neither differentiable nor necessarily monotone. Consequently, many of the issues surrounding estimation in the full covariate setting also arise when using (3) and, by extension, the augmented estimating functions (14) and (16). For example, the discontinuity and lack of monotonicity of these estimating functions make standard optimization procedures difficult to use. Fygenson and Ritov (1994) showed that the full covariate estimating function is monotone when the weight function is the Gehan weight. In this same setting, Johnson and Strawderman (2009) establish the properties of estimators obtained from a smoothing procedure designed to deal with the discontinuity of (2). Jin et al. (2003) proposed calculating the full covariate linear-rank estimator for a general weight function by solving a sequence of Gehan-type monotone estimating equations. In Supplementary Web Appendix S.3, we show how to extend these iterative procedures to AIPW estimating equations by introducing a related sequence of continuous and monotone estimating equations.
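The monotonicity of the Gehan-weighted function has a convenient consequence, which also underlies the linear-programming approach of Jin et al. (2003): the weighted Gehan estimating function $n^{-2}\sum_i\sum_j\Delta_ia_ia_j(W_i-W_j)I\{\varepsilon_j(\beta)\ge\varepsilon_i(\beta)\}$, with $a_i=R_i/\pi(U_{0i})$, is (up to ties) the gradient of the convex, piecewise-linear objective $n^{-2}\sum_i\sum_j\Delta_ia_ia_j\{\varepsilon_j(\beta)-\varepsilon_i(\beta)\}^{+}$. The sketch below is a minimal illustration only: it minimizes this objective with SciPy's generic Nelder–Mead routine rather than the iterative procedure of Supplementary Web Appendix S.3, and the helper names and simulated data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def ipw_gehan_loss(beta, log_t, delta, w, a):
    """Convex objective whose (sub)gradient is the weighted Gehan estimating function."""
    e = log_t - w @ beta                              # residuals eps_i(beta)
    diff = e[None, :] - e[:, None]                    # diff[i, j] = e_j - e_i
    pair_w = (delta * a)[:, None] * a[None, :]        # Delta_i a_i a_j
    return np.sum(pair_w * np.maximum(diff, 0.0)) / len(e) ** 2

def fit_ipw_gehan(log_t, delta, w, a, beta_start):
    res = minimize(ipw_gehan_loss, beta_start, args=(log_t, delta, w, a),
                   method="Nelder-Mead",
                   options={"xatol": 1e-6, "fatol": 1e-10, "maxiter": 5000})
    return res.x

# Hypothetical full-cohort data (a_i = 1) from log T = W1 + W2 + eps with exponential censoring.
rng = np.random.default_rng(5)
n = 400
w = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
log_true = w @ np.array([1.0, 1.0]) + rng.standard_normal(n)
log_c = np.log(rng.exponential(5.0, size=n))
log_t = np.minimum(log_true, log_c)
delta = (log_true <= log_c).astype(float)
a = np.ones(n)
beta_start = np.linalg.lstsq(w, log_t, rcond=None)[0]    # crude starting value
beta_hat = fit_ipw_gehan(log_t, delta, w, a, beta_start)
print(beta_hat)
```

For a case-cohort sample one would pass $a_i=R_i/\pi(U_{0i})$ instead of ones; the objective stays convex because all pair weights are non-negative.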

Calculation of the AIPW estimator further relies on modeling and estimating the conditional expectations $E[W_i^{j}Y_i(u\mid\beta)\mid U_{0i}]$, $j=0,1$, and $E[\rho_0(\varepsilon_i(\beta),\beta)\{W_i-\zeta_0(\varepsilon_i(\beta),\beta)\}\mid U_{0i}]$; see (10) and (11). The following theorem gives a general analytic expression that can be used to calculate conditional expectations of this form. For simplicity of presentation, the conditional distribution $[W_{\text{miss}}\mid W_{\text{obs}}]$ is assumed to have a density (i.e., $W_{\text{miss},i}$ is conditionally continuous); the case where $W_{\text{miss},i}$ is discrete is handled in a nearly identical fashion.

Theorem 4.1. Let $(\tilde T,\Delta,W)$ be a generic observation. Define $\bar L(s\mid W)=P(T>s\mid W)$ and $\bar G(s\mid W)=P(C>s\mid W)$ for $s\ge 0$. Let $\ell(\cdot\mid W)$ and $g(\cdot\mid W)$ be the corresponding conditional density functions, each of which is assumed to be continuous and bounded for $s\ge 0$. Define $W[w]=(W_{\text{obs}},W_{\text{miss}}=w)$ and let $k_2(w\mid W_{\text{obs}})$ denote the conditional density function of $W_{\text{miss}}$ given $W_{\text{obs}}$. Then, for a function $\eta(W,\Delta,\tilde T)$ such that $E[\eta(W,\Delta,\tilde T)\mid\tilde T,\Delta,W_{\text{obs}}]$ is well defined, we have

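$$E[\eta(W,\Delta,\tilde T)\mid\tilde T,\Delta,W_{\text{obs}}]=\Delta\,\frac{\int\eta(W[w],\Delta,\tilde T)\,\bar G(\tilde T\mid W[w])\,\ell(\tilde T\mid W[w])\,k_2(w\mid W_{\text{obs}})\,dw}{\int\bar G(\tilde T\mid W[w])\,\ell(\tilde T\mid W[w])\,k_2(w\mid W_{\text{obs}})\,dw}+(1-\Delta)\,\frac{\int\eta(W[w],\Delta,\tilde T)\,\bar L(\tilde T\mid W[w])\,g(\tilde T\mid W[w])\,k_2(w\mid W_{\text{obs}})\,dw}{\int\bar L(\tilde T\mid W[w])\,g(\tilde T\mid W[w])\,k_2(w\mid W_{\text{obs}})\,dw}. \tag{17}$$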

A proof of Theorem 4.1 is given in Supplementary Web Appendix S.3.2. Equation (17) shows that estimating $E[\eta(W,\Delta,\tilde T)\mid\tilde T,\Delta,W_{\text{obs}}]$ requires (i) modeling the conditional distributions $[T\mid W]$, $[C\mid W]$ and $[W_{\text{miss}}\mid W_{\text{obs}}]$; and (ii) calculating or approximating the integrals involved. A useful feature of the proposed methodology is that the specification of the missing covariate model is confined to $[W_{\text{miss}}\mid W_{\text{obs}}]$. The dependence of (17) on $g(\cdot\mid W[w])$ vanishes if $g(s\mid W[w])=g(s\mid W_{\text{obs}})$ for every $s$ and $w$; see Rathouz (2007) and references therein for a discussion of the implications of such an assumption in general missing data problems.
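When $W_{\text{miss}}$ is binary, as for central histology in Section 6, the integrals in (17) become two-term sums. The sketch below implements that discrete analogue under hypothetical parametric choices made only for illustration: an exponential failure time (an AFT model with extreme-value errors) and exponential censoring that does not depend on $W$, so that $\bar G$ and $g$ cancel as noted above. The helper and model names are assumptions, not the paper's code.

```python
import math

def cond_expectation_binary(eta, t, delta, p1, surv_t, dens_t, surv_c, dens_c):
    """Discrete analogue of (17) for a binary W_miss (hypothetical helper).

    eta(w_miss)          : value of eta(W[w], Delta, T~) for w_miss in {0, 1}
    p1                   : P(W_miss = 1 | W_obs)
    surv_t, dens_t(t, w) : Lbar(t | W[w]) and l(t | W[w])
    surv_c, dens_c(t, w) : Gbar(t | W[w]) and g(t | W[w])
    """
    num = den = 0.0
    for w_miss, prob in ((0, 1.0 - p1), (1, p1)):
        if delta == 1:
            lik = dens_t(t, w_miss) * surv_c(t, w_miss)   # failure observed at t
        else:
            lik = surv_t(t, w_miss) * dens_c(t, w_miss)   # censored at t
        num += eta(w_miss) * lik * prob
        den += lik * prob
    return num / den

# Hypothetical models for one subject with W_obs = 0.3.
W_OBS = 0.3

def rate_t(w_miss):
    return math.exp(-(0.4 * W_OBS + 1.0 * w_miss))

def surv_t(t, w_miss):
    return math.exp(-rate_t(w_miss) * t)

def dens_t(t, w_miss):
    return rate_t(w_miss) * surv_t(t, w_miss)

def surv_c(t, w_miss):
    return math.exp(-0.3 * t)          # censoring independent of W

def dens_c(t, w_miss):
    return 0.3 * surv_c(t, w_miss)

# eta = W_miss gives P(W_miss = 1 | T~, Delta, W_obs) when P(W_miss = 1 | W_obs) = 0.4.
p_fail = cond_expectation_binary(lambda w: w, 1.2, 1, 0.4, surv_t, dens_t, surv_c, dens_c)
p_cens = cond_expectation_binary(lambda w: w, 1.2, 0, 0.4, surv_t, dens_t, surv_c, dens_c)
```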

The expectation (17) depends on the specification of $k_2(\cdot\mid W_{\text{obs}})$, $\ell(\cdot\mid W)$ and $g(\cdot\mid W)$. Suppose momentarily that these densities are all available; then (17) can be calculated or approximated using an appropriate method of integration. In particular, a Monte Carlo approximation can be used whenever a procedure is available to simulate observations of $W_{\text{miss}}$ from the conditional density $k_2(\cdot\mid W_{\text{obs}})$. Let $W_{\text{miss},i}^{(1)},\ldots,W_{\text{miss},i}^{(M)}$ be values simulated from $[W_{\text{miss}}\mid W_{\text{obs}}=W_{\text{obs},i}]$ for a given $M$. Define $W_i^{(m)}=(W_{\text{obs},i},W_{\text{miss},i}^{(m)})$ and let $\eta_i^{(m)}=\eta(W_i^{(m)},\Delta_i,\tilde T_i)$. Then $E[\eta(W_i,\Delta_i,\tilde T_i)\mid\tilde T_i,\Delta_i,W_{\text{obs},i}]$ in (17) can be approximated for each $i=1,\ldots,n$ using

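$$\frac{\sum_{m=1}^M\Delta_i\,\eta_i^{(m)}\,\ell(\tilde T_i\mid W_i^{(m)})\,\bar G(\tilde T_i\mid W_i^{(m)})}{\sum_{m=1}^M\ell(\tilde T_i\mid W_i^{(m)})\,\bar G(\tilde T_i\mid W_i^{(m)})}+\frac{\sum_{m=1}^M(1-\Delta_i)\,\eta_i^{(m)}\,\bar L(\tilde T_i\mid W_i^{(m)})\,g(\tilde T_i\mid W_i^{(m)})}{\sum_{m=1}^M\bar L(\tilde T_i\mid W_i^{(m)})\,g(\tilde T_i\mid W_i^{(m)})}. \tag{18}$$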
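Approximation (18) is a self-normalized weighted average of the $\eta_i^{(m)}$, with weights $\ell\bar G$ for failures and $\bar Lg$ for censored subjects. A minimal sketch for a single subject follows; the usage lines assume a hypothetical model ($W_{\text{miss}}\mid W_{\text{obs}}\sim N(0.5W_{\text{obs}},0.75)$, exponential $T\mid W$ with rate $\exp\{-(W_{\text{obs}}+W_{\text{miss}})\}$, and censoring independent of $W$ so that $\bar G$ and $g$ can be set to 1).

```python
import numpy as np

def mc_conditional_expectation(eta_vals, delta, l_vals, gbar_vals, lbar_vals, g_vals):
    """Approximation (18) for one subject from M simulated draws W_i^(m).

    eta_vals : (M,) or (M, p) values eta_i^(m)
    l_vals, gbar_vals, lbar_vals, g_vals : (M,) values of l, Gbar, Lbar, g at (T~_i, W_i^(m))
    """
    if delta == 1:
        wts = np.asarray(l_vals, float) * np.asarray(gbar_vals, float)
    else:
        wts = np.asarray(lbar_vals, float) * np.asarray(g_vals, float)
    return np.tensordot(wts, np.asarray(eta_vals, float), axes=1) / wts.sum()

# Hypothetical subject: W_obs = 0.3, observed failure at T~ = 1.2, eta = W_miss.
rng = np.random.default_rng(3)
w_obs, t_obs = 0.3, 1.2
draws = rng.normal(0.5 * w_obs, np.sqrt(0.75), size=200_000)    # W_miss^(m) ~ k_2(. | W_obs)
rate = np.exp(-(w_obs + draws))
ones = np.ones_like(draws)
post_mean = mc_conditional_expectation(draws, 1, rate * np.exp(-rate * t_obs), ones,
                                       np.exp(-rate * t_obs), ones)
```

Because the weights are normalized by their own sum, any constant factors in the densities (for example, normalizing constants) cancel.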

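The modeled expectations (10) and (11) needed for computing $\hat\beta_{ALR}$ and $\hat\beta_{ALR,\hat\gamma}$ each take the form (18), with the (potentially) infinite-dimensional parameter $\alpha$ indexing the models used for $[T\mid W]$, $[C\mid W]$ and $[W_{\text{miss}}\mid W_{\text{obs}}]$ (i.e., the densities $\ell(\cdot\mid\cdot)$, $g(\cdot\mid\cdot)$ and $k_2(\cdot\mid\cdot)$). Hence, the above provides a simple method for calculating (10) and (11). In particular, the expectation $E[W_i^{j}Y_i(u\mid\beta)\mid U_{0i},\alpha]$ in (10) can be estimated using (18) with $\eta_i^{(m)}=[W_i^{(m)}]^{j}Y_i^{(m)}(u\mid\beta)$, $j=0,1$. Similarly, $E[\rho_0(\varepsilon_i(\beta),\beta)\{W_i-\zeta_0(\varepsilon_i(\beta),\beta)\}\mid U_{0i},\alpha]$ in (11) can be estimated using (18) with $\eta_i^{(m)}=\hat\rho(\varepsilon_i^{(m)}(\beta),\beta)\{W_i^{(m)}-\hat\zeta(\varepsilon_i^{(m)}(\beta),\beta)\}$, where $\hat\rho(\cdot,\beta)$ and $\hat\zeta(\cdot,\beta)$ are estimates of $\rho_0(\cdot,\beta)$ and $\zeta_0(\cdot,\beta)$ derived from the observed data.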

Under a general two-phase sampling design, the asymptotic results in Section 3.3 are robust to misspecification of these models, which provides considerable flexibility. Model (1) implies a particular functional form for $\ell(\cdot\mid W)$, hence $\bar L(\cdot\mid W)$; in place of a semiparametric model or kernel-based estimation procedure, one can use a flexible parametric AFT model, depending on a finite-dimensional parameter $\alpha_1$, for the purpose of estimating the augmentation term. Since no functional form for the density $g(\cdot\mid W)$ is assumed anywhere in our developments, a flexible parametric model indexed by a second finite-dimensional parameter $\alpha_2$ can also be used here. The R package flexsurv (Jackson, 2014) provides considerable flexibility in this regard and is used to model both densities in the simulation study and data analysis of Sections 5 and 6, respectively, with $\alpha_1$ and $\alpha_2$ each estimated by maximum (IPW) pseudolikelihood. The literature on modeling missing data is enormous, and the choice of model for $[W_{\text{miss}}\mid W_{\text{obs}}]$ (indexed, say, by $\alpha_3$) can also depend on the nature of $W_{\text{miss}}$. Sections 5 and 6 discuss several examples, with estimation of the model parameters again incorporating sampling weights. As noted in Section 3.5, the use of IPW estimators requires correct specification of the sampling weights in order to achieve consistency.

5. Simulations

Simulations were carried out to evaluate the finite sample relative efficiency of the various AIPW and IPW estimators. As in Nan et al. (2009), we focus on a case-cohort design in which the second phase of sampling includes all failures and a stratified independent Bernoulli sample of non-failures. In Section S.4.3 of the Supplementary Web Appendix, we provide additional simulation results for which the missingness mechanism is not known and must be modeled.

Failure times are simulated from the model $\log T=\beta_1W_{\text{obs}}+\beta_2W_{\text{miss}}+\epsilon$, where $W_{\text{obs}}$ and $W_{\text{miss}}$ are standard normal with correlation 0.5 and the true regression coefficients are $(\beta_1,\beta_2)=(1,1)$. We use the six error distributions considered in Zeng and Lin (2007): standard normal; a light-tailed normal mixture $0.95N(0,1)+0.05N(0,9)$; a heavier-tailed normal mixture $0.5N(0,1)+0.5N(0,9)$; a standard extreme value distribution; and the Weibull(0.5,1) and Weibull(2,1) distributions. Here, the Weibull specifications refer to the distribution of $e^{\epsilon}$; all others refer to the distribution of $\epsilon$. The censoring distribution is exponential, with mean chosen to achieve a 70% censoring rate. No further truncation is used; the simulation results nevertheless show little bias, suggesting that one may let $\tau_1\to\infty$ in Theorem 3.2 (see also Ying, 1993). Results for a 30% censoring rate are given in Supplementary Web Appendix S.4.1.
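The censoring mean that yields a target censoring rate is not stated, but it can be found numerically. The sketch below is a hypothetical reconstruction for the standard normal error case only (other error distributions would replace `eps`): it simulates the stated model and bisects on the log of the exponential censoring mean. Reusing one seed gives common random numbers, so the simulated censoring rate is monotone in the mean and bisection is well behaved.

```python
import numpy as np

def simulate_aft(n, censor_mean, rng):
    """log T = W_obs + W_miss + eps (standard normal eps), corr(W_obs, W_miss) = 0.5,
    and exponential censoring with the given mean."""
    w = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
    eps = rng.standard_normal(n)
    t = np.exp(w.sum(axis=1) + eps)             # beta = (1, 1)
    c = rng.exponential(censor_mean, size=n)
    return w, np.minimum(t, c), (t <= c).astype(float)

def tune_censoring_mean(target, n=100_000, seed=0, lo=1e-3, hi=1e3, iters=50):
    """Bisection on the log scale for the censoring mean giving the target censoring rate."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        _, _, delta = simulate_aft(n, mid, np.random.default_rng(seed))
        if 1.0 - delta.mean() > target:         # too much censoring: increase the mean
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)
```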

As noted above, $W_{\text{miss}}$ is collected on all failures but only on a subsample of the non-failures. Stratified independent Bernoulli sampling is used to sample $W_{\text{miss}}$ from the non-failures within three strata: $\{W_{\text{obs}}<a_1\}$, $\{a_1\le W_{\text{obs}}\le a_2\}$ and $\{W_{\text{obs}}>a_2\}$, where $a_1$ and $a_2$ are the 25th and 75th sample quantiles of $W_{\text{obs}}$. Controls (i.e., censored subjects) are selected into the second phase sample with probability 0.4 in the two outer strata and 0.2 in the middle stratum, so that approximately 30% of controls have phase two covariate information. Under this sampling scheme, the overall sampling rate is approximately 51%.
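The 30% and 51% figures follow from simple arithmetic if, as an approximation, controls fall into the three strata in 25/50/25 proportions: $0.25(0.4)+0.5(0.2)+0.25(0.4)=0.30$ and $0.30+0.70\times0.30=0.51$. The sketch below reproduces that arithmetic and includes a hypothetical sampler for the stratified Bernoulli scheme just described.

```python
import numpy as np

# Back-of-envelope check, assuming 30% failures (all sampled) and controls spread
# over the low/middle/high W_obs strata in 25/50/25 proportions (an approximation).
share = np.array([0.25, 0.50, 0.25])
control_prob = np.array([0.4, 0.2, 0.4])
control_fraction = share @ control_prob          # 0.30
overall = 0.30 + 0.70 * control_fraction         # 0.51

def case_cohort_sample(delta, w_obs, rng):
    """Hypothetical sampler: all failures, plus controls with probability 0.4 in the
    outer quartile strata of W_obs and 0.2 in the middle stratum."""
    a1, a2 = np.quantile(w_obs, [0.25, 0.75])
    pi = np.where((w_obs < a1) | (w_obs > a2), 0.4, 0.2)
    pi = np.where(delta == 1, 1.0, pi)
    r = rng.binomial(1, pi)
    return r, pi
```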

The augmentation term in Section 4 requires specifying a model for the failure time distribution, a model for the censoring distribution and a method for simulating from $[W_{\text{miss}}\mid W_{\text{obs}}]$. The failure and censoring time distributions are each modeled using a parametric AFT model that assumes $e^{\epsilon}$ follows a generalized F distribution; pseudolikelihood estimators that incorporate sampling weights in a manner similar to Breslow and Wellner (2007) are fit using the flexsurv package. The parametric models are all correctly specified with the exception of the normal mixture error distributions for the failure time. The expectations (18) are computed using Monte Carlo simulation as described in the previous section. Observations from the conditional distribution $[W_{\text{miss}}\mid W_{\text{obs}}]$ were simulated in one of two ways: (i) from a conditional normal distribution whose parameters involving $W_{\text{miss}}$ are estimated using IPW estimators; and (ii) from a marginal normal distribution, assuming $W_{\text{miss}}\mid W_{\text{obs}}\sim N(\hat\mu,\hat\Sigma)$, where $\hat\mu$ and $\hat\Sigma$ are IPW estimators of the marginal mean and marginal variance of $W_{\text{miss}}$, respectively. The first is a correct model for the conditional distribution, while the second represents a setting where the conditional distribution is misspecified. For both models, $M=50$ observations are simulated; increasing the number of simulations to $M=200$ gave similar efficiency gains. In the simulations presented here, the estimator $\hat\alpha$ consists of three finite-dimensional estimators: one for each parametric AFT model and one for the parameters needed to estimate the conditional distribution $[W_{\text{miss}}\mid W_{\text{obs}}]$.

In a stratified case-cohort study, the maximum likelihood estimator of the within-stratum sampling probability is the proportion of subjects in each stratum selected into the subsample. The estimators considered in this study include $\hat\beta_{HT}$ and $\hat\beta_{ALR}$, each using the known sampling weights. In addition, we include $\hat\beta_{HT,\hat\gamma}$, the solution to $\hat\Psi_{HT}(\beta,\hat\rho,\hat\gamma;\pi)=0$, and $\hat\beta_{ALR,\hat\gamma}$, the solution to $\hat\Psi_{ALR}(\beta,\hat\rho,\hat\alpha,\hat\gamma;\pi)=0$, where $\hat\gamma$ is the maximum likelihood estimator of the parameter indexing a correctly specified parametric model $\pi(U_{0i},\gamma)$.
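A minimal sketch of these estimated sampling probabilities, assuming (as in the design above) that failures are sampled with probability one, so that only the control strata require estimation; the helper name is hypothetical.

```python
import numpy as np

def estimate_stratum_probs(r, delta, stratum):
    """Estimated sampling probabilities for a stratified case-cohort design:
    failures keep probability 1; each control receives the proportion of controls
    in its stratum that were selected into the phase-two sample."""
    r, delta, stratum = np.asarray(r), np.asarray(delta), np.asarray(stratum)
    pi_hat = np.ones(len(r), dtype=float)
    controls = delta == 0
    for s in np.unique(stratum[controls]):
        idx = controls & (stratum == s)
        pi_hat[idx] = r[idx].mean()
    return pi_hat
```

Plugging `pi_hat` in place of the known probabilities gives the weights used by $\hat\beta_{HT,\hat\gamma}$ and $\hat\beta_{ALR,\hat\gamma}$.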

For each of the 12 simulation settings described, we ran 500 simulations with a sample size of 1000. For each setting, the same dataset was used to obtain $\hat\beta_{HT}$, $\hat\beta_{HT,\hat\gamma}$, $\hat\beta_{ALR}$ and $\hat\beta_{ALR,\hat\gamma}$. The weight functions $\hat\rho(\cdot,\cdot;\pi)$ used were the Log-Rank weight and the Gehan weight. Inverse numerical differentiation (Huang, 2002) was used to calculate the variance of the AIPW estimators. All estimators of the regression coefficients are observed to be approximately unbiased. For the Gehan weight, all variance estimators are approximately unbiased and achieve the desired 95% coverage; similar results hold for the Log-Rank weight, except for occasional underestimation of the empirical variance when the error distribution is either Weibull(0.5,1) or extreme value. Figure 1 plots the relative efficiency of $\hat\beta_{ALR}$, $\hat\beta_{HT,\hat\gamma}$ and $\hat\beta_{ALR,\hat\gamma}$ versus $\hat\beta_{HT}$. Here, relative efficiency is defined as the ratio of the empirical variance of $\hat\beta_{HT}$ to the empirical variance of the estimator to which it is being compared, so that values exceeding 1.0 indicate reduced efficiency for $\hat\beta_{HT}$. Tables with average bias, empirical and average estimated standard errors, and coverage of 95% confidence intervals for $\hat\beta_{ALR}$ are provided in the Supplementary Web Appendix (Section S.4.2). Tables for $\hat\beta_{ALR,\hat\gamma}$ (not shown) are very similar.

Figure 1:

Relative efficiency of $\hat\beta_{ALR}$, $\hat\beta_{HT,\hat\gamma}$ and $\hat\beta_{ALR,\hat\gamma}$ compared to $\hat\beta_{HT}$ for both the Gehan and the Log-Rank weight with a 70% censoring rate; values exceeding 1.0 imply that $\hat\beta_{HT}$ is less efficient. The first row shows the results when using the correct model for the conditional expectations, and the second row corresponds to the misspecified model. The first column shows the efficiency gains for the phase one coefficient $\beta_1$, and the second column shows the efficiency gains for the phase two coefficient $\beta_2$. In the legend, Aug denotes an augmented estimator, EW denotes that estimated sampling probabilities were used, and Gehan and LR denote the Gehan and Log-Rank weights. The labels on the x-axis represent the six error distributions used: SN = N(0,1), MN = 0.95 N(0,1) + 0.05 N(0,9), HN = 0.5 N(0,1) + 0.5 N(0,9), W1 = Weibull(2,1), W2 = Weibull(0.5,1), EV = Extreme Value.

All four plots in Figure 1 show relative efficiency for the six different error distributions for both the Gehan and the Log-Rank weight. The upper left plot shows the results for the phase one covariate (i.e., for estimating β1) when [Wmiss|Wobs] is correctly specified and the bottom left plot when [Wmiss|Wobs] is misspecified. Similarly, the upper right and lower right plots show the relative efficiency for the phase two covariate (i.e., for estimating β2) when the model for the conditional distribution is correct (upper right) and misspecified (lower right).

In all simulation settings, $\hat\beta_{ALR}$ and $\hat\beta_{ALR,\hat\gamma}$ are observed to be more efficient than $\hat\beta_{HT}$ for both linear-rank weight functions when $[W_{\text{miss}}\mid W_{\text{obs}}]$ is correctly specified. When $[W_{\text{miss}}\mid W_{\text{obs}}]$ is misspecified, $\hat\beta_{ALR}$ and $\hat\beta_{ALR,\hat\gamma}$ are more efficient than $\hat\beta_{HT}$ for the phase one covariate in all simulation settings and for both weight functions. The estimator $\hat\beta_{HT}$ was observed to be (slightly) more efficient for the phase two coefficient when the conditional distribution is misspecified and the error distribution is Weibull(0.5,1). When compared to $\hat\beta_{HT,\hat\gamma}$, both $\hat\beta_{ALR}$ and $\hat\beta_{ALR,\hat\gamma}$ are more efficient in almost all of the settings considered and never do substantially worse. As expected from the simulation results in Nan et al. (2009), the efficiency gains from estimating the sampling weights in a stratified case-cohort study tend to be slight. This is supported by the results in Figure 1, where $\hat\beta_{HT,\hat\gamma}$ and $\hat\beta_{ALR,\hat\gamma}$ perform very similarly to $\hat\beta_{HT}$ and $\hat\beta_{ALR}$, respectively, for the phase two covariate and show at best modest efficiency gains for the phase one covariate (−1% to 13%).

As expected, correct specification of [Wmiss|Wobs] typically improves the efficiency of the AIPW estimators relative to when this distribution is misspecified. However, the efficiency price for misspecifying this model is not large; in fact, on some occasions the estimator using the misspecified conditional distribution is observed to be slightly more efficient. When comparing the empirical variances of β^ALR and β^ALR,γ^ for both linear-rank weight functions, the tables in Supplementary Web Appendix S.4.2 show that the Log-Rank weight is more efficient when the error distribution is Weibull (where it is in fact the asymptotically efficient choice in the full covariate setting) but the Gehan weight is more efficient for the other four error distributions, sometimes significantly so, with this trend being consistent across the variations of the other factors.

The results for the case of 30% censoring shown in Figure S.1 in Supplementary Web Appendix S.4.1 show similar trends. In Section S.4.3 of the Supplementary Web Appendix, the results of simulations where the sampling probabilities are unknown and need to be estimated are summarized. The results are generally similar to those presented in Figures 1 and S.1 with the exception that the efficiency gains for the phase one covariate Wobs tend to be greater in this setting compared to those observed under the case-cohort design.

6. Data Analysis: Wilms Tumor Study

To illustrate the performance of the proposed AIPW estimators in a practical setting, we consider data from the National Wilms’ Tumor Study Group (NWTS); see D’Angio et al. (1989). In the case-cohort setting, these data have previously been analyzed using a Cox regression model; see Kulich and Lin (2004) and Breslow et al. (2009).

The dataset consists of 3915 subjects followed until either death or disease progression. The censoring rate is 83%. Covariate information was collected on all subjects; among these variables is histology (favorable or unfavorable). Histological type was classified both by the registering institution (local histology) and by a more experienced pathologist at a central facility (central histology). Local and central histology are strongly related: the sensitivity and specificity of unfavorable local histology are 74% and 98%, respectively.

Collecting histology from the experienced pathologists was both expensive and time consuming. Because local histology is an excellent surrogate for central histology, an alternative to collecting central histology on all subjects is a case-cohort design in which, among the non-failures, central histology is collected only on a subset. Following Kulich and Lin (2004) and Breslow et al. (2009), who studied this problem assuming that failure times follow a Cox regression model, we simulate a case-cohort design in which central histology is assumed to have been assessed for only part of the cohort; all other covariate information is available on all subjects. We then study the performance of $\hat\beta_{HT}$, $\hat\beta_{HT,\hat\gamma}$, $\hat\beta_{ALR}$ and $\hat\beta_{ALR,\hat\gamma}$ for estimating $\beta_0$ under model (1). The stratified Bernoulli sampling scheme used here is the same as in Kulich and Lin (2004); Breslow et al. (2009) sample similarly but use a stratified finite population sampling scheme instead. In particular, all failures are sampled, and controls (i.e., non-failures) are sampled from sixteen strata based on local histology, stage of disease (stage 1–2 vs 3–4), age at diagnosis (≥ 1 year, < 1 year) and event-free survival. This resulted in a case-cohort sample of approximately 1329 subjects with full information, including central histology. Because the missing central histology covariate (CH) is binary, the expectations $E[W_i^{j}Y_i(u\mid\beta)\mid\tilde T_i,\Delta_i,W_{\text{obs},i}]$ needed for computing the augmentation term can be calculated analytically using a simple adaptation of the argument leading to (17) in Theorem 4.1. The required censoring and failure time densities are each modeled and estimated using an accelerated failure time model; here, the flexsurv package is used with a generalized gamma error distribution (i.e., on the failure time scale).
The required probabilities $P(\text{CH}=m\mid W_{\text{obs}})$, $m=0,1$, are estimated using the weighted logistic regression model from Kulich and Lin (2004), which includes as covariates local histology, an indicator $I$ that stage is greater than 2, the interaction of local histology and $I$, an indicator that age is greater than 10, and the study variable.

The final AFT model of interest includes central histology (CH), an indicator that stage is 3 or 4 (stage) and the centered tumor diameter in cm (diam). Because the effect of age on survival time was non-monotone, centered age was fit as a piecewise linear function with a break point at $1-\overline{\text{age}}$, where $\overline{\text{age}}$ is the mean age; in the model, age1 is the effect of centered age up to the break point and age2 is the effect after the break point. The interactions between stage and diam, and between CH and each of age1 and age2, were also included. This model includes the same covariates as the model fit in Kulich and Lin (2004), but differs in that both age and tumor diameter are centered.
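The exact coding of age1 and age2 is not given. One common coding of a continuous piecewise-linear effect, sketched below as an assumption (with age assumed to be recorded in years), uses $\text{age1}=\min(c,k)$ and $\text{age2}=\max(c-k,0)$, where $c$ is centered age and $k=1-\overline{\text{age}}$; the coefficient of age1 is then the slope before the break point and that of age2 the slope after it.

```python
import numpy as np

def piecewise_age(age):
    """Assumed coding of a continuous piecewise-linear age effect with the break
    point at age 1 year, expressed on the mean-centered scale."""
    age = np.asarray(age, dtype=float)
    centered = age - age.mean()
    knot = 1.0 - age.mean()                      # 1 - mean(age)
    age1 = np.minimum(centered, knot)            # varies only before the break point
    age2 = np.maximum(centered - knot, 0.0)      # varies only after the break point
    return age1, age2
```

By construction `age1 + age2` recovers centered age, so the fit is continuous at the break point.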

Because full covariate information is available, we can compare the proposed AIPW estimators to those obtained when no covariates are missing. As in Kulich and Lin (2004), to evaluate the efficiency of $\hat\beta_{ALR}$ and $\hat\beta_{ALR,\hat\gamma}$ relative to $\hat\beta_{HT}$ and $\hat\beta_{HT,\hat\gamma}$, respectively, we calculate the mean squared error centered at the full covariate estimator, which captures both the variance and the bias of each estimator relative to the full cohort estimator. The relative efficiency for the Gehan weight given in Table 1 is the mean squared error of $\hat\beta_{HT}$ divided by that of $\hat\beta_{ALR}$, based on 50 simulations; values greater than 1.0 indicate better performance of the augmented estimator. The relative efficiency of $\hat\beta_{ALR,\hat\gamma}$ is similarly defined as the mean squared error of $\hat\beta_{HT,\hat\gamma}$ divided by that of $\hat\beta_{ALR,\hat\gamma}$. The results for the Log-Rank weight are given in Supplementary Web Appendix S.5. Table 1 also reports the average regression coefficient and standard error estimates for $\hat\beta_{ALR}$, together with the corresponding estimates obtained using the full cohort. The estimated parameters and standard errors for $\hat\beta_{ALR,\hat\gamma}$ (not shown) are very similar to those reported for $\hat\beta_{ALR}$.
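A minimal sketch of this relative efficiency measure, assuming the estimates from the repeated case-cohort samples are stacked row-wise; the helper name is hypothetical.

```python
import numpy as np

def relative_efficiency(est_ref, est_new, full_cohort):
    """MSE(est_ref) / MSE(est_new), both centered at the full-cohort estimate.

    est_ref, est_new : (B, p) estimates over B simulated case-cohort samples
    full_cohort      : (p,) full-cohort (full covariate) estimate
    Values above 1 favor est_new.
    """
    est_ref, est_new, full_cohort = (np.asarray(x, dtype=float)
                                     for x in (est_ref, est_new, full_cohort))
    mse_ref = np.mean((est_ref - full_cohort) ** 2, axis=0)
    mse_new = np.mean((est_new - full_cohort) ** 2, axis=0)
    return mse_ref / mse_new
```

With `est_ref` holding the $\hat\beta_{HT}$ estimates and `est_new` the $\hat\beta_{ALR}$ estimates, this corresponds to the RE column described above.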

Table 1:

Analysis of the Wilms’ tumor data for the Gehan weight function. $\hat\beta_{ALR}$ and $\widehat{SD}(\hat\beta_{ALR})$ are the average regression coefficient and standard error estimates for $\hat\beta_{ALR}$. $\hat\beta_F$ and $\widehat{SD}(\hat\beta_F)$ are the regression coefficient and standard error estimates for the full data estimator, that is, when CH is available on everyone. RE is the MSE (centered at the full data estimator) of $\hat\beta_{HT}$ divided by that of $\hat\beta_{ALR}$, with values larger than one indicating better performance of $\hat\beta_{ALR}$. RE EW is the RE of $\hat\beta_{ALR,\hat\gamma}$ compared to $\hat\beta_{HT,\hat\gamma}$.

Gehan
Cov $\hat\beta_F$ $\widehat{SD}(\hat\beta_F)$ $\hat\beta_{ALR}$ $\widehat{SD}(\hat\beta_{ALR})$ RE RE EW
CH 3.64 2.720 3.85 3.240 1.01 1.01
Age1 1.31 0.890 1.30 1.060 1.00 1.00
Age2 −0.24 0.029 −0.24 0.040 3.10 3.52
Stage −1.06 0.210 −1.06 0.230 1.40 1.30
Diam −0.15 0.031 −0.15 0.034 3.22 3.11
Stage*Diam 0.18 0.041 0.17 0.041 2.17 2.49
CH*Age1 2.64 1.030 2.71 1.210 0.97 0.96
CH*Age2 0.18 0.047 0.17 0.062 2.15 1.84

Table 1 shows that all coefficient estimates from the AIPW estimator $\hat\beta_{ALR}$ are close to the corresponding full cohort estimates. The difference between the standard error estimates for the full covariate estimator and $\hat\beta_{ALR}$ is sometimes negligible, indicating a strong degree of information recovery. Whether known or estimated sampling weights are used, there are large efficiency gains relative to the corresponding IPW estimators for the main effects of age2, stage and diam, and for the interactions of stage with diam and of CH with age2. Table S.9 in Supplementary Web Appendix S.5 shows similar results for the Log-Rank weight, with standard error estimates either comparable to or larger than those for the Gehan weight.

7. Discussion

This paper derives AIPW estimating equations from the inefficient IPW estimating function (3) in the case where failure times follow the AFT model (1). We considered the setting where the full covariate estimating equation is the linear-rank estimating equation and where the missing data indicators are mutually independent. To our knowledge, this paper is the first to address the problem of improving the efficiency of simple IPW estimators of $\beta_0$ under the AFT model (1) when covariates are missing at random (MAR). Several extensions of this work are potentially of interest.

One interesting direction would be to investigate the feasibility and performance of using the efficient full covariate score in place of the linear-rank estimating equation. Zeng and Lin (2007) derive the efficient estimator of $\beta_0$ in this setting; Ding and Nan (2011) proposed a sieve-based estimator using B-splines that is asymptotically equivalent and permits easier computation. Relative to more typical choices of weight functions used with linear-rank estimating equations, an AIPW estimating equation derived from the efficient full covariate score (e.g., see Ding and Nan, 2011) may or may not improve efficiency, because the resulting estimating equation will not necessarily yield an estimator that achieves the semiparametric information bound. The relevant estimators and corresponding asymptotic results also cannot be readily derived from the results developed in this paper, because the full covariate estimating equations in both Zeng and Lin (2007) and Ding and Nan (2011) are not members of the class of weighted linear-rank estimating equations and involve infinite-dimensional nuisance parameters that can only be estimated at rates slower than $n^{-1/2}$.

Relatedly, it would also be of interest to explore the possibility of approximating the fully efficient observed data estimator. Under a two-phase sampling design, one approach would be to use the efficiency results just described to approximate the solution to the integral equation that defines the efficient observed data score. Alternatively, in the same setting, Zeng and Lin (2014) obtained an efficient estimator for the regression parameter in the semiparametric transformation model by maximizing a modification of the nonparametric likelihood. Their results also cover efficient estimation for general two-phase designs under the Cox regression model, but not under the AFT model. They require a low-dimensional set of continuous phase one variables (e.g., 3 or fewer) that is correlated with the covariate to be collected in the phase two sample. Extending these results to the AFT model presents challenges, arising both from the bundling of parameters and from the lack of smoothness; see, for example, Nan and Wellner (2013).

A third extension of interest would be to explore the efficiency gains possible under sampling designs that rely on a finite population sampling scheme (FPSS), particularly stratified FPSS and nested case-control studies. Kong and Cai (2009) consider IPW estimators identical to those in Nan et al. (2009, Ex. (v), p. 2354), but where the sampling scheme is a stratified FPSS with a fixed number of strata. The asymptotic variance of $\hat\beta_{HT,\hat\gamma}$ (see also Nan et al., 2009) matches that in Kong and Cai (2009). This demonstrates that results established in Breslow and Wellner (2007) can be extended to the AFT model. However, it is not clear whether such equivalences extend to AIPW estimators, in part because the existing efficiency theory for AIPW estimators depends heavily on i.i.d. sampling assumptions. For related reasons, the results developed in this paper cannot currently be used to justify efficiency improvements in nested case-control studies (e.g., Cai and Zheng, 2013). In such studies, stratified FPSS is used to select controls at each failure time. The sampling indicators are therefore again dependent, and the number of sampling strata also increases with sample size. The theory for IPW estimators under a Cox regression failure time model developed in Cai and Zheng (2013) has not yet been rigorously extended to the AFT model. There are currently no results in the literature applicable to the study of AIPW estimators for nested case-control designs.
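The contrast drawn above can be illustrated with a small simulation. This paper's results assume independent Bernoulli sampling, whereas stratified FPSS fixes the number sampled in each stratum in advance, so the indicators $R_i$ within a stratum are dependent. The sketch below is a hypothetical illustration only. It assumes NumPy, and the function names, sample sizes and stratification on failure status are illustrative choices, not the designs studied by Kong and Cai (2009).

```python
import numpy as np

def bernoulli_sampling(pi, rng):
    """Independent Bernoulli sampling: R_i ~ Bernoulli(pi_i), independently over i."""
    return (rng.random(len(pi)) < pi).astype(int)

def stratified_fpss(strata, n_per_stratum, rng):
    """Stratified FPSS (illustrative): draw exactly n_per_stratum[s] subjects
    without replacement from stratum s, so the R_i in a stratum sum to a fixed count."""
    R = np.zeros(len(strata), dtype=int)
    for s, m in n_per_stratum.items():
        members = np.flatnonzero(strata == s)
        R[rng.choice(members, size=m, replace=False)] = 1
    return R

rng = np.random.default_rng(1)
n = 1000
delta = rng.binomial(1, 0.1, size=n)   # simulated failure indicators (hypothetical)
n_fail = int(delta.sum())

# Stratify on failure status: keep every failure and exactly 100 non-failures.
R_fpss = stratified_fpss(delta, {1: n_fail, 0: 100}, rng)

# Bernoulli analogue with matching inclusion probabilities: the expected
# stratum counts agree, but the number of sampled non-failures is now random.
pi = np.where(delta == 1, 1.0, 100 / (n - n_fail))
R_bern = bernoulli_sampling(pi, rng)
print(R_fpss[delta == 0].sum(), R_bern[delta == 0].sum())
```

Under FPSS the first count is always 100. Under Bernoulli sampling it varies from run to run, which is the source of the independence that the i.i.d. efficiency theory relies on.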

The approach to deriving the augmented estimating function $\hat\Psi_{ALR}(\beta,\hat\rho,\hat\alpha,\gamma;\pi)$ taken in this paper differs in an important way from earlier literature on augmented estimators derived under a Cox regression model. Influential papers on covariates missing under a MAR mechanism include Wang and Chen (2001) and Qi et al. (2005), the latter being heavily influenced by the former. The work of Luo et al. (2009), Xu et al. (2009), and Qi et al. (2010) uses the basic augmented estimating equation from Qi et al. (2005) as a starting point for their respective developments. To illustrate the key differences between earlier work and the present approach, suppose that the underlying data generating process comes from a Cox model with true regression parameter $\beta_0$. The cumulative intensity function is given by $\int_0^t Y(u)e^{\beta^\top W}\lambda_0(u)\,du$, where $Y(u)=I(\tilde T\ge u)$. For later use, define $\Lambda(u)=\int_0^u\lambda_0(v)\,dv$ and $M_i(s)=N_i(s)-\int_0^s Y_i(u)e^{\beta^\top W_i}\,d\Lambda(u)$. When full covariate information is available, the parameter $\beta_0$ is efficiently estimated using the estimating function

$$\frac{1}{n}\sum_{i=1}^{n}\int\left(W_i-\frac{S_F^{(1)}(u,\beta)}{S_F^{(0)}(u,\beta)}\right)dN_i(u), \tag{19}$$

where $N_i(u)$ is the usual counting process and $S_F^{(j)}(u,\beta)=n^{-1}\sum_{i=1}^{n}Y_i(u)W_i^{j}e^{\beta^\top W_i}$, $j=0,1$. Let $s^{(j)}(u,\beta)=E[Y(u)W^{j}e^{\beta^\top W}]$, $j=0,1$. As with (2) for the AFT model, the full covariate estimating function (19) is not a sum of i.i.d. terms. Consequently, the efficiency theory developed in Robins et al. (1994) cannot be applied directly to the estimating function (19).
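As a concrete check on (19), the sketch below evaluates the full-cohort estimating function for a scalar covariate, assuming no tied observation times. Because $n\times$(19) is the derivative in $\beta$ of the Cox log partial likelihood, the result can be compared with a finite-difference derivative. NumPy is assumed. The function names and the simulated Cox model (baseline hazard 1, $\beta_0=0.5$, exponential censoring) are hypothetical illustrations, not the paper's simulation settings.

```python
import numpy as np

def cox_score(time, delta, w, beta):
    """Estimating function (19) for a scalar covariate, assuming no ties:
    n^{-1} sum_i Delta_i * (W_i - S_F1(T_i, beta) / S_F0(T_i, beta))."""
    n = len(time)
    order = np.argsort(time)
    d, ww = delta[order], w[order]
    risk = np.exp(beta * ww)
    # Reverse cumulative sums give sums over the risk set {j : T_j >= T_i}.
    s0 = np.cumsum(risk[::-1])[::-1] / n
    s1 = np.cumsum((ww * risk)[::-1])[::-1] / n
    return np.sum(d * (ww - s1 / s0)) / n

def log_partial_likelihood(time, delta, w, beta):
    """Cox log partial likelihood for a scalar covariate, assuming no ties."""
    order = np.argsort(time)
    d, ww = delta[order], w[order]
    log_risk_sum = np.log(np.cumsum(np.exp(beta * ww)[::-1])[::-1])
    return np.sum(d * (beta * ww - log_risk_sum))

rng = np.random.default_rng(0)
n, beta0 = 200, 0.5
w = rng.normal(size=n)
T = rng.exponential(scale=np.exp(-beta0 * w))   # hazard exp(beta0 * w)
C = rng.exponential(scale=2.0, size=n)          # independent censoring
time, delta = np.minimum(T, C), (T <= C).astype(float)

h = 1e-5
fd = (log_partial_likelihood(time, delta, w, beta0 + h)
      - log_partial_likelihood(time, delta, w, beta0 - h)) / (2 * h * n)
score = cox_score(time, delta, w, beta0)
print(score, fd)  # agree up to finite-difference error
```

The reverse cumulative sums implement $S_F^{(0)}$ and $S_F^{(1)}$ in $O(n\log n)$ time. The risk-set sums couple all subjects, which is why (19) is not itself a sum of i.i.d. terms.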

As in Section 3.1, standard calculations for the Cox model show that the “idealized” estimating function can be represented as the sum of the following four terms:

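$$\frac{1}{n}\sum_{i=1}^{n}\frac{R_i}{\pi(U_{0i})}\int\left(W_i-\frac{s^{(1)}(u,\beta)}{s^{(0)}(u,\beta)}\right)dN_i(u); \tag{20}$$
$$-\frac{1}{n}\sum_{i=1}^{n}\frac{R_i-\pi(U_{0i})}{\pi(U_{0i})}\int\left(E[W_i\,dN_i(u)\mid U_{0i}]-\frac{s^{(1)}(u,\beta)}{s^{(0)}(u,\beta)}E[dN_i(u)\mid U_{0i}]\right); \tag{21}$$
$$-\frac{1}{n}\sum_{i=1}^{n}\frac{R_i}{\pi(U_{0i})}\int\left(W_i-\frac{s^{(1)}(u,\beta)}{s^{(0)}(u,\beta)}\right)Y_i(u)e^{\beta^\top W_i}\,d\Lambda(u); \tag{22}$$

and

$$\frac{1}{n}\sum_{i=1}^{n}\frac{R_i-\pi(U_{0i})}{\pi(U_{0i})}\int\left(E[W_iY_i(u)e^{\beta^\top W_i}\mid U_{0i}]-\frac{s^{(1)}(u,\beta)}{s^{(0)}(u,\beta)}E[Y_i(u)e^{\beta^\top W_i}\mid U_{0i}]\right)d\Lambda(u). \tag{23}$$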
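The following sketch shows where (20) through (23) come from. It assumes that the “idealized” function takes the standard AIPW form of Robins et al. (1994) applied to the usual martingale (i.i.d.) representation of (19). The symbol $\phi_i$ is notation introduced only for this sketch.

$$\phi_i=\int\left(W_i-\frac{s^{(1)}(u,\beta)}{s^{(0)}(u,\beta)}\right)dM_i(u),\qquad dM_i(u)=dN_i(u)-Y_i(u)e^{\beta^\top W_i}\,d\Lambda(u),$$
$$\frac{1}{n}\sum_{i=1}^{n}\left[\frac{R_i}{\pi(U_{0i})}\,\phi_i-\frac{R_i-\pi(U_{0i})}{\pi(U_{0i})}\,E[\phi_i\mid U_{0i}]\right]=(20)+(21)+(22)+(23).$$

The $dN_i$ parts of the two summands give (20) and (21), and the $d\Lambda$ parts give (22) and (23). Because $\Lambda$ and $s^{(j)}$ are nonrandom, they pass through the conditional expectation. This is why only $W_i$, $dN_i(u)$ and $Y_i(u)e^{\beta^\top W_i}$ appear inside $E[\cdot\mid U_{0i}]$.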

To obtain a function of the observed data, estimators must be substituted for all unknowns. In particular, substituting $\tilde S^{(j)}(u,\beta;\pi)=n^{-1}\sum_{i=1}^{n}\frac{R_i}{\pi(U_{0i})}Y_i(u)W_i^{j}e^{\beta^\top W_i}$, $j=0,1$, for $s^{(j)}(u,\beta)$, $j=0,1$, zeros out (22). The augmented estimating equation used in Wang and Chen (2001, Eqn. 5) can be derived from this perspective. It corresponds to (i) assuming (22) is zero; (ii) substituting doubly robust augmented estimators, say $\tilde S^{(j)}_{ALR}(u,\beta)$, for $s^{(j)}(u,\beta)$, $j=0,1$; and (iii) positing a model for $[W_{miss}\mid U_0]$ and using an EM-type algorithm to iteratively estimate all unknown quantities. Qi et al. (2005, Eqns. 4 & 7) derive their basic estimating equations from Equation (5) in Wang and Chen (2001) by incorrectly asserting that (23) is zero. Equivalently, they assume that both (22) and (23) are zero. Simple or doubly robust augmented estimators $\tilde S^{(j)}_{ALR}(u,\beta)$ are then substituted for $s^{(j)}(u,\beta)$, $j=0,1$, in (20) and (21). In Qi et al. (2005), all quantities depending on the distribution of $W_{miss}$ given $U_0$ are estimated nonparametrically using kernel estimators, as are the unknown sampling weights. These features distinguish their approach from that of Wang and Chen (2001).
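The claim that $\tilde S^{(j)}(u,\beta;\pi)$ zeros out (22) can be checked numerically. At every $u$, the sum over $i$ in the integrand equals $n\{\tilde S^{(1)}-(\tilde S^{(1)}/\tilde S^{(0)})\tilde S^{(0)}\}=0$, whatever $d\Lambda$ is. The sketch below is illustrative and rests on several assumptions: a scalar covariate, NumPy, and a hypothetical case-cohort-type design in which all failures and 20% of censored subjects are sampled. It also uses arbitrary positive increments for $d\Lambda$ on the failure-time grid, and the function names are hypothetical. The full-cohort $S_F^{(j)}$ is included as a contrasting plug-in for which (22) generally does not vanish.

```python
import numpy as np

def s_hat(time, w, weights, beta, grid, j):
    """n^{-1} sum_i weights_i Y_i(u) W_i^j exp(beta W_i) at each u in grid.
    weights = R/pi gives tilde S^(j)(u, beta; pi); weights = 1 gives S_F^(j)."""
    at_risk = time[None, :] >= grid[:, None]              # Y_i(u), shape (K, n)
    vals = weights * w**j * np.exp(beta * w)
    return (at_risk * vals[None, :]).sum(axis=1) / len(time)

def term_22(time, w, R, pi, beta, grid, dlam, s0, s1):
    """(22) for a scalar covariate, with s0, s1 plug-ins evaluated on grid:
    -n^{-1} sum_i R_i/pi_i int (W_i - s1/s0) Y_i(u) exp(beta W_i) dLambda(u)."""
    at_risk = time[None, :] >= grid[:, None]
    wts = (R / pi) * np.exp(beta * w)
    inner = (at_risk * wts[None, :] * (w[None, :] - (s1 / s0)[:, None])).sum(axis=1)
    return -np.sum(inner * dlam) / len(time)

rng = np.random.default_rng(2)
n, beta = 300, 0.5
w = rng.normal(size=n)
T = rng.exponential(scale=np.exp(-beta * w))
C = rng.exponential(scale=2.0, size=n)
time, delta = np.minimum(T, C), (T <= C).astype(float)

# Hypothetical design: every failure sampled, 20% of censored subjects.
pi = np.where(delta == 1, 1.0, 0.2)
R = (rng.random(n) < pi).astype(float)
grid = np.sort(time[delta == 1])                  # failure times
dlam = rng.uniform(0.01, 0.1, size=grid.size)     # arbitrary positive increments

ipw = [s_hat(time, w, R / pi, beta, grid, j) for j in (0, 1)]
full = [s_hat(time, w, np.ones(n), beta, grid, j) for j in (0, 1)]
val_ipw = term_22(time, w, R, pi, beta, grid, dlam, ipw[0], ipw[1])
val_full = term_22(time, w, R, pi, beta, grid, dlam, full[0], full[1])
print(val_ipw, val_full)  # val_ipw is zero up to rounding; val_full generally is not
```

Sampling every failure guarantees that $\tilde S^{(0)}>0$ at each failure time, so the ratio $\tilde S^{(1)}/\tilde S^{(0)}$ is always defined on this grid.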

Critically, (22) does not vanish in general when estimators other than $\tilde S^{(j)}(u,\beta;\pi)$ are used for $s^{(j)}(u,\beta)$, $j=0,1$. Wang and Chen (2001) do not account for the presence of (22). This omission reduces efficiency and invalidates the claims of double robustness. The augmented estimating equation in Qi et al. (2005, Eqn. 7) is doubly robust as claimed, but it is problematic in terms of efficiency. The reason is that neither (22) nor (23) will typically vanish when general estimators are substituted for unknown functions. The efficiency loss of their estimating equation (i.e., the sum of (20) and (21)) is most easily seen in the classical case-cohort study, where all failures are sampled. Specifically, consider (21). Because $dN_i(u)$ is measurable with respect to $U_{0i}$, $\Delta_i$ can be factored outside each conditional expectation that appears in the integrand. As a result, the weight $(R_i-\pi(U_{0i}))/\pi(U_{0i})$ multiplying each integral in (21) can be replaced by $\Delta_i(R_i-\pi(U_{0i}))/\pi(U_{0i})$ without changing the resulting expression. When all failures are sampled, it further follows that $\Delta_i(R_i-\pi(U_{0i}))=0$, because $R_i=\pi(U_{0i})=1$ whenever $\Delta_i=1$. Consequently, in a classical case-cohort study, (21) vanishes regardless of how $s^{(j)}(u,\beta)$, $j=0,1$, are estimated. Equation (7) in Qi et al. (2005) therefore reduces to an inefficient IPW estimating equation. These observations have important implications for the work of Luo et al. (2009), Xu et al. (2009), and Qi et al. (2010), because each uses an augmented estimating equation derived from those in Qi et al. (2005, Eqns. 4 & 7). It would be interesting to revisit these estimation problems and study the behavior of AIPW estimating equations derived instead from (20) through (23).
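The case-cohort argument can also be checked numerically. Because $dN_i$ places a single jump at $T_i$ when $\Delta_i=1$, each integral in (21) reduces to $\Delta_i\{m_i-e(T_i)\}$. Here $m_i$ is any working value for $E[W_i\mid U_{0i}]$ and $e(u)$ is any plug-in for $s^{(1)}/s^{(0)}$. Both are hypothetical placeholders in the sketch below, which assumes a scalar covariate, NumPy and simulated indicators unrelated to any real study. The sketch shows that (21) is exactly zero when all failures are sampled with $\pi(U_{0i})=1$, but not when failures are themselves subsampled.

```python
import numpy as np

def term_21(time, delta, R, pi, m_w, e_fn):
    """(21) for a scalar covariate, using E[W_i dN_i(u) | U_0i] = m_w[i] dN_i(u):
    -n^{-1} sum_i (R_i - pi_i)/pi_i * Delta_i * (m_w[i] - e_fn(T_i))."""
    contrib = (R - pi) / pi * delta * (m_w - e_fn(time))
    return -contrib.sum() / len(time)

def e_fn(u):
    """Arbitrary (hypothetical) plug-in for s1(u, beta) / s0(u, beta)."""
    return np.tanh(u)

rng = np.random.default_rng(3)
n = 500
time = rng.exponential(size=n)
delta = rng.binomial(1, 0.3, size=n).astype(float)
m_w = rng.normal(size=n)   # arbitrary working values for E[W_i | U_0i]

# Classical case-cohort: every failure sampled, so R_i = pi_i = 1 when Delta_i = 1.
pi_cc = np.where(delta == 1, 1.0, 0.15)
R_cc = (rng.random(n) < pi_cc).astype(float)
val_cc = term_21(time, delta, R_cc, pi_cc, m_w, e_fn)

# Failures subsampled as well: (21) no longer vanishes in general.
pi_sub = np.where(delta == 1, 0.5, 0.15)
R_sub = (rng.random(n) < pi_sub).astype(float)
val_sub = term_21(time, delta, R_sub, pi_sub, m_w, e_fn)
print(val_cc, val_sub)  # val_cc is exactly zero (it may print as -0.0)
```

Each summand in the case-cohort design contains the factor $\Delta_i(R_i-\pi_i)$, which is exactly zero. No choice of `m_w` or `e_fn` can therefore recover information through (21).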

Supplementary Material

Supp1

Acknowledgements

The authors thank the editors, an associate editor, and three referees for their helpful comments. The partial support of NIH grant R01CA163687 is gratefully acknowledged.

Contributor Information

JON ARNI STEINGRIMSSON, Department of Biostatistics, Johns Hopkins University, Baltimore MD 21205 jsteing5@jhu.edu.

ROBERT L. STRAWDERMAN, Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14642, Robert_strawderman@urmc.rochester.edu

References

  1. Borgan O, Langholz B, Samuelsen SO, Goldstein L, and Pogoda J “Exposure stratified case-cohort designs.” Lifetime Data Analysis, 6(1):39–58 (2000).
  2. Breslow NE, Hu J, and Wellner JA “Z-estimation and stratified samples: application to survival models.” Lifetime Data Analysis, 1–24 (2015).
  3. Breslow NE, Lumley T, Ballantyne CM, Chambless LE, and Kulich M “Improved Horvitz–Thompson estimation of model parameters from two-phase stratified samples: applications in epidemiology.” Statistics in Biosciences, 1(1):32–49 (2009).
  4. Breslow NE and Wellner JA “Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression.” Scandinavian Journal of Statistics, 34(1):86–102 (2007).
  5. Buckley J and James I “Linear regression with censored data.” Biometrika, 66(3):429–436 (1979).
  6. Cai J and Zeng D “Power Calculation for Case–Cohort Studies with Nonrare Events.” Biometrics, 63(4):1288–1295 (2007).
  7. Cai T and Zheng Y “Resampling Procedures for Making Inference Under Nested Case–Control Studies.” Journal of the American Statistical Association, 108(504):1532–1544 (2013).
  8. Cox DR “Regression models and life tables (with discussion).” Journal of the Royal Statistical Society, Series B (Methodological), 34(2):187–220 (1972).
  9. D’Angio GJ, Breslow N, Beckwith JB, Evans A, Baum E, Delorimier A, Fernbach D, Hrabovsky E, Jones B, Kelalis P, et al. “Treatment of Wilms’ tumor. Results of the third national Wilms’ tumor study.” Cancer, 64(2):349–360 (1989).
  10. Ding Y and Nan B “A sieve M-theorem for bundled parameters in semiparametric models, with application to the efficient estimation in a linear model for censored data.” Annals of Statistics, 39(6):2795 (2011).
  11. Fygenson M and Ritov Y “Monotone estimating equations for censored data.” Annals of Statistics, 22(2):732–746 (1994).
  12. Huang Y “Calibration regression of censored lifetime medical cost.” Journal of the American Statistical Association, 97(457):318–327 (2002).
  13. Jackson C flexsurv: Flexible parametric survival models (2014). R package version 0.3. URL http://CRAN.R-project.org/package=flexsurv
  14. Jin Z, Lin D, Wei L, and Ying Z “Rank-based inference for the accelerated failure time model.” Biometrika, 90(2):341–353 (2003).
  15. Johnson LM and Strawderman RL “Induced smoothing for the semiparametric accelerated failure time model: asymptotics and extensions to clustered data.” Biometrika, 96(2):577–590 (2009).
  16. Kong L and Cai J “Case-cohort analysis with accelerated failure time model.” Biometrics, 65(1):135–142 (2009).
  17. Kulich M and Lin D “Improving the efficiency of relative-risk estimation in case-cohort studies.” Journal of the American Statistical Association, 99(467):832–844 (2004).
  18. Lin Y and Chen K “Efficient estimation of the censored linear regression model.” Biometrika, 100(2):525–530 (2013).
  19. Luo X, Tsai WY, and Xu Q “Pseudo-partial likelihood estimators for the Cox regression model with missing covariates.” Biometrika, 96(3):617–633 (2009).
  20. Miller RG “Least squares regression with censored data.” Biometrika, 63(3):449–464 (1976).
  21. Nan B “Efficient estimation for case-cohort studies.” Canadian Journal of Statistics, 32(4):403–419 (2004).
  22. Nan B, Emond MJ, and Wellner JA “Information bounds for Cox regression models with missing data.” Annals of Statistics, 32(2):723–753 (2004).
  23. Nan B, Kalbfleisch JD, and Yu M “Asymptotic theory for the semiparametric accelerated failure time model with missing data.” Annals of Statistics, 37(5A):2351–2376 (2009).
  24. Nan B and Wellner JA “A general semiparametric Z-estimation approach for case-cohort studies.” Statistica Sinica, 23:1155–1180 (2013).
  25. Nan B, Yu M, and Kalbfleisch JD “Censored linear regression for case-cohort studies.” Biometrika, 93(4):747–762 (2006).
  26. Prentice RL “A case-cohort design for epidemiologic cohort studies and disease prevention trials.” Biometrika, 73(1):1–11 (1986).
  27. Qi L, Wang C, and Prentice RL “Weighted estimators for proportional hazards regression with missing covariates.” Journal of the American Statistical Association, 100(472):1250–1263 (2005).
  28. Qi L, Wang Y-F, and He Y “A comparison of multiple imputation and fully augmented weighted estimators for Cox regression with missing covariates.” Statistics in Medicine, 29(25):2592 (2010).
  29. Rathouz PJ “Identifiability assumptions for missing covariate data in failure time regression models.” Biostatistics, 8(2):345–356 (2007).
  30. Ritov Y “Estimation in a linear regression model with censored data.” Annals of Statistics, 18(1):303–328 (1990).
  31. Robins JM, Rotnitzky A, and Zhao LP “Estimation of regression coefficients when some regressors are not always observed.” Journal of the American Statistical Association, 89(427):846–866 (1994).
  32. Scharfstein DO, Rotnitzky A, and Robins JM “Adjusting for nonignorable drop-out using semiparametric nonresponse models.” Journal of the American Statistical Association, 94(448):1096–1120 (1999).
  33. Self SG and Prentice RL “Asymptotic distribution theory and efficiency results for case-cohort studies.” Annals of Statistics, 16(1):64–81 (1988).
  34. Tsiatis AA “Estimating regression parameters using linear rank tests for censored data.” Annals of Statistics, 18(1):354–372 (1990).
  35. Tsiatis AA Semiparametric Theory and Missing Data. Springer; (2006).
  36. van der Vaart AW and Wellner JA Weak Convergence and Empirical Processes. Springer; (1996).
  37. Wang C and Chen HY “Augmented inverse probability weighted estimator for Cox missing covariate regression.” Biometrics, 57(2):414–419 (2001).
  38. Xu Q, Paik MC, Luo XL, and Tsai W-Y “Reweighting Estimators for Cox Regression With Missing Covariates.” Journal of the American Statistical Association, 104(487):1155–1167 (2009).
  39. Ying Z “A large sample study of rank estimation for censored regression data.” Annals of Statistics, 21(1):76–99 (1993).
  40. Yu M and Nan B “A revisit of semiparametric regression models with missing data.” Statistica Sinica, 16(4):1193 (2006).
  41. Zeng D and Lin D “Efficient estimation for the accelerated failure time model.” Journal of the American Statistical Association, 102(480):1387–1396 (2007).
  42. Zeng D and Lin D “Efficient Estimation of Semiparametric Transformation Models for Two-Phase Cohort Studies.” Journal of the American Statistical Association, 109(505):371–383 (2014).
