Author manuscript; available in PMC: 2011 Sep 1.
Published in final edited form as: Can J Stat. 2010 Sep;38(3):333–351. doi: 10.1002/cjs.10072

Additive hazards regression with censoring indicators missing at random

Xinyuan SONG 1, Liuquan SUN 2, Xiaoyun MU 3, Gregg E DINSE 4
PMCID: PMC3010164  NIHMSID: NIHMS202676  PMID: 21197117

Abstract

In this article, the authors consider a semiparametric additive hazards regression model for right-censored data that allows some censoring indicators to be missing at random. They develop a class of estimating equations and use an inverse probability weighted approach to estimate the regression parameters. Nonparametric smoothing techniques are employed to estimate the probability of non-missingness and the conditional probability of an uncensored observation. The asymptotic properties of the resulting estimators are derived. Simulation studies show that the proposed estimators perform well. They motivate and illustrate their methods with data from a brain cancer clinical trial.

Keywords: Additive hazards model, censoring, kernel smoother, missing at random, weighted estimating equation

1. INTRODUCTION

In the analysis of failure time data, the cause of failure may be unknown for some subjects for a variety of reasons (e.g., autopsies were not performed or medical records were missing). We motivate and illustrate our methods with data on patients from a brain cancer clinical trial, where we evaluate the effect of two potential explanatory variables on a measure of quality of life. All patients were initially ambulatory, but over time some lost their mobility, some had a progression of their cancer, and some experienced both events. To assess quality of life, we define “survival time” as the time to non-ambulatory progression. Thus, patients who progressed and were no longer ambulatory contributed uncensored times, patients who progressed but were still ambulatory or who had not progressed by the end of the study contributed censored times, and patients who progressed but whose ambulatory status was unknown contributed times with missing censoring indicators. We apply our regression analysis to evaluate the effects of sex and age on the time to non-ambulatory progression.

Specifically, let T be the failure time, let Z be a p×1 vector of covariates, and let C be a censoring time that is assumed to be conditionally independent of T given Z. Data are available on Z and X = T ∧ C (the minimum of T and C), but the censoring indicator δ = I(T ≤ C) may be missing. If the probability that δ is missing does not depend on either the true value of δ or the values of X and Z, then a missing δ is said to be missing completely at random (MCAR). Alternatively, if the probability that δ is missing depends on the values of X and Z but not on the true value of δ, then a missing δ is said to be missing at random (MAR); see Little & Rubin (1987).

Under the MCAR assumption and in the absence of covariates, Dinse (1982) obtained a nonparametric maximum likelihood estimator (NPMLE) of the survival function using an EM algorithm. Lo (1991) proved that there are infinitely many NPMLEs and some of them may be inconsistent; he consequently constructed a consistent and asymptotically normal estimator. Gijbels, Lin & Ying (1993, 2007) and McKeague & Subramanian (1998) proposed further improvements on these estimators. When covariates are present, Gijbels, Lin & Ying (1993) initiated research on estimation under the Cox model. McKeague & Subramanian (1998) provided an alternative approach to estimation. Subramanian (2000) considered estimation under proportionality of conditional hazards. Zhou & Sun (2003) studied the additive hazards regression model.

Under the MAR assumption, van der Laan & McKeague (1998) first addressed efficient estimation of the survival function and proposed a sieved nonparametric maximum likelihood estimator. Further developments along the lines of efficient estimation can be found in Subramanian (2004, 2006) and Wang & Ng (2008). Goetghebeur & Ryan (1995) and Lu & Tsiatis (2001) analyzed competing risks data with missing cause of failure under proportional hazards regression models. Gao & Tsiatis (2005) considered the linear transformation competing risks model with missing cause of failure. Recently, Lu & Liang (2008) studied competing risks data with missing cause of failure under the semiparametric additive hazards model, and suggested the inverse probability weighted (IPW) and double robust (DR) estimators. To obtain these estimators, however, they imposed parametric models for two components: the probability that the censoring indicator is not missing and the conditional probability of a given failure type.

In this article, we propose estimators for the regression parameters in a semiparametric additive hazards model, where the failure times are subject to right censoring and some censoring indicators are missing at random. We provide simple and fully augmented weighted estimators that incorporate incomplete data nonparametrically. Unlike Lu & Liang (2008), no parametric models are assumed for the missingness probability or the conditional probability of an uncensored observation; instead, we use nonparametric kernel smoothing techniques to estimate these probabilities. The resulting estimators have closed forms and are easy to implement. Under the usual MAR assumption, both the simple and fully augmented weighted estimators are consistent and asymptotically equivalent, i.e., they have the same asymptotic normal distribution. In addition, the asymptotic properties of the estimated baseline cumulative hazard function are also established for the model.

The remainder of the paper is organized as follows. Section 2 presents the simple and fully augmented weighted estimators and their asymptotic properties under the MAR assumption. Section 3 reports simulation results that show the proposed estimators perform well. In Section 4, our methods are applied to analyze the brain cancer data described earlier. Our concluding remarks follow in Section 5 and technical proofs are relegated to the Appendix.

2. ESTIMATION PROCEDURE

Under an additive hazards model, the hazard function for failure time T given covariate Z is assumed to be of the form

$$\lambda(t \mid Z) = \lambda_0(t) + \beta_0' Z, \tag{1}$$

where λ0(t) is an unspecified baseline hazard function and β0 is a p-vector of unknown regression parameters. In the case where all data are observed, Lin & Ying (1994) introduced a pseudoscore function for the parameter vector β0 and showed that the resulting estimator is consistent and asymptotically normal, with an easily estimated covariance matrix.

When censoring indicators are missing for right-censored data, we observe n independent and identically distributed vectors (Xi, ξi, ξiδi, Zi, Ri) (i = 1, …, n), where ξi is an indicator that δi is not missing, and Ri is an auxiliary covariate that is not used to model the hazard but may be used to describe the probability that δi is missing. The probability that δi is missing is characterized by the distribution of ξi given δi and Wi = (Xi, Zi, Ri), which is Bernoulli with probability P{ξi = 1|δi, Wi = w}. Under the MAR assumption (Little & Rubin 1987), we have

$$P\{\xi_i = 1 \mid \delta_i, W_i = w\} = P\{\xi_i = 1 \mid W_i = w\} \equiv \rho(w). \tag{2}$$

Another function of interest is π(w) = P{δi = 1|Wi = w, ξi = 1}, which is the conditional probability of an uncensored observation, given that δi is observed and Wi = w.

A naive method for estimating β0 is to simply ignore the missing data and to apply the pseudoscore function of Lin & Ying (1994) to the complete data only. Such a procedure (called the complete case estimator) may not only lose efficiency due to discarding incomplete observations, but may also generate biased estimators, even when the censoring indicators are MAR. If either ρ(w) or π(w) is modeled correctly, we can use the approach of Lu & Liang (2008) to obtain the IPW and DR estimators. In many situations, however, knowledge of ρ(w) and π(w) is limited, and thus both models may be misspecified. In this article, no parametric models are assumed for these two probabilities; rather, both are estimated nonparametrically by kernel smoothers. We begin by introducing the simple weighted estimator, which is derived under the MAR assumption.

Because ρ(Wi) is a function of continuous variables such as Xi, we estimate it with the Nadaraya-Watson estimator based on complete observations. Specifically, let d denote the number of continuous elements of Wi and let K be an rth-order (r > d) kernel function of d variables with finite support that satisfies $\int K(u)\,du = 1$, $\int u^m K(u)\,du = 0$ for m = 1, …, r − 1, $\int u^r K(u)\,du \ne 0$, and $\int K(u)^2\,du < \infty$, where u can be a scalar or a vector. If u is a vector, say u = (u1, …, ud)′, then $u^m$ denotes $(u_1^m, \ldots, u_d^m)$. The motivation for using higher-order kernels is to reduce the order of magnitude of the bias of the curve estimator, leading to a faster rate of convergence of the mean integrated squared error (Wand & Schucany 1990). This type of kernel function may be constructed in various ways. For instance, Wand & Schucany (1990) gave a univariate Gaussian-based kernel of order 2r:

$$K(u_1) = \frac{(-1)^r \varphi^{(2r-1)}(u_1)}{2^{r-1}(r-1)!\,u_1},$$

where $\varphi^{(2r-1)}(u_1)$ is the (2r − 1)-th derivative of the standard normal density function φ(u1). Hall & Marron (1988) proposed a class of univariate kernels of order r:

$$K(u_1) = \pi^{-1} \int_0^\infty \cos(t u_1) \exp(-t^r)\,dt.$$

Some higher-order polynomial kernels can be found in Müller (1984) and Gasser, Müller & Mammitzsch (1985).
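As an illustration of the moment conditions above, the following sketch takes the r = 2 case of the Gaussian-based formula, which reduces to K(u) = (1/2)(3 − u²)φ(u) because φ‴(u) = (3u − u³)φ(u), and checks its moments numerically. The function names, the integration range, and the midpoint rule are choices made for this example only.

```python
import math


def phi(u: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def gauss_kernel_order4(u: float) -> float:
    """Order-4 Gaussian-based kernel: phi'''(u) / (2u) = 0.5 * (3 - u^2) * phi(u)."""
    return 0.5 * (3.0 - u * u) * phi(u)


def moment(kernel, m: int, lo: float = -12.0, hi: float = 12.0,
           steps: int = 24000) -> float:
    """Midpoint-rule approximation of the m-th moment, i.e. the integral of u^m K(u)."""
    width = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        u = lo + (j + 0.5) * width
        total += (u ** m) * kernel(u)
    return total * width


if __name__ == "__main__":
    # Moments 0..4: the kernel integrates to one, moments 1 to 3 vanish,
    # and the fourth moment is nonzero, which is what "order 4" means here.
    for m in range(5):
        print(m, round(moment(gauss_kernel_order4, m), 6))
```

Analytically, the zeroth moment is 1, the first three moments are 0, and the fourth moment is (1/2)(3·3 − 15) = −3, so the kernel takes negative values in its tails, as higher-order kernels must.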

Define Kh(·) = K(·/h), where h is a bandwidth sequence, and K(u/h) = K(u1/h, …, ud/h) for u = (u1, …, ud)′. Write Wi = (W1i, W2i), where W1i and W2i include all continuous and discrete elements of Wi, respectively. Then the Nadaraya-Watson estimator of ρ(w) is given by

$$\hat\rho(w) = \frac{\sum_{i=1}^n \xi_i K_h(w_1 - W_{1i})\, I(W_{2i} = w_2)}{\sum_{i=1}^n K_h(w_1 - W_{1i})\, I(W_{2i} = w_2)}, \tag{3}$$

where w = (w1, w2). The choice of the kernel function K usually has little effect on the estimator ρ̂(w), and thus on the estimator of β0, but the bandwidth sequence h typically does influence these estimators, both theoretically and practically. We assume that h satisfies $nh^{2r} \to 0$ and $nh^{2d} \to \infty$ as n → ∞. If $h = O(n^{-1/p})$ for some integer p > 2d, then a reasonable choice for r is the smallest even integer such that r ≥ p − d (Qi, Wang & Prentice 2005). For example, when d = 2, we might choose p = 5 and r = 4. In a similar manner, we can estimate π(w) by

$$\hat\pi(w) = \frac{\sum_{i=1}^n \xi_i \delta_i K_h(w_1 - W_{1i})\, I(W_{2i} = w_2)}{\sum_{i=1}^n \xi_i K_h(w_1 - W_{1i})\, I(W_{2i} = w_2)}. \tag{4}$$

Note that the kernel function K and bandwidth sequence h used in (3) need not be identical to those used in (4), and the bandwidth can be different for each component of W1i. For example, we can define h = (h1, …, hd)′ for different bandwidths, and write K(u/h) = K(u1/h1, …, ud/hd). Here, we use the same K and h in both for notational convenience.
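The two Nadaraya-Watson estimators can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes a product kernel built from a univariate Gaussian density, allows one bandwidth per continuous component, and stores a missing δi as `None`. All function and argument names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def gauss(u: float) -> float:
    """Univariate standard normal density, used here as the base kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def product_kernel(diff: Sequence[float], h: Sequence[float]) -> float:
    """K((w1 - W1i)/h) as a product of univariate kernels (an assumption of this sketch)."""
    value = 1.0
    for d_j, h_j in zip(diff, h):
        value *= gauss(d_j / h_j)
    return value


def nw_estimates(w1: Sequence[float], w2: Tuple,
                 W1: Sequence[Sequence[float]], W2: Sequence[Tuple],
                 xi: Sequence[int], delta: Sequence[Optional[int]],
                 h: Sequence[float]) -> Tuple[float, float]:
    """Return (rho_hat(w), pi_hat(w)) at w = (w1, w2), following (3) and (4)."""
    den_rho = 0.0  # sum of K_h * I(W2i = w2)
    num_rho = 0.0  # sum of xi_i * K_h * I(W2i = w2); also the denominator of pi_hat
    num_pi = 0.0   # sum of xi_i * delta_i * K_h * I(W2i = w2)
    for W1i, W2i, xi_i, delta_i in zip(W1, W2, xi, delta):
        if W2i != w2:  # discrete components must match exactly
            continue
        k = product_kernel([a - b for a, b in zip(w1, W1i)], h)
        den_rho += k
        if xi_i == 1:  # delta_i is only observed when xi_i = 1
            num_rho += k
            num_pi += delta_i * k
    return num_rho / den_rho, num_pi / num_rho
```

The sketch does not guard against empty neighbourhoods (a zero denominator), which a practical implementation would need to handle.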

Let $\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$ denote the baseline cumulative hazard function. Using the inverse probability weighted approach, consider the following estimating equations for β0 and Λ0:

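$$\sum_{i=1}^n \int_0^\tau \frac{\xi_i}{\hat\rho(W_i)}\, Z_i \left[ dN_i^u(t) - Y_i(t)\beta' Z_i\,dt - Y_i(t)\,d\Lambda_0(t) \right] = 0, \tag{5}$$

$$\sum_{i=1}^n \frac{\xi_i}{\hat\rho(W_i)} \left[ dN_i^u(t) - Y_i(t)\beta' Z_i\,dt - Y_i(t)\,d\Lambda_0(t) \right] = 0, \tag{6}$$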

where $N_i^u(t) = I(X_i \le t, \delta_i = 1)$, $Y_i(t) = I(X_i \ge t)$, and τ is a prespecified positive constant such that $P(X_i \ge \tau) > 0$. The resulting simple weighted estimators for β0 and Λ0 have the following closed forms:

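$$\hat\beta = \left[ \sum_{i=1}^n \int_0^\tau \frac{\xi_i}{\hat\rho(W_i)}\, Y_i(t) \{Z_i - \bar Z(t)\}^{\otimes 2}\,dt \right]^{-1} \sum_{i=1}^n \int_0^\tau \frac{\xi_i}{\hat\rho(W_i)} \{Z_i - \bar Z(t)\}\,dN_i^u(t)$$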

and

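$$\hat\Lambda_0(t) = \int_0^t \frac{\sum_{i=1}^n \hat\rho(W_i)^{-1} \xi_i \left[ dN_i^u(s) - Y_i(s)\hat\beta' Z_i\,ds \right]}{\sum_{i=1}^n \hat\rho(W_i)^{-1} \xi_i Y_i(s)},$$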

where $a^{\otimes 2} = aa'$ for any vector a and

$$\bar Z(t) = \frac{\sum_{i=1}^n \hat\rho(W_i)^{-1} \xi_i Y_i(t) Z_i}{\sum_{i=1}^n \hat\rho(W_i)^{-1} \xi_i Y_i(t)}.$$

In practice, we often choose τ to be the largest observation time, say τ = max{Xi}.
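Because the weighted risk-set average Z̄(t) is piecewise constant between observed times, the closed form for β̂ can be evaluated exactly by summing over the intervals between sorted values of Xi. The sketch below does this for a scalar covariate (p = 1) with τ = max{Xi}. It is an illustrative reading of the formula, not the authors' code: the function name is hypothetical, the values ρ̂(Wi) are assumed to be supplied by the caller, and a missing δi is stored as `None`.

```python
from typing import Optional, Sequence


def simple_weighted_beta(X: Sequence[float], delta: Sequence[Optional[int]],
                         xi: Sequence[int], rho_hat: Sequence[float],
                         Z: Sequence[float]) -> float:
    """Simple weighted estimator of beta for a scalar covariate, with tau = max(X)."""
    n = len(X)
    w = [xi[i] / rho_hat[i] for i in range(n)]  # weights xi_i / rho_hat(W_i)

    def zbar(t: float) -> float:
        # Weighted average of Z over the risk set {i : X_i >= t}.
        den = sum(w[i] for i in range(n) if X[i] >= t)
        num = sum(w[i] * Z[i] for i in range(n) if X[i] >= t)
        return num / den if den > 0 else 0.0  # value is irrelevant when den == 0

    # Denominator: sum_i w_i * integral of Y_i(t) {Z_i - zbar(t)}^2 dt.
    # The risk set is constant on each interval (previous time, current time].
    denom, prev = 0.0, 0.0
    for t in sorted(set(X)):
        zb = zbar(t)
        denom += (t - prev) * sum(w[i] * (Z[i] - zb) ** 2
                                  for i in range(n) if X[i] >= t)
        prev = t

    # Numerator: N_i^u jumps only at X_i, and only for observed failures.
    numer = sum(w[i] * (Z[i] - zbar(X[i]))
                for i in range(n) if xi[i] == 1 and delta[i] == 1)
    return numer / denom
```

With all ξi = 1 and ρ̂ ≡ 1 the function reduces to the full-data pseudoscore estimator; for the toy data X = (1, 2), δ = (1, 1), Z = (0, 1) it returns −1.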

Let $\bar z(t) = E[Y_i(t) Z_i]/E[Y_i(t)]$. Define $N_i(t) = I(X_i \le t)$ and $B_i = \int_0^\tau \{Z_i - \bar z(t)\}\,dN_i(t)$. The asymptotic properties of β̂ are given in the following theorem.

Theorem 1

Under regularity conditions (C1)–(C6), which are stated in the Appendix, β̂ is consistent and $n^{1/2}(\hat\beta - \beta_0)$ is asymptotically normal with mean zero and covariance matrix $V = A^{-1}\Sigma A^{-1} + A^{-1}\Sigma^* A^{-1}$, where

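$$\Sigma = E\left[ \int_0^\tau \{Z_i - \bar z(t)\}^{\otimes 2}\,dN_i^u(t) \right], \qquad \Sigma^* = E\left[ \pi(W_i)\{1 - \pi(W_i)\}\{1 - \rho(W_i)\}\rho(W_i)^{-1} B_i^{\otimes 2} \right],$$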

and

$$A = E\left[ \int_0^\tau Y_i(t) \{Z_i - \bar z(t)\}^{\otimes 2}\,dt \right].$$

Note that the first term in V is the asymptotic variance of the Lin & Ying (1994) estimator based only on the complete data (ξi ≡ 1) and the second term represents the effect of the missing censoring indicators. If we let $\hat B_i = \int_0^\tau \{Z_i - \bar Z(t)\}\,dN_i(t)$, then the covariance matrix V can be consistently estimated by $\hat V = \hat A^{-1}(\hat\Sigma + \hat\Sigma^*)\hat A^{-1}$, where

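$$\hat\Sigma = n^{-1} \sum_{i=1}^n \int_0^\tau \frac{\xi_i}{\hat\rho(W_i)} \{Z_i - \bar Z(t)\}^{\otimes 2}\,dN_i^u(t), \qquad \hat\Sigma^* = n^{-1} \sum_{i=1}^n \hat\pi(W_i)\{1 - \hat\pi(W_i)\}\{1 - \hat\rho(W_i)\}\hat\rho(W_i)^{-1} \hat B_i^{\otimes 2},$$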

and

$$\hat A = n^{-1} \sum_{i=1}^n \int_0^\tau \frac{\xi_i}{\hat\rho(W_i)}\, Y_i(t) \{Z_i - \bar Z(t)\}^{\otimes 2}\,dt.$$

Define $d(t) = \int_0^t \bar z(s)\,ds$ and

$$D(t) = \int_0^t \left\{ \frac{E[Y_i(s) Z_i^{\otimes 2}]}{E[Y_i(s)]} - \bar z(s)^{\otimes 2} \right\} \beta_0\,ds.$$

The asymptotic properties of Λ̂0(t) are given in the next theorem.

Theorem 2

Under the assumptions of Theorem 1, Λ̂0(t) converges in probability to Λ0(t) uniformly in t ∈ [0, τ], and $n^{1/2}\{\hat\Lambda_0(t) - \Lambda_0(t)\}$ converges weakly on [0, τ] to a zero-mean Gaussian process with covariance function at (t, s) (t ≤ s) equal to

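$$\begin{aligned}
\Gamma(t, s) = {} & \int_0^t \frac{dE\{N_i^u(u)\}}{(E\{Y_i(u)\})^2} + E\left\{ \pi(W_i)\{1 - \pi(W_i)\}\{1 - \rho(W_i)\}\rho(W_i)^{-1} \int_0^t \frac{dN_i(u)}{(E\{Y_i(u)\})^2} \right\} \\
& - d(t)' A^{-1} E\left\{ \pi(W_i)\{1 - \pi(W_i)\}\{1 - \rho(W_i)\}\rho(W_i)^{-1} \int_0^s \frac{\{Z_i - \bar z(u)\}\,dN_i(u)}{E\{Y_i(u)\}} \right\} \\
& - d(s)' A^{-1} E\left\{ \pi(W_i)\{1 - \pi(W_i)\}\{1 - \rho(W_i)\}\rho(W_i)^{-1} \int_0^t \frac{\{Z_i - \bar z(u)\}\,dN_i(u)}{E\{Y_i(u)\}} \right\} \\
& + d(t)' A^{-1} (\Sigma + \Sigma^*) A^{-1} d(s) - d(t)' A^{-1} D(s) - d(s)' A^{-1} D(t).
\end{aligned}$$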

The covariance function Γ(t, s) can be consistently estimated by substituting β̂, ρ̂ and π̂ for the unknowns β0, ρ and π in the appropriate empirical estimators, and by replacing the (unobserved) processes $N_i^u$ with $\hat\rho(W_i)^{-1}\xi_i N_i^u$. For an individual with a given covariate vector z0, the corresponding estimator of the survival function S(t, z0) is

$$\hat S(t, z_0) = \exp\{-\hat\Lambda_0(t) - \hat\beta' z_0 t\}.$$

Using the functional delta-method and Theorem 2, we can obtain the asymptotic properties of Ŝ(t, z0), which can be applied to construct confidence bands for S(t, z0).
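For concreteness, the plug-in survival estimate is a one-line computation once Λ̂0(t) and β̂ are available. The helper below is a hypothetical sketch: it takes the value of Λ̂0 at the time of interest as an input rather than computing it.

```python
import math
from typing import Sequence


def survival_estimate(cum_hazard_at_t: float, beta_hat: Sequence[float],
                      z0: Sequence[float], t: float) -> float:
    """S_hat(t, z0) = exp{-Lambda0_hat(t) - beta_hat' z0 * t} under model (1)."""
    linear_term = sum(b * z for b, z in zip(beta_hat, z0))
    return math.exp(-cum_hazard_at_t - linear_term * t)
```

For example, `survival_estimate(0.5, [1.0], [1.0], 0.5)` evaluates exp(−0.5 − 0.5) = exp(−1).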

When the missingness probability ρ(w) is known or a parametric model is specified for ρ(w), the simple weighted estimator uses only the complete case data (i.e., only individuals with ξi = 1), whereas the fully augmented weighted estimator (also called the double robust estimator) also incorporates contributions from the incomplete observations (i.e., individuals with ξi = 0). In that setting the fully augmented weighted estimator is therefore more efficient than the corresponding simple weighted estimator (Lu & Liang 2008). In addition, the fully augmented weighted estimator has the so-called double-robustness property; that is, the estimator is consistent if one can correctly specify either the missingness probability ρ(w) or the conditional probability of an uncensored observation π(w) (Wang & Chen 2001). However, when ρ(w) is estimated nonparametrically, the simple weighted estimator β̂ has the same asymptotic distribution as the fully augmented weighted estimator β̂a (described next), so β̂ is asymptotically equivalent to β̂a. These conclusions are consistent with the results of Qi, Wang & Prentice (2005) for proportional hazards regression with missing covariates.

The fully augmented weighted estimators for β0 and Λ0 are the solutions to the following estimating equations:

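$$\sum_{i=1}^n \int_0^\tau Z_i \left[ \frac{\xi_i}{\hat\rho(W_i)}\,dN_i^u(t) + \left(1 - \frac{\xi_i}{\hat\rho(W_i)}\right) \hat\pi(W_i)\,dN_i(t) - Y_i(t)\beta' Z_i\,dt - Y_i(t)\,d\Lambda_0(t) \right] = 0, \tag{7}$$

$$\sum_{i=1}^n \left[ \frac{\xi_i}{\hat\rho(W_i)}\,dN_i^u(t) + \left(1 - \frac{\xi_i}{\hat\rho(W_i)}\right) \hat\pi(W_i)\,dN_i(t) - Y_i(t)\beta' Z_i\,dt - Y_i(t)\,d\Lambda_0(t) \right] = 0. \tag{8}$$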

The resulting fully augmented weighted estimators for β0 and Λ0 have the following closed forms:

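$$\hat\beta_a = \left[ \sum_{i=1}^n \int_0^\tau Y_i(t) \{Z_i - \bar Z(t)\}^{\otimes 2}\,dt \right]^{-1} \sum_{i=1}^n \int_0^\tau \{Z_i - \bar Z(t)\} \left[ \frac{\xi_i}{\hat\rho(W_i)}\,dN_i^u(t) + \left(1 - \frac{\xi_i}{\hat\rho(W_i)}\right) \hat\pi(W_i)\,dN_i(t) \right]$$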

and

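$$\hat\Lambda_a(t) = \int_0^t \frac{\sum_{i=1}^n \left[ \hat\rho(W_i)^{-1}\xi_i\,dN_i^u(s) + \{1 - \hat\rho(W_i)^{-1}\xi_i\} \hat\pi(W_i)\,dN_i(s) - Y_i(s)\hat\beta_a' Z_i\,ds \right]}{\sum_{i=1}^n Y_i(s)},$$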

where

$$\bar Z(t) = \frac{\sum_{i=1}^n Z_i Y_i(t)}{\sum_{i=1}^n Y_i(t)}.$$
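The fully augmented estimator differs from the simple weighted one in two ways: the risk-set average Z̄(t) is unweighted, and every subject contributes a jump at Xi of size (ξi/ρ̂i)δi + (1 − ξi/ρ̂i)π̂i, which equals π̂i when δi is missing. The following scalar-covariate sketch (p = 1, τ = max{Xi}) spells this out. The function name is hypothetical, ρ̂(Wi) and π̂(Wi) are assumed to be precomputed, and a missing δi is stored as `None`.

```python
from typing import Optional, Sequence


def augmented_beta(X: Sequence[float], delta: Sequence[Optional[int]],
                   xi: Sequence[int], rho_hat: Sequence[float],
                   pi_hat: Sequence[float], Z: Sequence[float]) -> float:
    """Fully augmented weighted estimator of beta for a scalar covariate."""
    n = len(X)

    def zbar(t: float) -> float:
        # Unweighted average of Z over the risk set {i : X_i >= t}.
        risk = [Z[i] for i in range(n) if X[i] >= t]
        return sum(risk) / len(risk)

    # Denominator: sum_i integral of Y_i(t) {Z_i - zbar(t)}^2 dt over [0, max X].
    denom, prev = 0.0, 0.0
    for t in sorted(set(X)):
        zb = zbar(t)
        denom += (t - prev) * sum((Z[i] - zb) ** 2
                                  for i in range(n) if X[i] >= t)
        prev = t

    # Numerator: each subject jumps once, at X_i, with an augmented jump size.
    numer = 0.0
    for i in range(n):
        w = xi[i] / rho_hat[i]
        observed_delta = delta[i] if xi[i] == 1 else 0  # unused when xi_i = 0
        jump = w * observed_delta + (1.0 - w) * pi_hat[i]
        numer += (Z[i] - zbar(X[i])) * jump
    return numer / denom
```

When every ξi = 1 and ρ̂ ≡ 1 the augmentation term vanishes and the function returns the full-data estimate, whatever values are supplied for π̂.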

Similar to Theorems 1 and 2, the asymptotic properties of β̂a and Λ̂a are given in the following theorem.

Theorem 3

Under the assumptions of Theorem 1, we have:

  1. β̂a is consistent and $n^{1/2}(\hat\beta_a - \beta_0)$ is asymptotically normal with mean zero and covariance matrix V;

  2. Λ̂a(t) converges in probability to Λ0(t) uniformly in t ∈ [0, τ], and n1/2{Λ̂a(t) − Λ0(t)} converges weakly on [0, τ] to a zero-mean Gaussian process with covariance function Γ(t, s) at (t, s) (t ≤ s).

For the fully augmented weighted method, the covariance matrix V and covariance function Γ(t, s) can be consistently estimated by substituting β̂a, ρ̂ and π̂ for β0, ρ and π in the appropriate empirical estimators, and by replacing the processes $N_i^u$ with $\hat\rho(W_i)^{-1}\xi_i N_i^u + \{1 - \xi_i\hat\rho(W_i)^{-1}\}\hat\pi(W_i) N_i$.

Theorems 1, 2 and 3 show that both the simple and fully augmented weighted estimators have the same asymptotic normal distribution, and the resulting estimators of the baseline cumulative hazard function converge to the same Gaussian process. This means that the simple weighted estimators with nonparametric ρ̂(w) are as efficient as the kernel-assisted fully augmented weighted estimators. One intuitive explanation for this is that the incomplete observations are indirectly incorporated in the simple weighted estimator by using the inverse of ρ̂(w) as a weight.

Note that Λ̂0(t) and Λ̂a(t) may not always be monotonic in t, in which case simple modifications such as those discussed in Lin & Ying (1994) can be made to ensure monotonicity while preserving asymptotic properties.
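One simple modification of this kind is to replace the estimate by its running maximum over a grid of time points. The sketch below illustrates that idea; it is an assumption of this example that the running maximum is the modification applied, and the function name is hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def monotone_cumulative_hazard(values: Sequence[float]) -> List[float]:
    """Running maximum of cumulative hazard estimates evaluated on an increasing time grid."""
    return list(accumulate(values, max))
```

For instance, the sequence 0.1, 0.3, 0.25, 0.4 becomes 0.1, 0.3, 0.3, 0.4.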

3. SIMULATION STUDIES

We conducted simulation studies to examine and compare the finite-sample performance of the simple and fully augmented weighted estimators proposed in Section 2, and also to compare their performance with that of the full data and complete-case analyses under the MAR model. In these studies, we considered three situations for the covariate Z: (a) Z was assumed to follow a Bernoulli distribution with success probability 0.5; (b) Z was generated from a uniform distribution on (0, 1); (c) Z = (Z1, Z2)′, where Z1 follows a uniform distribution on (0, 1) and Z2 follows a Bernoulli distribution with success probability 0.5. The underlying additive hazards model for the failure time T was taken to be λ(t | Z) = 1 + β0′Z, where β0 = 0, 0.5 and 1 when Z is a scalar, and β0 = (0, 0)′ and β0 = (1, −1)′ for the two-dimensional covariate. The censoring time C was generated from a uniform distribution on (0, c), where c was selected to give a censoring rate of either 15% or 55%.

The missingness indicators were generated from the logistic model

$$\rho(W) = \frac{\exp(\theta' W)}{1 + \exp(\theta' W)}, \tag{9}$$

where W = (X, Z), X = T ∧ C, and θ was chosen to produce a missingness rate of 50% under each censoring level. When Z was a Bernoulli random variable, there was only one (d = 1) continuous element in W, and we used the univariate Gaussian kernel function $K(u) = (2\pi)^{-1/2}\exp(-u^2/2)$ and a bandwidth of $h = 0.5n^{-1/3}$, with sample size of n = 100. When Z was a uniform random variable or a two-dimensional covariate as in (c), there were two (d = 2) continuous elements in W, and we used the bivariate Gaussian-based kernel function of order 4 (Wand & Schucany 1990)

$$K(u_1, u_2) = \frac{1}{8\pi}(3 - u_1^2)(3 - u_2^2)\exp\{-(u_1^2 + u_2^2)/2\} \tag{10}$$

and a bandwidth vector of $h = (h_1, h_2)' = (1.5n^{-1/5}, n^{-1/5})'$, with sample size of n = 400. We took τ to be the largest observed value of X, so that all data were used in the analysis. All simulation studies were based on 1000 replications for each combination of parameters.
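A single simulated data set for scenario (a) can be generated as sketched below. With a constant baseline hazard of 1, the hazard 1 + β0Z is constant in t, so T given Z is exponential with rate 1 + β0Z. The values of c and θ used in the study are not reported here, so the arguments `c` and `theta` are placeholders that a user would tune to reach the target censoring and missingness rates; the function name is likewise hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Row = Tuple[float, int, Optional[int], float]  # (X, xi, delta or None, Z)


def simulate_mar_data(n: int, beta0: float, c: float,
                      theta: Sequence[float], seed: int = 0) -> List[Row]:
    """Simulate right-censored data with censoring indicators missing at random."""
    rng = random.Random(seed)
    rows: List[Row] = []
    for _ in range(n):
        z = 1.0 if rng.random() < 0.5 else 0.0   # Z ~ Bernoulli(0.5)
        t = rng.expovariate(1.0 + beta0 * z)     # hazard 1 + beta0 * Z
        cens = rng.uniform(0.0, c)               # C ~ uniform(0, c)
        x = min(t, cens)                         # X = min(T, C)
        delta = 1 if t <= cens else 0            # delta = I(T <= C)
        # Logistic model (9) for rho(W) = P(xi = 1 | W), with W = (X, Z).
        rho = 1.0 / (1.0 + math.exp(-(theta[0] * x + theta[1] * z)))
        xi = 1 if rng.random() < rho else 0
        rows.append((x, xi, delta if xi == 1 else None, z))
    return rows
```

Because ξ depends only on (X, Z) and not on δ, the generated indicators are missing at random in the sense of equation (2).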

Our simulation results are summarized in Tables 1 and 2. In these tables, Bias is the sample mean of the estimate minus the true value; MSE is the sample mean of the squared differences between the estimate and the true value; and CP is the 95% empirical coverage probability for β0 based on a normal approximation. Similar summaries for the full-data and complete-case estimators are calculated for comparison.
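The three summary measures can be computed from the replicated estimates as in the sketch below. It assumes that the coverage probability is based on the usual Wald interval, estimate ± 1.96 × estimated standard error, which is one natural reading of "normal approximation"; the function name and argument layout are hypothetical.

```python
from typing import Sequence, Tuple


def summarize_replications(estimates: Sequence[float],
                           std_errors: Sequence[float],
                           true_value: float,
                           z_crit: float = 1.959964) -> Tuple[float, float, float]:
    """Return (Bias, MSE, CP) over simulation replications for one parameter."""
    m = len(estimates)
    bias = sum(estimates) / m - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / m
    # A replication "covers" when the true value lies inside its Wald interval.
    covered = sum(1 for e, s in zip(estimates, std_errors)
                  if abs(e - true_value) <= z_crit * s)
    return bias, mse, covered / m
```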

Table 1.

Simulation results for one covariate with a missingness rate of 50%

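| Covariate Z | CR | Estimate | Bias (β0 = 1) | MSE (β0 = 1) | CP (β0 = 1) | Bias (β0 = 0.5) | MSE (β0 = 0.5) | CP (β0 = 0.5) | Bias (β0 = 0) | MSE (β0 = 0) | CP (β0 = 0) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Bernoulli(0.5), n = 100 | 15% | Full | 0.0322 | 0.1304 | 0.952 | −0.0076 | 0.0816 | 0.948 | 0.0009 | 0.0506 | 0.960 |
| Bernoulli(0.5), n = 100 | 15% | SWE | −0.0205 | 0.1563 | 0.939 | −0.0033 | 0.1066 | 0.948 | −0.0168 | 0.0603 | 0.943 |
| Bernoulli(0.5), n = 100 | 15% | FAWE | 0.0336 | 0.1569 | 0.945 | −0.0049 | 0.0992 | 0.948 | 0.0020 | 0.0607 | 0.952 |
| Bernoulli(0.5), n = 100 | 15% | CC | −0.3630 | 0.3356 | 0.795 | −0.1729 | 0.1461 | 0.882 | −0.0865 | 0.0836 | 0.936 |
| Bernoulli(0.5), n = 100 | 55% | Full | −0.0154 | 0.2061 | 0.956 | −0.0067 | 0.1459 | 0.944 | −0.0027 | 0.0914 | 0.954 |
| Bernoulli(0.5), n = 100 | 55% | SWE | 0.0030 | 0.3652 | 0.948 | 0.0138 | 0.2194 | 0.941 | 0.0093 | 0.1631 | 0.951 |
| Bernoulli(0.5), n = 100 | 55% | FAWE | 0.0042 | 0.3536 | 0.950 | 0.0056 | 0.2199 | 0.945 | −0.0010 | 0.1451 | 0.953 |
| Bernoulli(0.5), n = 100 | 55% | CC | −0.1918 | 0.4250 | 0.913 | −0.0788 | 0.2489 | 0.923 | −0.1122 | 0.1805 | 0.932 |
| uniform(0, 1), n = 400 | 15% | Full | 0.0063 | 0.0810 | 0.958 | 0.0164 | 0.0565 | 0.961 | 0.0058 | 0.0359 | 0.953 |
| uniform(0, 1), n = 400 | 15% | SWE | 0.0133 | 0.1009 | 0.946 | −0.0094 | 0.0622 | 0.939 | 0.0049 | 0.0401 | 0.934 |
| uniform(0, 1), n = 400 | 15% | FAWE | 0.0049 | 0.0957 | 0.941 | 0.0187 | 0.0615 | 0.956 | 0.0039 | 0.0395 | 0.942 |
| uniform(0, 1), n = 400 | 15% | CC | −0.2304 | 0.1608 | 0.866 | −0.1510 | 0.0988 | 0.918 | −0.0962 | 0.0592 | 0.912 |
| uniform(0, 1), n = 400 | 55% | Full | 0.0199 | 0.1436 | 0.948 | 0.0061 | 0.0845 | 0.964 | 0.0052 | 0.0666 | 0.947 |
| uniform(0, 1), n = 400 | 55% | SWE | −0.0052 | 0.2263 | 0.943 | −0.0094 | 0.1288 | 0.947 | 0.0050 | 0.1130 | 0.933 |
| uniform(0, 1), n = 400 | 55% | FAWE | 0.0195 | 0.2258 | 0.950 | −0.0034 | 0.1248 | 0.952 | 0.0002 | 0.1099 | 0.934 |
| uniform(0, 1), n = 400 | 55% | CC | −0.2371 | 0.2695 | 0.910 | −0.1839 | 0.1541 | 0.910 | 0.0624 | 0.1175 | 0.925 |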

CR stands for censoring rate, Full stands for full data estimator, SWE stands for simple weighted estimator, FAWE stands for fully augmented weighted estimator, and CC stands for complete case estimator.

Table 2.

Simulation results for a two-dimensional covariate vector with a missingness rate of 50%

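| True (β10, β20) | CR | Estimate | Bias (β10) | MSE (β10) | CP (β10) | Bias (β20) | MSE (β20) | CP (β20) |
|---|---|---|---|---|---|---|---|---|
| (0, 0) | 15% | Full | 0.0103 | 0.0369 | 0.947 | −0.0050 | 0.0122 | 0.957 |
| (0, 0) | 15% | SWE | −0.0129 | 0.0526 | 0.942 | 0.0092 | 0.0144 | 0.949 |
| (0, 0) | 15% | FAWE | 0.0091 | 0.0512 | 0.946 | −0.0077 | 0.0145 | 0.938 |
| (0, 0) | 15% | CC | −0.2222 | 0.0932 | 0.805 | 0.1091 | 0.0261 | 0.836 |
| (0, 0) | 55% | Full | −0.0161 | 0.0640 | 0.958 | 0.0010 | 0.0212 | 0.956 |
| (0, 0) | 55% | SWE | −0.0352 | 0.1044 | 0.949 | 0.0048 | 0.0346 | 0.940 |
| (0, 0) | 55% | FAWE | −0.0231 | 0.1057 | 0.955 | 0.0014 | 0.0322 | 0.946 |
| (0, 0) | 55% | CC | −0.1290 | 0.1122 | 0.809 | 0.1701 | 0.0605 | 0.835 |
| (1, −1) | 15% | Full | −0.0001 | 0.0537 | 0.957 | −0.0053 | 0.0221 | 0.952 |
| (1, −1) | 15% | SWE | −0.0100 | 0.0784 | 0.957 | 0.0062 | 0.0297 | 0.940 |
| (1, −1) | 15% | FAWE | −0.0161 | 0.0805 | 0.950 | 0.0027 | 0.0302 | 0.955 |
| (1, −1) | 15% | CC | −0.0951 | 0.0950 | 0.902 | 0.0797 | 0.0460 | 0.891 |
| (1, −1) | 55% | Full | 0.0113 | 0.0541 | 0.949 | −0.0078 | 0.0221 | 0.953 |
| (1, −1) | 55% | SWE | −0.0135 | 0.0796 | 0.961 | 0.0057 | 0.0309 | 0.933 |
| (1, −1) | 55% | FAWE | 0.0086 | 0.0810 | 0.948 | −0.0014 | 0.0335 | 0.955 |
| (1, −1) | 55% | CC | −0.0806 | 0.0930 | 0.908 | 0.0714 | 0.0450 | 0.887 |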

CR stands for censoring rate, Full stands for full data estimator, SWE stands for simple weighted estimator, FAWE stands for fully augmented weighted estimator, and CC stands for complete case estimator.

Tables 1 and 2 show that the complete case estimator is highly biased in all situations, with coverage probabilities that are too small, whereas the simple and fully augmented weighted estimators are nearly unbiased, with very reasonable coverage probabilities. Furthermore, the simple and fully augmented weighted estimators have similar MSE values, which are only slightly larger than those of the full data estimator and are often much smaller than those of the complete case estimator. These results suggest that our proposed estimators are more efficient than the complete case estimator and are adequate for practical use. We also simulated data under different parameter configurations and obtained similar results.

We compared the proposed methods and the parametric approach of Lu & Liang (2008) under MAR and MCAR assumptions. Data were simulated under correctly and incorrectly specified parametric models, using the same setup as in Table 1 with a censoring rate of 55% and a missingness rate of 50%, where Z follows a Bernoulli distribution with a sample size of n = 200 and β0 = 0 and 1. The results are presented in Table 3. In Table 3, LIPW1 and LIPW2 stand for the inverse probability weighted (IPW) estimators of Lu & Liang (2008) when using the logistic model and the constant model for ρ(w), respectively; LDR1 and LDR2 stand for the double robust (DR) estimators of Lu & Liang (2008) when using the logistic model and the constant model for ρ(w), respectively. In all cases, we used a constant model for π(w), which is misspecified.

Table 3.

Comparison of the proposed method with the parametric approach of Lu and Liang (2008) under MAR and MCAR for a missingness rate of 50%

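| Missingness | Estimate | Bias (β0 = 0) | MSE (β0 = 0) | CP (β0 = 0) | Bias (β0 = 1) | MSE (β0 = 1) | CP (β0 = 1) |
|---|---|---|---|---|---|---|---|
| MAR | Full | 0.0040 | 0.0448 | 0.948 | 0.0063 | 0.1071 | 0.950 |
| MAR | SWE | 0.0018 | 0.0669 | 0.955 | 0.0099 | 0.1670 | 0.945 |
| MAR | FAWE | 0.0051 | 0.0677 | 0.953 | 0.0011 | 0.1669 | 0.947 |
| MAR | LIPW1 | 0.0077 | 0.0909 | 0.945 | 0.0013 | 0.2192 | 0.950 |
| MAR | LDR1 | 0.0031 | 0.0675 | 0.950 | 0.0006 | 0.1683 | 0.947 |
| MAR | LIPW2 | 0.6229 | 0.4311 | 0.139 | 1.1296 | 1.3814 | 0.087 |
| MAR | LDR2 | 0.2121 | 0.0843 | 0.813 | 0.5999 | 0.4515 | 0.471 |
| MCAR | Full | 0.0060 | 0.0447 | 0.961 | 0.0108 | 0.1079 | 0.958 |
| MCAR | SWE | 0.0089 | 0.0694 | 0.950 | 0.0080 | 0.1679 | 0.954 |
| MCAR | FAWE | 0.0104 | 0.0698 | 0.946 | 0.0083 | 0.1661 | 0.950 |
| MCAR | LIPW2 | 0.0004 | 0.0906 | 0.944 | 0.0298 | 0.2099 | 0.942 |
| MCAR | LDR2 | 0.0089 | 0.0686 | 0.946 | 0.0209 | 0.1686 | 0.954 |
| MCAR | CC | 0.0041 | 0.0932 | 0.955 | 0.0041 | 0.2191 | 0.955 |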

CR stands for censoring rate, Full stands for full data estimator, CC stands for complete case estimator, SWE stands for simple weighted estimator, FAWE stands for fully augmented weighted estimator, LIPW1 and LIPW2 stand for the inverse probability weighted estimators of Lu and Liang (2008) when using the logistic model and the constant model for ρ(w), respectively, LDR1 and LDR2 stand for the double robust estimators of Lu and Liang (2008) when using the logistic model and the constant model for ρ(w), respectively, and in all cases a constant model is used for π(w).

It can be seen from Table 3 that the proposed methods are essentially unbiased in all the settings, and the parametric approach of Lu & Liang (2008) is also unbiased when the parametric model for ρ(w) is correctly specified. Furthermore, the proposed estimators are as efficient as the DR estimator of Lu & Liang (2008), and are more efficient than the IPW estimator of Lu & Liang (2008). When both ρ(w) and π(w) are misspecified, however, both the IPW and DR estimators of Lu & Liang (2008) are biased under MAR. The key advantage of our method is that it provides reasonable estimation without making parametric modeling assumptions about ρ(w) and π(w). Rather than assuming parametric models, our approach uses nonparametric smoothing techniques to estimate these probabilities. In addition, the proposed estimators are more efficient than the complete case estimator under MCAR. So, if MCAR is true, our proposed approach still works well and does not lose efficiency.

We also conducted simulation studies to examine the performance of the proposed methods when MNAR (missing not at random) is true. In the study, the setup was the same as in Table 1, where Z follows a uniform distribution on (0, 1) with β0 = 0 and n = 400, except that the censoring rate was set to be 20%, and the missingness probability was given by

$$\rho(W, \delta) = \frac{\exp(\theta_1' W + \theta_2 \delta)}{1 + \exp(\theta_1' W + \theta_2 \delta)},$$

where θ1 and θ2 were chosen to produce a missingness rate of either 20% or 50%. The results are summarized in Table 4. It can be seen from Table 4 that the proposed estimation procedures perform well when the missingness rate is low (say, 20%), but when the missingness rate is high (say, 50%), the proposed estimators are a little biased. However, the biases are relatively small compared to those of the complete case estimator.

Table 4.

Simulation results for the proposed method under MNAR

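| MR | Measure | Full | SWE | FAWE | CC |
|---|---|---|---|---|---|
| 20% | Bias | 0.0023 | 0.0103 | 0.0184 | −0.0119 |
| 20% | MSE | 0.0385 | 0.0416 | 0.0424 | 0.0430 |
| 20% | CP | 0.936 | 0.935 | 0.938 | 0.938 |
| 50% | Bias | −0.0012 | 0.0167 | 0.0562 | −0.0891 |
| 50% | MSE | 0.0383 | 0.0434 | 0.0491 | 0.0716 |
| 50% | CP | 0.938 | 0.936 | 0.930 | 0.926 |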

MR stands for missingness rate, Full stands for full data estimator, SWE stands for simple weighted estimator, FAWE stands for fully augmented weighted estimator, and CC stands for complete case estimator.

4. EXAMPLE: ANALYSIS OF BRAIN CANCER DATA

We applied our methods to the brain cancer data mentioned earlier. We analyzed the data on all 387 patients who entered the clinical trial with a form of brain cancer known as glioblastoma. Dinse (1982) used a subset of these data to illustrate his nonparametric maximum likelihood analysis, which did not account for covariates. All patients were ambulatory when they entered the trial, but over time some lost their mobility, some had a progression of their cancer, and some experienced both events. As a measure of quality of life, we defined “survival time” as the time to non-ambulatory progression, and we evaluated the effects of sex and age on this event time.

Of the 387 patients, 86 progressed and were non-ambulatory, 24 progressed but were still ambulatory, 220 did not progress by the end of the study, and 57 progressed but had an unknown ambulatory status. Thus, our analysis treated these outcomes as 86 uncensored times, 244 censored times, and 57 times with a missing censoring indicator. There were 144 women and 243 men, ranging in age from 14 to 74 years, and the length of time on study (or until progression) varied from 2 to 1088 days.
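The counts in this paragraph can be checked with simple arithmetic (the final fraction is rounded):

$$86 + 24 + 220 + 57 = 387, \qquad 24 + 220 = 244, \qquad 144 + 243 = 387, \qquad \frac{57}{387} \approx 0.147.$$

Thus roughly 15% of the patients have a missing censoring indicator.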

Let X be the observed time (in days), measured from the beginning of the trial, and let δ indicate whether the patient had progressed and was non-ambulatory. We defined Z1 to be a binary indicator of the patient’s sex, which was 1 for men and 0 for women, and Z2 to be the age at trial entry (in years), which was treated as a continuous covariate. Since W = (X, Z1, Z2) contains two continuous elements, we used the bivariate Gaussian-based kernel function of order 4 for K, as defined in (10), with a bandwidth vector of h = (h1, h2)′ = (34, 10)′. We used τ = 1088, which was the largest observed value of X.

The analysis of the brain cancer data is summarized in Table 5, which gives the results for our simple weighted estimator (SWE) and our fully augmented weighted estimator (FAWE). For comparison, Table 5 also gives the results of the complete case (CC) analysis. None of the three methods suggested that men and women had different hazard rates for non-ambulatory progression. On the other hand, our two estimators showed that age is important (p = 0.037 for SWE and p = 0.011 for FAWE), but the CC analysis did not (p = 0.367). Specifically, the hazard rate for non-ambulatory progression increased as patients grew older, which is consistent with worsening quality of life. The age coefficients were of similar magnitude for all three methods, but the standard error was much larger for the CC analysis than for our SWE and FAWE analyses. Thus, as a result of excluding data, the complete case analysis missed the age effect on non-ambulatory progression that our approaches appropriately identified.

Table 5.

Estimates of regression coefficients for sex and age (in years), along with estimated standard errors and significance levels, from the analysis of time (in days) to non-ambulatory progression for patients in the brain cancer clinical trial.

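| Quantity | CC: sex | CC: age | SWE: sex | SWE: age | FAWE: sex | FAWE: age |
|---|---|---|---|---|---|---|
| β̂ × 10^4 | 1.570 | 0.081 | 1.589 | 0.077 | 1.708 | 0.087 |
| SE(β̂) × 10^4 | 2.445 | 0.090 | 2.306 | 0.037 | 2.241 | 0.034 |
| P-value | 0.521 | 0.367 | 0.491 | 0.037 | 0.446 | 0.011 |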

Note: The sample size is n = 387; the bandwidth vector is h = (h1, h2)′ = (34, 10)′; CC denotes the complete case estimator; SWE denotes the simple weighted estimator; FAWE denotes the fully augmented weighted estimator.
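The significance levels in Table 5 are consistent with two-sided Wald tests based on the ratio of each estimate to its standard error. The sketch below recomputes them under that assumption; the function name is hypothetical, and small discrepancies are expected because the tabulated estimates and standard errors are rounded.

```python
import math


def wald_p_value(estimate: float, std_error: float) -> float:
    """Two-sided p-value 2 * (1 - Phi(|z|)) for z = estimate / std_error."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # (label, estimate x 10^4, standard error x 10^4) as reported in Table 5;
    # the common scale factor cancels in the ratio.
    table5 = [
        ("CC sex", 1.570, 2.445), ("CC age", 0.081, 0.090),
        ("SWE sex", 1.589, 2.306), ("SWE age", 0.077, 0.037),
        ("FAWE sex", 1.708, 2.241), ("FAWE age", 0.087, 0.034),
    ]
    for label, est, se in table5:
        print(label, round(wald_p_value(est, se), 3))
```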

5. CONCLUDING REMARKS

Model (1) has the limitation that the linear predictor β0′Z needs to be constrained to ensure non-negativity of the right side of (1). One may avoid this constraint by using a nonnegative link function, such as λ0(t) + exp(β0′Z). The ideas presented in this paper can be applied to any regression function g(β0′Z), where g(·) is a known link function. In addition, our approach can be extended to incorporate missing covariates (Qi, Wang & Prentice 2005) in the situation where both the failure indicators and the covariates are partially observed.

Nonparametric kernel estimation can be done for a small number of continuous covariates, but for categorical covariates, it would usually require stratified kernel estimation within each stratum defined by the categorical covariates. In practice, when there are too many categories, it may be desirable to specify a more flexible model for the missingness probability, such as a partially linear additive model, and then use local kernel regression to estimate the missingness probability. Here we focus on a kernel estimation approach for ρ(w) and π(w). Of course, other smoothing techniques such as the local polynomial method (Fan & Gijbels 1996) may be used and require the same assumptions. Furthermore, the $n^{1/2}$-rate asymptotic normality of the proposed estimators indicates that an appropriate choice for the bandwidth sequence h depends only on the second order terms of the mean square error of the estimators, and thus bandwidth selection may not be critical for estimating β0 and Λ0.

Since the estimating functions in (5) to (8) were obtained in a somewhat ad hoc fashion, it might be worthwhile investigating possible improvements that could result from other approaches, such as the one suggested by McKeague & Sasieni (1994) or perhaps a nonparametric maximum likelihood approach. Alternatively, estimation procedures based on the general Aalen additive model (Aalen 1980) or the linear transformation model (Gao & Tsiatis 2005) with missing censoring information might also be worthy of investigation.

Another limitation of the approach given here is that the covariates Z are time-invariant. In some applications, we might want to incorporate time-dependent covariates. Thus, a more general approach might extend model (1) to a time-varying version:

$$\lambda(t \mid Z(t)) = \lambda_0(t) + \beta_0(t)'Z(t),$$

where β0(t) is an unknown p-vector of time-varying regression coefficients and Z(t) is a vector of covariates that may depend on time. However, the proposed estimation procedure cannot be extended in a straightforward manner to deal with time-dependent covariates because of the curse of dimensionality created by Z(t) and a need for alternative smoothing techniques for estimating β0(t). In addition, when the dimension of Z(t) is high, the probabilities ρ(w) and π(w) can be modeled parametrically (Lu & Liang 2008). As a different approach, perhaps dimension-reduction techniques could be extended in conjunction with a partially linear model (Liang, Härdle & Carroll 1999) for ρ(w) and π(w).

Acknowledgments

The authors would like to thank the Editor (Paul Gustafson), the Associate Editor, two reviewers and Shyamal Peddada for their constructive and insightful comments and suggestions that greatly improved the paper. Xinyuan Song’s research was fully supported by two grants from the Research Grant Council of the Hong Kong Special Administrative Region. Liuquan Sun’s research was fully supported by the National Natural Science Foundation of China Grants, the National Basic Research Program of China (973 Program) and Key Laboratory of RCSDS, CAS. Gregg Dinse’s research was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences.

APPENDIX

We will use the same notation defined in the previous sections and assume that the following regularity conditions hold:

  • (C1) $\Lambda_0(\tau) < \infty$, $\Pr(X \ge \tau) > 0$, $Z$ is bounded, and $\inf_{0 \le t \le \tau}[\lambda_0(t) + \beta_0'Z] > 0$ a.e.

  • (C2) The probability (density) f(w) of Wi is bounded away from 0, and has r continuous and bounded partial derivatives with respect to the continuous components of Wi a.e.

  • (C3) The missingness probability ρ(w) is bounded away from 0, and has r continuous and bounded partial derivatives with respect to the continuous components of Wi a.e.

  • (C4) The conditional probability π(w) has r continuous and bounded partial derivatives with respect to the continuous components of Wi a.e.

  • (C5) $A = E\left[\int_0^\tau Y_i(t)\{Z_i - \bar z(t)\}^{\otimes 2}\,dt\right]$ is nonsingular.

  • (C6) $nh^{2r} \to 0$ and $nh^{2d} \to \infty$, as $n \to \infty$.

We give the proof of Theorem 3 and outline the proof of Theorem 1; Theorem 2 can be proven in the same manner. For notational convenience, we assume that all components of Wi are continuous in the following proof.

Proof of Theorem 3(i)

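Substituting $\hat\Lambda_a$ into equation (7), we find that $\hat\beta_a$ is the solution to $U(\beta) = 0$, where

$$U(\beta) = \sum_{i=1}^n \int_0^\tau \{Z_i - \bar Z(t)\}\left[\frac{\xi_i}{\hat\rho(W_i)}\,dN_i^u(t) + \left(1 - \frac{\xi_i}{\hat\rho(W_i)}\right)\hat\pi(W_i)\,dN_i(t) - Y_i(t)\beta'Z_i\,dt\right].$$

Let $M_i(t) = N_i^u(t) - \int_0^t Y_i(s)\{\lambda_0(s) + \beta_0'Z_i\}\,ds$. Then we can write

$$U(\beta_0) = U_1(\beta_0) + U_2(\beta_0) + U_3(\beta_0), \tag{A.1}$$

where

$$U_1(\beta_0) = \sum_{i=1}^n \int_0^\tau \{Z_i - \bar Z(t)\}\,dM_i(t), \qquad U_2(\beta_0) = \sum_{i=1}^n \int_0^\tau \{Z_i - \bar Z(t)\}\left(\frac{\xi_i}{\hat\rho(W_i)} - 1\right)dN_i^u(t),$$

and

$$U_3(\beta_0) = \sum_{i=1}^n \int_0^\tau \{Z_i - \bar Z(t)\}\left(1 - \frac{\xi_i}{\hat\rho(W_i)}\right)\hat\pi(W_i)\,dN_i(t).$$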

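Note that $U_1(\beta_0)$ is a martingale integral. Thus, it follows that

$$U_1(\beta_0) = \sum_{i=1}^n \int_0^\tau \{Z_i - \bar z(t)\}\,dM_i(t) + o_p(n^{1/2}). \tag{A.2}$$

Define $\Phi_n(t) = n^{-1}\sum_{i=1}^n \left(\hat\rho(W_i)^{-1}\xi_i - 1\right)N_i^u(t)$, and write $\Phi_n(t) = \Phi_{n1}(t) + \Phi_{n2}(t) + \Phi_{n3}(t)$, where

$$\Phi_{n1}(t) = n^{-1}\sum_{i=1}^n \frac{(\xi_i - \rho(W_i))\,N_i^u(t)}{\rho(W_i)}, \qquad \Phi_{n2}(t) = n^{-1}\sum_{i=1}^n \frac{(\rho(W_i) - \hat\rho(W_i))\,N_i^u(t)}{\rho(W_i)},$$

and

$$\Phi_{n3}(t) = n^{-1}\sum_{i=1}^n \frac{(\xi_i - \hat\rho(W_i))(\rho(W_i) - \hat\rho(W_i))\,N_i^u(t)}{\hat\rho(W_i)\,\rho(W_i)}.$$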
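The three-term split of $\Phi_n(t)$ rests on an elementary identity for the weight $\xi_i/\hat\rho(W_i) - 1$. As a sketch, with the argument $W_i$ suppressed (so that $\rho = \rho(W_i)$ and $\hat\rho = \hat\rho(W_i)$), the steps can be written out as follows; multiplying through by $N_i^u(t)$ and averaging over $i$ gives $\Phi_{n1}(t)$, $\Phi_{n2}(t)$ and $\Phi_{n3}(t)$ in turn.

```latex
\begin{align*}
\frac{\xi_i}{\hat\rho} - 1
  &= \frac{\xi_i - \hat\rho}{\hat\rho}
   = \frac{\xi_i - \hat\rho}{\rho}
     + (\xi_i - \hat\rho)\left(\frac{1}{\hat\rho} - \frac{1}{\rho}\right) \\
  &= \frac{\xi_i - \rho}{\rho}
     + \frac{\rho - \hat\rho}{\rho}
     + \frac{(\xi_i - \hat\rho)(\rho - \hat\rho)}{\hat\rho\,\rho}.
\end{align*}
```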

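By the uniform strong law of large numbers (Pollard 1990), $\sup_{0 \le t \le \tau}|\Phi_{n1}(t)| = o(1)$ almost surely. It can be checked that

$$\Phi_{n2}(t) = n^{-2}\sum_{i=1}^n\sum_{j=1}^n \frac{(\rho(W_i) - \xi_j)\,K_h(W_i - W_j)\,N_i^u(t)}{\rho(W_i)\,h^d\,\hat f(W_i)},$$

where $\hat f(w) = (nh^d)^{-1}\sum_{i=1}^n K_h(w - W_i)$, which is a kernel density estimate of $f(w)$. By a Taylor expansion of $1/\hat f(W_i)$ about $f(W_i)$, $\Phi_{n2}(t)$ can be written as $\Phi_{n21}(t) - \Phi_{n22}(t) + o_p(1)$, where

$$\Phi_{n21}(t) = n^{-2}\sum_{i=1}^n\sum_{j=1}^n \frac{(\rho(W_i) - \xi_j)\,K_h(W_i - W_j)\,N_i^u(t)}{\rho(W_i)\,h^d\,f(W_i)},$$

and

$$\Phi_{n22}(t) = n^{-2}\sum_{i=1}^n\sum_{j=1}^n \frac{(\rho(W_i) - \xi_j)\,K_h(W_i - W_j)\,N_i^u(t)\,(\hat f(W_i) - f(W_i))}{\rho(W_i)\,h^d\,f(W_i)^2}.$$

A straightforward calculation yields that $E\{\Phi_{n21}(t)\} = O(h^r) \to 0$ and $\mathrm{Var}\{\Phi_{n21}(t)\} = O(h^{2r} + (nh^{2d})^{-1}) \to 0$, which imply $\Phi_{n21}(t) = o_p(1)$. Similarly, $\Phi_{n22}(t) = o_p(1)$, and thus it follows that $\Phi_{n2}(t) = o_p(1)$. Likewise, $\Phi_{n3}(t) = o_p(1)$. Therefore, we have $\Phi_n(t) = o_p(1)$. Note that $\Phi_n(t)$ is monotone and bounded in $t$. Consequently, we obtain

$$\sup_{0 \le t \le \tau}|\Phi_n(t)| = o_p(1). \tag{A.3}$$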

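The functional central limit theorem (Pollard 1990) implies that

$$\sup_{0 \le t \le \tau}\|\bar Z(t) - \bar z(t)\| = O_p(n^{-1/2}). \tag{A.4}$$

Using (A.3) and (A.4), we have

$$\int_0^\tau \{\bar Z(t) - \bar z(t)\}\,d\Phi_n(t) = o_p(n^{-1/2}).$$

Hence,

$$U_2(\beta_0) = \sum_{i=1}^n \int_0^\tau \{Z_i - \bar z(t)\}\left(\frac{\xi_i}{\hat\rho(W_i)} - 1\right)dN_i^u(t) - n\int_0^\tau \{\bar Z(t) - \bar z(t)\}\,d\Phi_n(t) = \sum_{i=1}^n \left(\frac{\xi_i}{\hat\rho(W_i)} - 1\right)\delta_i B_i + o_p(n^{1/2}). \tag{A.5}$$

In a similar manner, we obtain

$$U_3(\beta_0) = \sum_{i=1}^n \left(1 - \frac{\xi_i}{\hat\rho(W_i)}\right)\hat\pi(W_i)\,B_i + o_p(n^{1/2}). \tag{A.6}$$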

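Thus, it follows from (A.1), (A.2), (A.5) and (A.6) that

$$\begin{aligned}
U(\beta_0) &= \sum_{i=1}^n \int_0^\tau \{Z_i - \bar z(t)\}\,dM_i(t) + \sum_{i=1}^n \left(\frac{\xi_i}{\hat\rho(W_i)} - 1\right)(\delta_i - \hat\pi(W_i))\,B_i + o_p(n^{1/2}) \\
&= \sum_{i=1}^n \int_0^\tau \{Z_i - \bar z(t)\}\,dM_i(t) + \sum_{i=1}^n \left(\frac{\xi_i}{\rho(W_i)} - 1\right)(\delta_i - \pi(W_i))\,B_i + R_{n1} + R_{n2} + o_p(n^{1/2}),
\end{aligned}$$

where

$$R_{n1} = \sum_{i=1}^n \left(1 - \frac{\xi_i}{\rho(W_i)}\right)(\hat\pi(W_i) - \pi(W_i))\,B_i,$$

and

$$R_{n2} = \sum_{i=1}^n \frac{(\hat\rho(W_i) - \rho(W_i))(\hat\pi(W_i) - \delta_i)\,\xi_i B_i}{\hat\rho(W_i)\,\rho(W_i)}.$$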
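The remainder terms $R_{n1}$ and $R_{n2}$ arise from replacing the estimated weights by their true counterparts one at a time. A sketch of the algebra for a single summand, with the argument $W_i$ suppressed and the common factor $B_i$ omitted, is:

```latex
\begin{align*}
\left(\frac{\xi_i}{\hat\rho} - 1\right)(\delta_i - \hat\pi)
  &= \left(\frac{\xi_i}{\rho} - 1\right)(\delta_i - \hat\pi)
     + \xi_i\left(\frac{1}{\hat\rho} - \frac{1}{\rho}\right)(\delta_i - \hat\pi) \\
  &= \left(\frac{\xi_i}{\rho} - 1\right)(\delta_i - \pi)
     + \left(1 - \frac{\xi_i}{\rho}\right)(\hat\pi - \pi)
     + \frac{(\hat\rho - \rho)(\hat\pi - \delta_i)\,\xi_i}{\hat\rho\,\rho}.
\end{align*}
```

Multiplying by $B_i$ and summing over $i$, the second term gives $R_{n1}$ and the third gives $R_{n2}$.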

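Let $m(w) = \rho(w)f(w)$ and $\hat m(w) = (nh^d)^{-1}\sum_{i=1}^n \xi_i K_h(w - W_i)$. Then by the Taylor expansion of $1/\hat m(W_i)$ about $m(W_i)$, we can write

$$\sum_{i=1}^n (\hat\pi(W_i) - \pi(W_i))\,B_i = R_{n11} + R_{n12} + o_p(n^{1/2}),$$

where

$$R_{n11} = n^{-1}\sum_{i=1}^n\sum_{j=1}^n \frac{\xi_j(\delta_j - \pi(W_i))\,K_h(W_i - W_j)\,B_i}{h^d\,m(W_i)},$$

and

$$R_{n12} = -n^{-1}\sum_{i=1}^n\sum_{j=1}^n \frac{\xi_j(\delta_j - \pi(W_i))\,K_h(W_i - W_j)\,(\hat m(W_i) - m(W_i))\,B_i}{h^d\,m(W_i)^2}.$$

Define

$$R_{n11}^* = n^{-1/2}R_{n11} - n^{-1/2}\sum_{i=1}^n \frac{\xi_i(\delta_i - \pi(W_i))\,B_i}{\rho(W_i)}.$$

Some straightforward calculation gives $E\{R_{n11}^*\} = O(n^{1/2}h^r) \to 0$ and $\mathrm{Var}\{R_{n11}^*\} = O(nh^{2r} + (nh^{2})^{-1}) \to 0$, which imply that

$$R_{n11} = \sum_{i=1}^n \frac{\xi_i(\delta_i - \pi(W_i))\,B_i}{\rho(W_i)} + o_p(n^{1/2}).$$

Similarly, we have $R_{n12} = o_p(n^{1/2})$, and thus

$$\sum_{i=1}^n (\hat\pi(W_i) - \pi(W_i))\,B_i = \sum_{i=1}^n \frac{\xi_i(\delta_i - \pi(W_i))\,B_i}{\rho(W_i)} + o_p(n^{1/2}). \tag{A.7}$$

In a similar manner, we obtain

$$\sum_{i=1}^n \frac{\xi_i}{\rho(W_i)}(\hat\pi(W_i) - \pi(W_i))\,B_i = \sum_{i=1}^n \frac{\xi_i(\delta_i - \pi(W_i))\,B_i}{\rho(W_i)} + o_p(n^{1/2}). \tag{A.8}$$

It follows from (A.7) and (A.8) that $R_{n1} = o_p(n^{1/2})$. Likewise, $R_{n2} = o_p(n^{1/2})$. Thus,

$$U(\beta_0) = \sum_{i=1}^n \int_0^\tau \{Z_i - \bar z(t)\}\,dM_i(t) + \sum_{i=1}^n \left(\frac{\xi_i}{\rho(W_i)} - 1\right)(\delta_i - \pi(W_i))\,B_i + o_p(n^{1/2}). \tag{A.9}$$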

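The law of large numbers and the multivariate central limit theorem show that $n^{-1}U(\beta_0) \to 0$ in probability and $n^{-1/2}U(\beta_0)$ converges in distribution to a normal random variable with mean zero and variance matrix $\Sigma + \Sigma^*$. Since $U(\beta)$ is linear in $\beta$ and $U(\hat\beta_a) = 0$, note that

$$\hat\beta_a - \beta_0 = -\left\{\frac{\partial U(\beta)}{\partial\beta}\right\}^{-1}U(\beta_0),$$

and

$$n^{-1}\frac{\partial U(\beta)}{\partial\beta} = -n^{-1}\sum_{i=1}^n \int_0^\tau Y_i(t)\{Z_i - \bar Z(t)\}^{\otimes 2}\,dt \to -A$$

almost surely by the uniform strong law of large numbers (Pollard 1990). Then it follows from (A.9) that $\hat\beta_a$ is consistent and $n^{1/2}(\hat\beta_a - \beta_0)$ is asymptotically normal with mean zero and covariance matrix $V = A^{-1}(\Sigma + \Sigma^*)A^{-1}$.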
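The second sum in (A.9) is built from the factor $(\xi_i/\rho(W_i) - 1)(\delta_i - \pi(W_i))$, which has mean zero when $\xi_i$ and $\delta_i$ are conditionally independent given $W_i$ (the missing-at-random assumption). The following Monte Carlo sketch illustrates this, together with the fact that the augmented pseudo-indicator $\xi_i\delta_i/\rho(W_i) + (1 - \xi_i/\rho(W_i))\pi(W_i)$ has the same mean as $\delta_i$. The linear forms chosen for `rho` and `pi`, the uniform distribution of a scalar `w`, the seed, and the sample size are all hypothetical choices made for illustration, and the true $\rho$ and $\pi$ are used in place of kernel estimates.

```python
import random

def rho(w):
    # Hypothetical probability that the censoring indicator is observed.
    return 0.5 + 0.4 * w

def pi(w):
    # Hypothetical probability that an observation is uncensored.
    return 0.3 + 0.5 * w

def simulate(n, seed=2010):
    rng = random.Random(seed)
    aug_sum = 0.0     # running sum of (xi / rho - 1) * (delta - pi)
    pseudo_sum = 0.0  # running sum of the augmented pseudo-indicator
    for _ in range(n):
        w = rng.random()
        xi = 1 if rng.random() < rho(w) else 0
        # delta is drawn independently of xi given w, i.e. under MAR.
        delta = 1 if rng.random() < pi(w) else 0
        weight = xi / rho(w)
        # Theoretical quantity: it involves delta even when xi = 0.
        aug_sum += (weight - 1.0) * (delta - pi(w))
        # Computable from observed data: delta enters only when xi = 1.
        pseudo_sum += weight * delta + (1.0 - weight) * pi(w)
    return aug_sum / n, pseudo_sum / n

aug_mean, pseudo_mean = simulate(200_000)
print(aug_mean, pseudo_mean)
```

With these hypothetical choices, $E\{\pi(W)\} = 0.3 + 0.5 \times 0.5 = 0.55$, so the first average should be close to 0 and the second close to 0.55, up to Monte Carlo error.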

Proof of Theorem 3(ii)

First write

$$\hat\Lambda_a(t) - \Lambda_0(t) = \int_0^t \frac{\sum_{i=1}^n dM_i(s)}{\sum_{i=1}^n Y_i(s)} - (\hat\beta_a - \beta_0)'\int_0^t \bar Z(s)\,ds + \sum_{i=1}^n \left(\frac{\xi_i}{\hat\rho(W_i)} - 1\right)(\delta_i - \hat\pi(W_i))\int_0^t \frac{dN_i(s)}{\sum_{j=1}^n Y_j(s)}.$$

Note that

$$\sup_{0 \le t \le \tau}\left|n^{-1}\sum_{i=1}^n Y_i(t) - E[Y_1(t)]\right| = O_p(n^{-1/2}).$$

Following similar arguments as in the proof of (i), we obtain

$$\hat\Lambda_a(t) - \Lambda_0(t) = n^{-1}\int_0^t \frac{\sum_{i=1}^n dM_i(s)}{E\{Y_i(s)\}} - d(t)'(\hat\beta_a - \beta_0) + n^{-1}\sum_{i=1}^n \left(\frac{\xi_i}{\rho(W_i)} - 1\right)(\delta_i - \pi(W_i))\int_0^t \frac{dN_i(s)}{E\{Y_i(s)\}} + o_p(n^{-1/2}) \tag{A.10}$$

uniformly on $[0, \tau]$. In view of the consistency of $\hat\beta_a$, it follows from the uniform strong law of large numbers and the multivariate central limit theorem that $\sup_{0 \le t \le \tau}|\hat\Lambda_a(t) - \Lambda_0(t)| \to 0$ in probability, and $n^{1/2}\{\hat\Lambda_a(t) - \Lambda_0(t)\}$ converges in finite dimensional distributions to a zero-mean Gaussian process. The first term on the right-hand side of (A.10) is tight as it is a martingale integral. The second term is tight because $n^{1/2}(\hat\beta_a - \beta_0)$ converges in distribution and $d(t)$ is a deterministic function. Note that for each $i$, $(\xi_i\rho(W_i)^{-1} - 1)(\delta_i - \pi(W_i))\int_0^t dN_i(s)/E\{Y_i(s)\}$ can be written as sums of monotone processes. Then the tightness of the third term follows from Example 2.11.16 of van der Vaart & Wellner (1996). Thus, $n^{1/2}\{\hat\Lambda_a(t) - \Lambda_0(t)\}$ is tight and converges weakly to a zero-mean Gaussian process with covariance function $\Gamma(s, t)$ at $(s, t)$.

Outlined proof of Theorem 1

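Note that $\hat\beta$ is the solution to $U^*(\beta) = 0$, where

$$U^*(\beta) = \sum_{i=1}^n \int_0^\tau \frac{\xi_i}{\hat\rho(W_i)}\{Z_i - \bar Z(t)\}\left[dN_i^u(t) - Y_i(t)\beta'Z_i\,dt\right].$$

Then it can be checked that

$$U^*(\beta_0) = U_1^*(\beta_0) + U_2^*(\beta_0), \tag{A.11}$$

where

$$U_1^*(\beta_0) = \sum_{i=1}^n \int_0^\tau \frac{\xi_i}{\rho(W_i)}\{Z_i - \bar Z(t)\}\,dM_i(t),$$

and

$$U_2^*(\beta_0) = \sum_{i=1}^n \int_0^\tau \xi_i\left(\frac{1}{\hat\rho(W_i)} - \frac{1}{\rho(W_i)}\right)\{Z_i - \bar Z(t)\}\,dM_i(t).$$

Similarly to (A.2), we get

$$U_1^*(\beta_0) = \sum_{i=1}^n \int_0^\tau \frac{\xi_i}{\rho(W_i)}\{Z_i - \bar z(t)\}\,dM_i(t) + o_p(n^{1/2}). \tag{A.12}$$

From an argument similar to that in the proof of (A.7), we have

$$U_2^*(\beta_0) = -\sum_{i=1}^n \int_0^\tau \frac{\xi_i - \rho(W_i)}{\rho(W_i)}\{Z_i - \bar z(t)\}\,dM_i(t) + \sum_{i=1}^n \left(\frac{\xi_i}{\rho(W_i)} - 1\right)(\delta_i - \pi(W_i))\,B_i + o_p(n^{1/2}). \tag{A.13}$$

It follows from (A.11)–(A.13) that

$$U^*(\beta_0) = \sum_{i=1}^n \int_0^\tau \{Z_i - \bar z(t)\}\,dM_i(t) + \sum_{i=1}^n \left(\frac{\xi_i}{\rho(W_i)} - 1\right)(\delta_i - \pi(W_i))\,B_i + o_p(n^{1/2}),$$

which implies that $n^{-1/2}U^*(\beta_0)$ converges in distribution to a normal random variable with mean zero and variance matrix $\Sigma + \Sigma^*$. Then it follows from the Taylor expansion of $U^*(\hat\beta)$ that $n^{1/2}(\hat\beta - \beta_0)$ is asymptotically normal with mean zero and covariance matrix $V = A^{-1}(\Sigma + \Sigma^*)A^{-1}$.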

Footnotes

MSC 2000: Primary 62N01; secondary 62G05.

Contributor Information

Xinyuan SONG, Email: xysong@sta.cuhk.edu.hk, Department of Statistics, Shatin, N. T., Hong Kong, P. R. China.

Liuquan SUN, Email: slq@amt.ac.cn, Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P.R. China.

Xiaoyun MU, Email: muxy@amss.ac.cn, Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P.R. China.

Gregg E. DINSE, Email: dinse@niehs.nih.gov, Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA

References

  1. Aalen OO. A model for nonparametric regression analysis of counting processes. In: Klonecki W, Kozek A, Rosinski J, editors. Mathematical Statistics and Probability Theory, Lecture Notes in Statistics. 2. Springer-Verlag; New York: 1980. pp. 1–25.
  2. Dinse GE. Nonparametric estimation for partially-complete time and type of failure data. Biometrics. 1982;38:417–431.
  3. Fan J, Gijbels I. Local Polynomial Modelling and Its Applications. Chapman & Hall; London: 1996.
  4. Gao G, Tsiatis AA. Semiparametric estimators for the regression coefficients in the linear transformation competing risks model with missing cause of failure. Biometrika. 2005;92:875–891.
  5. Gasser T, Müller HG, Mammitzsch V. Kernels for nonparametric curve estimation. Journal of the Royal Statistical Society Series B. 1985;47:238–252.
  6. Gijbels I, Lin D, Ying Z. Tech Report 039–93. Mathematical Sciences Research Institute; Berkeley: 1993. Non- and semi-parametric analysis of failure time data with missing failure indicators.
  7. Gijbels I, Lin D, Ying Z. Non- and semi-parametric analysis of failure time data with missing failure indicators. IMS Lecture Notes-Monograph Series. Inverse Problems: Tomography, Networks and Beyond. 2007;54:203–223.
  8. Goetghebeur EJ, Ryan L. Analysis of competing risks survival data when some failure types are missing. Biometrika. 1995;82:821–833.
  9. Hall P, Marron JS. Choice of kernel order in density estimation. The Annals of Statistics. 1988;16:161–173.
  10. van der Laan MJ, McKeague IW. Efficient estimation from right-censored data when failure indicators are missing at random. The Annals of Statistics. 1998;26:164–182.
  11. Liang H, Härdle W, Carroll RJ. Estimation in a semiparametric partially linear errors-in-variables model. The Annals of Statistics. 1999;27:1519–1535.
  12. Lin DY, Ying Z. Semiparametric analysis of the additive risk model. Biometrika. 1994;81:61–71.
  13. Little RJA, Rubin DB. Statistical analysis with missing data. Wiley; New York: 1987.
  14. Lo S-H. Estimating a survival function with incomplete cause-of-death data. Journal of Multivariate Analysis. 1991;39:217–235.
  15. Lu K, Tsiatis AA. Multiple imputation methods for estimating regression coefficients in the competing risks model with missing cause of failure. Biometrics. 2001;57:1191–1197. doi: 10.1111/j.0006-341x.2001.01191.x.
  16. Lu W, Liang Y. Analysis of competing risks data with missing cause of failure under additive hazards model. Statistica Sinica. 2008;18:219–234.
  17. McKeague IW, Sasieni PD. A partly parametric additive risk model. Biometrika. 1994;81:501–514.
  18. McKeague IW, Subramanian S. Product-limit estimators and Cox regression with missing censoring information. Scandinavian Journal of Statistics. 1998;25:589–601.
  19. Müller HG. Smooth optimum kernel estimators of densities, regression curves and modes. The Annals of Statistics. 1984;12:766–774.
  20. Pollard D. Empirical Processes: Theory and Applications. Institute of Mathematical Statistics; Hayward, California: 1990.
  21. Qi L, Wang CY, Prentice RL. Weighted estimators for proportional hazards regression with missing covariates. Journal of the American Statistical Association. 2005;100:1250–1263.
  22. Subramanian S. Efficient estimation of regression coefficients and baseline hazard under proportionality of conditional hazards. Journal of Statistical Planning and Inference. 2000;84:81–94.
  23. Subramanian S. Asymptotically efficient estimation of a survival function in the missing censoring indicator model. Journal of Nonparametric Statistics. 2004;16:797–817.
  24. Subramanian S. Survival analysis for the missing censoring indicator model using kernel density estimation techniques. Statistical Methodology. 2006;3:125–136. doi: 10.1016/j.stamet.2005.09.014.
  25. van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. Springer-Verlag; New York: 1996.
  26. Wand MP, Schucany WR. Gaussian-based kernels. The Canadian Journal of Statistics. 1990;18:197–204.
  27. Wang CY, Chen HY. Augmented inverse probability weighted estimator for Cox missing covariate regression. Biometrics. 2001;57:414–419. doi: 10.1111/j.0006-341x.2001.00414.x.
  28. Wang Q, Ng KW. Asymptotically efficient product-limit estimators with censoring indicators missing at random. Statistica Sinica. 2008;18:749–768.
  29. Zhou X, Sun L. Additive hazards regression with missing censoring information. Statistica Sinica. 2003;13:1237–1257.
