Cox Regression with Survival-Time-Dependent Missing Covariate Values

Yanyao Yi; Ting Ye; Menggang Yu; Jun Shao

doi:10.1111/biom.13155

Author manuscript; available in PMC: 2020 Jun 14.

Published in final edited form as: Biometrics. 2019 Nov 18;76(2):460–471. doi: 10.1111/biom.13155


PMCID: PMC7145010 NIHMSID: NIHMS1051869 PMID: 31549744

Summary:

Analyses of time-to-event data in clinical and epidemiological studies often encounter missing covariate values. The missing at random assumption is commonly adopted (e.g., Qi et al., 2005); it assumes that missingness depends on the observed data, including the observed outcome, which is the minimum of the survival and censoring times. However, it is conceivable that in certain settings, missingness of covariate values is related to the survival time but not to the censoring time (Rathouz, 2007). This is especially so when covariate missingness is related to an unmeasured variable affected by the patient’s illness and prognosis factors at baseline. If this is the case, then the covariate missingness is not at random because the survival time is censored, which creates a challenge in data analysis. In this article, we propose an approach to deal with such survival-time-dependent covariate missingness based on the well-known Cox proportional hazard model. Our method is based on inverse propensity weighting with the propensity estimated by nonparametric kernel regression. Our estimators are consistent and asymptotically normal, and their finite-sample performance is examined through simulation. An application to a real-data example is included for illustration.

Keywords: Censoring, Missing not at random, Nonparametric kernel estimator, Propensity

1. Introduction

Cox regression is one of the most popular methods dealing with censored failure time in survival analysis. For a continuous failure time T and covariate vector V measured at baseline, in this paper we consider the following Cox proportional hazard model,

λ (t | V) = λ_{0} (t) \exp (θ^{⊤} V),

(1)

where λ(t|V ) is the hazard at time t given V , λ₀(t) is an unspecified baseline hazard function common for all subjects, θ is a vector of unknown parameters, and θ ^⊤ is its transpose. In many survival studies, there exists censoring. In this paper, we focus on right censoring, i.e., there is a censoring time C and what we observe is (T ∧ C, δ), where T ∧ C = min(T, C) and $δ = I_{{T ⩽ C}}$ is the indicator of event T ⩽ C. A common assumption on censoring is

T ⊥ C | V,

(2)

i.e., T and C are conditionally independent given V . Based on a random sample from the distribution of (T ∧ C, δ,V ), θ can be estimated by maximizing the partial likelihood derived in Cox (1975) under (1) and (2), and the asymptotic properties of this estimator can be found in Andersen and Gill (1982).

In clinical and epidemiological studies, some components of the covariate vector V may have missing data, in which case the partial likelihood cannot be directly applied. Let V = (X,Z) with X being the sub-vector that may have missing values and Z being the sub-vector that is always observed, and let R be the indicator equaling 1 if X is completely observed and 0 if at least one component of X is missing. As pointed out in Paik and Tsai (1997), Lipsitz and Ibrahim (1998), and Rathouz (2007), the complete-case analysis with the partial likelihood based only on subjects with R = 1 is valid if R ⊥ (T, C) | V , i.e., missingness depends only on V , not on the outcome (T,C). This holds even though missingness may not be at random, and methods more efficient than complete-case analysis can be derived, e.g., Lin and Ying (1993) and Cook et al. (2011).

However, missingness of covariate values is often believed to be outcome related, either directly or indirectly. In survival studies, some researchers assume that missingness is T ∧ C related, i.e.,

R ⊥ (X, T, C) | (T \land C, Z, δ)

(3)

(e.g., Lipsitz and Ibrahim, 1998; Chen and Little, 1999; Herring and Ibrahim, 2001; Chen, 2002), which is a type of missing at random assumption (Rubin, 1976) because T ∧ C, Z and δ are all observed. Even if R is related to (T,C), however, it is hard to imagine why R would be related to T ∧ C, a very special function of the outcome (T,C). We speculate that (3) is assumed for ease of analysis, since one can then simply use methods valid under missing at random, e.g., inverse propensity weighting with pr(R = 1 | T ∧ C,Z, δ) estimated from the always observed (T ∧ C,Z, δ) (Wang and Chen, 2001; Qi et al., 2005; Xu et al., 2009).

Rathouz (2007) argues that the following survival-time-dependent missingness mechanism is more reasonable than assumption (3) in many biomedical studies,

R ⊥ (X, C) | (T, Z) .

(4)

One scenario in which (4) holds is when there exists an unmeasured variable U prior to the measurement of X that satisfies

R ⊥ (T, C, X) | (U, Z) and U ⊥ (C, X) | (T, Z) .

(5)

The first condition in (5) means that missingness of X is driven by U together with observed Z, whereas the second condition in (5) says that U is not related with (C,X) when (T,Z) is given. The directed acyclic graph in Figure 1 provides an illustration. It is also shown in the Supporting Information that (5) implies (4) and

pr (R = 1 | T, Z) = E {pr (R = 1 | U, Z) | T, C, V} .

(6)

Although pr(R = 1 | T,Z) may be hard to interpret since T is a future observation while R is observed at baseline, it is a constructed missingness propensity for analysis in which T is used as a surrogate in (6) for the unmeasured variable U. An example of U could be a subjective assessment, as the following example indicates.

Figure 1. Directed acyclic graph for a scenario where (5) holds

Example 1: In our real data example analyzed in Section 4 based on the national cancer database, all patients were diagnosed with stage III Non-Small Cell Lung Cancer at baseline, but only about 40% of patients had the more accurate tumor stage X recorded as either stage IIIA or stage IIIB. From the lung cancer staging system provided by the American Joint Committee on Cancer, the main difference between stage IIIA and stage IIIB is the nodal involvement, i.e., stage IIIB has much more extensive metastasis in regional lymph nodes regardless of tumor size. Measuring X is difficult and may require invasive techniques. Recent advanced non-invasive methods such as positron emission tomography or computed tomography do not provide definitive stage confirmation, since the region around the lung to be examined for lymph node involvement includes the superior mediastinum, the lower mediastinum, the aortopulmonary window, and para-aorta (Teran and Brock, 2014). Thus, a major reason for missing X in this example is the physician’s assessment of whether an invasive test for a difficult measurement of X is worthwhile. This somewhat subjective assessment could be the variable U in (5) and Figure 1, which is affected by the patient’s prognosis factors in Z and illness related to the eventual survival time T, but is unlikely to be directly related to the censoring time C. Thus, assumption (4) is more reasonable than assumption (3) for this real data example.

Since T may be censored, the missingness mechanism (4) is not missing at random. If (4) holds instead of (3), then estimators in Wang and Chen (2001), Qi et al. (2005) and Xu et al. (2009) will yield a biased result. Also, if (4) holds and T cannot be excluded from the missingness propensity, then the complete-case analysis is inconsistent.

Although the survival function is identifiable under (4) and censoring assumption (2) (Rathouz, 2007), no method for estimating θ has been proposed in Rathouz (2007) or since. The main challenge is that pr(R = 1 | T,Z) cannot be directly estimated when T is censored.

In this article, we construct two inverse probability weighting estimators of θ in the Cox proportional hazard model (1) under assumptions (2) and (4). The first one is based on a weighted score function using only subjects with observed survival time and completely observed covariates. The first estimator may not be efficient since only data with (δ,R) = (1, 1) are involved in the score function, but this estimator is used as an initial estimator in the construction of a more efficient second estimator based on a weighted score function using all data with R = 1, censored or not. The major step in obtaining the second estimator is to overcome the difficulty in constructing suitable weights when T is censored with the help from the first estimator. In the construction of weights, we adopt nonparametric estimation of propensity, since a parametric form of propensity is hard to specify especially when the propensity is obtained through (6) by averaging over an unobserved variable U satisfying (5). The product kernel in Racine and Li (2004) is applied to handle both continuous and categorical covariates.

Both proposed estimators of θ are shown to be consistent and asymptotically normal under (1), (2), (4), and some regularity conditions, and their performance is examined in Section 3 by simulation. In Section 4, we return to the real data example of Non-Small Cell Lung Cancer to illustrate our procedures. All technical details are given in the Supporting Information.

2. Method

Under (2) and (4), we consider estimating θ in (1) based on a random sample (T_i ∧ C_i, δ_i,V_i,R_i), i = 1, …, n, from the population of (T ∧C, δ,V,R). For each i, Z_i is observed but X_i is completely observed if and only if R_i = 1. To illustrate the idea, in this section we do not use partially observed X when X is multivariate. An extension of using partially observed X is given in Section 5.

2.1. Doubly Weighted Estimator

We first derive an estimator of θ that is consistent and asymptotically normal under some conditions. It serves as an initial estimator in constructing a more efficient estimator in Section 2.2. Under (4), pr(R = 1|T,C,V) is a function of (T,Z) only. Define

π_{1} (T, Z) = pr (R = 1 | T, C, V) and ψ (T, V) = pr (δ = 1 | T, V) .

(7)

Assume first that π₁ and ψ in (7) are known functions. Based on the fact that, without censoring,

M^{*} (t) = I_{{T ⩽ t}} - \int_{0}^{t} I_{{T ⩾ u}} λ_{0} (u) \exp (θ^{⊤} V) d u

(8)

is a mean zero martingale with respect to filtration $F_{t}^{*} = {I_{{T ⩾ u}}, V : 0 ⩽ u ⩽ t}$ (Fleming and Harrington, 1991), our first proposed estimator ${\hat{θ}}_{1}$ of θ is obtained by using data from subjects with observed X (R = 1) and non-censored T (δ = 1), and solving the following weighted estimation equation:

\sum_{i = 1}^{n} \int_{0}^{\infty} \frac{R_{i} δ_{i}}{π_{1} (T_{i}, Z_{i}) ψ (T_{i}, V_{i})} {V_{i} - \frac{S_{1}^{(1)} (θ, t, π_{1}, ψ)}{S_{1}^{(0)} (θ, t, π_{1}, ψ)}} d N_{i} (t) = 0,

(9)

where $N_{i} (t) = δ_{i} I_{{T_{i} ⩽ t}}$ is the counting process,

S_{1}^{(1)} (θ, t, π_{1}, ψ) = \frac{1}{n} \sum_{i = 1}^{n} \frac{R_{i} δ_{i}}{π_{1} (T_{i}, Z_{i}) ψ (T_{i}, V_{i})} I_{{T_{i} ⩾ t}} V_{i} \exp (θ^{⊤} V_{i}),

S_{1}^{(0)} (θ, t, π_{1}, ψ) = \frac{1}{n} \sum_{i = 1}^{n} \frac{R_{i} δ_{i}}{π_{1} (T_{i}, Z_{i}) ψ (T_{i}, V_{i})} I_{{T_{i} ⩾ t}} \exp (θ^{⊤} V_{i}) .

Two inverse probability weights are applied in (9), i.e., π₁(T,Z) for missing X value, which is commonly used in missing data problems (Robins, 1997), and ψ(T,V ) for censoring (Robins and Finkelstein, 2000).

By differentiating the left-hand side of (9) with respect to θ and applying the Cauchy-Schwarz inequality, we can show that the left-hand side of (9) is the derivative of a convex function of θ and, hence, (9) can be easily solved. Once θ is estimated by ${\hat{θ}}_{1}$ , the following Breslow type semi-parametric estimator of the cumulative baseline hazard $Λ_{0} (t) = \int_{0}^{t} λ_{0} (u) d u$ can be obtained:

{\hat{Λ}}_{01} (t) = \int_{0}^{t} \frac{1}{n S_{1}^{(0)} ({\hat{θ}}_{1}, u, π_{1}, ψ)} \sum_{i = 1}^{n} \frac{R_{i} δ_{i} d N_{i} (u)}{π_{1} (T_{i}, Z_{i}) ψ (T_{i}, V_{i})}

(10)

(Fleming and Harrington, 1991).
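The following is a minimal sketch, not the authors' implementation, of how a weighted score equation of the form (9) and the Breslow-type estimator (10) could be computed once the weights are available. It uses Newton's method in NumPy in place of a general root solver. The function names `weighted_cox_newton` and `breslow_cumhaz` are hypothetical, and the weight vector `w` (for example $R_{i} δ_{i} / \{π_{1} (T_{i}, Z_{i}) ψ (T_{i}, V_{i})\}$) is assumed to be supplied by the user.

```python
import numpy as np

def weighted_cox_newton(time, event, V, w, n_iter=25, tol=1e-8):
    """Solve a weighted Cox score equation by Newton's method (illustrative).

    time  : (n,) observed times
    event : (n,) 1 if the event is observed, 0 otherwise
    V     : (n, p) covariates
    w     : (n,) nonnegative weights, assumed given
    """
    time = np.asarray(time, float)
    event = np.asarray(event, float)
    V = np.asarray(V, float)
    w = np.asarray(w, float)
    jump = w * event                      # weight attached to each dN_i
    idx = np.flatnonzero(jump > 0)        # only these subjects enter the score
    # at_risk[j, i] = 1 if subject i is at risk at the event time of idx[j]
    at_risk = (time[None, :] >= time[idx, None]).astype(float)
    theta = np.zeros(V.shape[1])
    for _ in range(n_iter):
        r = w * np.exp(V @ theta)                    # weighted risk scores
        s0 = at_risk @ r                             # n * S^(0) at each event time
        s1 = at_risk @ (r[:, None] * V)              # n * S^(1)
        s2 = np.einsum("ji,ik,il->jkl", at_risk, r[:, None] * V, V)
        vbar = s1 / s0[:, None]
        score = (jump[idx][:, None] * (V[idx] - vbar)).sum(axis=0)
        cov = s2 / s0[:, None, None] - vbar[:, :, None] * vbar[:, None, :]
        info = np.einsum("j,jkl->kl", jump[idx], cov)
        step = np.linalg.solve(info, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

def breslow_cumhaz(time, event, V, w, theta):
    """Weighted Breslow-type cumulative baseline hazard at the event times."""
    time = np.asarray(time, float)
    V = np.asarray(V, float)
    w = np.asarray(w, float)
    jump = w * np.asarray(event, float)
    idx = np.flatnonzero(jump > 0)
    order = idx[np.argsort(time[idx])]
    r = w * np.exp(V @ theta)
    s0 = np.array([r[time >= time[j]].sum() for j in order])
    return time[order], np.cumsum(jump[order] / s0)
```

With all weights equal to 1 and no censoring, this reduces to the ordinary partial likelihood fit, which is a convenient way to check the code on simulated data.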

Since π₁ and ψ in (7) are usually unknown, in the rest of this subsection we consider their estimation. Since π₁(T,Z) depends on (T,Z), not (C,X), the subset of non-censored data is used in estimating π₁; this makes use of data from all subjects with δ = 1, even if X is missing. A parametric form of pr(R = 1 | T,Z) is hard to specify, especially when it is derived from pr(R = 1 | T,Z) = E{pr(R = 1 | U,Z) | T,Z} with an unobserved U and nonparametric p(U | T,Z). To obtain robust estimators, here we consider the nonparametric product kernel regression (Racine and Li, 2004) estimator of π₁(T,Z), which is given by

{\hat{π}}_{1} (T, Z) = \frac{\sum_{i} R_{i} δ_{i} K_{h} ((T_{i}, Z_{i}), (T, Z))}{\sum_{i} δ_{i} K_{h} ((T_{i}, Z_{i}), (T, Z))},

(11)

where K_h is the product of a kernel for T and continuous components of Z and a kernel for discrete components of Z (Racine and Li, 2004), h = (h_c, h_d), and h_c and h_d are bandwidths for continuous and discrete kernels, respectively, selected by cross validation according to Racine and Li (2004).
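As an illustration of (11), the sketch below evaluates the kernel estimate at a single point for one continuous variable T and one unordered categorical Z. It is a simplified stand-in rather than the authors' procedure: the function name `pi1_hat` is hypothetical, the bandwidths `h_c` and `lam` are fixed by hand instead of being chosen by cross validation, and the categorical kernel is taken to be 1 when categories match and `lam` otherwise, which is our reading of the unordered kernel in Racine and Li (2004).

```python
import numpy as np

def pi1_hat(t0, z0, T, Z, R, delta, h_c=0.3, lam=0.1):
    """Kernel estimate of pr(R = 1 | T = t0, Z = z0) in the form of (11).

    T, Z, R, delta : (n,) arrays; T is only meaningful where delta == 1
    h_c            : bandwidth for the continuous kernel (assumed, not cross-validated)
    lam            : smoothing parameter in [0, 1] for the categorical kernel (assumed)
    """
    T, Z, R, delta = (np.asarray(a) for a in (T, Z, R, delta))
    k_cont = np.exp(-0.5 * ((T - t0) / h_c) ** 2)   # Gaussian kernel; constants cancel in the ratio
    k_disc = np.where(Z == z0, 1.0, lam)            # 1 if categories match, lam otherwise
    k = delta * k_cont * k_disc                     # only non-censored subjects contribute
    denom = np.sum(k)
    if denom == 0:
        return float("nan")                         # no non-censored subject near (t0, z0)
    return float(np.sum(R * k) / denom)
```

Setting `lam = 0` estimates the propensity separately within each category of Z, while `lam = 1` ignores Z altogether.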

For the censoring propensity ψ(T,V ), it follows from assumption (2) that

ψ (T, V) = pr (δ = 1 | T, V) = pr (C > T | V) = S_{C | V} (T),

where $S_{C | V} (t)$ is the conditional “survival” function of the censoring time C given V . Note that

pr (C ⩽ c, R = r, T ⩽ t | V) = E {pr (C ⩽ c, R = r, T ⩽ t | T, V) | V} = E {pr (C ⩽ c | T, V) pr (R = r, T ⩽ t | T, V) | V} = E {pr (C ⩽ c | V) pr (R = r, T ⩽ t | T, V) | V} = pr (C ⩽ c | V) pr (R = r, T ⩽ t | V),

where the second equation is based on the fact that assumption (4) and Proposition 1.11 in Shao (2003) imply that $R ⊥ C | (T, X, Z)$ , and the third equation follows from assumption (2). Consequently, $C ⊥ (R, T) | V$ . Thus, the conditional survival function $S_{C | V} (t)$ can be estimated by using the subset of data with R = 1. Treating C ∧ T as the observed time and 1 − δ as the “event status” for censoring time C, we estimate $S_{C | V} (t)$ using a Cox proportional hazard model similar to Model (1): $λ_{C} (t | V) = λ_{0 C} (t) \exp (β^{⊤} V)$ . Then β is estimated by $\hat{β}$ which is the solution to

\sum_{i = 1}^{n} R_{i} \int_{0}^{\infty} {V_{i} - \frac{S_{δ}^{(1)} (β, t)}{S_{δ}^{(0)} (β, t)}} d N_{i}^{δ} (t) = 0,

with $N_{i}^{δ} (t) = (1 - δ_{i}) I_{{C_{i} ⩽ t}}$ , $S_{δ}^{(0)} (β, t) = n^{- 1} \sum_{i = 1}^{n} R_{i} Y_{i} (t) \exp (β^{⊤} V_{i})$ and $S_{δ}^{(1)} (β, t) = n^{- 1} \sum_{i = 1}^{n} R_{i} Y_{i} (t) V_{i} \exp (β^{⊤} V_{i})$ , where $Y_{i} (t) = I_{{T_{i} \land C_{i} ⩾ t}}$ is the at-risk process of subject i. The estimated cumulative baseline hazard is

{\hat{Λ}}_{0 C} (t) = \int_{0}^{t} \frac{1}{n S_{δ}^{(0)} (\hat{β}, u)} \sum_{i = 1}^{n} R_{i} d N_{i}^{δ} (u) .

Consequently, the estimator of ψ(T,V ) has the form

\hat{ψ} (T, V) = {\hat{S}}_{C | V} (T) = \exp {- \exp ({\hat{β}}^{⊤} V) {\hat{Λ}}_{0 C} (T)} .
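A small sketch of this plug-in step is given below. It assumes that $\hat{β}$ has already been obtained from a Cox fit for the censoring time on the subjects with R = 1, and it only computes the Breslow-type ${\hat{Λ}}_{0 C}$ and the resulting $\hat{ψ}$ at one point. The function name `psi_hat` and its argument layout are hypothetical.

```python
import numpy as np

def psi_hat(t_eval, v_eval, obs_time, delta, V, R, beta):
    """psi_hat(t, v) = exp{-exp(beta'v) * Lambda0C_hat(t)} (illustrative).

    obs_time : (n,) observed times min(T, C)
    delta    : (n,) 1 if T <= C, so 1 - delta marks an observed censoring time
    V        : (n, p) covariates; rows with R == 0 may contain NaN and are dropped
    beta     : (p,) coefficient of the censoring model, assumed already estimated
    """
    obs_time = np.asarray(obs_time, float)
    delta = np.asarray(delta, float)
    keep = np.asarray(R) == 1
    time = obs_time[keep]
    cens = 1.0 - delta[keep]
    risk = np.exp(np.asarray(V, float)[keep] @ np.asarray(beta, float))
    cum = 0.0
    for j in np.flatnonzero((cens == 1) & (time <= t_eval)):
        cum += 1.0 / risk[time >= time[j]].sum()   # jump of Lambda0C_hat at time[j]
    return float(np.exp(-np.exp(np.dot(beta, v_eval)) * cum))
```

With `beta` equal to zero the function returns the exponential of minus the Nelson-Aalen estimate for the censoring time, which gives a simple check on toy data.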

Estimators ${\hat{θ}}_{1}$ of θ and ${\hat{Λ}}_{01}$ of Λ₀ can be obtained by using (9) and (10) with π₁ and ψ replaced by their estimators ${\hat{π}}_{1}$ and $\hat{ψ}$ , respectively. We call ${\hat{θ}}_{1}$ a doubly weighted estimator (DWE) as two weight functions ${\hat{π}}_{1}$ and $\hat{ψ}$ are used.

The following result establishes the consistency and asymptotic normality of ${\hat{θ}}_{1}$ . The proof is given in the Supporting Information.

Theorem 1: Assume the following regularity conditions.

Condition 1. V is time-independent and bounded.

Condition 2. $Ω^{*} = \int_{0}^{\infty} [s_{1}^{(2)} (θ, t) - s_{1}^{(1)} (θ, t) {s_{1}^{(1)} (θ, t)}^{⊤} {s_{1}^{(0)} (θ, t)}^{- 1}] λ_{0} (t) d t$ is a positive definite matrix, where $s_{1}^{(0)} (θ, t) = E [I_{{T ⩾ t}} \exp (θ^{⊤} V)]$ , $s_{1}^{(1)} (θ, t) = E [I_{{T ⩾ t}} V \exp (θ^{⊤} V)]$ , and $s_{1}^{(2)} (θ, t) = E [I_{{T ⩾ t}} V V^{⊤} \exp (θ^{⊤} V)]$ .

Condition 3. Let L denote the kernel function for continuous components in K_h and define $L_{h_{c}} (\cdot) = L (\cdot / h_{c})$ . L is bounded, symmetric, and Lipschitz continuous with $\int L (u) d u = 1$ , $\int u^{m} L (u) d u = 0$ for $m = 1, \dots, (r - 1)$ , and $\int u^{r} L (u) d u \neq 0$ .

Condition 4. $n h_{c}^{2 r} \to 0$ and $n h_{c}^{2 d} \to \infty$ as $n \to \infty$ , where d denotes the dimension of continuous components of (T,Z).

Condition 5. π₁(T,Z) has r continuous and bounded partial derivatives with respect to the continuous components of (T,Z) and $π_{1} (T, Z) ⩾ ϵ_{0}$ for any T and V , where ϵ₀ > 0 is a constant.

Then, as n → ∞,

\sqrt{n} ({\hat{θ}}_{1} - θ) \overset{d}{\to} N (0, Σ_{1}),

where $\overset{d}{\to}$ denotes convergence in distribution,

Σ_{1} = Ω^{* - 1} var [\frac{R δ}{π_{1} (T, Z) ψ (T, V)} M_{V}^{*} + {1 - \frac{R}{π_{1} (T, Z)}} E (M_{V}^{*} | T, Z) + Γ (R M_{V}^{δ})] Ω^{* - 1},

(12)

Ω^∗ is defined in Condition 2, $M_{V}^{*} = \int_{0}^{\infty} [V - s_{1}^{(1)} (θ, t) {s_{1}^{(0)} (θ, t)}^{- 1}] d M^{*} (t)$ and $M_{V}^{δ} = \int_{0}^{\infty} [V - s_{δ}^{(1)} (β, t) {s_{δ}^{(0)} (β, t)}^{- 1}] d M^{δ} (t)$ are both mean zero martingale transformations, $M^{δ} (t) = N^{δ} (t) - \int_{0}^{t} Y (u) λ_{0 C} (u) \exp (β^{⊤} V) d u$ , $s_{δ}^{(0)} (β, t) = E {R Y (t) \exp (β^{⊤} V)}$ , $s_{δ}^{(1)} (β, t) = E {R Y (t) V \exp (β^{⊤} V)}$ and $Γ = E {Λ_{0 C} (T) \exp (β^{⊤} V) M_{V}^{*} V^{⊤}} {var (R M_{V}^{δ})}^{- 1}$ .

The quantity $Γ (R M_{V}^{δ})$ in (12) is due to the estimation of ψ defined in (7). Since the three quantities inside of the variance given in (12) can be correlated, the form of Σ₁ is complicated. Based on Theorem 1, Σ₁ can be estimated by the bootstrap, which is used in our simulation studies.

2.2. Compositely Weighted Estimator

In the construction of ${\hat{θ}}_{1}$ , although we utilize some incomplete data in the estimator of π₁ given in (11), the weighted score function (9) is derived from the martingale M^∗(t) in (8) that only involves non-censored subjects. Consequently, ${\hat{θ}}_{1}$ may be inefficient and, in particular, ${\hat{θ}}_{1}$ does not reduce to the usual maximum partial likelihood estimator of θ when there is no missing value.

The partial likelihood for the case of no missing data is based on the martingale

M (t) = N (t) - \int_{0}^{t} Y (u) λ_{0} (u) \exp (θ^{⊤} V) d u

(13)

with respect to the filtration $F_{t} = {I_{{T < u, δ = 1}}, I_{{T \land C ⩾ u}}, V : 0 ⩽ u ⩽ t}$ (Fleming and Harrington, 1991), where $Y (t) = I_{{T \land C ⩾ t}}$ is the at-risk process. Unlike the martingale M^∗(t) in (8), M(t) in (13) involves both non-censored and censored subjects. If we can derive a weighted score function for θ from M(t) in the presence of missing covariate values, then the estimation may be more efficient than the one based on M^∗(t). Specifically, we consider the following weighted estimation equation,

\sum_{i = 1}^{n} \int_{0}^{\infty} \frac{R_{i}}{π_{1} (T_{i}, Z_{i})} {V_{i} - \frac{S^{(1)} (θ, t, π_{1}, π_{0})}{S^{(0)} (θ, t, π_{1}, π_{0})}} d N_{i} (t) = 0

(14)

where

S^{(1)} (θ, t, π_{1}, π_{0}) = \frac{1}{n} \sum_{i = 1}^{n} {\frac{R_{i} δ_{i} I_{{T_{i} ⩾ t}}}{π_{1} (T_{i}, Z_{i})} + \frac{R_{i} (1 - δ_{i}) I_{{C_{i} ⩾ t}}}{π_{0} (C_{i}, V_{i})}} V_{i} \exp (θ^{⊤} V_{i})

S^{(0)} (θ, t, π_{1}, π_{0}) = \frac{1}{n} \sum_{i = 1}^{n} {\frac{R_{i} δ_{i} I_{{T_{i} ⩾ t}}}{π_{1} (T_{i}, Z_{i})} + \frac{R_{i} (1 - δ_{i}) I_{{C_{i} ⩾ t}}}{π_{0} (C_{i}, V_{i})}} \exp (θ^{⊤} V_{i}),

π₁(T,Z) is given in (7), and π₀(C,V ) = pr(R = 1|C,V , δ = 0) is the inverse weight for R when δ = 0. Our proposed second estimator ${\hat{θ}}_{2}$ of θ is a solution to (14) after we substitute π₁ and π₀ by their estimators ${\hat{π}}_{1}$ and ${\hat{π}}_{0}$ , respectively. An estimator of Λ₀ (t) is then

{\hat{Λ}}_{02} (t) = \int_{0}^{t} \frac{1}{n S^{(0)} ({\hat{θ}}_{2}, u, {\hat{π}}_{1}, {\hat{π}}_{0})} \sum_{i = 1}^{n} \frac{R_{i} d N_{i} (u)}{{\hat{π}}_{1} (T_{i}, Z_{i})} .

The same estimator ${\hat{π}}_{1}$ defined in (11) can be used to estimate π₁. It remains to derive an estimator of π₀(C,V ). Under assumption (4), missingness of X values depends on T which is not observed when δ = 0. Thus, π₀(C,V ) = pr(R = 1|C,V , δ = 0) is a function of the entire V containing possibly missing values of X and its estimation is challenging. Note that

π_{0} (C, V) = pr (R = 1 | C, V, δ = 0) = \frac{pr (R = 1, δ = 0 | C, V)}{pr (δ = 0 | C, V)} .

(15)

Under assumption (2),

pr (δ = 0 | C, V) = pr (T > C | C, V) = S_{T | V} (C),

(16)

where $S_{T | V} (\cdot)$ denotes the survival function of T given V . Moreover, assumptions (2) and (4) imply $R ⊥ C | (T, V)$ and, therefore,

pr (R = 1, δ = 0 | C, V) = - \int_{C}^{\infty} π_{1} (t, Z) d S_{T | V} (t) .

(17)
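The steps behind (17) can be spelled out as follows. This is our own expansion of the argument, with $F_{T | V} = 1 - S_{T | V}$ introduced here only as shorthand for the conditional distribution function of T given V. The first equality conditions on T and uses δ = 0 if and only if T > C, the second uses the definition of π₁ in (7), and the third uses assumption (2), under which the conditional distribution of T given (C,V) is the same as that given V.

```latex
\begin{aligned}
\mathrm{pr}(R = 1, \delta = 0 \mid C, V)
  &= E\{ I(T > C)\, \mathrm{pr}(R = 1 \mid T, C, V) \mid C, V \} \\
  &= E\{ I(T > C)\, \pi_{1}(T, Z) \mid C, V \} \\
  &= \int_{C}^{\infty} \pi_{1}(t, Z)\, d F_{T \mid V}(t) \\
  &= - \int_{C}^{\infty} \pi_{1}(t, Z)\, d S_{T \mid V}(t) .
\end{aligned}
```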

Hence, the survival function $S_{T | V} (\cdot)$ is the only unknown part we need to estimate in order to estimate π₀(C,V ). One way to do this is using the estimators ${\hat{θ}}_{1}$ and ${\hat{Λ}}_{01} (t)$ in Section 2.1 to construct an initial estimator of the survival function,

{\hat{S}}_{T | V} (t) = \exp {- \exp ({\hat{θ}}_{1}^{⊤} V) {\hat{Λ}}_{01} (t)} .

(18)

Then, it follows from (15)-(18) that we can estimate π₀(C,V ) by

{\hat{π}}_{0} (C, V) = - \frac{\int_{C}^{\infty} {\hat{π}}_{1} (t, Z) d {\hat{S}}_{T | V} (t)}{{\hat{S}}_{T | V} (C)} .

(19)

In any application with a finite n, we set ${\hat{S}}_{T | V} (t) = 0$ when $t ⩾ \max {T_{i} \land C_{i} : i = 1, \dots, n}$ so that the integral in (19) is finite. This does not affect the asymptotic result, since ${\hat{S}}_{T | V} (t)$ converges to $S_{T | V} (t)$ almost surely. If there is no missing value, R_i = 1 for all i, then it follows from (11) that ${\hat{π}}_{1} \equiv 1$ and, hence, ${\hat{π}}_{0} \equiv 1$ and ${\hat{θ}}_{2}$ reduces to the usual maximum partial likelihood estimator.
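Because ${\hat{S}}_{T | V}$ in (18) is a step function, the integral in (19) reduces to a finite sum over its jump times. The sketch below shows one way this could be evaluated for a single censored subject. The function name `pi0_hat` and its inputs are hypothetical, and the time grid is assumed to end at the largest observed time, where the survival estimate is forced to zero as described above.

```python
import numpy as np

def pi0_hat(c, jump_times, surv_at_jumps, pi1_at_jumps):
    """Evaluate (19) for one censored subject when S_hat is a step function.

    c             : the subject's censoring time C
    jump_times    : increasing times t_1 < ... < t_m at which S_hat(. | V) may jump;
                    t_m is assumed to be the largest observed time
    surv_at_jumps : S_hat(t_j | V) at each t_j (right-continuous values)
    pi1_at_jumps  : pi1_hat(t_j, Z) at each t_j
    """
    t = np.asarray(jump_times, float)
    s = np.asarray(surv_at_jumps, float).copy()
    p = np.asarray(pi1_at_jumps, float)
    s[-1] = 0.0                                  # S_hat set to 0 at the last time
    s_prev = np.concatenate(([1.0], s[:-1]))     # S_hat just before each jump
    mass = s_prev - s                            # -dS_hat(t_j)
    after = t > c
    if not after.any():
        return float("nan")                      # S_hat(c) = 0, so the ratio is undefined
    s_c = s_prev[after][0]                       # S_hat(c): value just before the next jump
    if s_c <= 0:
        return float("nan")
    return float(np.sum(p[after] * mass[after]) / s_c)
```

When `pi1_at_jumps` is identically 1 the sum telescopes to ${\hat{S}}_{T | V} (C)$ and the function returns 1, in line with the no-missing-data case noted above.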

We call ${\hat{θ}}_{2}$ a compositely weighted estimator (CWE) since the weight function π₀ involves both the weight function π₁ and the survival function of T. The following result for the consistency and asymptotic normality of ${\hat{θ}}_{2}$ is proved in the Supporting Information.

Theorem 2: Assume that Conditions 1–5 stated in Theorem 1 hold with Ω^∗ in Condition 2 replaced by $Ω = \int_{0}^{\infty} [s^{(2)} (θ, t) - s^{(1)} (θ, t) {s^{(1)} (θ, t)}^{⊤} {s^{(0)} (θ, t)}^{- 1}] λ_{0} (t) d t$ , where $s^{(0)} (θ, t) = E {Y (t) \exp (θ^{⊤} V)}$ , $s^{(1)} (θ, t) = E [{Y (t) V \exp (θ^{⊤} V)}]$ , and $s^{(2)} (θ, t) = E {Y (t) V V^{⊤} \exp (θ^{⊤} V)}$ . Besides, we assume that the support of survival time T is a subset of the support of censoring time C. Then, as n → ∞,

\sqrt{n} ({\hat{θ}}_{2} - θ) \overset{d}{\to} N (0, Σ_{2}),

where

Σ_{2} = Ω^{- 1} + Ω^{- 1} E {(π_{1}^{- 1} - 1) var (M_{V} | T, Z)} Ω^{- 1},

(20)

and $M_{V} = \int_{0}^{\infty} [V - s^{(1)} (θ, t) {s^{(0)} (θ, t)}^{- 1}] d M (t)$ is the martingale transformation with respect to mean-zero martingale M(t).

Note that $Ω^{- 1} = {var (M_{V})}^{- 1}$ is the asymptotic covariance matrix of the estimator of θ obtained under the usual Cox regression without missing data. Thus, the second term on the right hand side of (20) is the efficiency loss due to missing covariate values compared with the estimator derived from the Cox regression without missing data. This term does not depend on π₀, i.e., as long as ${\hat{π}}_{0}$ is consistent, the estimation of π₀ does not affect the efficiency of ${\hat{θ}}_{2}$ .

3. Simulation

Simulations are conducted in three different settings to examine the finite-sample performance of our proposed DWE and CWE of θ, and to compare them with the estimator computed without missing covariate values (Full) as a standard, the complete-case (CC) estimator, and the simple weighted estimator (SWE) in Qi et al. (2005). As we discussed in Section 1, CC estimator and SWE may be inconsistent under assumption (4).

3.1. Simulation Settings

The details of data generation for the three simulation settings are given in Tables 1–3, respectively. The time-independent covariate vector is V = (X,Z) with independent univariate X and Z in settings 1–2, and V = (X,Z₁,Z₂) with correlated univariate X and bivariate (Z₁,Z₂) in setting 3. The survival time T follows the Cox proportional hazard model (1) with baseline hazard λ₀(t), which is equal to 1 in settings 1–2 and t/2 in setting 3, and with true values of θ specified in Tables 1–3. The true propensity for missing X values is logistic depending only on T in setting 1, on T and Z in setting 2, and on T and Z₂ in setting 3, as described in Tables 1–3. The censoring time C follows another Cox proportional hazard model, which depends on no covariate in setting 1, on X in setting 2, and on (X,Z₁) in setting 3. The rates of censoring and missingness vary from 37% to 82% and 29% to 47%, respectively, and details are included in Tables 1–3.

Table 1.

Simulation bias, SD, SE, and CP based on 2000 runs for estimation of θ under setting 1

				Estimation of θ_x				Estimation of θ_z
n	Method	Variables used in ${\hat{π}}_{1}$		Bias	SD	SE	CP	Bias	SD	SE	CP
500	Full			0.002	0.075	0.079	0.945	0.000	0.131	0.135	0.946
	CC			0.151	0.115	0.122	0.663	0.149	0.195	0.204	0.864
	SWE	T ∧ C		0.092	0.113	0.118	0.845	0.087	0.189	0.196	0.925
	DWE	T		−0.015	0.131	0.134	0.950	−0.015	0.224	0.224	0.946
	CWE	T		0.018	0.114	0.120	0.947	0.009	0.185	0.192	0.947

	SWE	T ∧ C,Z		0.094	0.112	0.118	0.849	0.114	0.187	0.194	0.904
	DWE	T,Z		−0.013	0.131	0.134	0.953	−0.005	0.219	0.222	0.953
	CWE	T,Z		0.020	0.115	0.120	0.942	0.016	0.177	0.188	0.951
1000	Full			0.001	0.055	0.055	0.940	−0.001	0.093	0.094	0.952
	CC			0.146	0.083	0.082	0.467	0.146	0.138	0.140	0.777
	SWE	T ∧ C,Z		0.088	0.082	0.081	0.760	0.108	0.131	0.132	0.852
	DWE	T,Z		−0.014	0.092	0.091	0.946	−0.009	0.154	0.153	0.945
	CWE	T,Z		0.012	0.083	0.081	0.942	0.006	0.123	0.128	0.957

Covariate vector:		V = (X,Z), X ∼ N(0, 1), Z ∼ binary(0.5), $X ⊥ Z$
True hazard of T:		$λ (t | V) = \exp (θ_{x} X + θ_{z} Z)$ , $θ = (θ_{x}, θ_{z}) = (1, 1)$
True propensity:		$π_{1} (T, Z) = 1 - \{1 + \exp (T - 0.5)\}^{- 1}$
True censoring:		$ψ (T, V) = S_{C | V} (T) = \exp (- T^{1 / 2})$
Variables used in $\hat{ψ}$ :		entire V

Censoring and missing rate:		δ = 0	R = 0	R = 1, δ = 1		R = 1, δ = 0		R = 0, δ = 1		R = 0, δ = 0
		0.476	0.448	0.249		0.303		0.275		0.173
Unconditional quantile of T:		25%	50%	75%
		0.122	0.382	1.096
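To make the data-generating mechanism of setting 1 concrete, here is a sketch of how one data set could be simulated from the specifications listed under Table 1. It is our reading of those specifications, not the authors' simulation code: "binary(0.5)" is taken to mean Bernoulli with success probability 0.5, the function name `simulate_setting1` is hypothetical, and T and C are generated by inverting their survival functions.

```python
import numpy as np

def simulate_setting1(n, rng):
    """Generate one data set following the setting 1 specification (illustrative)."""
    x = rng.normal(size=n)                          # X ~ N(0, 1)
    z = rng.binomial(1, 0.5, size=n)                # Z ~ Bernoulli(0.5), independent of X
    # Hazard exp(x + z) with baseline hazard 1: T is exponential with rate exp(x + z).
    t = rng.exponential(size=n) / np.exp(x + z)
    # pr(C > c) = exp(-sqrt(c)), so sqrt(C) is standard exponential.
    c = rng.exponential(size=n) ** 2
    # pi_1(T, Z) = 1 - {1 + exp(T - 0.5)}^{-1}, written in a numerically stable form.
    pi1 = 1.0 / (1.0 + np.exp(-(t - 0.5)))
    r = rng.binomial(1, pi1)                        # R = 1 means X is observed
    delta = (t <= c).astype(int)                    # 1 if the survival time is observed
    obs_time = np.minimum(t, c)
    x_obs = np.where(r == 1, x, np.nan)             # X is hidden when R = 0
    return obs_time, delta, x_obs, z, r

rng = np.random.default_rng(2019)
obs_time, delta, x_obs, z, r = simulate_setting1(200_000, rng)
print("censoring rate:", 1 - delta.mean(), "missing rate:", 1 - r.mean())
```

With a large n, the empirical censoring and missing rates can be compared with the rates reported under Table 1 as a check on the specification.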


Table 3.

Simulation bias, SD, SE, and CP based on 2000 runs for estimation of θ under setting 3, n = 1000

		Estimation of θ_x				Estimation of θ_z1				Estimation of θ_z2
α	Method	Bias	SD	SE	CP	Bias	SD	SE	CP	Bias	SD	SE	CP
1	Full	0.004	0.111	0.110	0.944	0.004	0.109	0.110	0.945	0.020	0.350	0.357	0.948
	CC	0.207	0.179	0.182	0.743	0.206	0.173	0.180	0.754	−0.002	0.587	0.594	0.948
	SWE	0.147	0.175	0.176	0.852	0.221	0.164	0.169	0.711	0.022	0.552	0.569	0.956
	DWE	−0.025	0.234	0.227	0.939	−0.022	0.229	0.221	0.939	−0.135	0.794	0.746	0.938
	CWE	0.018	0.170	0.176	0.946	0.025	0.170	0.170	0.950	−0.091	0.498	0.528	0.954
2	Full	0.004	0.161	0.163	0.950	0.010	0.162	0.164	0.944	0.007	0.520	0.526	0.947
	CC	0.261	0.297	0.312	0.836	0.265	0.292	0.311	0.826	−0.022	0.969	0.997	0.954
	SWE	0.229	0.299	0.306	0.862	0.290	0.285	0.300	0.806	0.013	0.964	0.972	0.947
	DWE	−0.015	0.622	0.548	0.928	−0.106	0.627	0.554	0.929	−0.427	1.829	1.635	0.952
	CWE	0.099	0.377	0.335	0.926	0.123	0.358	0.336	0.924	−0.105	1.033	0.999	0.958

Covariate vector:			V = (X,Z₁,Z₂), $X | Z_{1}, Z_{2} ∼ logit (0.5 Z_{1} - Z_{2})$ , Z₁ ∼ binary(0.5), Z₂ ∼ uniform(0, 0.5), $Z_{1} ⊥ Z_{2}$
True hazard of T:			$λ (t | V) = 0.5 t \exp (θ_{x} X + θ_{z 1} Z_{1} + θ_{z 2} Z_{2})$ , $θ = (θ_{x}, θ_{z 1}, θ_{z 2}) = (1, 1, 1)$
True propensity:			$π_{1} (T, V) = 1 - \{1 + \exp (2 T - Z_{2} - 1.5)\}^{- 1}$
True censoring:			$ψ (T, V) = S_{C | V} (T) = \exp \{- {(α T)}^{1 / 2} \exp (0.2 X + 0.1 Z_{1})\}$ , where α determines the censoring proportion
Variables used in $\hat{ψ}$ :			entire V
Variables used in ${\hat{π}}_{1}$ :			(T ∧ C,Z₁,Z₂) for SWE and (T,Z₁,Z₂) for DWE and CWE

Censoring and missing rate:				δ = 0	R = 0	R = 1, δ = 1		R = 1, δ = 0		R = 0, δ = 1		R = 0, δ = 0
α = 1				0.602	0.469	0.157		0.374		0.241		0.228
α = 2				0.815	0.469	0.058		0.473		0.127		0.342
Unconditional quantile of T:				25%		50%		75%
				0.522		0.860		1.349


The propensity for setting 1 is used to check the performances of DWE and CWE when Z is “unnecessarily” included in the estimation of π₁, whereas the propensity for setting 2 is used to compare the performances of DWE and CWE with a misspecified propensity model including only T and a correct propensity including both T and Z.

3.2. Computation

To obtain the nonparametric kernel estimator of π₁ defined by (11), the “npreg” function from the “np” package in R is used with its default continuous kernel, the second order Gaussian kernel, and the unordered categorical kernel (Racine and Li, 2004) achieved by setting option “ukertype=liracine”. By default, cross-validation is applied to select the bandwidth parameter. As discussed in Section 2.1, the subset of data with δ = 1 is used to estimate π₁. Variables included in computing ${\hat{π}}_{1}$ are indicated in Tables 1–3.

The estimator of ψ is obtained by a Cox regression on V using the “coxph” function from the “survival” package in R. The subset of data with R = 1 is used to estimate ψ, according to the discussion in Section 2.1.

The score functions (9) and (14) are solved by the “multiroot” function from the “rootSolve” package in R with π₁, ψ and π₀ replaced by their estimators, where π₀ is estimated according to equations (18) and (19) with DWE as an initial estimate.

Under assumption (3), the SWE can be calculated using (14) with both π₁(T_i,Z_i) and π₀(C_i,V_i) replaced by $π (T_{i} \land C_{i}, Z_{i}, δ_{i}) = pr (R_{i} = 1 | T_{i} \land C_{i}, Z_{i}, δ_{i})$ . Since $π (T_{i} \land C_{i}, Z_{i}, δ_{i})$ is a function of the observed T_i ∧ C_i, Z_i and δ_i, it can be simply estimated using the kernel method (Qi et al., 2005), which is the Nadaraya-Watson estimator. For a fair comparison, the missing data propensity associated with SWE is also estimated by using “npreg” in R, instead of “ksmooth” and “sm.regression” mentioned in Qi et al. (2005). Variables included in estimating π are indicated in Tables 1–3.

3.3. Simulation Results

Simulation results are reported in Tables 1–3 for settings 1–3, respectively, with 2000 simulation runs and sample sizes n = 500 or 1000 as specified in Tables 1–3. The results include the simulation bias and standard deviation (SD) of estimators, the standard error (SE) of estimators based on the bootstrap with 1000 replications, and the coverage probability (CP) of 95% confidence intervals based on the bootstrap percentile. In the Supporting Information, we also include some figures to show the averages of estimated π₁ and ψ over simulation runs against the true curves.
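For readers unfamiliar with the bootstrap quantities reported here, the following generic sketch shows how a bootstrap SE and a percentile confidence interval could be computed for any estimator that maps a data matrix to a parameter vector. It is not tied to the DWE or CWE: the function name `bootstrap_se_ci` and the `estimator` callback are hypothetical, and subjects (rows) are resampled with replacement.

```python
import numpy as np

def bootstrap_se_ci(data, estimator, n_boot=1000, level=0.95, seed=0):
    """Bootstrap standard error and percentile interval (illustrative).

    data      : (n, k) array, one row per subject
    estimator : function mapping an (n, k) array to a 1-D array of estimates
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    estimates = np.array([
        estimator(data[rng.integers(0, n, size=n)])   # resample whole subjects
        for _ in range(n_boot)
    ])
    se = estimates.std(axis=0, ddof=1)                # bootstrap SE
    alpha = 1.0 - level
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2], axis=0)
    return se, lower, upper
```

In a simulation study, the coverage probability is then the proportion of runs in which the true parameter value falls between `lower` and `upper`.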

The simulation results can be summarized as follows.

Both CC and SWE are substantially biased in settings 1–2, which leads to CP well below the nominal 95% in many cases. In setting 3, the biases of CC and SWE are also serious except for the case of estimating θ_z2. Overall, SWE is less biased than CC. The proposed DWE and CWE have negligible biases in all cases and good CP, except for the case where π₁ is misspecified (the middle block of Table 2 with n = 500).
The SD of SWE and CWE are comparable, but having a small SD for a biased estimator SWE is not an advantage. In fact, the SD of CC is also comparable with those of SWE and CWE. The SD of DWE is largest as the estimation equation (9) for DWE does not directly use data from censored units. The relative efficiency of DWE with respect to CWE (variance ratio) is 0.85–0.89 in setting 2, 0.67–0.80 in setting 1, 0.53–0.74 in setting 3 with α = 1, and 0.37–0.57 in setting 3 with α = 2, which decreases as the rate of censoring increases.
When Z is “unnecessarily” included in the estimation of π₁ (the last block of Table 1 when n = 500), the SD of CWE (or DWE) is comparable with that in the case where Z is correctly not included. When an important variable is excluded in estimating π₁, however, the performances of DWE and CWE are affected (the middle block of Table 2 with n = 500). Similarly, including a few unnecessary covariates in the estimation of ψ does not lead to any problem, as we use the entire V in all cases.
The bootstrap SE is close to SD, even in the cases where point estimators are biased.

Table 2.

Simulation bias, SD, SE, and CP based on 2000 runs for estimation of θ under setting 2

				Estimation of θ_x				Estimation of θ_z
n	Method	Variables used in ${\hat{π}}_{1}$		Bias	SD	SE	CP	Bias	SD	SE	CP
500	Full			0.005	0.122	0.124	0.946	0.004	0.120	0.124	0.945
	CC			0.082	0.150	0.151	0.896	0.155	0.153	0.157	0.804
	SWE	T ∧ C,Z		0.055	0.147	0.148	0.917	0.101	0.143	0.149	0.880
	DWE	T,Z		0.004	0.165	0.166	0.946	0.028	0.157	0.161	0.946
	CWE	T,Z		0.024	0.151	0.150	0.938	0.043	0.141	0.144	0.940

	SWE	T ∧ C		0.058	0.149	0.148	0.920	0.127	0.148	0.153	0.858
	DWE	T		0.024	0.170	0.167	0.928	0.094	0.176	0.176	0.892
	CWE	T		0.039	0.150	0.149	0.925	0.108	0.151	0.155	0.878

1000	Full			0.000	0.086	0.086	0.941	0.000	0.087	0.086	0.944
	CC			0.075	0.103	0.105	0.874	0.151	0.108	0.109	0.674
	SWE	T ∧ C,Z		0.045	0.099	0.102	0.927	0.091	0.101	0.102	0.833
	DWE	T,Z		−0.002	0.115	0.114	0.942	0.014	0.113	0.110	0.941
	CWE	T,Z		0.010	0.106	0.105	0.948	0.027	0.106	0.099	0.944

Covariate vector:		V = (X,Z), X ~ binary(0.5), Z ~ binary(0.5), $X ⊥ Z$
True hazard of T:		$λ (t \mid V) = \exp (θ_{x} X + θ_{z} Z)$, $θ = (θ_{x}, θ_{z}) = (1, 1)$
True propensity:		$π_{1} (T, Z) = 1 - \{1 + \exp (T + Z)\}^{- 1}$
True censoring:		$ψ (T, V) = S_{C \mid V} (T) = \exp (- T^{1 / 2} \exp (- X / 4))$
Variables used in $\hat{ψ}$:		entire V

Censoring and missing rate:		δ = 0	R = 0	R = 1, δ = 1		R = 1, δ = 0		R = 0, δ = 1		R = 0, δ = 0
		0.370	0.294	0.434		0.272		0.195		0.098
Unconditional quantile of T:		25%	50%	75%
		0.090	0.240	0.573
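The data-generating mechanism listed in the notes to Table 2 can be sketched in a few lines. This is a hedged illustration using only the Python standard library, not the authors' simulation code. The function names `simulate_setting2` and `rates` are hypothetical, and the inverse-transform draws for T and C are one way (an assumption of this sketch) to sample from the stated hazard and censoring survival function.

```python
import math
import random


def simulate_setting2(n, seed=0):
    """Draw n units following the mechanism in the notes to Table 2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.5)  # X ~ binary(0.5)
        z = int(rng.random() < 0.5)  # Z ~ binary(0.5), independent of X
        # Constant hazard exp(theta_x X + theta_z Z) with theta = (1, 1),
        # so T is exponential with that rate.
        t = rng.expovariate(math.exp(x + z))
        # S_{C|V}(c) = exp(-c^{1/2} exp(-X/4)): if E ~ Exp(1), then
        # C = (E / exp(-X/4))^2 has this survival function.
        c = (rng.expovariate(1.0) / math.exp(-x / 4.0)) ** 2
        # pi_1(T, Z) = 1 - 1/(1 + exp(T + Z)) is the chance X is observed.
        pi1 = 1.0 - 1.0 / (1.0 + math.exp(t + z))
        r = int(rng.random() < pi1)
        rows.append({
            "y": min(t, c),              # observed time T ^ C
            "delta": int(t <= c),        # 1 = event observed, 0 = censored
            "r": r,                      # 1 = X observed
            "x": x if r == 1 else None,  # X is masked when R = 0
            "z": z,
            "t": t,                      # latent T, kept only for checking
        })
    return rows


def rates(rows):
    """Empirical censoring rate P(delta = 0) and missing rate P(R = 0)."""
    n = len(rows)
    censored = sum(1 - row["delta"] for row in rows) / n
    missing = sum(1 - row["r"] for row in rows) / n
    return censored, missing
```

With a large n, the two rates returned by `rates(simulate_setting2(n))` should land near the censoring and missing rates of 0.370 and 0.294 reported in the table. Note that the missingness probability depends on the latent T, which is not available to the analyst when δ = 0.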

4. Real Data Example

We analyze the Non-Small Cell Lung Cancer (NSCLC) data set introduced in Example 1 of Section 1. We focus on 1642 stage III patients who had private insurance, were diagnosed during 2004–2006, and were 70–80 years of age at the time of diagnosis. The research interest is the overall survival of patients with stage III NSCLC, adjusting for age, gender, tumor stage, and two treatments: Stereotactic Body Radiation Therapy (SBRT) and surgery. Surgery is the standard first-line treatment for operable NSCLC. For stage III NSCLC patients whose lung tumor cannot be easily or cleanly removed due to size, location, and nodal involvement around the lung, however, SBRT is a promising new radiation therapy that can be a good alternative (Kumar et al., 2017). Older patients may also opt for non-surgical interventions due to complications commonly associated with surgery.

In our data, all patients were diagnosed with stage III at baseline, but only about 40% of patients had a more accurate tumor stage recorded as either stage IIIA or stage IIIB. Thus, we treat the tumor stage as the covariate X with missing values. As we argue in Example 1 of Section 1, it is reasonable to assume that missingness of the stage record X is related to the survival time T and possibly to other covariates: Z₁ = the treatment choice of SBRT or surgery, Z₂ = age (treated as continuous), and Z₃ = gender, where Z = (Z₁,Z₂,Z₃) is always observed. On the other hand, we think missingness of X is unlikely to be related to C given T and Z.

We apply the CC, SWE, DWE, and CWE as previously described in Section 3. The propensity is estimated by the product kernel regression based on (T ∧ C,Z) for SWE, and (T,Z) for DWE and CWE. We use the entire Z in estimating the propensity and entire V in estimating ψ, as the simulation results in Section 3 indicate that overfitting is not a problem.
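To illustrate what a product kernel regression of the missingness indicator looks like, here is a minimal Nadaraya-Watson sketch in Python. It covers only the simple case where all conditioning variables are observed for every subject, as with (T ∧ C, Z) for SWE; estimator (11) for π₁(T, Z) is not reproduced here. The function names, the Gaussian kernel for continuous variables, and the exact-match (indicator) kernel for discrete variables are assumptions of this sketch. The categorical kernel of Racine and Li (2004) additionally smooths across categories.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used for each continuous coordinate."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_propensity(cont0, disc0, conts, discs, rs, bandwidths):
    """Nadaraya-Watson estimate of P(R = 1 | continuous = cont0, discrete = disc0).

    cont0, disc0 : evaluation point (tuples of continuous / discrete values)
    conts, discs : per-subject tuples of continuous / discrete values
    rs           : per-subject indicators (1 = covariate observed)
    bandwidths   : one bandwidth per continuous coordinate
    """
    num = den = 0.0
    for cont, disc, r in zip(conts, discs, rs):
        if tuple(disc) != tuple(disc0):
            continue  # indicator kernel: only exact matches contribute
        weight = 1.0
        for value, point, h in zip(cont, cont0, bandwidths):
            weight *= gaussian_kernel((point - value) / h)  # product kernel
        num += weight * r
        den += weight
    if den == 0.0:
        raise ValueError("no kernel mass at this evaluation point")
    return num / den
```

For the NSCLC data, the continuous coordinates would be the observed time and age, and the discrete coordinates would be treatment and gender. Bandwidth selection is left open here.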

The results of estimating coefficients of covariates in Cox regression and some of their differences are given in Table 4. The SEs are provided using 1000 bootstrap replications. The marginal quantiles of observed survival and follow-up times, and censoring and missing rates are also included in Table 4. We can draw the following conclusions based on these results.

Table 4.

Estimates of coefficients of covariates in Cox regression with stage III NSCLC data

		X = stage record (0 = stage IIIA, 1 = stage IIIB)		Z₁ = treatment (0 = surgery, 1 = SBRT)		Z₂ = age		Z₃ = gender (0 = male, 1 = female)
Method		Estimate	SE	Estimate	SE	Estimate	SE	Estimate	SE
CC		0.318	0.112	0.662	0.143	0.025	0.016	−0.435	0.099
SWE		0.297	0.112	0.679	0.139	0.042	0.012	−0.436	0.077
DWE		0.354	0.126	0.298	0.163	0.008	0.025	−0.602	0.119
CWE		0.278	0.105	0.573	0.150	0.049	0.017	−0.270	0.091

CC–SWE		0.021	0.021	−0.016	0.018	−0.017	0.004	0.001	0.023
SWE–CWE		0.019	0.070	0.106	0.058	−0.007	0.022	−0.166	0.065

Censoring and missing rate
δ = 0	R = 0	R = 1, δ = 1		R = 1, δ = 0		R = 0, δ = 1		R = 0, δ = 0
0.385	0.594	0.262		0.144		0.353		0.241

				25%		50%		75%
Observed marginal quantile of T when δ = 1				18.1		38.6		66.1
Observed marginal quantile of T ∧ C				27.3		62.4		91.5

Estimates by SWE are very close to those by CC except for the coefficient of the age covariate Z₂. In fact, the sample correlation coefficients between CC and SWE based on 1000 bootstrap replicates range from 97% to 99% across the four covariate components. For X and Z₂, the difference between SWE and CWE is close to 0, whereas the difference is beyond 2×SE for Z₃ and slightly smaller than 2×SE for Z₁.
Generally, CC, SWE and CWE have similar SE, while DWE has the largest SE. This is consistent with the simulation results in Section 3. DWE estimates are very different from others. Since DWE is less stable than CWE according to our simulation results, we believe that results from DWE are not reliable in this example.
Overall, the tumor stage, treatment, age, and gender all have significant association with the survival time. The difference between the proposed CWE and SWE or CC is in the magnitude of covariate effects.
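The 2×SE comparison of SWE and CWE can be checked directly from the SWE–CWE row of Table 4. The short Python snippet below only restates that arithmetic. The dictionary layout and the helper name `ratio_to_se` are illustrative choices, and the numbers are copied from the table.

```python
# SWE - CWE differences and their bootstrap SEs, copied from Table 4.
differences = {
    "X (stage)": (0.019, 0.070),
    "Z1 (treatment)": (0.106, 0.058),
    "Z2 (age)": (-0.007, 0.022),
    "Z3 (gender)": (-0.166, 0.065),
}


def ratio_to_se(diff, se):
    """Absolute difference expressed in units of its standard error."""
    return abs(diff) / se


ratios = {name: ratio_to_se(d, s) for name, (d, s) in differences.items()}
for name, ratio in ratios.items():
    flag = "beyond 2 x SE" if ratio > 2.0 else "within 2 x SE"
    print(f"{name}: |difference| / SE = {ratio:.2f} ({flag})")
```

The ratio exceeds 2 only for Z₃ (about 2.55), is just under 2 for Z₁ (about 1.83), and is well below 1 for X and Z₂.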

5. Discussion

Under Cox proportional hazard model (1) with right censoring and survival-time-dependent missing baseline covariate values, we propose two consistent and asymptotically normal estimators of the Cox regression parameters, based on inverse probability weighting. The first is an initial estimator and the second estimator is more efficient and recommended.

In general, assumptions (3) and (4) do not have a definite relationship, except that they are the same when there is no censoring. Both assumptions are special cases of the following assumption,

R ⊥ X | (T, C, Z).

(21)

Because both T and C appear in the conditioning in (21) and they are never observed simultaneously, inference is difficult without further assumptions on the missingness or censoring mechanism. Further discussion of missingness and censoring mechanisms can be found in Rathouz (2007).

In Section 2, R = 1 if and only if X is completely observed. This means that some incompletely observed data for a multivariate X are not used in our DWE and CWE. Here, we extend our method to the case where components of X have item missingness. To illustrate, consider the case where X = (X₁, X₂) is bivariate. Define R^(1,1) as the indicator of observing both X₁ and X₂, R^(1,0) as the indicator of observing X₁ but not X₂, and R^(0,1) as the indicator of observing X₂ but not X₁. We still assume (4) with R = (R^(1,1), R^(1,0), R^(0,1)). Let R_i be R from subject i. Then an extended DWE is obtained by solving (9) with R_i replaced by $R_{i}^{(k, l)}$, $π_{1} (T_{i}, Z_{i})$ replaced by ${\hat{π}}_{1}^{(k, l)} (T_{i}, Z_{i})$, and $Σ_{i = 1}^{n}$ replaced by $Σ_{i = 1}^{n} Σ_{(k, l)}$ with $(k, l) = (1, 1), (1, 0), (0, 1)$, where ${\hat{π}}_{1}^{(k, l)} (T, Z)$ is defined by (11) with R_i replaced by $R_{i}^{(k, l)}$. An extended CWE can be obtained similarly. For a general q-dimensional X with item missingness, there are 2^q − 1 missingness patterns, and our DWE and CWE can be extended with R_i defined as the (2^q − 1)-dimensional indicator vector of missingness patterns. Of course, this extension may be infeasible if q is large or if too few subjects fall in a particular missingness pattern, in which case more research is needed.
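The pattern bookkeeping in this extension can be made concrete with a short Python sketch. The helper names `missingness_patterns` and `pattern_indicators` are hypothetical. The sketch only enumerates the 2^q − 1 patterns in which at least one component of X is observed and builds the indicator vector R_i for a subject; it does not implement the extended estimators themselves.

```python
from itertools import product


def missingness_patterns(q):
    """All observation patterns for q components, excluding 'all missing'.

    Each pattern is a tuple of 0/1 flags, where 1 means the component
    is observed. There are 2**q - 1 such patterns.
    """
    return [p for p in product((1, 0), repeat=q) if any(p)]


def pattern_indicators(observed_flags, patterns):
    """Indicator vector R_i over the patterns for one subject."""
    flags = tuple(observed_flags)
    return [int(flags == p) for p in patterns]


patterns = missingness_patterns(2)
print(patterns)                              # (1, 1), (1, 0), (0, 1)
print(pattern_indicators((1, 0), patterns))  # subject with X1 observed, X2 missing
```

For q = 2 the patterns come out in the order (1,1), (1,0), (0,1) used in the text, and a subject with neither component observed gets an all-zero indicator vector. The count grows as 2^q − 1, which is why the extension becomes impractical for large q.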

Our approach is inverse probability weighting, and our contribution is to overcome the difficulty in constructing appropriate weights under assumption (4). An imputation approach under assumption (4) may be applied, but further research is needed. A likelihood approach may not work under assumption (4) without an additional assumption such as a parametric form for the missingness propensity, because assumption (4) is not a missing at random assumption due to censoring. Under a missing at random assumption such as (3), a likelihood approach may be applied, but it requires a correct specification of the covariate distribution (Chen and Little, 1999) or a stronger assumption on censoring (e.g., Lipsitz and Ibrahim, 1998; Herring and Ibrahim, 2001).
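As a reminder of how inverse probability weights enter a Cox estimating equation, the following Python sketch solves a generic weighted partial-likelihood score for a single scalar covariate. It is a simplified illustration with one weight per subject, in the spirit of a simple weighted estimator; it is not the DWE or CWE, whose weights are constructed in (9) and (14) and are not reproduced here. The function names, the Newton iteration, and the restriction to a scalar covariate without special tie handling are all assumptions of this sketch.

```python
import math


def weighted_cox_score(beta, times, events, x, w):
    """Weighted partial-likelihood score and information (scalar covariate).

    w[i] is the weight of subject i, e.g. R_i divided by an estimated
    propensity. Subjects with w[i] == 0 (covariate missing) drop out,
    so x[i] may be None for them.
    """
    score = info = 0.0
    for i in range(len(times)):
        if not events[i] or w[i] == 0.0:
            continue  # only weighted, uncensored subjects contribute terms
        s0 = s1 = s2 = 0.0
        for j in range(len(times)):
            if times[j] >= times[i] and w[j] > 0.0:  # weighted risk set
                e = w[j] * math.exp(beta * x[j])
                s0 += e
                s1 += e * x[j]
                s2 += e * x[j] * x[j]
        xbar = s1 / s0
        score += w[i] * (x[i] - xbar)
        info += w[i] * (s2 / s0 - xbar * xbar)
    return score, info


def solve_weighted_cox(times, events, x, w, tol=1e-10, max_iter=50):
    """Newton iteration for the root of the weighted score, started at 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = weighted_cox_score(beta, times, events, x, w)
        if info <= 0.0:
            raise ValueError("information is not positive; no Newton step")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta
```

Multiplying all weights by a constant leaves the solution unchanged, so only the relative sizes of the weights matter. The difficulty addressed in this article is the construction of valid weights when missingness depends on T, which is censored.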

If Z has high dimensional continuous components, dimension reduction or variable selection can be applied prior to the kernel estimation (11). For example, if all components of Z are continuous, then applying an existing dimension reduction or variable selection method leads to $π_{1} (T, Z) = π_{1} (T, ϕ (Z))$ with ϕ(Z) being a low dimensional linear function or subset of Z. Then (11) can be applied with Z replaced by ϕ(Z).

When V in (1)–(2) is replaced by a time-varying covariate vector V(t) = (X(t),Z(t)), only a few publications address missing time-varying X(t) values, and they rely on very strong assumptions about missingness. For example, Lin and Ying (1993) assumed that X(t) is missing completely at random; Paik and Tsai (1997) assumed that

R (t) ⊥ X (t) | (T ∧ C, Z (t), δ),

(22)

where R(t) is the indicator of whether X(t) is completely observed at time t. Assumption (22) is strong because it ignores the possible effect of Z(s), s < t, on R(t).

We now discuss the possibility of extension of our work to time-dependent covariates. If V (t) = (X,Z(t)), i.e., the always observed Z(t) is time-dependent but X having missing values is time-independent, then our DWE and CWE can be extended under two assumptions. The first one is to replace (4) by

R ⊥ (X, {Z (t), t > 0}, C) | (T, Z (0)),

i.e., the missingness of baseline X depends only on T and the baseline Z(0). The second one is the main assumption (1) in Robins and Finkelstein (2000), i.e., conditional on the recorded history V (s), s ⩽ t, the hazard of censoring C at time t does not further depend on the possibly unobserved T. If the covariate vector with missing values is time-dependent, then the missingness mechanism can be very complicated, since missingness of X(t) at time t may depend on the entire history up to time t of covariates, survival, and censoring. One possible modification of assumption (4) is

R (t) ⊥ ({X (s), s ⩾ 0}, C) | (T, {Z (s), s ⩾ 0}) .

Along with the censoring assumption (1) in Robins and Finkelstein (2000), estimating equations (9) and (14) are still valid with π₁(T_i,Z_i), ψ(T_i,V_i), and π₀(C_i,V_i) replaced by $π_{1} (T_{i}, \{Z_{i} (s), s ⩾ 0\})$, $ψ (T_{i}, V_{i} (t))$, and $π_{0} (C_{i}, X_{i} (t), \{Z_{i} (s), s ⩾ 0\})$, respectively. Obviously, the survival-time-dependent, time-varying missingness propensity $π_{1} (T_{i}, \{Z_{i} (s), s ⩾ 0\})$ is difficult to estimate, although it is an interesting problem. More careful research is needed.

Supplementary Material

Supp info

NIHMS1051869-supplement-Supp_info.pdf (291.9KB, pdf)

Supp code

NIHMS1051869-supplement-Supp_code.zip (7.8KB, zip)

Acknowledgement

We are grateful to the associate editor and four referees for comments and suggestions that led to significant improvements of the paper. The authors’ research was partially supported by the National Natural Science Foundation of China grant 11831008, the U.S. National Science Foundation grants DMS-1612873 and DMS-1914411, the University of Wisconsin Carbone Cancer Center Support Grant [P30 CA014520] from the U.S. National Institutes of Health (NIH), and the University of Wisconsin Head and Neck Specialized Program of Research Excellence grant [P50 DE026787] from NIH. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Footnotes

Supporting Information

In Example 1 of Section 1, the data were obtained from The National Cancer Database: https://www.facs.org/quality-programs/cancer/ncdb, and The American Joint Committee on Cancer: https://cancerstaging.org/references-tools/quickreferences/Documents/LungMedium.pdf. The proof of the fact that (5) implies (4) and (6), the proofs of Theorems 1–2, the figures referenced in Section 3, and the R code for the numerical work are available with this paper at the Biometrics website on Wiley Online Library.

References

Andersen PK and Gill RD. (1982). Cox’s regression model for counting processes: A large sample study. Ann. Statist. 10, 1100–1120.
Chen HY. (2002). Double-semiparametric method for missing covariates in Cox regression models. Journal of the American Statistical Association 97, 565–576.
Chen HY and Little RJA. (1999). Proportional hazards regression with missing covariates. Journal of the American Statistical Association 94, 896–908.
Cook VJ, Hu XJ, and Swartz TB. (2011). Cox regression with covariates missing not at random. Statistics in Biosciences 3, 208–222.
Cox DR. (1975). Partial likelihood. Biometrika 62, 269–276.
Fleming TR and Harrington D. (1991). Counting Processes & Survival Analysis, volume 157. John Wiley & Sons, Inc.
Herring AH and Ibrahim JG. (2001). Likelihood-based methods for missing covariates in the Cox proportional hazards model. Journal of the American Statistical Association 96, 292–302.
Kumar SS, Higgins KA, and McGarry RC. (2017). Emerging therapies for stage III non-small cell lung cancer: Stereotactic body radiation therapy and immunotherapy. Frontiers in Oncology 7, 197.
Lin DY and Ying Z. (1993). Cox regression with incomplete covariate measurements. Journal of the American Statistical Association 88, 1341–1349.
Lipsitz SR and Ibrahim JG. (1998). Estimating equations with incomplete categorical covariates in the Cox model. Biometrics 54, 1002–1013.
Paik MC and Tsai W-Y. (1997). On using the Cox proportional hazards model with missing covariates. Biometrika 84, 579–593.
Qi L, Wang CY, and Prentice RL. (2005). Weighted estimators for proportional hazards regression with missing covariates. Journal of the American Statistical Association 100, 1250–1263.
Racine J and Li Q. (2004). Nonparametric estimation of regression functions with both categorical and continuous data. Journal of Econometrics 119, 99–130.
Rathouz PJ. (2007). Identifiability assumptions for missing covariate data in failure time regression models. Biostatistics 8, 345–356.
Robins JM. (1997). Non-response models for the analysis of non-monotone non-ignorable missing data. Statistics in Medicine 16, 21–37.
Robins JM and Finkelstein DM. (2000). Correcting for noncompliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics 56, 779–788.
Rubin DB. (1976). Inference and missing data. Biometrika 63, 581–592.
Shao J. (2003). Mathematical Statistics. Springer.
Teran MD and Brock MV. (2014). Staging lymph node metastases from lung cancer in the mediastinum. Journal of Thoracic Disease 6.
Wang CY and Chen HY. (2001). Augmented inverse probability weighted estimator for Cox missing covariate regression. Biometrics 57, 414–419.
Xu Q, Paik MC, Luo X, and Tsai W-Y. (2009). Reweighting estimators for Cox regression with missing covariates. Journal of the American Statistical Association 104, 1155–1167.
