A calibrated Bayesian method for the stratified proportional hazards model with missing covariates

Soyoung Kim; Jae-Kwang Kim; Kwang Woo Ahn

doi:10.1007/s10985-021-09542-4

Author manuscript; available in PMC: 2023 Apr 1.

Published in final edited form as: Lifetime Data Anal. 2022 Jan 16;28(2):169–193. doi: 10.1007/s10985-021-09542-4

PMCID: PMC8977246 NIHMSID: NIHMS1782160 PMID: 35034213

Abstract

Missing covariates are commonly encountered when evaluating covariate effects on survival outcomes. Excluding missing data from the analysis may lead to biased parameter estimation and a misleading conclusion. The inverse probability weighting method is widely used to handle missing covariates. However, obtaining asymptotic variance in frequentist inference is complicated because it involves estimating parameters for propensity scores. In this paper, we propose a new approach based on an approximate Bayesian method without using Taylor expansion to handle missing covariates for survival data. We consider a stratified proportional hazards model so that it can be used for the non-proportional hazards structure. Two cases for missing pattern are studied: a single missing pattern and multiple missing patterns. The proposed estimators are shown to be consistent and asymptotically normal, which matches the frequentist asymptotic properties. Simulation studies show that our proposed estimators are asymptotically unbiased and the credible region obtained from posterior distribution is close to the frequentist confidence interval. The algorithm is straightforward and computationally efficient. We apply the proposed method to a stem cell transplantation data set.

Keywords: Bayesian computation, Cox model, Missing data, Posterior distribution, Survival data

1. Introduction

The proportional hazards (PH) model [9] is widely used to evaluate covariate effects on survival outcomes. In many biomedical studies, covariate information is incompletely observed for some subjects due to loss to follow-up, loss of hospital records, or study design. For example, Dreger et al. [11] studied 1,394 patients who were aged 18 years or older, had relapsed diffuse large B-cell lymphoma (DLBCL), and had undergone their first non-myeloablative or reduced-intensity conditioning allogeneic stem cell transplantation between 2008 and 2015. They compared the effect of the following donor types on overall survival: haploidentical family donors using post-transplant cyclophosphamide (PTCy), matched sibling donors (MSD) / matched unrelated donors (MUD) with or without T-cell depletion. There were missing records in the hematopoietic cell transplant-comorbidity index. In addition, the PH assumption was not valid for remission status at time of transplant.

A simple and commonly used way to handle such incomplete data is to exclude subjects with missing data from the analysis, which is called the complete-case (CC) method. However, the CC method may lead to biased parameter estimation when the missing mechanism is related to outcome variables. A more sensible approach is propensity score weighting, which is based on a model for the response probability. Once the response probabilities are estimated, the inverse of the estimated response probability is applied as the weight for estimating the parameters of interest.

Extensive work for the PH model with missing covariates has been done under the frequentist framework. Lin and Ying [23] proposed a pseudo-likelihood score function to handle missing covariates under the missing-completely-at-random assumption. Zhou and Pepe [44] and Chen and Little [5] proposed an estimated partial-likelihood method using auxiliary covariates and a nonparametric maximum likelihood method, respectively. Pugh et al. [29] proposed an inverse-probability-weighted (IPW) estimating equation when the missing mechanism is missing at random (MAR) in the sense of Rubin [31]. Wang and Chen [39] and Xu et al. [41] considered an augmented inverse probability weighting (AIPW) scheme [30] under the MAR assumption. Herring and Ibrahim [14] and Herring et al. [15] introduced Monte-Carlo-Expectation-Maximization-algorithm-based methods to handle missing covariates under the MAR assumption and non-ignorable missingness, respectively. Multiple imputation for the PH model has also been explored [24,40,3]. However, all of these methods are limited to the non-stratified PH model. Because it is common that the PH assumption is not satisfied for some covariates [11,38,27] or stratified sampling is used for data collection [18,19], it is crucial to study the stratified PH model for missing data. In addition, many of these existing methods rely on Taylor expansion to study the asymptotic properties of the estimators, and thus obtaining variance estimates of the estimators can be complicated in practice.

Besides the frequentist approaches, several Bayesian approaches have also been proposed to handle missing covariates for survival data. Chen et al. [6] proposed a general class of informative priors for semi-parametric survival models incorporating a cure fraction. Yoo and Lee [42] extended the Bayesian adaptive B-spline estimation method of Sharef et al. [34] to clustered survival data with missing covariates. Hemming and Hutton [13] considered Markov Chain Monte Carlo (MCMC) to handle missing covariates for the accelerated failure time model under the MAR assumption. For the PH model, Ibrahim et al. [17] and Bradshaw et al. [4] studied variable selection and non-ignorably missing time-varying covariates, respectively, under the MAR assumption. Chen et al. [8] considered covariates with detection limit. In general, the current Bayesian literature on missing covariates for the PH model requires complicated MCMC algorithms that generate correlated samples and need a burn-in period. Finding straightforward and less computationally intensive methods for survival outcomes with missing data is desirable in practice. Furthermore, like the frequentist approaches discussed previously, these Bayesian methods did not consider the stratified PH model.

Sang and Kim [32] recently proposed an approximate Bayesian method to handle unit nonresponse for outcomes. Their Bayesian method is calibrated to the frequentist inference in that the credible region obtained from the posterior distribution asymptotically matches the frequentist confidence interval. Although their method is an approximate method based on the asymptotic normality of the score function, it is attractive in practice due to its simplicity and good performance in parameter estimation. More specifically, their algorithm generates independent samples for the posterior distribution of parameters and does not require MCMC for posterior computation.

Survival data often have multiple missing covariates. For example, Pidala et al. [26] studied the impact of DPB1 T-cell epitope matching on the hematopoietic cell transplantation outcomes for patients with leukemia and myelodysplastic syndrome. There were two important variables with missing values: DPB1 T-cell epitope matching and Karnofsky performance score (KPS). Existing literature considers one missing mechanism to handle these two missing covariates. However, the missing mechanism for DPB1 T-cell epitope matching could be different from that for KPS. In addition, cases having missing values in both variables could have a different missing mechanism from those with missing values in only one of the two variables. This important aspect has been largely ignored in the current literature.

Motivated by Sang and Kim [32], we propose an approximate Bayesian approach for survival data with missing covariates. We consider the stratified PH model so that it can be used even for data with a non-proportional hazards structure. The proposed method uses two score equations, one from the stratified PH model and the other from the response model, to construct the likelihood part of the posterior distribution. With a flat prior, the credible interval from the posterior distribution asymptotically matches the confidence interval of the frequentist IPW approach. In contrast with the frequentist approach, the proposed method does not involve Taylor expansion. Sampling from the posterior distribution is straightforward to implement. We also propose new methods to account for multiple missing-covariate patterns. We study the asymptotic properties of the IPW estimators in frequentist inference for the stratified PH model with known and unknown missing mechanisms. Simulation studies and applications to the data of Dreger et al. [11] and Pidala et al. [26] are also provided.

2. Model and estimation

2.1. Model definitions and assumptions

Assume the full cohort consists of $n$ subjects and there are $L$ strata, where $L$ is fixed. Let $T_{l i}$ be the failure time, $C_{l i}$ be the potential censoring time, and $Z_{l i} = {(Z_{l i 1}, \dots, Z_{l i p})}^{T}$ be a $p \times 1$ time-independent covariate vector for subject $i$ in stratum $l$ for $l = 1, \dots, L$, $i = 1, \dots, n_{l}$, where $n_{l}$ is the number of subjects in stratum $l$. Let $X_{l i} = \min (T_{l i}, C_{l i})$ denote the observed time in the full cohort and $Δ_{l i} = I (T_{l i} \leq C_{l i})$ be an event indicator. The study period is $[0, τ]$. We consider the stratified PH model: for subject $i$ in stratum $l$, the hazard function $λ_{l i} (\cdot)$ associated with $Z_{l i}$ is

λ_{l i} (t ∣ Z_{l i}) = λ_{0 l} (t) e^{β_{0}^{T} Z_{l i}},

(1)

where $λ_{0 l} (t)$ is a baseline hazard function for stratum $l$ and $β_{0}$ is an unknown parameter vector. We assume $T_{l i}$ is independent of $C_{l i}$ given $Z_{l i}$ [9,10].

Next, we introduce notation and assumptions for missing covariates. Let $Z_{l i} = (Z_{l i}^{c}, Z_{l i}^{m})$, where $Z_{l i}^{c}$ and $Z_{l i}^{m}$ are the fully observed covariate vector and the covariate vector subject to missingness for subject $i$ in stratum $l$, respectively. Let $ξ_{l i}$ be the observation indicator for subject $i$ in stratum $l$: $ξ_{l i} = 1$ if $Z_{l i}$ is fully observed and $ξ_{l i} = 0$ if some elements of $Z_{l i}$ are missing. Assume $(T_{l i}, C_{l i}, Z_{l i}, ξ_{l i})$ for $i = 1, \dots, n_{l}$ within stratum $l$ are independently and identically distributed. In addition, $(T_{l i}, C_{l i}, Z_{l i}, ξ_{l i})$ and $(T_{l^{'} i^{'}}, C_{l^{'} i^{'}}, Z_{l^{'} i^{'}}, ξ_{l^{'} i^{'}})$ are assumed to be independent when $l \neq l^{'}$. The observed data for stratum $l$ are $(X_{l i}, Δ_{l i}, Z_{l i}^{c}, ξ_{l i})$ for $i = 1, \dots, n_{l}$, together with $Z_{l i}^{m}$ when $ξ_{l i} = 1$. Define $W_{l i} = (X_{l i}, Δ_{l i}, Z_{l i}^{c})$ for $i = 1, \dots, n_{l}$ and $l = 1, \dots, L$. We assume that $Z_{l i}^{m}$ is missing at random in that the observation indicator $ξ_{l i}$ is conditionally independent of $Z_{l i}^{m}$ given $W_{l i}$. Let $N_{l i} (t) = I (X_{l i} \leq t, Δ_{l i} = 1)$ be the counting process for the observed failure time, and $Y_{l i} (t) = I (X_{l i} \geq t)$ denote the at-risk indicator for subject $i$ in stratum $l$, where $I (\cdot)$ is an indicator function.

2.2. Inverse Probability Weighted Estimators

The inverse probability weighted (IPW) estimator based on Horvitz and Thompson [16] adjusts for missing covariates by using the inverse of the response probability as the weight [39,41]. We assume that the $ξ_{l i}$ within stratum $l$ are independently generated from Bernoulli distributions, and we allow a different intercept for each stratum:

π_{l i} = \Pr (ξ_{l i} = 1 ∣ W_{l i}) = \frac{\exp (ϕ^{T} ω_{l i})}{1 + \exp (ϕ^{T} ω_{l i})},

(2)

where $ω_{l i} = {(1, I (l = 2), \dots, I (l = L), W_{l i}^{T})}^{T}$ . Let $ϕ_{0}$ be the true parameter vector of $ϕ$ . To obtain the IPW estimator, we consider

U_{1, n} (β, ϕ) = \frac{1}{n} \sum_{l = 1}^{L} \sum_{i = 1}^{n_{l}} \frac{ξ_{l i}}{π_{l i}} \int_{0}^{τ} \left\{ Z_{l i} - \frac{S_{l}^{(1)} (β, t)}{S_{l}^{(0)} (β, t)} \right\} d N_{l i} (t) = 0,

(3)

U_{2, n} (ϕ) = \frac{1}{n} \sum_{l = 1}^{L} \sum_{i = 1}^{n_{l}} \{ ξ_{l i} - π_{l i} \} ω_{l i}^{T} = 0,

(4)

where $S_{l}^{(d)} (β, t) = n^{- 1} \sum_{i = 1}^{n_{l}} ξ_{l i} π_{l i}^{- 1} Y_{l i} (t) Z_{l i}^{\otimes d} e^{β^{T} Z_{l i}}$ for $d = 0, 1$ and $a^{\otimes 0} = 1$, $a^{\otimes 1} = a$, $a^{\otimes 2} = a a^{T}$. The two functions $U_{1, n} (β, ϕ)$ and $U_{2, n} (ϕ)$ are the score functions from the stratified PH model and the logistic regression, respectively. For the frequentist IPW approach under the MAR assumption, one first estimates the response probabilities $π_{l i}$ by solving (4) and then plugs the estimated $π_{l i}$ into (3) to estimate $β$. Let the solution to (3) be $\hat{β}$. In practice, one relies on Taylor expansion to obtain the asymptotic normality of $\hat{β}$ [41].
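To make the two score functions concrete, the following is a minimal sketch, not the authors' implementation, of how (3) and (4) could be evaluated for given $β$ and $ϕ$. The function name `ipw_scores` and the array layout (one row per subject, a `stratum` label vector, and a design matrix `omega` for the response model) are assumptions made for illustration. Rows with $ξ_{l i} = 0$ receive weight zero, so their missing covariate values never enter the sums.

```python
import numpy as np

def ipw_scores(X, delta, Z, xi, stratum, omega, phi, beta):
    """Evaluate U_1 (IPW stratified PH score) and U_2 (logistic score).

    Hypothetical layout: X, delta, xi, stratum have length n; Z is n x p
    (rows with xi == 0 may contain NaN); omega is n x q.
    """
    n = len(X)
    pi = 1.0 / (1.0 + np.exp(-omega @ phi))       # response probabilities, eq. (2)
    w = xi / pi                                    # IPW weights, zero if xi == 0
    Zf = np.where(xi[:, None] == 1, Z, 0.0)        # mask unobserved covariate rows
    risk = w * np.exp(Zf @ beta)                   # weighted relative risks
    U1 = np.zeros(Z.shape[1])
    for l in np.unique(stratum):
        idx = np.flatnonzero(stratum == l)
        for i in idx:
            if delta[i] == 1 and xi[i] == 1:
                at_risk = idx[X[idx] >= X[i]]      # risk set within stratum l
                s0 = risk[at_risk].sum()
                s1 = (risk[at_risk, None] * Zf[at_risk]).sum(axis=0)
                U1 += w[i] * (Zf[i] - s1 / s0)     # the ratio S1/S0 is scale-free
    U2 = ((xi - pi)[:, None] * omega).sum(axis=0)
    return U1 / n, U2 / n
```

Because only the ratio $S_{l}^{(1)} / S_{l}^{(0)}$ enters (3), the normalizing constant in $S_{l}^{(d)}$ is omitted in the sketch.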

2.3. Approximate Bayesian approach

We propose an approximate Bayesian approach in this section. Define $θ = {(β^{T}, ϕ^{T})}^{T}$ and $U_{n} (θ) = {(U_{1, n}^{T} (β, ϕ), U_{2, n}^{T} (ϕ))}^{T}$ . Let $\hat{θ}$ be the solution to $U_{n} (θ) = 0$ . Instead of directly generating the posterior distribution $p (θ | data)$ , similarly to Soubeyrand and Haon-Lasportes [35], we use an approximation to $p (θ | data)$ , that is, $p (θ | \hat{θ})$ as follows:

p (θ ∣ \hat{θ}) \propto p (\hat{θ} ∣ θ) p (θ),

where $p (\hat{θ} | θ)$ is the sampling distribution of $\hat{θ}$ and $p (θ)$ is the prior distribution for $θ$ . However, studying $p (\hat{θ} | θ)$ requires Taylor expansion, as in Theorem 1.

To avoid Taylor expansion, instead of generating from $p (θ | \hat{θ})$ , we alternatively consider

p (θ ∣ U_{n}) \propto p (U_{n} (θ) ∣ θ) p (θ),

(5)

where $p (U_{n} (θ) | θ)$ is the sampling distribution of $U_{n} (θ)$. To generate samples from (5), we consider a one-to-one transformation $T : θ \to η$ such that $η = E (U_{n} | θ)$. Then, we generate $η^{*}$ from $p (η | U_{n})$ and obtain $θ^{*} = T^{- 1} (η^{*})$ as samples from the posterior distribution in (5). As shown in the next section, under some regularity conditions, the asymptotic distribution of $U_{n}$ is

\sqrt{n} \{ U_{n} (θ) - η (θ) \} ∣ θ \overset{d}{\to} N (0, Σ),

(6)

where $\overset{d}{\to}$ is convergence in distribution and $Σ$ is the asymptotic covariance matrix of the joint score functions. Since the transformation $T : θ \to η$ is one-to-one, (6) is equivalent to

\sqrt{n} \{ U_{n} (T^{- 1} η) - η \} ∣ η \overset{d}{\to} N (0, Σ) .

(7)

Then, the posterior distribution of $η$ given $U_{n}$ is

p (η ∣ U_{n}) \propto p (U_{n} ∣ η) p (η),

(8)

where $p (U_{n} | η)$ is the density of the limiting distribution in (7). Equation (8) shows an important relationship between the frequentist IPW approach and the Bayesian approach. Under a flat prior for $p (η)$ and the sufficient conditions for (7), we can approximate the posterior distribution $p (η | U_{n})$ as follows:

p (η ∣ U_{n}) \sim N (U_{n}, Σ / n) .

(9)

As shown in the Appendix, the estimator of Σ, $\hat{Σ}$ , can be obtained by

\frac{\hat{Σ}}{n} = \begin{pmatrix} \widehat{\mathrm{Var}} (U_{1}) & \widehat{\mathrm{Cov}} (U_{1}, U_{2}) \\ \widehat{\mathrm{Cov}} (U_{1}, U_{2}) & \widehat{\mathrm{Var}} (U_{2}) \end{pmatrix},

\widehat{\mathrm{Var}} \{ U_{1} (β, ϕ) \} = \frac{1}{n^{2}} \sum_{l = 1}^{L} \sum_{i = 1}^{n_{l}} \frac{ξ_{l i}}{π_{l i}^{2}} \int_{0}^{τ} \left[ \left\{ Z_{l i} - \frac{S_{l}^{(1)} (β, t)}{S_{l}^{(0)} (β, t)} \right\} d {\hat{M}}_{l i} (t) \right]^{\otimes 2},

\widehat{\mathrm{Var}} (U_{2}) = \frac{1}{n^{2}} \sum_{l = 1}^{L} \sum_{i = 1}^{n_{l}} π_{l i} (1 - π_{l i}) ω_{l i} ω_{l i}^{T},

\widehat{\mathrm{Cov}} (U_{1}, U_{2}) = \frac{1}{n^{2}} \sum_{l = 1}^{L} \sum_{i = 1}^{n_{l}} \frac{ξ_{l i} (1 - π_{l i})}{π_{l i}} \int_{0}^{τ} \left\{ Z_{l i} - \frac{S_{l}^{(1)} (β, t)}{S_{l}^{(0)} (β, t)} \right\} d {\hat{M}}_{l i} (t) \times ω_{l i}^{T},

{\hat{Σ}}_{c} = \widehat{\mathrm{Var}} (U_{1}) - \widehat{\mathrm{Cov}} (U_{1}, U_{2}) {\widehat{\mathrm{Var}}}^{- 1} (U_{2}) {\widehat{\mathrm{Cov}} (U_{1}, U_{2})}^{T},

d {\hat{Λ}}_{0 l} (t) = \sum_{i = 1}^{n_{l}} d N_{l i} (t) / \{ n_{l} S_{l}^{(0)} (β, t) \},

d {\hat{M}}_{l i} (t) = d N_{l i} (t) - Y_{l i} (t) e^{β^{T} Z_{l i}} d {\hat{Λ}}_{0 l} (t) .
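The matrix ${\hat{Σ}}_{c}$ has the form of a Schur complement, which is the conditional covariance of $U_{1}$ given $U_{2}$ when the two scores are jointly normal. A minimal sketch of that one computation follows; the function name `conditional_cov` and its argument names are assumptions for illustration, and the three input blocks are taken as already estimated.

```python
import numpy as np

def conditional_cov(var_u1, cov_u12, var_u2):
    """Sigma_c = Var(U1) - Cov(U1, U2) Var(U2)^{-1} Cov(U1, U2)^T.

    var_u1 is p x p, cov_u12 is p x q, var_u2 is q x q (hypothetical layout).
    """
    var_u1 = np.asarray(var_u1, dtype=float)
    cov_u12 = np.asarray(cov_u12, dtype=float)
    var_u2 = np.asarray(var_u2, dtype=float)
    # Solve a linear system instead of forming the explicit inverse.
    return var_u1 - cov_u12 @ np.linalg.solve(var_u2, cov_u12.T)
```

Using `np.linalg.solve` rather than an explicit inverse is a numerical-stability choice of this sketch, not something the source prescribes.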

Under the flat prior for $p (η)$, we propose the following algorithm to generate samples from the posterior distribution $p (θ | \hat{θ})$:

Step 1. Generate $η_{2}^{*}$ from the approximate posterior distribution $p (η_{2} | U_{2, n} = 0)$, that is, $N (0, \widehat{\mathrm{Var}} (U_{2}))$.
Step 2. Solve $U_{2, n} (ϕ) = η_{2}^{*}$ with respect to $ϕ$ to obtain $ϕ^{*}$.
Step 3. Generate $η_{1}^{*}$ from the approximate posterior distribution $p (η_{1} | U_{1, n} (ϕ^{*}) = 0)$, that is, $N (0, {\hat{Σ}}_{c})$.
Step 4. Solve $U_{1, n} (β, ϕ^{*}) = η_{1}^{*}$ with respect to $β$ to obtain $β^{*}$.
Step 5. Repeat Steps 1 to 4.

The Newton-Raphson algorithm or a root-finding algorithm can be used to solve $U_{n} (θ) = η^{*}$ in Step 2 and Step 4. Thus, the algorithm is straightforward to implement. Based on our simulation results, 1000 repetitions appear to be enough for statistical inference. Using the above algorithm, independent samples from the approximate posterior distribution $p (θ | \hat{θ})$ are generated and thus there is no burn-in period.
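As an illustration, Steps 1 and 2 of the algorithm can be sketched for the response-model parameter $ϕ$ alone; Steps 3 and 4 would follow the same draw-then-solve pattern with $U_{1, n}$ and ${\hat{Σ}}_{c}$. This is a hedged sketch rather than the authors' code: the function names `u2`, `solve_u2`, and `draw_phi`, the Newton-Raphson tolerance, and the choice to start each solve at $\hat{ϕ}$ are all assumptions.

```python
import numpy as np

def u2(phi, xi, omega):
    """Logistic score U_2(phi) as in (4) and the fitted probabilities."""
    pi = 1.0 / (1.0 + np.exp(-omega @ phi))
    return ((xi - pi)[:, None] * omega).mean(axis=0), pi

def solve_u2(target, xi, omega, phi_init, tol=1e-10, max_iter=50):
    """Newton-Raphson solution of U_2(phi) = target."""
    phi = np.array(phi_init, dtype=float)
    for _ in range(max_iter):
        score, pi = u2(phi, xi, omega)
        # Negative Jacobian of U_2: (1/n) sum pi (1 - pi) omega omega^T.
        hess = (omega * (pi * (1.0 - pi))[:, None]).T @ omega / len(xi)
        step = np.linalg.solve(hess, score - target)
        phi = phi + step
        if np.max(np.abs(step)) < tol:
            break
    return phi

def draw_phi(xi, omega, n_draws=1000, seed=0):
    """Independent approximate posterior draws of phi (Steps 1 and 2)."""
    rng = np.random.default_rng(seed)
    n, q = omega.shape
    phi_hat = solve_u2(np.zeros(q), xi, omega, np.zeros(q))
    _, pi_hat = u2(phi_hat, xi, omega)
    var_u2 = (omega * (pi_hat * (1.0 - pi_hat))[:, None]).T @ omega / n**2
    # Step 1: eta_2* ~ N(0, Var_hat(U_2)).
    eta = rng.multivariate_normal(np.zeros(q), var_u2, size=n_draws)
    # Step 2: solve U_2(phi) = eta_2* for each draw, starting at phi_hat.
    draws = np.array([solve_u2(e, xi, omega, phi_hat) for e in eta])
    return phi_hat, draws
```

Each draw is produced independently of the others, which is why no burn-in period or thinning is involved.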

2.4. Estimators with multiple missing covariates patterns

The methods in Sections 2.2 and 2.3 consider a single missing mechanism. They are not directly applicable when there are multiple missing covariates having different missing patterns. We propose estimators for survival data with multiple missing patterns in this section. We describe the proposed method for the stratified PH model with two missing covariates and two strata for simplicity. Suppose two covariates, $Z_{1 i}^{m}$ and $Z_{2 i}^{m}$, are subject to missingness for individual $i$. Let $η_{k i} = 1$ if $Z_{k i}^{m}$ is observed and $η_{k i} = 0$ if $Z_{k i}^{m}$ is missing for $k = 1, 2$. Denote $Z = {(Z_{1}^{m}, Z_{2}^{m})}^{T}$ and $O = {(Δ, X, {(Z^{c})}^{T})}^{T}$, where $Z^{c}$ is a completely observed covariate vector. We divide the data into 4 groups: i) both $Z_{1}^{m}$ and $Z_{2}^{m}$ are observed; ii) $Z_{1}^{m}$ is observed and $Z_{2}^{m}$ is missing; iii) $Z_{1}^{m}$ is missing and $Z_{2}^{m}$ is observed; and iv) both $Z_{1}^{m}$ and $Z_{2}^{m}$ are missing. Let $ω_{10} = {(1, Z_{1}^{m}, O^{T})}^{T}$, $ω_{01} = {(1, Z_{2}^{m}, O^{T})}^{T}$, and $ω_{00} = {(1, O^{T})}^{T}$. Thus, $ω_{10}$, $ω_{01}$, and $ω_{00}$ correspond to ii), iii), and iv), respectively. Define

\begin{matrix} r_{11} (Z, O) = 1, & \text{if } η_{1} = 1, η_{2} = 1, \\ r_{10} (Z, O) = \exp (ϕ_{10}^{T} ω_{10}), & \text{if } η_{1} = 1, η_{2} = 0, \\ r_{01} (Z, O) = \exp (ϕ_{01}^{T} ω_{01}), & \text{if } η_{1} = 0, η_{2} = 1, \\ r_{00} (Z, O) = \exp (ϕ_{00}^{T} ω_{00}), & \text{if } η_{1} = 0, η_{2} = 0, \end{matrix}

(10)

where

ϕ_{10}^{T} ω_{10} = ϕ_{10, 0} + ϕ_{10, 1} Z_{1}^{m} + ϕ_{10, 2} Δ + ϕ_{10, 3} X + ϕ_{10, 4}^{T} Z^{c} + ϕ_{10, 5}^{T} I (\text{stratum} = 2),

ϕ_{01}^{T} ω_{01} = ϕ_{01, 0} + ϕ_{01, 1} Z_{2}^{m} + ϕ_{01, 2} Δ + ϕ_{01, 3} X + ϕ_{01, 4}^{T} Z^{c} + ϕ_{01, 5}^{T} I (\text{stratum} = 2),

ϕ_{00}^{T} ω_{00} = ϕ_{00, 0} + ϕ_{00, 1} Δ + ϕ_{00, 2} X + ϕ_{00, 3}^{T} Z^{c} + ϕ_{00, 4}^{T} I (\text{stratum} = 2) .

Equation (10) satisfies the MAR assumption. Model (10) is a model for the ratio of the missing probability to the baseline:

r_{a b} (Z, O) = \frac{P (η_{1} = a, η_{2} = b ∣ Z, O)}{P (η_{1} = 1, η_{2} = 1 ∣ Z, O)} .

A similar idea has been considered in Sun and Tchetgen Tchetgen [36]. Then, the propensity score is

π_{a b} = P (η_{1} = a, η_{2} = b ∣ Z, O) = \frac{r_{a b} (Z, O)}{\sum_{a = 0}^{1} \sum_{b = 0}^{1} r_{a b} (Z, O)} .
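The normalization above can be sketched in a few lines. The function name `pattern_probabilities` and the column ordering of its output are assumptions for illustration. The inputs are the three linear predictors $ϕ_{a b}^{T} ω_{a b}$, which can all be evaluated for complete cases, the subjects for which $π_{11}$ is needed as an IPW weight.

```python
import numpy as np

def pattern_probabilities(lin10, lin01, lin00):
    """Return an n x 4 array with columns (pi_11, pi_10, pi_01, pi_00).

    lin_ab holds the linear predictor phi_ab^T omega_ab for each subject.
    """
    lin10 = np.asarray(lin10, dtype=float)
    lin01 = np.asarray(lin01, dtype=float)
    lin00 = np.asarray(lin00, dtype=float)
    # Ratios r_ab relative to the fully observed pattern, for which r_11 = 1.
    r = np.stack(
        [np.ones_like(lin10), np.exp(lin10), np.exp(lin01), np.exp(lin00)],
        axis=1,
    )
    # Normalizing the ratios gives probabilities that sum to one in each row.
    return r / r.sum(axis=1, keepdims=True)
```

For example, when all three linear predictors equal zero, every pattern has probability 0.25.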

Let $ϕ_{0}$ be the true parameter vector of $ϕ = {(ϕ_{10}^{T}, ϕ_{01}^{T}, ϕ_{00}^{T})}^{T}$ . To obtain the IPW estimator, we consider

U_{1, n}^{m} (β_{m}, ϕ) = \frac{1}{n} \sum_{l = 1}^{L} \sum_{i = 1}^{n_{l}} \frac{ξ_{l i}}{π_{11, l i}} \int_{0}^{τ} \left\{ Z_{l i} - \frac{S_{l, m}^{(1)} (β_{m}, t)}{S_{l, m}^{(0)} (β_{m}, t)} \right\} d N_{l i} (t) = 0,

(11)

U_{2, n}^{m} (ϕ) = \{ U_{10, n}^{T} (ϕ_{10}), U_{01, n}^{T} (ϕ_{01}), U_{00, n}^{T} (ϕ_{00}) \}^{T} = 0,

(12)

U_{a b, n} (ϕ_{a b}) = \frac{1}{N_{a b}} \sum_{l = 1}^{L} \sum_{i = 1}^{n_{l}} η_{a b, l i} \left\{ ξ_{l i} - \frac{π_{11, l i}}{π_{11, l i} + π_{a b, l i}} \right\} ω_{a b, l i}^{T},

where $ξ_{l i} = I (η_{1 l i} = 1, η_{2 l i} = 1)$, $η_{a b, l i} = I (η_{1 l i} = a, η_{2 l i} = b \text{ or } η_{1 l i} = 1, η_{2 l i} = 1)$, $S_{l, m}^{(k)} (β_{m}, t) = n_{l}^{- 1} \sum_{i = 1}^{n_{l}} ξ_{l i} π_{11, l i}^{- 1} Y_{l i} (t) Z_{l i}^{\otimes k} e^{β_{m}^{T} Z_{l i}}$, and $N_{a b}$ is the subgroup sample size with $η_{1} = a$ and $η_{2} = b$. Let the solution to (11) be ${\hat{β}}_{m}$. Similarly to Section 2.3, the estimator of $Σ^{m}$, ${\hat{Σ}}^{m}$, can be obtained by

\frac{{\hat{Σ}}^{m}}{n} = \begin{pmatrix} \widehat{\mathrm{Var}} (U_{1}^{m}) & \widehat{\mathrm{Cov}} (U_{1}^{m}, U_{2}^{m}) \\ \widehat{\mathrm{Cov}} (U_{1}^{m}, U_{2}^{m}) & \widehat{\mathrm{Var}} (U_{2}^{m}) \end{pmatrix},

\widehat{\mathrm{Var}} \{ U_{1}^{m} (β, ϕ) \} = \frac{1}{n^{2}} \sum_{l = 1}^{L} \sum_{i = 1}^{n_{l}} \frac{ξ_{l i}}{π_{11, l i}^{2}} \int_{0}^{τ} \left[ \left\{ Z_{l i} - \frac{S_{l, m}^{(1)} (β, t)}{S_{l, m}^{(0)} (β, t)} \right\} d {\hat{M}}_{l i} (t) \right]^{\otimes 2},

\widehat{\mathrm{Var}} (U_{a b}) = \frac{1}{N_{a b}^{2}} \sum_{l = 1}^{L} \sum_{i = 1}^{n_{l}} η_{a b, l i} \left( ξ_{l i} - \frac{π_{11, l i}}{π_{11, l i} + π_{a b, l i}} \right)^{2} ω_{a b, l i} ω_{a b, l i}^{T},

d {\hat{Λ}}_{l 0} (t) = \sum_{i = 1}^{n_{l}} d N_{l i} (t) / \{ n S_{l, m}^{(0)} (β, t) \},

d {\hat{M}}_{l i} (t) = d N_{l i} (t) - Y_{l i} (t) e^{β^{T} Z_{l i}} d {\hat{Λ}}_{l 0} (t),

\widehat{\mathrm{Cov}} (U_{1}^{m}, U_{2}^{m}) = {U_{1, n}^{m}}^{T} U_{2, n}^{m} .

Steps 1 to 5 in Section 2.3 apply similarly to the approximate Bayesian approach for multiple missing patterns, with $U_{1, n}^{m}$, $U_{2, n}^{m}$, and ${\hat{Σ}}^{m}$ in place of $U_{1, n}$, $U_{2, n}$, and $\hat{Σ}$.

3. Asymptotic properties

We now study the asymptotic properties of $p (θ | \hat{θ})$ in this section. To establish the consistency and the asymptotic normality of the IPW estimator for the stratified PH model, we assume the following conditions:

C1 $P \{ Y_{l i} (t) = 1 \} > 0$ for $t \in [0, τ]$, $l = 1, \dots, L$ and $i = 1, \dots, n_{l}$;

C2 $| Z_{l i k} (0) | + \int_{0}^{τ} | d Z_{l i k} (t) | < D_{z} < \infty$ , $l = 1, \dots, L$ , $i = 1, \dots, n_{l}$ and $k = 1, \dots, p$ almost surely where $D_{z}$ is a constant;

C3 For d = 0, 1, 2, there exists a neighborhood $B$ of $β_{0}$ such that $s_{l}^{(d)} (β, t)$ is continuous and $\sup_{t \in [0, τ], β \in B} ‖ S_{l}^{(d)} (β, t) - s_{l}^{(d)} (β, t) ‖ \overset{p}{\to} 0$ for $l = 1, \dots, L$ , where $\overset{p}{\to}$ denotes convergence in probability;

C4 The matrix $I_{l} (β) = \int_{0}^{τ} v_{l} (β, t) s_{l}^{(0)} (β, t) λ_{0 l} (t) d t$ is positive definite for $l = 1, \dots, L$ , where $v_{l} (β, t) = s_{l}^{(2)} (β, t) / s_{l}^{(0)} (β, t) - e_{l} {(β, t)}^{\otimes 2}$ and $e_{l} (β, t) = s_{l}^{(1)} (β, t) / s_{l}^{(0)} (β, t)$ ;

C5 The matrix $V_{l}^{ϕ}$ is positive definite and $π_{l i} \geq ϵ > 0$ for $i = 1, \dots, n_{l}$ and $l = 1, \dots, L$, where $V_{l}^{ϕ} = E \{ (ξ_{l 1} - π_{l 1}) ω_{l 1}^{T} \}^{\otimes 2}$;

C6 For all $β \in B$ , $t \in [0, τ]$ , $S_{l}^{(1)} (β, t) = \partial S_{l}^{(0)} (β, t) / \partial β$ , and $S_{l}^{(2)} (β, t) = \partial^{2} S_{l}^{(0)} (β, t) / \partial β \partial β^{T}$ , where $S_{l}^{(d)} (β, t)$ , $d = 0, 1, 2$ are continuous functions of $β \in B$ uniformly in $t \in [0, τ]$ and are bounded on $B \times [0, τ]$ , $s_{l}^{(0)}$ is bounded away from zero on $B \times [0, τ]$ ;

C7 $\int_{0}^{τ} λ_{0 l} (t) d t < \infty$ for $l = 1, \dots, L$ ;

C8 $\lim_{n \to \infty} n_{l} / n = q_{l}$ , where $q_{l} \in (0, 1)$ for all $l = 1, \dots, L$ ;

C9 As $n \to \infty$ , $\sup_{θ \in Θ} ‖ U_{n} (θ) - η (θ) ‖ \overset{p}{\to} 0$ , where $Θ$ is the parameter space;

C10 The map $θ \mapsto U_{n} (θ)$ is continuous and has exactly one zero $\hat{θ}$ with probability one;

C11 Equation $η (θ) = 0$ has exactly one root at $θ = θ_{0}$ ;

C12 There exists a neighborhood of $θ_{0}$, denoted by $J_{n} (θ_{0})$, on which with probability one all $U_{n} (θ)$ are continuously differentiable and the Jacobian $\partial U_{n} (θ) / \partial θ$ converges uniformly to a non-stochastic limit which is non-singular. Here, $J_{n} (θ_{0})$ is a ball with center $θ_{0}$ and radius $r_{n}$ satisfying $r_{n} \to 0$ and $r_{n} \sqrt{n} \to \infty$;

C13 For any $θ \in J_{n} (θ_{0})$ , given $θ$ :

\sqrt{n} \{ U_{n} (θ) - η (θ) \} \overset{d}{\to} N (0, Σ (θ))

holds for some $Σ (θ) = \mathrm{Var} \{ \sqrt{n} U_{n} (θ) | θ \}$ that is positive definite and independent of $n$.

Conditions C1–C8 are the standard conditions for the consistency and asymptotic normality of $\hat{β}$ [1,41]. Conditions C9–C13 are needed for the asymptotic properties of the joint IPW estimator. More specifically, condition C9 holds as long as the samples satisfy some moment conditions. Conditions C10 and C11 ensure the existence and uniqueness of the solutions to $U_{n} = 0$. Condition C12 regulates the derivatives of $U_{n}$ and ensures its covariance converges. Condition C13 provides the asymptotic distribution for the estimating equation. The proof of C13 can be found in Theorem 6 of Yuan and Jennrich (1998) [43], who studied the large-sample properties, including existence, strong consistency, and asymptotic normality, of estimators generated from samples that are not necessarily identically distributed under very general assumptions. Under Conditions C1–C13, we can show that $\hat{θ}$ is a consistent estimator of $θ_{0}$ and that $\sqrt{n} (\hat{θ} - θ_{0})$ is asymptotically normally distributed with mean 0 and covariance matrix $B^{- 1} (θ_{0}) Σ (θ_{0}) B^{- 1} (θ_{0})$, where $B (θ) = \partial η (θ) / \partial θ$.

Following Sang and Kim [32], we assume the following conditions to establish the posterior consistency and asymptotic normality.

C14 The prior $η \mapsto π (η)$ is positive and Lipschitz continuous over the parameter space;

C15 For $θ \in J_{n} (θ_{0})$, the variance estimator $\hat{Σ} (θ)$ satisfies $\hat{Σ} (θ) = Σ (θ) \{ 1 + o_{p} (1) \}$, where $\hat{Σ} (θ)$ is provided in the Appendix;

C16 For any $θ \in J_{n} (θ_{0})$, the mapping $θ \mapsto {| Σ (θ) |}^{- 1}$ is Lipschitz continuous. Also, the mapping $θ \mapsto x^{T} {Σ (θ)}^{- 1} x$ is Lipschitz continuous in the sense that there exists a constant $C (x)$ satisfying $‖ x^{T} {Σ (θ_{1})}^{- 1} x - x^{T} {Σ (θ_{2})}^{- 1} x ‖ \leq C (x) ‖ θ_{1} - θ_{2} ‖$ for any $θ_{1}, θ_{2} \in J_{n} (θ_{0})$ and all $x \in R^{p}$, where $p = \dim (Z)$. Moreover, $C (x)$ is also Lipschitz continuous;

C17 $θ \mapsto U_{n} (θ)$ and $θ \mapsto η (θ)$ are one-to-one functions for any $θ \in J_{n} (θ_{0})$ . Also $θ \mapsto η (θ)$ is Lipschitz continuous.

Condition C14 is a standard assumption for the prior, and the flat prior satisfies this condition. Condition C15 requires the covariance estimator to be consistent. Conditions C16 and C17 are sufficient conditions for the posterior distribution to be approximated by the proposed method. Soubeyrand and Haon-Lasportes [35] also used conditions similar to C14 and C16 to justify their approximate Bayesian computation methods. All the conditions can be easily satisfied if we assume the covariance estimator is Lipschitz continuous in $θ$ and has bounded eigenvalues, as discussed in Sang and Kim [32].

Similarly to Xu et al. [41], we can establish the following asymptotic property for $\hat{β}$ under the stratified PH model:

Theorem 1 Assume Conditions C1-C8 in Section 3.

1. Assume $π_{l i}$ is unknown and its model is correctly specified. Then, $\hat{β}$ is consistent for $β_{0}$, and $\sqrt{n} (\hat{β} - β_{0})$ is asymptotically normally distributed with mean 0 and covariance matrix

V_{β} = {I (β_{0})}^{- 1} \{ Σ^{β_{0}} - Σ^{ϕ_{0} β_{0}} \} {I (β_{0})}^{- 1},

(13)

where

I (β) = \sum_{l = 1}^{L} q_{l} I_{l} (β),

I_{l} (β) = \int_{0}^{τ} v_{l} (β, t) s_{l}^{(0)} (β, t) λ_{0 l} (t) d t,

v_{l} (β, t) = s_{l}^{(2)} (β, t) / s_{l}^{(0)} (β, t) - e_{l} {(β, t)}^{\otimes 2},

e_{l} (β, t) = s_{l}^{(1)} (β, t) / s_{l}^{(0)} (β, t),

s_{l}^{(d)} (β, t) = E \{ S_{l}^{(d)} (β, t) \} \text{ for } d = 0, 1, 2,

Σ^{β} = \sum_{l = 1}^{L} q_{l} E \left[ \frac{ξ_{l 1}}{π_{l 1}} \int_{0}^{τ} \{ Z_{l 1} - e_{l} (β, t) \} d M_{l 1} (t) \right]^{\otimes 2},

Σ^{ϕ β} = \sum_{l = 1}^{L} q_{l} V_{l}^{ϕ β} V_{l}^{ϕ} {(V_{l}^{ϕ β})}^{T},

V_{l}^{ϕ β} = E \left[ \frac{ξ_{l 1}}{π_{l 1}^{2}} \int_{0}^{τ} \{ Z_{l 1} - e_{l} (β, t) \} d M_{l 1} (t) \frac{\partial}{\partial ϕ^{T}} π_{l 1} \right],

V_{l}^{ϕ} = E \{ (ξ_{l 1} - π_{l 1}) ω_{l 1}^{T} \}^{\otimes 2},

d M_{l i} (t) = d N_{l i} (t) - Y_{l i} (t) \exp (β^{T} Z_{l i}) d Λ_{0 l} (t),

q_{l} = \lim_{n \to \infty} n_{l} / n .

2. If $π_{l i}$ is known, $\sqrt{n} (\hat{β} - β_{0})$ is asymptotically normally distributed with mean 0 and covariance matrix ${I (β_{0})}^{- 1} Σ^{β_{0}} {I (β_{0})}^{- 1}$.

Its proof is a straightforward extension of Theorem 2 of Xu et al. [41] to the stratified PH model and is thus omitted. Using (13), one can develop a plug-in variance estimator of $\hat{β}$, but it can involve considerable computation.

Next we have the following asymptotic property for estimators from the stratified PH model with two missing covariates having multiple missing patterns as in Section 2.4.

Theorem 2 Assume Conditions C1-C8 in Section 3.

1. Assume $π_{a b, l i}$ is unknown and its model is correctly specified. Then, ${\hat{β}}_{m}$ is consistent for $β_{m 0}$, and $\sqrt{n} ({\hat{β}}_{m} - β_{m 0})$ is asymptotically normally distributed with mean 0 and covariance matrix

V_{β}^{m} = {I_{m} (β_{m 0})}^{- 1} \{ Σ_{m}^{β_{m 0}} - Σ_{m}^{ϕ_{m 0} β_{m 0}} \} {I_{m} (β_{m 0})}^{- 1},

where

I_{m} (β) = \sum_{l = 1}^{L} q_{l} \int_{0}^{τ} v_{l m} (β, t) s_{l, m}^{(0)} (β, t) λ_{l 0} (t) d t,

v_{l m} (β, t) = s_{l, m}^{(2)} (β, t) / s_{l, m}^{(0)} (β, t) - e_{l, m} {(β, t)}^{\otimes 2},

e_{l, m} (β, t) = s_{l, m}^{(1)} (β, t) / s_{l, m}^{(0)} (β, t),

s_{l, m}^{(d)} (β, t) = E \{ S_{l, m}^{(d)} (β, t) \} \text{ for } d = 0, 1, 2,

Σ_{m}^{β} = \sum_{l = 1}^{L} q_{l} E \left[ \frac{ξ_{l 1}}{π_{11, l 1}} \int_{0}^{τ} \{ Z_{l 1} - e_{l, m} (β, t) \} d M_{l 1} (t) \right]^{\otimes 2},

Σ_{m}^{ϕ β} = V_{m}^{ϕ β} V_{m}^{ϕ} {(V_{m}^{ϕ β})}^{T},

V_{m}^{ϕ β} = (V_{10}^{ϕ β^{T}}, V_{01}^{ϕ β^{T}}, V_{00}^{ϕ β^{T}})^{T},

V_{a b}^{ϕ β} = \sum_{l = 1}^{L} q_{l} E \left[ \frac{ξ_{l 1}}{π_{11, l 1}} \int_{0}^{τ} \{ Z_{l 1} - e_{l, m} (β, t) \} d M_{l 1} (t) δ_{a b, l 1} \left\{ ξ_{l 1} - \frac{π_{11, l 1}}{π_{11, l 1} + π_{a b, l 1}} \right\} ω_{a b, l 1}^{T} \right],

V_{m}^{ϕ} = \begin{pmatrix} V_{10, 10}^{ϕ} & V_{10, 01}^{ϕ} & V_{10, 00}^{ϕ} \\ V_{01, 10}^{ϕ} & V_{01, 01}^{ϕ} & V_{01, 00}^{ϕ} \\ V_{00, 10}^{ϕ} & V_{00, 01}^{ϕ} & V_{00, 00}^{ϕ} \end{pmatrix},

V_{a b, a^{'} b^{'}}^{ϕ} = \sum_{l = 1}^{L} q_{l} E \left( δ_{a b, l 1} δ_{a^{'} b^{'}, l 1} \left\{ ξ_{l 1} - \frac{π_{11, l 1}}{π_{11, l 1} + π_{a b, l 1}} \right\} \left\{ ξ_{l 1} - \frac{π_{11, l 1}}{π_{11, l 1} + π_{a^{'} b^{'}, l 1}} \right\} \times ω_{a b, l 1}^{T} ω_{a^{'} b^{'}, l 1} \right),

d M_{l i} (t) = d N_{l i} (t) - Y_{l i} (t) \exp (β^{T} Z_{l i}) d Λ_{l 0} (t),

q_{l} = \lim_{n \to \infty} n_{l} / n .

2. If $π_{a b, l i}$ is known, $\sqrt{n} ({\hat{β}}_{m} - β_{m 0})$ is asymptotically normally distributed with mean 0 and covariance matrix ${I_{m} (β_{m 0})}^{- 1} Σ_{m}^{β_{m 0}} {I_{m} (β_{m 0})}^{- 1}$.

The proof of Theorem 2 is similar to that of Theorem 1 and is thus omitted. The asymptotics for more than two missing covariates can be similarly established. Now we have the main theorem on $p (θ | \hat{θ})$ as follows:

Theorem 3 Let $\hat{θ}$ be the solution to $U_{n} (θ) = 0$ . Under Conditions C1-C17, the posterior distribution $p (θ | \hat{θ})$ , generated by the two-step method above, satisfies

p (θ ∣ \hat{θ}) \to ψ_{\hat{θ}, \mathrm{Var} (\hat{θ})} (θ),

(14)

p (\lim_{n \to \infty} \int_{J_{n} (θ_{0})} ψ_{\hat{θ}, \mathrm{Var} (\hat{θ})} (θ) d θ) = 1,

(15)

where $ψ_{\hat{θ}, \mathrm{Var} (\hat{θ})} (\cdot)$ is the density of the normal distribution with mean $\hat{θ}$ and variance $\mathrm{Var} (\hat{θ})$.

Its proof is similar to the proof of Theorem 4.1 of Sang and Kim [32] and is thus omitted. Results (14) and (15) show the convergence of the posterior distribution to a normal distribution and the posterior consistency, respectively. In particular, (14) implies that the credible region from the proposed Bayesian method is asymptotically equivalent to the frequentist confidence region based on the asymptotic normality of $\hat{θ}$. Thus, our proposed Bayesian method is calibrated to frequentist inference.

The proposed Bayesian estimators of ϕ and β are the medians of the draws from the approximate posterior distribution. Because the posterior distribution is approximately normal by Theorem 3, one can construct the credible region using either the equal-tailed credible interval (ETI) or the level-α Bayesian highest posterior density (HPD) credible region $C^{*} (α) = \{θ : p (θ | \hat{θ}) \geq j^{*} (α)\}$, which is determined by the threshold $j^{*} (α)$ [7].
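As a minimal sketch of the summaries described above, the following Python code computes a posterior median, an equal-tailed interval, and a shortest-interval approximation to the HPD region from a set of posterior draws. The function names and the toy normal draws are illustrative assumptions and not part of the proposed method; the shortest-interval rule is the usual empirical device for unimodal posteriors, in the spirit of [7].

```python
import random
import statistics


def equal_tailed_interval(draws, level=0.95):
    """Equal-tailed credible interval from empirical quantiles of the draws."""
    xs = sorted(draws)
    n = len(xs)
    alpha = 1.0 - level
    lo_idx = int(alpha / 2 * n)                     # lower alpha/2 quantile
    hi_idx = min(n - 1, int((1 - alpha / 2) * n))   # upper alpha/2 quantile
    return xs[lo_idx], xs[hi_idx]


def hpd_interval(draws, level=0.95):
    """Shortest interval covering about `level` of the sorted draws."""
    xs = sorted(draws)
    n = len(xs)
    k = int(level * n)  # number of order-statistic steps each candidate spans
    widths = [xs[i + k] - xs[i] for i in range(n - k)]
    i_min = min(range(n - k), key=widths.__getitem__)
    return xs[i_min], xs[i_min + k]


# Toy posterior draws (assumed): approximately normal, as Theorem 3 suggests.
rng = random.Random(1)
draws = [rng.gauss(0.2, 0.1) for _ in range(10000)]

post_median = statistics.median(draws)  # point estimate
eti = equal_tailed_interval(draws)      # 95% ETI
hpd = hpd_interval(draws)               # approximate 95% HPD interval
```

For an approximately normal posterior the two intervals nearly coincide, which is why either can serve as the credible region here.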

4. Simulation

We conducted two simulation studies to investigate the finite sample properties of the approximate Bayesian method and the IPW method for stratified data. We compared them with the CC method.

In the first simulation, we considered a stratified PH model with two strata, i.e., L = 2. Two covariates were generated for each stratum: Z₁₁ from the Bernoulli distribution with probability 0.4 and Z₁₂ from the standard normal distribution for stratum 1; Z₂₁ from the Bernoulli distribution with probability 0.6 and Z₂₂ from the normal distribution with mean 1 and standard deviation 0.7 for stratum 2. Event times were generated based on the stratified PH model (1). We considered $λ_{10} (t) = 4 t^{3}$ for stratum 1 and $λ_{20} (t) = (2 / 3) t^{- 2 / 3}$ for stratum 2. We set $β = {(β_{1}, β_{2})}^{T} = {(\log 2, \log 2)}^{T}$ . Independent of event times, censoring times were generated from a uniform distribution. Two overall event probabilities were examined: 50% and 70%. Some values of $Z_{l 1}$ were missing, and $Z_{l 2}$ for l = 1, 2 was fully observed. The observation indicators $ξ_{l i}$ were independently generated from the Bernoulli distribution with probability $π_{l i} = \exp \{ϕ_{0} + ϕ_{1} I (l = 2) + ϕ_{2} Z_{l 2} + ϕ_{3} Δ_{l i}\} / [1 + \exp \{ϕ_{0} + ϕ_{1} I (l = 2) + ϕ_{2} Z_{l 2} + ϕ_{3} Δ_{l i}\}]$ for $l = 1, 2$ , where ${(ϕ_{0}, ϕ_{1}, ϕ_{2}, ϕ_{3})}^{T} = {(1.2, - 1.5, 0.5, - 1)}^{T}$ . The missing rates were approximately 60% for stratum 1 and 40% for stratum 2. Thus, the overall rate of missingness was 50%.
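The data-generating steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' simulation code: the function name `simulate_subject` and the censoring bound `cens_upper` are hypothetical (the text does not report the bounds of the uniform censoring distribution), so the event rates produced here are not tuned to 50% or 70%. Event times use inverse-transform sampling with the cumulative baseline hazards implied by the stated $λ_{10}$ and $λ_{20}$.

```python
import math
import random


def simulate_subject(stratum, rng, beta=(math.log(2), math.log(2)),
                     phi=(1.2, -1.5, 0.5, -1.0), cens_upper=2.0):
    """Draw one subject from the first simulation design (stratum is 1 or 2)."""
    if stratum == 1:
        z1 = 1.0 if rng.random() < 0.4 else 0.0
        z2 = rng.gauss(0.0, 1.0)
    else:
        z1 = 1.0 if rng.random() < 0.6 else 0.0
        z2 = rng.gauss(1.0, 0.7)
    risk = math.exp(beta[0] * z1 + beta[1] * z2)

    # Inverse transform: solve Lambda_l0(T) * risk = E with E ~ Exp(1).
    e = -math.log(1.0 - rng.random())
    if stratum == 1:
        # lambda_10(t) = 4 t^3  =>  Lambda_10(t) = t^4
        t = (e / risk) ** 0.25
    else:
        # lambda_20(t) = (2/3) t^(-2/3)  =>  Lambda_20(t) = 2 t^(1/3)
        t = (e / (2.0 * risk)) ** 3

    c = rng.uniform(0.0, cens_upper)  # assumed censoring bound
    x = min(t, c)
    delta = 1 if t <= c else 0

    # Logistic model for the probability that Z_l1 is observed.
    lin = phi[0] + phi[1] * (stratum == 2) + phi[2] * z2 + phi[3] * delta
    pi = 1.0 / (1.0 + math.exp(-lin))
    xi = 1 if rng.random() < pi else 0  # xi = 0 means Z_l1 is treated as missing
    return {"stratum": stratum, "time": x, "delta": delta,
            "z1": z1, "z2": z2, "xi": xi, "pi": pi}


rng = random.Random(7)
sample = ([simulate_subject(1, rng) for _ in range(2000)]
          + [simulate_subject(2, rng) for _ in range(2000)])
```

In an analysis of such data, `z1` would be used only for subjects with `xi == 1`, and `pi` would be replaced by its estimate from the fitted propensity score model.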

Three sample sizes were considered: n = 500, 1000, and 2000. Table 1 summarizes the simulation results based on B = 1000 Monte Carlo samples. For the proposed method, we obtained 1000 posterior medians and calculated the bias of their average, their standard deviation (SD), and the percentage of 95% ETIs that include the true parameter (CR_E). For the IPW and CC methods, we calculated the bias of the average of the estimates, the average of the standard errors (SE), and the 95% coverage rate (CR). As seen in Table 1, the average of the posterior medians from the approximate Bayesian method and the average of the IPW estimators were close to the true values. All standard deviations of the approximate Bayesian method are close to the average standard errors of the IPW method. The percentages of 95% ETIs that include the true parameters and the coverage rates of the IPW method range from 93% to 96%. These results are consistent with Theorem 3. The standard deviations and the average standard errors are larger when the event rate is lower or the sample size is smaller. In contrast, the CC method is biased and has low coverage rates, ranging from 71% to 90% for the completely observed covariate Z^c, which are well below 95%. Furthermore, this undercoverage becomes more severe as the sample size increases.
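The Monte Carlo summaries reported in Table 1 can be computed along the following lines. This is a small sketch with hypothetical names (`summarize`, and toy estimates drawn around the true value with an assumed standard error of 0.1); it does not reproduce the values in Table 1.

```python
import math
import random
import statistics


def summarize(estimates, intervals, truth):
    """Return (bias, SD, coverage) over Monte Carlo replicates."""
    bias = statistics.fmean(estimates) - truth   # bias of the average estimate
    sd = statistics.stdev(estimates)             # empirical standard deviation
    hits = [1.0 if lo <= truth <= hi else 0.0 for lo, hi in intervals]
    coverage = statistics.fmean(hits)            # share of intervals covering truth
    return bias, sd, coverage


# Toy replicates (assumed): B = 1000 unbiased estimates of log(2).
rng = random.Random(2021)
truth = math.log(2)
se = 0.1
estimates = [rng.gauss(truth, se) for _ in range(1000)]
intervals = [(b - 1.96 * se, b + 1.96 * se) for b in estimates]

bias, sd, coverage = summarize(estimates, intervals, truth)
```

The same function applies to posterior medians with ETIs (giving CR_E) and to IPW or CC estimates with Wald-type confidence intervals (giving CR).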

Table 1.

Simulation results for stratified survival data

		Event	Proposed method			IPW Method			CC method
	n	rate	bias	SD	CR_E	bias	SE	CR	bias	SE	CR
Z^m	500	50%	0.010	0.238	0.95	0.007	0.231	0.95	0.076	0.247	0.95
		70%	0.003	0.200	0.95	0.003	0.197	0.94	0.055	0.200	0.95
	1000	50%	0.009	0.165	0.95	0.008	0.162	0.95	0.077	0.172	0.94
		70%	0.000	0.139	0.94	0.000	0.138	0.94	0.052	0.139	0.93
	2000	50%	0.003	0.114	0.94	0.002	0.113	0.95	0.070	0.120	0.92
		70%	0.000	0.097	0.95	0.000	0.096	0.96	0.050	0.097	0.92
Z^c	500	50%	0.027	0.175	0.93	0.025	0.169	0.93	0.115	0.177	0.90
		70%	0.015	0.133	0.93	0.016	0.130	0.93	0.097	0.130	0.88
	1000	50%	0.006	0.121	0.94	0.005	0.118	0.94	0.094	0.122	0.88
		70%	0.001	0.092	0.95	0.001	0.091	0.95	0.085	0.090	0.85
	2000	50%	0.005	0.085	0.94	0.004	0.083	0.94	0.090	0.085	0.82
		70%	0.003	0.065	0.95	0.003	0.064	0.95	0.085	0.063	0.71


SD, standard deviations; SE, average of standard errors; CR_E, percentage of 95% equal-tailed credible intervals that include the true parameter; CR, 95% coverage rates; IPW, inverse-probability-weighted; CC, complete-case.

We also conducted simulations for correlated covariates, different values of β, and different missing rates. Table S1 of the Supplementary Material summarizes the results, which are similar to those in Table 1. When the magnitude of β was larger and the missing rate was higher, the CC method performed worse.

In the second simulation, we considered a stratified PH model (L = 2) with two missing covariates. We compared the proposed methods of Section 2.4 with those of Section 2.3 and with the CC method. Two covariates, Z₁ and Z₂, were independently generated from the Bernoulli distribution with probabilities 0.4 and 0.5 for stratum 1 and with probabilities 0.6 and 0.4 for stratum 2. A third covariate, Z₃, was generated from the uniform distribution on [0, 1]. Event times were generated from the stratified PH model (1). We considered constant baseline hazards $λ_{10} (t) = 1$ for stratum 1 and $λ_{20} (t) = 2$ for stratum 2. We set $β = {(β_{1}, β_{2}, β_{3})}^{T} = {(0.3, 0.3, - 0.3)}^{T}$ and ${(0.7, 0.7, - 0.7)}^{T}$. Independent of event times, censoring times were generated from a uniform distribution. The overall event probability was 55%: 47% for stratum 1 and 63% for stratum 2. The two covariates Z₁ and Z₂ were subject to missingness. Let η₁ and η₂ be the observation indicators for Z₁ and Z₂, respectively. There were four possible missing categories: 1) both covariates are observed, $v_{11} = I (η_{1} = 1, η_{2} = 1)$; 2) only Z₂ is missing, $v_{10} = I (η_{1} = 1, η_{2} = 0)$; 3) only Z₁ is missing, $v_{01} = I (η_{1} = 0, η_{2} = 1)$; and 4) both Z₁ and Z₂ are missing, $v_{00} = I (η_{1} = 0, η_{2} = 0)$. The missing-category indicators $v_{a b}$ were independently generated from the multinomial distribution with probability $π_{a b, i} = \exp \{ϕ_{a b}^{T} ω_{a b}\} / (1 + \sum_{a = 0}^{1} \sum_{b = 0}^{1} \exp \{ϕ_{a b}^{T} ω_{a b}\})$ for $a, b = 0, 1$ , where $ϕ_{10} = {(ϕ_{10, 0}, ϕ_{10, 1}, ϕ_{10, 2}, ϕ_{10, 3}, ϕ_{10, 4})}^{T} = {(- 3, 2, 3, - 2, - 2)}^{T}$ , $ϕ_{01} = {(ϕ_{01, 0}, ϕ_{01, 1}, ϕ_{01, 2}, ϕ_{01, 3}, ϕ_{01, 4})}^{T} = {(- 1.2, - 2, 2, - 0.2, 0.1)}^{T}$ , $ϕ_{00} = {(ϕ_{00, 0}, ϕ_{00, 1}, ϕ_{00, 2}, ϕ_{00, 3})}^{T} = {(0.3, - 1.5, - 0.9, 0.1)}^{T}$ , $ω_{10} = {(1, Δ, Z_{1}, Z_{3}, I (l = 2))}^{T}$ , $ω_{01} = {(1, Δ, Z_{2}, Z_{3}, I (l = 2))}^{T}$ , and $ω_{00} = {(1, Δ, Z_{3}, I (l = 2))}^{T}$ . The resulting overall probabilities of $v_{11}$ , $v_{10}$ , $v_{01}$ , and $v_{00}$ were 17%, 13%, 20%, and 50%, respectively: 17%, 14%, 29%, and 40% for stratum 1 and 17%, 11%, 12%, and 60% for stratum 2. Two sample sizes were examined: n = 1000 and 1500.

Table 2 shows that estimates for the proposed approximate Bayesian and IPW methods for multiple missing patterns are approximately unbiased. The average percentages of 95% ETIs and coverage rates of the IPW method are close to nominal level 95%. However, the CC method and the approximate Bayesian method/IPW for a single missing pattern have biases, where the coverage rates for the missing covariates $Z_{1}^{m}$ and $Z_{2}^{m}$ are from 60% to 87%. When the sample size increases, coverage rates become lower and further away from 95%.

Table 2.

Simulation results for multiple missing patterns

			MM Proposed method			MM IPW			CC method
n	β	Covariates	bias	SD	CR_E	bias	SE	CR	bias	SE	CR
1000	0.3	$Z_{1}^{m}$	0.001	0.118	0.95	0.001	0.118	0.95	−0.122	0.103	0.78
	0.3	$Z_{2}^{m}$	0.005	0.117	0.95	0.005	0.116	0.95	0.048	0.099	0.93
	−0.3	Z^c	−0.006	0.207	0.93	−0.006	0.199	0.93	0.041	0.171	0.94
	0.7	$Z_{1}^{m}$	0.007	0.121	0.94	0.007	0.119	0.94	−0.144	0.106	0.72
	0.7	$Z_{2}^{m}$	0.012	0.119	0.94	0.012	0.118	0.94	0.015	0.102	0.96
	−0.7	Z^c	−0.012	0.207	0.94	−0.012	0.200	0.94	0.067	0.174	0.92
1500	0.3	$Z_{1}^{m}$	0.003	0.095	0.94	0.003	0.097	0.95	−0.123	0.084	0.69
	0.3	$Z_{2}^{m}$	0.002	0.097	0.94	0.002	0.096	0.94	0.047	0.081	0.91
	−0.3	Z^c	−0.005	0.166	0.95	−0.005	0.164	0.96	0.041	0.138	0.94
	0.7	$Z_{1}^{m}$	0.004	0.100	0.93	0.004	0.098	0.94	−0.148	0.086	0.60
	0.7	$Z_{2}^{m}$	0.002	0.099	0.94	0.002	0.097	0.94	0.012	0.083	0.94
	−0.7	Z^c	−0.012	0.172	0.94	−0.011	0.165	0.94	0.067	0.141	0.92

			SM Proposed method			SM IPW
n	β	Covariates	bias	SD	CR_E	bias	SE	CR
1000	0.3	$Z_{1}^{m}$	−0.150	0.115	0.72	−0.150	0.113	0.75
	0.3	$Z_{2}^{m}$	0.102	0.111	0.83	0.101	0.107	0.84
	−0.3	Z^c	0.069	0.197	0.99	0.069	0.186	0.92
	0.7	$Z_{1}^{m}$	−0.143	0.115	0.74	−0.143	0.114	0.75
	0.7	$Z_{2}^{m}$	0.085	0.109	0.88	0.085	0.107	0.87
	−0.7	Z^c	0.063	0.194	0.99	0.063	0.186	0.92
1500	0.3	$Z_{1}^{m}$	−0.151	0.093	0.61	−0.151	0.093	0.63
	0.3	$Z_{2}^{m}$	0.102	0.090	0.78	0.102	0.088	0.79
	−0.3	Z^c	0.072	0.155	0.99	0.072	0.152	0.91
	0.7	$Z_{1}^{m}$	−0.146	0.096	0.64	−0.147	0.093	0.64
	0.7	$Z_{2}^{m}$	0.081	0.090	0.85	0.080	0.088	0.84
	−0.7	Z^c	0.067	0.159	0.99	0.067	0.152	0.92


MM, multiple missing patterns; SM, a single missing pattern; IPW, inverse-probability-weighted; CC, complete-case; SD, standard deviations; SE, average of standard errors; CR_E, percentage of 95% equal-tailed credible intervals that include the true parameter; CR, 95% coverage rates.

5. Real data application

We applied the proposed approximate Bayesian and IPW methods to the following two registry data sets: 1) the hematopoietic stem cell transplantation (HCT) data that Dreger et al. [11] analyzed to study patients with DLBCL; 2) the HCT data that Pidala et al. [26] investigated to study patients with myelodysplastic syndrome (MDS). The DLBCL data and the MDS data call for the stratified PH model with a single missing covariate and with two missing covariates, respectively. Thus, we applied the methods of Sections 2.2 and 2.3 to the DLBCL data and the methods of Section 2.4 to the MDS data.

5.1. Stratified PH model with a single missing covariate

The DLBCL data [11] consisted of 1,394 adult patients. Overall survival was the outcome of interest for the analysis. The numbers of patients who died and who were censored were 725 (52%) and 669 (48%), respectively. Among the 1,394 patients, there were 127 patients (9%) with haploidentical donors (haplo-HCT); 509 patients (37%) with MSD; 488 patients (28%) with MUD with T-cell depletion; and 370 patients (26%) with MUD without T-cell depletion. In HCT studies, clinicians are often interested in evaluating the effects of remission status at time of HCT (complete, partial, refractory), age group (18–49, 50–59, >60), year of transplant (2008–2010, 2011–2012, 2013–2015), hematopoietic cell transplant-comorbidity index (HCT-CI) (0, 1–2, ≥3), and Karnofsky performance score (<90, ≥90) on the outcome because of their clinical importance (Kumar et al. [21], Papanicolaou et al. [25], and Ustun et al. [37]). Thus, we adjusted for these five covariates in the model. Four hundred eighty-four patients (35%) had missing values in HCT-CI. We tested the PH assumption for each covariate by testing whether the coefficient of log t × Z is equal to zero for each variable [20]. Remission status at time of transplant did not satisfy the PH assumption at the 0.05 significance level (p-value = 0.0047). Thus, we stratified the PH model according to remission status.
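The PH check described above can be written out explicitly. The display below is a sketch of the standard time-dependent covariate test described in [20] for a single covariate; the symbol γ for the coefficient of log t × Z is notation introduced here for illustration.

```latex
λ (t \mid Z) = λ_{0} (t) \exp \{ β Z + γ Z \log t \}, \qquad H_{0} : γ = 0 .
% Under this working model the hazard ratio for a one-unit increase in Z is
% \exp (β) \, t^{γ}, which is constant in t only when γ = 0.
```

A small p-value for γ, as reported for remission status, points to a hazard ratio that changes over time, which is why that variable is moved into the strata rather than kept as a covariate.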

We fitted the logistic regression to obtain the propensity score by allowing a different intercept for each stratum. Six variables including the stratum variable, year of transplant, age group, donor type, death indicator, and time to death were statistically significant at a significance level 0.05. We used these six variables to calculate propensity scores. We fitted the approximate Bayesian method, IPW method, and the CC method.

Table 3 reports the analysis results, including i) the posterior median ${\tilde{β}}^{m}$ and its 95% equal-tailed credible interval (ETI) for the approximate Bayesian method; ii) $\hat{β}$ and its 95% confidence interval for the IPW method and the CC method. As expected, the results from the approximate Bayesian method and the IPW method were similar. However, the 95% confidence intervals of the IPW method are slightly wider than the 95% ETIs of the approximate Bayesian method in general. On the other hand, the effects of donor group, HCT-CI score, and Karnofsky score from the approximate Bayesian method and the IPW method were similar to those from the CC method. However, the effects of year of transplant were different: based on the approximate Bayesian method and the IPW method, patients who had HCT from 2008 to 2010 were more likely to die after HCT than those who had HCT from 2013 to 2015. The results from the approximate Bayesian/IPW methods and the CC method are different in donor group and year of transplant. In particular, although year of transplant did not reach statistical significance in the model from the CC method, the parameter estimates for 2008-2012 were negative. Thus, the results from the CC method imply that patients who received a transplant in recent years (2013-2015) experienced worse survival than those transplanted in earlier years (2008-2012). It is common in DLBCL studies that the progress of patients who got HCT in recent years was better than that of those who got HCT in earlier years [22,2,33]. Thus, the results on year of transplant from the CC method are counter-intuitive. In contrast, the year of transplant effects from the approximate Bayesian method and the IPW method are consistent with the current medical literature. The effects of year of transplant and HCT-CI contradict the current HCT literature.

Table 3.

Stem cell transplantation data analysis for stratified survival data with a single missing covariate

	Approximate Bayesian		IPW method		CC method
Variable	${\tilde{β}}^{m}$	95 % ETI	$\hat{β}$	95 % CI	$\hat{β}$	95 % CI
Donor group
HD (ref)	0		0		0
MSD	−0.141	( −0.434 , 0.193 )	−0.143	( −0.450 , 0.164 )	−0.134	( −0.400 , 0.132 )
MUD WTD	0.052	( −0.262 , 0.401 )	0.053	( −0.282 , 0.387 )	0.038	( −0.236 , 0.313 )
MUD WOTD	−0.092	( −0.385 , 0.252 )	−0.088	( −0.411 , 0.235 )	−0.132	( −0.407 , 0.144 )
Year of transplant
2013-2015 (ref)	0		0		0
2008-2010	0.231	( 0.012 , 0.443 )	0.222	( 0.009 , 0.435 )	−0.036	( −0.222 , 0.150 )
2011-2012	0.158	( −0.070 , 0.378 )	0.152	( −0.076 , 0.381 )	−0.043	( −0.232 , 0.146 )
Age group
18-49 (ref)	0		0		0
50-59	0.104	( −0.127 , 0.336 )	0.105	( −0.130 , 0.339 )	0.130	( −0.055 , 0.316 )
>60	0.202	( −0.027 , 0.436 )	0.202	( −0.028 , 0.433 )	0.205	( 0.014 , 0.396 )
HCT-CI
0 (ref)	0		0		0
1-2	0.029	( −0.204 , 0.270 )	0.030	( −0.207 , 0.267 )	0.041	( −0.178 , 0.259 )
≥ 3	0.215	( 0.001 , 0.441 )	0.217	( −0.004 , 0.438 )	0.244	( 0.032 , 0.455 )
Karnofsky score
≥ 90 (ref)	0		0		0
< 90	0.198	( 0.010 , 0.387 )	0.199	( 0.013 , 0.385 )	0.274	( 0.120 , 0.429 )


IPW, inverse-probability-weighted; CC, complete-case; ref, reference group; ETI, equal-tailed credible interval; HD, Haploidentical donors; MSD, matched sibling donors; MUD WTD, matched unrelated donors with T-cell depletion; MUD WOTD, matched unrelated donors without T-cell depletion.

We conducted a sensitivity analysis by examining various propensity score models to investigate the missing-not-at-random assumption. We observed that the progress of patients who got HCT in recent years was worse than that of those who got HCT in earlier years, which is inconsistent with the current HCT literature. Thus, the MAR assumption appears to be reasonable for this data set.

5.2. Stratified PH model with two missing covariates

The MDS data [26] for the analysis consisted of 787 adults or children with a diagnosis of MDS who underwent a first myeloablative unrelated bone marrow or peripheral blood stem cell transplantation between 1999 and 2011. The outcome of interest in this analysis was overall survival. The numbers of events and censored observations were 418 (53%) and 369 (47%), respectively. Two covariates, HLA-DPB1 classification according to T-cell epitope grouping (HLAD) and Karnofsky performance score (KPS), have missing values. The original analysis by Pidala et al. [26] excluded patients with missing HLAD. The numbers of patients who had missing values in both HLAD and KPS, only HLAD, and only KPS are 17, 330, and 45, respectively. The overall missing rate is about 50% (= 392/787 × 100). Among the 787 patients, there are 64 patients (8%) with fully matched HLAD; 229 patients (29%) with permissive HLAD; 67 patients (9%) with GvH non-permissive HLAD; 80 patients (10%) with HvG non-permissive HLAD; and 347 patients (44%) with missing HLAD. The covariates of interest include graft type (bone marrow, peripheral blood), race (Caucasian, others), age group (<20, 20–49, >50), year of transplant (1999–2002, 2003–2006, 2007–2011), and KPS (<90, ≥90). We tested the PH assumption for each covariate by testing whether the coefficient of log t × Z is equal to zero for each variable [20]. Graft type and year of transplant did not satisfy the PH assumption at the 0.05 significance level. Thus, we fitted a stratified PH model. We used the approximate Bayesian method/IPW method for multiple missing patterns of Section 2.4 and compared it with the approximate Bayesian method/IPW method for a single missing pattern of Sections 2.2 and 2.3 and with the CC method.
In the propensity score model, none of the covariates were significant at the 0.05 significance level for two of the missing categories, namely 1) only KPS missing and 2) both HLAD and KPS missing, whereas three variables (year of transplant, death indicator, and time to death) were significant for the category with only HLAD missing. Thus, applying the method for multiple missing patterns appeared to be more appropriate than using that for a single missing pattern.
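The missing-data counts reported above for the MDS data can be cross-checked with a little arithmetic; the display below only restates numbers given in the text.

```latex
17 + 330 + 45 = 392, \qquad 392 / 787 \approx 0.498 \quad (\text{about } 50\% \text{ with any missing covariate}),
17 + 330 = 347, \qquad 347 / 787 \approx 0.441 \quad (44\% \text{ with missing HLAD}),
64 + 229 + 67 + 80 + 347 = 787 .
```

The three missing categories therefore account for 392 patients, leaving 787 − 392 = 395 patients with both HLAD and KPS observed.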

Table 4 reports the analysis results, including i) the posterior median ${\tilde{β}}^{m}$ and its 95% equal-tailed credible interval (ETI) for the approximate Bayesian method with multiple missing patterns and with a single missing pattern; ii) $\hat{β}$ and its 95% confidence interval (CI) for the IPW method and the CC method. The results of the approximate Bayesian/IPW methods with multiple missing patterns and with a single missing pattern were similar for KPS and race. However, the results for HLAD and age group differ between the approaches for multiple missing patterns and those for a single missing pattern. While the effect of the permissive classification group compared with the fully matched classification group is positive under the approximate Bayesian/IPW methods for multiple missing patterns, it is negative under the approximate Bayesian/IPW methods for a single missing pattern. For the age 20-49 group, the 95% CI and ETI from the approximate Bayesian/IPW methods for multiple missing patterns did not contain 0, whereas those for a single missing pattern contained 0. For the CC method, the 95% CI of the "others" race group did not contain 0, whereas the 95% CI and ETI from the approximate Bayesian/IPW methods for multiple missing patterns contained 0.

Table 4.

Stem cell transplantation data analysis for multiple missing patterns

	MM Approximate Bayesian		MM IPW method		CC method
Variable	${\tilde{β}}^{m}$	95 % ETI	$\hat{β}$	95% CI	$\hat{β}$	95% CI
HLAD
Fully matched (ref)	0		0		0
Permissive	0.076	( −0.297 , 0.536 )	0.081	( −0.317 , 0.480 )	−0.049	( −0.445 , 0.347 )
GvH non-permissive	0.354	( −0.182 , 0.845 )	0.344	( −0.151 , 0.839 )	0.286	( −0.182 , 0.753 )
HvG non-permissive	0.080	( −0.465 , 0.595 )	0.081	( −0.419 , 0.581 )	0.063	( −0.396 , 0.522 )
Karnofsky score
90 - 100% (ref)	0		0		0
< 90%	0.450	( 0.123 , 0.738 )	0.446	( 0.136 , 0.755 )	0.443	( 0.157 , 0.728 )
Race
Caucasian	0		0		0
others	0.440	( −0.235 , 0.898 )	0.438	( −0.088 , 0.963 )	0.483	( 0.018 , 0.948 )
Age group
< 20 (ref)	0		0		0
20-49	0.520	( 0.085 , 1.079 )	0.522	( 0.054 , 0.990 )	0.582	( 0.122 , 1.042 )
> 50	1.039	( 0.627 , 1.606 )	1.034	( 0.546 , 1.522 )	1.058	( 0.586 , 1.530 )

	SM Approximate Bayesian		SM IPW method
Variable	${\tilde{β}}^{m}$	95 % ETI	$\hat{β}$	95% CI
HLAD
Fully matched (ref)	0		0
Permissive	−0.014	( −0.506 , 0.582 )	−0.036	( −0.580 , 0.509 )
GvH non-permissive	0.270	( −0.369 , 0.997 )	0.309	( −0.348 , 0.965 )
HvG non-permissive	0.061	( −0.791 , 0.723 )	0.062	( −0.608 , 0.732 )
Karnofsky score
90 - 100% (ref)	0		0
< 90%	0.454	( 0.107 , 0.782 )	0.446	( 0.023 , 0.870 )
Race
Caucasian	0		0
others	0.537	( −0.553 , 1.062 )	0.466	( −0.272 , 1.205 )
Age group
< 20 (ref)	0		0
20-49	0.558	( −0.072 , 1.174 )	0.560	( −0.024 , 1.145 )
> 50	1.099	( 0.491 , 1.692 )	1.035	( 0.408 , 1.661 )


MM, multiple missing pattern; SM, single missing pattern; IPW, inverse-probability-weighted; CC, complete-case; ETI, equal-tailed credible interval; TX, transplant; HLAD, DPB1 classification according to T-cell epitope grouping; ref, reference group.

6. Concluding Remarks

We have proposed new approximate Bayesian and IPW methods for the stratified PH model with incomplete covariate information. In particular, we studied multiple missing patterns, which are largely ignored in the current literature. With a flat prior, the proposed Bayesian method is asymptotically equivalent to frequentist IPW inference based on Taylor linearization. The performance of the proposed Bayesian method can improve if the prior is informative. In this case, it may be more efficient than the frequentist IPW method. The approximate Bayesian method can be further improved by adding an augmented term to the score function for the stratified PH model, similarly to Wang and Chen [39] and Xu et al. [41].

The scheme of the proposed methods can also be applied to competing risks data. Exploring the cause-specific hazards model [28] and the proportional subdistribution hazards model [12] will be an interesting research topic. We only studied missing covariates in this article. In HCT studies, it is common that a portion of either time to event or outcome indicators are also missing. Handling missingness in such outcomes would be an important research problem. For causal inference, the IPW approach is widely used to adjust for the probability of treatment assignments in observational studies. Applying the proposed Bayesian method to causal inference would be a worthy future topic. The proposed methods require specifying the propensity score model correctly. One can consider nonparametric regression for the propensity score estimation or a doubly robust estimator [30] for robust estimation. Pursuing this direction would be an important research topic in the future.


Acknowledgements

We would like to thank the Associate Editor and two reviewers for their constructive comments which significantly improved the paper. This work was supported in part by the Medical College of Wisconsin Cancer Center, the Advancing a Healthier Wisconsin Endowment (Project # 5520461), and the US National Cancer Institute (U24CA076518).

Appendix

We derive Σ and its estimator in the Appendix. Let $d M_{l i} (t) = d N_{l i} (t) - Y_{l i} (t) \exp {β^{T} Z_{l i}} d Λ_{l} (t)$ . The posterior distribution is

p (η ∣ U_{n}) \sim N \left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \frac{Σ}{n} = \begin{pmatrix} \mathrm{Var} (U_{1}) & \mathrm{Cov} (U_{1}, U_{2}) \\ \mathrm{Cov} (U_{1}, U_{2}) & \mathrm{Var} (U_{2}) \end{pmatrix} \right],

(16)

where

\mathrm{Var} \{U_{1} (β, ϕ)\} = \mathrm{Var} \left[ n^{- 1} \sum_{l = 1}^{L} \sum_{i = 1}^{n_{l}} \frac{ξ_{l i}}{π_{l i}} \int_{0}^{τ} \left\{ Z_{l i} - \frac{S_{l}^{(1)} (β, t)}{S_{l}^{(0)} (β, t)} \right\} d M_{l i} (t) \right] = \mathrm{Var} \left[ n^{- 1} \sum_{l = 1}^{L} \sum_{i = 1}^{n_{l}} \frac{ξ_{l i}}{π_{l i}} \int_{0}^{τ} \{Z_{l i} - e_{l} (β, t)\} d M_{l i} (t) \right] = E \left[ n^{- 2} \sum_{l = 1}^{L} \sum_{i = 1}^{n_{l}} \frac{ξ_{l i}}{π_{l i}} \int_{0}^{τ} \{Z_{l i} - e_{l} (β, t)\} d M_{l i} (t) \right]^{\otimes 2} .

We can estimate $\mathrm{Var} \{U_{1} (β, ϕ)\}$ given $β$ and $ϕ$ as follows:

\widehat{\mathrm{Var}} \{U_{1} (β, ϕ)\} = \frac{1}{n^{2}} \sum_{l = 1}^{L} \sum_{i = 1}^{n_{l}} \frac{ξ_{l i}}{π_{l i}^{2}} \int_{0}^{τ} \left[ \left\{ Z_{l i} - \frac{S_{l}^{(1)} (β, t)}{S_{l}^{(0)} (β, t)} \right\} \{d N_{l i} (t) - d {\hat{Λ}}_{0 l} (t)\} \right]^{\otimes 2},

where $d {\hat{Λ}}_{0 l} (t) = \sum_{i = 1}^{n_{l}} d N_{l i} (t) / \{n_{l} S_{l}^{(0)} (β, t)\}$ .

We can obtain $\widehat{\mathrm{Var}} (U_{2})$ given $β$ and $ϕ$ as follows:

\widehat{\mathrm{Var}} (U_{2}) = \frac{1}{n^{2}} \sum_{l = 1}^{L} \sum_{i = 1}^{n_{l}} π_{l i} (1 - π_{l i}) ω_{l i} ω_{l i}^{T} .
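As an illustration, the estimator of the variance of U₂ given above can be sketched in a few lines of Python. The function name `var_u2_hat` and the toy inputs are hypothetical; the double sum over strata and subjects is written as a single loop over all n subjects, which gives the same total.

```python
def var_u2_hat(pis, omegas):
    """(1 / n^2) * sum_i pi_i (1 - pi_i) omega_i omega_i^T, pooled over strata.

    pis    : list of propensity scores pi_li, one per subject
    omegas : list of covariate vectors omega_li (lists of equal length)
    """
    n = len(pis)
    p = len(omegas[0])
    total = [[0.0] * p for _ in range(p)]
    for pi, w in zip(pis, omegas):
        c = pi * (1.0 - pi)  # Bernoulli variance of the observation indicator
        for r in range(p):
            for s in range(p):
                total[r][s] += c * w[r] * w[s]  # outer product omega omega^T
    return [[v / n ** 2 for v in row] for row in total]


# Toy example (assumed values): two subjects with an intercept and one covariate.
v2 = var_u2_hat([0.5, 0.5], [[1.0, 0.0], [1.0, 1.0]])
```

The returned matrix is symmetric by construction and forms the lower-right block of the estimated covariance matrix of the estimating functions.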

Next, $\mathrm{Cov} (U_{1}, U_{2})$ given $β$ and $ϕ$ can be estimated by

\widehat{\mathrm{Cov}} (U_{1}, U_{2}) = \widehat{\mathrm{Cov}} \left[ \frac{1}{n} \sum_{l = 1}^{L} \sum_{i = 1}^{n_{l}} \frac{ξ_{l i}}{π_{l i}} \int_{0}^{τ} \left\{ Z_{l i} - \frac{S_{l}^{(1)} (β, t)}{S_{l}^{(0)} (β, t)} \right\} d M_{l i} (t), \frac{1}{n} \sum_{l = 1}^{L} \sum_{i = 1}^{n_{l}} \{ξ_{l i} - π_{l i}\} ω_{l i}^{T} \right] = \frac{1}{n^{2}} \sum_{l = 1}^{L} \sum_{i = 1}^{n_{l}} \frac{ξ_{l i} (1 - π_{l i})}{π_{l i}} \int_{0}^{τ} \left\{ Z_{l i} - \frac{S_{l}^{(1)} (β, t)}{S_{l}^{(0)} (β, t)} \right\} d M_{l i} (t) \times ω_{l i}^{T} .

The estimator $\hat{Σ}$ is

\frac{\hat{Σ}}{n} = \begin{pmatrix} \widehat{\mathrm{Var}} (U_{1}) & \widehat{\mathrm{Cov}} (U_{1}, U_{2}) \\ \widehat{\mathrm{Cov}} (U_{1}, U_{2}) & \widehat{\mathrm{Var}} (U_{2}) \end{pmatrix} .

(17)

Footnotes

Publisher's Disclaimer: This AM is a PDF file of the manuscript accepted for publication after peer review, when applicable, but does not reflect post-acceptance improvements, or any corrections. Use of this AM is subject to the publisher’s embargo period and AM terms of use.

Supplementary material

We have provided additional simulation results in the Supplementary material.

Conflict of interest

The authors declare that they have no conflict of interest.

Contributor Information

Soyoung Kim, Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI 53226-0509.

Jae-Kwang Kim, Department of Statistics, Iowa State University, 2438 Osborn Dr Ames, IA 50011-1090.

Kwang Woo Ahn, Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI 53226-0509.

References

1.Andersen PK, Gill RD: Cox’s regression model for counting processes: a large sample study. The annals of statistics pp. 1100–1120 (1982) [Google Scholar]
2.Bacher U, Klyuchnikov E, Le-Rademacher J, Carreras J, Armand P, Bishop M, Bredeson C, Cairo M, Fenske T, Freytes CO, Gale R, Gibson J, Isola L, Inwards D, Laport G, Lazarus H, Maziarz R, Wiernik P, Schouten H, Slavin S, Smith S, Vose J, Waller E, Hari P: Conditioning regimens for allotransplants for diffuse large B-cell lymphoma: myeloablative or reduced intensity? Blood 120(20), 4256–62 (2012) [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Bartlett JW, Seaman SR, White IR, Carpenter JR, Initiative* ADN: Multiple imputation of covariates by fully conditional specification: Accommodating the substantive model. Statistical methods in medical research 24(4), 462–487 (2015) [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Bradshaw PT, Ibrahim JG, Gammon MD: A Bayesian proportional hazards regression model with non-ignorably missing time-varying covariates. Statistics in medicine 29(29), 3017–3029 (2010) [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Chen H, Little R: Proportional hazards regression with missing covariates. Journal of the American Statistical Associations 94, 896–908 (1999) [Google Scholar]
6.Chen MH, Ibrahim JG, Lipsitz SR: Bayesian methods for missing covariates in cure rate models. Lifetime Data Analysis 8(2), 117–146 (2002) [DOI] [PubMed] [Google Scholar]
7.Chen MH, Shao QM: Monte carlo estimation of Bayesian credible and HPD intervals. Journal of Computational and Graphical Statistics 8(1), 69–92 (1999) [Google Scholar]
8.Chen Q, Wu H, Ware LB, Koyama T: A Bayesian approach for the Cox proportional hazards model with covariates subject to detection limit. International journal of statistics in medical research 3(1), 32 (2014) [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Cox DR: Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 34(2), 187–202 (1972) [Google Scholar]
10.Cox DR: Partial likelihood. Biometrika 62, 269 (1975) [Google Scholar]
11.Dreger P, Sureda A, Ahn KW, Eapen M, Litovich C, Finel H, Boumendil A, Gopal A, Herrera AF, Schmid C, et al. : PTCy-based haploidentical vs matched related or unrelated donor reduced-intensity conditioning transplant for DLBCL. Blood Advances 3(3), 360–369 (2019) [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Fine JP, Gray RJ: A proportional hazards model for the subdistribution of a competing risk. Journal of the American statistical association 94(446), 496–509 (1999) [Google Scholar]
13.Hemming K, Hutton JL: Bayesian sensitivity models for missing covariates in the analysis of survival data. Journal of evaluation in clinical practice 18(2), 238–246 (2012) [DOI] [PubMed] [Google Scholar]
14.Herring AH, Ibrahim JG: Likelihood-based methods for missing covariates in the Cox proportional hazards model. Journal of the American Statistical Association 96(453), 292–302 (2001) [Google Scholar]
15.Herring AH, Ibrahim JG, Lipsitz SR: Non-ignorable missing covariate data in survival analysis: a case-study of an international breast cancer study group trial. Journal of the Royal Statistical Society: Series C (Applied Statistics) 53(2), 293–310 (2004) [Google Scholar]
16.Horvitz DG, Thompson DJ: A generalization of sampling without replacement from a finite universe. Journal of the American statistical Association 47(260), 663–685 (1952) [Google Scholar]
17.Ibrahim J, Chen M, Kim S: Bayesian variable selection for the Cox regression model with missing covariates. Lifetime Data Analysis 14, 496–520 (2008) [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Kim S, Cai J, Couper D: Improving the efficiency of estimation in the additive hazards model for stratified case–cohort design with multiple diseases. Statistics in medicine 35(2), 282–293 (2016) [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Kim S, Zeng D, Cai J: Analysis of multiple survival events in generalized case-cohort designs. Biometrics 74(4), 1250–1260 (2018) [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Klein JP, Moschberger ML: Survival Analysis: Techniques for Censored and Truncated Data. Springer, New York, NY. (2003) [Google Scholar]
21.Kumar AJ, Kim S, Hemmer MT, Arora M, Spellman SR, Pidala JA, Couriel DR, Alousi AM, Aljurf MD, Cahn JY, et al. : Graft-versus-host disease in recipients of male unrelated donor compared with parous female sibling donor transplants. Blood advances 2(9), 1022–1031 (2018) [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Lazarus HM, Zhang M, Carreras J, Hayes-Lattin BM, Ataergin AS, Bitran J, Bolwell BJ, Freytes CO, Gale RP, Goldstein SC, Hale GA, Inwards DJ, Klumpp TR, Marks DI, Maziarz RT, McCarthy P, Pavlovsky S, Rizzo J, Shea T, Schouten H, Slavin S, Winter JN, Besien K.v., Vose JM, Hari PN: A comparison of HLA-identical sibling allogeneic versus autologous transplantation for diffuse large B cell lymphoma: a report from the CIBMTR. Biology of Blood and Marrow Transplantation 16(1), 35–45 (2010) [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Lin D, Ying Z: Cox regression with incomplete covariate measurements. Journal of the American Statistical Association 88(424), 1341–1349 (1993) [Google Scholar]
24.Paik MC: Multiple imputation for the Cox proportional hazards model with missing covariates. Lifetime Data Analysis 3(3), 289–298 (1997) [DOI] [PubMed] [Google Scholar]
25.Papanicolaou GA, Ustun C, Young JAH, Chen M, Kim S, Woo Ahn K, Komanduri K, Lindemans C, Auletta JJ, Riches ML, et al. : Bloodstream infection due to vancomycin-resistant enterococcus is associated with increased mortality after hematopoietic cell transplantation for acute leukemia and myelodysplastic syndrome: a multicenter, retrospective cohort study. Clinical Infectious Diseases 69(10), 1771–1779 (2019) [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Pidala J, Lee SJ, Ahn KW, Spellman S, Wang HL, Aljurf M, Askar M, Dehn J, Fernandez Viña M, Gratwohl A, et al. : Nonpermissive hla-dpb1 mismatch increases mortality after myeloablative unrelated allogeneic hematopoietic cell transplantation. Blood, The Journal of the American Society of Hematology 124(16), 2596–2606 (2014) [DOI] [PMC free article] [PubMed] [Google Scholar]
27. Pidala J, Lee SJ, Ahn KW, Spellman S, Wang HL, Aljurf M, Askar M, Dehn J, Viña MF, Gratwohl A, et al.: Non-permissive-DPB1 mismatch among otherwise HLA-matched donor-recipient pairs results in increased overall mortality after myeloablative unrelated allogeneic hematopoietic cell transplantation for hematologic malignancies. Blood 124, 2596–2606 (2014)
28. Prentice RL, Kalbfleisch JD, Peterson AV Jr., Flournoy N, Farewell VT, Breslow NE: The analysis of failure times in the presence of competing risks. Biometrics 34, 541–554 (1978)
29. Pugh MG, Robins J, Lipsitz S, Harrington D: Inference in the Cox proportional hazards model with missing covariate data. Ph.D. thesis, Harvard School of Public Health, Boston, MA (1993)
30. Robins JM, Rotnitzky A, Zhao LP: Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89(427), 846–866 (1994)
31. Rubin DB: Inference and missing data. Biometrika 63(3), 581–592 (1976)
32. Sang H, Kim JK: An approximate Bayesian inference on propensity score estimation under unit nonresponse. Canadian Journal of Statistics (2017)
33. Shah NN, Ahn KW, Litovich C, Sureda A, Kharfan-Dabaja MA, Awan FT, Ganguly S, Gergis U, Inwards D, Karmali R, et al.: Allogeneic transplantation in elderly patients ≥ 65 years with non-Hodgkin lymphoma: a time-trend analysis. Blood Cancer Journal 9(12), 1–10 (2019)
34. Sharef E, Strawderman R, Ruppert D, Cowen M, Halasyamani L: Bayesian adaptive B-spline estimation in proportional hazards frailty models. Electronic Journal of Statistics 4, 606–642 (2010)
35. Soubeyrand S, Haon-Lasportes E: Weak convergence of posteriors conditional on maximum pseudo-likelihood estimates and implications in ABC. Statistics & Probability Letters 107, 84–92 (2015)
36. Sun B, Tchetgen Tchetgen EJ: On inverse probability weighting for nonmonotone missing at random data. Journal of the American Statistical Association 113, 369–379 (2018)
37. Ustun C, Kim S, Chen M, Beitinjaneh AM, Brown VI, Dahi PB, Daly A, Diaz MA, Freytes CO, Ganguly S, et al.: Increased overall and bacterial infections following myeloablative allogeneic HCT for patients with AML in CR1. Blood Advances 3(17), 2525–2536 (2019)
38. Verneris MR, Lee SJ, Ahn KW, Wang HL, Battiwalla M, Inamoto Y, Fernandez-Vina MA, Gajewski J, Pidala J, Munker R, et al.: HLA mismatch is associated with worse outcomes after unrelated donor reduced-intensity conditioning hematopoietic cell transplantation: an analysis from the Center for International Blood and Marrow Transplant Research. Biology of Blood and Marrow Transplantation 21(10), 1783–1789 (2015)
39. Wang C, Chen HY: Augmented inverse probability weighted estimator for Cox missing covariate regression. Biometrics 57(2), 414–419 (2001)
40. White IR, Royston P: Imputing missing covariate values for the Cox model. Statistics in Medicine 28(15), 1982–1998 (2009)
41. Xu Q, Paik MC, Luo X, Tsai WY: Reweighting estimators for Cox regression with missing covariates. Journal of the American Statistical Association 104(487), 1155–1167 (2009)
42. Yoo H, Lee JW: Comparison of missing data methods in clustered survival data using Bayesian adaptive B-Spline estimation. Communications for Statistical Applications and Methods 25(2), 159–172 (2018)
43. Yuan KH, Jennrich RI: Asymptotics of estimating equations under natural conditions. Journal of Multivariate Analysis 65(2), 245–260 (1998)
44. Zhou H, Pepe MS: Auxiliary covariate data in failure time regression. Biometrika 82(1), 139–149 (1995)

Supplementary Materials

1782160_Sup_info

NIHMS1782160-supplement-1782160_Sup_info.pdf (104.8KB, pdf)

