Author manuscript; available in PMC: 2023 Apr 1.
Published in final edited form as: Lifetime Data Anal. 2022 Jan 16;28(2):169–193. doi: 10.1007/s10985-021-09542-4

A calibrated Bayesian method for the stratified proportional hazards model with missing covariates

Soyoung Kim 1,*, Jae-Kwang Kim 2, Kwang Woo Ahn 3
PMCID: PMC8977246  NIHMSID: NIHMS1782160  PMID: 35034213

Abstract

Missing covariates are commonly encountered when evaluating covariate effects on survival outcomes. Excluding missing data from the analysis may lead to biased parameter estimation and a misleading conclusion. The inverse probability weighting method is widely used to handle missing covariates. However, obtaining asymptotic variance in frequentist inference is complicated because it involves estimating parameters for propensity scores. In this paper, we propose a new approach based on an approximate Bayesian method without using Taylor expansion to handle missing covariates for survival data. We consider a stratified proportional hazards model so that it can be used for the non-proportional hazards structure. Two cases for missing pattern are studied: a single missing pattern and multiple missing patterns. The proposed estimators are shown to be consistent and asymptotically normal, which matches the frequentist asymptotic properties. Simulation studies show that our proposed estimators are asymptotically unbiased and the credible region obtained from posterior distribution is close to the frequentist confidence interval. The algorithm is straightforward and computationally efficient. We apply the proposed method to a stem cell transplantation data set.

Keywords: Bayesian computation, Cox model, Missing data, Posterior distribution, Survival data

1. Introduction

The proportional hazards (PH) model [9] is widely used to evaluate covariate effects on survival outcomes. In many biomedical studies, covariate information is incompletely observed for some subjects because of loss to follow-up, loss of hospital records, or study design. For example, Dreger et al. [11] studied 1,394 patients who were aged 18 years or older, had relapsed diffuse large B-cell lymphoma (DLBCL), and had undergone their first non-myeloablative or reduced-intensity conditioning allogeneic stem cell transplantation between 2008 and 2015. They compared the effect of the following donor types on overall survival: haploidentical family donors using post-transplant cyclophosphamide (PTCy), matched sibling donors (MSD), and matched unrelated donors (MUD) with or without T-cell depletion. There were missing records in the hematopoietic cell transplant-comorbidity index. In addition, the PH assumption was not valid for remission status at time of transplant.

A simple and commonly used way to handle such incomplete data is to exclude subjects with missing values from the analysis, which is called the complete-case (CC) method. However, the CC method may lead to biased parameter estimation when the missing mechanism is related to outcome variables. A more sensible way to handle missing data is propensity score weighting, which is based on a model for the response probability. Once the response probabilities are estimated, the inverse of the estimated response probability is applied as the weight for estimating the parameters of interest.

Extensive work for the PH model with missing covariates has been done under the frequentist framework. Lin and Ying [23] proposed a pseudo-likelihood score function to handle missing covariates under the missing-completely-at-random assumption. Zhou and Pepe [44] and Chen and Little [5] proposed an estimated partial-likelihood method using auxiliary covariates and a nonparametric maximum likelihood method, respectively. Pugh et al. [29] proposed an inverse-probability-weighted (IPW) estimating equation when the missing mechanism is missing at random (MAR) in the sense of Rubin [31]. Wang and Chen [39] and Xu et al. [41] considered an augmented inverse probability weighting (AIPW) scheme [30] under the MAR assumption. Herring and Ibrahim [14] and Herring et al. [15] introduced Monte-Carlo-Expectation-Maximization-algorithm-based methods to handle missing covariates under the MAR assumption and non-ignorable missing, respectively. Multiple imputation for the PH model has also been explored [24,40,3]. However, all of these methods are limited to the non-stratified PH model. Because it is common that the PH assumption is not satisfied for some covariates [11,38,27] or stratified sampling is used for data collection [18,19], it is crucial to study the stratified PH model for missing data. In addition, many of these existing methods rely on Taylor expansion to study the asymptotic properties of the estimators and thus obtaining variance estimates of the estimators can be complicated in practice.

Besides the frequentist approaches, several Bayesian approaches have also been proposed to handle missing covariates for survival data. Chen et al. [6] proposed a general class of informative priors for semi-parametric survival models incorporating a cure fraction. Yoo and Lee [42] extended the Bayesian adaptive B-spline estimation method of Sharef et al. [34] to clustered survival data with missing covariates. Hemming and Hutton [13] considered Markov Chain Monte Carlo (MCMC) to handle missing covariates for the accelerated failure time model under the MAR assumption. For the PH model, Ibrahim et al. [17] and Bradshaw et al. [4] studied variable selection and non-ignorably missing time-varying covariates, respectively, under the MAR assumption. Chen et al. [8] considered covariates with a detection limit. In general, the current Bayesian literature on missing covariates for the PH model requires complicated MCMC algorithms that generate correlated samples and need a burn-in period. Finding straightforward and less computationally intensive methods for survival outcomes with missing data is desirable in practice. Furthermore, like the frequentist approaches discussed previously, these Bayesian methods did not consider the stratified PH model.

Sang and Kim [32] recently proposed an approximate Bayesian method to handle unit nonresponse for outcomes. Their Bayesian method is calibrated to the frequentist inference in that the credible region obtained from the posterior distribution asymptotically matches the frequentist confidence interval. Although their method is an approximate method based on the asymptotic normality of the score function, it is attractive in practice due to its simplicity and good performance in parameter estimation. More specifically, their algorithm generates independent samples for the posterior distribution of parameters and does not require MCMC for posterior computation.

Survival data often have multiple missing covariates. For example, Pidala et al. [26] studied the impact of DPB1 T-cell epitope matching on the hematopoietic cell transplantation outcomes for patients with leukemia and myelodysplastic syndrome. There were two important variables with missing values: DPB1 T-cell epitope matching and Karnofsky performance score (KPS). Existing literature considers one missing mechanism to handle these two missing covariates. However, the missing mechanism for DPB1 T-cell epitope matching could be different from one for KPS. In addition, cases having missing values in both variables could have a different missing mechanism from those with missing values in one of the two variables. This important aspect has been largely ignored in the current literature.

Motivated by Sang and Kim [32], we propose an approximate Bayesian approach for survival data with missing covariates. We consider the stratified PH model so that it can be used even for data with the non-proportional hazard structure. The proposed method uses two score equations, one from the stratified PH model and the other from the response model, to construct the likelihood part of the posterior distribution. With a flat prior, the credible interval from the posterior distribution asymptotically matches the confidence interval of the frequentist IPW approach. In contrast with the frequentist approach, the proposed method does not involve Taylor expansion. Sampling from the posterior distribution is straightforward to implement. We also propose new methods to take account of multiple covariates missing patterns. We study the asymptotic properties of the IPW estimators in frequentist inference for the stratified PH model with known and unknown missing mechanisms. Simulation studies and applications to the data of Dreger et al. [11] and Pidala et al. [26] are also provided.

2. Model and estimation

2.1. Model definitions and assumptions

Assume the full cohort consists of $n$ subjects and there are $L$ strata, where $L$ is fixed. Let $T_{li}$ be the failure time, $C_{li}$ be the potential censoring time, and $Z_{li}=(Z_{li1},\ldots,Z_{lip})^T$ be a $p\times 1$ time-independent covariate vector for subject $i$ in stratum $l$ for $l=1,\ldots,L$, $i=1,\ldots,n_l$, where $n_l$ is the number of subjects in stratum $l$. Let $X_{li}=\min(T_{li},C_{li})$ denote the observed time in the full cohort and $\Delta_{li}=I(T_{li}\le C_{li})$ be an event indicator. The study period is $[0,\tau]$. We consider the stratified PH model: for subject $i$ in stratum $l$, the hazard function $\lambda_{li}(\cdot)$ associated with $Z_{li}$ is

$$\lambda_{li}(t\mid Z_{li})=\lambda_{0l}(t)\,e^{\beta_0^T Z_{li}}, \qquad (1)$$

where $\lambda_{0l}(t)$ is a baseline hazard function for stratum $l$ and $\beta_0$ is an unknown parameter vector. We assume $T_{li}$ is independent of $C_{li}$ given $Z_{li}$ [9,10].

Next, we introduce notation and assumptions for missing covariates. Let $Z_{li}=(Z_{li}^c,Z_{li}^m)$, where $Z_{li}^c$ and $Z_{li}^m$ are the complete covariate vector and the missing covariate vector for subject $i$ in stratum $l$, respectively. Let $\xi_{li}$ be the observation indicator for subject $i$ in stratum $l$: $\xi_{li}=1$ if $Z_{li}$ is fully observed and $\xi_{li}=0$ if some elements of $Z_{li}$ are missing. Assume $(T_{li},C_{li},Z_{li},\xi_{li})$ for $i=1,\ldots,n_l$ within stratum $l$ are independently and identically distributed. In addition, $(T_{li},C_{li},Z_{li},\xi_{li})$ and $(T_{l'i'},C_{l'i'},Z_{l'i'},\xi_{l'i'})$ are assumed to be independent when $l\neq l'$. The observed data for stratum $l$ is $(X_{li},\Delta_{li},Z_{li}^c,\xi_{li})$ for $i=1,\ldots,n_l$. Define $W_{li}=(X_{li},\Delta_{li},Z_{li}^c)$ for $i=1,\ldots,n_l$ and $l=1,\ldots,L$. We assume that $Z_{li}^m$ is missing at random in that the probability of observing missing covariates is conditionally independent of $Z_{li}^m$ given $W_{li}$. Let $N_{li}(t)=I(X_{li}\le t,\Delta_{li}=1)$ be the counting process for the observed failure time, and $Y_{li}(t)=I(X_{li}\ge t)$ denote the at-risk indicator for subject $i$ in stratum $l$, where $I(\cdot)$ is an indicator function.

2.2. Inverse Probability Weighted Estimators

The inverse probability weighted (IPW) estimator based on Horvitz and Thompson [16] adjusts for missing covariates by using the inverse of the response probability as the weight [39,41]. We assume the $\xi_{li}$ within stratum $l$ are independently generated from a Bernoulli distribution and allow a different intercept for each stratum:

$$\pi_{li}=\Pr(\xi_{li}=1\mid W_{li})=\frac{\exp(\phi^T\omega_{li})}{1+\exp(\phi^T\omega_{li})}, \qquad (2)$$

where $\omega_{li}=(1,I(l=2),\ldots,I(l=L),W_{li}^T)^T$. Let $\phi_0$ be the true parameter vector of $\phi$. To obtain the IPW estimator, we consider

$$U_{1,n}(\beta,\phi)=\frac{1}{n}\sum_{l=1}^{L}\sum_{i=1}^{n_l}\frac{\xi_{li}}{\pi_{li}}\int_0^\tau\left\{Z_{li}-\frac{S_l^{(1)}(\beta,t)}{S_l^{(0)}(\beta,t)}\right\}dN_{li}(t)=0, \qquad (3)$$

$$U_{2,n}(\phi)=\frac{1}{n}\sum_{l=1}^{L}\sum_{i=1}^{n_l}\{\xi_{li}-\pi_{li}\}\,\omega_{li}^T=0, \qquad (4)$$

where $S_l^{(d)}(\beta,t)=n^{-1}\sum_{i=1}^{n_l}\xi_{li}\pi_{li}^{-1}Y_{li}(t)Z_{li}^{\otimes d}e^{\beta^T Z_{li}}$ for $d=0,1$ and $a^{\otimes 0}=1$, $a^{\otimes 1}=a$, $a^{\otimes 2}=aa^T$. The two functions $U_{1,n}(\beta,\phi)$ and $U_{2,n}(\phi)$ are the score functions from the stratified PH model and the logistic regression, respectively. For the frequentist IPW approach under the MAR assumption, one first estimates the response probabilities $\pi_{li}$ by solving (4) and then plugs the estimated $\pi_{li}$ into (3) to estimate $\beta$. Let the solution to (3) be $\hat\beta$. In practice, one relies on Taylor expansion to obtain the asymptotic normality of $\hat\beta$ [41].
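The first stage of the frequentist IPW procedure, solving the logistic score equation (4) and forming the weights $\xi_{li}/\pi_{li}$, can be sketched as follows. This is a minimal illustration in pure Python, not the authors' implementation: the function names, the simulated design matrix `omega` (an intercept plus one covariate), and the response-model coefficients used to generate `xi` are all assumptions made for the example.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def solve_linear(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def response_probs(phi, omega):
    """pi_i = expit(phi^T omega_i), as in equation (2)."""
    return [expit(sum(p * w for p, w in zip(phi, row))) for row in omega]


def response_score(phi, omega, xi):
    """U_2(phi) = n^{-1} sum_i (xi_i - pi_i) omega_i, as in equation (4)."""
    n = len(xi)
    pis = response_probs(phi, omega)
    return [sum((x - p) * row[j] for x, p, row in zip(xi, pis, omega)) / n
            for j in range(len(phi))]


def fit_response_model(omega, xi, max_iter=50, tol=1e-10):
    """Newton-Raphson solution of U_2(phi) = 0."""
    n, q = len(xi), len(omega[0])
    phi = [0.0] * q
    for _ in range(max_iter):
        pis = response_probs(phi, omega)
        score = response_score(phi, omega, xi)
        # Negative Jacobian of the score: n^{-1} sum_i pi_i (1 - pi_i) omega_i omega_i^T
        info = [[sum(p * (1 - p) * row[j] * row[k]
                     for p, row in zip(pis, omega)) / n
                 for k in range(q)] for j in range(q)]
        step = solve_linear(info, score)
        phi = [a + s for a, s in zip(phi, step)]
        if max(abs(s) for s in step) < tol:
            break
    return phi


# Hypothetical data: intercept plus one fully observed covariate.
rng = random.Random(0)
n = 400
omega = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(n)]
xi = [1 if rng.random() < expit(0.5 + row[1]) else 0 for row in omega]

phi_hat = fit_response_model(omega, xi)
pi_hat = response_probs(phi_hat, omega)
weights = [x / p for x, p in zip(xi, pi_hat)]  # IPW weights xi_i / pi_i
```

Subjects with missing covariates receive weight zero, and each complete case is weighted up by the inverse of its estimated response probability; these weights would then enter the weighted risk-set sums $S_l^{(d)}(\beta,t)$ in (3).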

2.3. Approximate Bayesian approach

We propose an approximate Bayesian approach in this section. Define $\theta=(\beta^T,\phi^T)^T$ and $U_n(\theta)=(U_{1,n}^T(\beta,\phi),U_{2,n}^T(\phi))^T$. Let $\hat\theta$ be the solution to $U_n(\theta)=0$. Instead of directly generating the posterior distribution $p(\theta\mid\text{data})$, similarly to Soubeyrand and Haon-Lasportes [35], we use an approximation to $p(\theta\mid\text{data})$, that is, $p(\theta\mid\hat\theta)$, as follows:

$$p(\theta\mid\hat\theta)\propto p(\hat\theta\mid\theta)\,p(\theta),$$

where $p(\hat\theta\mid\theta)$ is the sampling distribution of $\hat\theta$ and $p(\theta)$ is the prior distribution for $\theta$. However, studying $p(\hat\theta\mid\theta)$ requires Taylor expansion, as in Theorem 1.

To avoid Taylor expansion, instead of generating from $p(\theta\mid\hat\theta)$, we alternatively consider

$$p(\theta\mid U_n)\propto p(U_n(\theta)\mid\theta)\,p(\theta), \qquad (5)$$

where $p(U_n(\theta)\mid\theta)$ is the sampling distribution of $U_n(\theta)$. To generate samples from (5), we consider a one-to-one transformation $T:\theta\to\eta$ such that $\eta=E(U_n\mid\theta)$. Then, we generate $\eta$ from $p(\eta\mid U_n)$ and obtain $\theta=T^{-1}(\eta)$ as samples for the posterior distribution in (5). As shown in the next section, under some regularity conditions, the asymptotic distribution of $U_n$ is

$$\sqrt{n}\{U_n(\theta)-\eta(\theta)\}\mid\theta\ \xrightarrow{d}\ N(0,\Sigma), \qquad (6)$$

where $\xrightarrow{d}$ denotes convergence in distribution and $\Sigma$ is the asymptotic covariance matrix of the joint score functions. Since the transformation $T:\theta\to\eta$ is one-to-one, (6) is equivalent to

$$\sqrt{n}\{U_n(T^{-1}(\eta))-\eta\}\mid\eta\ \xrightarrow{d}\ N(0,\Sigma). \qquad (7)$$

Then, the posterior distribution of $\eta$ given $U_n$ is

$$p(\eta\mid U_n)\propto p(U_n\mid\eta)\,p(\eta), \qquad (8)$$

where $p(U_n\mid\eta)$ is the density of the limiting distribution in (7). Equation (8) shows an important relationship between the frequentist IPW approach and the Bayesian approach. Under a flat prior for $p(\eta)$ and the sufficient conditions for (7), we can approximate the posterior distribution $p(\eta\mid U_n)$ as follows:

$$p(\eta\mid U_n)\approx N(U_n,\Sigma/n). \qquad (9)$$

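As shown in the Appendix, the estimator of $\Sigma$, $\hat\Sigma$, can be obtained by

$$\frac{\hat\Sigma}{n}=\begin{pmatrix}\widehat{\mathrm{Var}}(U_1) & \widehat{\mathrm{Cov}}(U_1,U_2)\\ \widehat{\mathrm{Cov}}(U_1,U_2) & \widehat{\mathrm{Var}}(U_2)\end{pmatrix},$$

$$\widehat{\mathrm{Var}}\{U_1(\beta,\phi)\}=\frac{1}{n^2}\sum_{l=1}^{L}\sum_{i=1}^{n_l}\frac{\xi_{li}}{\pi_{li}^2}\int_0^\tau\left[\left\{Z_{li}-\frac{S_l^{(1)}(\beta,t)}{S_l^{(0)}(\beta,t)}\right\}d\hat M_{li}(t)\right]^{\otimes 2},$$

$$\widehat{\mathrm{Var}}(U_2)=\frac{1}{n^2}\sum_{l=1}^{L}\sum_{i=1}^{n_l}\pi_{li}(1-\pi_{li})\,\omega_{li}\omega_{li}^T,$$

$$\widehat{\mathrm{Cov}}(U_1,U_2)=\frac{1}{n^2}\sum_{l=1}^{L}\sum_{i=1}^{n_l}\frac{\xi_{li}(1-\pi_{li})}{\pi_{li}}\int_0^\tau\left\{Z_{li}-\frac{S_l^{(1)}(\beta,t)}{S_l^{(0)}(\beta,t)}\right\}d\hat M_{li}(t)\times\omega_{li}^T,$$

$$\hat\Sigma_c=\widehat{\mathrm{Var}}(U_1)-\widehat{\mathrm{Cov}}(U_1,U_2)\,\widehat{\mathrm{Var}}^{-1}(U_2)\,\widehat{\mathrm{Cov}}(U_1,U_2)^T,$$

$$d\hat\Lambda_{0l}(t)=\sum_{i=1}^{n_l}dN_{li}(t)\big/\{n_l S_l^{(0)}(\beta,t)\},\qquad d\hat M_{li}(t)=dN_{li}(t)-Y_{li}(t)\,e^{\beta^T Z_{li}}\,d\hat\Lambda_{0l}(t).$$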

Under the flat prior for $p(\eta)$, we propose the following algorithm to generate samples from the posterior distribution $p(\theta\mid\hat\theta)$:

  1. Generate $\eta_2^*$ from the approximate posterior distribution $p(\eta_2\mid U_{2,n}=0)$, that is, $N(0,\widehat{\mathrm{Var}}(U_2))$.

  2. Solve $U_{2,n}(\phi)=\eta_2^*$ with respect to $\phi$ to obtain a draw $\phi^*$.

  3. Generate $\eta_1^*$ from the approximate posterior distribution $p(\eta_1\mid U_{1,n}(\phi^*)=0)$, that is, $N(0,\hat\Sigma_c)$.

  4. Solve $U_{1,n}(\beta,\phi^*)=\eta_1^*$ with respect to $\beta$ to obtain a draw $\beta^*$.

  5. Repeat Steps 1 to 4 until the desired number of posterior draws is obtained.

The Newton-Raphson algorithm or a root-finding algorithm can be used to solve Un(θ)=η in Step 2 and Step 4. Thus, the algorithm is straightforward to implement. Based on our simulation results, 1000 repetitions appear to be enough for statistical inference. Using the above algorithm, independent samples from the approximate posterior distribution p(θ|θ^) are generated and thus there is no burn-in period.
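The mechanics of the sampling steps can be illustrated on a deliberately simplified toy problem. In the sketch below, the response model is an intercept-only logistic model and, as a loudly labeled assumption, the stratified PH score $U_{1,n}$ is replaced by a scalar IPW mean equation $n^{-1}\sum_i \xi_i (y_i-\beta)/\pi$, so that both "solve" steps have closed forms. The function `draw_posterior`, the data, and all numerical settings are hypothetical; the sketch shows the structure of the sampler, not the authors' code.

```python
import math
import random
import statistics


def logit(p):
    return math.log(p / (1.0 - p))


def draw_posterior(y, xi, n_draws=1000, seed=1):
    """Toy two-block sampler. phi is the intercept of the response model;
    beta solves an IPW mean equation standing in for the PH score U_1."""
    rng = random.Random(seed)
    n = len(xi)
    obs = [yi for yi, x in zip(y, xi) if x == 1]
    pi_hat = len(obs) / n               # solves U_2(phi) = 0
    beta_hat = sum(obs) / len(obs)      # solves U_1(beta, phi_hat) = 0
    # Plug-in variance pieces evaluated at (beta_hat, phi_hat).
    var_u2 = pi_hat * (1.0 - pi_hat) / n
    var_u1 = sum((yi - beta_hat) ** 2 for yi in obs) / (pi_hat ** 2 * n ** 2)
    cov_12 = (1.0 - pi_hat) / pi_hat * sum(yi - beta_hat for yi in obs) / n ** 2
    sigma_c = var_u1 - cov_12 ** 2 / var_u2
    draws = []
    for _ in range(n_draws):
        # Step 1: eta_2 ~ N(0, Var(U_2)).
        eta2 = rng.gauss(0.0, math.sqrt(var_u2))
        # Step 2: U_2(phi) = pi_hat - expit(phi) = eta_2, solved in closed form.
        pi_star = min(max(pi_hat - eta2, 1e-6), 1.0 - 1e-6)
        phi_star = logit(pi_star)
        # Step 3: eta_1 ~ N(0, Sigma_c).
        eta1 = rng.gauss(0.0, math.sqrt(sigma_c))
        # Step 4: n^{-1} sum_i xi_i (y_i - beta) / pi_star = eta_1, solved for beta.
        beta_star = beta_hat - eta1 * pi_star * n / len(obs)
        draws.append((beta_star, phi_star))
    return beta_hat, draws


# Hypothetical data: outcome observed with probability 0.6.
rng = random.Random(0)
n = 500
xi = [1 if rng.random() < 0.6 else 0 for _ in range(n)]
y = [rng.gauss(2.0, 1.0) for _ in range(n)]

beta_hat, draws = draw_posterior(y, xi)
beta_draws = [b for b, _ in draws]
posterior_median = statistics.median(beta_draws)
posterior_sd = statistics.stdev(beta_draws)
```

Each pass through the loop is independent of the others, which is why no burn-in is needed. For the stratified PH model, Steps 2 and 4 would instead call a Newton-Raphson or root-finding routine on the actual score functions.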

2.4. Estimators with multiple missing covariates patterns

The methods in Sections 2.2 and 2.3 consider a single missing mechanism. They are not directly applicable when there are multiple missing covariates having different missing patterns. We propose estimators for survival data with multiple missing patterns in this section. We describe the proposed method for the stratified PH model with two missing covariates and two strata for simplicity. Suppose two covariates, $Z_{1i}^m$ and $Z_{2i}^m$, are subject to missingness for individual $i$. Let $\eta_{ki}=1$ if $Z_{ki}^m$ is observed and $\eta_{ki}=0$ if $Z_{ki}^m$ is missing for $k=1,2$. Denote $Z=(Z_1^m,Z_2^m)^T$ and $O=(\Delta,X,(Z^c)^T)^T$, where $Z^c$ is a completely observed covariate vector. We divide the data into four groups: i) both $Z_1^m$ and $Z_2^m$ are observed; ii) $Z_1^m$ is observed and $Z_2^m$ is missing; iii) $Z_1^m$ is missing and $Z_2^m$ is observed; and iv) both $Z_1^m$ and $Z_2^m$ are missing. Let $\omega_{10}=(1,Z_1^m,O^T)^T$, $\omega_{01}=(1,Z_2^m,O^T)^T$, and $\omega_{00}=(1,O^T)^T$. Thus, $\omega_{10}$, $\omega_{01}$, and $\omega_{00}$ correspond to ii), iii), and iv), respectively. Define

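$$\begin{aligned}
r_{11}(Z,O)&=1, &&\text{if } \eta_1=1,\ \eta_2=1,\\
r_{10}(Z,O)&=\exp(\phi_{10}^T\omega_{10}), &&\text{if } \eta_1=1,\ \eta_2=0,\\
r_{01}(Z,O)&=\exp(\phi_{01}^T\omega_{01}), &&\text{if } \eta_1=0,\ \eta_2=1,\\
r_{00}(Z,O)&=\exp(\phi_{00}^T\omega_{00}), &&\text{if } \eta_1=0,\ \eta_2=0,
\end{aligned} \qquad (10)$$

where

$$\begin{aligned}
\phi_{10}^T\omega_{10}&=\phi_{10,0}+\phi_{10,1}Z_1^m+\phi_{10,2}\Delta+\phi_{10,3}X+\phi_{10,4}^T Z^c+\phi_{10,5}^T I(\text{stratum}=2),\\
\phi_{01}^T\omega_{01}&=\phi_{01,0}+\phi_{01,1}Z_2^m+\phi_{01,2}\Delta+\phi_{01,3}X+\phi_{01,4}^T Z^c+\phi_{01,5}^T I(\text{stratum}=2),\\
\phi_{00}^T\omega_{00}&=\phi_{00,0}+\phi_{00,1}\Delta+\phi_{00,2}X+\phi_{00,3}^T Z^c+\phi_{00,4}^T I(\text{stratum}=2).
\end{aligned}$$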

Equation (10) satisfies the MAR assumption. Model (10) is a model for the ratio of the missing probability to the baseline:

$$r_{ab}(Z,O)=\frac{P(\eta_1=a,\eta_2=b\mid Z,O)}{P(\eta_1=1,\eta_2=1\mid Z,O)}.$$

A similar idea has been considered in Sun and Tchetgen Tchetgen [36]. Then, the propensity score is

$$\pi_{ab}=P(\eta_1=a,\eta_2=b\mid Z,O)=\frac{r_{ab}(Z,O)}{\sum_{a'=0}^{1}\sum_{b'=0}^{1}r_{a'b'}(Z,O)}.$$
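The mapping from the ratio model (10) to the pattern probabilities can be checked numerically with a short sketch. The function name `pattern_probabilities` and the three linear-predictor values are hypothetical; only the normalization itself comes from the text.

```python
import math


def pattern_probabilities(linear_predictors):
    """Convert linear predictors phi_ab^T omega_ab for the patterns
    (1,0), (0,1), (0,0) into probabilities pi_ab; pattern (1,1) is the
    baseline with ratio r_11 = 1."""
    ratios = {(1, 1): 1.0}
    for pattern, value in linear_predictors.items():
        ratios[pattern] = math.exp(value)
    total = sum(ratios.values())
    return {pattern: r / total for pattern, r in ratios.items()}


# Hypothetical linear predictors for one subject.
probs = pattern_probabilities({(1, 0): -0.5, (0, 1): 0.2, (0, 0): 1.0})
ipw_weight = 1.0 / probs[(1, 1)]  # weight used for a fully observed subject
```

Because only subjects with both covariates observed contribute to the weighted score, the relevant weight is the inverse of $\pi_{11}$, which depends on all three sets of response parameters through the denominator.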

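Let $\phi_0$ be the true parameter vector of $\phi=(\phi_{10}^T,\phi_{01}^T,\phi_{00}^T)^T$. To obtain the IPW estimator, we consider

$$U_{1,n}^m(\beta_m,\phi)=\frac{1}{n}\sum_{l=1}^{L}\sum_{i=1}^{n_l}\frac{\xi_{li}}{\pi_{11,li}}\int_0^\tau\left\{Z_{li}-\frac{S_{l,m}^{(1)}(\beta_m,t)}{S_{l,m}^{(0)}(\beta_m,t)}\right\}dN_{li}(t)=0, \qquad (11)$$

$$U_{2,n}^m(\phi)=\{U_{10,n}^T(\phi_{10}),U_{01,n}^T(\phi_{01}),U_{00,n}^T(\phi_{00})\}^T=0, \qquad (12)$$

$$U_{ab,n}(\phi_{ab})=\frac{1}{N_{ab}}\sum_{l=1}^{L}\sum_{i=1}^{n_l}\eta_{ab,li}\left\{\xi_{li}-\frac{\pi_{11,li}}{\pi_{11,li}+\pi_{ab,li}}\right\}\omega_{ab,li}^T,$$

where $\xi_{li}=I(\eta_{1li}=1,\eta_{2li}=1)$, $\eta_{ab,li}=I(\eta_{1li}=a,\eta_{2li}=b\ \text{or}\ \eta_{1li}=1,\eta_{2li}=1)$, $S_{l,m}^{(k)}(\beta_m,t)=n_l^{-1}\sum_{i=1}^{n_l}\xi_{li}\pi_{11,li}^{-1}Y_{li}(t)Z_{li}^{\otimes k}e^{\beta_m^T Z_{li}}$, and $N_{ab}$ is the subgroup sample size with $\eta_1=a$ and $\eta_2=b$. Let the solution to (11) be $\hat\beta_m$. Similarly to Section 2.3, the estimator of $\Sigma_m$, $\hat\Sigma_m$, can be obtained by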

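$$\frac{\hat\Sigma_m}{n}=\begin{pmatrix}\widehat{\mathrm{Var}}(U_1^m) & \widehat{\mathrm{Cov}}(U_1^m,U_2^m)\\ \widehat{\mathrm{Cov}}(U_1^m,U_2^m) & \widehat{\mathrm{Var}}(U_2^m)\end{pmatrix},$$

$$\widehat{\mathrm{Var}}\{U_1^m(\beta,\phi)\}=\frac{1}{n^2}\sum_{l=1}^{L}\sum_{i=1}^{n_l}\frac{\xi_{li}}{\pi_{11,li}^2}\int_0^\tau\left[\left\{Z_{li}-\frac{S_{l,m}^{(1)}(\beta,t)}{S_{l,m}^{(0)}(\beta,t)}\right\}d\hat M_{li}(t)\right]^{\otimes 2},$$

$$\widehat{\mathrm{Var}}(U_{ab})=\frac{1}{N_{ab}^2}\sum_{l=1}^{L}\sum_{i=1}^{n_l}\eta_{ab,li}\left(\xi_{li}-\frac{\pi_{11,li}}{\pi_{11,li}+\pi_{ab,li}}\right)^2\omega_{ab,li}\,\omega_{ab,li}^T,$$

$$d\hat\Lambda_{l0}(t)=\sum_{i=1}^{n_l}dN_{li}(t)\big/\{n S_{l,m}^{(0)}(\beta,t)\},\qquad d\hat M_{li}(t)=dN_{li}(t)-Y_{li}(t)\,e^{\beta^T Z_{li}}\,d\hat\Lambda_{l0}(t),$$

$$\widehat{\mathrm{Cov}}(U_1^m,U_2^m)=\{U_{1,n}^m\}^T U_{2,n}^m.$$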

Steps 1 to 5 in Section 2.3 can be applied similarly to the approximate Bayesian approach for multiple missing patterns.

3. Asymptotic properties

We now study the asymptotic properties of p(θ|θ^) in this section. To establish the consistency and the asymptotic normality of the IPW estimator for the stratified PH model, we assume the following conditions:

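C1 $P\{Y_{li}(t)=1\}>0$ for $t\in[0,\tau]$, $l=1,\ldots,L$ and $i=1,\ldots,n_l$;

C2 $|Z_{lik}(0)|+\int_0^\tau|dZ_{lik}(t)|<D_z<\infty$, $l=1,\ldots,L$, $i=1,\ldots,n_l$ and $k=1,\ldots,p$ almost surely, where $D_z$ is a constant;

C3 For $d=0,1,2$, there exists a neighborhood $B$ of $\beta_0$ such that $s_l^{(d)}(\beta,t)$ is continuous and $\sup_{t\in[0,\tau],\beta\in B}\|S_l^{(d)}(\beta,t)-s_l^{(d)}(\beta,t)\|\xrightarrow{p}0$ for $l=1,\ldots,L$, where $\xrightarrow{p}$ denotes convergence in probability;

C4 The matrix $I_l(\beta)=\int_0^\tau v_l(\beta,t)s_l^{(0)}(\beta,t)\lambda_{0l}(t)\,dt$ is positive definite for $l=1,\ldots,L$, where $v_l(\beta,t)=s_l^{(2)}(\beta,t)/s_l^{(0)}(\beta,t)-e_l(\beta,t)^{\otimes 2}$ and $e_l(\beta,t)=s_l^{(1)}(\beta,t)/s_l^{(0)}(\beta,t)$;

C5 The matrix $V_l^{\phi}$ is positive definite and $\pi_{li}\ge\epsilon>0$ for $i=1,\ldots,n_l$ and $l=1,\ldots,L$, where $V_l^{\phi}=E\{(\xi_{l1}-\pi_{l1})\omega_{l1}^T\}^{\otimes 2}$;

C6 For all $\beta\in B$, $t\in[0,\tau]$, $S_l^{(1)}(\beta,t)=\partial S_l^{(0)}(\beta,t)/\partial\beta$, and $S_l^{(2)}(\beta,t)=\partial^2 S_l^{(0)}(\beta,t)/\partial\beta\,\partial\beta^T$, where $S_l^{(d)}(\beta,t)$, $d=0,1,2$, are continuous functions of $\beta\in B$ uniformly in $t\in[0,\tau]$ and are bounded on $B\times[0,\tau]$, and $s_l^{(0)}$ is bounded away from zero on $B\times[0,\tau]$;

C7 $\int_0^\tau\lambda_{0l}(t)\,dt<\infty$ for $l=1,\ldots,L$;

C8 $\lim_{n\to\infty}n_l/n=q_l$, where $q_l\in(0,1)$ for all $l=1,\ldots,L$;

C9 As $n\to\infty$, $\sup_{\theta\in\Theta}\|U_n(\theta)-\eta(\theta)\|\xrightarrow{p}0$, where $\Theta$ is the parameter space;

C10 The map $\theta\mapsto U_n(\theta)$ is continuous and has exactly one zero $\hat\theta$ with probability one;

C11 Equation $\eta(\theta)=0$ has exactly one root at $\theta=\theta_0$;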

C12 There exists a neighborhood of θ0, denoted by Jn(θ0), on which with probability one all Un(θ) are continuously differentiable and the Jacobian Un(θ)/θ converges uniformly to a non-stochastic limit which is non-singular. Here, Jn(θ0) is a ball with center θ0 and radius rn satisfies rn and rnn;

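C13 For any $\theta\in J_n(\theta_0)$, given $\theta$:

$$\sqrt{n}\{U_n(\theta)-\eta(\theta)\}\xrightarrow{d}N(0,\Sigma(\theta))$$

holds for some $\Sigma(\theta)=\mathrm{Var}\{\sqrt{n}\,U_n(\theta)\mid\theta\}$ that is positive definite and independent of $n$.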

Conditions C1–C8 are the standard conditions for the consistency and asymptotic normality of $\hat\beta$ [1,41]. Conditions C9–C13 are needed for the asymptotic properties of the joint IPW estimator. More specifically, condition C9 holds as long as the samples satisfy some moment conditions. Conditions C10 and C11 ensure the existence and uniqueness of the solutions to $U_n=0$. Condition C12 regulates the derivatives of $U_n$ and ensures its covariance converges. Condition C13 provides the asymptotic distribution for the estimating equation. The proof of C13 can be found in Theorem 6 of Yuan and Jennrich (1998) [43], who studied the large-sample properties, including the existence, strong consistency, and asymptotic normality, of estimators generated from samples that are not necessarily identically distributed under very general assumptions. Under Conditions C1–C13, we can show that $\hat\theta$ is a consistent estimator for $\theta_0$ and that $\sqrt{n}(\hat\theta-\theta_0)$ is asymptotically normally distributed with mean 0 and covariance matrix $B^{-1}(\theta_0)\Sigma(\theta_0)B^{-1}(\theta_0)$, where $B(\theta)=\partial\eta(\theta)/\partial\theta$.

Following Sang and Kim [32], we assume the following conditions to establish the posterior consistency and asymptotic normality.

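C14 The prior $\eta\mapsto\pi(\eta)$ is positive and Lipschitz continuous over the parameter space;

C15 For $\theta\in J_n(\theta_0)$, the variance estimator $\hat\Sigma(\theta)$ satisfies $\hat\Sigma(\theta)=\Sigma(\theta)\{1+o_p(1)\}$, where $\hat\Sigma(\theta)$ is provided in the Appendix;

C16 For any $\theta\in J_n(\theta_0)$, the mapping $\theta\mapsto|\Sigma(\theta)|^{-1}$ is Lipschitz continuous. Also, the mapping $\theta\mapsto x^T\{\Sigma(\theta)\}^{-1}x$ is Lipschitz continuous in the sense that there exists a constant $C(x)$ satisfying $\|x^T\{\Sigma(\theta_1)\}^{-1}x-x^T\{\Sigma(\theta_2)\}^{-1}x\|\le C(x)\|\theta_1-\theta_2\|$, for any $\theta_1,\theta_2\in J_n(\theta_0)$, for all $x\in R^p$, where $p=\dim(Z)$. In addition, $C(x)$ is also Lipschitz continuous;

C17 $\theta\mapsto U_n(\theta)$ and $\theta\mapsto\eta(\theta)$ are one-to-one functions for any $\theta\in J_n(\theta_0)$. Also, $\theta\mapsto\eta(\theta)$ is Lipschitz continuous.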

Condition C14 is a standard assumption for the prior and the flat prior satisfies this condition. Condition C15 implies the covariance estimator should be consistent. Conditions C16 to C17 are the sufficient conditions for the posterior distribution to be approximated by the proposed method. Soubeyrand and Haon-Lasportes [35] also used similar conditions to C14 and C16 to justify their approximate Bayesian computation methods. All the conditions can be easily satisfied if we assume covariance estimator is Lipschitz continuous in θ and has bounded eigenvalues as discussed in Sang and Kim [32].

Similarly to Xu et al. [41], we can establish the following asymptotic property for β^ under the stratified PH model:

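Theorem 1 Assume Conditions C1-C8 in Section 3.

1. Assume $\pi_{li}$ is unknown and correctly specified. Then, $\hat\beta$ is consistent for $\beta_0$, and $\sqrt{n}(\hat\beta-\beta_0)$ is asymptotically normally distributed with mean 0 and covariance matrix

$$V_\beta=\{I(\beta_0)\}^{-1}\{\Sigma_{\beta_0}-\Sigma_{\phi_0\beta_0}\}\{I(\beta_0)\}^{-1}, \qquad (13)$$

where

$$\begin{aligned}
I(\beta)&=\sum_{l=1}^{L}q_l I_l(\beta),\qquad I_l(\beta)=\int_0^\tau v_l(\beta,t)\,s_l^{(0)}(\beta,t)\,\lambda_{0l}(t)\,dt,\\
v_l(\beta,t)&=s_l^{(2)}(\beta,t)/s_l^{(0)}(\beta,t)-e_l(\beta,t)^{\otimes 2},\qquad e_l(\beta,t)=s_l^{(1)}(\beta,t)/s_l^{(0)}(\beta,t),\\
s_l^{(d)}(\beta,t)&=E\{S_l^{(d)}(\beta,t)\}\ \text{for } d=0,1,2,\\
\Sigma_\beta&=\sum_{l=1}^{L}q_l\,E\left[\frac{\xi_{l1}}{\pi_{l1}}\int_0^\tau\{Z_{l1}-e_l(\beta,t)\}\,dM_{l1}(t)\right]^{\otimes 2},\\
\Sigma_{\phi\beta}&=\sum_{l=1}^{L}q_l\,V_l^{\phi\beta}\,V_l^{\phi}\,(V_l^{\phi\beta})^T,\\
V_l^{\phi\beta}&=E\left[\frac{\xi_{l1}}{\pi_{l1}^2}\int_0^\tau\{Z_{l1}-e_l(\beta,t)\}\,dM_{l1}(t)\,\frac{\partial}{\partial\phi^T}\pi_{l1}\right],\\
V_l^{\phi}&=E\{(\xi_{l1}-\pi_{l1})\omega_{l1}^T\}^{\otimes 2},\\
dM_{li}(t)&=dN_{li}(t)-Y_{li}(t)\exp(\beta^T Z_{li})\,d\Lambda_{0l}(t),\qquad \lim_{n\to\infty}n_l/n=q_l.
\end{aligned}$$

2. If $\pi_{li}$ is known, $\sqrt{n}(\hat\beta-\beta_0)$ is asymptotically normally distributed with mean 0 and covariance matrix $\{I(\beta_0)\}^{-1}\Sigma_{\beta_0}\{I(\beta_0)\}^{-1}$.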

Its proof is a straightforward extension of Theorem 2 of Xu et al. [41] to the stratified PH model and is thus omitted. Using (13), one can develop a plug-in variance estimator of $\hat\beta$, but it can involve considerable computation.

Next we have the following asymptotic property for estimators from the stratified PH model with two missing covariates having multiple missing patterns as in Section 2.4.

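Theorem 2 Assume Conditions C1-C8 in Section 3.

1. Assume $\pi_{ab,li}$ is unknown and correctly specified. Then, $\hat\beta_m$ is consistent for $\beta_{m0}$, and $\sqrt{n}(\hat\beta_m-\beta_{m0})$ is asymptotically normally distributed with mean 0 and covariance matrix

$$V_{\beta_m}=\{I_m(\beta_{m0})\}^{-1}\{\Sigma_{m\beta_{m0}}-\Sigma_{m\phi_{m0}\beta_{m0}}\}\{I_m(\beta_{m0})\}^{-1},$$

where

$$\begin{aligned}
I_m(\beta)&=\sum_{l=1}^{L}q_l\int_0^\tau v_{lm}(\beta,t)\,s_{l,m}^{(0)}(\beta,t)\,\lambda_{l0}(t)\,dt,\\
v_{lm}(\beta,t)&=s_{l,m}^{(2)}(\beta,t)/s_{l,m}^{(0)}(\beta,t)-e_{l,m}(\beta,t)^{\otimes 2},\qquad e_{l,m}(\beta,t)=s_{l,m}^{(1)}(\beta,t)/s_{l,m}^{(0)}(\beta,t),\\
s_{l,m}^{(d)}(\beta,t)&=E\{S_{l,m}^{(d)}(\beta,t)\}\ \text{for } d=0,1,2,\\
\Sigma_{m\beta}&=\sum_{l=1}^{L}q_l\,E\left[\frac{\xi_{l1}}{\pi_{11,l1}}\int_0^\tau\{Z_{l1}-e_{l,m}(\beta,t)\}\,dM_{l1}(t)\right]^{\otimes 2},\\
\Sigma_{m\phi\beta}&=V_m^{\phi\beta}\,V_m^{\phi}\,(V_m^{\phi\beta})^T,\qquad V_m^{\phi\beta}=\left((V_{10}^{\phi\beta})^T,(V_{01}^{\phi\beta})^T,(V_{00}^{\phi\beta})^T\right)^T,\\
V_{ab}^{\phi\beta}&=\sum_{l=1}^{L}q_l\,E\left[\frac{\xi_{l1}}{\pi_{11,l1}}\int_0^\tau\{Z_{l1}-e_{l,m}(\beta,t)\}\,dM_{l1}(t)\;\delta_{ab,l1}\left\{\xi_{l1}-\frac{\pi_{11,l1}}{\pi_{11,l1}+\pi_{ab,l1}}\right\}\omega_{ab,l1}^T\right],\\
V_m^{\phi}&=\begin{pmatrix}V_{10,10}^{\phi} & V_{10,01}^{\phi} & V_{10,00}^{\phi}\\ V_{01,10}^{\phi} & V_{01,01}^{\phi} & V_{01,00}^{\phi}\\ V_{00,10}^{\phi} & V_{00,01}^{\phi} & V_{00,00}^{\phi}\end{pmatrix},\\
V_{ab,a'b'}^{\phi}&=\sum_{l=1}^{L}q_l\,E\left(\delta_{ab,l1}\,\delta_{a'b',l1}\left\{\xi_{l1}-\frac{\pi_{11,l1}}{\pi_{11,l1}+\pi_{ab,l1}}\right\}\left\{\xi_{l1}-\frac{\pi_{11,l1}}{\pi_{11,l1}+\pi_{a'b',l1}}\right\}\times\omega_{ab,l1}^T\,\omega_{a'b',l1}\right),\\
dM_{li}(t)&=dN_{li}(t)-Y_{li}(t)\exp(\beta^T Z_{li})\,d\Lambda_{l0}(t),\qquad q_l=\lim_{n\to\infty}n_l/n.
\end{aligned}$$

2. If $\pi_{ab,li}$ is known, $\sqrt{n}(\hat\beta_m-\beta_{m0})$ is asymptotically normally distributed with mean 0 and covariance matrix $\{I_m(\beta_{m0})\}^{-1}\Sigma_{m\beta_{m0}}\{I_m(\beta_{m0})\}^{-1}$.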

The proof of Theorem 2 is similar to that of Theorem 1 and is thus omitted. The asymptotics for more than two missing covariates can be similarly established. We now state the main theorem on $p(\theta\mid\hat\theta)$:

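Theorem 3 Let $\hat\theta$ be the solution to $U_n(\theta)=0$. Under Conditions C1-C17, the posterior distribution $p(\theta\mid\hat\theta)$, generated by the two-step method above, satisfies

$$p(\theta\mid\hat\theta)\to\psi_{\hat\theta,\mathrm{Var}(\hat\theta)}(\theta), \qquad (14)$$

$$p\left(\lim_{n\to\infty}\int_{J_n(\theta_0)}\psi_{\hat\theta,\mathrm{Var}(\hat\theta)}(\theta)\,d\theta\right)=1, \qquad (15)$$

where $\psi_{\hat\theta,\mathrm{Var}(\hat\theta)}(\cdot)$ is the density of the normal distribution with mean $\hat\theta$ and variance $\mathrm{Var}(\hat\theta)$.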

Its proof is similar to the proof of Theorem 4.1 of Sang and Kim [32] and is thus omitted. Results (14) and (15) show the convergence of the posterior distribution to a normal distribution and the posterior consistency, respectively. In particular, (14) implies that the credible region from the proposed Bayesian method is asymptotically equivalent to the frequentist confidence region based on the asymptotic normality of $\hat\theta$. Thus, our proposed Bayesian method is calibrated to frequentist inference.

The proposed Bayesian estimators for $\phi$ and $\beta$ can be obtained as the medians of the draws from the approximate posterior distribution. Because the posterior distribution is approximately normal by Theorem 3, one can construct the credible region using the equal-tailed credible interval (ETI) or the level-$\alpha$ Bayesian highest posterior density (HPD) credible region $C(\alpha)=\{\theta: p(\theta\mid\hat\theta)\ge j(\alpha)\}$, where $j(\alpha)$ is the corresponding density threshold [7].
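Given a set of posterior draws for one coordinate of $\theta$, the posterior median and the equal-tailed interval reduce to sample quantiles. The following is a minimal sketch; the helper names `quantile` and `summarize_draws`, the linear-interpolation quantile rule, and the example draws are illustrative assumptions rather than anything prescribed by the method.

```python
import math
import statistics


def quantile(sorted_values, q):
    """Sample quantile by linear interpolation between order statistics."""
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = position - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac


def summarize_draws(draws, level=0.95):
    """Posterior median and equal-tailed credible interval from draws."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    return {
        "median": statistics.median(ordered),
        "lower": quantile(ordered, tail),
        "upper": quantile(ordered, 1.0 - tail),
    }


# Hypothetical draws: the integers 0, 1, ..., 100 stand in for posterior samples.
summary = summarize_draws([float(k) for k in range(101)])
```

With 1000 independent draws per parameter, as used in the simulations, each tail of a 95% ETI is determined by roughly 25 draws.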

4. Simulation

We conducted two simulation studies to investigate the finite sample properties of the approximate Bayesian method and the IPW method for stratified data. We compared them with the CC method.

In the first simulation, we considered a stratified PH model with two strata, i.e., $L=2$. Two covariates were generated for each stratum: $Z_{11}$ from the Bernoulli distribution with probability 0.4 and $Z_{12}$ from the standard normal distribution for stratum 1; $Z_{21}$ from the Bernoulli distribution with probability 0.6 and $Z_{22}$ from the normal distribution with mean 1 and standard deviation 0.7 for stratum 2. Event times were generated based on the stratified PH model (1). We considered $\lambda_{10}(t)=4t^3$ for stratum 1 and $\lambda_{20}(t)=2/3\,t^{2/3}$ for stratum 2. We set $\beta=(\beta_1,\beta_2)^T=\{\log(2),\log(2)\}^T$. Independent of event times, censoring times were generated from a uniform distribution. Two overall event probabilities were examined: 50% and 70%. Some values of $Z_{l1}$ were missing and $Z_{l2}$ for $l=1,2$ was fully observed. The observation indicators $\xi_{li}$ were independently generated from the Bernoulli distribution with probability $\pi_{li}=\exp\{\phi_0+\phi_1 I(l=2)+\phi_2 Z_{l2}+\phi_3\Delta_{li}\}/[1+\exp\{\phi_0+\phi_1 I(l=2)+\phi_2 Z_{l2}+\phi_3\Delta_{li}\}]$ for $l=1,2$, where $(\phi_0,\phi_1,\phi_2,\phi_3)^T=(1.2,1.5,0.5,1)^T$. The missing rates were approximately 60% for stratum 1 and 40% for stratum 2. Thus, the overall rate of missingness was 50%.
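Event times under a PH model with a power-form cumulative baseline hazard can be generated by inverse-transform sampling. For a baseline hazard $4t^3$ the cumulative baseline hazard is $t^4$, so $T=\{-\log(U)/e^{\beta^T Z}\}^{1/4}$ with $U$ uniform on $(0,1)$. The sketch below covers a single stratum only; the function name, the coefficient values, the sample size, and the censoring bound of 1.5 are assumptions for illustration and are not taken from the simulation design.

```python
import math
import random


def draw_event_time(rng, linear_predictor, shape):
    """Draw T whose cumulative baseline hazard is t**shape and whose
    hazard ratio is exp(linear_predictor), by inverse transform."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return (-math.log(u) / math.exp(linear_predictor)) ** (1.0 / shape)


rng = random.Random(2022)
beta = (math.log(2.0), math.log(2.0))  # illustrative coefficients
censor_upper = 1.5                     # hypothetical uniform censoring bound

records = []
for _ in range(5000):
    z1 = 1.0 if rng.random() < 0.4 else 0.0   # Bernoulli(0.4) covariate
    z2 = rng.gauss(0.0, 1.0)                  # standard normal covariate
    lp = beta[0] * z1 + beta[1] * z2
    t = draw_event_time(rng, lp, shape=4.0)   # baseline hazard 4 t^3
    c = rng.uniform(0.0, censor_upper)
    records.append({"x": min(t, c), "delta": int(t <= c), "t": t, "lp": lp})

# Under the model, Lambda_0(T) * exp(lp) follows a unit exponential distribution.
unit_exp = [r["t"] ** 4 * math.exp(r["lp"]) for r in records]
mean_unit_exp = sum(unit_exp) / len(unit_exp)
event_rate = sum(r["delta"] for r in records) / len(records)
```

In a simulation like the one described, the censoring bound would be tuned until the overall event probability reaches the target value.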

Three sample sizes were considered: n = 500, 1000, and 2000. Table 1 summarizes the simulation results based on B = 1000 Monte Carlo samples. For the proposed method, we obtained 1000 posterior medians and calculated the bias of the average of the 1000 medians, their standard deviations (SD), and the average percentage that 95% ETIs include the true parameters (CRE). For the IPW and CC methods, the bias of the average of the estimates, the average of their standard errors (SE), and the 95% coverage rates (CR) were calculated. As seen in Table 1, the average of the posterior medians from the approximate Bayesian method and the average of the IPW estimators were close to the true values. All standard deviations of the approximate Bayesian method are close to the average of standard errors of the IPW method. The average percentages that 95% ETIs include the true parameters and the coverage rates of the IPW method are between 93% and 96%. These results are consistent with Theorem 3. Standard deviations and the average of standard errors are larger when the event rate is lower or the sample size is smaller. In contrast, the CC method is biased and has low coverage rates, ranging from 71% to 90%, for the completely observed covariate Zc, which are well below 95%. Furthermore, this phenomenon becomes more severe as the sample size increases.

Table 1.

Simulation results for stratified survival data

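| Covariate | n | Event rate | Proposed: bias | Proposed: SD | Proposed: CRE | IPW: bias | IPW: SE | IPW: CR | CC: bias | CC: SE | CC: CR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Zm | 500 | 50% | 0.010 | 0.238 | 0.95 | 0.007 | 0.231 | 0.95 | 0.076 | 0.247 | 0.95 |
| Zm | 500 | 70% | 0.003 | 0.200 | 0.95 | 0.003 | 0.197 | 0.94 | 0.055 | 0.200 | 0.95 |
| Zm | 1000 | 50% | 0.009 | 0.165 | 0.95 | 0.008 | 0.162 | 0.95 | 0.077 | 0.172 | 0.94 |
| Zm | 1000 | 70% | 0.000 | 0.139 | 0.94 | 0.000 | 0.138 | 0.94 | 0.052 | 0.139 | 0.93 |
| Zm | 2000 | 50% | 0.003 | 0.114 | 0.94 | 0.002 | 0.113 | 0.95 | 0.070 | 0.120 | 0.92 |
| Zm | 2000 | 70% | 0.000 | 0.097 | 0.95 | 0.000 | 0.096 | 0.96 | 0.050 | 0.097 | 0.92 |
| Zc | 500 | 50% | 0.027 | 0.175 | 0.93 | 0.025 | 0.169 | 0.93 | 0.115 | 0.177 | 0.90 |
| Zc | 500 | 70% | 0.015 | 0.133 | 0.93 | 0.016 | 0.130 | 0.93 | 0.097 | 0.130 | 0.88 |
| Zc | 1000 | 50% | 0.006 | 0.121 | 0.94 | 0.005 | 0.118 | 0.94 | 0.094 | 0.122 | 0.88 |
| Zc | 1000 | 70% | 0.001 | 0.092 | 0.95 | 0.001 | 0.091 | 0.95 | 0.085 | 0.090 | 0.85 |
| Zc | 2000 | 50% | 0.005 | 0.085 | 0.94 | 0.004 | 0.083 | 0.94 | 0.090 | 0.085 | 0.82 |
| Zc | 2000 | 70% | 0.003 | 0.065 | 0.95 | 0.003 | 0.064 | 0.95 | 0.085 | 0.063 | 0.71 |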

SD, standard deviations; SE, average of standard errors; CRE: the equal-tailed credible interval confidence region; CR, 95% coverage rates; IPW, inverse-probability-weighted; CC, complete-case.

We also conducted simulations for correlated covariates, different β’s, and missing rates. Table S1 of the Supplementary Material summarizes the results that are similar to Table 1. When magnitude of β is larger and missing rate is higher, the CC method performed worse.

In the second simulation, we considered a stratified PH model ($L=2$) with two missing covariates. We compared the proposed methods of Section 2.4 with those in Section 2.3 and the CC method. Two covariates, $Z_1$ and $Z_2$, were independently generated from the Bernoulli distribution with probabilities 0.4 and 0.5 for stratum 1 and with probabilities 0.6 and 0.4 for stratum 2. One covariate $Z_3$ was generated from a uniform distribution on [0, 1]. Event times were generated from the stratified PH model (1). We considered constant baseline hazards $\lambda_{10}(t)=1$ for stratum 1 and $\lambda_{20}(t)=2$ for stratum 2. We set $\beta=(\beta_1,\beta_2,\beta_3)^T=(0.3,0.3,-0.3)^T$ and $(0.7,0.7,-0.7)^T$. Independent of event times, censoring times were generated from a uniform distribution. The overall event probability was 55%: 47% for stratum 1 and 63% for stratum 2. Two covariates $Z_1$ and $Z_2$ were subject to missingness. Let $\eta_1$ and $\eta_2$ be the observation indicators for $Z_1$ and $Z_2$, respectively. There were four possible missing categories: 1) both covariates are fully observed ($v_{11}=I(\eta_1=1,\eta_2=1)$), 2) only $Z_2$ is missing ($v_{10}=I(\eta_1=1,\eta_2=0)$), 3) only $Z_1$ is missing ($v_{01}=I(\eta_1=0,\eta_2=1)$), and 4) both $Z_1$ and $Z_2$ are missing ($v_{00}=I(\eta_1=0,\eta_2=0)$). The missing indicators $v_{ab}$ were independently generated from the multinomial distribution with probability $\pi_{ab,i}=\exp\{\phi_{ab}^T\omega_{ab}\}/(1+\sum_{(a',b')\neq(1,1)}\exp\{\phi_{a'b'}^T\omega_{a'b'}\})$ for $a,b=0,1$, where $\phi_{10}=(\phi_{10,0},\phi_{10,1},\phi_{10,2},\phi_{10,3},\phi_{10,4})^T=(3,2,3,2,2)^T$, $\phi_{01}=(\phi_{01,0},\phi_{01,1},\phi_{01,2},\phi_{01,3},\phi_{01,4})^T=(1.2,2,2,0.2,0.1)^T$, $\phi_{00}=(\phi_{00,0},\phi_{00,1},\phi_{00,2},\phi_{00,3})^T=(0.3,1.5,0.9,0.1)^T$, $\omega_{10}=(1,\Delta,Z_1,Z_3,I(l=2))^T$, $\omega_{01}=(1,\Delta,Z_2,Z_3,I(l=2))^T$, and $\omega_{00}=(1,\Delta,Z_3,I(l=2))^T$. Then, the overall probabilities for $v_{11}$, $v_{10}$, $v_{01}$, and $v_{00}$ were 17%, 13%, 20%, and 50%, respectively: 17%, 14%, 29%, and 40% for stratum 1 and 17%, 11%, 12%, and 60% for stratum 2. Two sample sizes were examined: n = 1000 and 1500.

Table 2 shows that estimates for the proposed approximate Bayesian and IPW methods for multiple missing patterns are approximately unbiased. The average percentages of 95% ETIs and coverage rates of the IPW method are close to nominal level 95%. However, the CC method and the approximate Bayesian method/IPW for a single missing pattern have biases, where the coverage rates for the missing covariates Z1m and Z2m are from 60% to 87%. When the sample size increases, coverage rates become lower and further away from 95%.

Table 2.

Simulation results for multiple missing patterns

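| n | β | Covariate | MM proposed: bias | MM proposed: SD | MM proposed: CRE | MM IPW: bias | MM IPW: SE | MM IPW: CR | CC: bias | CC: SE | CC: CR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1000 | 0.3 | Z1m | 0.001 | 0.118 | 0.95 | 0.001 | 0.118 | 0.95 | −0.122 | 0.103 | 0.78 |
| 1000 | 0.3 | Z2m | 0.005 | 0.117 | 0.95 | 0.005 | 0.116 | 0.95 | 0.048 | 0.099 | 0.93 |
| 1000 | −0.3 | Zc | −0.006 | 0.207 | 0.93 | −0.006 | 0.199 | 0.93 | 0.041 | 0.171 | 0.94 |
| 1000 | 0.7 | Z1m | 0.007 | 0.121 | 0.94 | 0.007 | 0.119 | 0.94 | −0.144 | 0.106 | 0.72 |
| 1000 | 0.7 | Z2m | 0.012 | 0.119 | 0.94 | 0.012 | 0.118 | 0.94 | 0.015 | 0.102 | 0.96 |
| 1000 | −0.7 | Zc | −0.012 | 0.207 | 0.94 | −0.012 | 0.200 | 0.94 | 0.067 | 0.174 | 0.92 |
| 1500 | 0.3 | Z1m | 0.003 | 0.095 | 0.94 | 0.003 | 0.097 | 0.95 | −0.123 | 0.084 | 0.69 |
| 1500 | 0.3 | Z2m | 0.002 | 0.097 | 0.94 | 0.002 | 0.096 | 0.94 | 0.047 | 0.081 | 0.91 |
| 1500 | −0.3 | Zc | −0.005 | 0.166 | 0.95 | −0.005 | 0.164 | 0.96 | 0.041 | 0.138 | 0.94 |
| 1500 | 0.7 | Z1m | 0.004 | 0.100 | 0.93 | 0.004 | 0.098 | 0.94 | −0.148 | 0.086 | 0.60 |
| 1500 | 0.7 | Z2m | 0.002 | 0.099 | 0.94 | 0.002 | 0.097 | 0.94 | 0.012 | 0.083 | 0.94 |
| 1500 | −0.7 | Zc | −0.012 | 0.172 | 0.94 | −0.011 | 0.165 | 0.94 | 0.067 | 0.141 | 0.92 |

| n | β | Covariate | SM proposed: bias | SM proposed: SD | SM proposed: CRE | SM IPW: bias | SM IPW: SE | SM IPW: CR |
|---|---|---|---|---|---|---|---|---|
| 1000 | 0.3 | Z1m | −0.150 | 0.115 | 0.72 | −0.150 | 0.113 | 0.75 |
| 1000 | 0.3 | Z2m | 0.102 | 0.111 | 0.83 | 0.101 | 0.107 | 0.84 |
| 1000 | −0.3 | Zc | 0.069 | 0.197 | 0.99 | 0.069 | 0.186 | 0.92 |
| 1000 | 0.7 | Z1m | −0.143 | 0.115 | 0.74 | −0.143 | 0.114 | 0.75 |
| 1000 | 0.7 | Z2m | 0.085 | 0.109 | 0.88 | 0.085 | 0.107 | 0.87 |
| 1000 | −0.7 | Zc | 0.063 | 0.194 | 0.99 | 0.063 | 0.186 | 0.92 |
| 1500 | 0.3 | Z1m | −0.151 | 0.093 | 0.61 | −0.151 | 0.093 | 0.63 |
| 1500 | 0.3 | Z2m | 0.102 | 0.090 | 0.78 | 0.102 | 0.088 | 0.79 |
| 1500 | −0.3 | Zc | 0.072 | 0.155 | 0.99 | 0.072 | 0.152 | 0.91 |
| 1500 | 0.7 | Z1m | −0.146 | 0.096 | 0.64 | −0.147 | 0.093 | 0.64 |
| 1500 | 0.7 | Z2m | 0.081 | 0.090 | 0.85 | 0.080 | 0.088 | 0.84 |
| 1500 | −0.7 | Zc | 0.067 | 0.159 | 0.99 | 0.067 | 0.152 | 0.92 |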

MM, multiple missing patterns; SM, a single missing pattern; IPW, inverse-probability-weighted; CC, complete-case; SD, standard deviation; SE, average of standard errors; CRE, coverage rate of the 95% equal-tailed credible interval; CR, coverage rate of the 95% confidence interval.

5. Real data application

We applied the proposed approximate Bayesian and IPW methods to the following two registry data sets: 1) the stem cell transplantation (HCT) data that Dreger et al. [11] analyzed to study patients with DLBCL; 2) the HCT data that Pidala et al. [26] investigated to study patients with myelodysplastic syndrome (MDS). The DLBCL data call for the stratified PH model with a single missing covariate, and the MDS data call for the stratified PH model with two missing covariates. Thus, we applied the methods of Sections 2.2 and 2.3 to the DLBCL data, and we used the methods of Section 2.4 for the MDS data.

5.1. Stratified PH model with a single missing covariate

The DLBCL data [11] consisted of 1,394 adult patients. Overall survival was the outcome of interest for the analysis. The numbers of patients who died and who were censored are 725 (52%) and 669 (48%), respectively. Among the 1,394 patients, there are 127 patients (9%) with haploidentical donors (haplo-HCT); 509 patients (37%) with MSD; 388 patients (28%) with MUD with T-cell depletion; and 370 patients (26%) with MUD without T-cell depletion. In HCT studies, clinicians are often interested in evaluating the effects of remission status at time of HCT (complete, partial, refractory), age group (18–49, 50–59, >60), year of transplant (2008–2010, 2011–2012, 2013–2015), hematopoietic cell transplant-comorbidity index (HCT-CI) (0, 1–2, ≥3), and Karnofsky performance score (<90, ≥90) on the outcome because of their clinical importance (Kumar et al. [21], Papanicolaou et al. [25], and Ustun et al. [37]). Thus, we adjusted for these five covariates in the model. Four hundred eighty-four patients (35%) have missing values in HCT-CI. We tested the PH assumption for each covariate by testing whether the coefficient of log t × Z is equal to zero [20]. Remission status at time of transplant did not satisfy the PH assumption at the 0.05 significance level (p-value = 0.0047). Thus, we stratified the PH model according to remission status.

We fitted a logistic regression model for the propensity score, allowing a different intercept for each stratum. Six variables, namely the stratum variable, year of transplant, age group, donor type, death indicator, and time to death, were statistically significant at the 0.05 significance level. We used these six variables to calculate the propensity scores. We then fitted the approximate Bayesian method, the IPW method, and the CC method.
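As a rough illustration of the propensity score step, the sketch below fits a logistic regression with one intercept per stratum by Newton-Raphson and forms the inverse probability weights ξ/π. It is a minimal sketch on toy data, not the authors' implementation: the function names, the toy data-generating values, and the use of a single covariate (a death indicator) are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum likelihood logistic regression via Newton-Raphson."""
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ coef)))
        score = X.T @ (y - p)                          # gradient of log-likelihood
        info = (X * (p * (1.0 - p))[:, None]).T @ X    # observed information
        step = np.linalg.solve(info, score)
        coef = coef + step
        if np.max(np.abs(step)) < tol:
            break
    return coef

def propensity_weights(stratum, covariates, observed):
    """Return fitted propensity scores and IPW weights (observed / pi_hat)."""
    levels = np.unique(stratum)
    # One indicator column per stratum acts as a stratum-specific intercept.
    intercepts = (stratum[:, None] == levels[None, :]).astype(float)
    X = np.column_stack([intercepts, covariates])
    phi = fit_logistic(X, observed.astype(float))
    pi_hat = 1.0 / (1.0 + np.exp(-(X @ phi)))
    return pi_hat, observed / pi_hat

# Toy data (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
n = 2000
stratum = rng.integers(1, 4, size=n)          # three strata
delta = rng.binomial(1, 0.5, size=n)          # death indicator
true_lp = 0.5 + 0.3 * (stratum == 2) - 0.4 * (stratum == 3) + 0.8 * delta
observed = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_lp)))
pi_hat, weights = propensity_weights(stratum, delta[:, None], observed)
```

Subjects with a missing covariate receive weight 0, and observed subjects are up-weighted by the inverse of their estimated probability of being observed.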

Table 3 reports the analysis results, including i) the posterior median β˜m and 95% equal-tailed credible interval (ETI) for the approximate Bayesian method and ii) β^ and its 95% confidence interval for the IPW method and the CC method. As expected, the results from the approximate Bayesian method and the IPW method were similar, although the 95% confidence intervals of the IPW method are in general slightly wider than the 95% ETIs of the approximate Bayesian method. The effects of donor group, HCT-CI score, and Karnofsky score from the approximate Bayesian method and the IPW method were similar to those from the CC method. However, the effects of year of transplant were different: based on the approximate Bayesian method and the IPW method, patients who had HCT from 2008 to 2010 were more likely to die after HCT than those who had HCT from 2013 to 2015. In contrast, although year of transplant did not reach statistical significance in the CC model, the CC parameter estimates for 2008-2010 and 2011-2012 were negative, which implies that patients who received a transplant in recent years (2013-2015) experienced worse survival than those transplanted in earlier years (2008-2012). In DLBCL studies, patients who received HCT in recent years have generally had better outcomes than those who received HCT in earlier years [22,2,33]. Thus, the results on year of transplant from the CC method are counter-intuitive and contradict the current HCT literature, whereas the year of transplant effects from the approximate Bayesian method and the IPW method are consistent with it.

Table 3.

Stem cell transplantation data analysis for stratified survival data with a single missing covariate

| Variable | Approximate Bayesian: β˜m | Approximate Bayesian: 95% ETI | IPW method: β^ | IPW method: 95% CI | CC method: β^ | CC method: 95% CI |
|---|---|---|---|---|---|---|
| Donor group | | | | | | |
| HD (ref) | 0 | | 0 | | 0 | |
| MSD | −0.141 | (−0.434, 0.193) | −0.143 | (−0.450, 0.164) | −0.134 | (−0.400, 0.132) |
| MUD WTD | 0.052 | (−0.262, 0.401) | 0.053 | (−0.282, 0.387) | 0.038 | (−0.236, 0.313) |
| MUD WOTD | −0.092 | (−0.385, 0.252) | −0.088 | (−0.411, 0.235) | −0.132 | (−0.407, 0.144) |
| Year of transplant | | | | | | |
| 2013-2015 (ref) | 0 | | 0 | | 0 | |
| 2008-2010 | 0.231 | (0.012, 0.443) | 0.222 | (0.009, 0.435) | −0.036 | (−0.222, 0.150) |
| 2011-2012 | 0.158 | (−0.070, 0.378) | 0.152 | (−0.076, 0.381) | −0.043 | (−0.232, 0.146) |
| Age group | | | | | | |
| 18-49 (ref) | 0 | | 0 | | 0 | |
| 50-59 | 0.104 | (−0.127, 0.336) | 0.105 | (−0.130, 0.339) | 0.130 | (−0.055, 0.316) |
| >60 | 0.202 | (−0.027, 0.436) | 0.202 | (−0.028, 0.433) | 0.205 | (0.014, 0.396) |
| HCT-CI | | | | | | |
| 0 (ref) | 0 | | 0 | | 0 | |
| 1-2 | 0.029 | (−0.204, 0.270) | 0.030 | (−0.207, 0.267) | 0.041 | (−0.178, 0.259) |
| ≥ 3 | 0.215 | (0.001, 0.441) | 0.217 | (−0.004, 0.438) | 0.244 | (0.032, 0.455) |
| Karnofsky score | | | | | | |
| ≥ 90 (ref) | 0 | | 0 | | 0 | |
| < 90 | 0.198 | (0.010, 0.387) | 0.199 | (0.013, 0.385) | 0.274 | (0.120, 0.429) |

IPW, inverse-probability-weighted; CC, complete-case; ref, reference group; ETI, equal-tailed credible interval; HD, Haploidentical donors; MSD, matched sibling donors; MUD WTD, matched unrelated donors with T-cell depletion; MUD WOTD, matched unrelated donors without T-cell depletion.
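The coefficients in Table 3 are log hazard ratios, so exponentiating an estimate and its interval limits gives the corresponding hazard ratio scale. The short sketch below does this for the 2008-2010 versus 2013-2015 row of the approximate Bayesian column; the helper name `hazard_ratio` is hypothetical and is introduced only for illustration.

```python
import math

def hazard_ratio(log_hr, lower, upper):
    """Exponentiate a log hazard ratio and its interval limits."""
    return tuple(math.exp(v) for v in (log_hr, lower, upper))

# Approximate Bayesian estimate for year of transplant 2008-2010 (Table 3).
hr, hr_lower, hr_upper = hazard_ratio(0.231, 0.012, 0.443)
print(f"HR = {hr:.2f}, 95% ETI = ({hr_lower:.2f}, {hr_upper:.2f})")
```

An interval on the hazard ratio scale that lies entirely above 1 corresponds to an interval on the log scale that excludes 0, which is how the year of transplant effect is read in the text.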

We conducted a sensitivity analysis by examining various propensity score models to investigate the missing-not-at-random assumption. In these analyses, we observed that patients who received HCT in recent years had worse outcomes than those who received HCT in earlier years, which is inconsistent with the current HCT literature. Thus, the MAR assumption appears to be reasonable for this data set.

5.2. Stratified PH model with two missing covariates

The MDS data [26] consisted of 787 adults or children with a diagnosis of MDS who underwent a first myeloablative unrelated bone marrow or peripheral blood stem cell transplantation between 1999 and 2011. The outcome of interest in this analysis was overall survival. The numbers of events and censored observations are 418 (53%) and 369 (47%), respectively. Two covariates, HLA-DPB1 classification according to T-cell epitope grouping (HLAD) and Karnofsky performance score (KPS), have missing values. The original analysis conducted by Pidala et al. [26] excluded patients with missing HLAD. The numbers of patients who had missing values in both HLAD and KPS, only HLAD, and only KPS are 17, 330, and 45, respectively. The overall missing rate is about 50% (= 392/787 × 100). Among the 787 patients, there are 64 patients (8%) with fully matched HLAD; 229 patients (29%) with permissive HLAD; 67 patients (9%) with GvH non-permissive HLAD; 80 patients (10%) with HvG non-permissive HLAD; and 347 patients (44%) with missing HLAD. The covariates of interest include graft type (bone marrow, peripheral blood), race (Caucasian, others), age group (<20, 20–49, >50), year of transplant (1999–2002, 2003–2006, 2007–2011), and KPS (<90, ≥90). We tested the PH assumption for each covariate by testing whether the coefficient of log t × Z is equal to zero [20]. Graft type and year of transplant did not satisfy the PH assumption at the 0.05 significance level. Thus, we fitted a stratified PH model. We used the approximate Bayesian and IPW methods for multiple missing patterns of Section 2.4 and compared them with the approximate Bayesian and IPW methods for a single missing pattern of Sections 2.2 and 2.3 and with the CC method.
None of the covariates were significant at the 0.05 significance level in the propensity score models for two of the missing categories, namely 1) only KPS missing and 2) both HLAD and KPS missing. In contrast, three variables (year of transplant, death indicator, and time to death) were significant in the propensity score model for only HLAD missing. Thus, applying the method for multiple missing patterns appeared to be more appropriate than using the method for a single missing pattern.

Table 4 reports the analysis results, including i) the posterior median β˜m and its 95% equal-tailed credible interval (ETI) for the approximate Bayesian method with multiple missing patterns and with a single missing pattern and ii) β^ and its 95% confidence interval (CI) for the IPW method and the CC method. The results of the approximate Bayesian and IPW methods with multiple missing patterns and with a single missing pattern were similar for KPS and race. However, the results for HLAD and age group differ between the approaches for multiple missing patterns and those for a single missing pattern. While the effect of the permissive classification group compared with the fully matched classification group is positive under the approximate Bayesian and IPW methods for multiple missing patterns, it is negative under the approximate Bayesian and IPW methods for a single missing pattern. Under the approximate Bayesian and IPW methods for multiple missing patterns, the 95% CI and ETI for the age 20-49 group did not contain 0, whereas they contained 0 under a single missing pattern. For the CC method, the 95% CI for the race "others" group did not contain 0, while the 95% CI and ETI from the approximate Bayesian and IPW methods for multiple missing patterns contained 0.

Table 4.

Stem cell transplantation data analysis for multiple missing patterns

| Variable | MM Approximate Bayesian: β˜m | MM Approximate Bayesian: 95% ETI | MM IPW method: β^ | MM IPW method: 95% CI | CC method: β^ | CC method: 95% CI |
|---|---|---|---|---|---|---|
| HLAD | | | | | | |
| Fully matched (ref) | 0 | | 0 | | 0 | |
| Permissive | 0.076 | (−0.297, 0.536) | 0.081 | (−0.317, 0.480) | −0.049 | (−0.445, 0.347) |
| GvH non-permissive | 0.354 | (−0.182, 0.845) | 0.344 | (−0.151, 0.839) | 0.286 | (−0.182, 0.753) |
| HvG non-permissive | 0.080 | (−0.465, 0.595) | 0.081 | (−0.419, 0.581) | 0.063 | (−0.396, 0.522) |
| Karnofsky score | | | | | | |
| 90 - 100% (ref) | 0 | | 0 | | 0 | |
| < 90% | 0.450 | (0.123, 0.738) | 0.446 | (0.136, 0.755) | 0.443 | (0.157, 0.728) |
| Race | | | | | | |
| Caucasian (ref) | 0 | | 0 | | 0 | |
| others | 0.440 | (−0.235, 0.898) | 0.438 | (−0.088, 0.963) | 0.483 | (0.018, 0.948) |
| Age group | | | | | | |
| < 20 (ref) | 0 | | 0 | | 0 | |
| 20-49 | 0.520 | (0.085, 1.079) | 0.522 | (0.054, 0.990) | 0.582 | (0.122, 1.042) |
| > 50 | 1.039 | (0.627, 1.606) | 1.034 | (0.546, 1.522) | 1.058 | (0.586, 1.530) |

| Variable | SM Approximate Bayesian: β˜m | SM Approximate Bayesian: 95% ETI | SM IPW method: β^ | SM IPW method: 95% CI |
|---|---|---|---|---|
| HLAD | | | | |
| Fully matched (ref) | 0 | | 0 | |
| Permissive | −0.014 | (−0.506, 0.582) | −0.036 | (−0.580, 0.509) |
| GvH non-permissive | 0.270 | (−0.369, 0.997) | 0.309 | (−0.348, 0.965) |
| HvG non-permissive | 0.061 | (−0.791, 0.723) | 0.062 | (−0.608, 0.732) |
| Karnofsky score | | | | |
| 90 - 100% (ref) | 0 | | 0 | |
| < 90% | 0.454 | (0.107, 0.782) | 0.446 | (0.023, 0.870) |
| Race | | | | |
| Caucasian (ref) | 0 | | 0 | |
| others | 0.537 | (−0.553, 1.062) | 0.466 | (−0.272, 1.205) |
| Age group | | | | |
| < 20 (ref) | 0 | | 0 | |
| 20-49 | 0.558 | (−0.072, 1.174) | 0.560 | (−0.024, 1.145) |
| > 50 | 1.099 | (0.491, 1.692) | 1.035 | (0.408, 1.661) |

MM, multiple missing pattern; SM, single missing pattern; IPW, inverse-probability-weighted; CC, complete-case; ETI, equal-tailed credible interval; TX, transplant; HLAD, DPB1 classification according to T-cell epitope grouping; ref, reference group.

6. Concluding Remarks

We have proposed new approximate Bayesian and IPW methods for the stratified PH model with incomplete covariate information. In particular, we studied multiple missing patterns, which are largely ignored in the current literature. With a flat prior, the proposed Bayesian method is asymptotically equivalent to frequentist IPW inference based on Taylor linearization. With an informative prior, the proposed Bayesian method can perform better and may be more efficient than the frequentist IPW method. The approximate Bayesian method can be further improved by adding an augmentation term to the score function for the stratified PH model, similarly to Wang and Chen [39] and Xu et al. [41].

The scheme of the proposed methods can also be applied to competing risks data. Exploring the cause-specific hazards model [28] and the proportional subdistribution hazards model [12] will be an interesting research topic. We only studied missing covariates in this article. In HCT studies, it is common that a portion of either time to event or outcome indicators are also missing. Handling missingness in such outcomes would be an important research problem. For causal inference, the IPW approach is widely used to adjust for the probability of treatment assignments in observational studies. Applying the proposed Bayesian method to causal inference would be a worthy future topic. The proposed methods require specifying the propensity score model correctly. One can consider nonparametric regression for the propensity score estimation or a doubly robust estimator [30] for robust estimation. Pursuing this direction would be an important research topic in the future.

Supplementary Material

1782160_Sup_info

Acknowledgements

We would like to thank the Associate Editor and two reviewers for their constructive comments which significantly improved the paper. This work was supported in part by the Medical College of Wisconsin Cancer Center, the Advancing a Healthier Wisconsin Endowment (Project # 5520461), and the US National Cancer Institute (U24CA076518).

Appendix

We derive Σ and its estimator in the Appendix. Let $dM_{li}(t) = dN_{li}(t) - Y_{li}(t)\exp\{\beta^{T}Z_{li}\}\,d\Lambda_{l}(t)$. The posterior distribution is

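\[
p(\eta \mid U_n) \sim N\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix},\ \Sigma_n = \begin{pmatrix} \mathrm{Var}(U_1) & \mathrm{Cov}(U_1, U_2) \\ \mathrm{Cov}(U_1, U_2) & \mathrm{Var}(U_2) \end{pmatrix}\right], \tag{16}
\]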

where

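\[
\begin{aligned}
\mathrm{Var}\{U_1(\beta,\phi)\}
&= \mathrm{Var}\left[ n^{-1} \sum_{l=1}^{L} \sum_{i=1}^{n_l} \frac{\xi_{li}}{\pi_{li}} \int_0^{\tau} \left\{ Z_{li} - \frac{S_l^{(1)}(\beta,t)}{S_l^{(0)}(\beta,t)} \right\} dM_{li}(t) \right] \\
&= \mathrm{Var}\left[ n^{-1} \sum_{l=1}^{L} \sum_{i=1}^{n_l} \frac{\xi_{li}}{\pi_{li}} \int_0^{\tau} \{ Z_{li} - e_l(\beta,t) \}\, dM_{li}(t) \right] \\
&= E\left[ n^{-2} \sum_{l=1}^{L} \sum_{i=1}^{n_l} \frac{\xi_{li}}{\pi_{li}} \int_0^{\tau} \{ Z_{li} - e_l(\beta,t) \}\, dM_{li}(t) \right]^{2}.
\end{aligned}
\]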

We can estimate Var{U1(β,ϕ)} given β and ϕ as follows:

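\[
\widehat{\mathrm{Var}}\{U_1(\beta,\phi)\} = \frac{1}{n^{2}} \sum_{l=1}^{L} \sum_{i=1}^{n_l} \frac{\xi_{li}}{\pi_{li}^{2}} \int_0^{\tau} \left[ \left\{ Z_{li} - \frac{S_l^{(1)}(\beta,t)}{S_l^{(0)}(\beta,t)} \right\} \left\{ dN_{li}(t) - d\hat{\Lambda}_{0l}(t) \right\} \right]^{2},
\]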

where $d\hat{\Lambda}_{0l}(t) = \sum_{i=1}^{n_l} dN_{li}(t) / \{ n_l S_l^{(0)}(\beta,t) \}$.

We can obtain Var^(U2) given β and ϕ as follows:

\[
\widehat{\mathrm{Var}}(U_2) = \frac{1}{n^{2}} \sum_{l=1}^{L} \sum_{i=1}^{n_l} \pi_{li}(1 - \pi_{li})\, \omega_{li}\omega_{li}^{T}.
\]

Next, Cov(U1,U2) given β and ϕ can be estimated by

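\[
\begin{aligned}
\widehat{\mathrm{Cov}}(U_1, U_2)
&= \widehat{\mathrm{Cov}}\left[ \frac{1}{n} \sum_{l=1}^{L} \sum_{i=1}^{n_l} \frac{\xi_{li}}{\pi_{li}} \int_0^{\tau} \left\{ Z_{li} - \frac{S_l^{(1)}(\beta,t)}{S_l^{(0)}(\beta,t)} \right\} dM_{li}(t),\ \frac{1}{n} \sum_{l=1}^{L} \sum_{i=1}^{n_l} \{ \xi_{li} - \pi_{li} \}\, \omega_{li}^{T} \right] \\
&= \frac{1}{n^{2}} \sum_{l=1}^{L} \sum_{i=1}^{n_l} \frac{\xi_{li}(1 - \pi_{li})}{\pi_{li}} \int_0^{\tau} \left\{ Z_{li} - \frac{S_l^{(1)}(\beta,t)}{S_l^{(0)}(\beta,t)} \right\} dM_{li}(t) \times \omega_{li}^{T}.
\end{aligned}
\]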

The estimator Σ^ is

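\[
\hat{\Sigma}_n = \begin{pmatrix} \widehat{\mathrm{Var}}(U_1) & \widehat{\mathrm{Cov}}(U_1, U_2) \\ \widehat{\mathrm{Cov}}(U_1, U_2) & \widehat{\mathrm{Var}}(U_2) \end{pmatrix}. \tag{17}
\]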
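To make the plug-in estimators concrete, the following sketch computes the estimated variance of U2 and the estimated covariance of U1 and U2 with NumPy and assembles the block matrix in (17). It is a minimal sketch under stated assumptions rather than the authors' code: subjects from all strata are stacked into single arrays, the per-subject integrals ∫{Z − S(1)/S(0)} dM are assumed to be precomputed and passed in as `score_contrib`, the estimated variance of U1 is assumed to be supplied by the caller, and all function and argument names are hypothetical. The lower-left block is taken as the transpose of the covariance block so that the dimensions agree.

```python
import numpy as np

def var_u2_hat(pi, omega):
    """(1/n^2) * sum_i pi_i (1 - pi_i) omega_i omega_i^T."""
    n = len(pi)
    w = pi * (1.0 - pi)
    return (omega * w[:, None]).T @ omega / n**2

def cov_u1_u2_hat(xi, pi, score_contrib, omega):
    """(1/n^2) * sum_i xi_i (1 - pi_i) / pi_i * a_i omega_i^T,
    where a_i is the precomputed score integral for subject i."""
    n = len(pi)
    w = xi * (1.0 - pi) / pi
    return (score_contrib * w[:, None]).T @ omega / n**2

def sigma_hat(var_u1, cov_u12, var_u2):
    """Assemble the block covariance matrix of (U1, U2)."""
    return np.block([[var_u1, cov_u12], [cov_u12.T, var_u2]])

# Tiny hand-checkable example (hypothetical numbers).
xi = np.array([1.0, 0.0])                    # observation indicators
pi = np.array([0.5, 0.5])                    # propensity scores
omega = np.array([[1.0, 0.0], [1.0, 1.0]])   # propensity model covariates
score_contrib = np.array([[2.0], [3.0]])     # assumed precomputed integrals
v2 = var_u2_hat(pi, omega)
c12 = cov_u1_u2_hat(xi, pi, score_contrib, omega)
sigma = sigma_hat(np.array([[1.0]]), c12, v2)  # Var-hat(U1) supplied as 1.0
```

Only subjects with ξ = 1 contribute to the covariance block, which matches the factor ξ(1 − π)/π in the estimator.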

Footnotes

Publisher's Disclaimer: This AM is a PDF file of the manuscript accepted for publication after peer review, when applicable, but does not reflect post-acceptance improvements, or any corrections. Use of this AM is subject to the publisher’s embargo period and AM terms of use.

Supplementary material

We have provided additional simulation results in the Supplementary material.

Conflict of interest

The authors declare that they have no conflict of interest.

Contributor Information

Soyoung Kim, Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI 53226-0509.

Jae-Kwang Kim, Department of Statistics, Iowa State University, 2438 Osborn Dr Ames, IA 50011-1090.

Kwang Woo Ahn, Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI 53226-0509.

References

  • 1. Andersen PK, Gill RD: Cox's regression model for counting processes: a large sample study. The Annals of Statistics pp. 1100–1120 (1982)
  • 2. Bacher U, Klyuchnikov E, Le-Rademacher J, Carreras J, Armand P, Bishop M, Bredeson C, Cairo M, Fenske T, Freytes CO, Gale R, Gibson J, Isola L, Inwards D, Laport G, Lazarus H, Maziarz R, Wiernik P, Schouten H, Slavin S, Smith S, Vose J, Waller E, Hari P: Conditioning regimens for allotransplants for diffuse large B-cell lymphoma: myeloablative or reduced intensity? Blood 120(20), 4256–62 (2012)
  • 3. Bartlett JW, Seaman SR, White IR, Carpenter JR, Alzheimer's Disease Neuroimaging Initiative: Multiple imputation of covariates by fully conditional specification: Accommodating the substantive model. Statistical Methods in Medical Research 24(4), 462–487 (2015)
  • 4. Bradshaw PT, Ibrahim JG, Gammon MD: A Bayesian proportional hazards regression model with non-ignorably missing time-varying covariates. Statistics in Medicine 29(29), 3017–3029 (2010)
  • 5. Chen H, Little R: Proportional hazards regression with missing covariates. Journal of the American Statistical Association 94, 896–908 (1999)
  • 6. Chen MH, Ibrahim JG, Lipsitz SR: Bayesian methods for missing covariates in cure rate models. Lifetime Data Analysis 8(2), 117–146 (2002)
  • 7. Chen MH, Shao QM: Monte Carlo estimation of Bayesian credible and HPD intervals. Journal of Computational and Graphical Statistics 8(1), 69–92 (1999)
  • 8. Chen Q, Wu H, Ware LB, Koyama T: A Bayesian approach for the Cox proportional hazards model with covariates subject to detection limit. International Journal of Statistics in Medical Research 3(1), 32 (2014)
  • 9. Cox DR: Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 34(2), 187–202 (1972)
  • 10. Cox DR: Partial likelihood. Biometrika 62, 269 (1975)
  • 11. Dreger P, Sureda A, Ahn KW, Eapen M, Litovich C, Finel H, Boumendil A, Gopal A, Herrera AF, Schmid C, et al.: PTCy-based haploidentical vs matched related or unrelated donor reduced-intensity conditioning transplant for DLBCL. Blood Advances 3(3), 360–369 (2019)
  • 12. Fine JP, Gray RJ: A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association 94(446), 496–509 (1999)
  • 13. Hemming K, Hutton JL: Bayesian sensitivity models for missing covariates in the analysis of survival data. Journal of Evaluation in Clinical Practice 18(2), 238–246 (2012)
  • 14. Herring AH, Ibrahim JG: Likelihood-based methods for missing covariates in the Cox proportional hazards model. Journal of the American Statistical Association 96(453), 292–302 (2001)
  • 15. Herring AH, Ibrahim JG, Lipsitz SR: Non-ignorable missing covariate data in survival analysis: a case-study of an international breast cancer study group trial. Journal of the Royal Statistical Society: Series C (Applied Statistics) 53(2), 293–310 (2004)
  • 16. Horvitz DG, Thompson DJ: A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47(260), 663–685 (1952)
  • 17. Ibrahim J, Chen M, Kim S: Bayesian variable selection for the Cox regression model with missing covariates. Lifetime Data Analysis 14, 496–520 (2008)
  • 18. Kim S, Cai J, Couper D: Improving the efficiency of estimation in the additive hazards model for stratified case–cohort design with multiple diseases. Statistics in Medicine 35(2), 282–293 (2016)
  • 19. Kim S, Zeng D, Cai J: Analysis of multiple survival events in generalized case-cohort designs. Biometrics 74(4), 1250–1260 (2018)
  • 20. Klein JP, Moeschberger ML: Survival Analysis: Techniques for Censored and Truncated Data. Springer, New York, NY (2003)
  • 21. Kumar AJ, Kim S, Hemmer MT, Arora M, Spellman SR, Pidala JA, Couriel DR, Alousi AM, Aljurf MD, Cahn JY, et al.: Graft-versus-host disease in recipients of male unrelated donor compared with parous female sibling donor transplants. Blood Advances 2(9), 1022–1031 (2018)
  • 22. Lazarus HM, Zhang M, Carreras J, Hayes-Lattin BM, Ataergin AS, Bitran J, Bolwell BJ, Freytes CO, Gale RP, Goldstein SC, Hale GA, Inwards DJ, Klumpp TR, Marks DI, Maziarz RT, McCarthy P, Pavlovsky S, Rizzo J, Shea T, Schouten H, Slavin S, Winter JN, van Besien K, Vose JM, Hari PN: A comparison of HLA-identical sibling allogeneic versus autologous transplantation for diffuse large B cell lymphoma: a report from the CIBMTR. Biology of Blood and Marrow Transplantation 16(1), 35–45 (2010)
  • 23. Lin D, Ying Z: Cox regression with incomplete covariate measurements. Journal of the American Statistical Association 88(424), 1341–1349 (1993)
  • 24. Paik MC: Multiple imputation for the Cox proportional hazards model with missing covariates. Lifetime Data Analysis 3(3), 289–298 (1997)
  • 25. Papanicolaou GA, Ustun C, Young JAH, Chen M, Kim S, Woo Ahn K, Komanduri K, Lindemans C, Auletta JJ, Riches ML, et al.: Bloodstream infection due to vancomycin-resistant enterococcus is associated with increased mortality after hematopoietic cell transplantation for acute leukemia and myelodysplastic syndrome: a multicenter, retrospective cohort study. Clinical Infectious Diseases 69(10), 1771–1779 (2019)
  • 26. Pidala J, Lee SJ, Ahn KW, Spellman S, Wang HL, Aljurf M, Askar M, Dehn J, Fernandez Viña M, Gratwohl A, et al.: Nonpermissive HLA-DPB1 mismatch increases mortality after myeloablative unrelated allogeneic hematopoietic cell transplantation. Blood, The Journal of the American Society of Hematology 124(16), 2596–2606 (2014)
  • 27. Pidala J, Lee SJ, Ahn KW, Spellman S, Wang HL, Aljurf M, Askar M, Dehn J, Viña MF, Gratwohl A, et al.: Non-permissive-DPB1 mismatch among otherwise HLA-matched donor-recipient pairs results in increased overall mortality after myeloablative unrelated allogeneic hematopoietic cell transplantation for hematologic malignancies. Blood 124, 2596–2606 (2014)
  • 28. Prentice RL, Kalbfleisch JD, Peterson AV Jr, Flournoy N, Farewell VT, Breslow NE: The analysis of failure times in the presence of competing risks. Biometrics 34, 541–554 (1978)
  • 29. Pugh MG, Robins J, Lipsitz S, Harrington D: Inference in the Cox proportional hazards model with missing covariate data. Ph.D. thesis, Harvard School of Public Health, Boston, MA (1993)
  • 30. Robins JM, Rotnitzky A, Zhao LP: Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89(427), 846–866 (1994)
  • 31. Rubin DB: Inference and missing data. Biometrika 63(3), 581–592 (1976)
  • 32. Sang H, Kim JK: An approximate Bayesian inference on propensity score estimation under unit nonresponse. Canadian Journal of Statistics (2017)
  • 33. Shah NN, Ahn KW, Litovich C, Sureda A, Kharfan-Dabaja MA, Awan FT, Ganguly S, Gergis U, Inwards D, Karmali R, et al.: Allogeneic transplantation in elderly patients ≥ 65 years with non-Hodgkin lymphoma: a time-trend analysis. Blood Cancer Journal 9(12), 1–10 (2019)
  • 34. Sharef E, Strawderman R, Ruppert D, Cowen M, Halasyamani L: Bayesian adaptive B-spline estimation in proportional hazards frailty models. Electronic Journal of Statistics 4, 606–642 (2010)
  • 35. Soubeyrand S, Haon-Lasportes E: Weak convergence of posteriors conditional on maximum pseudo-likelihood estimates and implications in ABC. Statistics & Probability Letters 107, 84–92 (2015)
  • 36. Sun B, Tchetgen Tchetgen EJ: On inverse probability weighting for nonmonotone missing at random data. Journal of the American Statistical Association 113, 369–379 (2018)
  • 37. Ustun C, Kim S, Chen M, Beitinjaneh AM, Brown VI, Dahi PB, Daly A, Diaz MA, Freytes CO, Ganguly S, et al.: Increased overall and bacterial infections following myeloablative allogeneic HCT for patients with AML in CR1. Blood Advances 3(17), 2525–2536 (2019)
  • 38. Verneris MR, Lee SJ, Ahn KW, Wang HL, Battiwalla M, Inamoto Y, Fernandez-Vina MA, Gajewski J, Pidala J, Munker R, et al.: HLA mismatch is associated with worse outcomes after unrelated donor reduced-intensity conditioning hematopoietic cell transplantation: an analysis from the Center for International Blood and Marrow Transplant Research. Biology of Blood and Marrow Transplantation 21(10), 1783–1789 (2015)
  • 39. Wang C, Chen HY: Augmented inverse probability weighted estimator for Cox missing covariate regression. Biometrics 57(2), 414–419 (2001)
  • 40. White IR, Royston P: Imputing missing covariate values for the Cox model. Statistics in Medicine 28(15), 1982–1998 (2009)
  • 41. Xu Q, Paik MC, Luo X, Tsai WY: Reweighting estimators for Cox regression with missing covariates. Journal of the American Statistical Association 104(487), 1155–1167 (2009)
  • 42. Yoo H, Lee JW: Comparison of missing data methods in clustered survival data using Bayesian adaptive B-Spline estimation. Communications for Statistical Applications and Methods 25(2), 159–172 (2018)
  • 43. Yuan KH, Jennrich RI: Asymptotics of estimating equations under natural conditions. Journal of Multivariate Analysis 65(2), 245–260 (1998)
  • 44. Zhou H, Pepe MS: Auxiliary covariate data in failure time regression. Biometrika 82(1), 139–149 (1995)
