Abstract
This paper considers generalized linear quantile regression for competing risks data when the failure type may be missing. Two estimation procedures for the regression coefficients, an inverse probability weighted complete-case estimator and an augmented inverse probability weighted estimator, are discussed under the assumption that the failure type is missing at random. The proposed estimation procedures utilize supplemental auxiliary variables for predicting the missing failure type and for informing its distribution. The asymptotic properties of the two estimators are derived and their asymptotic efficiencies are compared. We show that the augmented estimator is more efficient and possesses a double robustness property against misspecification of either the model for missingness or the model for the failure type. The asymptotic covariances are estimated using the local functional linearity of the estimating functions. The finite-sample performance of the proposed estimation procedures is evaluated through a simulation study. The methods are applied to analyze the ‘Mashi’ trial data for investigating the effect of formula- versus breast-feeding plus extended infant zidovudine prophylaxis on HIV-related death of infants born to HIV-infected mothers in Botswana.
Keywords: Augmented inverse probability weighted, Auxiliary variables, Competing risks, Double robustness, Efficient estimator, Estimating equation, Inverse probability weighted, Local functional linearity, Logistic regression, Mashi trial, Missing at random, Quantile regression
1 Introduction
This paper was motivated by the need to analyze the ‘Mashi’ trial data (mashi means milk in Setswana) for examining the effect of formula- versus breast-feeding plus extended infant zidovudine prophylaxis on HIV-related death of infants born to HIV-infected mothers in Botswana (Thior et al. (2006)). Whereas studies including the Mashi trial have shown that formula-feeding increases the overall risk of death while breast-feeding increases the risk of transmitting HIV (Dunn et al. (1992); Beaudry, Dufour, and Marcoux (1995)), the effect of feeding strategy on death due to HIV infection is unknown. Accordingly, it is of interest to assess the treatment effect on HIV-related death, with HIV-unrelated death considered as a competing risk. This analysis provides additional insight over the analysis of all-cause death by addressing whether the known beneficial effect of formula-feeding to prevent HIV infection leads to a beneficial effect to reduce mortality of HIV infected infants.
Of the 111 live-born infants who died in the Mashi trial, the cause of death is known for 50 and missing for 61. It is well-known that the analysis of only cases with complete information may lead to inefficient and/or biased estimates. To account for the missingness, a number of methods have been developed to estimate the covariate effects under different survival models for the cause-specific hazard functions, for instance the proportional hazards model (Goetghebeur and Ryan (1995), Lu and Tsiatis (2001)), the linear transformation model (Gao and Tsiatis (2005)) and the additive hazard model (Lu and Liang (2008)), among others.
In this paper, we consider a quantile regression model (Koenker and Bassett (1978)) that is a valuable complement to the Cox proportional hazards model (Cox (1972)) and the accelerated failure time model (Buckley and James (1979); Koul, Susarla, Van Ryzin (1981)). Quantile regression allows the covariate effects to vary at different tails of the event time distribution. Such important heterogeneity in the population may be overlooked by using the Cox model or the accelerated failure time model. General quantile regression methods in survival analysis are developed under the assumption that censoring is independent of failure time; see Ying, Jung, and Wei (1995), Bang and Tsiatis (2002), Portnoy (2003), Neocleous, Vanden Branden, and Portnoy (2006), Peng and Huang (2008), Wang and Wang (2009), among others.
Most relevant to this work, Peng and Fine (2009) studied the quantile for the cumulative incidence function, which is the distribution of the time to failure due to a particular cause of interest. However, their method does not account for missingness of failure cause. Its application based on the complete-cases may be invalid and misleading because of the high percentage of missing causes of death in the Mashi data. We consider generalized linear quantile regression for competing risks data when causes of failure may be missing. Two estimation procedures are discussed under the assumption that the failure cause is missing at random (Rubin (1976)). The first, following the idea of Horvitz and Thompson (1952), uses inverse probability weighting (IPW) of complete-cases, which leverages auxiliary predictors of whether cause of failure is observed. The second approach, adapting the theory of Robins, Rotnitzky, and Zhao (1994), augments the IPW complete-case estimator with auxiliary predictors of the cause of failure of interest.
This work fits in the general area of competing risks failure time analysis, wherein subjects are followed over time and may fail from one of many causes. The competing risks failure time can be represented by the minimum of the latent failure times, each of which is defined as the time to failure from a particular cause in the absence of all other competing risks. The existing quantile regression methods could be applied within this framework by considering quantile regression of the latent failure time for a particular cause while treating other latent failure times as censoring and by assuming mutual independence of the latent failure times (Tsiatis (1975)). However, this mutual independence assumption is untestable and often dubious (because we expect positive correlation of the latent failure times), and Peng and Fine (2009) took a different approach that avoids it. In particular, they studied the cumulative incidence function, which is the distribution of the time to failure due to a particular cause in the presence of the other competing risks. This approach evaluates “crude” effects on the cause-specific cumulative incidence, and hence caution is needed in the interpretation of the results (Prentice et al. (1978)). This is the dominant approach for assessing competing risks data given the fundamental non-identifiability of the latent failure times, and the methods developed here take this approach.
The rest of the paper is organized as follows. Two procedures for estimating the regression coefficients are proposed in Section 2. The asymptotic properties of these estimators are derived and their asymptotic efficiencies are compared in Section 3. Procedures for estimating the asymptotic covariances are given in Section 4. The finite-sample performance of the proposed estimation procedures is evaluated in Section 5 through a simulation study. The methods are applied to the Mashi data in Section 6 for investigating the effect of formula- versus breast-feeding plus extended infant zidovudine prophylaxis on HIV-related death of infants. All proofs are in Section 7.
2 Estimation procedures
2.1 Model descriptions and assumptions
Let T be the survival time of interest. Due to censoring, we only observe (X, δ), where X = min(T, C), δ = I(T ≤ C) and C is the censoring variable. Let J denote the failure type associated with the uncensored failure time T; J is undefined if T is censored. For convenience we let Z be the (p + 1)-dimensional covariate vector, including 1 as its first component corresponding to an intercept. A typical right-censored competing risks data set consists of independent and identically distributed (i.i.d.) copies (Xi, δi, δiJi, Zi), i = 1, …, n, of (X, δ, δJ, Z).
We consider J = 1 as the failure type of interest and set J = 2 for all other failure types. The type-1 cumulative incidence function is F1(t|Z) = P (T ≤ t, J = 1|Z), which represents the conditional probability of observing a type-1 failure by time t given the covariate Z. The τth type-1 conditional quantile given Z = z is defined as Q1(τ|z) = inf{t : F1(t|z) ≥ τ}. Let ν be the end of follow-up time, satisfying condition C1 given in the Appendix. For identifiability, we require that τ ≤ τ0, where τ0 = infz P (T ≤ ν, J = 1 | z). For τ ∈ [τL, τU] with 0 < τL ≤ τU < τ0, the τth generalized linear quantile regression is
| $Q_1(\tau \mid Z_i) = g\{Z_i^{T}\beta(\tau)\},$ | (2.1) |
where g(·) is a known monotone link function and β(τ) is a (p + 1)-dimensional coefficient vector depending on τ.
To help with understanding model (2.1) and the interpretation of β(τ), we consider the following scenario. Suppose Zi = (1, Zi,1), where Zi,1 is the indicator of gender for subject i, say Zi,1 = 1 for male and Zi,1 = 0 for female, and Ti is the time (age in years) to death. Suppose β(τ) = (β0(τ), β1(τ)), with β0(τ) = 70 and β1(τ) = −5 at τ = 0.3. Thus, conditional on gender, the age by which 30% of the population dies of type-1 failure is 70 − 5Zi,1 under the identity link function. That is, 30% of females die from type-1 failure before age 70 and 30% of males die from type-1 failure before age 65. The gender effect is 5 years – the age at which 30% of individuals die from type-1 failure is 5 years sooner for males than for females.
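The arithmetic in this scenario can be checked in a few lines. The sketch below assumes the identity link and the illustrative coefficients β0(0.3) = 70 and β1(0.3) = −5 from the example; the function name `type1_quantile` is a hypothetical helper, not part of the proposed method.

```python
def type1_quantile(z_male, beta0=70.0, beta1=-5.0):
    """tau-th type-1 quantile under the identity link: g(Z^T beta) = Z^T beta.

    z_male is the gender indicator Z_{i,1} (1 = male, 0 = female); the default
    coefficients are the illustrative values at tau = 0.3 from the text.
    """
    return beta0 + beta1 * z_male


female_age = type1_quantile(0)          # age by which 30% of females have a type-1 failure
male_age = type1_quantile(1)            # same quantity for males
gender_effect = male_age - female_age   # difference in years (males minus females)
```

With these inputs the female and male quantiles are 70 and 65 years, so the difference equals β1(0.3).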
Let G(t|Zi) = P (Ci ≥ t|Zi) and let Ĝ(t|Zi) be a semiparametric or nonparametric consistent estimator of G(t|Zi). Peng and Fine (2009) proposed the following estimating equation for β(τ) based on fully observed competing risks data {Xi, Zi, δi, δi Ji, i = 1 …, n}:
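| $S_n(b, \tau) = n^{-1/2} \sum_{i=1}^{n} Z_i \left[ \frac{I\{X_i \le g(Z_i^{T} b)\}\, \delta_i I(J_i = 1)}{\hat G(T_i \mid Z_i)} - \tau \right] = 0.$ | (2.2) |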
In this paper we consider the quantile regression (2.1) based on the competing risks data with possibly missing failure type. Let Ri be the complete-case indicator: Ri = 1 either if δi = 0 or if δi = 1 and Ji is observed; and Ri = 0 otherwise. Auxiliary variables Ai may be helpful for predicting the missing failure type. Since the failure type is defined only for those who are observed to fail, only supplemental information for the observed failures are potentially useful for predicting missingness and for informing about the distribution of the failure type. As such, we denote available auxiliaries by δiAi.
We assume the censoring time Ci is conditionally independent of (Ti, Ji) given Zi. We also assume the failure type Ji is missing at random (Rubin (1976)); that is, given δi = 1 and Wi = (Ti, Zi, Ai), the probability that the failure type Ji is missing depends only on the observed Wi, not on the value of Ji; this assumption is expressed as
| $P(R_i = 1 \mid J_i, \delta_i = 1, W_i) = P(R_i = 1 \mid \delta_i = 1, W_i).$ | (2.3) |
Let π(Qi) = P (Ri = 1|Qi), where Qi = (Wi, δi). Then
| $\pi(Q_i) = \delta_i\, P(R_i = 1 \mid \delta_i = 1, W_i) + (1 - \delta_i).$ | (2.4) |
The observed data can be summarized as Oi = {Xi, Zi, δi, Ri, RiδiJi, δiAi}, i = 1 …, n. We assume that Oi’s are independent and identically distributed.
2.2 Inverse probability weighted estimator
First, following the idea of Horvitz and Thompson (1952), we propose a procedure for estimating β(τ) that uses inverse probability weighting (IPW) of complete-cases. We consider a parametric model r(Wi, ψ) for r(Wi) = P (Ri = 1|δi = 1, Wi), where ψ is an unknown vector of finite-dimensional parameters. For example, r(Wi, ψ) may be a logistic regression model with logit{r(Wi, ψ)} = ψTWi. The parameter ψ can be estimated by ψ̂, the maximizer of the observed-data likelihood
| $\prod_{i=1}^{n} \{r(W_i, \psi)\}^{R_i \delta_i} \{1 - r(W_i, \psi)\}^{(1 - R_i)\delta_i}.$ | (2.5) |
Therefore, we can estimate π(Qi, ψ) by π̂ (Qi) = π(Qi, ψ̂) = δir̂ (Wi) + (1 − δi) where r̂ (Wi) = r(Wi, ψ̂).
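As a rough illustration of this plug-in step, the sketch below fits a logistic model with an intercept and a single predictor by Newton-Raphson on the uncensored subjects, then forms π̂(Qi) = δi r̂(Wi) + (1 − δi). The reduction to one predictor, the toy data, and the names `fit_logistic` and `pi_hat` are assumptions made for illustration; in practice Wi is a vector and a standard logistic regression routine would be used.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_logistic(w, r, iters=25):
    """Newton-Raphson MLE of (psi0, psi1) in logit r(w) = psi0 + psi1 * w."""
    p0 = p1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for wi, ri in zip(w, r):
            mu = expit(p0 + p1 * wi)
            g0 += ri - mu                 # score for the intercept
            g1 += (ri - mu) * wi          # score for the slope
            v = mu * (1.0 - mu)
            h00 += v
            h01 += v * wi
            h11 += v * wi * wi
        det = h00 * h11 - h01 * h01       # determinant of the 2x2 information
        p0 += (h11 * g0 - h01 * g1) / det
        p1 += (-h01 * g0 + h00 * g1) / det
    return p0, p1


def pi_hat(delta, w, psi):
    """pi_hat(Q_i) = delta_i * r_hat(W_i) + (1 - delta_i)."""
    psi0, psi1 = psi
    return [d * expit(psi0 + psi1 * wi) + (1 - d) for d, wi in zip(delta, w)]


# Toy data: eight uncensored subjects followed by two censored ones.
delta = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
w = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
R = [1, 0, 0, 0, 1, 1, 1, 0, 1, 1]

uncensored = [(wi, ri) for d, wi, ri in zip(delta, w, R) if d == 1]
psi_hat = fit_logistic([u[0] for u in uncensored], [u[1] for u in uncensored])
pi_values = pi_hat(delta, w, psi_hat)
```

Only subjects with δi = 1 enter the likelihood, which mirrors (2.5); censored subjects receive π̂(Qi) = 1 because they are complete cases by definition.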
Modifying (2.2) to accommodate missing failure types leads to the IPW estimating equation for β(τ):
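| $S_{1,n}(b, \tau) = n^{-1/2} \sum_{i=1}^{n} Z_i \left[ \frac{R_i \delta_i I(J_i = 1)\, I\{X_i \le g(Z_i^{T} b)\}}{\hat\pi(Q_i)\, \hat G(T_i \mid Z_i)} - \tau \right] = 0.$ | (2.6) |
We can write S1,n(b, τ) = n−1/2 Σi Zi[ϑ̂1,i I{Xi ≤ g(ZiTb)} − τ], where ϑ̂1,i = RiδiI(Ji = 1)/{π̂ (Qi)Ĝ (Ti|Zi)}. We refer to the solution of (2.6) as the IPW estimator, denoted by β̂I(τ).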
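To make the weighting concrete, the sketch below computes the IPW weights ϑ̂1,i and evaluates a weighted estimating function at a candidate b. It assumes the estimating function has the weighted-indicator form n−1/2 Σ Zi[ϑ̂1,i I{Xi ≤ g(ZiTb)} − τ] implied by the definition of ϑ̂1,i, takes r̂ and Ĝ as already computed, and uses hypothetical helper names.

```python
import math


def ipw_weights(R, delta, J, r_hat, G_hat):
    """theta_hat_{1,i} = R_i delta_i I(J_i = 1) / {pi_hat(Q_i) G_hat(T_i | Z_i)}.

    J may be None when the failure type is missing or the subject is censored.
    """
    weights = []
    for Ri, di, Ji, ri, Gi in zip(R, delta, J, r_hat, G_hat):
        pi_i = di * ri + (1 - di)
        weights.append(Ri * di * (1.0 if Ji == 1 else 0.0) / (pi_i * Gi))
    return weights


def estimating_function(b, tau, X, Z, weights, g=math.exp):
    """n^{-1/2} * sum_i Z_i [ w_i I{X_i <= g(Z_i^T b)} - tau ] (assumed form)."""
    n = len(X)
    total = [0.0] * len(b)
    for Xi, Zi, wi in zip(X, Z, weights):
        linear = sum(zj * bj for zj, bj in zip(Zi, b))
        resid = wi * (1.0 if Xi <= g(linear) else 0.0) - tau
        for j, zj in enumerate(Zi):
            total[j] += zj * resid
    return [t / math.sqrt(n) for t in total]


# Two uncensored complete cases, intercept-only model, no missingness or censoring.
X = [1.0, 2.0]
Z = [[1.0], [1.0]]
weights = ipw_weights(R=[1, 1], delta=[1, 1], J=[1, 2],
                      r_hat=[1.0, 1.0], G_hat=[1.0, 1.0])
S = estimating_function([math.log(1.5)], 0.5, X, Z, weights)
```

In this toy case half of the subjects have a type-1 failure by time 1.5, so the estimating function vanishes at τ = 0.5 and b = log 1.5.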
2.3 Augmented inverse probability weighted estimator
Because the IPW estimator obtained by solving (2.6) uses data from complete cases only, it is inefficient, and it is consistent only if the model π(Qi, ψ) for the probability of being a complete case is correctly specified. Adapting the theory of Robins, Rotnitzky, and Zhao (1994) to gain efficiency and robustness against misspecification of π(Qi, ψ), we propose an improved estimation procedure that augments the IPW complete-case estimator with auxiliary predictors of the failure type of interest.
Let ρ(Wi) = P (Ji = 1|δi = 1, Wi). The missing at random assumption (2.3) implies that Ji is independent of Ri given Qi:
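| $\rho(W_i) = P(J_i = 1 \mid R_i = 1, \delta_i = 1, W_i) = P(J_i = 1 \mid R_i = 0, \delta_i = 1, W_i).$ | (2.7) |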
Let ρ(Wi, φ) be a parametric model for ρ(Wi), where φ is a vector of unknown parameters. From (2.7), it follows that ρ(Wi) can be estimated from the complete cases with Ri = 1 and δi = 1. The maximum likelihood estimator of φ, φ̂, can be obtained by maximizing the likelihood
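| $\prod_{i=1}^{n} \{\rho(W_i, \varphi)\}^{R_i \delta_i I(J_i = 1)} \{1 - \rho(W_i, \varphi)\}^{R_i \delta_i I(J_i = 2)}.$ | (2.8) |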
Denote ρ(Wi, φ̂) by ρ̂ (Wi). We consider the augmented IPW estimating equation for β(τ):
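| $S_{2,n}(b, \tau) = n^{-1/2} \sum_{i=1}^{n} Z_i \left[ \frac{R_i \delta_i I(J_i = 1)\, I\{X_i \le g(Z_i^{T} b)\}}{\hat\pi(Q_i)\, \hat G(T_i \mid Z_i)} + \delta_i \left\{1 - \frac{R_i}{\hat\pi(Q_i)}\right\} \frac{\hat\rho(W_i)\, I\{X_i \le g(Z_i^{T} b)\}}{\hat G(T_i \mid Z_i)} - \tau \right] = 0.$ | (2.9) |
Let ϑ̂2,i = RiδiI(Ji = 1)/{π̂(Qi)Ĝ(Ti|Zi)} + δi[1 − {π̂(Qi)}−1 Ri] ρ̂(Wi)/Ĝ(Ti|Zi). Then S2,n(b, τ) = n−1/2 Σi Zi[ϑ̂2,i I{Xi ≤ g(ZiTb)} − τ]. The solution to the augmented IPW estimating equation (2.9) is referred to as the AIPW estimator and denoted by β̂A(τ).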
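The AIPW weight ϑ̂2,i can be computed subject by subject once r̂, ρ̂, and Ĝ are available. The following is a minimal sketch under that assumption; the function name `aipw_weight` and the numerical inputs are illustrative only.

```python
def aipw_weight(R, delta, J, r_hat, rho_hat, G_hat):
    """theta_hat_{2,i}: IPW complete-case term plus the augmentation term.

    R, delta : complete-case and failure indicators (0 or 1)
    J        : failure type (1 or 2), or None when missing or censored
    r_hat    : fitted P(R = 1 | delta = 1, W)
    rho_hat  : fitted P(J = 1 | delta = 1, W)
    G_hat    : fitted censoring survival probability at T
    """
    pi_hat = delta * r_hat + (1 - delta)
    ipw_part = R * delta * (1.0 if J == 1 else 0.0) / (pi_hat * G_hat)
    aug_part = delta * (1.0 - R / pi_hat) * rho_hat / G_hat
    return ipw_part + aug_part


w_type1_observed = aipw_weight(1, 1, 1, r_hat=0.5, rho_hat=0.8, G_hat=0.5)
w_type2_observed = aipw_weight(1, 1, 2, r_hat=0.5, rho_hat=0.8, G_hat=0.5)
w_type_missing = aipw_weight(0, 1, None, r_hat=0.5, rho_hat=0.8, G_hat=0.5)
w_censored = aipw_weight(1, 0, None, r_hat=0.5, rho_hat=0.8, G_hat=0.5)
```

Subjects with a missing failure type contribute through ρ̂(Wi) rather than being discarded, and a complete case with a type-2 failure receives a negative weight, so ϑ̂2,i is not a probability weight in the usual sense.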
Replacing the estimates Ĝ (·), π̂ (·), and ρ̂ (·) in the estimating function S2,n(b, τ) by their estimands G(·), π(·), and ρ(·), we have E[S2,n{β(τ), τ}] = 0 if MAR holds and if at least one of the parametric models, r(Wi, ψ) and ρ(Wi, φ), is correctly specified. In fact, under MAR (2.3) and consequently (2.7), E[S2,n{β(τ), τ}] = E (E [S2,n {β(τ), τ}|Qi, Ji]) = 0 if r(Wi, ψ) is correctly specified, and E [S2,n {β(τ), τ}] = E (E [S2,n {β(τ), τ}|Qi, δi = 1]) = 0 if ρ(Wi, φ) is correctly specified. This leads to the double robustness property of the AIPW estimator: β̂A(τ) is consistent for β(τ) provided that at least one of r̂ (·) and ρ̂ (·) is a consistent estimator of r(·) or ρ(·), respectively. The missing at random assumption is essential for r(Wi) and ρ(Wi) to be identifiable. Violation of MAR may result in inconsistent estimation of both r(·) and ρ(·), and thus render both the IPW and AIPW estimators inconsistent. This property is further demonstrated in our simulation study in Section 5.
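The double robustness argument can be traced through a short calculation. The sketch below is a restatement for intuition rather than the formal proof: r* and ρ* denote the (possibly misspecified) limits of r̂ and ρ̂, symbols introduced here only for illustration, and G is treated as known.

```latex
\begin{align*}
% For an uncensored subject (\delta_i = 1), regroup the two terms of the AIPW weight:
\vartheta_{2,i}
  &= \frac{1}{G(T_i \mid Z_i)}
     \left[ I(J_i = 1)
       + \left\{ \frac{R_i}{r^{*}(W_i)} - 1 \right\}
         \left\{ I(J_i = 1) - \rho^{*}(W_i) \right\} \right]. \\
% By (2.7), R_i and J_i are independent given (\delta_i = 1, W_i), so the
% conditional mean of the correction term factorizes:
E\left[ \left\{ \frac{R_i}{r^{*}(W_i)} - 1 \right\}
        \left\{ I(J_i = 1) - \rho^{*}(W_i) \right\}
        \,\Big|\, \delta_i = 1, W_i \right]
  &= \left\{ \frac{r(W_i)}{r^{*}(W_i)} - 1 \right\}
     \left\{ \rho(W_i) - \rho^{*}(W_i) \right\}.
\end{align*}
```

The product on the right vanishes when either r* = r or ρ* = ρ, in which case ϑ2,i has the same conditional mean as the full-data weight δi I(Ji = 1)/G(Ti|Zi).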
The augmented estimating equation (2.9) follows the ideas of Robins, Rotnitzky and Zhao (1994) for efficient augmentation, where (Xi, δi, δiJi, Zi) is considered as the full data and the full-data estimating equation is (2.2), as given by Peng and Fine (2009). It is interesting to note that Peng and Fine’s estimating equation (2.2) is, in turn, based on inverse probability weighting for censoring of the estimating equation for the full data (Ti, Ji, Zi), while the observed data in their case is (Xi, δi, δiJi, Zi). It would be desirable to improve the efficiency of the Peng and Fine (2009) estimator with augmentation. By Robins, Rotnitzky, and Zhao (1994), the efficient augmentation of (2.2) requires estimating a conditional expectation given (Ci, δi = 0, Zi), which is unobtainable since the conditional distribution of (Ti, Ji = 1) given (Ci, δi = 0, Zi) is not identifiable from the observed competing risks data. Its implementation would require untestable and perhaps unreasonable or conflicting assumptions, such as independence of (Ti, Ji = 1) and (Ci, δi = 0) given Zi.
The numerical procedure for solving equation (2.9) is equivalent to locating the minimizer of the function:
| (2.10) |
where M is a large positive number. Equivalency is due to the fact that U2,n(b, τ) is a convex function in b and its derivative is 2n1/2 S2,n(b, τ) when M exceeds and for all b within the compact parameter set of β(τ). Under (2.1), . It is necessary that holds for all i for some parameter vector b for β(τ) to be identifiable. Hence and . We further notice that |ϑ̂2,i| ≤ 3/{π̂ (Qi)Ĝ(Ti|Zi)}. For most practical applications, where the missingness probabilities are less than 0.9 and fewer than 90% of subjects are censored, it is reasonable to assume that . Then it suffices to take M ≥ 300n max1≤i≤n |g−1 (Xi)|. One can use a number greater than 300 in the lower bound for M in more extreme situations. Similarly, the estimating equation (2.6) can be solved by minimizing (2.10) with ϑ̂2,i replaced by ϑ̂1,i, and the same choice of M can be used in the minimization.
3 Asymptotic properties
Throughout the rest of the paper, we assume the censoring distribution does not depend on the covariates, i.e., G(t|Zi) = G(t), and use the Kaplan-Meier estimator Ĝ (t) to estimate G(t). The independence assumption for Ci and Zi can be relaxed, in which case the conditional Kaplan-Meier estimator (Beran (1981)) can be used to estimate G(t|Zi), and the asymptotic distributions for β̂I(τ) and β̂A(τ) need to be modified to accommodate the additional variations. This section derives the uniform consistency and weak convergence of the proposed estimators β̂I(τ) and β̂A(τ), for τ over the interval [τL, τU], under the conditions C1–C5 given in the Appendix. It also compares the asymptotic efficiency of the two estimators.
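For readers who want to see the censoring-distribution step spelled out, the sketch below computes a Kaplan-Meier estimate of G(t) = P(C ≥ t) by treating censored observations (δi = 0) as the events. It is a plain illustration with a hypothetical function name and toy data, not the authors' implementation, and it ignores any special handling of ties between failure and censoring times.

```python
def km_censoring_survival(X, delta):
    """Return a function t -> G_hat(t), a Kaplan-Meier estimate of P(C >= t)."""
    censor_times = sorted(set(x for x, d in zip(X, delta) if d == 0))
    steps = []          # (time, survival just after that time)
    surv = 1.0
    for s in censor_times:
        at_risk = sum(1 for x in X if x >= s)
        events = sum(1 for x, d in zip(X, delta) if x == s and d == 0)
        surv *= 1.0 - events / at_risk
        steps.append((s, surv))

    def G_hat(t):
        value = 1.0
        for s, sv in steps:
            if s < t:   # strict inequality: P(C >= t) is left-continuous
                value = sv
            else:
                break
        return value

    return G_hat


# Toy data: failures at times 1 and 3, censoring at times 2 and 4.
G_hat = km_censoring_survival([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 0])
```

Because the weights divide by Ĝ evaluated at observed failure times, the left-continuous version matters: a subject censored at time t does not reduce Ĝ(t) itself, only Ĝ at later times.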
Under C5, n1/2(ψ̂−ψ) and n1/2(φ̂−φ) are asymptotically linear with influence functions ηi and ζi, respectively, such that
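| $n^{1/2}(\hat\psi - \psi) = n^{-1/2} \sum_{i=1}^{n} \eta_i + o_p(1),$ | (3.1) |
| $n^{1/2}(\hat\varphi - \varphi) = n^{-1/2} \sum_{i=1}^{n} \zeta_i + o_p(1),$ | (3.2) |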
where {(ηi, ζi), i = 1, …, n} are i.i.d. random variables with Eηi = 0 and Eζi = 0. Under the logistic regression model for r(Wi, ψ), we can write , where Iψ is the asymptotic information matrix of the likelihood function (2.5).
Let , Yi(t) = I(Xi ≥ t), y(t) = P (Xi ≥ t) and , where λG(t) is the hazard function for the censoring variable C. Let . Under MAR and the independent censoring assumption, it is easy to see that . Let , and . Define ξ1,i(τ) = a1,i(τ) + bi(τ) + ci(τ) and ξ2,i(τ) = a2,i(τ) + bi(τ). Let β(τ) be the true regression coefficient at τ.
Theorem 3.1
Under C1–C5, given in the Appendix, we have limn→∞ supτ∈[τL, τU]||β̂I(τ) − β(τ)|| = 0 and limn→∞ supτ∈[τL,τU]||β̂A(τ) − β(τ)|| = 0 in probability, where ||·|| is the Euclidean norm.
We show in the Appendix that the asymptotic approximations hold for the IPW estimator and the AIPW estimator uniformly in τ ∈ [τL, τU] in probability:
| (3.3) |
| (3.4) |
where and f1(t|z) = ∂F1(t|z)/∂t. The approximations (3.3) and (3.4) lead to the following asymptotic results.
Theorem 3.2
Under C1–C5, given in the Appendix, we have
(a) both n1/2{β̂I(τ) − β(τ)} and n1/2{β̂A(τ) − β(τ)} converge weakly to mean-zero Gaussian processes with covariance matrices Φ1(τ′, τ) = [A{β(τ′)}]−1Σ1(τ′, τ) [A{β(τ)}]−1 and Φ2(τ′, τ) = [A{β(τ′)}]−1Σ2(τ′, τ) [A{β(τ)}]−1 for τ, τ′ ∈ [τL, τU], respectively, where Σ1(τ′, τ) = E{ξ1,i(τ′)ξ1,i(τ)T} and Σ2(τ′, τ) = E{ξ2,i(τ′)ξ2,i(τ)T};
(b) the AIPW estimator β̂A(τ) is more efficient than the IPW estimator β̂I(τ) with Σ1(τ′, τ) ≥ Σ2(τ′, τ).
4 Estimation of the covariance matrices
In quantile regression, the estimating functions are not smooth and the asymptotic covariances for the estimators of the regression coefficients involve a subdensity function, which poses difficulties for the estimation of the covariances. Huang (2002) proposed a novel variance estimation procedure for a calibration regression model using the local functional linearity of the estimating functions. Peng and Fine (2009) generalized this technique to the competing risks setting. Our estimators of the asymptotic covariances are constructed following the exposition of Peng and Fine (2009).
First we derive an estimator for Σ1(τ, τ). It is shown in the Appendix that
| (4.1) |
Let (Îψ)−1 be the estimator of the variance of ψ̂ and let . Based on (4.1), Σ1(τ, τ) can be consistently estimated by
| (4.2) |
Next, since ξ2,i(τ) = a2,i(τ)+ bi(τ), with similar arguments to the proof of (4.1) we obtain
| (4.3) |
Thus Σ2(τ, τ) can be consistently estimated by
| (4.4) |
The estimation of the covariance Φ1(τ′, τ) of n1/2{β̂I(τ′) − β(τ′)} and n1/2{β̂I(τ) − β(τ)} is outlined as follows.
- Find a symmetric and nonsingular (p+1)×(p+1) matrix En(τ) ≡ {en,1(τ), …, en,p+1(τ)} such that Σ̂1(τ, τ) = {En(τ)}2.
- Calculate Dn(τ) = ([S1,n{en,1(τ), τ}]−1 − β̂I(τ), …, [S1,n{en,p+1(τ), τ}]−1 − β̂I(τ)), where {S1,n(e, τ)}−1 is the solution to S1,n(b, τ) − e = 0. As with (2.6) in Section 2.2, the equation S1,n(b, τ) − e = 0 can be solved by minimizing an objective function analogous to (2.10).
- Estimate Φ1(τ′, τ) by Φ̂1(τ′, τ) = nDn(τ′){En(τ′)}−1 Σ̂1(τ′, τ){En(τ)}−1 Dn(τ). In the special case of τ′ = τ, Φ̂1(τ, τ) = n{Dn(τ)}⊗2.
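The two purely algebraic pieces of this procedure, the symmetric square root En(τ) and the product n{Dn(τ)}⊗2, can be sketched as follows. The example is restricted to a 2×2 matrix so that a closed-form square root is available; the input matrices are made up for illustration, and the step that actually produces Dn(τ) by solving the perturbed estimating equations is not shown.

```python
import math


def sym_sqrt_2x2(S):
    """Symmetric square root E of a 2x2 positive definite matrix S, so E @ E = S.

    Uses the closed form E = (S + s I) / t with s = sqrt(det S) and
    t = sqrt(trace S + 2 s), which follows from the Cayley-Hamilton theorem.
    """
    (a, b), (_, d) = S
    s = math.sqrt(a * d - b * b)
    t = math.sqrt(a + d + 2.0 * s)
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def scaled_outer_square(D, n):
    """n * D D^T, i.e. n {D}^{(x)2} for a square matrix D."""
    Dt = [list(col) for col in zip(*D)]
    return [[n * v for v in row] for row in matmul(D, Dt)]


Sigma_hat = [[4.0, 2.0], [2.0, 3.0]]        # illustrative covariance estimate
E_n = sym_sqrt_2x2(Sigma_hat)               # columns play the role of e_{n,1}, e_{n,2}
check = matmul(E_n, E_n)                    # should reproduce Sigma_hat
Phi_hat = scaled_outer_square([[1.0, 0.0], [0.0, 2.0]], n=10)
```

For dimensions above two, an eigendecomposition-based square root from a numerical linear algebra library would play the same role.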
The estimation of the covariance Φ2(τ′, τ) of n1/2{β̂A(τ′) − β(τ′)} and n1/2{β̂A(τ) − β(τ)} follows the same procedure as above by replacing Σ̂1(τ′, τ) with Σ̂2(τ′, τ) and S1,n(e, τ) with S2,n(e, τ). The proof of the consistency of the variance estimators is similar to that in Peng and Fine (2009), and thus is omitted.
5 Simulation study
5.1 Assessment of estimation under correctly specified models
The simulation study examines finite-sample performance of the IPW estimator and the AIPW estimator, along with the omniscient estimator (Omni) that assumes complete knowledge of Ji for uncensored failure times, and the complete-case estimator (CC) that deletes observations with missing causes. The Omni and CC estimators are computed via Peng and Fine’s (2009) method.
Let Zi = (1, Zi,1, Zi,2), where Zi,1 is a uniform random variable on (0, 1) and Zi,2 is Bernoulli with probability of success equal to 0.5. The failure type Ji takes values of 1 and 2 with P(Ji = 1|Zi) = p0I(Zi,2 = 0) + p1I(Zi,2 = 1). The failure time Ti follows the conditional distributions P(Ti < t|Ji = 1, Zi) = Φ(log t − γTZi) and P (Ti < t|Ji = 2, Zi) = Φ(log t − αTZi), where Φ(·) denotes the cumulative distribution function of N (0, 1), γ = (γ0, γ1, γ2), and α = (α0, α1, α2). With this set-up, the underlying τth type-1 conditional quantile of Ti is
| $Q_1(\tau \mid Z_i) = \exp\{\beta_0(\tau) + \beta_1(\tau) Z_{i,1} + \beta_2(\tau) Z_{i,2}\},$ | (5.1) |
where β0(τ) = γ0 + Φ−1 (τ/p0), β1(τ) = γ1, and β2(τ) = γ2 + Φ−1(τ/p1) − Φ−1(τ/p0). The covariate Zi,2 has a varying effect on the cumulative incidence quantiles across different quantile levels, whereas Zi,1 has a constant effect.
Let the censoring time Ci follow a uniform distribution on (0, 8). We generated the missing failure type indicator Ri from the logistic model logit{r(Wi)} = ψTWi, where Wi = (1, Zi,1, Zi,2, Xi, Ai)T, Xi = min(Ti, Ci), and Ai is a univariate auxiliary variable. The values ψ = (1, −0.9, −1, 2, 0)T and ψ = (1, −1.4, −1.5, 1, 0)T correspond to 20% and 40% missing failure types, respectively. Here we chose not to include Ai in the missingness model so that we could compare the IPW and AIPW estimators under different degrees of association between Ai and Ji, holding the degree of missingness fixed at the same rate. This set-up suggests that a stronger association between Ai and Ji yields a more efficient AIPW estimator under the same level of missingness.
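The data-generating mechanism described so far can be sketched in a few lines. The code below is an illustration, not the authors' simulation code: it uses the parameter values reported later in this section (p0 = 0.8, p1 = 0.6, γ = (0, 0.5, −0.5), α = (0, 0, −0.5)), the 20%-missingness value of ψ, and it assumes the linear-predictor form logit{r(Wi)} = ψTWi. Because the coefficient of Ai in ψ is zero, Ai is set to a placeholder value of 0.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n, psi=(1.0, -0.9, -1.0, 2.0, 0.0), p=(0.8, 0.6),
             gamma=(0.0, 0.5, -0.5), alpha=(0.0, 0.0, -0.5), seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1 = rng.random()                        # Z_{i,1} ~ Uniform(0, 1)
        z2 = 1 if rng.random() < 0.5 else 0      # Z_{i,2} ~ Bernoulli(0.5)
        J = 1 if rng.random() < p[z2] else 2     # failure type
        coef = gamma if J == 1 else alpha
        # P(T < t | J, Z) = Phi(log t - coef^T Z)  <=>  log T = coef^T Z + N(0, 1)
        T = math.exp(coef[0] + coef[1] * z1 + coef[2] * z2 + rng.gauss(0.0, 1.0))
        C = rng.uniform(0.0, 8.0)                # censoring time
        X = min(T, C)
        delta = 1 if T <= C else 0
        A = 0.0                                  # placeholder auxiliary (psi_A = 0)
        W = (1.0, z1, z2, X, A)
        r = expit(sum(a * b for a, b in zip(psi, W)))
        # Censored subjects are complete cases by definition (R = 1).
        R = 1 if delta == 0 or rng.random() < r else 0
        J_obs = J if (delta == 1 and R == 1) else None
        rows.append({"X": X, "delta": delta, "R": R, "J": J,
                     "J_obs": J_obs, "Z": (1.0, z1, z2)})
    return rows


data = simulate(5000)
censoring_rate = sum(1 - d["delta"] for d in data) / len(data)
type1_share = sum(1 for d in data if d["J"] == 1) / len(data)
```

The latent `J` is kept in each row only so that an omniscient analysis can be run; an analysis that respects the missingness would use `J_obs`.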
We consider three different levels of association between Ai and Ji, which correspond to three different choices of ρ(Wi) for the AIPW estimator. In Case 1, the auxiliary variable Ai is independent of failure type Ji given Zi. For Cases 2 and 3, we let
where 0 ≤ θ ≤ 1. Case 2 corresponds to θ = 0.8 and Case 3 corresponds to θ = 0.95. A larger value of θ indicates stronger positive association between Ai and Ji given Zi. This model set-up results in a logistic regression model for ρ(Wi) with logit{ρ(Wi)} = φ0 + φ1Zi,1 + φ2Zi,2 + φ3Ai. For Case 1, φ1 = φ3 = 0, φ0 = log{p0/(1 − p0)} and φ2 = log [p1(1 − p0)/{p0(1 − p1)}]. For Cases 2 and 3, φ0 = 3 log{θ/(1 − θ)}, φ1 = 0, φ2 = log[p1(1 − p0)/{p0(1 − p1)}] and φ3 = 2 log{(1 − θ)/θ}.
We set p0 = 0.8, p1 = 0.6, γ = (0, 0.5, −0.5)T, and α = (0, 0, −0.5)T. Under this setting, on average 55% of the subjects fail from type-1 failure, 25% fail from type-2 failure, and the remaining 20% are right-censored. The performances of the four estimators, Omni, CC, IPW and AIPW, for β(τ) at τ = 0.2 and 0.4 with sample sizes n = 200 and n = 500 and two missing-causes percentages are summarized in Tables 1–4. The tables report the bias, empirical standard deviation, mean estimated standard deviation, and empirical coverage probability of 95% Wald-type confidence intervals based on 500 simulated data sets.
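With these parameter values, the true coefficients that the bias columns are measured against follow directly from the expressions for β0(τ), β1(τ), and β2(τ) given after (5.1). The short sketch below evaluates them with the standard normal quantile function; the function name `true_beta` is a hypothetical helper.

```python
from statistics import NormalDist


def true_beta(tau, p0=0.8, p1=0.6, gamma=(0.0, 0.5, -0.5)):
    """True (beta0, beta1, beta2) at quantile level tau under the simulation design."""
    q = NormalDist().inv_cdf          # standard normal quantile function
    beta0 = gamma[0] + q(tau / p0)
    beta1 = gamma[1]
    beta2 = gamma[2] + q(tau / p1) - q(tau / p0)
    return beta0, beta1, beta2


beta_at_02 = true_beta(0.2)
beta_at_04 = true_beta(0.4)
```

The effect of Zi,2 changes from roughly −0.26 at τ = 0.2 to roughly −0.07 at τ = 0.4, while the effect of Zi,1 stays at 0.5, which is the varying-versus-constant pattern described above.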
Table 1.
The average bias (Bias), empirical standard deviation (EmpSD), mean estimated standard deviation (EstSD), and empirical coverage probability (CovP) of 95% confidence intervals at τ = 0.2, based on 500 simulated data sets with 20% missing causes.
| Method | Bias×10³ | | | EmpSD×10³ | | | EstSD×10³ | | | CovP×10² | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | β0 | β1 | β2 | β0 | β1 | β2 | β0 | β1 | β2 | β0 | β1 | β2 |
| n = 200 | | | | | | | | | | | | |
| Omni | 5 | −24 | 10 | 274 | 438 | 245 | 304 | 486 | 277 | 92.4 | 92.0 | 92.6 |
| CC | 68 | 71 | 184 | 288 | 477 | 265 | 301 | 490 | 296 | 91.0 | 90.8 | 87.8 |
| IPW | −9 | −5 | 21 | 337 | 575 | 307 | 341 | 574 | 334 | 92.0 | 90.0 | 93.6 |
| AIPW (Case 1) | −8 | 0 | 24 | 316 | 531 | 281 | 334 | 547 | 314 | 91.0 | 90.4 | 92.6 |
| AIPW (Case 2) | −10 | −2 | 22 | 297 | 497 | 269 | 308 | 521 | 297 | 92.2 | 93.0 | 94.6 |
| AIPW (Case 3) | −1 | −21 | 17 | 284 | 461 | 246 | 303 | 499 | 280 | 91.6 | 91.4 | 94.2 |
| n = 500 | | | | | | | | | | | | |
| Omni | 7 | −7 | 2 | 173 | 278 | 156 | 170 | 293 | 166 | 91.8 | 93.4 | 93.6 |
| CC | 83 | 64 | 163 | 184 | 297 | 165 | 178 | 293 | 174 | 87.0 | 91.2 | 79.8 |
| IPW | −2 | 8 | −5 | 196 | 338 | 192 | 207 | 347 | 203 | 91.6 | 91.2 | 92.0 |
| AIPW (Case 1) | 4 | 0 | 1 | 188 | 314 | 182 | 198 | 333 | 189 | 92.6 | 93.2 | 91.2 |
| AIPW (Case 2) | 2 | 3 | 2 | 181 | 304 | 171 | 194 | 314 | 182 | 93.8 | 93.2 | 93.4 |
| AIPW (Case 3) | 4 | 1 | 5 | 178 | 285 | 162 | 178 | 290 | 171 | 91.0 | 92.0 | 93.0 |
Table 4.
The average bias (Bias), empirical standard deviation (EmpSD), mean estimated standard deviation (EstSD), and empirical coverage probability (CovP) of 95% confidence intervals at τ = 0.4, based on 500 simulated data sets with 40% missing causes.
| Method | Bias×10³ | | | EmpSD×10³ | | | EstSD×10³ | | | CovP×10² | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | β0 | β1 | β2 | β0 | β1 | β2 | β0 | β1 | β2 | β0 | β1 | β2 |
| n = 200 | | | | | | | | | | | | |
| Omni | 6 | 15 | 8 | 307 | 514 | 308 | 343 | 566 | 338 | 92.4 | 93.2 | 93.6 |
| CC | 43 | 211 | 395 | 347 | 612 | 432 | 388 | 671 | 492 | 91.6 | 90.6 | 84.4 |
| IPW | −13 | 57 | 49 | 467 | 840 | 543 | 489 | 866 | 593 | 93.8 | 94.4 | 93.4 |
| AIPW (Case 1) | 22 | −4 | 52 | 420 | 766 | 505 | 438 | 786 | 508 | 94.6 | 95.0 | 93.0 |
| AIPW (Case 2) | 15 | −4 | 24 | 394 | 667 | 416 | 402 | 676 | 435 | 93.0 | 93.4 | 92.8 |
| AIPW (Case 3) | −1 | 11 | 19 | 338 | 539 | 364 | 341 | 594 | 354 | 92.0 | 95.4 | 91.8 |
| n = 500 | | | | | | | | | | | | |
| Omni | 11 | −20 | 18 | 199 | 323 | 193 | 196 | 326 | 199 | 91.8 | 93.0 | 94.2 |
| CC | 53 | 174 | 389 | 212 | 367 | 264 | 227 | 382 | 274 | 91.4 | 90.8 | 72.8 |
| IPW | 0 | 14 | 26 | 267 | 494 | 310 | 276 | 494 | 322 | 92.2 | 92.0 | 93.8 |
| AIPW (Case 1) | 3 | −4 | 42 | 245 | 443 | 284 | 254 | 450 | 293 | 94.4 | 95.6 | 93.8 |
| AIPW (Case 2) | 11 | −21 | 33 | 234 | 410 | 257 | 237 | 413 | 269 | 92.6 | 93.8 | 95.4 |
| AIPW (Case 3) | 8 | −19 | 27 | 207 | 352 | 212 | 212 | 355 | 218 | 92.2 | 94.2 | 95.0 |
Note that the choices of ρ(Wi) do not change the IPW estimator. Only the results for the AIPW estimator are reported for Cases 1–3. The CC estimator had substantial bias for all scenarios. The IPW and AIPW estimators performed comparably to the Omni estimator with very small biases. In addition, the estimated standard deviations matched very well with the empirical ones, and the 95% confidence intervals had reasonable coverage probabilities, except for the CC estimator.
For the analysis of Mashi data presented in the next section, small values of τ = 0.005, 0.01, and 0.02 were considered due to small percentages of HIV-related and HIV-unrelated deaths. Furthermore, the Mashi analysis had a larger sample size. To mimic Mashi, additional simulations at τ = 0.01 with n = 1200 were conducted. The results, reported in Table S.1 of the Supplementary Material, show that the biases of the AIPW estimator remain small under 20% and 40% of missing causes. The biases of the IPW estimator are also small under 20% of missing causes. At 40% of missing causes, the biases of the IPW estimator are large compared to those for the AIPW estimator, but these biases for the slope coefficients are still smaller than those of the CC estimator.
Table 5 shows the Pitman relative efficiencies (ratios of variances) for the IPW and AIPW estimators with respect to the Omni estimator. By incorporating information from the missing failure types, AIPW improved efficiency over IPW, with greater improvement when there was a stronger association between the auxiliary variable Ai and Ji. For Case 3 with n = 500, the efficiencies of AIPW were comparable to those of the Omni estimator.
Table 5.
Pitman relative efficiencies of the IPW and AIPW estimators with respect to the Omni estimator based on 500 simulated data sets. The maximum standard error of the relative efficiencies is 0.07. MP stands for the missingness proportion of failure causes.
| n | IPW | | | AIPW (Case 1) | | | AIPW (Case 2) | | | AIPW (Case 3) | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | β0 | β1 | β2 | β0 | β1 | β2 | β0 | β1 | β2 | β0 | β1 | β2 |
| MP = 20%, τ = 0.2 | | | | | | | | | | | | |
| 200 | 0.66 | 0.58 | 0.64 | 0.75 | 0.68 | 0.76 | 0.85 | 0.78 | 0.83 | 0.93 | 0.90 | 0.99 |
| 500 | 0.78 | 0.68 | 0.66 | 0.85 | 0.79 | 0.74 | 0.91 | 0.84 | 0.84 | 0.95 | 0.95 | 0.93 |
| MP = 20%, τ = 0.4 | | | | | | | | | | | | |
| 200 | 0.74 | 0.67 | 0.75 | 0.81 | 0.74 | 0.76 | 0.86 | 0.81 | 0.78 | 0.95 | 0.93 | 0.92 |
| 500 | 0.83 | 0.79 | 0.79 | 0.87 | 0.84 | 0.76 | 0.91 | 0.88 | 0.86 | 0.94 | 0.93 | 0.96 |
| MP = 40%, τ = 0.2 | | | | | | | | | | | | |
| 200 | 0.41 | 0.33 | 0.32 | 0.57 | 0.49 | 0.44 | 0.65 | 0.58 | 0.55 | 0.92 | 0.84 | 0.85 |
| 500 | 0.46 | 0.36 | 0.37 | 0.62 | 0.49 | 0.49 | 0.70 | 0.58 | 0.60 | 0.97 | 0.90 | 0.85 |
| MP = 40%, τ = 0.4 | | | | | | | | | | | | |
| 200 | 0.43 | 0.37 | 0.32 | 0.53 | 0.45 | 0.37 | 0.61 | 0.59 | 0.55 | 0.83 | 0.91 | 0.72 |
| 500 | 0.56 | 0.43 | 0.39 | 0.66 | 0.53 | 0.46 | 0.72 | 0.62 | 0.56 | 0.92 | 0.84 | 0.82 |
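The relative efficiencies in Table 5 can be cross-checked against the empirical standard deviations in Table 1, since a ratio of variances is the squared ratio of standard deviations. The sketch below does this for n = 200, τ = 0.2, and 20% missing causes, using the rounded EmpSD×10³ entries from Table 1; the reported efficiencies were presumably computed from unrounded values, so agreement to two decimals is a consistency check rather than a guarantee.

```python
def relative_efficiency(sd_reference, sd_estimator):
    """Variance of the reference estimator divided by variance of the estimator."""
    return (sd_reference / sd_estimator) ** 2


# EmpSD x 10^3 for (beta0, beta1, beta2), Table 1, n = 200.
omni_sd = (274, 438, 245)
ipw_sd = (337, 575, 307)
aipw_case3_sd = (284, 461, 246)

ipw_eff = [round(relative_efficiency(o, e), 2) for o, e in zip(omni_sd, ipw_sd)]
aipw3_eff = [round(relative_efficiency(o, e), 2) for o, e in zip(omni_sd, aipw_case3_sd)]
```

The results match the first data row of Table 5: 0.66, 0.58, 0.64 for IPW and 0.93, 0.90, 0.99 for AIPW (Case 3).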
5.2 On robustness of estimation
To assess how sensitive the proposed methods are to model misspecifications for r(Wi) and/or ρ(Wi), and to violations of the missing at random assumption, we consider four additional cases, namely, Cases 4–7. In Case 4, instead of a logistic model we generated the missing failure type indicator Ri from the probit model r(Wi) = Φ(ψTWi), where ψ = (1, −0.9, −1.4, 2, 0)T and Wi = (1, Zi,1, Zi,2, Xi, Ai)T, whereas both the IPW and AIPW estimators still used logistic regression to estimate r(Wi), with Xi excluded from Wi. Case 5 has the same design as Case 2, and Case 6 has the same design as Case 4. In both Cases 5 and 6, ρ(Wi) is estimated by excluding the important variable Ai from the logistic regression. Therefore, r(Wi) is misspecified in Case 4, ρ(Wi) is misspecified in Case 5, and both models are misspecified in Case 6. In Case 7, we generated the missing failure type indicator Ri from a logistic model involving Wi = (1, Zi,1, Zi,2, Xi, Ai)T with ψ = (2.5, −0.9, −1, 2, 0)T that depends additionally on the failure type Ji. Since the probability of missingness depends on the unobserved failure type Ji, the missing at random assumption is violated in Case 7. In all four cases, the missing-cause proportion is 20%.
Table 6 reports the bias, empirical standard deviation, mean estimated standard deviation, and empirical coverage probability of 95% Wald-type confidence intervals for the Omni, CC, IPW, and AIPW estimators based on 500 simulated data sets for Cases 4–7 at τ = 0.2 and with n = 500. Summaries for other values of τ are given in Figures S.1–6 of the Supplementary Material. When r(Wi) was misspecified, the IPW estimator performed similarly to the CC estimator, both having large biases for estimating β2(τ). As expected from its double robustness property, the AIPW estimator performed well in Cases 4–5, when one of the two models for r(Wi) and ρ(Wi) was misspecified. Since the IPW estimator does not utilize ρ(Wi), there is no misspecification for the IPW estimator under Case 5. When both models were misspecified in Case 6, the AIPW estimator had slightly larger biases than in Case 4, but still outperformed the CC estimator, in particular for β1(τ) and β2(τ). Since both the IPW and AIPW estimators are developed under the MAR assumption, it is no surprise that they showed no improvement over the CC estimator in Case 7.
Table 6.
Method robustness. The average bias (Bias), empirical standard deviation (EmpSD), mean estimated standard deviation (EstSD), and empirical coverage probability (CovP) of 95% confidence intervals at τ = 0.2 with n = 500, based on 500 simulated data sets with 20% missing causes.
| Method | Bias×10³ | | | EmpSD×10³ | | | EstSD×10³ | | | CovP×10² | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | β0 | β1 | β2 | β0 | β1 | β2 | β0 | β1 | β2 | β0 | β1 | β2 |
| Case 4: r(w) is misspecified | ||||||||||||
| Omni | 4 | 1 | 3 | 169 | 275 | 161 | 177 | 291 | 166 | 92.8 | 92.0 | 93.8 |
| CC | 10 | 92 | 333 | 166 | 274 | 157 | 169 | 281 | 162 | 92.8 | 91.2 | 50.0 |
| IPW | 18 | 73 | 294 | 171 | 278 | 159 | 170 | 286 | 160 | 92.2 | 91.2 | 54.2 |
| AIPW | 26 | −51 | −34 | 173 | 284 | 160 | 176 | 281 | 166 | 92.4 | 90.6 | 93.6 |
| Case 5: ρ(w) is misspecified | ||||||||||||
| Omni | 7 | −7 | 2 | 173 | 278 | 156 | 170 | 293 | 166 | 91.8 | 93.4 | 93.6 |
| CC | 83 | 64 | 163 | 184 | 297 | 165 | 178 | 293 | 174 | 87.0 | 91.2 | 79.8 |
| IPW | −2 | 8 | −5 | 196 | 338 | 192 | 207 | 347 | 203 | 91.6 | 91.2 | 92.0 |
| AIPW | 6 | −5 | −3 | 188 | 315 | 179 | 193 | 334 | 193 | 91.6 | 93.0 | 93.4 |
| Case 6: both r(w) and ρ(w) are misspecified | ||||||||||||
| Omni | 4 | 1 | 3 | 169 | 275 | 161 | 177 | 291 | 166 | 92.8 | 92.0 | 93.8 |
| CC | 10 | 92 | 333 | 166 | 274 | 157 | 169 | 281 | 162 | 92.8 | 91.2 | 50.0 |
| IPW | 18 | 73 | 294 | 171 | 278 | 159 | 170 | 286 | 160 | 92.2 | 91.2 | 54.2 |
| AIPW | 35 | −72 | −46 | 173 | 287 | 162 | 175 | 285 | 167 | 92.8 | 91.6 | 92.6 |
| Case 7: missing-at-random assumption is violated | ||||||||||||
| Omni | 7 | −7 | 2 | 173 | 278 | 156 | 170 | 293 | 166 | 91.8 | 93.4 | 93.6 |
| CC | 45 | 33 | 58 | 179 | 293 | 162 | 179 | 297 | 171 | 91.8 | 93.6 | 91.4 |
| IPW | −5 | −41 | −95 | 191 | 328 | 179 | 196 | 331 | 188 | 91.0 | 90.4 | 89.6 |
| AIPW | −7 | −42 | −85 | 180 | 300 | 167 | 184 | 295 | 170 | 92.0 | 91.2 | 89.4 |
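The four summaries reported in Table 6 can be computed from simulation replicates along the following lines. This is a minimal sketch, not the authors' code: the function name `summarize`, the toy replicates, and the normal critical value 1.96 for the 95% Wald-type interval are assumptions made for illustration.

```python
import random
import statistics

def summarize(estimates, std_errors, true_value, z=1.96):
    """Summaries for one coefficient across simulation replicates.

    Returns (bias, emp_sd, est_sd, cov_p):
      bias   = mean of the estimates minus the true value
      emp_sd = sample standard deviation of the estimates
      est_sd = mean of the estimated standard errors
      cov_p  = fraction of Wald intervals est +/- z * se covering the truth
    """
    bias = statistics.fmean(estimates) - true_value
    emp_sd = statistics.stdev(estimates)
    est_sd = statistics.fmean(std_errors)
    covered = [abs(est - true_value) <= z * se
               for est, se in zip(estimates, std_errors)]
    cov_p = sum(covered) / len(covered)
    return bias, emp_sd, est_sd, cov_p

# Toy replicates (hypothetical): 500 unbiased normal estimates with SE 0.16.
rng = random.Random(0)
true_beta = 1.0
se = 0.16
estimates = [rng.gauss(true_beta, se) for _ in range(500)]
std_errors = [se] * 500

bias, emp_sd, est_sd, cov_p = summarize(estimates, std_errors, true_beta)
# Table 6 reports Bias, EmpSD, EstSD multiplied by 10^3 and CovP by 10^2.
print(round(bias * 1e3), round(emp_sd * 1e3), round(est_sd * 1e3), round(cov_p * 1e2, 1))
```

When the standard error estimator is adequate, EmpSD and EstSD should be close and CovP should be near 95; a large bias relative to EmpSD, as for the CC and IPW estimators of β2 in Cases 4 and 6, drives the coverage well below the nominal level.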
6 Analysis of the Mashi data
The Mashi trial investigated the effect of formula- versus breast-feeding plus extended infant zidovudine prophylaxis among HIV-infected expectant mothers in Botswana (Thior et al. (2006)). Five hundred and ninety-one women were randomized to formula feeding from birth plus 1 month of infant zidovudine (FF), and 588 women were randomized to breast-feeding from birth plus 6 months of infant zidovudine (BF+AZT). Live first-born infants were followed for 18 months for occurrence of the two primary endpoints, HIV infection and death. HIV-PCR tests were administered at birth and at ages 1, 2, 3, 4, 5, 6, 7, 9, 12, and 18 months (with little missing data). The primary objectives assessed the treatment effect on these endpoints separately, as well as on the composite endpoint defined as the first event of HIV infection or death. A secondary objective was to assess the treatment effect on death due to HIV infection, which we refer to as HIV-related death. We apply the methods above to assess this effect, with J = 1 denoting HIV-related death and J = 2 denoting HIV-unrelated death.
We take a death to be HIV-related if either (1) the study clinicians deemed the death HIV-1 related (n = 4 deaths), or (2) the infant had at least one positive test result from the PCR assay used to test for HIV infection prior to death (n = 24 deaths). In addition, we take a death to be HIV-unrelated if the study clinician deemed the death unrelated to HIV/AIDS (n = 22 deaths). Of the 111 live-born infants who died, the cause of death is known in 50 cases and missing for 61.
Considering 20 covariates of the babies or their mothers, we used logistic regression and all-subsets model selection (with Mallows' Cp as the criterion) to select a model for predicting, among cases, whether J was observed. The model included the following variables (estimated regression coefficient): the infant had birthweight < 2.5 kilograms (1.21); the randomization assignment of mother/baby to receive Placebo/Placebo was switched to Placebo/Nevirapine part-way through the trial due to a DSMB recommendation (−1.27); the infant had AZT toxicity (1.43); log10 plasma viral load level of the mother at delivery (0.98); and the baby was hospitalized with a serious adverse event (−1.20). Using the same model selection strategy for analyzing cases with known cause of death, the following variables were included in the model for predicting J = 1: the infant received HAART (2.42), and log10 plasma viral load level of the mother at delivery (1.70).
For assessing the treatment effect of BF+AZT versus FF we used the identity link function. The covariate of interest is Z = (1, Z1)T, where Z1 is 1 for mother-infant pairs assigned BF+AZT and 0 for FF. The estimation of the quantile is invariant to the link function in this particular case, but the estimated values of the coefficients β0(τ) and β1(τ) can be different for different link functions. With the identity link, β1(τ) represents the treatment effect on the τth type-1 quantile. Let X be the survival time in days. According to the above logistic regressions, we let W include the variables that proved informative for P(R = 1|δ = 1, W) and/or for the probability of HIV-related death P(J = 1|δ = 1, W). We considered the subset of data with complete covariate information, which includes 1123 live-born infants (of the 1193 total), among whom 107 died and 49 died with known cause of death (28 of them HIV-related). Based on the data, about 2.5% of infants died while known to be HIV infected (J = 1), and 54.2% of the infants who died had missing cause of death.
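The two percentages quoted above follow directly from the counts in the complete-covariate subset. The short check below is only an arithmetic illustration; the variable names are hypothetical.

```python
# Counts quoted in the text for the complete-covariate subset.
n_infants = 1123       # live-born infants with complete covariates
n_deaths = 107         # deaths among them
n_known_cause = 49     # deaths with known cause
n_hiv_related = 28     # deaths known to be HIV-related (J = 1)

pct_hiv_related = 100 * n_hiv_related / n_infants
pct_missing_cause = 100 * (n_deaths - n_known_cause) / n_deaths

print(f"HIV-related deaths among infants: {pct_hiv_related:.1f}%")
print(f"Deaths with missing cause: {pct_missing_cause:.1f}%")
```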
We performed the quantile regression at τ = 0.005, 0.01, 0.02 and 0.03. The analysis at τ = 0.005 is interesting because it concerns early death and there were many early deaths in the data set. Table 7 summarizes the analysis results using the IPW and AIPW methods. From Table 7, by the AIPW method, the p-values for testing the treatment effect at τ = (0.005, 0.01, 0.02, 0.03) were (0.138, 0.042, 0.062, 0.52), respectively. The results indicate that BF+AZT had some positive effect in postponing/reducing HIV-related deaths compared to FF at the quantiles corresponding to τ = 0.01 and 0.02. Using the AIPW method, the HIV-related death rate reached 1% by 184 days for those assigned to BF+AZT, while it reached 1% by 64 days for those assigned to FF. In addition, it reached 2% by 276 days for BF+AZT and by 113 days for FF. This analysis suggests that it takes 120–163 days longer for the BF+AZT group than for the FF group to reach the same percentage of HIV-related deaths. The estimated treatment effect using the AIPW estimator decreased at τ = 0.03, and the standard error increased because of the small number of deaths after the 0.03 quantile. The estimated treatment effect was also small at τ = 0.005. The IPW method did not identify a significant treatment effect at any of the quantile levels evaluated. This is attributed to the limited number of deaths and the high percentage of unknown causes of death among those who died; the AIPW method was able to recover some of the lost information by modeling the probability of HIV-related death under the missing at random assumption (2.3). The large differences between the IPW and AIPW estimates of the treatment effect at τ = 0.005 and 0.01 in Table 7 reflect the fact that the IPW estimation is not numerically stable.
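With the identity link, the τth type-1 quantile is β0(τ) for FF and β0(τ) + β1(τ) for BF+AZT, so the day counts quoted above are sums of the AIPW coefficients in Table 7. The sketch below reproduces those sums and, assuming two-sided Wald tests based on the normal approximation (an assumption that appears consistent with the reported p-values, not a statement from the text), the treatment p-values. The names `aipw` and `wald_p_value` are hypothetical.

```python
import math

def wald_p_value(estimate, std_error):
    """Two-sided p-value for H0: coefficient = 0 under a normal approximation."""
    z = abs(estimate / std_error)
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))

# AIPW results from Table 7: tau -> (intercept, treatment, treatment S.E.), in days.
aipw = {
    0.005: (37.0, 102.0, 68.8),
    0.01: (64.0, 120.0, 59.1),
    0.02: (113.0, 163.0, 87.5),
    0.03: (214.0, 84.0, 132.1),
}

for tau, (b0, b1, se1) in aipw.items():
    q_ff = b0             # Z1 = 0: formula feeding
    q_bf = b0 + b1        # Z1 = 1: breast-feeding plus AZT
    p = wald_p_value(b1, se1)
    print(f"tau={tau}: FF {q_ff:.0f} days, BF+AZT {q_bf:.0f} days, p={p:.3f}")
```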
Table 7.
Analysis of the Mashi data with the IPW and AIPW methods.
| | IPW | | | AIPW | | | Rel. Efficiency IPW vs AIPW |
|---|---|---|---|---|---|---|---|
| | Coef Est. | S.E. | p-value | Coef Est. | S.E. | p-value | |
| τ = 0.005 | |||||||
| Intercept | 52.0 | 19.2 | 0.007 | 37.0 | 22.1 | 0.09 | 1.32 |
| Treatment | 38.0 | 61.7 | 0.54 | 102.0 | 68.8 | 0.14 | 1.24 |
| τ = 0.01 | |||||||
| Intercept | 64.0 | 31.4 | 0.04 | 64.0 | 33.6 | 0.06 | 1.15 |
| Treatment | 66.0 | 108.9 | 0.54 | 120.0 | 59.1 | 0.04 | 0.29 |
| τ = 0.02 | |||||||
| Intercept | 94.0 | 113.0 | 0.41 | 113.0 | 95.9 | 0.24 | 0.72 |
| Treatment | 182.0 | 107.3 | 0.09 | 163.0 | 87.5 | 0.06 | 0.66 |
| τ = 0.03 | |||||||
| Intercept | 207.0 | 164.5 | 0.21 | 214.0 | 147.0 | 0.15 | 0.80 |
| Treatment | 91.0 | 186.7 | 0.63 | 84.0 | 132.1 | 0.52 | 0.50 |
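The relative efficiency column of Table 7 is not defined in this section. Numerically it appears to equal the squared ratio of the AIPW standard error to the IPW standard error, so that values below 1 indicate a smaller AIPW variance. The following sketch checks that reading against the table; it is an observation from the reported numbers, and the function name `relative_efficiency` is hypothetical.

```python
def relative_efficiency(se_ipw, se_aipw):
    """Assumed definition: Var(AIPW) / Var(IPW) = (se_aipw / se_ipw) ** 2."""
    return (se_aipw / se_ipw) ** 2

# (tau, coefficient, IPW S.E., AIPW S.E., reported relative efficiency) from Table 7.
rows = [
    (0.005, "Intercept", 19.2, 22.1, 1.32),
    (0.005, "Treatment", 61.7, 68.8, 1.24),
    (0.01, "Intercept", 31.4, 33.6, 1.15),
    (0.01, "Treatment", 108.9, 59.1, 0.29),
    (0.02, "Intercept", 113.0, 95.9, 0.72),
    (0.02, "Treatment", 107.3, 87.5, 0.66),
    (0.03, "Intercept", 164.5, 147.0, 0.80),
    (0.03, "Treatment", 186.7, 132.1, 0.50),
]

for tau, name, se_ipw, se_aipw, reported in rows:
    computed = relative_efficiency(se_ipw, se_aipw)
    print(f"tau={tau} {name}: computed {computed:.3f}, reported {reported}")
```

Under this reading, the AIPW standard errors are smaller than the IPW ones at τ = 0.02 and 0.03 and for the treatment effect at τ = 0.01, and larger at τ = 0.005 and for the intercept at τ = 0.01.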
The difference in the performance of the AIPW and IPW estimators in the Mashi data analysis is consistent with what we observed in the simulation study. That is, the AIPW estimator shows a large efficiency gain over the IPW estimator when Ai and Ji are strongly correlated, and is still more efficient than the IPW estimator even when Ai and Ji are independent. We infer that both the efficiency of the AIPW method and the informativeness of the auxiliary variables for HIV-related death contributed to the efficiency gain.
We stress that the quantile regression based on the cumulative incidence function studies the “crude” effect on the time to HIV-related death in the presence of other competing risks, i.e., HIV-unrelated death. This analysis is directly interpretable and relevant. However, it should not be used to infer the “net effect”; this would require strong untestable assumptions and/or sensitivity analysis.
In conclusion, this analysis provides additional insights over the primary study results that showed that infants assigned to formula-feed (FF) had a higher rate of all-cause mortality by 7 months of age than infants assigned BF+AZT, but a lower rate of HIV infection (Thior et al. (2006)). Prior to the current analysis, a beneficial effect of either BF+AZT or FF on HIV-related death was plausible: for BF+AZT because breast-feeding decreases the general early death rate; for FF because, by decreasing the rate of early HIV infection, it reduces the number of infants that could potentially die from HIV. The analysis here supports that the beneficial effect of formula-feeding to reduce HIV infections is overwhelmed by the stronger deleterious effect of formula-feeding to increase early deaths in HIV-infected infants. These results support breast-feeding plus antiretroviral prophylaxis during the first several months of life for infants born to HIV-infected mothers in Botswana.
Supplementary Material
Table 2.
The average bias (Bias), empirical standard deviation (EmpSD), mean estimated standard deviation (EstSD), and empirical coverage probability (CovP) of 95% confidence intervals at τ = 0.4, based on 500 simulated data sets with 20% missing causes.
| Method | Bias×10³ | | | EmpSD×10³ | | | EstSD×10³ | | | CovP×10² | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | β0 | β1 | β2 | β0 | β1 | β2 | β0 | β1 | β2 | β0 | β1 | β2 |
| n = 200 | ||||||||||||
| Omni | 6 | 15 | 8 | 307 | 514 | 308 | 343 | 566 | 338 | 92.4 | 93.2 | 93.6 |
| CC | 77 | 28 | 148 | 313 | 538 | 334 | 345 | 565 | 351 | 92.0 | 93.4 | 91.4 |
| IPW | −7 | 32 | 25 | 358 | 630 | 356 | 374 | 639 | 398 | 90.6 | 92.8 | 94.0 |
| AIPW (Case 1) | −4 | 28 | 34 | 341 | 599 | 354 | 372 | 640 | 397 | 92.2 | 94.0 | 93.4 |
| AIPW (Case 2) | −3 | 22 | 30 | 332 | 572 | 349 | 351 | 587 | 370 | 92.6 | 92.6 | 92.8 |
| AIPW (Case 3) | 5 | 13 | 13 | 315 | 532 | 321 | 334 | 557 | 342 | 92.0 | 94.8 | 94.6 |
| n = 500 | ||||||||||||
| Omni | 11 | −20 | 18 | 199 | 323 | 193 | 196 | 326 | 199 | 91.8 | 93.0 | 94.2 |
| CC | 88 | −13 | 141 | 198 | 322 | 193 | 196 | 337 | 210 | 89.0 | 93.2 | 89.0 |
| IPW | 4 | −4 | 15 | 218 | 363 | 217 | 218 | 363 | 230 | 91.0 | 91.6 | 94.8 |
| AIPW (Case 1) | 8 | −15 | 20 | 214 | 353 | 220 | 221 | 363 | 224 | 91.0 | 94.6 | 93.8 |
| AIPW (Case 2) | 6 | −9 | 18 | 208 | 344 | 207 | 213 | 357 | 213 | 92.2 | 95.0 | 94.2 |
| AIPW (Case 3) | 10 | −17 | 19 | 205 | 335 | 196 | 204 | 334 | 206 | 92.4 | 94.2 | 95.0 |
Table 3.
The average bias (Bias), empirical standard deviation (EmpSD), mean estimated standard deviation (EstSD), and empirical coverage probability (CovP) of 95% confidence intervals at τ = 0.2, based on 500 simulated data sets with 40% missing causes.
| Method | Bias×10³ | | | EmpSD×10³ | | | EstSD×10³ | | | CovP×10² | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | β0 | β1 | β2 | β0 | β1 | β2 | β0 | β1 | β2 | β0 | β1 | β2 |
| n = 200 | ||||||||||||
| Omni | 5 | −24 | 10 | 274 | 438 | 245 | 304 | 486 | 277 | 92.4 | 92.0 | 92.6 |
| CC | 42 | 226 | 346 | 316 | 551 | 360 | 362 | 606 | 370 | 92.2 | 91.0 | 79.0 |
| IPW | −51 | 45 | 66 | 430 | 768 | 436 | 450 | 779 | 425 | 89.6 | 91.0 | 89.2 |
| AIPW (Case 1) | 2 | −18 | 35 | 363 | 628 | 371 | 381 | 632 | 363 | 91.2 | 92.4 | 90.8 |
| AIPW (Case 2) | 2 | −16 | 32 | 338 | 575 | 330 | 343 | 548 | 328 | 91.0 | 90.8 | 93.0 |
| AIPW (Case 3) | 3 | −16 | 12 | 286 | 477 | 265 | 309 | 502 | 293 | 91.8 | 92.8 | 93.4 |
| n = 500 | ||||||||||||
| Omni | 7 | −7 | 2 | 173 | 278 | 156 | 170 | 293 | 166 | 91.8 | 93.4 | 93.6 |
| CC | 46 | 239 | 325 | 204 | 348 | 212 | 209 | 357 | 221 | 90.8 | 86.6 | 69.4 |
| IPW | 1 | 7 | 9 | 256 | 467 | 257 | 266 | 473 | 263 | 93.4 | 92.4 | 91.0 |
| AIPW (Case 1) | −1 | 0 | 19 | 221 | 399 | 224 | 222 | 388 | 222 | 92.8 | 91.2 | 92.2 |
| AIPW (Case 2) | 6 | −13 | 10 | 207 | 365 | 202 | 213 | 354 | 204 | 92.2 | 93.4 | 91.0 |
| AIPW (Case 3) | 5 | −5 | 10 | 176 | 294 | 169 | 187 | 303 | 173 | 92.6 | 94.0 | 92.6 |
Acknowledgments
The authors thank the Mashi study team (led by Dr. Max Essex) and the Mashi study participants for the data. The authors also thank the Editor, an associate editor, and the referee for their thoughtful and constructive comments that have significantly improved the paper. This research was partially supported by NSF grants DMS-0604576 and DMS-0905777 (Yanqing Sun), DMS-0706963 and DMS-1007420 (Huixia J. Wang), and NIH grant 2 R37 AI054165-09 (Yanqing Sun and Peter Gilbert).
7 Appendix
The following regularity conditions are assumed in Sections 3 and 4.
- C1. There exists ν > 0 such that P(C = ν) > 0 and P(C > ν) = 0.
- C2. Z is uniformly bounded with supi ||Zi|| < ∞.
- C3. For 0 < τL ≤ τU < τ0 = infz P(T ≤ ν, J = 1 | z), β(τ) is Lipschitz continuous for τ ∈ [τL, τU], and f1(t | z) is bounded in t and z, where f1(t | z) = ∂F1(t | z)/∂t.
- C4. For some ρ0 > 0 and c0 > 0, infb∈ℬ(ρ0) eigmin A(b) ≥ c0, where ℬ(ρ) = {b ∈ ℝp+1: infτ∈[τL, τU] ||b − β(τ)|| ≤ ρ}, A(b) = E[Z⊗2f1{g(ZTb)|Z}], eigmin A(b) is the minimum of the eigenvalues of A(b), and u⊗2 = uuT.
- C5. π(Q, ψ) and ρ(Q, φ) are twice differentiable with respect to ψ and φ, respectively; π(Q, ψ) ≥ α > 0; is uniformly bounded; both ρ(W, φ) and are uniformly bounded.
The first four conditions are similar to those of Peng and Fine (2009). The condition C5 requires that the probability of non-missingness be bounded away from zero, as well as other boundedness conditions that are needed to establish weak convergence of the empirical processes.
Proof of Theorem 3.1
Let . The following proof of consistency holds for both estimators, which are the roots of Sn(b, τ), by taking ϑ̂i = ϑ̂1,i for the IPW estimator and ϑ̂i = ϑ̂2,i for the AIPW estimator. Let ϑ1,i = RiδiI(Ji = 1){π(Qi)G(Ti)}−1 and ϑ2,i = RiδiI(Ji = 1){π(Qi)G(Ti)}−1 + [1 − {π(Qi)}−1Ri] δiρ(Wi) {G(Ti)}−1. We use ϑi = ϑ1,i for the IPW estimator and ϑi = ϑ2,i for the AIPW estimator. For brevity, supb and supτ denote the supremum taken over b ∈ ℝp+1 and τ ∈ [τL, τU], respectively.
Let and . Under the missing at random assumption (2.3) and the conditional independence between (Ti, Ji) and Ci given Zi, .
By Conditions C1 and C5, for every r > 0, we have supt<ν |Ĝ (t) − G(t)| = op(n−1/2+r), |ψ̂ − ψ| = op(n−1/2+r) and |φ̂ − φ| = op(n−1/2+r). This, coupled with C2 and C5, implies that . It follows from arguments similar to those of Peng and Fine (2009) that , and thus supτ,b ||n−1/2 Sn(b, τ) − μ(b, τ)|| = op(1). This, together with μ{β(τ), τ} = 0, implies the uniform consistency of both β̂I(τ) and β̂A(τ) under C4.
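The role of the two weights ϑ1,i and ϑ2,i used in the proof above can be illustrated numerically. The sketch below is a toy calculation under stated assumptions, not part of the proof: it takes a single uncensored subject (δ = 1), treats π, ρ and G as known constants rather than estimated functions, and takes R and J to be independent, which is what the missing at random assumption gives once the covariates are held fixed. All function names and numerical values are hypothetical. It computes the exact mean of each weight and compares it with the full-data target ρ/G = E{I(J = 1)}/G.

```python
from itertools import product

def ipw_weight(R, delta, J, pi, G):
    """theta_1 = R * delta * I(J = 1) / {pi * G}."""
    return R * delta * (1.0 if J == 1 else 0.0) / (pi * G)

def aipw_weight(R, delta, J, pi, G, rho):
    """theta_2 = theta_1 + (1 - R / pi) * delta * rho / G."""
    return ipw_weight(R, delta, J, pi, G) + (1.0 - R / pi) * delta * rho / G

def expected_weight(weight_fn, true_pi, true_rho):
    """Exact mean of weight_fn(R, J) with P(R = 1) = true_pi and
    P(J = 1) = true_rho, R independent of J (toy MAR setting)."""
    total = 0.0
    for R, J in product((0, 1), (1, 2)):
        p_R = true_pi if R == 1 else 1.0 - true_pi
        p_J = true_rho if J == 1 else 1.0 - true_rho
        total += p_R * p_J * weight_fn(R, J)
    return total

# Toy values (assumptions): censoring survival G, true pi and true rho.
G, TRUE_PI, TRUE_RHO = 0.8, 0.5, 0.6
target = TRUE_RHO / G  # mean of the full-data weight delta * I(J = 1) / G

def ipw_mean(pi_model):
    return expected_weight(
        lambda R, J: ipw_weight(R, 1, J, pi_model, G), TRUE_PI, TRUE_RHO)

def aipw_mean(pi_model, rho_model):
    return expected_weight(
        lambda R, J: aipw_weight(R, 1, J, pi_model, G, rho_model),
        TRUE_PI, TRUE_RHO)

print("target              ", target)
print("IPW, pi wrong       ", ipw_mean(0.4))
print("AIPW, pi wrong      ", aipw_mean(0.4, TRUE_RHO))
print("AIPW, rho wrong     ", aipw_mean(TRUE_PI, 0.3))
print("AIPW, both wrong    ", aipw_mean(0.4, 0.3))
```

In this toy setting the AIPW weight has the correct mean whenever either π or ρ is correct, while the IPW weight requires π to be correct, which mirrors the pattern seen in Cases 4–6 of the simulation study.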
Proof of Theorem 3.2
Let β̂ (τ) be the root of Sn(b, τ). First we show that Sn{β(τ), τ} converges weakly to a mean zero Gaussian process and derive its asymptotic covariance matrix. Note that
| (7.1) |
The asymptotic approximation for (7.1) is obtained below for the IPW and AIPW estimators, respectively.
For the IPW estimator, ϑ̂i and ϑi of (7.1) correspond to ϑ̂1,i and ϑ1,i, respectively, and β̂ (τ) = β̂I(τ). Let . We have
| (7.2) |
From Pepe (1991),
| (7.3) |
where and y(·) are defined in Section 3 just before Theorem 3.1.
By (3.1),
| (7.4) |
By (7.2), (7.3) and (7.4), the second term of (7.1) is
Writing and changing the order of the summations, the above is
| (7.5) |
Let
. The function class
is Donsker, and thus Glivenko-Cantelli (van der Vaart and Wellner (1996)) because the class of indicator functions is Donsker, and Zi, {π(Qi)G(Xi)}−1 and
are uniformly bounded. It follows from the Glivenko-Cantelli Theorem that
, uniformly in both b ∈ ℝp+1 and t ∈ [0, ν). The limit is w1(b, t), defined in Section 3 just before Theorem 3.1, under MAR and the independent censoring assumption. Since Ĝ (Xi) = G(Xi) + Op(n−1/2) and π̂ (Qi) = π(Qi) + Op(n−1/2) uniformly in i ∈ {1, … n},
uniformly in τ ∈ [τL, τU] and t ∈ [0, ν).
Similarly, uniformly in τ ∈ [τL, τU], where w2(b) is defined in Section 3 just before Theorem 3.1.
By (7.1) and (7.5), the next asymptotic equivalence follows by applying Lemma 2 of Gilbert, McKeague, and Sun (2008) to (7.5):
| (7.6) |
uniformly in τ ∈ [τL, τU], where ξ1,i(τ) = a1,i(τ) + bi(τ) + ci(τ), and a1,i(τ), bi(τ) and ci(τ) are defined in Section 3 just before Theorem 3.1.
For the AIPW estimator, ϑ̂i and ϑi of (7.1) correspond to ϑ̂2,i and ϑ2,i, respectively, and β̂ (τ) = β̂A(τ). Then ϑ̂2,i − ϑ2,i is
Now, we apply the decompositions (7.3), (7.4), and (3.2), and plug them into (7.1). By the Glivenko-Cantelli Theorem, we can show that
uniformly in both b ∈ ℝp+1 and t ∈ [0, ν), where . It is easy to see that w3(b) = −w2(b).
Using similar techniques as for the IPW estimator, we obtain
| (7.7) |
uniformly in τ ∈ [τL, τU] in probability, where ξ2,i(τ) = a2,i(τ) + bi(τ) and .
We have derived the asymptotic approximations of Sn{β(τ), τ} in (7.6) and (7.7) for the IPW estimator and AIPW estimator, respectively. It is obvious that the function class {ci(τ), τ ∈ [τL, τU]} is Donsker. Applying similar arguments for
, the function classes {a1,i(τ), τ ∈ [τL, τU]} and {a2,i(τ), τ ∈ [τL, τU]} are Donsker by the Lipschitz continuity of β(·) implied by C3, and by using the fact that the Donsker Property is preserved under the Lipschitz transformation. It is not difficult to show that
is Lipschitz in b. Hence the function class {bi(τ), τ ∈ [τL, τU]} is Donsker. The Donsker property is preserved under addition. As a result, Sn{β(τ), τ} converges weakly to a mean zero Gaussian process with covariance matrix
by (7.6) for the IPW estimator, and it converges weakly to a mean zero Gaussian process with covariance matrix
by (7.7) for the AIPW estimator, for τ, τ′ ∈ [τL, τU].
Next, simple algebraic manipulations show that Sn{β̂ (τ), τ}− Sn{β(τ), τ} = (I) + (II), where
From Lemma 1 of Peng and Fine (2009) and the uniform consistency of β̂ (τ), it follows that the difference between (I) and n1/2[μ{β̂ (τ), τ} − μ{β(τ), τ}] converges to zero uniformly in τ ∈ [τL, τU] in probability. By a first-order Taylor expansion of ϑ̂i around ϑi, together with (3.1), (3.2), (7.3) and Lemma 1 of Peng and Fine (2009), we can show that (II) = o(1) uniformly in τ ∈ [τL, τU] in probability. A Taylor expansion of μ(b, τ) around b = β(τ), along with the fact that β̂ (τ) converges uniformly to β (τ), gives
where . Given Sn{β̂ (τ), τ} = op(n−1/2), this further implies that
where . The asymptotic approximations (3.3) and (3.4) for the IPW estimator and the AIPW estimator follow from (7.6) and (7.7), respectively. Hence, n1/2{β̂ (τ) − β(τ)} converges weakly to a mean zero Gaussian process with covariance matrix Φ1(τ′, τ) = [A{β(τ′)}]−1Σ1(τ′, τ) [A{β(τ)}]−1 for the IPW estimator and Φ2(τ′, τ) = [A{β(τ′)}]−1 Σ2(τ′, τ) [A{β(τ)}]−1 for the AIPW estimator.
Finally, we show that the AIPW estimator is more efficient than the IPW estimator by showing that Σ2(τ′, τ) ≤ Σ1(τ′, τ). Note that a2,i(τ) = a1,i(τ) + ei(τ) and ξ1,i(τ) = ξ2,i(τ) + {ci(τ) − ei(τ)}, where ei(τ) is defined in Section 3 just before Theorem 3.1. By (7.6) and (7.7), it suffices to show E[ξ2,i(τ′){ci(τ) − ei(τ)}T] = 0.
Under MAR, Ri and Ji are conditionally independent given Qi, and we have
which equals zero by E(ηi | Qi) = 0. By E(ηi|Qi) = 0, we also have . Therefore . Similarly, . Hence .
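The reason the orthogonality E[ξ2,i(τ′){ci(τ) − ei(τ)}T] = 0 suffices for the efficiency comparison can be spelled out as follows. This is a sketch under the assumption that the covariance functions take the form Σk(τ′, τ) = E{ξk,i(τ′)ξk,i(τ)T} for k = 1, 2; the shorthand di(τ) = ci(τ) − ei(τ) is introduced here for illustration only.

```latex
\begin{align*}
\Sigma_1(\tau',\tau)
  &= E\big[\{\xi_{2,i}(\tau') + d_i(\tau')\}\{\xi_{2,i}(\tau) + d_i(\tau)\}^T\big] \\
  &= \Sigma_2(\tau',\tau)
     + E\{\xi_{2,i}(\tau')\,d_i(\tau)^T\}
     + E\{d_i(\tau')\,\xi_{2,i}(\tau)^T\}
     + E\{d_i(\tau')\,d_i(\tau)^T\} \\
  &= \Sigma_2(\tau',\tau) + E\{d_i(\tau')\,d_i(\tau)^T\}.
\end{align*}
```

Both cross terms vanish by the orthogonality (the second is the transpose of the first with the roles of τ and τ′ exchanged). At τ′ = τ the remaining term E{di(τ)di(τ)T} is positive semidefinite, which gives Σ2(τ, τ) ≤ Σ1(τ, τ) in the matrix sense.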
Proof of (4.1)
Let . Under the MAR assumption, Ri and Ji are conditionally independent given Qi, and we have
where the last equation is obtained by the definition of ηi following (3.1). It is easy to see that and since . It follows that
Contributor Information
Yanqing Sun, Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC 28223.
Huixia Judy Wang, Department of Statistics, North Carolina State University, Raleigh, NC 27695.
Peter B. Gilbert, Department of Biostatistics, University of Washington and Fred Hutchinson Cancer Research Center, Seattle, WA 98109
References
- Bang H, Tsiatis AA. Median regression with censored cost data. Biometrics. 2002;58:643–649. doi: 10.1111/j.0006-341x.2002.00643.x.
- Beaudry M, Dufour R, Marcoux S. Relation between infant feeding and infections during the first six months of life. Journal of Pediatrics. 1995;126:191–197. doi: 10.1016/s0022-3476(95)70544-9.
- Beran R. Technical report. University of California; Berkeley: 1981. Nonparametric Regression With Randomly Censored Survival Data.
- Buckley J, James I. Linear regression with censored data. Biometrika. 1979;66:429–436.
- Cox DR. Regression models and life tables (with discussion). Journal of the Royal Statistical Society, B. 1972;34:187–220.
- Dunn DT, Newell ML, Ades AE, Peckham CS. Risk of human immunodeficiency virus type 1 transmission through breastfeeding. The Lancet. 1992;340:585–588. doi: 10.1016/0140-6736(92)92115-v.
- Gao G, Tsiatis AA. Semiparametric estimators for the regression coefficients in the linear transformation competing risks model with missing cause of failure. Biometrika. 2005;92:875–891.
- Gilbert PB, McKeague IW, Sun Y. The two-sample problem for failure rates depending on a continuous mark: an application to vaccine efficacy. Biostatistics. 2008;9:263–276. doi: 10.1093/biostatistics/kxm028.
- Goetghebeur E, Ryan L. Analysis of competing risks survival data when some failure types are missing. Biometrika. 1995;82:821–834.
- Horvitz DG, Thompson DJ. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association. 1952;47:663–685.
- Huang Y. Calibration regression of censored lifetime medical cost. Journal of the American Statistical Association. 2002;97:318–327.
- Koenker R, Bassett GS. Regression quantiles. Econometrica. 1978;46:33–50.
- Koul H, Susarla V, Van Ryzin J. Regression analysis with randomly right censored data. The Annals of Statistics. 1981;9:1276–1288.
- Lu W, Liang Y. Analysis of competing risks data with missing cause of failure under additive hazards model. Statistica Sinica. 2008;19:219–234.
- Lu K, Tsiatis AA. Multiple imputation methods for estimating regression coefficients in the competing risks model with missing cause of failure. Biometrics. 2001;57:1191–1197. doi: 10.1111/j.0006-341x.2001.01191.x.
- Neocleous T, Vanden Branden K, Portnoy S. Correction to censored regression quantiles by S. Portnoy, 98 (2003), 1001–1012. Journal of the American Statistical Association. 2006;101:860–861.
- Peng L, Fine JP. Competing risks quantile regression. Journal of the American Statistical Association. 2009;104:1440–1453.
- Peng L, Huang Y. Survival analysis with quantile regression models. Journal of the American Statistical Association. 2008;103:637–649.
- Pepe MS. Inference for events with dependent risks in multiple endpoint studies. Journal of the American Statistical Association. 1991;86:770–778.
- Portnoy S. Censored regression quantiles. Journal of the American Statistical Association. 2003;98:1001–1012.
- Prentice RL, Kalbfleisch JD, Peterson AV, Flournoy N, Farewell VT, Breslow NE. The analysis of failure time in the presence of competing risks. Biometrics. 1978;34:541–554.
- Rubin DB. Inference and missing data. Biometrika. 1976;63:581–592.
- Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association. 1994;89:846–866.
- Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for nonignorable dropout using semiparametric nonresponse models: rejoinder. Journal of the American Statistical Association. 1999;94:1135–1146.
- Thior I, Lockman S, Smeaton LM, Shapiro RL, Wester C, Heymann SJ, Gilbert PB, Stevens L, Peter T, Kim S, van Widenfelt E, Moffat C, Ndase P, Arimi P, Kebaabetswe P, Mazonde P, Makhema J, McIntosh K, Novitsky V, Lee TH, Marlink R, Lagakos S, Essex M and the Mashi Study Team. Breastfeeding plus infant zidovudine prophylaxis for 6 months vs formula feeding plus infant zidovudine for 1 month to reduce mother-to-child HIV transmission in Botswana: a randomized trial: the Mashi Study. Journal of the American Medical Association. 2006;296:794–805. doi: 10.1001/jama.296.7.794.
- Tsiatis AA. A nonidentifiability aspect of the problem of competing risks. Proceedings of the National Academy of Sciences USA. 1975;72:20–22. doi: 10.1073/pnas.72.1.20.
- van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes with Applications to Statistics. Springer-Verlag; New York: 1996.
- Wang H, Wang L. Locally weighted censored quantile regression. Journal of the American Statistical Association. 2009;104:1117–1128.
- Ying Z, Jung SH, Wei LJ. Survival analysis with median regression models. Journal of the American Statistical Association. 1995;90:178–184.
