Quantile Regression for Competing Risks Data with Missing Cause of Failure

Yanqing Sun; Huixia Judy Wang; Peter B Gilbert

doi:10.5705/ss.2010.093

Author manuscript; available in PMC: 2013 Aug 13.

Published in final edited form as: Stat Sin. 2012 Apr 1;22(2):703–728. doi: 10.5705/ss.2010.093

PMCID: PMC3742132 NIHMSID: NIHMS452342 PMID: 23950622

Abstract

This paper considers generalized linear quantile regression for competing risks data when the failure type may be missing. Two estimation procedures for the regression coefficients, an inverse probability weighted complete-case estimator and an augmented inverse probability weighted estimator, are discussed under the assumption that the failure type is missing at random. The proposed estimation procedures utilize supplemental auxiliary variables for predicting the missing failure type and for informing its distribution. The asymptotic properties of the two estimators are derived and their asymptotic efficiencies are compared. We show that the augmented estimator is more efficient and possesses a double robustness property against misspecification of either the model for missingness or the model for the failure type. The asymptotic covariances are estimated using the local functional linearity of the estimating functions. The finite sample performance of the proposed estimation procedures is evaluated through a simulation study. The methods are applied to analyze the ‘Mashi’ trial data for investigating the effect of formula- versus breast-feeding plus extended infant zidovudine prophylaxis on HIV-related death of infants born to HIV-infected mothers in Botswana.

Keywords: Augmented inverse probability weighted, Auxiliary variables, Competing risks, Double robustness, Efficient estimator, Estimating equation, Inverse probability weighted, Local functional linearity, Logistic regression, Mashi trial, Missing at random, Quantile regression

1 Introduction

This paper was motivated by the need to analyze the ‘Mashi’ trial data (mashi means milk in Setswana) for examining the effect of formula- versus breast-feeding plus extended infant zidovudine prophylaxis on HIV-related death of infants born to HIV-infected mothers in Botswana (Thior et al. (2006)). Whereas studies including the Mashi trial have shown that formula-feeding increases the overall risk of death while breast-feeding increases the risk of transmitting HIV (Dunn et al. (1992); Beaudry, Dufour, and Marcoux (1995)), the effect of feeding strategy on death due to HIV infection is unknown. Accordingly, it is of interest to assess the treatment effect on HIV-related death, with HIV-unrelated death considered as a competing risk. This analysis provides additional insight over the analysis of all-cause death by addressing whether the known beneficial effect of formula-feeding to prevent HIV infection leads to a beneficial effect to reduce mortality of HIV infected infants.

Of the 111 live-born infants who died in the Mashi trial, the cause of death is known for 50 and missing for 61. It is well-known that the analysis of only cases with complete information may lead to inefficient and/or biased estimates. To account for the missingness, a number of methods have been developed to estimate the covariate effects under different survival models for the cause-specific hazard functions, for instance the proportional hazards model (Goetghebeur and Ryan (1995), Lu and Tsiatis (2001)), the linear transformation model (Gao and Tsiatis (2005)) and the additive hazard model (Lu and Liang (2008)), among others.

In this paper, we consider a quantile regression model (Koenker and Bassett (1978)) that is a valuable complement to the Cox proportional hazards model (Cox (1972)) and the accelerated failure time model (Buckley and James (1979); Koul, Susarla, Van Ryzin (1981)). Quantile regression allows the covariate effects to vary at different tails of the event time distribution. Such important heterogeneity in the population may be overlooked by using the Cox model or the accelerated failure time model. General quantile regression methods in survival analysis are developed under the assumption that censoring is independent of failure time; see Ying, Jung, and Wei (1995), Bang and Tsiatis (2002), Portnoy (2003), Neocleous, Vanden Branden, and Portnoy (2006), Peng and Huang (2008), Wang and Wang (2009), among others.

Most relevant to this work, Peng and Fine (2009) studied the quantile for the cumulative incidence function, which is the distribution of the time to failure due to a particular cause of interest. However, their method does not account for missingness of failure cause. Its application based on the complete-cases may be invalid and misleading because of the high percentage of missing causes of death in the Mashi data. We consider generalized linear quantile regression for competing risks data when causes of failure may be missing. Two estimation procedures are discussed under the assumption that the failure cause is missing at random (Rubin (1976)). The first, following the idea of Horvitz and Thompson (1952), uses inverse probability weighting (IPW) of complete-cases, which leverages auxiliary predictors of whether cause of failure is observed. The second approach, adapting the theory of Robins, Rotnitzky, and Zhao (1994), augments the IPW complete-case estimator with auxiliary predictors of the cause of failure of interest.

This work fits in the general area of competing risks failure time analysis, wherein subjects are followed over time and may fail from one of many causes. The competing risks failure time can be represented by the minimum of the latent failure times, each of which is defined as the time to failure from a particular cause in the absence of all other competing risks. The existing quantile regression methods could be applied within this framework by considering quantile regression of the latent failure time for a particular cause while treating other latent failure times as censoring and by assuming mutual independence of the latent failure times (Tsiatis (1975)). However, this mutual independence assumption is untestable and often dubious (because we expect positive correlation of the latent failure times), and Peng and Fine (2009) took a different approach that avoids it. In particular, they studied the cumulative incidence function, which is the distribution of the time to failure due to a particular cause in the presence of the other competing risks. This approach evaluates “crude” effects on the cause-specific cumulative incidence, and hence caution is needed in the interpretation of the results (Prentice et al. (1978)). This is the dominant approach for assessing competing risks data given the fundamental non-identifiability of the latent failure times, and the methods developed here take this approach.

The rest of the paper is organized as follows. Two procedures for estimating the regression coefficients are proposed in Section 2. The asymptotic properties of these estimators are derived and their asymptotic efficiencies are compared in Section 3. Procedures for estimating the asymptotic covariances are given in Section 4. The finite-sample performance of the proposed estimation procedures is evaluated in Section 5 through a simulation study. The methods are applied to the Mashi data in Section 6 for investigating the effect of formula- versus breast-feeding plus extended infant zidovudine prophylaxis on HIV-related death of infants. All proofs are in Section 7.

2 Estimation procedures

2.1 Model descriptions and assumptions

Let T be the survival time of interest. Due to censoring, we only observe (X, δ), where X = min(T, C), δ = I(T ≤ C) and C is the censoring variable. Let J denote the failure type associated with the uncensored failure time T; J is undefined if T is censored. For convenience we let Z be the (p + 1)-dimensional concomitant variable including 1 as its first component corresponding to an intercept. A typical right-censored competing risks data set consists of independent and identically distributed (i.i.d.) copies (X_i, δ_i, δ_iJ_i, Z_i), i = 1, …, n, of (X, δ, δJ, Z).

We consider J = 1 as the failure type of interest and set J = 2 for all other failure types. The type-1 cumulative incidence function is F₁(t|Z) = P (T ≤ t, J = 1|Z), which represents the conditional probability of observing a type-1 failure by time t given the covariate Z. The τth type-1 conditional quantile given Z = z is defined as $F_{1}^{- 1} (τ ∣ z) = inf {t : F_{1} (t ∣ z) \geq τ}$ . Let ν be the end of the follow-up time, assumed to satisfy condition C1 given in the Appendix. For identifiability, we require that τ ≤ τ₀ where τ₀ = inf_z P (T ≤ ν, J = 1 | z). For τ ∈ [τ_L, τ_U] with 0 < τ_L ≤ τ_U < τ₀, the τth generalized linear quantile regression is

F_{1}^{- 1} (τ ∣ z) = g {z^{T} β (τ)},

(2.1)

where g(·) is a known monotone link function and β(τ) is a (p + 1)-dimensional coefficient vector depending on τ.

To help with understanding model (2.1) and the interpretation of β(τ), we consider the following scenario. Suppose Z_i = (1, Z_i,₁), where Z_i,₁ is the indicator of gender for subject i, say Z_i,₁ = 1 for male and Z_i,₁ = 0 for female, and T_i is the time (age in years) to death. Suppose β(τ) = (β₀(τ), β₁(τ)), with β₀(τ) = 70 and β₁(τ) = −5 at τ = 0.3. Thus, conditional on gender, the age by which 30% of the population dies of type-1 failure is 70 − 5Z_i,₁ under the identity link function. That is, 30% of females die from type-1 failure before age 70 and 30% of males die from type-1 failure before age 65. The gender effect is 5 years – the age at which 30% of individuals die from type-1 failure is 5 years sooner for males than for females.

Let G(t|Z_i) = P (C_i ≥ t|Z_i) and let Ĝ(t|Z_i) be a semiparametric or nonparametric consistent estimator of G(t|Z_i). Peng and Fine (2009) proposed the following estimating equation for β(τ) based on fully observed competing risks data {X_i, Z_i, δ_i, δ_iJ_i, i = 1, …, n}:

S_{n} (b, τ) = \sum_{i = 1}^{n} Z_{i} [\frac{I {T_{i} \leq g (Z_{i}^{T} b), J_{i} = 1} δ_{i}}{\hat{G} (T_{i} ∣ Z_{i})} - τ] = 0.

(2.2)

In this paper we consider the quantile regression (2.1) based on the competing risks data with possibly missing failure type. Let R_i be the complete-case indicator: R_i = 1 either if δ_i = 0 or if δ_i = 1 and J_i is observed; and R_i = 0 otherwise. Auxiliary variables A_i may be helpful for predicting the missing failure type. Since the failure type is defined only for those who are observed to fail, only supplemental information for the observed failures is potentially useful for predicting missingness and for informing about the distribution of the failure type. As such, we denote available auxiliaries by δ_iA_i.

We assume the censoring time C_i is conditionally independent of (T_i, J_i) given Z_i. We also assume the failure type J_i is missing at random (Rubin (1976)); that is, given δ_i = 1 and W_i = (T_i, Z_i, A_i), the probability that the failure type J_i is missing depends only on the observed W_i, not on the value of J_i; this assumption is expressed as

MAR : r (W_{i}) \equiv P (R_{i} = 1 ∣ J_{i}, δ_{i} = 1, W_{i}) = P (R_{i} = 1 ∣ δ_{i} = 1, W_{i}) .

(2.3)

Let π(Q_i) = P (R_i = 1|Q_i), where Q_i = (W_i, δ_i). Then

π (Q_{i}) = δ_{i} r (W_{i}) + (1 - δ_{i}) .

(2.4)

The observed data can be summarized as O_i = {X_i, Z_i, δ_i, R_i, R_iδ_iJ_i, δ_iA_i}, i = 1, …, n. We assume that the O_i are independent and identically distributed.

2.2 Inverse probability weighted estimator

First, following the idea of Horvitz and Thompson (1952), we propose a procedure for estimating β(τ) that uses inverse probability weighting (IPW) of complete-cases. We consider a parametric model r(W_i, ψ) for r(W_i) = P (R_i = 1|δ_i = 1, W_i), where ψ is an unknown vector of finite-dimensional parameters. For example, r(W_i, ψ) may be a logistic regression model with $log [r (W_{i}, ψ) / {1 - r (W_{i}, ψ)}] = W_{i}^{T} ψ$ . The parameter ψ can be estimated by ψ̂, the maximizer of the observed-data likelihood

\prod_{i = 1}^{n} {r (W_{i}, ψ)}^{R_{i} I (δ_{i} = 1)} {1 - r (W_{i}, ψ)}^{1 - R_{i}} .

(2.5)

Therefore, we can estimate π(Q_i, ψ) by π̂ (Q_i) = π(Q_i, ψ̂) = δ_ir̂ (W_i) + (1 − δ_i) where r̂ (W_i) = r(W_i, ψ̂).
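The relation between the working model r(W_i, ψ), the likelihood (2.5), and π(Q_i, ψ) in (2.4) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `r_logistic`, `pi_hat` and `log_lik` are hypothetical, the logistic form is the example model mentioned above, and ψ is assumed to be supplied by some external optimizer.

```python
import math

def r_logistic(w, psi):
    """Logistic working model r(W, psi) = expit(W^T psi) (hypothetical helper)."""
    lin = sum(wj * pj for wj, pj in zip(w, psi))
    return 1.0 / (1.0 + math.exp(-lin))

def pi_hat(delta, w, psi):
    """pi(Q, psi) = delta * r(W, psi) + (1 - delta), as in (2.4)."""
    return delta * r_logistic(w, psi) + (1 - delta)

def log_lik(data, psi):
    """Log of the likelihood (2.5); data holds (delta, R, W) triples.

    Censored subjects (delta = 0) always have R = 1 and contribute nothing.
    """
    total = 0.0
    for delta, r_ind, w in data:
        if delta == 1:
            p = r_logistic(w, psi)
            total += math.log(p) if r_ind == 1 else math.log(1.0 - p)
    return total
```

A censored subject always receives π̂ = 1, so only observed failures are reweighted.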

Modifying (2.2) to accommodate missing failure types leads to the IPW estimating equation for β(τ):

S_{1, n} (b, τ) = n^{- 1 / 2} \sum_{i = 1}^{n} Z_{i} [\frac{R_{i}}{π (Q_{i}, \hat{ψ})} \frac{I {g^{- 1} (T_{i}) \leq Z_{i}^{T} b, J_{i} = 1} δ_{i}}{\hat{G} (T_{i} ∣ Z_{i})} - τ] = 0.

(2.6)

We can write $S_{1, n} (b, τ) = n^{- 1 / 2} \sum_{i = 1}^{n} Z_{i} [I {g^{- 1} (T_{i}) \leq Z_{i}^{T} b} {\hat{ϑ}}_{1, i} - τ]$ , where ϑ̂_{1,i} = R_iδ_iI(J_i = 1)/{π̂(Q_i)Ĝ(T_i|Z_i)}. We refer to the solution of (2.6) as the IPW estimator, denoted by β̂_I(τ).
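To make the weighted form of (2.6) concrete, the sketch below computes the IPW weight for one subject and evaluates the estimating function at a candidate b. It is an illustration under stated assumptions: the names `ipw_weight` and `s1n` are hypothetical, the estimates π̂(Q_i) and Ĝ(T_i|Z_i) are passed in as plain numbers, and the default g⁻¹ is the logarithm (an arbitrary choice of link for the example).

```python
import math

def ipw_weight(r_ind, delta, cause, pi_hat, g_hat):
    """IPW weight R * delta * I(J = 1) / (pi_hat * G_hat) for one subject."""
    if r_ind == 1 and delta == 1 and cause == 1:
        return 1.0 / (pi_hat * g_hat)
    return 0.0  # censored, missing cause, or competing cause

def s1n(b, tau, z, x, weights, g_inv=math.log):
    """Evaluate the estimating function in (2.6); one entry per coefficient.

    x holds observed times X_i, which equal T_i whenever the weight is nonzero.
    """
    n = len(x)
    total = [0.0] * len(b)
    for zi, xi, wi in zip(z, x, weights):
        lin = sum(zj * bj for zj, bj in zip(zi, b))
        resid = (wi if g_inv(xi) <= lin else 0.0) - tau
        for j, zj in enumerate(zi):
            total[j] += zj * resid
    return [t / math.sqrt(n) for t in total]
```

Because the function is a step function in b, an exact root need not exist; in practice one looks for a b at which it changes sign.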

2.3 Augmented inverse probability weighted estimator

Because the IPW estimator obtained by solving (2.6) uses data from complete cases only, it is inefficient, and it is asymptotically consistent only if the missingness probability π(W_i, ψ) is correctly modeled. Adapting the theory of Robins, Rotnitzky, and Zhao (1994) to gain more efficiency and robustness against the misspecification of π(W_i, ψ), we propose an improved estimation procedure that augments the IPW complete-case estimator with auxiliary predictors of the failure type of interest.

Let ρ(W_i) = P (J_i = 1|δ_i = 1, W_i). The missing at random assumption (2.3) implies that J_i is independent of R_i given Q_i:

ρ (W_{i}) = P (J_{i} = 1 ∣ R_{i} = 1, δ_{i} = 1, W_{i}) .

(2.7)

Let ρ(W_i, φ) be a parametric model for ρ(W_i), where φ is a vector of unknown parameters. From (2.7), it follows that ρ(W_i) can be estimated from the complete cases with R_i = 1 and δ_i = 1. The maximum likelihood estimator of φ, φ̂, can be obtained by maximizing the likelihood

\prod_{i = 1}^{n} {ρ (W_{i}, φ)}^{R_{i} δ_{i} I (J_{i} = 1)} {1 - ρ (W_{i}, φ)}^{R_{i} δ_{i} I (J_{i} \neq 1)} .

(2.8)

Denote ρ(W_i, φ̂) by ρ̂ (W_i). We consider the augmented IPW estimating equation for β(τ):

S_{2, n} (b, τ) = 0,

(2.9)

where

S_{2, n} (b, τ) = n^{- 1 / 2} \sum_{i = 1}^{n} Z_{i} ([\frac{R_{i}}{\hat{π} (Q_{i})} \frac{I {g^{- 1} (T_{i}) \leq Z_{i}^{T} b, J_{i} = 1} δ_{i}}{\hat{G} (T_{i} ∣ Z_{i})} + {1 - \frac{R_{i}}{\hat{π} (Q_{i})}} \frac{I {g^{- 1} (T_{i}) \leq Z_{i}^{T} b} δ_{i}}{\hat{G} (T_{i} ∣ Z_{i})} \hat{ρ} (W_{i})] - τ) .

Let ϑ̂_2,i = R_iδ_iI(J_i = 1)/{π̂(Q_i)Ĝ(T_i|Z_i)} + δ_i[1 − {π̂(Q_i)}⁻¹ R_i] ρ̂(W_i)/Ĝ(T_i|Z_i). Then $S_{2, n} (b, τ) = n^{- 1 / 2} \sum_{i = 1}^{n} Z_{i} [I {g^{- 1} (T_{i}) \leq Z_{i}^{T} b} {\hat{ϑ}}_{2, i} - τ]$ . The solution to the augmented IPW estimating equation (2.9) is referred to as the AIPW estimator and denoted by β̂_A(τ).
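The AIPW weight can be sketched in the same way. The function name `aipw_weight` is hypothetical, and π̂(Q_i), Ĝ(T_i|Z_i) and ρ̂(W_i) are assumed to have been computed beforehand and are passed in as numbers.

```python
def aipw_weight(r_ind, delta, cause, pi_hat, g_hat, rho_hat):
    """AIPW weight: IPW term plus the augmentation delta * (1 - R / pi) * rho / G."""
    if delta == 0:
        return 0.0  # censored subjects carry no weight
    ipw = (1.0 if (r_ind == 1 and cause == 1) else 0.0) / (pi_hat * g_hat)
    augment = (1.0 - r_ind / pi_hat) * rho_hat / g_hat
    return ipw + augment
```

For a failure with missing cause (R_i = 0) the weight reduces to ρ̂(W_i)/Ĝ(T_i|Z_i), so such subjects contribute through their predicted probability of a type-1 failure. For a complete case with a competing cause the weight is negative whenever π̂(Q_i) < 1.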

Replacing the estimates Ĝ(·), π̂(·), and ρ̂(·) in the estimating function S_{2,n}(b, τ) by their estimands G(·), π(·), and ρ(·), we have E[S_{2,n}{β(τ), τ}] = 0 if MAR holds and if one of the parametric models, r(W_i, ψ) and ρ(W_i, φ), is correctly specified. In fact, under MAR (2.3) and consequently (2.7), E[S_{2,n}{β(τ), τ}] = E (E [S_{2,n}{β(τ), τ}|Q_i, J_i]) = 0 if r(W_i, ψ) is correctly specified, and E [S_{2,n}{β(τ), τ}] = E (E [S_{2,n}{β(τ), τ}|Q_i, δ_i = 1]) = 0 if ρ(W_i, φ) is correctly specified. This leads to the double robustness property of the AIPW estimator: β̂_A(τ) is consistent for β(τ) provided that at least one of r̂(·) and ρ̂(·) is a consistent estimator for r(·) and ρ(·), respectively. The missing at random assumption MAR is essential for r(W_i) and ρ(W_i) to be identifiable. Violation of MAR may result in inconsistent estimation of both r(·) and ρ(·), and thus render both the IPW and AIPW estimators inconsistent. This property is further demonstrated in our simulation study in Section 5.

The augmented estimating equation (2.9) follows the ideas of Robins, Rotnitzky and Zhao (1994) for efficient augmentation, where (X_i, δ_i, δ_iJ_i, Z_i) is considered as the full data and the full data estimating equation is (2.2), as given by Peng and Fine (2009). It is interesting to note that Peng and Fine’s estimating equation (2.2) is, in turn, based on the inverse probability weighting for censoring of the estimating equation for the full data (T_i, J_i, Z_i), while the observed data in their case is (X_i, δ_i, δ_iJ_i, Z_i). It would be desirable to improve the efficiency of the Peng and Fine (2009) estimator with augmentation. By Robins, Rotnitzky, and Zhao (1994), the efficient augmentation of (2.2) requires the estimation of the conditional expectation $E [I {g^{- 1} (T_{i}) \leq Z_{i}^{T} b, J_{i} = 1} ∣ X_{i}, δ_{i}, δ_{i} J_{i}, Z_{i}]$ , which is unobtainable since the conditional distribution of (T_i, J_i = 1) given (C_i, δ_i = 0, Z_i) is not identifiable based on the observed competing risks data. Its implementation would require some untestable and perhaps unreasonable or conflicting assumptions, such as independence of (T_i, J_i = 1) and (C_i, δ_i = 0) given Z_i.

The numerical procedure for solving equation (2.9) is equivalent to locating the minimizer of the function:

U_{2, n} (b, τ) = \sum_{i = 1}^{n} {\hat{ϑ}}_{2, i} | g^{- 1} (X_{i}) - Z_{i}^{T} b | + | M + \sum_{i = 1}^{n} {\hat{ϑ}}_{2, i} Z_{i}^{T} b | + | M - \sum_{i = 1}^{n} 2 τ Z_{i}^{T} b |,

(2.10)

where M is a large positive number. Equivalency is due to the fact that U_{2,n}(b, τ) is a convex function in b and its derivative is 2n^{1/2} S_{2,n}(b, τ) when M exceeds $∣ \sum_{i = 1}^{n} {\hat{ϑ}}_{2, i} Z_{i}^{T} b ∣$ and $∣ \sum_{i = 1}^{n} 2 τ Z_{i}^{T} b ∣$ for all b within the compact parameter set of β(τ). Under (2.1), $Z_{i}^{T} β (τ) = g^{- 1} {F_{1}^{- 1} (τ ∣ Z_{i})}$ . It is necessary that $Z_{i}^{T} b \leq {max}_{1 \leq i \leq n} ∣ g^{- 1} (X_{i}) ∣$ holds for all i for some parameter vector b for β(τ) to be identifiable. Hence $∣ \sum_{i = 1}^{n} {\hat{ϑ}}_{2, i} Z_{i}^{T} b ∣ \leq \sum_{i = 1}^{n} ∣ {\hat{ϑ}}_{2, i} ∣ {max}_{1 \leq i \leq n} ∣ g^{- 1} (X_{i}) ∣$ and $∣ \sum_{i = 1}^{n} 2 τ Z_{i}^{T} b ∣ \leq 2 n τ {max}_{1 \leq i \leq n} ∣ g^{- 1} (X_{i}) ∣$ . We further notice that |ϑ̂_{2,i}| ≤ 3/{π̂(Q_i)Ĝ(T_i|Z_i)}. For most practical applications, where the missingness probabilities are less than 0.9 and fewer than 90% of subjects are censored, it is reasonable to assume that $\sum_{i = 1}^{n} 3 / {\hat{π} (Q_{i}) \hat{G} (T_{i} ∣ Z_{i})} \leq 300 n$ . Then it suffices to take M ≥ 300n max_{1≤i≤n} |g⁻¹(X_i)|. One can use a number greater than 300 in the lower bound for M in more extreme situations. Similarly, the estimating equation (2.6) can be solved by minimizing (2.10) with ϑ̂_{2,i} replaced by ϑ̂_{1,i}, and the same choice of M can be used in the minimization.
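The minimization of (2.10) can be illustrated in the simplest possible setting. The sketch below assumes an intercept-only model (Z_i = 1, scalar b) with the identity link, neither of which is used in the paper; the names `u2n` and `minimize_u2n` are hypothetical. It takes M = 300n max|g⁻¹(X_i)| as suggested above and searches the kinks of the piecewise-linear objective.

```python
def u2n(b, tau, x, weights, big_m, g_inv=lambda t: t):
    """Objective (2.10) for an intercept-only model (Z_i = 1, scalar b)."""
    n = len(x)
    fit = sum(w * abs(g_inv(xi) - b) for xi, w in zip(x, weights))
    return fit + abs(big_m + sum(weights) * b) + abs(big_m - 2.0 * tau * n * b)

def minimize_u2n(tau, x, weights, g_inv=lambda t: t):
    """Search the kinks g^{-1}(X_i) of the piecewise-linear objective."""
    n = len(x)
    big_m = 300.0 * n * max(abs(g_inv(xi)) for xi in x)
    candidates = sorted(g_inv(xi) for xi in x)
    return min(candidates, key=lambda b: u2n(b, tau, x, weights, big_m))
```

With unit weights (no censoring and no missing causes) the minimizer is an ordinary sample quantile, for example the median when τ = 0.5. With covariates, the same objective is a weighted L1 problem that a linear programming or quantile regression routine can handle.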

3 Asymptotic properties

Throughout the rest of the paper, we assume the censoring distribution does not depend on the covariates, i.e., G(t|Z_i) = G(t), and use the Kaplan-Meier estimator Ĝ (t) to estimate G(t). The independence assumption for C_i and Z_i can be relaxed, in which case the conditional Kaplan-Meier estimator (Beran (1981)) can be used to estimate G(t|Z_i), and the asymptotic distributions for β̂_I(τ) and β̂_A(τ) need to be modified to accommodate the additional variations. This section derives the uniform consistency and weak convergence of the proposed estimators β̂_I(τ) and β̂_A(τ), for τ over the interval [τ_L, τ_U], under the conditions C1–C5 given in the Appendix. It also compares the asymptotic efficiency of the two estimators.
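A Kaplan-Meier estimator of the censoring distribution G(t) = P(C ≥ t) treats censored observations (δ_i = 0) as the events. The following is a minimal sketch, assuming no ties between failure and censoring times; the function name `km_censoring` is hypothetical.

```python
def km_censoring(x, delta):
    """Return a function t -> G_hat(t), a Kaplan-Meier estimate of P(C >= t).

    Censoring times (delta = 0) are the events; the risk set at s is {X_i >= s}.
    """
    times = sorted(set(xi for xi, d in zip(x, delta) if d == 0))
    steps = []
    for s in times:
        at_risk = sum(1 for xi in x if xi >= s)
        censored = sum(1 for xi, d in zip(x, delta) if xi == s and d == 0)
        steps.append((s, 1.0 - censored / at_risk))

    def g_hat(t):
        prob = 1.0
        for s, factor in steps:
            if s < t:  # strict inequality gives the left-continuous P(C >= t)
                prob *= factor
        return prob

    return g_hat
```

The value Ĝ(T_i) for an observed failure then enters the denominators of the weights ϑ̂_{1,i} and ϑ̂_{2,i}.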

Under C5, n^{1/2}(ψ̂ − ψ) and n^{1/2}(φ̂ − φ) are asymptotically linear with influence functions η_i and ζ_i, respectively, such that

n^{1 / 2} (\hat{ψ} - ψ) = n^{- 1 / 2} \sum_{i = 1}^{n} η_{i} + o_{p} (1),

(3.1)

n^{1 / 2} (\hat{φ} - φ) = n^{- 1 / 2} \sum_{i = 1}^{n} ζ_{i} + o_{p} (1),

(3.2)

where {(η_i, ζ_i), i = 1, …, n} are i.i.d. random variables with Eη_i = 0 and Eζ_i = 0. Under the logistic regression model for r(W_i, ψ), we can write $η_{i} = I_{ψ}^{- 1} π_{ψ}^{'} (Q_{i}, ψ) δ_{i} {R_{i} - π_{i} (Q_{i})} / [π_{i} (Q_{i}) {1 - π_{i} (Q_{i})}]$ , where I_ψ is the asymptotic information matrix of the likelihood function (2.5).

Let $N_{i}^{G} (t) = I (X_{i} \leq t, δ_{i} = 0)$ , Y_i(t) = I(X_i ≥ t), y(t) = P (X_i ≥ t) and $M_{i}^{G} (t) = N_{i}^{G} (t) - \int_{0}^{t} Y_{i} (s) λ^{G} (s) d s$ , where λ^G(t) is the hazard function for the censoring variable C. Let $w_{1} (b, t) = E [Z_{i} Y_{i} (t) I {X_{i} \leq g (Z_{i}^{T} b)} ϑ_{1, i}] y^{- 1} (t), w_{2} (b) = - E (Z_{i} {π_{ψ}^{'} (Q_{i}, ψ)}^{T} {π (Q_{i})}^{- 1} I [X_{i} \leq g {Z_{i}^{T} b}] ϑ_{1, i})$ . Under MAR and the independent censoring assumption, it is easy to see that $w_{2} (b) = - E (Z_{i} {π_{ψ}^{'} (Q_{i}, ψ)}^{T} {π (Q_{i})}^{- 1} ρ (W_{i}) I [T_{i} \leq g {Z_{i}^{T} b}])$ . Let $a_{1, i} (τ) = Z_{i} (I [X_{i} \leq g {Z_{i}^{T} β (τ)}] ϑ_{1, i} - τ), a_{2, i} (τ) = Z_{i} (I [X_{i} \leq g {Z_{i}^{T} β (τ)}] ϑ_{2, i} - τ), b_{i} (τ) = \int_{0}^{\infty} w_{1} (β (τ), s) {d M}_{i}^{G} (s)$ , and $e_{i} (τ) = Z_{i} I [X_{i} \leq g {Z_{i}^{T} β (τ)}] δ_{i} {G (T_{i})}^{- 1} [1 - R_{i} {π (Q_{i})}^{- 1}] ρ (W_{i})$ . Define ξ_{1,i}(τ) = a_{1,i}(τ) + b_i(τ) + c_i(τ) and ξ_{2,i}(τ) = a_{2,i}(τ) + b_i(τ). Let β(τ) be the true regression coefficient at τ.

Theorem 3.1

Under C1–C5, given in the Appendix, we have lim_{n→∞} sup_{τ∈[τ_L, τ_U]} ||β̂_I(τ) − β(τ)|| = 0 and lim_{n→∞} sup_{τ∈[τ_L, τ_U]} ||β̂_A(τ) − β(τ)|| = 0 in probability, where ||·|| is the Euclidean norm.

We show in the Appendix that the asymptotic approximations hold for the IPW estimator and the AIPW estimator uniformly in τ ∈ [τ_L, τ_U] in probability:

n^{1 / 2} {{\hat{β}}_{I} (τ) - β (τ)} = n^{- 1 / 2} \sum_{i = 1}^{n} {[A {β (τ)}]}^{- 1} ξ_{1, i} (τ) + o_{p} (1),

(3.3)

n^{1 / 2} {{\hat{β}}_{A} (τ) - β (τ)} = n^{- 1 / 2} \sum_{i = 1}^{n} {[A {β (τ)}]}^{- 1} ξ_{2, i} (τ) + o_{p} (1),

(3.4)

where $A {β (τ)} = E (Z_{i}^{\otimes 2} f_{1} [g {Z_{i}^{T} β (τ)} ∣ Z_{i}])$ and f₁(t|z) = ∂F₁(t|z)/∂t. The approximations (3.3) and (3.4) lead to the following asymptotic results.

Theorem 3.2

Under C1–C5, given in the Appendix, we have

both n^{1/2}{β̂_I(τ) − β(τ)} and n^{1/2}{β̂_A(τ) − β(τ)} converge weakly to mean zero Gaussian processes with covariance matrices Φ₁(τ′, τ) = [A{β(τ′)}]⁻¹Σ₁(τ′, τ) [A{β(τ)}]⁻¹ and Φ₂(τ′, τ) = [A{β(τ′)}]⁻¹Σ₂(τ′, τ) [A{β(τ)}]⁻¹ for τ, τ′ ∈ [τ_L, τ_U], respectively, where $Σ_{1} (τ^{'}, τ) = E {ξ_{1, i} (τ^{'}) ξ_{1, i}^{T} (τ)}$ and $Σ_{2} (τ^{'}, τ) = E {ξ_{2, i} (τ^{'}) ξ_{2, i}^{T} (τ)}$ ;
the AIPW estimator β̂_A(τ) is more efficient than the IPW estimator β̂_I(τ), with Σ₁(τ′, τ) ≥ Σ₂(τ′, τ).

4 Estimation of the covariance matrices

In quantile regression, the estimating functions are not smooth and the asymptotic covariances for the estimators of the regression coefficients involve a subdensity function, which poses difficulties for the estimation of the covariances. Huang (2002) proposed a novel variance estimation procedure for a calibration regression model using the local functional linearity of the estimating functions. Peng and Fine (2009) generalized this technique to the competing risks setting. Our estimators of the asymptotic covariances are constructed following the exposition of Peng and Fine (2009).

First we derive an estimator for Σ₁(τ, τ). It is shown in the Appendix that

E [ξ_{1, i} (τ^{'}) {ξ_{1, i} (τ)}^{T}] = E {a_{1, i} (τ^{'}) a_{1, i}^{T} (τ)} - E {b_{i} (τ^{'}) b_{i}^{T} (τ)} - w_{2} {β (τ^{'})} I_{ψ}^{- 1} {[w_{2} {β (τ)}]}^{T} .

(4.1)

Let (Î_ψ)⁻¹ be the estimator of the variance of ψ̂ and let ${\hat{w}}_{2} {{\hat{β}}_{I} (τ)} = - n^{- 1} \sum_{i = 1}^{n} Z_{i} {π_{ψ}^{'} (Q_{i}, \hat{ψ})}^{T} {\hat{π} (Q_{i})}^{- 1} I [X_{i} \leq g {Z_{i}^{T} {\hat{β}}_{I} (τ)}] {\hat{ϑ}}_{1, i}$ . Based on (4.1), Σ₁(τ, τ) can be consistently estimated by

\begin{array}{l} {\hat{Σ}}_{1} (τ, τ) = n^{- 1} \sum_{i = 1}^{n} Z_{i}^{\otimes 2} {({\hat{ϑ}}_{1, i} I [X_{i} \leq g {Z_{i}^{T} {\hat{β}}_{I} (τ)}] - τ)}^{2} \\ - n^{- 1} \sum_{i = 1}^{n} (1 - δ_{i}) {(\frac{\sum_{j = 1}^{n} Z_{j} I [X_{i} \leq X_{j} \leq g {Z_{j}^{T} {\hat{β}}_{I} (τ)}] {\hat{ϑ}}_{1, j}}{\sum_{j = 1}^{n} I (X_{j} \geq X_{i})})}^{\otimes 2} \\ - {\hat{w}}_{2} {{\hat{β}}_{I} (τ)} {({\hat{I}}_{ψ})}^{- 1} {[{\hat{w}}_{2} {{\hat{β}}_{I} (τ)}]}^{T} . \end{array}

(4.2)

Next, since ξ_{2,i}(τ) = a_{2,i}(τ) + b_i(τ), with similar arguments to the proof of (4.1) we obtain

E [ξ_{2, i} (τ^{'}) {ξ_{2, i} (τ)}^{T}] = E {a_{2, i} (τ^{'}) a_{2, i}^{T} (τ)} - E [b_{i} (τ^{'}) b_{i}^{T} (τ)] .

(4.3)

Thus Σ₂(τ, τ) can be consistently estimated by

{\hat{Σ}}_{2} (τ, τ) = n^{- 1} \sum_{i = 1}^{n} Z_{i}^{\otimes 2} {({\hat{ϑ}}_{2, i} I [X_{i} \leq g {Z_{i}^{T} {\hat{β}}_{I} (τ)}] - τ)}^{2} - n^{- 1} \sum_{i = 1}^{n} (1 - δ_{i}) {(\frac{\sum_{j = 1}^{n} Z_{j} I [X_{i} \leq X_{j} \leq g {Z_{j}^{T} {\hat{β}}_{I} (τ)}] {\hat{ϑ}}_{2, j}}{\sum_{j = 1}^{n} I (X_{j} \geq X_{i})})}^{\otimes 2} .

(4.4)

The estimation of the covariance Φ₁(τ′, τ) of n^{1/2}{β̂_I(τ′) − β(τ′)} and n^{1/2}{β̂_I(τ) − β(τ)} is outlined as follows.

1. Find a symmetric and nonsingular (p+1)×(p+1) matrix E_n(τ) ≡ {e_{n,1}(τ), …, e_{n,p+1}(τ)} such that Σ̂₁(τ, τ) = {E_n(τ)}².
2. Calculate D_n(τ) = ([S_{1,n}{e_{n,1}(τ), τ}]⁻¹ − β̂_I(τ), …, [S_{1,n}{e_{n,p+1}(τ), τ}]⁻¹ − β̂_I(τ)), where {S_{1,n}(e, τ)}⁻¹ is the solution to S_{1,n}(b, τ) − e = 0. Similar to (2.6) in Section 2.2, S_{1,n}(b, τ) − e = 0 can be solved by minimizing
$U_{3, n} (b, τ) = \sum_{i = 1}^{n} {\hat{ϑ}}_{1, i} | g^{- 1} (X_{i}) - Z_{i}^{T} b | + | M + \sum_{i = 1}^{n} {\hat{ϑ}}_{1, i} Z_{i}^{T} b | + | M - \sum_{i = 1}^{n} 2 τ Z_{i}^{T} b | + | M - 2 n^{1 / 2} e^{T} b | .$
3. Estimate Φ₁(τ′, τ) by Φ̂₁(τ′, τ) = nD_n(τ′){E_n(τ′)}⁻¹ Σ̂₁(τ′, τ){E_n(τ)}⁻¹ {D_n(τ)}^T. In the special case of τ′ = τ, Φ̂₁(τ, τ) = n{D_n(τ)}^⊗2.
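The three steps can be traced in a scalar toy case. The sketch below assumes an intercept-only model with the identity link, so that E_n(τ) is simply the square root of Σ̂₁(τ, τ) and D_n(τ) is a single number; the names `solve_shifted` and `variance_scalar` are hypothetical, and the weights and Σ̂₁(τ, τ) are taken as given rather than computed from (4.2).

```python
import math

def solve_shifted(x, weights, tau, e):
    """Smallest data point b with n^{-1/2} * sum_i [w_i * I(x_i <= b) - tau] >= e."""
    n = len(x)
    target = n * tau + math.sqrt(n) * e
    running = 0.0
    for xi, wi in sorted(zip(x, weights)):
        running += wi
        if running >= target:
            return xi
    return max(x)  # target not reachable within the data range

def variance_scalar(x, weights, tau, sigma_hat):
    """Scalar version of the three steps: returns n * D_n(tau)^2."""
    n = len(x)
    beta_hat = solve_shifted(x, weights, tau, 0.0)        # root of the estimating function
    e_n = math.sqrt(sigma_hat)                            # step 1: E_n^2 = Sigma_hat
    d_n = solve_shifted(x, weights, tau, e_n) - beta_hat  # step 2: shifted root minus estimate
    return n * d_n ** 2                                   # step 3 with tau' = tau
```

For instance, with the 100 equally spaced points 1, …, 100, unit weights, τ = 0.5 and Σ̂₁ = τ(1 − τ) = 0.25, the sketch returns 2500, which agrees with the usual sample-quantile variance τ(1 − τ)/f² for a density f = 0.01.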

The estimation of the covariance Φ₂(τ′, τ) of n^{1/2}{β̂_A(τ′) − β(τ′)} and n^{1/2}{β̂_A(τ) − β(τ)} follows the same procedure as above by replacing Σ̂₁(τ′, τ) with Σ̂₂(τ′, τ) and S_{1,n}(e, τ) with S_{2,n}(e, τ). The proof of the consistency of the variance estimators is similar to that in Peng and Fine (2009), and thus is omitted.

5 Simulation study

5.1 Assessment of estimation under correctly specified models

The simulation study examines finite-sample performance of the IPW estimator and the AIPW estimator, along with the omniscient estimator (Omni) that assumes complete knowledge of J_i for uncensored failure times, and the complete-case estimator (CC) that deletes observations with missing causes. The Omni and CC estimators are computed via Peng and Fine’s (2009) method.

Let Z_i = (1, Z_i,₁, Z_i,₂), where Z_i,₁ is a uniform random variable on (0, 1) and Z_i,₂ is Bernoulli with probability of success equal to 0.5. The failure type J_i takes values of 1 and 2 with P(J_i = 1|Z_i) = p₀I(Z_i,₂ = 0) + p₁I(Z_i,₂ = 1). The failure time T_i follows the conditional distributions P(T_i < t|J_i = 1, Z_i) = Φ(log t − γ^TZ_i) and P(T_i < t|J_i = 2, Z_i) = Φ(log t − α^TZ_i), where Φ(·) denotes the cumulative distribution function of N(0, 1), γ = (γ₀, γ₁, γ₂), and α = (α₀, α₁, α₂). With this set-up, the underlying τth conditional quantile of T_i is

\begin{array}{l} log F_{1}^{- 1} (τ ∣ Z_{i}) = inf {t : P (log T_{i} < t, J_{i} = 1 ∣ Z_{i}) \geq τ} \\ = β_{0} (τ) + β_{1} (τ) Z_{i, 1} + β_{2} (τ) Z_{i, 2}, \end{array}

(5.1)

where β₀(τ) = γ₀ + Φ⁻¹ (τ/p₀), β₁(τ) = γ₁, and β₂(τ) = γ₂ + Φ⁻¹(τ/p₁) − Φ⁻¹(τ/p₀). The covariate Z_i,₂ has a varying effect on the cumulative incidence quantiles across different quantile levels, whereas Z_i,₁ has a constant effect.
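The closed-form coefficients can be evaluated directly. The sketch below uses only the Python standard library; the function name `true_beta` is hypothetical, and it requires τ < min(p₀, p₁) so that the normal quantiles are defined.

```python
from statistics import NormalDist

def true_beta(tau, p0, p1, gamma):
    """True coefficients (beta_0, beta_1, beta_2) at quantile level tau under (5.1)."""
    q = NormalDist().inv_cdf  # standard normal quantile function
    beta0 = gamma[0] + q(tau / p0)
    beta1 = gamma[1]
    beta2 = gamma[2] + q(tau / p1) - q(tau / p0)
    return beta0, beta1, beta2
```

With p₀ = 0.8, p₁ = 0.6 and γ = (0, 0.5, −0.5), calling `true_beta(0.2, 0.8, 0.6, (0.0, 0.5, -0.5))` gives approximately (−0.674, 0.5, −0.256), while τ = 0.4 gives approximately (0, 0.5, −0.069), which illustrates how the effect of Z_i,₂ changes with τ.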

Let the censoring time C_i follow a uniform distribution on (0, 8). We generated the missing failure type indicator R_i from the logistic model: $r (W_{i}) = P (R_{i} = 1 ∣ δ_{i} = 1, W_{i}) = exp (W_{i}^{T} ψ) / {1 + exp (W_{i}^{T} ψ)}$ , where W_i = (1, Z_i,₁, Z_i,₂, X_i, A_i)^T, X_i = min(T_i, C_i), and A_i is a univariate auxiliary variable. The values ψ = (1, −0.9, −1, 2, 0)^T and ψ = (1, −1.4, −1.5, 1, 0)^T correspond to 20% and 40% missing failure types, respectively. Here we chose not to include A_i in the missingness model so that we could compare the IPW and AIPW estimators under different degrees of association between A_i and J_i, holding the degree of missingness fixed at the same rate. This set-up suggests that a stronger association between A_i and J_i yields a more efficient AIPW estimator under the same level of missingness.

We consider three different levels of association between A_i and J_i, which correspond to three different choices of ρ(W_i) for the AIPW estimator. In Case 1, the auxiliary variable A_i is independent of failure type J_i given Z_i. For Cases 2 and 3, we let

\begin{matrix} P (A_{i} = 1 ∣ J_{i} = 1, Z_{i}) = θ, & P (A_{i} = 2 ∣ J_{i} = 1, Z_{i}) = 1 - θ, \\ P (A_{i} = 2 ∣ J_{i} = 2, Z_{i}) = θ, & P (A_{i} = 1 ∣ J_{i} = 2, Z_{i}) = 1 - θ, \end{matrix}

where 0 ≤ θ ≤ 1. Case 2 corresponds to θ = 0.8 and Case 3 corresponds to θ = 0.95. A larger value of θ indicates stronger positive association between A_i and J_i given Z_i. This model set-up results in a logistic regression model for ρ(W_i) with logit{ρ(W_i)} = φ₀ + φ₁Z_i,₁ + φ₂Z_i,₂ + φ₃A_i. For Case 1, φ₁ = φ₃ = 0, φ₀ = log{p₀/(1 − p₀)} and φ₂ = log [p₁(1 − p₀)/{p₀(1 − p₁)}]. For Cases 2 and 3, φ₀ = 3 log{θ/(1 − θ)}, φ₁ = 0, φ₂ = log[p₁(1 − p₀)/{p₀(1 − p₁)}] and φ₃ = 2 log{(1 − θ)/θ}.

We set p₀ = 0.8, p₁ = 0.6, γ = (0, 0.5, −0.5)^T, and α = (0, 0, −0.5)^T. Under this setting, on average 55% of the subjects fail from type-1 failure, 25% fail from type-2 failure, and the remaining 20% are right-censored. The performances of the four estimators, Omni, CC, IPW and AIPW, for β(τ) at τ = 0.2 and 0.4 with sample sizes n = 200 and n = 500 and two missing-causes percentages are summarized in Tables 1–4. The tables report the bias, empirical standard deviation, mean estimated standard deviation, and empirical coverage probability of 95% Wald-type confidence intervals based on 500 simulated data sets.
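The data-generating design of this section can be sketched with the standard library alone. The function name `simulate` and the record layout are hypothetical, and for Case 1 the sketch draws A_i as a fair coin on {1, 2}, which is an assumption: the text only states that A_i is independent of J_i given Z_i.

```python
import math
import random

def simulate(n, psi, theta=None, seed=0,
             p0=0.8, p1=0.6, gamma=(0.0, 0.5, -0.5), alpha=(0.0, 0.0, -0.5)):
    """Draw n records (X, delta, J, R, Z1, Z2, A); theta=None means Case 1.

    J is returned for every record, but an analysis may use it only when
    delta = 1 and R = 1.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1, z2 = rng.random(), int(rng.random() < 0.5)
        j = 1 if rng.random() < (p1 if z2 else p0) else 2
        coef = gamma if j == 1 else alpha
        t = math.exp(coef[0] + coef[1] * z1 + coef[2] * z2 + rng.gauss(0.0, 1.0))
        c = rng.uniform(0.0, 8.0)
        x, delta = min(t, c), int(t <= c)
        if theta is None:
            a = rng.choice((1, 2))                   # assumed Case 1 mechanism
        else:
            a = j if rng.random() < theta else 3 - j  # A agrees with J w.p. theta
        lin = psi[0] + psi[1] * z1 + psi[2] * z2 + psi[3] * x + psi[4] * a
        r_prob = 1.0 / (1.0 + math.exp(-lin))
        r = 1 if delta == 0 or rng.random() < r_prob else 0
        rows.append((x, delta, j, r, z1, z2, a))
    return rows
```

Calling `simulate(500, (1, -0.9, -1, 2, 0), theta=0.8)` would produce one data set for Case 2 under the first of the two missingness settings.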

Table 1.

The average bias (Bias), empirical standard deviation (EmpSD), mean estimated standard deviation (EstSD), and empirical coverage probability (CovP) of 95% confidence intervals at τ = 0.2, based on 500 simulated data sets with 20% missing causes.

Method	Bias×10³			EmpSD×10³			EstSD×10³			CovP×10²
	β₀	β₁	β₂	β₀	β₁	β₂	β₀	β₁	β₂	β₀	β₁	β₂
	n = 200
Omni	5	−24	10	274	438	245	304	486	277	92.4	92.0	92.6
CC	68	71	184	288	477	265	301	490	296	91.0	90.8	87.8
IPW	−9	−5	21	337	575	307	341	574	334	92.0	90.0	93.6
AIPW (Case 1)	−8	0	24	316	531	281	334	547	314	91.0	90.4	92.6
AIPW (Case 2)	−10	−2	22	297	497	269	308	521	297	92.2	93.0	94.6
AIPW (Case 3)	−1	−21	17	284	461	246	303	499	280	91.6	91.4	94.2

	n = 500
Omni	7	−7	2	173	278	156	170	293	166	91.8	93.4	93.6
CC	83	64	163	184	297	165	178	293	174	87.0	91.2	79.8
IPW	−2	8	−5	196	338	192	207	347	203	91.6	91.2	92.0
AIPW (Case 1)	4	0	1	188	314	182	198	333	189	92.6	93.2	91.2
AIPW (Case 2)	2	3	2	181	304	171	194	314	182	93.8	93.2	93.4
AIPW (Case 3)	4	1	5	178	285	162	178	290	171	91.0	92.0	93.0


Table 4.

The average bias (Bias), empirical standard deviation (EmpSD), mean estimated standard deviation (EstSD), and empirical coverage probability (CovP) of 95% confidence intervals at τ = 0.4, based on 500 simulated data sets with 40% missing causes.

Method	Bias×10³			EmpSD×10³			EstSD×10³			CovP×10²
	β₀	β₁	β₂	β₀	β₁	β₂	β₀	β₁	β₂	β₀	β₁	β₂
	n = 200
Omni	6	15	8	307	514	308	343	566	338	92.4	93.2	93.6
CC	43	211	395	347	612	432	388	671	492	91.6	90.6	84.4
IPW	−13	57	49	467	840	543	489	866	593	93.8	94.4	93.4
AIPW (Case 1)	22	−4	52	420	766	505	438	786	508	94.6	95.0	93.0
AIPW (Case 2)	15	−4	24	394	667	416	402	676	435	93.0	93.4	92.8
AIPW (Case 3)	−1	11	19	338	539	364	341	594	354	92.0	95.4	91.8

	n = 500
Omni	11	−20	18	199	323	193	196	326	199	91.8	93.0	94.2
CC	53	174	389	212	367	264	227	382	274	91.4	90.8	72.8
IPW	0	14	26	267	494	310	276	494	322	92.2	92.0	93.8
AIPW (Case 1)	3	−4	42	245	443	284	254	450	293	94.4	95.6	93.8
AIPW (Case 2)	11	−21	33	234	410	257	237	413	269	92.6	93.8	95.4
AIPW (Case 3)	8	−19	27	207	352	212	212	355	218	92.2	94.2	95.0

Because the choice of ρ(W_i) does not affect the IPW estimator, only the AIPW results are reported separately for Cases 1–3. The CC estimator had substantial bias in all scenarios. The IPW and AIPW estimators performed comparably to the Omni estimator, with very small biases. In addition, the estimated standard deviations matched the empirical ones very well, and the 95% confidence intervals had reasonable coverage probabilities, except for the CC estimator.
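When reading the coverage probabilities, it helps to keep the Monte Carlo error in mind. The following short calculation is a standard binomial approximation, not something taken from the paper: with 500 replicates and a nominal level of 95%, the standard error of an empirical coverage proportion is roughly one percentage point.

```python
from math import sqrt

n_rep = 500      # number of simulated data sets, as in the paper
nominal = 0.95   # nominal coverage level

# Binomial standard error of an empirical coverage proportion
mc_se = sqrt(nominal * (1 - nominal) / n_rep)
print(f"Monte Carlo SE of coverage: {100 * mc_se:.2f} percentage points")
```

Under this approximation, the CC coverages of 79.8% and 72.8% for β₂ at n = 500 lie far outside what Monte Carlo error alone could explain.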

For the analysis of the Mashi data presented in the next section, small values of τ (0.005, 0.01, and 0.02) were considered because of the small percentages of HIV-related and HIV-unrelated deaths. Furthermore, the Mashi analysis had a larger sample size. To mimic Mashi, additional simulations at τ = 0.01 with n = 1200 were conducted. The results, reported in Table S.1 of the Supplementary Material, show that the biases of the AIPW estimator remain small with 20% and 40% missing causes. The biases of the IPW estimator are also small with 20% missing causes. With 40% missing causes, the biases of the IPW estimator are large compared to those of the AIPW estimator, but for the slope coefficients they are still smaller than those of the CC estimator.

Table 5 shows the Pitman relative efficiencies (ratios of variances) for the IPW and AIPW estimators with respect to the Omni estimator. By incorporating information from the missing failure types, AIPW improved efficiency over IPW, with greater improvement when there was a stronger association between the auxiliary variable A_i and J_i. For Case 3 with n = 500, the efficiencies of AIPW were comparable to those of the Omni estimator.
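As an illustration of how a variance-ratio efficiency is formed, the sketch below recomputes relative efficiencies from the empirical standard deviations in Table 1 (n = 200, τ = 0.2, 20% missing causes). It assumes that the efficiency is the squared ratio of the Omni EmpSD to the EmpSD of the estimator in question; the helper name `relative_efficiency` is hypothetical.

```python
def relative_efficiency(sd_reference, sd_estimator):
    """Variance ratio Var(reference) / Var(estimator)."""
    return (sd_reference / sd_estimator) ** 2


# EmpSD x 10^3 for (β₀, β₁, β₂) from Table 1, n = 200
omni = [274, 438, 245]
ipw = [337, 575, 307]
aipw_case3 = [284, 461, 246]

re_ipw = [round(relative_efficiency(o, e), 2) for o, e in zip(omni, ipw)]
re_aipw3 = [round(relative_efficiency(o, e), 2) for o, e in zip(omni, aipw_case3)]
print(re_ipw, re_aipw3)
```

Under that assumption, the rounded values agree with the first row of Table 5 for IPW and AIPW (Case 3).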

Table 5.

Pitman relative efficiencies of the IPW and AIPW estimators with respect to the Omni estimator based on 500 simulated data sets. The maximum standard error of the relative efficiencies is 0.07. MP stands for the missingness proportion of failure causes.

n	IPW			AIPW (Case 1)			AIPW (Case 2)			AIPW (Case 3)
	β₀	β₁	β₂	β₀	β₁	β₂	β₀	β₁	β₂	β₀	β₁	β₂
	MP=20%, τ = 0.2

200	0.66	0.58	0.64	0.75	0.68	0.76	0.85	0.78	0.83	0.93	0.90	0.99
500	0.78	0.68	0.66	0.85	0.79	0.74	0.91	0.84	0.84	0.95	0.95	0.93

	MP=20%, τ = 0.4
200	0.74	0.67	0.75	0.81	0.74	0.76	0.86	0.81	0.78	0.95	0.93	0.92
500	0.83	0.79	0.79	0.87	0.84	0.76	0.91	0.88	0.86	0.94	0.93	0.96

	MP=40%, τ = 0.2
200	0.41	0.33	0.32	0.57	0.49	0.44	0.65	0.58	0.55	0.92	0.84	0.85
500	0.46	0.36	0.37	0.62	0.49	0.49	0.70	0.58	0.60	0.97	0.90	0.85

	MP=40%, τ = 0.4
200	0.43	0.37	0.32	0.53	0.45	0.37	0.61	0.59	0.55	0.83	0.91	0.72
500	0.56	0.43	0.39	0.66	0.53	0.46	0.72	0.62	0.56	0.92	0.84	0.82

5.2 On robustness of estimation

To assess how sensitive the proposed methods are to misspecification of the models for r(W_i) and/or ρ(W_i), and to violations of the missing at random assumption, we consider four additional cases, Cases 4–7. In Case 4, instead of a logistic model, we generated the missing failure type indicator R_i from the probit model $r(W_{i}) = Φ(W_{i}^{T} ψ)$, where Φ is the standard normal distribution function, ψ = (1, −0.9, −1.4, 2, 0)^T, and $W_i = (1, Z_{i,1}, Z_{i,2}, X_i, A_i)^T$; both the IPW and AIPW estimators nevertheless use logistic regression to estimate r(W_i), with X_i excluded from W_i. Case 5 has the same design as Case 2, and Case 6 has the same design as Case 4. In both Cases 5 and 6, ρ(W_i) is estimated by a logistic regression that excludes the important variable A_i. Therefore, r(W_i) is misspecified in Case 4, ρ(W_i) is misspecified in Case 5, and both models are misspecified in Case 6. In Case 7, we generated the missing failure type indicator R_i from the logistic model $P(R_{i} = 1 ∣ δ_{i} = 1, J_{i}, W_{i}) = \exp(W_{i}^{T} ψ - J_{i}) / \{1 + \exp(W_{i}^{T} ψ - J_{i})\}$, where $W_i = (1, Z_{i,1}, Z_{i,2}, X_i, A_i)^T$ and ψ = (2.5, −0.9, −1, 2, 0)^T. Since the probability of missingness depends on the unobserved failure type J_i, the missing at random assumption is violated in Case 7. In all four cases, the missing-cause proportion is 20%.
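The two data-generating mechanisms for R_i in Cases 4 and 7 can be sketched as follows. The ψ vectors are those stated above; the function names and the example covariate vector are hypothetical and serve only to show how the probability of an observed cause is evaluated.

```python
from math import erf, exp, sqrt


def norm_cdf(x):
    """Standard normal distribution function Φ(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def expit(x):
    """Logistic function exp(x) / {1 + exp(x)}."""
    return 1.0 / (1.0 + exp(-x))


def linear_predictor(w, psi):
    """Inner product Wᵀψ."""
    return sum(wk * pk for wk, pk in zip(w, psi))


def prob_observed_case4(w):
    """Case 4: probit model, P(R = 1 | W) = Φ(Wᵀψ)."""
    psi = (1.0, -0.9, -1.4, 2.0, 0.0)
    return norm_cdf(linear_predictor(w, psi))


def prob_observed_case7(w, j):
    """Case 7: logistic model that also depends on the failure type J."""
    psi = (2.5, -0.9, -1.0, 2.0, 0.0)
    return expit(linear_predictor(w, psi) - j)


# Hypothetical subject with W = (1, Z1, Z2, X, A)
w = (1.0, 1.0, 0.5, 0.3, 1.0)
p4 = prob_observed_case4(w)
p7_type1 = prob_observed_case7(w, j=1)
p7_type2 = prob_observed_case7(w, j=2)
```

In Case 7 the probability of observing the cause is lower for J = 2 than for J = 1 at the same W, which is exactly the dependence on the unobserved failure type that violates the missing at random assumption.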

Table 6 reports the bias, empirical standard deviation, mean estimated standard deviation, and empirical coverage probability of 95% Wald-type confidence intervals for the Omni, CC, IPW, and AIPW estimators based on 500 simulated data sets for Cases 4–7 at τ = 0.2 and with n = 500. The summaries for different τ values are presented in Figures S.1–6 of the Supplementary Material. When r(W_i) was misspecified, the IPW estimator performed similarly to the CC estimator, both having large biases for estimating β₂(τ). As expected from its double robustness property, the AIPW estimator performed well in Cases 4–5, where one of the two models for r(W_i) and ρ(W_i) was misspecified. Since the IPW estimator does not use ρ(W_i), there is no misspecification for the IPW estimator under Case 5. When both models were misspecified in Case 6, the AIPW estimator had slightly larger biases than in Case 4, but still outperformed the CC estimator, in particular for β₁(τ) and β₂(τ). Since both the IPW and AIPW estimators are developed under the MAR assumption, it is no surprise that they showed no improvement over the CC estimator in Case 7.

Table 6.

Method robustness. The average bias (Bias), empirical standard deviation (EmpSD), mean estimated standard deviation (EstSD), and empirical coverage probability (CovP) of 95% confidence intervals at τ = 0.2 with n = 500, based on 500 simulated data sets with 20% missing causes.

Method	Bias×10³			EmpSD×10³			EstSD×10³			CovP×10²
	β₀	β₁	β₂	β₀	β₁	β₂	β₀	β₁	β₂	β₀	β₁	β₂
	Case 4: r(w) is misspecified
Omni	4	1	3	169	275	161	177	291	166	92.8	92.0	93.8
CC	10	92	333	166	274	157	169	281	162	92.8	91.2	50.0
IPW	18	73	294	171	278	159	170	286	160	92.2	91.2	54.2
AIPW	26	−51	−34	173	284	160	176	281	166	92.4	90.6	93.6

	Case 5: ρ(w) is misspecified
Omni	7	−7	2	173	278	156	170	293	166	91.8	93.4	93.6
CC	83	64	163	184	297	165	178	293	174	87.0	91.2	79.8
IPW	−2	8	−5	196	338	192	207	347	203	91.6	91.2	92.0
AIPW	6	−5	−3	188	315	179	193	334	193	91.6	93.0	93.4

	Case 6: both r(w) and ρ(w) are misspecified
Omni	4	1	3	169	275	161	177	291	166	92.8	92.0	93.8
CC	10	92	333	166	274	157	169	281	162	92.8	91.2	50.0
IPW	18	73	294	171	278	159	170	286	160	92.2	91.2	54.2
AIPW	35	−72	−46	173	287	162	175	285	167	92.8	91.6	92.6

	Case 7: missing-at-random assumption is violated
Omni	7	−7	2	173	278	156	170	293	166	91.8	93.4	93.6
CC	45	33	58	179	293	162	179	297	171	91.8	93.6	91.4
IPW	−5	−41	−95	191	328	179	196	331	188	91.0	90.4	89.6
AIPW	−7	−42	−85	180	300	167	184	295	170	92.0	91.2	89.4

6 Analysis of the Mashi data

The Mashi trial investigated the effect of formula- versus breast-feeding plus extended infant zidovudine prophylaxis among HIV-infected expecting mothers in Botswana (Thior et al. (2006)). Five hundred ninety-one women were randomized to formula feeding from birth plus 1 month of infant zidovudine (FF), and 588 women were randomized to breast-feeding from birth plus 6 months of infant zidovudine (BF+AZT). Live first-born infants were followed for 18 months for occurrence of the two primary endpoints, HIV infection and death. HIV-PCR tests were administered at birth and at ages 1, 2, 3, 4, 5, 6, 7, 9, 12, and 18 months (with little missing data). The primary objectives assessed the treatment effect on these endpoints separately, as well as on the composite endpoint defined as the first event of HIV infection or death. A secondary objective was to assess the treatment effect on death due to HIV infection, which we refer to as HIV-related death. We apply the methods above to assess $F_{1}^{-1}(τ ∣ z)$, with J = 1 denoting HIV-related death and J = 2 denoting HIV-unrelated death.

We take a death to be HIV-related if either (1) the study clinicians deemed the death HIV-1 related (n = 4 deaths), or (2) the infant had at least one positive test result from the PCR assay used to test for HIV infection prior to death (n = 24 deaths). In addition, we take a death to be HIV-unrelated if the study clinician deemed the death unrelated to HIV/AIDS (n = 22 deaths). Of the 111 live-born infants who died, the cause of death is known in 50 cases and missing for 61.

Considering 20 covariates of the babies or their mothers, we used logistic regression and all-subsets model selection (with Mallows' Cp as the criterion) to select a model for predicting, among the infants who died, whether J was observed. The model included the following variables (estimated regression coefficient): the infant had birthweight < 2.5 kilograms (1.21); the randomization assignment of mother/baby to receive Placebo/Placebo was switched to Placebo/Nevirapine part-way through the trial due to a DSMB recommendation (−1.27); the infant had AZT toxicity (1.43); log₁₀ plasma viral load of the mother at delivery (0.98); and the baby was hospitalized with a serious adverse event (−1.20). Using the same model selection strategy for the deaths with known cause, the following variables were included in the model for predicting J = 1: the infant received HAART (2.42), and log₁₀ plasma viral load of the mother at delivery (1.70).

For assessing the treatment effect of BF+AZT versus FF we used the identity link function. The covariate vector of interest is Z = (1, Z₁)^T, where Z₁ is 1 for mother-infant pairs assigned to BF+AZT and 0 for those assigned to FF. The estimation of the quantile is invariant to the link function in this particular case, but the estimated values of the coefficients β₀(τ) and β₁(τ) can differ across link functions. With the identity link, β₁(τ) represents the treatment effect on the τth type-1 quantile. Let X be the survival time in days. Following the above logistic regressions, we let W include the variables that proved informative for P(R = 1|δ = 1, W) and/or for the probability of HIV-related death P(J = 1|δ = 1, W). We considered the subset of data with complete covariate information, which includes 1123 live-born infants (of the 1193 total), among whom 107 died and 49 died with known cause of death (28 were HIV-related). Based on the data, about 2.5% of infants died while known to be HIV infected (J = 1), and 54.2% of the infants who died had a missing cause of death.

We performed the quantile regression at τ = 0.005, 0.01, 0.02 and 0.03. The analysis at τ = 0.005 is interesting because it concerns early death and there were many early deaths in the data set. Table 7 summarizes the analysis results using the IPW and AIPW methods. From Table 7, by the AIPW method, the p-values for testing the treatment effect at τ = (0.005, 0.01, 0.02, 0.03) were (0.138, 0.042, 0.062, 0.52), respectively. The results indicate that BF+AZT had some positive effect in postponing/reducing HIV-related deaths compared to FF at the quantiles corresponding to τ = 0.01 and 0.02. Using the AIPW method, the HIV-related death rate reached 1% by 184 days for those assigned to BF+AZT, while it reached 1% by 64 days for those assigned to FF. In addition, it reached 2% by 276 days for BF+AZT and by 113 days for FF. This analysis suggests that the BF+AZT group takes 120–163 days longer than the FF group to reach the same percentage of HIV-related deaths. The estimated treatment effect using the AIPW estimator decreased at τ = 0.03, and the standard error increased because of the small number of deaths after the 0.03 quantile. The estimated treatment effect was also small at τ = 0.005. The IPW method did not identify a significant treatment effect at any of the quantile levels evaluated. We attribute this to the limited number of deaths and the high percentage of unknown death causes among those who died; the AIPW method was able to recover some of the lost information by modeling the probability of HIV-related death under the missing at random assumption (2.3). The large differences between the IPW and AIPW estimates of the treatment effect at τ = 0.005 and 0.01 in Table 7 reflect the numerical instability of the IPW estimation.

Table 7.

Analysis of the Mashi data with the IPW and AIPW methods.

	IPW			AIPW			Rel. Efficiency IPW vs AIPW
	Coef Est.	S.E.	p-value	Coef Est.	S.E.	p-value	
τ = 0.005
Intercept	52.0	19.2	0.007	37.0	22.1	0.09	1.32
Treatment	38.0	61.7	0.54	102.0	68.8	0.14	1.24
τ = 0.01
Intercept	64.0	31.4	0.04	64.0	33.6	0.06	1.15
Treatment	66.0	108.9	0.54	120.0	59.1	0.04	0.29
τ = 0.02
Intercept	94.0	113.0	0.41	113.0	95.9	0.24	0.72
Treatment	182.0	107.3	0.09	163.0	87.5	0.06	0.66
τ = 0.03
Intercept	207.0	164.5	0.21	214.0	147.0	0.15	0.80
Treatment	91.0	186.7	0.63	84.0	132.1	0.52	0.50
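The quantities quoted from Table 7 can be cross-checked with a few lines of arithmetic. The sketch below assumes that the p-values are two-sided normal (Wald) p-values and that the relative efficiency column is the squared ratio of the AIPW to the IPW standard error; the helper name `wald_p_value` and the dictionary layout are illustrative choices, not the authors' code.

```python
from math import erfc, sqrt


def wald_p_value(estimate, std_error):
    """Two-sided p-value from a normal (Wald) test of H0: coefficient = 0."""
    z = abs(estimate / std_error)
    return erfc(z / sqrt(2.0))


# AIPW rows of Table 7: τ -> (intercept, treatment, S.E. of treatment)
aipw = {0.01: (64.0, 120.0, 59.1), 0.02: (113.0, 163.0, 87.5)}
# IPW standard errors of the treatment coefficient from Table 7
ipw_se_treatment = {0.01: 108.9, 0.02: 107.3}

for tau, (b0, b1, se1) in aipw.items():
    q_ff = b0        # τth quantile (days) under FF, Z₁ = 0
    q_bf = b0 + b1   # τth quantile (days) under BF+AZT, Z₁ = 1
    p = wald_p_value(b1, se1)
    rel_eff = (se1 / ipw_se_treatment[tau]) ** 2  # Var(AIPW) / Var(IPW)
    print(f"tau={tau}: FF {q_ff:.0f} d, BF+AZT {q_bf:.0f} d, "
          f"p={p:.3f}, rel. eff.={rel_eff:.2f}")
```

With the identity link, the intercept is the FF quantile and the sum of intercept and treatment coefficient is the BF+AZT quantile, which is how the 64 versus 184 days and 113 versus 276 days quoted in the text arise.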

The difference in the performances of the AIPW and IPW estimators for the Mashi data analysis is consistent with what we observed in the simulation study. That is, the AIPW estimator shows large efficiency gain over the IPW estimator when A_i and J_i are strongly correlated, and is still more efficient than the IPW estimator even when A_i and J_i are independent. We infer that both the efficiency of the AIPW method and the informativeness of the auxiliary variables for HIV-related death contributed to the efficiency gain.

We stress that the quantile regression based on the cumulative incidence function studies the “crude” effect on the time to HIV-related death in the presence of other competing risks, i.e., HIV-unrelated death. This analysis is directly interpretable and relevant. However, it should not be used to infer the “net effect”; this would require strong untestable assumptions and/or sensitivity analysis.

In conclusion, this analysis provides additional insights over the primary study results that showed that infants assigned to formula-feed (FF) had a higher rate of all-cause mortality by 7 months of age than infants assigned BF+AZT, but a lower rate of HIV infection (Thior et al. (2006)). Prior to the current analysis, a beneficial effect of either BF+AZT or FF on HIV-related death was plausible: for BF+AZT because breast-feeding decreases the general early death rate; for FF because, by decreasing the rate of early HIV infection, it reduces the number of infants that could potentially die from HIV. The analysis here supports that the beneficial effect of formula-feeding to reduce HIV infections is overwhelmed by the stronger deleterious effect of formula-feeding to increase early deaths in HIV-infected infants. These results support breast-feeding plus antiretroviral prophylaxis during the first several months of life for infants born to HIV-infected mothers in Botswana.

Supplementary Material

Table 2.

The average bias (Bias), empirical standard deviation (EmpSD), mean estimated standard deviation (EstSD), and empirical coverage probability (CovP) of 95% confidence intervals at τ = 0.4, based on 500 simulated data sets with 20% missing causes.

Method	Bias×10³			EmpSD×10³			EstSD×10³			CovP×10²
	β₀	β₁	β₂	β₀	β₁	β₂	β₀	β₁	β₂	β₀	β₁	β₂
	n = 200
Omni	6	15	8	307	514	308	343	566	338	92.4	93.2	93.6
CC	77	28	148	313	538	334	345	565	351	92.0	93.4	91.4
IPW	−7	32	25	358	630	356	374	639	398	90.6	92.8	94.0
AIPW (Case 1)	−4	28	34	341	599	354	372	640	397	92.2	94.0	93.4
AIPW (Case 2)	−3	22	30	332	572	349	351	587	370	92.6	92.6	92.8
AIPW (Case 3)	5	13	13	315	532	321	334	557	342	92.0	94.8	94.6

	n = 500
Omni	11	−20	18	199	323	193	196	326	199	91.8	93.0	94.2
CC	88	−13	141	198	322	193	196	337	210	89.0	93.2	89.0
IPW	4	−4	15	218	363	217	218	363	230	91.0	91.6	94.8
AIPW (Case 1)	8	−15	20	214	353	220	221	363	224	91.0	94.6	93.8
AIPW (Case 2)	6	−9	18	208	344	207	213	357	213	92.2	95.0	94.2
AIPW (Case 3)	10	−17	19	205	335	196	204	334	206	92.4	94.2	95.0

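Table 3.

The average bias (Bias), empirical standard deviation (EmpSD), mean estimated standard deviation (EstSD), and empirical coverage probability (CovP) of 95% confidence intervals at τ = 0.2, based on 500 simulated data sets with 40% missing causes.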

Method	Bias×10³			EmpSD×10³			EstSD×10³			CovP×10²
	β₀	β₁	β₂	β₀	β₁	β₂	β₀	β₁	β₂	β₀	β₁	β₂
	n = 200
Omni	5	−24	10	274	438	245	304	486	277	92.4	92.0	92.6
CC	42	226	346	316	551	360	362	606	370	92.2	91.0	79.0
IPW	−51	45	66	430	768	436	450	779	425	89.6	91.0	89.2
AIPW (Case 1)	2	−18	35	363	628	371	381	632	363	91.2	92.4	90.8
AIPW (Case 2)	2	−16	32	338	575	330	343	548	328	91.0	90.8	93.0
AIPW (Case 3)	3	−16	12	286	477	265	309	502	293	91.8	92.8	93.4

	n = 500
Omni	7	−7	2	173	278	156	170	293	166	91.8	93.4	93.6
CC	46	239	325	204	348	212	209	357	221	90.8	86.6	69.4
IPW	1	7	9	256	467	257	266	473	263	93.4	92.4	91.0
AIPW (Case 1)	−1	0	19	221	399	224	222	388	222	92.8	91.2	92.2
AIPW (Case 2)	6	−13	10	207	365	202	213	354	204	92.2	93.4	91.0
AIPW (Case 3)	5	−5	10	176	294	169	187	303	173	92.6	94.0	92.6

Acknowledgments

The authors thank the Mashi study team (led by Dr. Max Essex) and the Mashi study participants for the data. The authors also thank the Editor, an associate editor, and the referee for their thoughtful and constructive comments that have significantly improved the paper. This research was partially supported by NSF grants DMS-0604576 and DMS-0905777 (Yanqing Sun), DMS-0706963 and DMS-1007420 (Huixia J. Wang), and NIH grant 2 R37 AI054165-09 (Yanqing Sun and Peter Gilbert).

7 Appendix

The following regularity conditions are assumed in Sections 3 and 4.

C1
There exists ν > 0 such that P(C = ν) > 0 and P(C > ν) = 0.
C2
Z is uniformly bounded with sup_i ||Z_i|| < ∞.
C3
For 0 < τ_L ≤ τ_U < τ₀ = inf_z P (T ≤ ν, J = 1 | z), β(τ) is Lipschitz continuous for τ ∈ [τ_L, τ_U], and f₁(t | z) is bounded in t and z, where f₁(t | z) = ∂F₁(t | z)/∂t.
C4
For some ρ₀ > 0 and c₀ > 0, $\inf_{b \in \mathcal{B}(ρ_0)} \mathrm{eigmin}\, A(b) \geq c_0$, where $\mathcal{B}(ρ) = \{b \in ℝ^{p+1}: \inf_{τ \in [τ_L, τ_U]} ||b - β(τ)|| \leq ρ\}$, $A(b) = E[Z^{⊗2} f_1\{g(Z^T b) ∣ Z\}]$, eigmin A(b) is the minimum eigenvalue of A(b), and $u^{⊗2} = uu^T$.
C5
π(Q, ψ) and ρ(W, φ) are twice differentiable with respect to ψ and φ, respectively; π(Q, ψ) ≥ α > 0; $π_{ψ}^{'}(Q) = d π(Q, ψ) / d ψ$ is uniformly bounded; both ρ(W, φ) and $ρ_{φ}^{'}(W) = d ρ(W, φ) / d φ$ are uniformly bounded.

The first four conditions are similar to those of Peng and Fine (2009). The condition C5 requires that the probability of non-missingness be bounded away from zero, as well as other boundedness conditions that are needed to establish weak convergence of the empirical processes.

Proof of Theorem 3.1

Let $S_{n}(b, τ) = n^{-1/2} \sum_{i=1}^{n} Z_{i} [I\{g^{-1}(T_{i}) \leq Z_{i}^{T} b\} \hat{ϑ}_{i} - τ]$. The following proof of consistency holds for both estimators, which are the roots of S_n(b, τ) with $\hat{ϑ}_i = \hat{ϑ}_{1,i}$ for the IPW estimator and $\hat{ϑ}_i = \hat{ϑ}_{2,i}$ for the AIPW estimator. Let $ϑ_{1,i} = R_i δ_i I(J_i = 1) \{π(Q_i) G(T_i)\}^{-1}$ and $ϑ_{2,i} = R_i δ_i I(J_i = 1) \{π(Q_i) G(T_i)\}^{-1} + [1 - \{π(Q_i)\}^{-1} R_i] δ_i ρ(W_i) \{G(T_i)\}^{-1}$. We use $ϑ_i = ϑ_{1,i}$ for the IPW estimator and $ϑ_i = ϑ_{2,i}$ for the AIPW estimator. For brevity, sup_b and sup_τ denote the suprema taken over $b \in ℝ^{p+1}$ and τ ∈ [τ_L, τ_U], respectively.
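To make the weights and the estimating function concrete, the following minimal sketch evaluates ϑ₁, ϑ₂ and S_n(b, τ) for given inputs. It assumes the identity link (so that g⁻¹(T) = T) and treats π(Q_i), G(T_i) and ρ(W_i) as known numbers supplied by the user; the function names and the toy subjects are hypothetical and are not the authors' implementation.

```python
from math import sqrt


def ipw_weight(r, delta, j, pi, g):
    """ϑ₁ = R δ I(J = 1) / {π(Q) G(T)}; j may be None when the cause is missing."""
    if r == 1 and delta == 1 and j == 1:
        return 1.0 / (pi * g)
    return 0.0


def aipw_weight(r, delta, j, pi, g, rho):
    """ϑ₂ = ϑ₁ + {1 − R / π(Q)} δ ρ(W) / G(T)."""
    return ipw_weight(r, delta, j, pi, g) + (1.0 - r / pi) * delta * rho / g


def score(b, tau, z_rows, times, weights):
    """S_n(b, τ) under the identity link, returned as a list of components."""
    n = len(times)
    total = [0.0] * len(b)
    for z, t, w in zip(z_rows, times, weights):
        fitted = sum(zk * bk for zk, bk in zip(z, b))
        resid = (w if t <= fitted else 0.0) - tau  # I{T ≤ Zᵀb} ϑ − τ
        for k, zk in enumerate(z):
            total[k] += zk * resid
    return [s / sqrt(n) for s in total]


# Hypothetical subjects: (R, δ, J, π, G, ρ)
subjects = [
    (1, 1, 1, 0.8, 0.5, 0.6),     # type-1 failure with observed cause
    (0, 1, None, 0.8, 0.5, 0.6),  # failure with missing cause
    (1, 0, None, 0.8, 1.0, 0.6),  # censored
]
theta1 = [ipw_weight(r, d, j, pi, g) for r, d, j, pi, g, _ in subjects]
theta2 = [aipw_weight(*s) for s in subjects]
```

A subject with a missing cause contributes nothing to the IPW weight but contributes δρ(W)/G(T) to the AIPW weight, which is how the augmentation recovers information from incomplete cases.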

Let $S_{n}^{G}(b, τ) = n^{-1/2} \sum_{i=1}^{n} Z_{i} [I\{g^{-1}(T_{i}) \leq Z_{i}^{T} b\} ϑ_{i} - τ]$ and $μ(b, τ) = E(Z_{i} [F_{1}\{g(Z_{i}^{T} b) ∣ Z_{i}\} - τ])$. Under the missing at random assumption (2.3) and the conditional independence between (T_i, J_i) and C_i given Z_i, $E\{n^{-1/2} S_{n}^{G}(b, τ)\} = E[E\{n^{-1/2} S_{n}^{G}(b, τ) ∣ Q_{i}, J_{i}\}] = μ(b, τ)$.

By Conditions C1 and C5, for every r > 0, we have $\sup_{t < ν} |\hat{G}(t) - G(t)| = o_p(n^{-1/2+r})$, $|\hat{ψ} - ψ| = o_p(n^{-1/2+r})$, and $|\hat{φ} - φ| = o_p(n^{-1/2+r})$. This, coupled with C2 and C5, implies that $\sup_{τ, b} ||n^{-1/2} \{S_{n}(b, τ) - S_{n}^{G}(b, τ)\}|| = o_{p}(n^{-1/2+r})$. It follows from arguments similar to those of Peng and Fine (2009) that $\sup_{τ, b} ||n^{-1/2} S_{n}^{G}(b, τ) - μ(b, τ)|| = o_{p}(1)$, and thus $\sup_{τ, b} ||n^{-1/2} S_{n}(b, τ) - μ(b, τ)|| = o_{p}(1)$. This, together with μ{β(τ), τ} = 0, implies the uniform consistency of both $\hat{β}_I(τ)$ and $\hat{β}_A(τ)$ under C4.

Proof of Theorem 3.2

Let β̂ (τ) be the root of S_n(b, τ). First we show that S_n{β(τ), τ} converges weakly to a mean zero Gaussian process and derive its asymptotic covariance matrix. Note that

S_{n}\{β(τ), τ\} = n^{-1/2} \sum_{i=1}^{n} Z_{i} [I\{g^{-1}(T_{i}) \leq Z_{i}^{T} β(τ)\} ϑ_{i} - τ] + n^{-1/2} \sum_{i=1}^{n} Z_{i} I\{g^{-1}(T_{i}) \leq Z_{i}^{T} β(τ)\} (\hat{ϑ}_{i} - ϑ_{i}) .

(7.1)

The asymptotic approximation for (7.1) is obtained below for the IPW and AIPW estimators, respectively.

For the IPW estimator, $\hat{ϑ}_i$ and ϑ_i in (7.1) correspond to $\hat{ϑ}_{1,i}$ and $ϑ_{1,i}$, respectively, and $\hat{β}(τ) = \hat{β}_I(τ)$. Let $δ_{i}^{*} = R_{i} δ_{i} I(J_{i} = 1)$. We have

\hat{ϑ}_{1,i} - ϑ_{1,i} = - \left\{ \frac{\hat{G}(X_{i}) - G(X_{i})}{\hat{G}(X_{i}) G(X_{i}) π(Q_{i})} + \frac{\hat{π}(Q_{i}) - π(Q_{i})}{\hat{π}(Q_{i}) π(Q_{i}) \hat{G}(X_{i})} \right\} δ_{i}^{*} .

(7.2)

From Pepe (1991),

\sup_{t \in [0, ν)} \left\| n^{1/2} \{\hat{G}(t) - G(t)\} + n^{-1/2} G(t) \sum_{j=1}^{n} \int_{0}^{t} y^{-1}(s) \, dM_{j}^{G}(s) \right\| \overset{P}{\to} 0,

(7.3)

where $M_{j}^{G} (\cdot)$ and y(·) are defined in Section 3 just before Theorem 3.1.

By (3.1),

n^{1/2} \{\hat{π}(Q_{i}) - π(Q_{i})\} = n^{-1/2} \{π_{ψ}^{'}(Q_{i}, ψ)\}^{T} \sum_{j=1}^{n} η_{j} + o_{p}(1) .

(7.4)

By (7.2), (7.3) and (7.4), the second term of (7.1) is

n^{-1/2} \sum_{i=1}^{n} Z_{i} I\{g^{-1}(T_{i}) \leq Z_{i}^{T} β(τ)\} δ_{i}^{*} \left[ \{\hat{G}(X_{i}) π(Q_{i})\}^{-1} n^{-1} \sum_{j=1}^{n} \int_{0}^{X_{i}} y^{-1}(s) \, dM_{j}^{G}(s) - \{\hat{π}(Q_{i}) π(Q_{i}) \hat{G}(X_{i})\}^{-1} n^{-1} \{π_{ψ}^{'}(Q_{i}, ψ)\}^{T} \sum_{j=1}^{n} η_{j} \right] + o_{p}(1) .

Writing $\int_{0}^{X_{i}} y^{- 1} (s) {d M}_{j}^{G} (s) = \int_{0}^{\infty} Y_{i} (s) y^{- 1} (s) {d M}_{j}^{G} (s)$ and changing the order of the summations, the above is

\begin{array}{l} n^{-1/2} \sum_{j=1}^{n} \int_{0}^{\infty} \left( n^{-1} \sum_{i=1}^{n} Z_{i} I\{g^{-1}(T_{i}) \leq Z_{i}^{T} β(τ)\} δ_{i}^{*} \{\hat{G}(X_{i}) π(Q_{i})\}^{-1} Y_{i}(s) y^{-1}(s) \right) dM_{j}^{G}(s) \\ \quad - n^{-1/2} \sum_{j=1}^{n} \left( n^{-1} \sum_{i=1}^{n} Z_{i} I\{g^{-1}(T_{i}) \leq Z_{i}^{T} β(τ)\} δ_{i}^{*} \{\hat{π}(Q_{i}) π(Q_{i}) \hat{G}(X_{i})\}^{-1} \{π_{ψ}^{'}(Q_{i}, ψ)\}^{T} \right) η_{j} . \end{array}

(7.5)

Let $\mathcal{F} = \{Z_{i} I\{X_{i} \leq g(Z_{i}^{T} b)\} δ_{i}^{*} \{π(Q_{i}) G(X_{i})\}^{-1} Y_{i}(t): b \in ℝ^{p+1}, t \in [0, ν)\}$. The function class $\mathcal{F}$ is Donsker, and thus Glivenko-Cantelli (van der Vaart and Wellner (1996)), because the class of indicator functions is Donsker and Z_i, $\{π(Q_i) G(X_i)\}^{-1}$, and $δ_{i}^{*}$ are uniformly bounded. It follows from the Glivenko-Cantelli Theorem that $n^{-1} \sum_{i=1}^{n} Z_{i} Y_{i}(t) \{π(Q_{i}) G(X_{i})\}^{-1} I\{X_{i} \leq g(Z_{i}^{T} b)\} δ_{i}^{*} y^{-1}(t) \overset{P}{\to} E[Z_{i} I\{X_{i} \leq g(Z_{i}^{T} b)\} δ_{i}^{*} \{π(Q_{i}) G(X_{i})\}^{-1} Y_{i}(t)] y^{-1}(t)$, uniformly in both $b \in ℝ^{p+1}$ and t ∈ [0, ν). The limit is w₁(b, t), defined in Section 3 just before Theorem 3.1, under MAR and the independent censoring assumption. Since $\hat{G}(X_i) = G(X_i) + O_p(n^{-1/2})$ and $\hat{π}(Q_i) = π(Q_i) + O_p(n^{-1/2})$ uniformly in i ∈ {1, …, n}, $n^{-1} \sum_{i=1}^{n} Z_{i} I\{g^{-1}(T_{i}) \leq Z_{i}^{T} β(τ)\} δ_{i}^{*} \{\hat{G}(X_{i}) π(Q_{i})\}^{-1} Y_{i}(t) y^{-1}(t) \overset{P}{\to} w_{1}\{β(τ), t\}$ uniformly in τ ∈ [τ_L, τ_U] and t ∈ [0, ν).

Similarly, $- n^{-1} \sum_{i=1}^{n} Z_{i} I\{g^{-1}(T_{i}) \leq Z_{i}^{T} β(τ)\} δ_{i}^{*} \{\hat{π}(Q_{i}) π(Q_{i}) \hat{G}(X_{i})\}^{-1} \{π_{ψ}^{'}(Q_{i}, ψ)\}^{T} \overset{P}{\to} w_{2}\{β(τ)\}$ uniformly in τ ∈ [τ_L, τ_U], where w₂(b) is defined in Section 3 just before Theorem 3.1.

By (7.1) and (7.5), the next asymptotic equivalence follows by applying Lemma 2 of Gilbert, McKeague, and Sun (2008) to (7.5):

S_{n} {β (τ), τ} = n^{- 1 / 2} \sum_{i = 1}^{n} ξ_{1, i} (τ) + o_{p} (1) .

(7.6)

uniformly in τ ∈ [τ_L, τ_U], where $ξ_{1,i}(τ) = a_{1,i}(τ) + b_i(τ) + c_i(τ)$, and $a_{1,i}(τ)$, b_i(τ), and c_i(τ) are defined in Section 3 just before Theorem 3.1.

For the AIPW estimator, $\hat{ϑ}_i$ and ϑ_i in (7.1) correspond to $\hat{ϑ}_{2,i}$ and $ϑ_{2,i}$, respectively, and $\hat{β}(τ) = \hat{β}_A(τ)$. Then $\hat{ϑ}_{2,i} - ϑ_{2,i}$ is

\begin{array}{l} \hat{ϑ}_{1,i} - ϑ_{1,i} + \frac{δ_{i}}{\hat{G}(X_{i})} \left\{1 - \frac{R_{i}}{\hat{π}(Q_{i})}\right\} \hat{ρ}(W_{i}) - \frac{δ_{i}}{G(X_{i})} \left\{1 - \frac{R_{i}}{π(Q_{i})}\right\} ρ(W_{i}) \\ = \hat{ϑ}_{1,i} - ϑ_{1,i} + \left\{\frac{δ_{i}}{\hat{G}(X_{i})} - \frac{δ_{i}}{G(X_{i})}\right\} \left\{1 - \frac{R_{i}}{\hat{π}(Q_{i})}\right\} \hat{ρ}(W_{i}) \\ \quad + \frac{δ_{i}}{G(X_{i})} \left\{\frac{R_{i}}{π(Q_{i})} - \frac{R_{i}}{\hat{π}(Q_{i})}\right\} \hat{ρ}(W_{i}) + \frac{δ_{i}}{G(X_{i})} \left\{1 - \frac{R_{i}}{π(Q_{i})}\right\} \{\hat{ρ}(W_{i}) - ρ(W_{i})\} . \end{array}

Now, we apply the decompositions (7.3), (7.4), and (3.2), and plug them into (7.1). By the Glivenko-Cantelli Theorem, we can show that

\begin{array}{l} n^{-1} \sum_{i=1}^{n} δ_{i} Z_{i} Y_{i}(t) \{G(X_{i})\}^{-1} I\{X_{i} \leq g(Z_{i}^{T} b)\} \{1 - R_{i} π^{-1}(Q_{i})\} ρ(W_{i}) y^{-1}(t) \overset{P}{\to} 0, \\ n^{-1} \sum_{i=1}^{n} δ_{i} Z_{i} ρ_{φ}^{'}(W_{i}, φ) \{G(X_{i})\}^{-1} I\{X_{i} \leq g(Z_{i}^{T} b)\} \{1 - R_{i} π^{-1}(Q_{i})\} \overset{P}{\to} 0, \\ n^{-1} \sum_{i=1}^{n} δ_{i} R_{i} Z_{i} π_{ψ}^{'}(Q_{i}, ψ) \{π^{2}(Q_{i}) G(X_{i})\}^{-1} ρ(W_{i}) I\{X_{i} \leq g(Z_{i}^{T} b)\} \overset{P}{\to} w_{3}(b) \end{array}

uniformly in both $b \in ℝ^{p+1}$ and t ∈ [0, ν), where $w_{3}(b) = E[δ_{i} Z_{i} π_{ψ}^{'}(Q_{i}, ψ) \{π(Q_{i}) G(X_{i})\}^{-1} ρ(W_{i}) I\{X_{i} \leq g(Z_{i}^{T} b)\}]$. It is easy to see that w₃(b) = −w₂(b).

Using similar techniques as for the IPW estimator, we obtain

S_{n} {β (τ), τ} = n^{- 1 / 2} \sum_{i = 1}^{n} ξ_{2, i} (τ) + o_{p} (1) .

(7.7)

uniformly in τ ∈ [τ_L, τ_U] in probability, where $ξ_{2,i}(τ) = a_{2,i}(τ) + b_i(τ)$ and $a_{2,i}(τ) = Z_{i} (I[X_{i} \leq g\{Z_{i}^{T} β(τ)\}] ϑ_{2,i} - τ)$.

We have derived the asymptotic approximations of S_n{β(τ), τ} in (7.6) and (7.7) for the IPW estimator and AIPW estimator, respectively. It is obvious that the function class {c_i(τ), τ ∈ [τ_L, τ_U]} is Donsker. Applying arguments similar to those for $\mathcal{F}$, the function classes $\{a_{1,i}(τ), τ \in [τ_L, τ_U]\}$ and $\{a_{2,i}(τ), τ \in [τ_L, τ_U]\}$ are Donsker by the Lipschitz continuity of β(·) implied by C3 and the fact that the Donsker property is preserved under Lipschitz transformations. It is not difficult to show that $\int_{0}^{\infty} w_{1}(b, s) \, dM_{i}^{G}(s)$ is Lipschitz in b. Hence the function class {b_i(τ), τ ∈ [τ_L, τ_U]} is Donsker. The Donsker property is preserved under addition. As a result, S_n{β(τ), τ} converges weakly to a mean zero Gaussian process with covariance matrix $Σ_{1}(τ^{'}, τ) = E\{ξ_{1,i}(τ^{'}) ξ_{1,i}^{T}(τ)\}$ by (7.6) for the IPW estimator, and it converges weakly to a mean zero Gaussian process with covariance matrix $Σ_{2}(τ^{'}, τ) = E\{ξ_{2,i}(τ^{'}) ξ_{2,i}^{T}(τ)\}$ by (7.7) for the AIPW estimator, for τ, τ′ ∈ [τ_L, τ_U].

Next, simple algebraic manipulations show that S_n{β̂ (τ), τ}− S_n{β(τ), τ} = (I) + (II), where

\begin{array}{l} (I) = n^{-1/2} \sum_{i=1}^{n} Z_{i} ϑ_{i} (I[X_{i} \leq g\{Z_{i}^{T} \hat{β}(τ)\}] - I[X_{i} \leq g\{Z_{i}^{T} β(τ)\}]), \\ (II) = n^{-1/2} \sum_{i=1}^{n} Z_{i} (\hat{ϑ}_{i} - ϑ_{i}) (I[X_{i} \leq g\{Z_{i}^{T} \hat{β}(τ)\}] - I[X_{i} \leq g\{Z_{i}^{T} β(τ)\}]) . \end{array}

From Lemma 1 of Peng and Fine (2009) and the uniform consistency of $\hat{β}(τ)$, it follows that the difference between (I) and $n^{1/2}[μ\{\hat{β}(τ), τ\} - μ\{β(τ), τ\}]$ converges to zero in probability, uniformly in τ ∈ [τ_L, τ_U]. By the first-order Taylor expansion of $\hat{ϑ}_i$ around ϑ_i, together with (3.1), (3.2), (7.3), and Lemma 1 of Peng and Fine (2009), we can show that (II) = o_p(1) uniformly in τ ∈ [τ_L, τ_U]. A Taylor expansion of μ(b, τ) around b = β(τ), along with the fact that $\hat{β}(τ)$ converges uniformly to β(τ), gives

S_{n}\{\hat{β}(τ), τ\} - S_{n}\{β(τ), τ\} = [A\{β(τ)\} + ε_{n}(τ)] \cdot n^{1/2} \{\hat{β}(τ) - β(τ)\} + o_{p}(1),

where $\sup_{τ} ||ε_{n}(τ)|| \overset{P}{\to} 0$. Given $S_n\{\hat{β}(τ), τ\} = o_p(n^{-1/2})$, this further implies that

n^{1/2} \{\hat{β}(τ) - β(τ)\} = - [A\{β(τ)\}]^{-1} S_{n}\{β(τ), τ\} + ε_{n}^{*}(τ),

where $\sup_{τ} ||ε_{n}^{*}(τ)|| \overset{P}{\to} 0$. The asymptotic approximations (3.3) and (3.4) for the IPW estimator and the AIPW estimator follow from (7.6) and (7.7), respectively. Hence, $n^{1/2}\{\hat{β}(τ) - β(τ)\}$ converges weakly to a mean zero Gaussian process with covariance matrix Φ₁(τ′, τ) = [A{β(τ′)}]⁻¹ Σ₁(τ′, τ) [A{β(τ)}]⁻¹ for the IPW estimator and Φ₂(τ′, τ) = [A{β(τ′)}]⁻¹ Σ₂(τ′, τ) [A{β(τ)}]⁻¹ for the AIPW estimator.

Finally, we show that the AIPW estimator is more efficient than the IPW estimator by showing that Σ₂(τ′, τ) ≤ Σ₁(τ′, τ). Note that $a_{2,i}(τ) = a_{1,i}(τ) + e_i(τ)$ and $ξ_{1,i}(τ) = ξ_{2,i}(τ) + \{c_i(τ) - e_i(τ)\}$, where e_i(τ) is defined in Section 3 just before Theorem 3.1. By (7.6) and (7.7), it suffices to show $E[ξ_{2,i}(τ^{'}) \{c_i(τ) - e_i(τ)\}^T] = 0$.

Under MAR, R_i and J_i are conditionally independent given Q_i, so we have

\begin{array}{l} E\{a_{2,i}(τ^{'}) c_{i}^{T}(τ)\} = E\left( \frac{R_{i} δ_{i} ρ(W_{i})}{π(Q_{i}) G(X_{i})} Z_{i} I[X_{i} \leq g\{Z_{i}^{T} β(τ^{'})\}] η_{i}^{T} \right) w_{2}^{T}\{β(τ)\} \\ \quad - E\left( \frac{\{R_{i} - π(Q_{i})\} δ_{i} ρ(W_{i})}{π(Q_{i}) G(X_{i})} Z_{i} I[X_{i} \leq g\{Z_{i}^{T} β(τ^{'})\}] η_{i}^{T} \right) w_{2}^{T}\{β(τ)\}, \end{array}

which equals zero by E(η_i | Q_i) = 0. By E(η_i | Q_i) = 0, we also have $E\{b_{i}(τ^{'}) c_{i}^{T}(τ)\} = 0$. Therefore $E\{ξ_{2,i}(τ^{'}) c_{i}^{T}(τ)\} = 0$. Similarly, $E\{ξ_{2,i}(τ^{'}) e_{i}^{T}(τ)\} = 0$. Hence $E\{ξ_{1,i}(τ^{'}) ξ_{1,i}^{T}(τ)\} = E\{ξ_{2,i}(τ^{'}) ξ_{2,i}^{T}(τ)\} + E[\{c_{i}(τ^{'}) - e_{i}(τ^{'})\} \{c_{i}(τ) - e_{i}(τ)\}^{T}]$.
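To spell out how this decomposition yields the efficiency claim, the following worked step uses only quantities already defined and is offered as a sketch of the standard argument. Setting τ′ = τ, the extra term is the second moment matrix of c_i(τ) − e_i(τ) and is therefore positive semidefinite:

Σ_{1}(τ, τ) - Σ_{2}(τ, τ) = E[\{c_{i}(τ) - e_{i}(τ)\}^{⊗2}] \succeq 0 .

Because A(b) = E[Z^{⊗2} f₁{g(Z^T b)|Z}] is symmetric, pre- and post-multiplying by [A{β(τ)}]⁻¹ preserves positive semidefiniteness, so that

Φ_{1}(τ, τ) - Φ_{2}(τ, τ) = [A\{β(τ)\}]^{-1} \{Σ_{1}(τ, τ) - Σ_{2}(τ, τ)\} [A\{β(τ)\}]^{-1} \succeq 0 .

In particular, the asymptotic variance of any linear combination of the AIPW estimator at a fixed τ is no larger than that of the corresponding combination of the IPW estimator.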

Proof of (4.1)

Let $f_{i} = Z_{i} I\{g^{-1}(T_{i}) \leq Z_{i}^{T} β(τ^{'})\} ϑ_{1,i}$. Under the MAR assumption, R_i and J_i are conditionally independent given Q_i, and we have

\begin{array}{l} E\{a_{1,i}(τ^{'}) b_{i}^{T}(τ)\} = - E\left( f_{i} \int_{0}^{\infty} [w_{1}\{β(τ), s\}]^{T} λ^{G}(s) Y_{i}(s) \, ds \right) \\ = - \int_{0}^{\infty} w_{1}\{β(τ^{'}), s\} [w_{1}\{β(τ), s\}]^{T} y(s) λ^{G}(s) \, ds \\ = - E\{b_{i}(τ^{'}) b_{i}^{T}(τ)\}, \end{array}

\begin{array}{l} E\{a_{1,i}(τ^{'}) c_{i}^{T}(τ)\} = E\left( ϑ_{1,i} Z_{i} I[X_{i} \leq g\{Z_{i}^{T} β(τ^{'})\}] η_{i}^{T} \right) w_{2}^{T}\{β(τ)\} \\ = E\left( \frac{R_{i} δ_{i} ρ(W_{i})}{π(Q_{i}) G(X_{i})} Z_{i} I[X_{i} \leq g\{Z_{i}^{T} β(τ^{'})\}] η_{i}^{T} \right) w_{2}^{T}\{β(τ)\} \\ = - w_{2}\{β(τ^{'})\} I_{ψ}^{-1} [w_{2}\{β(τ)\}]^{T}, \end{array}

where the last equation is obtained by the definition of η_i following (3.1). It is easy to see that $E\{b_{i}(τ^{'}) c_{i}^{T}(τ)\} = 0$ and $E\{c_{i}(τ^{'}) c_{i}^{T}(τ)\} = w_{2}\{β(τ^{'})\} I_{ψ}^{-1} [w_{2}\{β(τ)\}]^{T}$ since $E(η_{i} η_{i}^{T}) = I_{ψ}^{-1}$. It follows that

\begin{array}{l} E[ξ_{1,i}(τ^{'}) \{ξ_{1,i}(τ)\}^{T}] = E[\{a_{1,i}(τ^{'}) + b_{i}(τ^{'})\} \{a_{1,i}(τ) + b_{i}(τ)\}^{T}] - w_{2}\{β(τ^{'})\} I_{ψ}^{-1} [w_{2}\{β(τ)\}]^{T} \\ = E\{a_{1,i}(τ^{'}) a_{1,i}^{T}(τ)\} - E\{b_{i}(τ^{'}) b_{i}^{T}(τ)\} - w_{2}\{β(τ^{'})\} I_{ψ}^{-1} [w_{2}\{β(τ)\}]^{T} . \end{array}

Contributor Information

Yanqing Sun, Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC 28223.

Huixia Judy Wang, Department of Statistics, North Carolina State University, Raleigh, NC 27695.

Peter B. Gilbert, Department of Biostatistics, University of Washington and Fred Hutchinson Cancer Research Center, Seattle, WA 98109

References

Bang H, Tsiatis AA. Median regression with censored cost data. Biometrics. 2002;58:643–649. doi: 10.1111/j.0006-341x.2002.00643.x. [DOI] [PubMed] [Google Scholar]
Beaudry M, Dufour R, Marcoux S. Relation between infant feeding and infections during the first six months of life. Journal of Pediatrics. 1995;126:191–197. doi: 10.1016/s0022-3476(95)70544-9. [DOI] [PubMed] [Google Scholar]
Beran R. Technical report. University of California; Berkeley: 1981. Nonparametric Regression With Randomly Censored Survival Data. [Google Scholar]
Buckley J, James I. Linear regression with censored data. Biometrika. 1979;66:429–436. [Google Scholar]
Cox DR. Regression models and life tables (with discussion) Journal of the Royal Statistical Society, B. 1972;34:187–220. [Google Scholar]
Dunn DT, Newell ML, Ades AE, Peckham CS. Risk of human immunodeficiency virus type 1 transmission through breastfeeding. The Lancet. 1992;340:585–588. doi: 10.1016/0140-6736(92)92115-v. [DOI] [PubMed] [Google Scholar]
Gao G, Tsiatis AA. Semiparametric estimators for the regression coefficients in the linear transformation competing risks model with missing cause of failure. Biometrika. 2005;92:875–891. [Google Scholar]
Gilbert PB, McKeague IW, Sun Y. The two-sample problem for failure rates depending on a continuous mark: an application to vaccine efficacy. Biostatistics. 2008;9:263–276. doi: 10.1093/biostatistics/kxm028. [DOI] [PubMed] [Google Scholar]
Goetghebeur E, Ryan L. Analysis of competing risks survival data when some failure types are missing. Biometrika. 1995;82:821–834. [Google Scholar]
Horvitz DG, Thompson DJ. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association. 1952;47:663–685. [Google Scholar]
Huang Y. Calibration regression of censored lifetime medical cost. Journal of the American Statistical Association. 2002;97:318–327. [Google Scholar]
Koenker R, Bassett GS. Regression quantiles. Econometrica. 1978;46:33–50. [Google Scholar]
Koul H, Susarla V, Van Ryzin J. Regression analysis with randomly right censored data. The Annals of Statistics. 1981;9:1276–1288. [Google Scholar]
Lu W, Liang Y. Analysis of competing risks data with missing cause of failure under additive hazards model. Statistica Sinica. 2008;19:219–234. [Google Scholar]
Lu K, Tsiatis AA. Multiple imputation methods for estimating regression coefficients in the competing risks model with missing cause of failure. Biometrics. 2001;57:1191–1197. doi: 10.1111/j.0006-341x.2001.01191.x. [DOI] [PubMed] [Google Scholar]
Neocleous T, Vanden Branden K, Portnoy S. Correction to censored regression quantiles by S. Portnoy, 98 (2003), 1001–1012. Journal of the American Statistical Association. 2006;101:860–861. [Google Scholar]
Peng L, Fine JP. Competing risks quantile regression. Journal of the American Statistical Association. 2009;104:1440–1453. [Google Scholar]
Peng L, Huang Y. Survival analysis with quantile regression models. Journal of the American Statistical Association. 2008;103:637–649. [Google Scholar]
Pepe MS. Inference for events with dependent risks in multiple endpoint studies. Journal of the American Statistical Association. 1991;86:770–778. [Google Scholar]
Portnoy S. Censored regression quantiles. Journal of the American Statistical Association. 2003;98:1001–1012. [Google Scholar]
Prentice RL, Kalbfleisch JD, Peterson AV, Fluornoy N, Farewell VT, Breslow NE. The analysis of failure time in the presence of competing risks. Biometrics. 1978;34:541–554. [PubMed] [Google Scholar]
Rubin DB. Inference and missing data. Biometrika. 1976;63:581–592. [Google Scholar]
Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association. 1994;89:846–866. [Google Scholar]
Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for nonignorable dropout using semiparametric nonresponse models: rejoinder. Journal of the American Statistical Association. 1999;94:1135–1146. [Google Scholar]
Thior I, Lockman S, Smeaton LM, Shapiro RL, Wester C, Heymann SJ, Gilbert PB, Stevens L, Peter T, Kim S, van Widenfelt E, Moffat C, Ndase P, Arimi P, Kebaabetswe P, Mazonde P, Makhema J, McIntosh K, Novitsky V, Lee TH, Marlink R, Lagakos S, Essex M and the Mashi Study Team. Breastfeeding plus infant zidovudine prophylaxis for 6 months vs formula feeding plus infant zidovudine for 1 month to reduce mother-to-child HIV transmission in Botswana: a randomized trial: the Mashi Study. Journal of the American Medical Association. 2006;296:794–805. doi: 10.1001/jama.296.7.794. [DOI] [PubMed] [Google Scholar]
Tsiatis AA. A nonidentifiability aspect of the problem of competing risks. Proceedings of the National Academy of Sciences USA. 1975;72:20–22. doi: 10.1073/pnas.72.1.20. [DOI] [PMC free article] [PubMed] [Google Scholar]
van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes with Applications to Statistics. Springer-Verlag; New York: 1996. [Google Scholar]
Wang H, Wang L. Locally weighted censored quantile regression. Journal of the American Statistical Association. 2009;104:1117–1128. [Google Scholar]
Ying Z, Jung SH, Wei LJ. Survival analysis with median regression models. Journal of the American Statistical Association. 1995;90:178–184. [Google Scholar]

