Smoothed quantile regression analysis of competing risks

Sangbum Choi; Sangwook Kang; Xuelin Huang

doi:10.1002/bimj.201700104

Author manuscript; available in PMC: 2019 Sep 1.

Published in final edited form as: Biom J. 2018 Jul 5;60(5):934–946. doi: 10.1002/bimj.201700104

PMCID: PMC6156950 NIHMSID: NIHMS970979 PMID: 29978507

Abstract

Censored quantile regression models, which offer great flexibility in assessing covariate effects on event times, have attracted considerable research interest. In this study, we consider flexible estimation and inference procedures for competing risks quantile regression, which not only provides meaningful interpretations by using cumulative incidence quantiles but also extends the conventional accelerated failure time model by relaxing some of the stringent model assumptions, such as global linearity and unconditional independence. Current methods for censored quantile regression often involve minimizing an L₁-type convex function or solving non-smoothed estimating equations. This approach could lead to multiple roots in practical settings, particularly with multiple covariates. Moreover, variance estimation involves an unknown error distribution, and most methods rely on computationally intensive resampling techniques such as bootstrapping. We extend the induced smoothing procedure for censored quantile regression to the competing risks setting. The proposed procedure permits the fast and accurate computation of quantile regression parameter estimates and their variance estimates by using conventional numerical methods such as the Newton–Raphson algorithm. Numerical studies show that the proposed estimators perform well and the resulting inference is reliable in practical settings. Finally, the method is applied to data from a soft tissue sarcoma study.

Keywords: Censored quantile regression, Cumulative incidence function, Induced smoothing, Variance estimation, Weighted estimating equation

1 Introduction

In statistical and econometric research, linear quantile regression models (Koenker and Bassett, 1978) have been extensively studied as a significant extension of traditional linear models. The regression parameters are often estimated by minimizing an L₁-type convex objective function or solving quantile-based estimating equations using linear programming or interior point methods. Interest has also been growing in modeling regression quantiles on time-to-event data (Powell, 1984, 1986; Ying, Jung, and Wei, 1995; Bang and Tsiatis, 2002; Portnoy, 2003; Peng and Huang, 2008; Wang and Wang, 2009; Pang, Lu, and Wang, 2012). When an earlier or later stage of the follow-up is the primary focus, conventional unconditional mean-based methods such as Cox’s proportional hazards model or the accelerated failure time (AFT) model may be unsuitable. Alternatively, quantile regressions can directly assess the lower or higher quantiles of interest by modeling regression quantiles of transformed survival times. In addition, quantile regressions can eliminate the difficulty of going from a conditional mean model to the entire survival function, thereby fully assessing the covariate effects across different quantiles. These distinctive features make the quantile-based approach attractive for modeling typically right-skewed failure time data.

In this study, we consider linear censored quantile regression models for competing risks data, a scenario that involves multiple but mutually disjoint censoring events. The standard approach for competing risks analysis often models the cumulative incidence (or subdistribution) function (Gray, 1988; Lin, 1997; Fine and Gray, 1999), which, coupled with the cause-specific hazard function (Prentice et al., 1978), has been widely used in practice to summarize the marginal risk of failure from a particular cause over time. By extending the work of Bang and Tsiatis (2002) on median regression with censored medical cost data, Peng and Fine (2007, 2009) proposed non-parametric and regression quantile methods for competing risks, respectively, by formulating the model based on conditional quantiles adapted to the cumulative incidence function for a specific risk. They derived an unbiased monotone estimating equation for quantile regression parameters and found the uniform consistency and weak convergence of the resulting estimators. Sun, Wang, and Gilbert (2012) studied competing risks quantile regression when parts of the failure cause are missing. Recently, Li and Peng (2011, 2015) studied quantile-based inferences to adjust for dependent censoring from semi-competing risks data.

Although censored quantile regressions have attractive theoretical and practical properties, making associated inferences has been difficult because of the lack of efficient and reliable computational algorithms. The implementation of inferential procedures usually amounts to minimizing discontinuous functions that generally have multiple local minima. Moreover, the direct estimation of the variance associated with the regression parameter generally involves estimating the unspecified conditional error density function for the residual term. Non-parametric methods used to estimate the unknown density function such as kernel smoothing may require moderate to large sample sizes and the selection of tuning parameters. To estimate the variance, most existing methods for survival data (e.g., Portnoy, 2003; Peng and Huang, 2008; Wang and Wang, 2009) rely on computationally intensive resampling techniques. Alternatively, Peng and Fine (2009) employed Huang (2002)’s approach that estimates the slope matrix of the estimating equations based on their asymptotic linearity. Nevertheless, this approach still requires solving a set of non-smoothed estimating equations whose convergence may be problematic in a practical setting.

A more computationally efficient and practical approach to this situation is an induced kernel smoothing procedure (Brown and Wang, 2005, 2006), which has proven useful for accommodating rank-based inferences for the semiparametric AFT model (Johnson and Strawderman, 2009; Chiou, Kang, and Yan, 2015a,b) and the accelerated hazards model (Li, Zhang and Tang, 2012). The induced smoothing technique approximates a discontinuous estimating function by a continuously differentiable function that is asymptotically equivalent to the original approach, facilitating rapid numerical solutions. In this study, we employ the induced smoothing technique to enable a fast and reliable inference procedure for semiparametric censored regression quantiles with competing risks by using the smoothed version of the weighted estimating equation of Peng and Fine (2009). The estimator from the smoothed estimating function is shown to be consistent and have the same asymptotic distribution as that from the non-smoothed estimating equation. A useful consequence of developing smoothed quantile regressions is an easy-to-compute sandwich variance estimator that avoids the need for resampling methods.

The rest of this article is organized as follows. In Section 2, we present the smoothed estimating functions for competing risks quantile regressions and establish the asymptotic properties of the proposed estimators. Variance estimation procedures are also provided. In Section 3, we examine the finite-sample properties by using simulation studies. In Section 4, we illustrate the application of the proposed method with reference to a sarcoma cancer study. We offer concluding remarks in Section 5 and present the proofs in the Appendix.

2 Models and methods

2.1 Notation and assumptions

Consider a random sample of n individuals from competing risks data subject to random right censoring, { ${\tilde{T}}_{i} \equiv T_{i} \land C_{i}$ , δ_i ≡ I(T_i ≤ C_i), δ_iε_i, Z_i; i = 1, …, n}, where T_i and C_i denote the failure and censoring times, respectively, ε_i ∈ {1, …, K} is the cause of failure for which the K causes are assumed to be known, and Z_i is a p × 1 bounded time-independent covariate vector. Here, a ˄ b = min(a, b) and I(·) is the indicator function. It is assumed that T_i and C_i are independent and that the first entry of Z_i is set to 1, corresponding to the intercept. Without loss of generality, we focus on the occurrence of the cause-1 failure (i.e., ε_i = 1) in the presence of the other types of events (i.e., ε_i ≠ 1). Then, the implied latent time variable $T_{i}^{*}$ for cause-1 failure is related to the usual survival time T_i through

$$T_{i}^{*} = \begin{cases} T_{i}, & ε_{i} = 1, \\ \infty, & ε_{i} \neq 1. \end{cases}$$

With right-censoring, the failure from cause 1 cannot be fully identified. Hereafter, we write Δ_i = δ_iI(ε_i = 1) to denote the observed status of the failure from cause 1.
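As a concrete illustration of this notation, the following minimal Python sketch builds the observed quantities for a single subject from a failure time, a censoring time, and a cause label. The function names `observe` and `latent_cause1_time` are hypothetical and are introduced here only for illustration; they do not come from any package.

```python
import math


def observe(t, c, cause):
    """Return (T_tilde, delta, delta*eps, Delta) for one subject.

    t: failure time T_i, c: censoring time C_i, cause: eps_i in {1, ..., K}.
    """
    t_tilde = min(t, c)                 # T_tilde_i = T_i ^ C_i
    delta = 1 if t <= c else 0          # delta_i = I(T_i <= C_i)
    delta_eps = delta * cause           # observed cause, 0 when censored
    big_delta = 1 if (delta == 1 and cause == 1) else 0  # Delta_i
    return t_tilde, delta, delta_eps, big_delta


def latent_cause1_time(t, cause):
    """Latent cause-1 time T_i^*: T_i if the cause is 1, infinity otherwise."""
    return t if cause == 1 else math.inf
```

For example, a subject who fails from cause 2 before being censored contributes δ_i = 1 but Δ_i = 0, and has an infinite latent cause-1 time.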

Our main objective is to estimate the p × 1 vector of the quantile coefficient β₀(τ) at a fixed quantile level τ ∈ [τ_L, τ_U] in the linear quantile regression model,

$$\log T_{i}^{*} = β_{0}(τ)' Z_{i} + e_{i}(τ), \quad i = 1, \dots, n, \tag{1}$$

where e_i(τ) ∈ (−∞, ∞] is the random error variable whose τth conditional quantile given Z_i equals 0. Given the quantile level, model (1) is equivalent to the AFT model for event time from the first failure cause. Note that $T_{i}^{*}$ has a distribution function equal to the cause-1 cumulative incidence function:

$$F_{1}(t | Z_{i}) = P(T_{i}^{*} \leq t | Z_{i}) = P(T_{i} \leq t, ε_{i} = 1 | Z_{i}),$$

which represents the probability of observing the cause-1 failure in the presence of the other types of events given the covariates. The corresponding conditional quantile given Z_i = z is $Q_{1}(τ | z) = \inf\{t : F_{1}(t | z) \geq τ\}$ and the τth linear quantile under model (1) is equivalent to

$$Q_{1}(τ | z) = \exp\{β(τ)' z\}. \tag{2}$$

To ensure identifiability, we let 0 < τ_L < τ < τ_U < inf_z F₁(L | Z = z), where L is the maximum follow-up time, satisfying the conditions given in Section 2.3. Hereafter, we may use β in place of β(τ) for ease of presentation if the context is clear.

2.2 Competing risks quantile regressions

If complete competing risks data without censoring are available for each patient, the quantile regression parameters β(τ) can be consistently estimated by the least absolute deviations estimator, which minimizes $n^{-1} \sum_{i = 1}^{n} | I(X_{i} - β' Z_{i} < 0, ε_{i} = 1) - τ |$ , where $X_{i} = \log {\tilde{T}}_{i}$ . A minimizer is also a solution of the estimating equation

$$n^{-1} \sum_{i = 1}^{n} Z_{i} \{ I(X_{i} - β' Z_{i} < 0, ε_{i} = 1) - τ \} \approx 0.$$

The approximation is used here because the estimating equation is a discontinuous function of β. In the presence of random right-censoring, Peng and Fine (2009) proposed using the following weighted estimating equation:

$$S_{n}(β, τ) = n^{-1} \sum_{i = 1}^{n} Z_{i} \left\{ \frac{Δ_{i}}{\hat{G}({\tilde{T}}_{i})} I(X_{i} - β' Z_{i} < 0) - τ \right\} \approx 0, \tag{3}$$

where Ĝ(t) is the Kaplan–Meier estimator for the censoring survivor function G(t) = P(C ≥ t), based on the data { $({\tilde{T}}_{i}, 1 - δ_{i})$ ; i = 1, …, n}, i.e., $\hat{G}(t) = \prod_{u < t} \{1 - dN^{c}(u)/Y(u)\}$ with $N^{c}(u) = \sum_{i = 1}^{n} N_{i}^{c}(u) = \sum_{i = 1}^{n} I({\tilde{T}}_{i} \leq u, δ_{i} = 0)$ and $Y(u) = \sum_{i = 1}^{n} Y_{i}(u) = \sum_{i = 1}^{n} I({\tilde{T}}_{i} \geq u)$ . Define the associated censoring martingale process as $M_{i}^{c}(t) = N_{i}^{c}(t) - \int_{0}^{t} λ^{c}(u) Y_{i}(u) du$ , where λ^c(u) is the hazard function of the censoring distribution. The unbiasedness of this estimating equation follows easily using a conditioning argument (Bang and Tsiatis, 2002; Peng and Fine, 2009).
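To make the inverse-probability-of-censoring weighting in (3) concrete, the following pure-Python sketch computes the left-continuous Kaplan–Meier estimate of the censoring survivor function and then evaluates the weighted estimating function. The function names `km_censoring` and `S_n`, and the list-based data layout, are assumptions made for illustration only.

```python
def km_censoring(times, delta):
    """Return G_hat with G_hat(t) = prod_{u < t} {1 - dN^c(u) / Y(u)}.

    times: observed times T_tilde_i; delta: 1 = failure, 0 = censored.
    """
    cens_times = sorted({t for t, d in zip(times, delta) if d == 0})
    factors = []
    for u in cens_times:
        at_risk = sum(1 for t in times if t >= u)                       # Y(u)
        n_cens = sum(1 for t, d in zip(times, delta) if t == u and d == 0)
        factors.append((u, 1.0 - n_cens / at_risk))

    def G_hat(t):
        g = 1.0
        for u, f in factors:
            if u < t:          # strict inequality: product over u < t
                g *= f
        return g

    return G_hat


def S_n(beta, tau, X, Z, Delta, G):
    """Weighted estimating function (3); G[i] holds G_hat(T_tilde_i)."""
    n, p = len(X), len(beta)
    out = [0.0] * p
    for x, z, d, g in zip(X, Z, Delta, G):
        resid = x - sum(b * zj for b, zj in zip(beta, z))
        weight = d / g if d else 0.0      # Delta_i / G_hat(T_tilde_i)
        term = weight * (1.0 if resid < 0 else 0.0) - tau
        for j in range(p):
            out[j] += z[j] * term / n
    return out
```

Because the indicator makes `S_n` a step function of `beta`, an exact root need not exist, which is the difficulty addressed by the smoothing in Section 2.3.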

Because of the discontinuity of the estimating function (3), its exact solution may not exist. Instead, one can minimize the Euclidean norm of the estimating function, but this is also discontinuous and has no derivatives. Peng and Fine (2009) showed that the implementation of the estimation procedure amounts to minimizing the L₁-type convex function:

$$L_{n}(β, τ) = \sum_{i = 1}^{n} \frac{Δ_{i}}{\hat{G}({\tilde{T}}_{i})} | X_{i} - β' Z_{i} | + \left| M_{0} + β' \sum_{j = 1}^{n} \frac{Δ_{j} Z_{j}}{\hat{G}({\tilde{T}}_{j})} \right| + \left| M_{0} - 2 τ β' \sum_{j = 1}^{n} Z_{j} \right|,$$

where M₀ > 0 is an extremely large constant. They proposed using ${\hat{β}}_{n} = {argmin}_{β \in B} L_{n} (β, τ)$ . However, the solution to this minimization problem may not be unique. The convexity of L_n(β, τ) implies that the set of minimizers on $B$ is convex. The lack of smoothness still makes the minimization of L_n(β, τ) computationally challenging, particularly with multiple covariates.

Under regularity conditions, the results of Peng and Fine (2009) imply that there exists a sequence of solutions that is strongly consistent for β₀ and that $n^{1 / 2} ({\hat{β}}_{n} - β_{0})$ converges in distribution to an N(0, Ψ) random vector, where Ψ = A⁻¹Γ(A⁻¹)′, $Γ = \lim_{n \to \infty} \mathrm{var}\{n^{1/2} S_{n}(β_{0}, τ)\}$ , and A = (∂/∂β)S₀(β₀, τ) for $S_{0}(β, τ) = \lim_{n \to \infty} S_{n}(β, τ)$ . In addition to the numerical challenges that arise in computing ${\hat{β}}_{n}$ , the variance estimation is complicated by the presence of A, which involves the unknown distribution of e_i(τ), and by the fact that S_n(β, τ) is not differentiable in β. Note that Γ can be approximated by

$${\hat{Γ}}_{n}(β_{0}) = n^{-1} \sum_{i = 1}^{n} Z_{i}^{\otimes 2} \left\{ \frac{Δ_{i} R_{i}(β_{0})}{\hat{G}({\tilde{T}}_{i})} - τ \right\}^{2} - n^{-1} \int_{0}^{L} \frac{d N^{c}(u)}{Y^{2}(u)} \{\hat{B}(β_{0}, u)\}^{\otimes 2}, \tag{4}$$

where a^⊗2 = aa′, $R_{i}(β_{0}) = I(X_{i} - β_{0}' Z_{i} < 0)$ , and $\hat{B}(β_{0}, u) = \sum_{i = 1}^{n} Δ_{i} Z_{i} Y_{i}(u) R_{i}(β_{0}) / \hat{G}({\tilde{T}}_{i})$ . The estimate ${\hat{Γ}}_{n}$ can be directly obtained from (4) by replacing β₀ with ${\hat{β}}_{n}$ .
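The plug-in formula (4) can be sketched directly in pure Python. In this hedged illustration, the integral with respect to dN^c(u) is computed as a sum over the censored observations, and the function name `gamma_hat` and the argument layout are hypothetical choices rather than part of any existing software.

```python
def gamma_hat(beta, tau, X, Z, T_tilde, delta, Delta, G):
    """Plug-in estimate of Gamma following (4).

    X: log observed times, Z: covariate rows, T_tilde: observed times,
    delta: 1 = failure / 0 = censored, Delta: cause-1 failure indicator,
    G: values G_hat(T_tilde_i) of the censoring Kaplan-Meier estimate.
    """
    n, p = len(X), len(beta)
    # R_i(beta) = I(X_i - beta'Z_i < 0)
    R = [1.0 if x - sum(b * zj for b, zj in zip(beta, z)) < 0 else 0.0
         for x, z in zip(X, Z)]
    # W_i = Delta_i * R_i(beta) / G_hat(T_tilde_i)
    W = [(d / g if d else 0.0) * r for d, g, r in zip(Delta, G, R)]
    out = [[0.0] * p for _ in range(p)]
    # First term: n^{-1} sum Z_i Z_i' {W_i - tau}^2
    for z, w in zip(Z, W):
        c = (w - tau) ** 2
        for j in range(p):
            for k in range(p):
                out[j][k] += c * z[j] * z[k] / n
    # Second term: each censored subject contributes one jump of N^c
    for u, d in zip(T_tilde, delta):
        if d == 0:
            Y = sum(1 for t in T_tilde if t >= u)                  # Y(u)
            B = [sum(w * z[j] for w, z, t in zip(W, Z, T_tilde) if t >= u)
                 for j in range(p)]                                # B_hat(beta, u)
            for j in range(p):
                for k in range(p):
                    out[j][k] -= B[j] * B[k] / (Y * Y) / n
    return out
```

When no observation is censored, the second loop contributes nothing and the estimate reduces to the first term of (4).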

Since the formula in (4) has a complicated expression, it may be more convenient and reliable to bootstrap from the data. In our implementation, we apply an efficient resampling method by Zeng and Lin (2008) to approximate Γ: a random perturbation whose mean and variance both equal one is applied to the estimating function (3) multiple times, and the variance of the perturbed estimating functions is taken as ${\hat{Γ}}_{n}$ . This approach does not require solving estimating equations and is thus much faster than the existing resampling procedures. Once ${\hat{Γ}}_{n}$ is obtained, Peng and Fine (2009) estimated the variance of ${\hat{β}}_{n}$ by generalizing the technique of Huang (2002). Specifically, they first decomposed the empirical estimate ${\hat{Γ}}_{n}$ as γγ′, where γ = (γ₁, …, γ_p)′, and computed the solutions $\tilde{b} = ({\tilde{b}}_{1}, \dots, {\tilde{b}}_{p})$ by solving Huang (2002)’s local linearization S_n(β, τ) − γ_k = 0 for k = 1, …, p. The variance of $n^{1 / 2} ({\hat{β}}_{n} - β_{0})$ can then be approximated by $n (\tilde{b} - {\hat{β}}_{n}) {(\tilde{b} - {\hat{β}}_{n})}^{'}$ . As illustrated in our simulation studies, however, this method tends to overestimate the empirical variances, resulting in wider confidence intervals.

2.3 Induced smoothing for competing risks data

Brown and Wang (2005) proposed an induced smoothing method for approximating discontinuous but monotone estimating functions using continuously differentiable functions. Assuming independent failure time observations, Brown and Wang (2006) applied this smoothing method to the problem of estimating the regression parameter in the AFT model. Based on a similar idea, we seek the smoothed version of the estimating function S_n(β, τ).

Let V be an N(0, I_p) random vector independent of the data, where I_p denotes the p × p identity matrix. Let Σ be a p × p symmetric, positive definite matrix such that ‖Σ‖ = O(n⁻¹). According to the asymptotic normality of ${\hat{β}}_{n}$ , we can write ${\hat{β}}_{n} = β_{0} + Σ^{1 / 2} V$ with Σ = n⁻¹Ψ, which implies that ${\hat{β}}_{n}$ can also be regarded as a random perturbation of β₀. By adding the random perturbation $Σ^{1/2} V$ to the argument β of the score function S_n(β, τ) and taking the expectation with respect to V, we obtain the smoothed version of estimating equation (3) as

$${\tilde{S}}_{n}(β, τ) \equiv E_{V}\{S_{n}(β + Σ^{1 / 2} V, τ)\} = n^{-1} \sum_{i = 1}^{n} Z_{i} \left\{ \frac{Δ_{i}}{\hat{G}({\tilde{T}}_{i})} Φ\left( - \frac{X_{i} - β' Z_{i}}{\sqrt{Z_{i}' Σ Z_{i}}} \right) - τ \right\} = 0, \tag{5}$$

where Φ(·) is the standard normal cumulative distribution function. Let ϕ(·) denote the standard normal density function. The smoothed equation (5) is a monotone function with respect to each element of β.
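The closed form of the smoothed estimating function follows from a short normal calculation, sketched here under the assumption that the weights $Δ_{i} / \hat{G}({\tilde{T}}_{i})$ are treated as fixed with respect to V. Since V is N(0, I_p), the scalar $Z_{i}' Σ^{1/2} V$ is normal with mean 0 and variance $Z_{i}' Σ Z_{i}$ , so that

$$E_{V}\, I\{X_{i} - (β + Σ^{1/2} V)' Z_{i} < 0\} = P\left( Z_{i}' Σ^{1/2} V > X_{i} - β' Z_{i} \right) = 1 - Φ\left( \frac{X_{i} - β' Z_{i}}{\sqrt{Z_{i}' Σ Z_{i}}} \right) = Φ\left( - \frac{X_{i} - β' Z_{i}}{\sqrt{Z_{i}' Σ Z_{i}}} \right).$$

Replacing the indicator in the weighted estimating function (3) by this expectation gives the smoothed function, in which each step is replaced by a normal distribution function with scale $\sqrt{Z_{i}' Σ Z_{i}} = O(n^{-1/2})$ .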

Moreover, using standard results from normal random variables and integration by parts, we have

$${\tilde{L}}_{n}(β, τ) = n^{-1} \sum_{i = 1}^{n} \left[ (X_{i} - β' Z_{i}) \left\{ τ - \frac{Δ_{i}}{\hat{G}({\tilde{T}}_{i})} Φ\left( - \frac{X_{i} - β' Z_{i}}{\sqrt{Z_{i}' Σ Z_{i}}} \right) \right\} + \frac{Δ_{i}}{\hat{G}({\tilde{T}}_{i})} ϕ\left( - \frac{X_{i} - β' Z_{i}}{\sqrt{Z_{i}' Σ Z_{i}}} \right) \sqrt{Z_{i}' Σ Z_{i}} \right]. \tag{6}$$

A straightforward calculation shows that $\partial {\tilde{L}}_{n} (β, τ) / (\partial β) = {\tilde{S}}_{n} (β, τ)$ . If Σ is given, the smoothed quantile regression parameter may be obtained from ${\tilde{β}}_{n} = {argmin}_{β \in B} {\tilde{L}}_{n} (β, τ)$ . The smoothed objective function, ${\tilde{L}}_{n} (β, τ)$ , is convex and continuously differentiable and thus standard numerical routines, such as the Newton-Raphson algorithm, can be used to effectively compute ${\tilde{β}}_{n}$ . Alternatively, the corresponding coefficient estimator ${\tilde{β}}_{n}$ can be found as the multivariate root of ${\tilde{S}}_{n} (β, τ) = 0$ .
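The relationship between the smoothed objective function and the smoothed estimating function can be checked numerically. The sketch below, written in pure Python for a fixed Σ, evaluates both quantities; the function names `smoothed_S` and `smoothed_L` and the weight vector `W` (holding Δ_i/Ĝ(T̃_i)) are assumptions made for illustration.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal: cdf is Phi, pdf is phi


def _resid_scale(beta, Sigma, x, z):
    """Return r_i = X_i - beta'Z_i and s_i = sqrt(Z_i' Sigma Z_i)."""
    p = len(z)
    r = x - sum(b * zj for b, zj in zip(beta, z))
    s = math.sqrt(sum(z[j] * Sigma[j][k] * z[k]
                      for j in range(p) for k in range(p)))
    return r, s


def smoothed_S(beta, tau, Sigma, X, Z, W):
    """Smoothed estimating function (5); W[i] = Delta_i / G_hat(T_tilde_i)."""
    n, p = len(X), len(beta)
    out = [0.0] * p
    for x, z, w in zip(X, Z, W):
        r, s = _resid_scale(beta, Sigma, x, z)
        c = w * _N.cdf(-r / s) - tau
        for j in range(p):
            out[j] += z[j] * c / n
    return out


def smoothed_L(beta, tau, Sigma, X, Z, W):
    """Smoothed objective function (6) for a fixed Sigma."""
    n = len(X)
    total = 0.0
    for x, z, w in zip(X, Z, W):
        r, s = _resid_scale(beta, Sigma, x, z)
        total += r * (tau - w * _N.cdf(-r / s)) + w * _N.pdf(-r / s) * s
    return total / n
```

A central finite difference of `smoothed_L` in each coordinate of `beta` should agree with the corresponding entry of `smoothed_S` up to numerical error, which is a convenient sanity check before passing either function to a generic optimizer or root finder.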

In the case that Σ is unknown and has to be estimated, we propose using a positive definite p × p matrix ${\tilde{Σ}}^{(0)} = O (n^{- 1})$ as an initial value and updating through iterations to estimate Σ as described below. According to Brown and Wang (2005, 2006), the estimation of the asymptotic variance is mainly based on the sandwich formula of the covariance matrix of the estimated parameters. The partial derivatives of the smoothed estimating function (5) can be explicitly expressed as

$${\tilde{A}}_{n}(β) = \frac{\partial {\tilde{S}}_{n}(β, τ)}{\partial β} = n^{-1} \sum_{i = 1}^{n} \frac{Δ_{i}}{\hat{G}({\tilde{T}}_{i})} ϕ\left( - \frac{X_{i} - β' Z_{i}}{\sqrt{Z_{i}' Σ Z_{i}}} \right) \frac{Z_{i} Z_{i}'}{\sqrt{Z_{i}' Σ Z_{i}}}. \tag{7}$$

As suggested by Pang, Lu, and Wang (2012) for censored regression quantiles, an iterative procedure can be constructed to simultaneously estimate the regression parameter β and its covariance matrix Σ. The estimating procedure consists of the following steps:

Step 1: Let ${\tilde{β}}_{n}^{(0)} = {\hat{β}}_{n}$ , the Peng and Fine (2009) estimator obtained by solving (3), and ${\tilde{Σ}}_{n}^{(0)} = n^{- 1} I_{p}$ .
Step 2: Given ${\tilde{β}}_{n}^{(k - 1)}$ and ${\tilde{Σ}}_{n}^{(k - 1)}$ from the (k − 1)th step, update ${\tilde{β}}_{n}^{(k)}$ and ${\tilde{Σ}}_{n}^{(k)}$ as
$\begin{aligned} {\tilde{β}}_{n}^{(k)} &= {\tilde{β}}_{n}^{(k - 1)} - \{{\tilde{A}}_{n}({\tilde{β}}_{n}^{(k - 1)})\}^{- 1} {\tilde{S}}_{n}({\tilde{β}}_{n}^{(k - 1)}, τ), \\ {\tilde{Σ}}_{n}^{(k)} &= n^{- 1} \{{\tilde{A}}_{n}({\tilde{β}}_{n}^{(k - 1)})\}^{- 1} {\hat{Γ}}_{n}({\tilde{β}}_{n}^{(k - 1)}) \{{\tilde{A}}_{n}({\tilde{β}}_{n}^{(k - 1)})\}^{- 1}. \end{aligned}$
Step 3: Repeat Step 2 until convergence. Denote the coefficient estimate and covariance estimate at convergence as ${\tilde{β}}_{n}$ and ${\tilde{Σ}}_{n}$ , respectively.
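The three steps above can be sketched in pure Python as follows. This is a simplified illustration under several stated assumptions rather than the implementation used in this article: the weights `W[i]` = Δ_i/Ĝ(T̃_i) are taken as precomputed; only the first term of (4) is used for the Γ estimate, which is exact only when there is no censoring; the starting value `beta0` is supplied by the user instead of being computed from the non-smoothed estimator; and the helper names `solve`, `pieces`, and `fit` are hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal: cdf is Phi, pdf is phi


def solve(A, B):
    """Solve A X = B (B is a p x m matrix) by Gauss-Jordan elimination."""
    p = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[p:] for row in M]


def pieces(beta, Sigma, tau, X, Z, W):
    """Return S from (5), A from (7), and the first term of (4)."""
    n, p = len(X), len(beta)
    S = [0.0] * p
    A = [[0.0] * p for _ in range(p)]
    G = [[0.0] * p for _ in range(p)]
    for x, z, w in zip(X, Z, W):
        r = x - sum(b * zj for b, zj in zip(beta, z))
        s = math.sqrt(sum(z[j] * Sigma[j][k] * z[k]
                          for j in range(p) for k in range(p)))
        c = w * _N.cdf(-r / s) - tau                   # summand weight in (5)
        a = w * _N.pdf(-r / s) / s                     # summand weight in (7)
        g = (w * (1.0 if r < 0 else 0.0) - tau) ** 2   # first term of (4) only
        for j in range(p):
            S[j] += z[j] * c / n
            for k in range(p):
                A[j][k] += a * z[j] * z[k] / n
                G[j][k] += g * z[j] * z[k] / n
    return S, A, G


def fit(beta0, tau, X, Z, W, tol=1e-5, max_iter=100):
    """Iterate Step 2 until the change in beta is below tol."""
    n, p = len(X), len(beta0)
    beta = list(beta0)
    # Step 1: Sigma^(0) = I_p / n
    Sigma = [[1.0 / n if j == k else 0.0 for k in range(p)] for j in range(p)]
    for it in range(1, max_iter + 1):
        S, A, G = pieces(beta, Sigma, tau, X, Z, W)
        step = [row[0] for row in solve(A, [[v] for v in S])]
        new_beta = [b - d for b, d in zip(beta, step)]          # Newton update
        AinvG = solve(A, G)                                     # A^{-1} Gamma
        AinvG_t = [list(col) for col in zip(*AinvG)]            # Gamma A^{-1}
        Sigma = [[v / n for v in row] for row in solve(A, AinvG_t)]
        change = math.sqrt(sum((u - v) ** 2 for u, v in zip(new_beta, beta)))
        beta = new_beta
        if change < tol:
            break
    return beta, Sigma, it
```

As a simple check, for an intercept-only model with uncensored log times placed on a regular grid over (0, 1) and all weights equal to one, one would expect `fit` to return an estimate close to τ. The sketch takes full Newton steps without any safeguard, so it may fail for poor starting values, in line with the sensitivity to initial values noted in the simulation study.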

To establish the asymptotic properties of the proposed method, we state the regularity conditions:

(A1): There exists L > 0 such that P(C = L) > ν and P(C > L) = 0 for a constant ν ∈ (0, 1).
(A2): β₀(τ) is Lipschitz continuous for τ ∈ [τ_L, τ_U] and the parameter space $B$ containing β₀(τ) is a compact subset of ℝ^p. Z is uniformly bounded, i.e., sup_i ‖Z_i‖ < ∞.
(A3): Let f₁(·) ≡ f₁(·|Z₁) denote the marginal density associated with the model error term e₁. Then, f₁(·) and $f_{1}'(\cdot)$ are bounded functions on ℝ with $\int_{ℝ} \left\{ \frac{f_{1}'(t)}{f_{1}(t)} \right\}^{2} f_{1}(t) dt < \infty$ .
(A4): The matrix $A = \lim_{n \to \infty} n^{- 1} \sum_{i = 1}^{n} Z_{i} {Z^{'}}_{i} f_{i} (0)$ exists and is nonsingular.

In the Appendix, we show that under the regularity conditions (A1)–(A4),

$\lim_{n \to \infty} \sup_{τ \in [τ_{L}, τ_{U}]} ‖ {\tilde{β}}_{n} - β_{0} ‖ \to_{p} 0$ ,
$n^{1 / 2} ({\tilde{β}}_{n}^{(k)} - β_{0}) \to_{d} N (0, Ψ)$ for any k ≥ 1.

This fact implies that the induced smoothing estimator ${\tilde{β}}_{n}$ has the same asymptotic distribution as the original estimator ${\hat{β}}_{n}$ from the non-smoothed estimating equation. At convergence, the variance estimator for Ψ can be approximated by ${\tilde{Ψ}}_{n} = \{{\tilde{A}}_{n}({\tilde{β}}_{n})\}^{- 1} {\hat{Γ}}_{n}({\tilde{β}}_{n}) \{{\tilde{A}}_{n}({\tilde{β}}_{n})\}^{- 1}$ , which is equivalent to $n {\tilde{Σ}}_{n}$ . Note that the variance estimator is obtained as a byproduct while performing the proposed iterative procedure. In addition, the proposed variance estimator is shown to be consistent. This suggests that ${\tilde{Ψ}}_{n}$ may be used as a variance estimator for the regression parameter. Our simulation study in the next section confirms that the proposed algorithm converges quickly, usually within five iterations, and that the variance estimates are fairly accurate.

3 Simulation studies

In this section, we present the results of the simulation studies carried out to assess the performance of our smoothed estimating equations with finite samples for competing risks regression quantiles. We also compare the proposed induced smoothing method with Peng and Fine (2009)’s non-smoothed estimating equation method. The simulation study involves two covariates with $Z_{i} = (1, Z_{1i}, Z_{2i})'$ , where $Z_{1i}$ is a uniform (−1, 1) variable and $Z_{2i}$ is a Bernoulli (0.5) variable. Two competing failure causes, ε_i ∈ {1, 2}, are assumed and generated according to $P(ε_{i} = 1 | Z_{i} = z_{i}) = p_{1} I(z_{2i} = 1) + p_{0} I(z_{2i} = 0)$ with (p₀, p₁) = (0.7, 0.8). We consider two quantile levels, τ = 0.2 and τ = 0.4, for which the respective survival times follow conditional distributions:

$$P(T_{i} \leq t | ε_{i} = 1, Z_{i} = z_{i}) = H_{0}(\log t - θ_{0}' z_{i}), \quad P(T_{i} \leq t | ε_{i} = 2, Z_{i} = z_{i}) = H_{0}(\log t - ϑ_{0}' z_{i}).$$

For the quantile residual subdistribution H₀(·), we consider the following four distributions: (i) standard normal N(0, 1), (ii) Logistic(0, 1), (iii) Cauchy(0, 1), and (iv) t₍₃₎, i.e., t-distribution with 3 degrees of freedom, respectively. Under this setting, the τth conditional quantile of T_i from cause 1 is

$$Q_{1i}(τ | Z_{i} = z_{i}) = \inf\{t > 0 : P(T_{i} \leq t, ε_{i} = 1 | Z_{i} = z_{i}) \geq τ\} = \exp\{β_{1}(τ) + β_{2}(τ) z_{1i} + β_{3}(τ) z_{2i}\},$$

where $β_{1} (τ) = θ_{1} + H_{0}^{- 1} (τ / p_{0})$ , β₂(τ) = θ₂, and $β_{3} (τ) = θ_{3} + H_{0}^{- 1} (τ / p_{1}) - H_{0}^{- 1} (τ / p_{0})$ . We let θ₀ = (θ₁, θ₂, θ₃)′ = (−1, 1, 1)′ and ϑ₀ = (ϑ₁, ϑ₂, ϑ₃)′ = (−1, 1, −1)′, which leads to the occurrence of events 1 and 2 in a nearly 3:1 ratio. The censoring times C_i are generated from Uniform(0, L), where L is chosen to yield approximately 20% censoring rates in all cases. On average, under this simulation setup about 55% of subjects experience event 1, 25% experience event 2, and 20% are censored.
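The true coefficient values implied by these formulas can be reproduced with a few lines of Python. The sketch below covers the normal, logistic, and Cauchy cases, for which the quantile function H₀⁻¹ is available in closed form or in the standard library; the t₍₃₎ case is omitted because it would require an additional library. The names `QUANTILE` and `true_beta` are hypothetical.

```python
import math
from statistics import NormalDist

# Quantile functions H_0^{-1} for three of the four error distributions
QUANTILE = {
    "normal": NormalDist().inv_cdf,
    "logistic": lambda u: math.log(u / (1.0 - u)),
    "cauchy": lambda u: math.tan(math.pi * (u - 0.5)),
}


def true_beta(tau, dist, theta=(-1.0, 1.0, 1.0), p0=0.7, p1=0.8):
    """Return (beta_1, beta_2, beta_3) at quantile level tau."""
    q = QUANTILE[dist]
    b1 = theta[0] + q(tau / p0)
    b2 = theta[1]
    b3 = theta[2] + q(tau / p1) - q(tau / p0)
    return b1, b2, b3
```

For instance, `true_beta(0.2, "normal")` gives values that agree to about three decimal places with the "True" column reported for the normal case at τ = 0.2 in Tables 1 and 2.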

Tables 1 and 2 summarize the simulation results from 1000 replications with sample sizes n = 300 and 500, respectively. All computing work here was done in R (version 3.4.2). The iteration procedure is stopped when either $‖ {\tilde{β}}_{n}^{(k)} - {\tilde{β}}_{n}^{(k - 1)} ‖ < 10^{- 5}$ or the maximum number of iterations, specified as 100, is reached. If starting values are properly selected, our algorithm typically converges after 4-8 iterations. For example, the mean (sd) time to convergence and the mean number of iterations per dataset are 0.12 (0.05) seconds and 5.12, respectively, when n = 300, and 0.24 (0.07) seconds and 4.56 when n = 500 (case (i), τ = 0.2). For each simulation setting, we report the mean bias (Bias), averages of the standard error estimates (ASE), empirical standard deviations (ESE), and empirical coverage probabilities (Cov) of the cause-1 quantile regression coefficients β_k(τ), k = 1, 2, 3. We also compute the MSEs for each method and present the relative efficiency, defined as RE = MSE₁/MSE₂, where MSE₁ and MSE₂ correspond to the non-smoothed estimator and proposed induced smoothing estimator, respectively. We used the dfsane function in library(BB) to solve the non-smoothed estimating function (3). To approximate Γ for Peng and Fine (2009)’s approach, we used the resampling method by Zeng and Lin (2008) with 300 bootstrap samples. Since this approach does not require solving estimating functions for bootstrapped samples, it is fairly fast and comparable to the proposed method in computing time. In our simulation experiment, for example, the induced smoothing method took about 190 seconds and the non-smoothed method required about 270 seconds to complete 1000 simulation runs (case (i), τ = 0.2, and n = 300) on our machine (Intel i5-6200U CPU, 2.30GHz).

Table 1.

Simulation results with n = 300 from the smoothed and non-smoothed estimating equations for competing risks regression quantiles. The marginal residual error for the failure from cause 1 follows the (i) normal, (ii) logistic, (iii) Cauchy, and (iv) t₍₃₎ distributions, respectively.

Dist	τ	Par	True	Smoothed equation				Non-smoothed equation				RE
Dist	τ	Par	True	Bias	ASE	ESE	Cov	Bias	ASE	ESE	Cov	RE
Normal	0.2	β₁	−1.566	−0.002	0.130	0.133	0.928	0.004	0.165	0.139	0.942	1.099
		β₂	1.000	−0.001	0.156	0.158	0.931	0.001	0.187	0.167	0.936	1.115
		β₃	0.891	−0.006	0.182	0.183	0.940	−0.005	0.225	0.193	0.942	1.112
	0.4	β₁	−0.820	0.012	0.144	0.148	0.935	0.008	0.173	0.153	0.938	1.062
		β₂	1.000	0.010	0.173	0.178	0.936	0.009	0.207	0.186	0.931	1.090
		β₃	0.820	−0.008	0.198	0.201	0.937	−0.003	0.232	0.209	0.937	1.076

Logistic	0.2	β₁	−1.916	−0.012	0.216	0.220	0.935	0.001	0.263	0.228	0.934	1.073
		β₂	1.000	−0.007	0.257	0.261	0.933	−0.008	0.308	0.273	0.938	1.089
		β₃	0.817	−0.011	0.302	0.305	0.940	−0.007	0.356	0.315	0.941	1.065
	0.4	β₁	−0.712	0.018	0.229	0.236	0.940	0.014	0.266	0.242	0.940	1.052
		β₂	1.000	0.007	0.263	0.270	0.936	0.004	0.310	0.283	0.924	1.098
		β₃	0.712	−0.019	0.307	0.312	0.936	−0.014	0.343	0.320	0.921	1.048

Cauchy	0.2	β₁	−1.797	−0.041	0.235	0.248	0.928	−0.025	0.289	0.244	0.931	0.934
		β₂	1.000	−0.010	0.291	0.306	0.930	−0.013	0.351	0.310	0.946	1.022
		β₃	0.797	−0.017	0.345	0.361	0.942	−0.010	0.407	0.355	0.946	0.964
	0.4	β₁	−0.772	0.023	0.195	0.203	0.952	0.018	0.229	0.202	0.945	0.979
		β₂	1.000	0.005	0.220	0.228	0.939	0.004	0.254	0.230	0.936	1.016
		β₃	0.772	−0.023	0.257	0.262	0.942	−0.016	0.290	0.262	0.945	0.995

t₍₃₎	0.2	β₁	−1.633	−0.014	0.155	0.159	0.932	−0.003	0.192	0.164	0.943	1.056
		β₂	1.000	−0.005	0.187	0.192	0.931	−0.005	0.226	0.201	0.933	1.087
		β₃	0.869	−0.009	0.220	0.222	0.940	−0.006	0.263	0.228	0.940	1.051
	0.4	β₁	−0.804	0.016	0.159	0.157	0.959	0.014	0.184	0.162	0.933	1.050
		β₂	1.000	0.015	0.184	0.177	0.936	0.016	0.218	0.179	0.944	1.027
		β₃	0.804	−0.007	0.214	0.211	0.951	−0.001	0.242	0.216	0.936	1.051

Note: ASE, average of standard error estimates; ESE, empirical standard error; Cov, coverage probability of the 95% Wald-type confidence intervals; RE, relative efficiency of the mean squared errors (MSEs) for the non-smoothed estimator over the induced smoothing estimator. t₍₃₎ represents t-distribution with 3 degrees of freedom.

Table 2.

Simulation results with n = 500 from the smoothed and non-smoothed estimating equations for competing risks regression quantiles. The marginal residual error for the failure from cause 1 follows the (i) normal, (ii) logistic, (iii) Cauchy, and (iv) t₍₃₎ distributions, respectively.

Dist	τ	Par	True	Smoothed equation				Non-smoothed equation				RE
Dist	τ	Par	True	Bias	ASE	ESE	Cov	Bias	ASE	ESE	Cov	RE
Normal	0.2	β₁	−1.566	−0.004	0.101	0.105	0.928	0.001	0.123	0.110	0.932	1.089
		β₂	1.000	0.005	0.123	0.119	0.943	0.006	0.147	0.126	0.940	1.108
		β₃	0.891	0.003	0.143	0.148	0.926	0.004	0.172	0.154	0.929	1.088
	0.4	β₁	−0.820	0.002	0.112	0.111	0.941	0.001	0.131	0.115	0.934	1.064
		β₂	1.000	0.010	0.139	0.145	0.934	0.010	0.163	0.151	0.936	1.081
		β₃	0.820	0.006	0.158	0.158	0.943	0.007	0.181	0.164	0.931	1.075

Logistic	0.2	β₁	−1.916	−0.008	0.167	0.173	0.932	0.001	0.193	0.179	0.927	1.069
		β₂	1.000	0.009	0.200	0.194	0.942	0.009	0.233	0.204	0.933	1.110
		β₃	0.818	0.001	0.234	0.242	0.934	0.001	0.266	0.250	0.928	1.067
	0.4	β₁	−0.712	0.003	0.175	0.174	0.944	−0.002	0.199	0.177	0.942	1.045
		β₂	1.000	0.008	0.202	0.211	0.940	0.009	0.231	0.218	0.923	1.068
		β₃	0.712	−0.002	0.236	0.232	0.947	0.004	0.262	0.239	0.942	1.053

Cauchy	0.2	β₁	−1.797	−0.042	0.181	0.188	0.934	−0.019	0.214	0.190	0.943	0.985
		β₂	1.000	0.011	0.226	0.221	0.950	0.009	0.267	0.231	0.947	1.091
		β₃	0.797	0.001	0.266	0.278	0.942	0.005	0.304	0.279	0.938	1.008
	0.4	β₁	−0.772	0.012	0.146	0.139	0.956	0.007	0.165	0.140	0.953	1.011
		β₂	1.000	0.010	0.166	0.165	0.956	0.010	0.187	0.170	0.939	1.063
		β₃	0.772	−0.009	0.194	0.191	0.943	−0.004	0.214	0.193	0.929	1.011

t₍₃₎	0.2	β₁	−1.633	−0.010	0.120	0.124	0.938	−0.003	0.141	0.128	0.934	1.070
		β₂	1.000	0.007	0.147	0.140	0.946	0.007	0.171	0.148	0.937	1.112
		β₃	0.869	−0.002	0.171	0.175	0.934	0.001	0.196	0.181	0.930	1.075
	0.4	β₁	−0.804	0.006	0.121	0.114	0.955	0.002	0.136	0.115	0.952	1.026
		β₂	1.000	0.012	0.140	0.139	0.942	0.010	0.164	0.144	0.939	1.067
		β₃	0.804	−0.001	0.164	0.160	0.938	0.003	0.181	0.166	0.932	1.071

Note: See the note in Table 1.

In all scenarios, both the induced smoothing and the non-smoothed estimators are virtually unbiased, the standard error estimators appear to reasonably reflect the true variations, and the confidence intervals have reasonable coverage probabilities. In theory, the non-smoothed estimator is asymptotically equivalent to the proposed induced smoothing estimator. With finite sample sizes, however, the proposed estimators are slightly more efficient than the non-smoothed estimators in nearly all scenarios, even though there are some fluctuations in relative efficiency between the two methods. It is also observed that ASEs from the proposed smoothing method are fairly close to ESEs, while ASEs from the non-smoothed method are 10-30% higher than ESEs, resulting in noticeably inflated standard errors. Nonetheless, Peng and Fine’s approach often leads to coverage probabilities below the nominal level, especially when n = 300. This is partly because their variance estimation method has relatively large variation. For example, $\hat{se} (ASE)$ for ${\hat{β}}_{2}$ is about 0.03 for the smoothed method, while it is 0.08 for the non-smoothed method, which might lead to a higher chance of missing the true parameters when computing coverage probabilities for the latter.

Our finding from the simulation study also reveals that the induced smoothing method is relatively more sensitive to the initial values, especially when τ is extreme (too small or too large). In that case, the non-smoothed estimator may be used for the initial input for the induced smoothing method, as we suggested in Section 2.3.

4 Analysis of soft tissue sarcoma data

We illustrate the application of the proposed method by using data from a study of the effects of chemotherapy on the survival of soft tissue sarcoma patients. For these patients, the primary treatment is surgical resection. Chemotherapy is often used as an adjuvant therapy for treating sarcomas. However, the utility of adjuvant chemotherapy remains uncertain in the treatment of sarcoma patients and the results have been conflicting in several studies. To assess its effect, Cormier et al. (2004) retrospectively identified and analyzed a cohort of 679 patients from two major cancer centers in the United States. In their initial treatments, all patients received definitive surgical resection. Among the 679 patients, 228 received adjuvant radiation alone, 109 received adjuvant chemotherapy alone, 207 received both, and 135 received none of these treatments. Of the 316 patients treated with adjuvant chemotherapy, 148 (46.8%) died from sarcomas, and of the 363 patients not treated with adjuvant chemotherapy, 140 (38.6%) died from sarcomas. Medical records were examined retrospectively to verify the known prognostic factors, including tumor size and location, treatment sequences, pathologic margin status, and survival outcomes.

The primary objective of this analysis was to evaluate the impact of chemotherapy while accounting for the known prognostic variables. However, the analysis of this dataset is complicated by the existence of competing events and a considerable proportion of survivors: 288 (42.4%) patients died from sarcoma, 65 (9.6%) died from other competing causes, and 326 (48.0%) survived throughout the follow-up period of the study. Furthermore, the non-parametric subdistribution functions for sarcoma-related death, shown in Figure 1, cross at about two years after surgery when patients who received adjuvant chemotherapy are compared with those who did not, indicating that the effect of adjuvant chemotherapy changes over time. In this situation, analysis using the proportional subdistribution hazards model (Fine and Gray, 1999) could be misleading, as it cannot accommodate crossing subdistribution functions. The mixture approach (Choi and Huang, 2015) can fit such a data pattern, but its results are more complex to interpret. Alternatively, we applied the quantile regression model, which can evaluate the conditional treatment effects across a set of different quantiles.

Figure 1. Non-parametric cumulative incidence (subdistribution) functions for (a) sarcoma-related death and (b) death from other competing causes, for patients who received local therapy only compared with those who received local therapy plus adjuvant chemotherapy.
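Non-parametric cumulative incidence functions of the kind plotted in Figure 1 can be computed as in the following minimal Python sketch. It uses the standard Aalen–Johansen-type estimator, in which the overall Kaplan–Meier survival just before each event time is multiplied by the cause-specific event proportion. The function names `cumulative_incidence` and `cif_quantile` and the toy data are hypothetical and are not taken from the sarcoma study; cause code 0 is assumed to denote censoring.

```python
def cumulative_incidence(times, causes, cause=1):
    """Return [(t, F_cause(t))] at each distinct event time.

    causes: 0 = censored, 1, 2, ... = type of the observed event.
    """
    event_times = sorted({t for t, c in zip(times, causes) if c != 0})
    surv = 1.0   # overall Kaplan-Meier survival just before the current time
    cif = 0.0    # cumulative incidence of the cause of interest
    curve = []
    for t in event_times:
        at_risk = sum(1 for s in times if s >= t)
        d_all = sum(1 for s, c in zip(times, causes) if s == t and c != 0)
        d_k = sum(1 for s, c in zip(times, causes) if s == t and c == cause)
        cif += surv * d_k / at_risk       # increment uses survival before t
        surv *= 1.0 - d_all / at_risk     # then update overall survival
        curve.append((t, cif))
    return curve

def cif_quantile(curve, tau):
    """Smallest event time at which the cumulative incidence reaches tau.

    Returns None when the curve never reaches tau.
    """
    for t, f in curve:
        if f >= tau:
            return t
    return None

# Toy data: five subjects, two events of cause 1, one of cause 2, two censored.
times = [1, 2, 3, 4, 5]
causes = [1, 2, 0, 1, 0]
curve1 = cumulative_incidence(times, causes, cause=1)
curve2 = cumulative_incidence(times, causes, cause=2)
print(curve1, cif_quantile(curve1, 0.4), cif_quantile(curve1, 0.6))
```

In this toy example the cumulative incidence of cause 1 plateaus below 0.6, so `cif_quantile` returns `None` for τ = 0.6. This illustrates why upper quantiles of a cumulative incidence function may not be identifiable when competing events and censoring are present.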

We begin with the univariate quantile analysis. Figure 2 plots the results from the smoothed quantile regression for the sarcoma endpoint, estimating the effects of chemotherapy (yes or no), radiation (yes or no), tumor size (≥ 10 or not), and age (≥ 60 or not), respectively. The estimated regression coefficients are plotted in bold solid lines and the 95% pointwise confidence intervals are in dotted lines at an equally spaced τ-grid for sarcoma-specific cumulative incidence. In Figure 2, we also provide the regression estimates from the non-smoothed estimating function in dashed lines. The estimated effect of chemotherapy on the time to sarcoma-related death is positive (beneficial) for roughly τ < 0.2, and statistically significant for τ ≤ 0.1, but negative for τ ≥ 0.2. This observation, coupled with the 95% confidence intervals, indicates that the effect of chemotherapy is significant only for patients at lower quantiles and becomes insignificant for higher quantiles. In contrast, patients who received radiation died later than those who did not; the associated coefficient estimates are positive for all the observed quantiles. Tumor size has an adverse effect on patients’ survival times, and this adverse effect tends to be larger as τ grows. Age ≥ 60 also seems to have a negative effect on survival, and the associated pointwise confidence intervals lie below zero for most of the observed quantiles.

Figure 2. Estimated quantile regression coefficients for death from sarcoma: the regression coefficient estimates from the smoothed estimating function (solid line), along with the 95% pointwise confidence intervals (dotted lines), and the regression coefficient estimates from the non-smoothed estimating function (dashed line).

Next, we fit multiple quantile regression models for competing risks. Table 3 summarizes the quantile coefficient estimates, corresponding estimated standard errors, 95% confidence intervals, and p-values obtained from the proposed induced smoothing method and Peng and Fine’s (2009) non-smoothed method. For Peng and Fine’s (2009) method, we make inferences with 500 bootstrap replicates. We include chemotherapy, radiation, tumor size, age, upper (vs. lower) limb, and proximal site (yes or no) as the covariates, and let τ ∈ {0.1, 0.2, 0.3, 0.4}. Estimation and inference are challenging for higher quantiles, for which the quantile functions may not be identifiable in the presence of competing risks. Overall, the two methods provide similar coefficient estimates, while the induced smoothing estimators have slightly larger estimated standard errors and consequently larger p-values. By studying different quantiles, our analysis reveals a richer picture. As in the univariate case, chemotherapy initially has a beneficial effect on patients’ survival, which quickly becomes adverse, though not significantly so, for higher quantiles. Radiation is consistently beneficial across quantiles, and tumor size is another important factor when treating sarcoma patients. Proximal site appears to be significant in some cases. Such information may not be detected by the proportional subdistribution hazards model. Hence, quantile-based varying coefficients may provide useful insights into the association between the progression of sarcoma and potential risk factors.

Table 3.

Results from the soft tissue sarcoma data, showing the estimated quantile coefficients (Est), associated standard errors (SE), lower bound (LB) and upper bound (UB) of the 95% Wald-type confidence intervals, and p-values for the quantiles of τ = 0.1, 0.2, 0.3, and 0.4. Covariates include chemotherapy (yes or no), radiation (yes or no), tumor size (≥ 10 or not), age (≥ 60 or not), upper limb (upper vs. lower limb), and proximal site (yes or no)

τ	Effect	Induced smoothing					Peng and Fine (2009)
τ	Effect	Est	SE	LB	UB	p-value	Est	SE	LB	UB	p-value
0.1	Intercept	0.076	0.245	−0.405	0.557	0.756	0.058	0.104	−0.146	0.261	0.579
	Chemotherapy	0.249	0.138	−0.022	0.520	0.072	0.280	0.108	0.067	0.492	0.010
	Radiation	0.444	0.142	0.166	0.723	0.002	0.326	0.113	0.104	0.547	0.004
	Tumor size	−0.318	0.125	−0.564	−0.073	0.011	−0.231	0.096	−0.418	−0.043	0.016
	Age	−0.319	0.139	−0.591	−0.046	0.022	−0.208	0.112	−0.428	0.011	0.063
	Upper limb	0.091	0.174	−0.251	0.432	0.603	0.098	0.101	−0.100	0.296	0.332
	Proximal site	−0.072	0.185	−0.435	0.291	0.698	−0.070	0.102	−0.269	0.130	0.494

0.2	Intercept	1.138	0.225	0.697	1.578	< 0.001	1.042	0.119	0.809	1.276	< 0.001
	Chemotherapy	−0.076	0.147	−0.365	0.212	0.604	−0.091	0.124	−0.334	0.152	0.464
	Radiation	0.418	0.136	0.152	0.684	0.002	0.383	0.112	0.164	0.603	0.001
	Tumor size	−0.550	0.159	−0.862	−0.238	0.001	−0.497	0.112	−0.717	−0.277	< 0.001
	Age	−0.311	0.146	−0.597	−0.025	0.033	−0.309	0.116	−0.537	−0.081	0.008
	Upper limb	0.003	0.217	−0.423	0.429	0.989	−0.004	0.147	−0.292	0.284	0.977
	Proximal site	−0.295	0.214	−0.715	0.124	0.167	−0.213	0.127	−0.462	0.037	0.094

0.3	Intercept	1.927	0.315	1.310	2.544	< 0.001	1.785	0.216	1.361	2.208	< 0.001
	Chemotherapy	−0.242	0.212	−0.658	0.173	0.253	−0.183	0.173	−0.523	0.157	0.292
	Radiation	0.578	0.214	0.158	0.997	0.007	0.466	0.196	0.081	0.851	0.018
	Tumor size	−0.722	0.218	−1.149	−0.294	0.001	−0.678	0.194	−1.059	−0.297	< 0.001
	Age	−0.247	0.198	−0.636	0.142	0.214	−0.170	0.167	−0.498	0.157	0.307
	Upper limb	0.415	0.537	−0.638	1.467	0.440	0.173	0.563	−0.931	1.277	0.759
	Proximal site	−0.548	0.332	−1.198	0.102	0.098	−0.426	0.214	−0.845	−0.007	0.046

0.4	Intercept	2.448	0.505	1.459	3.437	< 0.001	2.388	0.317	1.866	3.110	< 0.001
	Chemotherapy	−0.411	0.243	−0.888	0.065	0.091	−0.391	0.175	−0.733	−0.049	0.025
	Radiation	0.719	0.236	0.256	1.182	0.002	0.698	0.223	0.262	1.134	0.002
	Tumor size	−0.868	0.249	−1.356	−0.380	< 0.001	−0.811	0.226	−1.254	−0.368	< 0.001
	Age	−0.306	0.208	−0.715	0.102	0.142	−0.256	0.160	−0.568	0.057	0.109
	Upper limb	0.922	0.838	−0.720	2.565	0.271	0.578	0.716	−0.825	1.980	0.420
	Proximal site	−0.753	0.480	−1.693	0.187	0.116	−0.506	0.303	−1.100	0.087	0.095
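The Wald-type intervals and p-values reported in Table 3 follow from each estimate and its standard error. The following minimal Python sketch (the function name `wald_summary` is hypothetical) recomputes them for the radiation effect at τ = 0.2 under induced smoothing, using the rounded values Est = 0.418 and SE = 0.136 from the table. Because the inputs are rounded to three decimals, the output should agree with the tabulated bounds only up to rounding error in the third decimal place.

```python
import math

def wald_summary(est, se, z=1.959964):
    """95% Wald-type confidence interval and two-sided p-value."""
    lower = est - z * se
    upper = est + z * se
    # Two-sided normal p-value: 2 * (1 - Phi(|est / se|)) = erfc(|est / se| / sqrt(2)).
    p_value = math.erfc(abs(est / se) / math.sqrt(2.0))
    return lower, upper, p_value

# Radiation, tau = 0.2, induced smoothing column of Table 3.
lb, ub, p = wald_summary(0.418, 0.136)
print(round(lb, 3), round(ub, 3), round(p, 3))
```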


5 Conclusion and remarks

In this study, we consider an induced kernel smoothing technique for making inferences on competing risks regression quantiles. The proposed method can be easily and effectively implemented by using standard algorithms such as Newton–Raphson, without relying on specialized statistical software. The inference procedure developed in this work is particularly useful for estimating the variance of quantile regression parameters. Our numerical studies demonstrate that the smoothed estimating approach performs well with practical sample sizes and that variance estimation follows from basic asymptotic normality arguments. On the other hand, variance estimation for Peng and Fine’s (2009) method does not seem to be justified by asymptotic theory unless the sample size is large enough, necessitating extra inferential steps such as bootstrapping. Our approach therefore has the potential to be more computationally efficient than Peng and Fine’s (2009) method. However, our experience reveals that the induced smoothing estimators are relatively more sensitive to the initial values if τ is close to the boundary, in which case it is helpful to take Peng and Fine’s (2009) estimator as the initial value.
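To make the Newton–Raphson remark concrete, the following Python sketch solves a smoothed estimating equation of the general form suggested by the Appendix, $n^{-1} \sum_{i} Z_{i} [w_{i} Φ\{(β' Z_{i} - X_{i}) / h_{i}\} - τ] = 0$. It is an illustration under simplifying assumptions, not the authors' implementation: the data are simulated without censoring or competing events, so the weights $w_{i}$ (which would be $Δ_{i} / \hat{G} ({\tilde{T}}_{i})$ in the competing risks setting) are all set to one; the bandwidth $h_{i} = \sqrt{Z_{i}' Z_{i} / n}$ is an assumed choice; and step-halving is added as a safeguard. All function and variable names are hypothetical, and the exact form of the estimating function should be checked against Section 2.3.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)

def smoothed_score(beta, X, Z, w, tau):
    """Smoothed estimating function and its Jacobian with respect to beta."""
    n = X.shape[0]
    h = np.sqrt(np.sum(Z**2, axis=1) / n)   # assumed bandwidth h_i
    u = (Z @ beta - X) / h
    score = Z.T @ (w * norm_cdf(u) - tau) / n
    jac = (Z * (w * norm_pdf(u) / h)[:, None]).T @ Z / n
    return score, jac

def newton_raphson(beta0, X, Z, w, tau, max_iter=50, tol=1e-8):
    """Newton-Raphson with step-halving on the norm of the score."""
    beta = np.asarray(beta0, dtype=float)
    score, jac = smoothed_score(beta, X, Z, w, tau)
    for _ in range(max_iter):
        if np.linalg.norm(score) < tol:
            break
        step = np.linalg.solve(jac, score)
        t, improved = 1.0, False
        while t > 1e-4 and not improved:
            cand = beta - t * step
            s_new, j_new = smoothed_score(cand, X, Z, w, tau)
            if np.linalg.norm(s_new) < np.linalg.norm(score):
                beta, score, jac, improved = cand, s_new, j_new, True
            t /= 2.0
        if not improved:   # no step reduces the score norm: stop
            break
    return beta

# Simulated log event times: the conditional median is 1.0 + 0.5 * z.
rng = np.random.default_rng(1)
n = 500
Z = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true = np.array([1.0, 0.5])
X = Z @ beta_true + rng.standard_normal(n)
w = np.ones(n)                                     # no censoring in this toy example
beta_init = np.linalg.lstsq(Z, X, rcond=None)[0]   # least-squares starting value
beta_hat = newton_raphson(beta_init, X, Z, w, tau=0.5)
print(beta_hat)
```

The least-squares starting value is used here only for convenience. The text above recommends starting from the non-smoothed estimator when τ is close to the boundary.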

The conditional competing risks quantile regression approach not only provides meaningful interpretations by using cumulative incidence quantiles but also extends the AFT model for competing risks by relaxing some of the stringent model assumptions (e.g., global linearity and unconditional independence) required by many existing procedures. In the sarcoma data example, quantile regression proved useful for describing the time-varying effect of adjuvant chemotherapy on the cause-specific probability of death from sarcoma and the resulting crossing of survival rates. Conventional Cox or AFT models yield an overall mean-based assessment of the covariate effects on survival for the entire follow-up period, and thus cannot distinguish survival differences that may occur at a specific stage of disease progression. Further, these approaches have low statistical power when survival rates cross. This led Cormier et al. (2004) to divide patients into two groups, namely before and after the survival crossing, to fit the Cox models. However, splitting the dataset in this manner could introduce additional selection biases and is not recommended. In this setting, quantile regression analysis offers substantial flexibility in accommodating varying covariate effects, as it can directly model a quantile of interest and provide a global picture of the relationship between the covariates and patients’ survival.

Supplementary Material

Supp DataS1: NIHMS970979-supplement-Supp_DataS1.RData (41.6KB, RData)

Supp TableS1: NIHMS970979-supplement-Supp_TableS1.R (4.2KB, R)

Supp TableS2: NIHMS970979-supplement-Supp_TableS2.R (4.2KB, R)

Supp figS1: NIHMS970979-supplement-Supp_figS1.pdf (5.9KB, pdf)

Supp figS2: NIHMS970979-supplement-Supp_figS2.pdf (8.4KB, pdf)

Supp info1: NIHMS970979-supplement-Supp_info1.Rhistory (10.3KB, Rhistory)

supp info2: NIHMS970979-supplement-supp_info2.R (9.7KB, R)

supp info3: NIHMS970979-supplement-supp_info3.R (5.8KB, R)

Acknowledgments

Dr. Choi was supported by grants from Korea University (K1607341) and the National Research Foundation (NRF) of Korea (2017R1C1B1004817). Dr. Kang was supported by a grant from the NRF of Korea (2017R1A2B4005818). The research of Dr. Huang was supported in part by a USA NSF grant (DMS 1612965) and NIH grants (U54 CA096300, U01 CA152958 and 5P50 CA100632).

Appendix

In this section, we prove the consistency and asymptotic normality of the proposed estimator ${\tilde{β}}_{n}$ and the consistency of the variance estimator. By the consistency results of Peng and Fine (2009), S_n(β, τ) converges uniformly in probability to a continuous and deterministic function of β that has a unique zero at β₀ ∈ ℝ^p and is bounded in a neighborhood of β₀. Thus, it is sufficient to prove that ${\tilde{S}}_{n} (β, τ) - S_{n} (β, τ)$ converges to 0 uniformly in probability for β in a compact neighborhood of β₀ as n → ∞.

Define $ξ_{i} (β) = Φ \{- \sqrt{n} η_{i} (β) / σ_{i}\} - I \{η_{i} (β) < 0\}$, where η_i(β) = X_i − β′Z_i and $σ_{i} = \sqrt{Z_{i}' Σ Z_{i}}$. By manipulating the martingale representation of the Kaplan–Meier estimator (Fleming and Harrington, 1991), we have

$$\frac{\hat{G} (t) - G (t)}{G (t)} = - \sum_{i = 1}^{n} \int_{0}^{t} \left\{ \frac{\hat{G} (u -)}{G (u)} \right\} \frac{d M_{i}^{c} (u)}{Y (u)},$$

which, along with the uniform convergence of Ĝ(·) to G(·), shows that

$$n^{1 / 2} \{ {\tilde{S}}_{n} (β, τ) - S_{n} (β, τ) \} = n^{- 1 / 2} \sum_{i = 1}^{n} \frac{Z_{i} Δ_{i} ξ_{i} (β)}{\hat{G} ({\tilde{T}}_{i})} = n^{- 1 / 2} \sum_{i = 1}^{n} \frac{Z_{i} Δ_{i} ξ_{i} (β)}{G ({\tilde{T}}_{i})} - n^{- 1 / 2} \sum_{i = 1}^{n} \int_{0}^{L} \frac{d M_{i}^{c} (u)}{Y (u)} B (β, u) + o_{p} (1), \tag{8}$$

where $B (β, u) = \lim_{n \to \infty} n^{- 1} \sum_{i = 1}^{n} \{ Δ_{i} Z_{i} Y_{i} (u) ξ_{i} (β) \} / G ({\tilde{T}}_{i})$. According to the martingale central limit theorem, it can be checked that the second term on the right-hand side of (8) is dominated by a bounded function and will vanish uniformly in probability for β in the neighborhood of β₀. Let us note that for u ∈ ℝ,

$$| u \{ Φ (- \sqrt{n} u) - I (u < 0) \} | = \mathrm{sign} (u) \{ u Φ (- \sqrt{n} | u |) \},$$

where sign(u) = 2I(u ≥ 0) − 1, and hence

$$\lim_{n \to \infty} \sup_{u \in ℝ} | u \{ Φ (- \sqrt{n} u) - I (u < 0) \} | = 0.$$
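This limit can be checked with the standard Gaussian tail bound Φ(−v) ≤ ϕ(v)/v for v > 0. The short verification below is a supplementary sketch rather than part of the original argument; it uses the identity |u{Φ(−√n u) − I(u < 0)}| = |u| Φ(−√n |u|) and the substitution v = √n |u|:

$$\sup_{u \in ℝ} | u | Φ (- \sqrt{n} | u |) = n^{- 1 / 2} \sup_{v \geq 0} v Φ (- v) \leq n^{- 1 / 2} \sup_{v \geq 0} ϕ (v) = (2 π n)^{- 1 / 2} \to 0 \quad \text{as } n \to \infty .$$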

By applying this result to (8) and using the boundedness of Z_i and ‖Σ‖ = O(n⁻¹), it follows that the first term of (8) will also converge in probability to 0 uniformly for β. Therefore, $‖ {\tilde{S}}_{n} (β, τ) - S_{n} (β, τ) ‖ \to 0$ as n → ∞.

To show that $n^{1 / 2} ({\hat{β}}_{n} - β_{0})$ and $n^{1 / 2} ({\tilde{β}}_{n} - β_{0})$ converge to the same asymptotic distribution, it suffices to establish the following two convergence results: as n → ∞, (i) $n^{1 / 2} ‖ {\tilde{S}}_{n} (β_{0}, τ) - S_{n} (β_{0}, τ) ‖ \to 0$ and (ii) $‖ \nabla_{β} {\tilde{S}}_{n} (β_{0}, τ) - A ‖ \to 0$. Note that (i) follows from the previous argument; we show (ii) below. Let us write

$${\tilde{A}}_{n} (β_{0}) = \nabla_{β} {\tilde{S}}_{n} (β_{0}, τ) = n^{- 1} \sum_{i = 1}^{n} Z_{i} Z_{i}' \frac{Δ_{i}}{\hat{G} ({\tilde{T}}_{i})} φ_{i} (β_{0}),$$

where $φ_{i} (β_{0}) = (\sqrt{n} / σ_{i}) ϕ \{- \sqrt{n} η_{i} (β_{0}) / σ_{i}\}$. For any vectors a, b ∈ ℝ^p,

$$E [a' {\tilde{A}}_{n} (β_{0}) b] = a' E \left[ n^{- 1} \sum_{i = 1}^{n} Z_{i} Z_{i}' \frac{Δ_{i}}{G ({\tilde{T}}_{i})} φ_{i} (β_{0}) \right] b = a' \left[ n^{- 1} \sum_{i = 1}^{n} Z_{i} Z_{i}' E \{ φ_{i} (β_{0}) \} \right] b,$$

and by integration by parts

$$E \{ φ_{i} (β_{0}) \} = \int_{- \infty}^{\infty} \frac{\sqrt{n}}{σ_{i}} ϕ \left( - \frac{\sqrt{n} η_{i}}{σ_{i}} \right) f_{1 i} (η_{i}) \, d η_{i} = f_{1 i} (0) + \int_{- \infty}^{\infty} \sqrt{n} η_{i} ϕ (- \sqrt{n} η_{i}) f_{1 i}' (σ_{i} η_{i}^{*}) \, d η_{i},$$

where η_i ≡ η_i(β₀) for brevity and $η_{i}^{*}$ is some point between 0 and η_i. Since $\lim_{n \to \infty} \sup_{u \in ℝ} | u ϕ (- \sqrt{n} u) | = 0$ and from Assumption (A3), we find that E{φ_i(β₀)} → f_i(0) as n → ∞, and thus

$$\lim_{n \to \infty} a' {\tilde{A}}_{n} (β_{0}) b = a' \left\{ \lim_{n \to \infty} n^{- 1} \sum_{i = 1}^{n} Z_{i} Z_{i}' f_{i} (0) \right\} b = a' A b . \tag{9}$$

Following arguments similar to those in Pang, Lu, and Wang (2012), we can also show that

$$\mathrm{var} \left[ a' \left\{ n^{- 1} \sum_{i = 1}^{n} Z_{i} Z_{i}' \frac{Δ_{i}}{G ({\tilde{T}}_{i})} φ_{i} (β_{0}) \right\} b \right] \to 0.$$

This result, coupled with (9), implies Ã_n(β₀) →_p A, and hence ${\tilde{A}}_{n} ({\tilde{β}}_{n}) \to_{p} A$ by the consistency of ${\tilde{β}}_{n}$ for β₀. It then follows from the asymptotic normality of ${\hat{β}}_{n}$ (Peng and Fine, 2009) that $n^{1 / 2} ({\tilde{β}}_{n} - β_{0}) \to_{d} N (0, Ψ)$. In addition, ${\tilde{Ψ}}_{n} ({\tilde{β}}_{n}) = \{ {\tilde{A}}_{n} ({\tilde{β}}_{n}) \}^{- 1} {\hat{Γ}}_{n} ({\tilde{β}}_{n}) \{ {\tilde{A}}_{n} ({\tilde{β}}_{n}) \}^{- 1}$ converges in probability to Ψ = A⁻¹ΓA⁻¹, since ${\hat{Γ}}_{n} ({\tilde{β}}_{n}) \to_{p} Γ$ as n → ∞.

Footnotes

Conflicts of Interest

The authors declare no conflicts of interest.

References

Bang H, Tsiatis AA. Median regression with censored cost data. Biometrics. 2002;58:643–649. doi: 10.1111/j.0006-341x.2002.00643.x.
Brown BM, Wang YG. Standard errors and covariance matrices for smoothed rank estimators. Biometrika. 2005;92:149–158.
Brown BM, Wang YG. Induced smoothing for rank regression with censored survival times. Statistics in Medicine. 2006;26:828–836. doi: 10.1002/sim.2576.
Chiou SH, Kang S, Yan J. Rank-based estimating equations with general weight for accelerated failure time models: An induced smoothing approach. Statistics in Medicine. 2015;34:1495–1510. doi: 10.1002/sim.6415.
Chiou SH, Kang S, Yan J. Semiparametric accelerated failure time modeling for clustered failure times from stratified sampling. Journal of American Statistical Association. 2015;110:621–629.
Choi S, Huang X. Efficient semiparametric mixture inferences on cure rate models for competing risks. Canadian Journal of Statistics. 2015;43:420–435.
Cormier JN, Huang X, Xing Y, Thall PF, Wang X, Benjamin RS, Pollock RE, Antonescu CR, Maki RG, Brennan MF, Pisters PWT. Cohort analysis of patients with localized high-risk extremity soft tissue sarcoma treated at two cancer centers: Chemotherapy-associated outcomes. Journal of Clinical Oncology. 2004;22:4567–4574. doi: 10.1200/JCO.2004.02.057.
Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. Journal of American Statistical Association. 1999;94:496–509.
Fleming RT, Harrington PD. Counting Processes and Survival Analysis. Wiley; New Jersey: 1991.
Huang Y. Calibration regression of censored lifetime medical cost. Journal of American Statistical Association. 2002;97:318–327.
Gray RJ. A class of K-sampling tests for comparing the cumulative incidence of a competing risk. The Annals of Statistics. 1988;16:1141–1154.
Johnson LM, Strawderman RL. Induced smoothing for the semiparametric accelerated failure time model: Asymptotics and extensions to clustered data. Biometrika. 2009;96:577–590. doi: 10.1093/biomet/asp025.
Koenker R, Bassett G. Regression quantiles. Econometrica. 1978;46:33–50.
Li H, Zhang J, Tang Y. Induced smoothing for the semiparametric accelerated hazards model. Computational Statistics and Data Analysis. 2012;56:4312–4319. doi: 10.1016/j.csda.2012.04.001.
Li R, Peng L. Quantile regression for left-truncated semicompeting risks data. Biometrics. 2011;67:701–710. doi: 10.1111/j.1541-0420.2010.01521.x.
Li R, Peng L. Quantile regression adjusting for dependent censoring from semicompeting risks. Journal of Royal Statistical Society, Series B. 2015;77:107–130. doi: 10.1111/rssb.12063.
Lin DY. Non-parametric inference for cumulative incidence functions in competing risks studies. Statistics in Medicine. 1997;16:901–910. doi: 10.1002/(sici)1097-0258(19970430)16:8<901::aid-sim543>3.0.co;2-m.
Pang L, Lu W, Wang HJ. Variance estimation in censored quantile regression via induced smoothing. Computational Statistics and Data Analysis. 2012;56:785–796. doi: 10.1016/j.csda.2010.10.018.
Peng L, Huang Y. Survival analysis with quantile regression models. Journal of American Statistical Association. 2008;103:637–649.
Peng L, Fine JP. Nonparametric quantile inference with competing risks. Biometrika. 2007;94:735–744.
Peng L, Fine JP. Competing risks quantile regression. Journal of American Statistical Association. 2009;104:1440–1453.
Portnoy S. Censored regression quantiles. Journal of American Statistical Association. 2003;98:1001–1012.
Powell JL. Least absolute deviations estimation for the censored regression model. Journal of Econometrics. 1984;25:303–325.
Powell JL. Censored regression quantiles. Journal of Econometrics. 1986;32:143–155.
Prentice RL, Kalbfleisch JD, Peterson AV, Flournoy N, Farewell VT, Breslow NE. The analysis of failure times in the presence of competing risks. Biometrics. 1978;34:541–554.
Sun Y, Wang HJ, Gilbert PB. Quantile regression for competing risks data with missing cause of failure. Statistica Sinica. 2012;22:703–728. doi: 10.5705/ss.2010.093.
Wang H, Wang L. Locally weighted censored quantile regression. Journal of American Statistical Association. 2009;104:1117–1128.
Ying Z, Jung SH, Wei LJ. Survival analysis with median regression models. Journal of American Statistical Association. 1995;90:178–184.
Zeng D, Lin DY. Efficient resampling methods for nonsmooth estimating functions. Biostatistics. 2008;9:355–363. doi: 10.1093/biostatistics/kxm034.
