Abstract
This paper extends the induced smoothing procedure of Brown & Wang (2006) for the semiparametric accelerated failure time model to the case of clustered failure time data. The resulting procedure permits fast and accurate computation of regression parameter estimates and standard errors using simple and widely available numerical methods, such as the Newton–Raphson algorithm. The regression parameter estimates are shown to be strongly consistent and asymptotically normal; in addition, we prove that the asymptotic distribution of the smoothed estimator coincides with that obtained without the use of smoothing. This establishes a key claim of Brown & Wang (2006) for the case of independent failure time data and also extends such results to the case of clustered data. Simulation results show that these smoothed estimates perform as well as those obtained using the best available methods at a fraction of the computational cost.
Some key words: Censoring, Convex optimization, Multivariate survival data, Rank regression
1. Introduction
The need to analyze failure time data, possibly subject to right-censoring, arises in a number of fields, including medicine, economics, epidemiology, demography and engineering. Semiparametric regression models are commonly used for characterizing the relationship between failure time and covariates, with the Cox proportional hazards regression model (Cox, 1972) being used almost exclusively in practice. The accelerated failure time model (e.g. Kalbfleisch & Prentice, 2002) provides a useful but infrequently used alternative to the Cox proportional hazards model. Letting T̄i and Xi respectively denote the failure time and vector of observed covariates for subject i (i = 1, . . . , n), the accelerated failure time model specifies that log T̄i = X′iβ + ∊i, where the error terms are independent and identically distributed with an unspecified distribution. The regression coefficient β has a nice interpretation and a variety of simple estimators are available when T̄1, . . . , T̄n are fully observed. The infrequent use of this model in applications involving censored failure time data may be largely attributed to the computational challenges that arise in both regression parameter and covariance matrix estimation.
In the presence of censoring, the observed data for subject i can be described by the triplet (Ti, Δi, Xi) where Ti = min(T̄i, Ci), Δi = I (T̄i ⩽ Ci), and Ci denotes the censoring time for subject i. Tsiatis (1990) proposes to estimate β using a weighted estimating equation of the form
Un(β) = Σi Δi wi(β) [Xi − {Σj Xj I(ej(β) ⩾ ei(β))} / {Σj I(ej(β) ⩾ ei(β))}] = 0,   (1)
where ei(β) = log(Ti) − X′iβ, wi(·) are nonnegative weight functions and the sums extend over i, j = 1, . . . , n. Because β appears in this expression only inside indicator functions, Un(β) is not a continuous function of β and an exact solution to Un(β) = 0 typically does not exist. Parameter estimates may instead be obtained by minimizing ||Un(β)||, where ||v|| denotes (v′v)1/2 for a vector v. However, this minimization problem may admit several solutions and, because Un(β) is not necessarily monotone in β, the resulting set of minimizers is not even guaranteed to be convex. Hence, despite the existence of a consistent and asymptotically normal sequence of generalized solutions (e.g. Tsiatis, 1990), identifying this sequence can be challenging in practice.
Fygenson & Ritov (1994) show that using the Gehan weight function wi(β) = n−1 Σj I{ej(β) ⩾ ei(β)} (i = 1, . . . , n) leads to the monotone estimating equation

Wn(β) = n−1 Σi Σj Δi (Xi − Xj) I{ei(β) ⩽ ej(β)} = 0.   (2)
Recognizing that Wn(β) is the gradient of the convex objective function
On(β) = n−1 Σi Σj Δi {ei(β) − ej(β)}−,   (3)

where a− = |a| I(a < 0),
a regression parameter estimate may be obtained by minimizing On(β) with respect to β. The resulting set of solutions is convex and thus easier to locate than in the general case. However, even in this comparatively nice setting, the associated lack of smoothness continues to present computational challenges. Numerous methods have been proposed for finding parameter estimates derived from (2) and (3). The most promising methods to date utilize linear programming techniques (e.g. Jin et al., 2003). However, while such methods can be implemented with relative ease, the computational burden can be high, especially with large datasets.
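To make the computational issue concrete, the following minimal R sketch (our illustration, not code from the paper; logT, delta and X are hypothetical inputs) evaluates the Gehan objective (3) and minimizes it with a generic derivative-free optimizer.

```r
## Sketch only: Gehan objective (3) for independent data.
## Assumed inputs: logT = log(T_i), delta = Delta_i, X = n x p covariate matrix.
gehan_obj <- function(beta, logT, delta, X) {
  e <- logT - as.vector(X %*% beta)   # residuals e_i(beta)
  d <- outer(e, e, "-")               # d[i, j] = e_i(beta) - e_j(beta)
  sum(delta * pmax(-d, 0)) / nrow(X)  # n^{-1} sum_i sum_j Delta_i {e_i - e_j}^-
}

## Because (3) is convex but not differentiable, a derivative-free search
## (or the linear programming approach of Jin et al., 2003) is typically used:
## fit <- optim(rep(0, ncol(X)), gehan_obj, logT = logT, delta = delta, X = X)
```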
Estimating the covariance matrix of the regression parameter estimate obtained under the accelerated failure time model remains a challenging problem. Fygenson & Ritov (1994) show that the regression parameter estimate derived from (3) is asymptotically normal with a covariance matrix that involves the hazard function of the unspecified error distribution. Direct estimation of the covariance matrix thus requires an estimate of this hazard function. Tsiatis (1990) suggests kernel-based estimation, whereas Fygenson & Ritov (1994) suggest a form of numerical differentiation. Both have proven to be unstable choices in the presence of censored data and several authors have since tackled this problem in other ways; see, for example, Jones (1997) and Jin et al. (2003). Jin et al. (2003) propose to randomly reweight the Gehan log-rank objective function (3) and then minimize the resulting perturbed objective function. Repeating this process a large number of times, the covariance matrix may then be estimated using the empirical covariance matrix of these parameter estimates. This interesting and useful approach eliminates the need to estimate the indicated hazard function. However, the computationally intensive nature of this procedure quickly becomes unwieldy, particularly with large datasets. Huang (2002), Strawderman (2005) and Jin et al. (2006a) propose useful alternatives in three related problems.
Several authors have recently proposed useful smoothing methods for nonsmooth estimating equations arising in the accelerated failure time model; see, for example, Brown & Wang (2005; 2006), Heller (2007) and Song et al. (2007). Each of these smoothing methods leads to a continuously differentiable objective or estimating function that can be dealt with using standard numerical methods. Of direct relevance to this paper are Brown & Wang (2006) and Heller (2007). Building on Brown & Wang (2005), Brown & Wang (2006) propose the use of induced smoothing for the Gehan estimating equation (2). This method, described in more detail in §2, involves solving the equation EZ {Wn(β + ΓnZ)} = 0, where Wn(·) is given in (2), Z is a continuous, mean zero normal random vector independent of all of the data, and Γn is a sequence of matrices converging to zero with elements Γn,ij = Op(n−1/2). The smoothed estimating equation EZ {Wn(β + ΓnZ)} reduces to
W̃n(β) = EZ{Wn(β + ΓnZ)} = n−1 Σi Σj Δi (Xi − Xj) Φ{eij(β)/rn,ij} = 0,   (4)
where eij(β) = ej(β) − ei(β), r2n,ij = (Xi − Xj)′Γ2n(Xi − Xj) and Φ(·) denotes the standard normal cumulative distribution function. In a related vein, Heller (2007) directly approximates the indicator function I(u ⩽ 0) in Wn(β) with 1 − ϒ(u/h), where ϒ(·) denotes a local distribution function satisfying certain conditions and the fixed scalar parameter h is used to control the accuracy of approximation. The resulting estimating equation,
n−1 Σi Σj Δi (Xi − Xj) [1 − ϒ{(ei(β) − ej(β))/h}] = 0,   (5)
has the same structure as (4). In fact, upon taking ϒ(·) to be the standard normal distribution function Φ(·), (5) is essentially a special case of (4), utilizing a fixed bandwidth h in place of the covariate-dependent bandwidth rn,ij. Heller (2007) also proposes a robust version of (5) having a bounded influence function. A potential difference between (4) and (5) lies in the ability of the former to employ a smoothing parameter that respects the scaling and covariance structure of the solution sequence. Brown & Wang (2006) claim but do not prove that the sequence of solutions obtained under (4) has the same asymptotic distribution as that obtained in the absence of smoothing. Heller (2007) proves that the solution sequence obtained under (5) is consistent and asymptotically normal, provided that h satisfies nh → ∞ and nh4 → 0 as n → ∞. Interestingly, Heller (2007) further proves that (2) and (5) are asymptotically equivalent but does not establish the equivalence result posited in Brown & Wang (2006).
The problem of regression parameter estimation under the accelerated failure time model with correlated survival data has also been considered. For example, Lin & Wei (1992), Lee et al. (1993) and Jin et al. (2006b) consider the setting in which failure times are grouped into clusters, such that observations within a cluster may be correlated but observations in distinct clusters may be considered independent. Each proposes a marginal method for rank-based estimation of regression parameters, avoiding the need to model the correlation structure among observations. Jin et al. (2006b) also devise a suitable extension of the resampling procedure proposed in Jin et al. (2003). Pan (2001) and Zhang & Peng (2007) instead propose frailty models, handling the dependence among failure times within a cluster using an additive cluster level random effect; see also Strawderman (2006) for related work in the case of a recurrent event outcome. These various methods suffer from estimation and computational challenges that equal or exceed those experienced in the case of independent failure time data. However, to our knowledge, the validity and utility of smoothing methods like those developed in Brown & Wang (2006) and Heller (2007) have not been investigated in connection with the clustered data problem.
This paper extends the smoothing procedure of Brown & Wang (2006) to the problem of marginal estimation of the regression parameter in the presence of clustered data. We prove that the resulting estimator is consistent and asymptotically normal in both the independent and correlated data settings. We further establish the equivalence of these limiting distributions with those arising in the unsmoothed case, providing rigorous justification of the equivalence claim made in Brown & Wang (2006) for the case of independent failure times and its extension to the setting of clustered data. Several possible methods of covariance matrix estimation are evaluated, among them a generalization of the Brown & Wang (2006) procedure and a modification of the resampling procedure due to Jin et al. (2006b). A useful consequence of developing the extended Brown & Wang (2006) estimator is an easy-to-compute sandwich estimator that avoids the need for resampling. The proposed methods substantially ease the computational burden of previously proposed methods for parameter and covariance matrix estimation.
2. Methodology and key results
2.1. Notation and assumptions
Consider a random sample of n independent clusters with Ki members in the ith cluster. Let T̄ik and Cik denote the failure time and censoring time for the kth member of the ith cluster, and let Xik denote the corresponding p × 1 vector of covariates. We assume that (T̄i1, . . . , T̄iKi)′ and (Ci1, . . . , CiKi)′ are independent conditional on the covariates (Xi1, . . . , XiKi)′. Let the survival data for the kth member of the ith cluster be denoted Wik = (log Tik, Δik, Xik)′, where Tik = min(T̄ik, Cik) and Δik = I(T̄ik ⩽ Cik).
We assume that the marginal distribution of T̄ik follows the accelerated failure time model

log T̄ik = X′ikβ0 + ∊ik (k = 1, . . . , Ki; i = 1, . . . , n),
where β0 is a p × 1 vector of unknown regression parameters contained in a compact subset 𝔹 of ℝp and (∊i1, . . . , ∊iKi)′ (i = 1, . . . , n) are independent random error vectors. Within each cluster i, the error terms ∊i1, . . . , ∊iKi may be correlated; however, as in Jin et al. (2006b, § 4), we assume that these error terms are exchangeable with a common, unknown marginal distribution. That is, for any i, j = 1, . . . , n and K ⩽ min(Ki, Kj), the vectors (∊i1, . . . , ∊iK)′ and (∊j1, . . . , ∊jK)′ have the same distribution. Evidently, the case of independent failure time data follows as a special case of the above model upon setting Ki = 1 for all i.
2.2. Estimation for clustered data using the Gehan weight
Let eik(β) = log(Tik) − X′ikβ. Under the assumptions of §2.1, the relevant extension of (3) to the clustered data setting may be written
Ln(β) = n−1 Σi Σk Σj Σl Δik {eik(β) − ejl(β)}−,   (6)

where the sums extend over i, j = 1, . . . , n, k = 1, . . . , Ki and l = 1, . . . , Kj, and a− = |a| I(a < 0);
see, for example, Jin et al. (2006b, §4). Observe that Ln(β) is a continuous convex function for β ∈ 𝔹 and thus differentiable almost everywhere. The derivative of the objective function with respect to β, or Sn(β) = ∇Ln(β), is the discontinuous function
Sn(β) = n−1 Σi Σk Σj Σl Δik (Xik − Xjl) I{eik(β) ⩽ ejl(β)}.   (7)
Let β̂n = argminβ∈𝔹 Ln(β). The solution to this minimization problem may not be unique; however, the convexity of Ln(β) implies that the set of minimizers on 𝔹 is convex (e.g. Fygenson & Ritov, 1994). The lack of smoothness makes minimization of Ln(β) computationally challenging, particularly with multiple covariates. However, under regularity conditions to be described later, the results of Jin et al. (2006b, Theorem 5) imply that there exists a sequence of solutions that is strongly consistent for β0 and, in addition, such that n1/2(β̂n − β0) converges in distribution to a N(0, A−1ΩA−1) random vector, where Ω = limn→∞ var{n1/2 Sn(β0)} and A = ∇S0(β0) for S0(β) = limn→∞ Sn(β). An explicit formula for A is provided in (A1). In addition to the numerical challenges that arise in computing the solution β̂n, variance estimation is difficult due to the dependence on A and the fact that Sn(β) is not differentiable in β.
2.3. Induced smoothing for clustered data
Brown & Wang (2005) propose an induced smoothing method for approximating discontinuous but monotone estimating functions using continuously differentiable functions. Assuming independent failure time observations, Brown & Wang (2006) apply this smoothing method to the problem of estimating the regression parameter in the accelerated failure time model, using (4) in place of (2). As shown below, the extension of this methodology to the problem of estimating β in the clustered data setting under the assumptions of §2.1 is straightforward.
Let Z be a N(0, Ip) random vector independent of the data, where Ip denotes the p × p identity matrix. Let Γ be a p × p matrix such that ||Γ|| = O(1) and Γ2 = Σ, where Σ is some symmetric, positive definite matrix. Then, similarly to Brown & Wang (2005, 2006), a smoothed score function may be constructed by adding the random perturbation n−1/2ΓZ to the argument of the score function Sn(β) in (7) and then taking the expectation with respect to Z. Specifically, with S̃n(β) = EZ{Sn(β + n−1/2ΓZ)}, an easy calculation shows that
S̃n(β) = n−1 Σi Σk Σj Σl Δik (Xik − Xjl) Φ{(ejl(β) − eik(β))/rikjl},   (8)
where r2ikjl = n−1(Xik − Xjl)′Σ(Xik − Xjl). With Ki = Kj = 1 for i, j = 1, . . . , n, this estimating equation reduces to (4). Alternatively, one might work directly with the smoothed objective function L̃n(β) = EZ {Ln(β + n−1/2ΓZ)}. Let ϕ(·) denote the standard normal density function. Then, using standard results for normal random variables and integration by parts, we have
L̃n(β) = n−1 Σi Σk Σj Σl Δik [{ejl(β) − eik(β)} Φikjl(β) + rikjl ϕikjl(β)],   (9)
where
Φikjl(β) = Φ{(ejl(β) − eik(β))/rikjl},   ϕikjl(β) = ϕ{(ejl(β) − eik(β))/rikjl}.   (10)
A straightforward calculation shows that ∇L̃n(β) = S̃n(β).
Let β̃n = argminβ∈𝔹 L̃n(β). The smoothed objective function, L̃n(β), is convex and continuously differentiable and standard numerical methods can be used to efficiently compute β̃n. Alternatively, β̃n can be found as the multivariate root of S̃n(β). The asymptotic results, summarized below and proved in the Appendix, also imply that inference for β̃n is straightforward.
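As an illustration of how easily β̃n can be computed, the R sketch below (ours, not code supplied with the paper) evaluates the smoothed objective (9) and score (8) for clustered data; the inputs logT, delta, X and cluster are hypothetical placeholders, with observations from all clusters stacked, and Sigma plays the role of Σ = Γ2.

```r
## Sketch only: smoothed objective (9) and score (8) for clustered data.
## Assumed inputs: logT, delta, X (all cluster members stacked, N rows),
## cluster (length-N cluster identifiers), Sigma (the p x p matrix Sigma = Gamma^2).
pair_quantities <- function(beta, logT, delta, X, cluster, Sigma) {
  n <- length(unique(cluster))               # number of independent clusters
  e <- logT - as.vector(X %*% beta)          # residuals e_{ik}(beta)
  D <- outer(e, e, function(a, b) b - a)     # D[ab] = e_{jl}(beta) - e_{ik}(beta)
  q <- rowSums((X %*% Sigma) * X)
  R <- sqrt(pmax(outer(q, q, "+") - 2 * X %*% Sigma %*% t(X), 0) / n)  # r_{ikjl}
  list(n = n, D = D, R = R)
}

smooth_obj <- function(beta, logT, delta, X, cluster, Sigma) {
  pq <- pair_quantities(beta, logT, delta, X, cluster, Sigma)
  ## summand of (9); when r_{ikjl} = 0 its limiting value max(D, 0) is used
  terms <- with(pq, ifelse(R > 0, D * pnorm(D / R) + R * dnorm(D / R), pmax(D, 0)))
  sum(delta * terms) / pq$n
}

smooth_score <- function(beta, logT, delta, X, cluster, Sigma) {
  pq <- pair_quantities(beta, logT, delta, X, cluster, Sigma)
  P <- with(pq, ifelse(R > 0, pnorm(D / R), (D > 0) + 0))  # smoothed indicator in (8)
  W <- delta * P                                           # Delta_{ik} * Phi(.)
  drop(t(X) %*% (rowSums(W) - colSums(W))) / pq$n          # sum of Delta (X_ik - X_jl) Phi(.)
}

## beta_tilde via a standard smooth optimizer, e.g.
## nlm(smooth_obj, rep(0, ncol(X)), logT = logT, delta = delta, X = X,
##     cluster = cluster, Sigma = diag(ncol(X)))$estimate
```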
Theorem 1. Let Σ = Γ2 be any symmetric and positive definite matrix with ||Γ|| < ∞. Under conditions A1–A4 of the Appendix, β̃n is a strongly consistent estimator of β0.
Theorem 2. Let Σ = Γ2 be any symmetric and positive definite matrix with ||Γ|| < ∞. Under conditions A1–A6 of the Appendix, n1/2(β̃n − β0) converges in distribution to N (0, Ψ), where Ψ = A−1ΩA−1, Ω = limn→∞ var{n1/2Sn(β0)} and A = ∇S0(β0) is defined in (A1).
The above results provide theoretical justification for the proposed smoothing procedure when estimating regression parameters under the marginal accelerated failure time model with clustered failure time data. Importantly, the matrices A and Ω in Theorem 2 are defined in terms of the limiting behaviour of the unsmoothed estimating function (7), demonstrating that the limiting distribution of n1/2(β̃n − β0) coincides with that of n1/2(β̂n − β0), where β̂n is obtained via the unsmoothed objective function (6). Since justification for the independent data case follows directly from the above theorems upon setting Ki = 1 (i = 1, . . . , n), Theorems 1 and 2 also provide rigorous justification for the claims made in Brown & Wang (2006).
Remark 1. The above results hold for a general smoothing matrix Γ that satisfies certain minimal conditions. Brown & Wang (2006) propose an iterative procedure for estimating β0, in which Σ = Γ2 is updated at each iteration using successive estimates of Ψ. One implementation of this procedure in the clustered data setting is provided in §3.2.
Remark 2. The smoothing bandwidth employed in (8) and (9) is O(n−1/2), where n denotes the number of independent clusters. In the absence of clustering, Heller (2007) recommends the choice h = σ̂n−0.26 in (5), where σ̂ is an estimate of the residual variance obtained using a minimizer of the unsmoothed equation (3). The selection h = O(n−0.26) is motivated as that which provides “the quickest rate of convergence while satisfying the bandwidth constraint nh4 → 0”. In asymptotic terms, Theorem 2 suggests that such oversmoothing is unnecessary.
3. Methods of variance estimation
3.1. The sandwich variance estimator
The sandwich form of the covariance matrix of n1/2(β̃n − β0) in Theorem 2 suggests a natural estimator provided that suitable estimates of both A and Ω can be found. In the independent data case, Brown & Wang (2006) suggest estimating A with Ãn = ∇S̃n(β̃n); Theorems 1 and 2 imply that this remains a consistent estimator in the clustered data setting. Brown & Wang (2006) further suggest several estimates of Ω, including the asymptotic variance of n1/2Sn(β0) provided in Jin et al. (2003) and an estimator of Ω based on the U-statistic structure of the estimating function (4). However, neither estimator of Ω properly accounts for the correlation between observations within a cluster. Lee et al. (1993) show that the asymptotic variance of n1/2Sn(β0) in the clustered data case can be consistently estimated via Ω̂n = Ω̂n(β̂n), where
v⊗2 = vv′ for any vector v, and
Conditions A1–A5 of the Appendix ensure that Ω̂n is a consistent estimator of Ω; with the addition of condition A6, Ψ can be consistently estimated in the clustered data setting using
Ψ̂n = Ã−1n Ω̂n(β̃n) Ã−1n.   (11)
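A rough R sketch of the resulting sandwich computation is given below (our illustration). Ãn is obtained by numerically differentiating the smoothed score sketched in §2.3, and the cluster-level estimate of Ω shown here is only one standard empirical construction for the variance of the Gehan score: it is intended to convey the structure of (11) and may differ from the exact Ω̂n of Lee et al. (1993) in centring and normalizing constants.

```r
## Sketch only; reuses smooth_score from the Section 2.3 sketch.
A_hat <- function(beta, logT, delta, X, cluster, Sigma, eps = 1e-6) {
  p  <- length(beta)
  s0 <- smooth_score(beta, logT, delta, X, cluster, Sigma)
  ## forward-difference approximation to grad S_tilde_n(beta), a p x p matrix
  vapply(seq_len(p), function(j) {
    b <- beta; b[j] <- b[j] + eps
    (smooth_score(b, logT, delta, X, cluster, Sigma) - s0) / eps
  }, numeric(p))
}

## One possible cluster-level variance construction for the Gehan score
## (an assumption for illustration; the paper uses the estimator of Lee et al., 1993).
Omega_hat <- function(beta, logT, delta, X, cluster) {
  n   <- length(unique(cluster))
  e   <- logT - as.vector(X %*% beta)
  Ind <- outer(e, e, "<=")                        # I{e_a <= e_b}
  ## contribution of observation a to the score, collecting pairs (a, b) and (b, a)
  U1  <- delta * (rowSums(Ind) * X - Ind %*% X)
  U2  <- t(Ind) %*% (delta * X) - as.vector(t(Ind) %*% delta) * X
  U   <- (U1 + U2) / n
  Ucl <- rowsum(U, group = cluster)               # sum contributions within clusters
  crossprod(Ucl) / n
}

## Sandwich estimate (11): Psi_hat = A^{-1} Omega A^{-1}
## A     <- A_hat(beta_tilde, logT, delta, X, cluster, Sigma)
## Omega <- Omega_hat(beta_tilde, logT, delta, X, cluster)
## Psi   <- solve(A, Omega) %*% solve(A)
```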
3.2. The Brown & Wang (2006) procedure for clustered data
As suggested in Brown & Wang (2006), an iterative procedure can be used to simultaneously estimate the regression parameters and their covariance matrix. Denoting Ãn(β) = ∇S̃n(β), the proposed procedure consists of the following steps in the clustered data setting:
Step 1. Set i = 0 and initialize Σ̂(0) such that ||Σ̂(0)|| = O(1); for example, Σ̂(0) = Ip.
Step 2. Set i = i + 1 and solve S̃n(β) = 0 for β̃(i) using Γ = (Σ̂(i−1))1/2 in equation (8).
Step 3. Using β̃(i), calculate Ã(i) = Ãn(β̃(i)) and Ω̂(i) = Ω̂(β̃(i)).
Step 4. Compute Σ̂(i) = {Ã(i)}−1 Ω̂(i) {Ã(i)}−1.
Step 5. Repeat steps 2–4 until convergence of both β̃(i) and Σ̂(i) is achieved to a specified tolerance.
In our experience, convergence of this algorithm typically occurs with relatively few iterations, the value of Σ̂(*) at convergence being very close to Ψ̂n in (11).
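For completeness, a minimal R sketch of Steps 1–5 is shown below (our illustration, reusing smooth_obj, A_hat and Omega_hat from the earlier sketches; the starting values, tolerance and iteration cap are arbitrary choices).

```r
## Sketch only: iterative estimation of beta and Sigma (Steps 1-5).
bw_iterate <- function(logT, delta, X, cluster, tol = 1e-6, maxit = 50) {
  p     <- ncol(X)
  Sigma <- diag(p)                        # Step 1: Sigma^(0) = I_p
  beta  <- rep(0, p)
  for (i in seq_len(maxit)) {             # Steps 2-4
    beta_new  <- nlm(smooth_obj, beta, logT = logT, delta = delta, X = X,
                     cluster = cluster, Sigma = Sigma)$estimate
    A         <- A_hat(beta_new, logT, delta, X, cluster, Sigma)
    Omega     <- Omega_hat(beta_new, logT, delta, X, cluster)
    Sigma_new <- solve(A, Omega) %*% solve(A)
    done <- max(abs(beta_new - beta)) < tol && max(abs(Sigma_new - Sigma)) < tol
    beta <- beta_new; Sigma <- Sigma_new
    if (done) break                       # Step 5
  }
  n <- length(unique(cluster))
  ## at convergence Sigma approximates Psi, so var(beta_tilde) is roughly Sigma / n
  list(beta = beta, Sigma = Sigma, se = sqrt(diag(Sigma) / n))
}
```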
Remark 3. The above procedure makes use of a data-dependent smoothing parameter. The proofs of Theorems 1 and 2 assume that the matrix Γ is known; however, since ||Γ̂(*) − (A−1ΩA−1)1/2|| = Op(n−1/2), replacing Γ by Γ̂(*) does not alter these asymptotic results.
3.3. The resampling variance estimator
Jin et al. (2006b, §4) propose a useful resampling method for estimating Ψ in the presence of correlated data. This method, which can be motivated by the conditional multiplier central limit theorem (e.g. Martinussen & Scheike, 2006, p. 43), involves randomly reweighting the Gehan log-rank objective function (6) and then minimizing the resulting perturbed objective function. Specifically, let

L*n(β) = n−1 Σi Σk Σj Σl Zi Zj Δik {eik(β) − ejl(β)}−,
where Z1, . . . , Zn are independent positive random variables with E(Zi) = var(Zi) = 1 (i = 1, . . . , n). Let β̂*n = argminβ∈𝔹 L*n(β). Jin et al. (2006b, Theorem 5) prove that, conditional on the data {Wik; k = 1, . . . , Ki; i = 1, . . . , n}, the distribution of n1/2(β̂*n − β̂n) converges almost surely to the limiting distribution of n1/2(β̂n − β0). Thus, the distribution of β̂n can be approximated by repeatedly generating random samples Z1, . . . , Zn and then minimizing L*n(β) to obtain realizations of β̂*n. The covariance matrix of β̂n can be approximated directly by the empirical covariance matrix of these realizations of β̂*n.
Jin et al. (2006b, §4) work directly with the unsmoothed Gehan objective function and utilize linear programming methods in combination with resampling in order to obtain regression parameter and covariance matrix estimates. Specifically, linear programming is used to minimize Ln(β), obtaining the estimated regression parameter β̂n; it is then applied repeatedly in minimizing each of the realizations of L*n(β) generated for the purposes of covariance matrix estimation. The use of linear programming methods can be avoided by randomly reweighting the smoothed objective function L̃n(β) in (9). Such an approach allows standard numerical methods to be used for minimization, resulting in the potential for computational savings with larger datasets. With Z1, . . . , Zn defined as above, let

L̃*n(β) = n−1 Σi Σk Σj Σl Zi Zj Δik [{ejl(β) − eik(β)} Φikjl(β) + rikjl ϕikjl(β)],
where Φikjl(β) and ϕikjl(β) are defined in (10), and define β̃*n = argminβ∈𝔹 L̃*n(β). Theorems 1 and 2 imply that an argument identical to the one given in Jin et al. (2006b, Theorem 5) can be used to show that the conditional distribution of n1/2(β̃*n − β̃n) converges almost surely to the limiting distribution of n1/2(β̃n − β0). The covariance matrix of β̃n can then be approximated exactly as described above, using the simulated realizations of β̃*n in place of the realizations of β̂*n.
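A compact R sketch of this resampling scheme is given below (our illustration, reusing pair_quantities from the §2.3 sketch; exponential weights are one convenient choice satisfying E(Zi) = var(Zi) = 1, and B = 500 mirrors the number of reweightings used in §4).

```r
## Sketch only: perturbed smoothed objective and resampling covariance estimate.
perturbed_obj <- function(beta, logT, delta, X, cluster, Sigma, Zc) {
  pq    <- pair_quantities(beta, logT, delta, X, cluster, Sigma)
  terms <- with(pq, ifelse(R > 0, D * pnorm(D / R) + R * dnorm(D / R), pmax(D, 0)))
  Z     <- Zc[match(cluster, unique(cluster))]   # expand cluster weights to members
  sum(outer(Z, Z) * delta * terms) / pq$n        # pair (ik, jl) reweighted by Z_i Z_j
}

resample_cov <- function(beta_tilde, logT, delta, X, cluster, Sigma, B = 500) {
  n <- length(unique(cluster))
  draws <- replicate(B, {
    Zc <- rexp(n)                                # positive, with E(Z_i) = var(Z_i) = 1
    nlm(perturbed_obj, beta_tilde, logT = logT, delta = delta, X = X,
        cluster = cluster, Sigma = Sigma, Zc = Zc)$estimate
  })
  cov(t(draws))                                  # empirical covariance of realizations
}
```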
4. Simulation study
Simulation studies were carried out to assess the performance of β̃n as well as to evaluate the covariance matrix estimators described in §3. The proposed simulation studies are modelled after that described in Jin et al. (2006b, §5), allowing for a direct comparison between their simulation results and those to be summarized below.
Specifically, for each cluster, we use the algorithm of Johnson (1987, §10.1) to generate two failure times from the bivariate Gumbel distribution

F(t1, t2) = F1(t1) F2(t2) [1 + θ{1 − F1(t1)}{1 − F2(t2)}],
where −1 ⩽ θ ⩽ 1, Fk (·) is the cumulative distribution function for an exponential random variable with hazard function λk = exp(β1X1k + β2X2k), X1k is Ber(0.5), and X2k is standard normal truncated at ±2 (k = 1, 2). All covariates are generated independently and the correlation between T̄1 and T̄2 is θ/4. The resulting failure time model is a special case of the accelerated failure time model of § 2.1 with true regression parameters β1 = 1 and β2 = 0.5. Censoring times are independently generated from a Un(0,τ) distribution, where τ is selected to achieve a desired level of censoring. Similarly to Jin et al. (2006b, §5), we consider the cases θ = 0 and θ = 1, 50 clusters of size two, and censoring percentages of 0%, 25% and 50%.
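To make the data-generating mechanism explicit, a minimal R sketch follows (our illustration; the function name, default censoring bound tau and sign conventions for the hazard simply follow the description above and are not taken from the paper's code).

```r
## Sketch only: generate n clusters of size two as described in Section 4.
gen_data <- function(n = 50, theta = 1, tau = 5, beta1 = 1, beta2 = 0.5) {
  out <- vector("list", n)
  for (i in seq_len(n)) {
    X1 <- rbinom(2, 1, 0.5)                        # Ber(0.5), members k = 1, 2
    X2 <- qnorm(runif(2, pnorm(-2), pnorm(2)))     # standard normal truncated at +/- 2
    lambda <- exp(beta1 * X1 + beta2 * X2)         # marginal exponential hazards
    ## bivariate Gumbel (FGM) copula via the conditional distribution method
    u1 <- runif(1); v2 <- runif(1)
    b  <- 1 + theta * (1 - 2 * u1)
    u2 <- 2 * v2 / (b + sqrt(b^2 - 4 * (b - 1) * v2))
    Tbar <- qexp(c(u1, u2), rate = lambda)         # correlated failure times
    C    <- runif(2, 0, tau)                       # Un(0, tau) censoring times
    out[[i]] <- data.frame(cluster = i, logT = log(pmin(Tbar, C)),
                           delta = as.numeric(Tbar <= C), X1 = X1, X2 = X2)
  }
  do.call(rbind, out)
}
## Example: dat <- gen_data(theta = 1); X <- as.matrix(dat[, c("X1", "X2")])
```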
Two different estimation methods are considered. Method 1 refers to the iterative method of §3.2 for simultaneously estimating the regression parameters and covariance matrix. Method 2 refers to estimating the regression parameter by minimizing the smoothed objective function (9) with the fixed choice Σ = I2. Within Method 2, we consider estimating the covariance matrix using the resampling-based variance estimate of §3.3 and also using the sandwich variance estimator (11). All simulations were conducted in R using the routine nlm for optimization (R Development Core Team, 2005); the simulation code is available upon request.
Table 1 summarizes the results of two simulation studies. Each row of the table is based on the same 1000 simulated datasets. In the first simulation study, the semiparametric accelerated failure time model of §2.1 is fitted using the covariates X1k and X2k. We report the results for the estimation of the regression parameters β1 = 1 and β2 = 0.5 and associated standard errors using Methods 1 and 2. The second simulation study repeats the first, fitting a model that uses the covariates X*1k = X1k and X*2k = X2k/500. The underlying failure time model is identical to that used in the first simulation study, the true regression parameters now being β*1 = 1 and β*2 = 250. However, in contrast to the first simulation study, the magnitudes of β*2 and X*2k are quite different from those of β2 and X2k. The results for β*1, not shown, are very similar to those reported in Table 1 for β1; hence, we only report the results for β*2. The intent of the second study is to investigate the impact of using the fixed smoothing parameter Σ = I2 versus the data-dependent smoothing parameter of §3.2, a choice that better reflects the covariance structure and scaling of the regression parameter.
Table 1.

| Regression parameter | θ | Censoring % | rbias (M1) | rse (M1) | rsee1 | rbias (M2) | rse (M2) | rsee2a | rsee2b |
|---|---|---|---|---|---|---|---|---|---|
| β1 = 1 | 0 | 0 | 0.32 | 0.25 | 0.25 | 0.21 | 0.25 | 0.25 | 0.25 |
| | | 25 | 3.91 | 0.28 | 0.28 | 3.41 | 0.28 | 0.28 | 0.28 |
| | | 50 | 4.30 | 0.35 | 0.36 | 2.22 | 0.34 | 0.36 | 0.35 |
| | 1 | 0 | 0.91 | 0.26 | 0.25 | 0.78 | 0.25 | 0.25 | 0.25 |
| | | 25 | 1.58 | 0.27 | 0.28 | 1.12 | 0.27 | 0.27 | 0.27 |
| | | 50 | 5.52 | 0.37 | 0.36 | 3.38 | 0.36 | 0.36 | 0.35 |
| β2 = 0.5 | 0 | 0 | 0.74 | 0.28 | 0.26 | 0.78 | 0.28 | 0.26 | 0.26 |
| | | 25 | 1.89 | 0.30 | 0.30 | 1.58 | 0.30 | 0.30 | 0.30 |
| | | 50 | 5.13 | 0.38 | 0.38 | 3.40 | 0.38 | 0.38 | 0.38 |
| | 1 | 0 | 1.23 | 0.26 | 0.26 | 1.20 | 0.26 | 0.26 | 0.26 |
| | | 25 | 1.99 | 0.30 | 0.30 | 1.69 | 0.28 | 0.30 | 0.30 |
| | | 50 | 4.40 | 0.40 | 0.38 | 2.70 | 0.38 | 0.38 | 0.38 |
| β*2 = 250 | 0 | 0 | 0.28 | 0.27 | 0.27 | 0.00 | 0.27 | 0.26 | 0.29 |
| | | 25 | 2.99 | 0.29 | 0.30 | 1.96 | 0.29 | 0.29 | 0.35 |
| | | 50 | 5.77 | 0.38 | 0.38 | 2.76 | 0.37 | 0.38 | 0.46 |
| | 1 | 0 | 5.80 | 0.26 | 0.27 | 2.40 | 0.26 | 0.26 | 0.30 |
| | | 25 | 1.56 | 0.29 | 0.29 | 5.60 | 0.29 | 0.29 | 0.35 |
| | | 50 | 5.25 | 0.37 | 0.38 | 2.20 | 0.36 | 0.37 | 0.46 |

M1, Method 1; M2, Method 2.
rbias, 1000 × absolute relative bias; rse, empirical standard error, relative to parameter; rsee1, standard error relative to parameter, with standard error estimate obtained using iterative method of § 3.2 with Σ̂(0) = I2; rsee2a, standard error relative to parameter, with standard error estimate obtained using the resampling procedure of § 3.3 with 500 random reweightings and regression parameters estimated using the induced smoothing procedure of § 2.3 with the fixed choice Σ = I2; rsee2b, standard error relative to parameter, with standard error estimate based on (11) and regression parameters estimated as described for rsee2a.
Considering only β1 and β2, the relative biases are small, comparable in magnitude and generally increasing with the censoring percentage. In addition, estimates obtained using Method 1 frequently exhibit greater bias than those obtained using Method 2, with no apparent reduction in standard error. The standard error estimates for β1 and β2 are accurate and similar across all estimation methods. Remarkably, the results reported here are also comparable to those summarized in the right panel of Table 1 in Jin et al. (2006b) for the Gehan weight function, where no smoothing is employed.
Turning to the comparison of results for β2 and β*2, biases generally follow the patterns described above. In addition, all methods of standard error estimation perform well, though some evidence of inflation in the relative standard error rsee2b is now present. Overall, the results suggest that the choice of smoothing parameter has minimal impact on the bias or actual standard error of the regression parameter estimates. However, given the relative accuracy of both rsee1 and rsee2a, the discrepancy observed in rsee2b suggests that the scaling of the problem, and hence the choice of smoothing parameter, can adversely affect the accuracy of (11).
On the basis of these results, we recommend using Method 1 as described in §3.2; in comparison with the simulation-based methodology of Jin et al. (2006b), it requires far less computational effort with no evidence of penalty in bias or accuracy of standard error estimation.
5. Concluding remarks and further extensions
The attractive nature of the induced smoothing procedure, in both computational and theoretical terms, stems largely from the convexity of the Gehan-weighted objective function (9). The asymptotic results obtained in this paper make significant use of this convexity. A minor extension of these results can also be used to justify an alternative smoothing methodology for the bounded influence estimator introduced in Heller (2007). Variations on this smoothing methodology may facilitate simpler and more stable estimation procedures for accelerated failure time frailty models; see, for example, Pan (2001), Strawderman (2006) and Zhang & Peng (2007).
The use of the Gehan weight function in (2) has frequently been criticized for the inefficiency of the resulting estimator. The selection of an alternative weight function may result in efficiency improvements at the expense of monotonicity, resulting in weaker asymptotic statements and increased computational challenges. To counteract these drawbacks, Jin et al. (2003) propose to use the Gehan estimator as a starting point for successively solving a sequence of convex optimization problems derived from (1). Jin et al. (2006b) extend these results to the setting of multivariate failure time data. The resulting class of estimation procedures is computationally stable and yields a consistent and asymptotically normal sequence of estimators with reasonably general weight functions. However, it does not lend itself to a simple method of variance estimation. Use of the resampling method described in §3.3 is recommended for this purpose but only amplifies the required computational effort. Jin et al. (2006a) propose a strongly related class of procedures for the Buckley–James estimator. Starting from the Gehan estimator, Strawderman (2005) demonstrates how one may instead use one-step estimation to achieve the same goal and introduces an alternative simulation-based method of variance computation that requires no additional optimization. The results of this paper show that the induced smoothing methodology provides an asymptotically valid and computationally convenient starting point for each of these other methods of estimation. In addition, the methodology itself can be directly incorporated as part of the iterative methods developed in Jin et al. (2003, 2006a,b); the asymptotic results of this paper guarantee that their results also remain valid for the corresponding smoothed version.
A direct extension of this smoothing methodology is available for general weight functions. However, it lacks the same computational convenience due to important structural differences between the Gehan-weighted estimating equation and those used for general weight functions.
Acknowledgments
The authors thank the referees, associate editor and the editor for helpful comments. This work was supported by a grant from the U.S. National Institutes of Health.
Appendix
Proofs
We impose the following regularity conditions:
Condition A1. The parameter space 𝔹 containing β0 is a compact subset of ℝp.
Condition A2. max1⩽k⩽Ki ||Xik|| is bounded almost surely by a nonrandom constant (i = 1, . . . , n).
Condition A3. The assumptions of §2.1 hold with var(∊11) < ∞.
Condition A4. The matrix A = ∇S0(β0), where S0(β) = limn→∞ Sn (β), exists and is nonsingular.
Condition A5. Let f0(·) denote the marginal density associated with the model error term ∊11 and let λ0(·) denote its corresponding hazard function. Then, f0(·) and f′0(·) are bounded functions on ℝ with
Condition A6. The marginal distribution of Crs is absolutely continuous and has a bounded density grs (·) on ℝ (r = 1, . . . , n; s = 1, . . . , Kr).
As indicated in the statements of Theorems 1 and 2, Σ = Γ2 is assumed to be a symmetric and positive definite matrix with ||Γ|| < ∞. Conditions A1, A2, A4, A5 and A6 are standard and ensure consistency and asymptotic normality of the unsmoothed Gehan estimator (Tsiatis, 1990; Ying, 1993; Jin et al., 2006b, for example). Condition A3 implies |cov(∊ik, ∊il)| ⩽ var(∊11) (i = 1, . . . , n; k, l = 1, . . . , Ki); hence, the covariances between all error terms within a cluster are bounded.
The proof of Theorem 1 relies on the following pair of lemmas, both of which hold under conditions A1–A3. The proof of Lemma 1 is a direct consequence of the strong law of large numbers for U-statistics and results in Andersen & Gill (1982, Theorem II.1). The proof of Lemma 2 relies on this result and properties of the normal cumulative distribution and density functions. These proofs are available in a technical report.
Lemma 1. supβ∈𝔹 |Ln(β) − L0(β)| → 0 almost surely, where L0(β) is convex for β ∈ 𝔹.
Lemma 2. supβ∈𝔹 |L̃n(β) − L0(β)| → 0 almost surely, where L0(·) is defined in Lemma 1.
Proof of Theorem 1. Lemmas 1 and 2, respectively, establish the uniform almost sure convergence of Ln(β) and L̃n(β) to the convex function L0(β) for β ∈ 𝔹. By condition A4, L0(β) is strictly convex at β0 and β0 is its unique minimizer. The respective minimizers β̂n and β̃n of Ln(β) and L̃n(β) thus converge almost surely to β0 (Andersen & Gill, 1982, Corollary II.2).
The next lemma is required in order to prove Theorem 2; abbreviated proofs of this result and of Theorem 2 are provided below, with expanded versions of these arguments available in a technical report. A fact used in proving Lemma 3 is that condition A4, in conjunction with (A1), implies that the probability that X1k ≠ X2l for at least one (k, l) pair must be positive.
Lemma 3. Under A1–A6 and as n → ∞, ||∇S̃n (β0) − A|| → 0 almost surely, where A = ∇S0(β0),
(A1) |
Ḡrs (·) denotes the survivor function of log Crs − X′rsβ0 and for every s.
Proof of Lemma 3. Using calculations analogous to those found in Fygenson & Ritov (1994), it can be shown that E{Sn (β)} = S0(β) + O(n−1), where the O(·) term holds uniformly on β ∈ 𝔹 and
(A2) |
for
The outer expectation in (A2) is understood to be taken over the joint distribution of the covariates. Evidently, S0(β0) = 0. Conditions A1–A6 permit us to differentiate (A2) directly; doing so and evaluating the result at β = β0, we obtain (A1) (Fygenson & Ritov, 1994, p. 737).
Recalling notation from §2.1, let the survival data for cluster i be denoted 𝒲i = {Wik, k = 1, . . . , Ki}. The smoothed equation (8) may then be written as
where cn = 1 − n−1, ψ̃β(𝒲i, 𝒲j) = (1/2){h̃β(𝒲i, 𝒲j) + h̃β(𝒲j, 𝒲i)} and
Differentiating this representation with respect to β, setting β = β0, and then using both the strong law of large numbers for independent observations and for U-statistics, we find
(A3) |
almost surely as n → ∞. The random variable 𝒜ab,cd is defined to be zero with probability one if Xab = Xcd; otherwise, one may write
where , , and ξcd (s) = f0(s)Ḡcd (s) + F̄0(s)gcd(s). Under conditions A1–A6, τ(·) is integrable, continuous and bounded on ℝ with τ (0) = 0 and the second term on the right-hand side therefore vanishes (e.g. Kanwal, 1998, p. 11). Using the resulting formula for 𝒜ab,cd and integration by parts, it can now be shown that
Substituting this last result into (A3), we observe agreement with (A1), proving the result.
Proof of Theorem 2. Using notation introduced in §2.2, we have that A−1n1/2Sn (β0) is asymptotically normal with mean zero and variance A−1ΩA−1 under assumptions A1–A5 (Jin et al., 2006b, Theorem 5). Suppose that
n1/2(β̃n − β0) + A−1 n1/2 Sn(β0) → 0   (A4)
in probability. Then, n1/2(β̃n − β0) → N(0, A−1ΩA−1) in distribution, establishing the desired asymptotic result as well as the equality of the limiting distributions of the smoothed and unsmoothed estimators.
To prove that (A4) holds, we can make use of Theorem 3 of Arcones (1998). Using notation from Arcones (1998), define Gn(β) = nL̃n(β) for all β in 𝔹. For n ⩾ 1, define the sequence of p × 1 random vectors ηn = n1/2Sn (β0) and sequences of nonsingular, symmetric p × p matrices Mn = n1/2Ip and Vn = (1/2)A. The required result (A4) becomes
||Mn(β̃n − β0) + (2Vn)−1 ηn|| → 0   (A5)
in probability. The result (A5) follows directly from Arcones (1998, Theorem 3) provided that conditions A1–A6 are sufficient to ensure that the following regularity conditions hold:
Condition B1. Gn(β) is convex and β̃n is a sequence satisfying Gn(β̃n) ⩽ infβ∈𝔹 Gn(β) + op(1).
Condition B2. ηn = Op(1), lim infn→∞ inf|β|=1 β′Vn β > 0 and lim supn→ ∞ sup|β|=1 β′Vn β < ∞.
Condition B3. For each β ∈ ℝp, Gn(β0 + M−1nβ) − Gn(β0) − β′ηn − β′Vnβ → 0 in probability.
Conditions B1 and B2 are immediate consequences of conditions A1–A6. For example, condition B1 follows because Gn (β) is easily shown to be convex, β̃n = argminβ∈𝔹 L̃n (β) = argminβ∈𝔹 Gn(β), Gn (β) is continuous, and 𝔹 is compact. In addition, conditions A1–A5 are sufficient to ensure that ηn = n1/2Sn (β0) converges in distribution, hence Op(1) as required. Since Vn = (1/2)A is a positive definite matrix for every n, condition B2 is also satisfied.
It remains to establish condition B3. Using the definitions of Mn, Gn(·), L̃n(·) and S̃n(·), and a Taylor series expansion, we may write
Gn(β0 + M−1nβ) − Gn(β0) = n1/2 β′S̃n(β0) + (1/2) β′∇S̃n(β̄n)β,   (A6)
where β̄n lies on the line segment joining β0 and β0 + n−1/2β. The triangle inequality, Lemma 3 and the fact that {S̃n} form a sequence of bounded, continuously differentiable functions imply that we can replace ∇S̃n(β̄n) in (A6) by the matrix A without altering this result. Therefore, if
n1/2 ||S̃n(β0) − Sn(β0)|| → 0   (A7)
in probability, the definitions of Vn and ηn imply that condition B3 holds. To see that (A7) holds, write

n1/2 {S̃n(β0) − Sn(β0)} = n1/2 ∫ℝp {Sn(β0 + n−1/2u) − Sn(β0)} ψ(u) du,
where ψ(·) denotes the pdf of ΓZ.
Let Θ be a fixed matrix such that ||Θ|| ⩽ M for some M < ∞ and define, for suitable u, the function Kn(u; β0, Θ) = ||Sn(β0 + n−1/2u) − Sn(β0) − n−1/2Θu||. Then, since ∫ℝp u ψ(u) du = 0, the triangle inequality implies
n1/2 ||∫ℝp {Sn(β0 + n−1/2u) − Sn(β0)} ψ(u) du|| ⩽ n1/2 ∫||u||⩽∊n Kn(u; β0, A) ψ(u) du + n1/2 ∫||u||>∊n Kn(u; β0, A) ψ(u) du   (A8)
for any ∊n > 0. The result (A7) therefore holds if we can find ∊n > 0 such that both integrals on the right-hand side of (A8) converge in probability to zero.
Following Ying (1993, Theorem 2) and Jin et al. (2006b, Theorem 5), the matrix A satisfies
sup||b−β0||⩽dn n1/2 ||Sn(b) − Sn(β0) − A(b − β0)|| / (1 + n1/2||b − β0||) → 0 in probability,   (A9)
for any positive sequence dn → 0. Suppose ∊n = o(n1/2). Then, taking b = β0 + n−1/2u, dn = n−1/2∊n and Θ = A, (A9) implies
sup||u||⩽∊n n1/2 Kn(u; β0, A) / (1 + ||u||) → 0 in probability.   (A10)
An easy calculation, in combination with (A10), now shows that the first integral on the right-hand side of the inequality in (A8) converges in probability to zero, even if ∊n → ∞. With regard to the second term on the right-hand side of (A8), we may use the definition of Kn(·; β0, A) and the triangle inequality to write n1/2 ∫||u||>∊n Kn(u; β0, A) ψ(u) du ⩽ Q3 + Q4, where

Q3 = n1/2 ∫||u||>∊n ||Sn(β0 + n−1/2u) − Sn(β0)|| ψ(u) du,   Q4 = ∫||u||>∊n ||Au|| ψ(u) du.
For all β ∈ 𝔹, ||Sn(β)|| ⩽ Q for some constant Q < ∞ by condition A2; hence, Q3 ⩽ 2Qn1/2 p(||ΓZ|| > ∊n). Letting ∊n → ∞, it follows that n1/2 p(||ΓZ|| > ∊n) → 0 as n → ∞. Similarly, ∫||u||>∊n ||u||ψ(u)du → 0. Thus, provided that n, ∊n → ∞, the bounds Q3 and Q4, hence, the second integral on the right-hand side of the inequality in (A8), also converge in probability to zero. Since we can select a sequence ∊n = o(n1/2) such that both n, ∊n → ∞, it follows that (A8) converges in probability to zero as n → ∞, establishing (A7) and concluding the proof.
Contributor Information
Lynn M. Johnson, Department of Statistical Science, Cornell University, Ithaca, New York 14853, U.S.A. Email: lms86@cornell.edu
Robert L. Strawderman, Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, U.S.A. Email: rls54@cornell.edu
References
- Andersen PK, Gill RD. Cox's regression model for counting processes: a large sample study. Ann Statist. 1982;10:1100–20.
- Arcones MA. Asymptotic theory for M-estimators over a convex kernel. Economet Theory. 1998;14:387–422.
- Brown BM, Wang Y-G. Standard errors and covariance matrices for smoothed rank estimators. Biometrika. 2005;92:149–58.
- Brown BM, Wang Y-G. Induced smoothing for rank regression with censored survival times. Statist Med. 2006;26:828–36. doi: 10.1002/sim.2576.
- Cox DR. Regression models and life-tables (with discussion). J R Statist Soc B. 1972;34:187–220.
- Fygenson M, Ritov Y. Monotone estimating equations for censored data. Ann Statist. 1994;22:732–46.
- Heller G. Smoothed rank regression with censored data. J Am Statist Assoc. 2007;102:552–59.
- Huang Y. Calibration regression of censored lifetime medical cost. J Am Statist Assoc. 2002;97:318–27.
- Jin Z, Lin DY, Wei LJ, Ying Z. Rank-based inference for the accelerated failure time model. Biometrika. 2003;90:341–53.
- Jin Z, Lin DY, Ying Z. On least squares regression with censored data. Biometrika. 2006a;93:147–62.
- Jin Z, Lin DY, Ying Z. Rank regression analysis of multivariate failure time data based on marginal linear models. Scand J Statist. 2006b;33:1–23.
- Johnson ME. Multivariate Statistical Simulation. Wiley Series in Probability and Mathematical Statistics. New York: Wiley; 1987.
- Jones MP. A class of semiparametric regressions for the accelerated failure time model. Biometrika. 1997;84:73–84.
- Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2nd ed. Hoboken, New Jersey: Wiley-Interscience; 2002.
- Kanwal RP. Generalized Functions: Theory and Technique. Boston: Birkhäuser; 1998.
- Lee EW, Wei LJ, Ying Z. Linear regression analysis for highly stratified failure time data. J Am Statist Assoc. 1993;88:557–65.
- Lin JS, Wei LJ. Linear regression analysis for multivariate failure time observations. J Am Statist Assoc. 1992;87:1091–97.
- Martinussen T, Scheike TH. Dynamic Regression Models for Survival Data. Statistics for Biology and Health. New York: Springer; 2006.
- Pan W. Using frailties in the accelerated failure time model. Lifetime Data Anal. 2001;7:55–64. doi: 10.1023/a:1009625210191.
- R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2005.
- Song X, Ma S, Huang J, Zhou X. A semiparametric approach for the nonparametric transformation survival model with multiple covariates. Biostatistics. 2007;6:197–211. doi: 10.1093/biostatistics/kxl001.
- Strawderman RL. The accelerated gap times model. Biometrika. 2005;92:647–66.
- Strawderman RL. A regression model for dependent gap times. Int J Biostatistics. 2006;2(1), Article 1.
- Tsiatis AA. Estimating regression parameters using linear rank tests for censored data. Ann Statist. 1990;18:354–72.
- Ying Z. A large sample study of rank estimation for censored regression data. Ann Statist. 1993;21:76–99.
- Zhang J, Peng Y. An alternative estimation method for the accelerated failure time frailty model. Comp Statist Data Anal. 2007;51:4413–23.